Setting Up the Project Requirements with UV package manager

Install `UV`: A Python Package Manager

If you already have Python installed:

pip install uv

Alternatively, use the standalone installer:

curl -LsSf https://astral.sh/uv/install.sh | sh

Install Required Packages

The following command will install all the required packages

uv sync

Functionality

The project currently provides the following functionality:

Scrape metadata from arXiv, IEEE Xplore, ScienceDirect, and Springer
Embed the scraped data and push it to the vector database
Download embeddings from the vector database

Scrape Meta Data

The following command scrapes the metadata and stores the cleaned resources in the cache:

uv run main.py --scrape

You need Google Chrome installed to use the scrape function.
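Since scraping depends on Chrome, a quick check before running `--scrape` can save a failed run. The sketch below is not part of the project; the executable names it looks for are assumptions and vary by platform (on macOS, for example, Chrome is usually not on `PATH` at all, so a `None` result does not prove Chrome is missing).

```python
import shutil

# Assumed executable names; adjust for your platform and install method.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chrome",
    "chromium",
    "chromium-browser",
)


def find_chrome(candidates=CHROME_CANDIDATES):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_chrome()
    print(found or "No Chrome executable found on PATH")
```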

Embed the scraped data and push to vector database

Set up the database password environment variable

export VECTOR_DB_PWD=thepasswordprovided
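To illustrate how a script might consume this variable, here is a minimal sketch that reads `VECTOR_DB_PWD` and fails early with a readable message when it is missing. The helper name `get_db_password` is hypothetical; how `main.py` actually reads the variable is not documented here.

```python
import os


def get_db_password(env=os.environ):
    """Return VECTOR_DB_PWD, or raise a clear error if it is missing or empty."""
    password = env.get("VECTOR_DB_PWD", "")
    if not password:
        raise RuntimeError(
            "VECTOR_DB_PWD is not set; export it before talking to the vector database"
        )
    return password
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.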

The following command extracts embeddings from the scraped cache and pushes them to the vector database. The collection name must be a valid table name, so it cannot contain special characters. The model name is optional and defaults to "Alibaba-NLP/gte-multilingual-base"; see the Embedding Models Used section for details.

uv run main.py -g <collection_name> --model <model_name>

Example:

uv run main.py -g gte --model "Alibaba-NLP/gte-multilingual-base"
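The exact rule for a valid collection name is not spelled out above. As a rough pre-check, the sketch below assumes the common convention for table identifiers: a letter or underscore first, then letters, digits, or underscores. Both the rule and the function name are assumptions, not the project's own validation.

```python
import re

# Assumed rule: identifier-style names only (no spaces, hyphens, slashes, or dots).
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_collection_name(name: str) -> bool:
    """Return True if the name matches the assumed table-name pattern."""
    return _VALID_NAME.fullmatch(name) is not None
```

Under this assumption `gte` passes, while a model-style name such as `Alibaba-NLP/gte-multilingual-base` would be rejected, which is why the collection name and the model name are given separately.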

Download Embeddings from vector database

The following command will download the embeddings from the vector database into a cache folder

uv run main.py -d <collection_name> --cache_dir <cache_dir>

Example:

uv run main.py -d gte --cache_dir embeddings

How to use RAG and Search for relevant papers

Method 1: Use the command line

We provide a simple interactive command line interface for RAG and for searching relevant papers.

# To start the interactive RAG interface
uv run main.py --rag <collection_name>

# To start the interactive search interface
uv run main.py --search <collection_name>

Example:

uv run main.py --rag gte

Method 2: Use the API and (Optional) Web UI

The following command starts the API server, which the web UI connects to:

uv run main.py --api <collection_name>

# If you want to run the API on a specific host and port
uv run main.py --api <collection_name> --api_host <host> --api_port <port>

Example:

uv run main.py --api gte --api_host 0.0.0.0 --api_port 8000

We provide a local web UI for the API. To use it, open the web_ui.html file in your browser.
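Before opening the web UI, it can help to confirm that the API server is accepting connections. The sketch below uses only the standard library and makes no assumption about the API's routes. One detail worth noting: `0.0.0.0` is a bind address meaning "all interfaces", so a local client should connect to `127.0.0.1` instead. The helper names are hypothetical.

```python
import socket


def client_host(bind_host: str) -> str:
    """Map the wildcard bind address to loopback for a local client."""
    return "127.0.0.1" if bind_host in ("0.0.0.0", "") else bind_host


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((client_host(host), port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Matches the example command above: --api_host 0.0.0.0 --api_port 8000
    print(is_listening("0.0.0.0", 8000))
```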

Statistics

Papers

We manually categorized the papers into the following categories

| category | count | description |
|---|---|---|
| ml_general | 89 | General Machine Learning |
| dl_nlp | 56 | Deep Learning for NLP |
| cv_pattern | 53 | Computer Vision Pattern Recognition |
| cv_generative | 43 | Computer Vision Generative Models |
| dl_rnn | 36 | Deep Learning with RNNs |
| audio | 25 | Audio |
| dl_rl | 18 | Deep Learning for Reinforcement Learning |
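For a quick sense of how the categories are balanced, the counts in the table can be totaled and converted to percentage shares. This is only a sketch over the numbers listed above; it does not read from the project's data files.

```python
# Counts copied from the category table above.
counts = {
    "ml_general": 89,
    "dl_nlp": 56,
    "cv_pattern": 53,
    "cv_generative": 43,
    "dl_rnn": 36,
    "audio": 25,
    "dl_rl": 18,
}

total = sum(counts.values())

# Share of each category as a percentage of all categorized papers.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    print(f"total papers: {total}")
    for name, share in shares.items():
        print(f"{name}: {share}%")
```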

Embedding Models Used

The following models have been used and tested for embedding the scraped data. You can use other Hugging Face models as well, but some models might not be supported by the LangChain Hugging Face module.

Alibaba-NLP/gte-multilingual-base
NovaSearch/jasper_en_vision_language_v1
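One way to find out whether another model is supported is simply to try loading it. The sketch below assumes the `langchain_huggingface` package and its `HuggingFaceEmbeddings` class; whether the project loads models exactly this way is an assumption. The `trust_remote_code` flag is needed by models that ship custom code, and should only be enabled for repositories you trust. The function name `try_load_embedder` is hypothetical.

```python
DEFAULT_MODEL = "Alibaba-NLP/gte-multilingual-base"


def try_load_embedder(model_name: str = DEFAULT_MODEL):
    """Try to load a model through langchain_huggingface.

    Returns (embedder, None) on success or (None, exception) on failure.
    """
    try:
        # Imported lazily so this module loads even if the package is missing.
        from langchain_huggingface import HuggingFaceEmbeddings

        embedder = HuggingFaceEmbeddings(
            model_name=model_name,
            # Assumption: allow custom model code; only do this for trusted repos.
            model_kwargs={"trust_remote_code": True},
        )
        return embedder, None
    except Exception as exc:  # unsupported model, missing package, no network, ...
        return None, exc
```

If loading succeeds, `embedder.embed_query("some text")` returns the embedding as a list of floats. If it fails, the returned exception usually says whether the model, the package, or the network is the problem.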

