This repository hosts the code of KERAG-R.
- Method 1: Using the Provided Docker Image (Recommended)
We implement KERAG-R with Python 3.10.13 and PyTorch 2.5.1+cu121.
The ./KERAG-R/requirements.txt file lists the core dependencies.
Our experiments are conducted on a computing cluster.
Pull the prebuilt Docker image as the base environment:
docker pull reconmmendationsystem/notebook:cuda12.1_unsloth

After starting the container, install the following additional packages in the order shown to avoid version conflicts:
pip install vllm==0.6.5
pip install transformers==4.47.0
pip install https://download.pytorch.org/whl/cu121/torch-2.5.1%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=92af92c569de5da937dd1afb45ecfdd598ec1254cf2e49e3d698cb24d71aae14
pip install accelerate==1.2.0
pip install peft==0.13.2
pip install jsonlines
pip install flash-attn==2.8.3
- Method 2: Installing Directly on a Local or Cluster Environment (Without Docker)
If you do not wish to use the provided Docker image, you can install the dependencies directly in a fresh Python 3.10 environment.
- Create and activate a virtual environment (optional but recommended):
conda create -n kerag-r python=3.10
conda activate kerag-r

Or, using venv:

python3.10 -m venv kerag-r
source kerag-r/bin/activate
- Install dependencies from requirements.txt:
pip install -r ./KERAG-R/requirements.txt
- Install the pinned versions listed in Method 1 (vllm, transformers, torch, accelerate, peft, jsonlines, flash-attn), in the same order, to ensure compatibility.
This method allows you to run KERAG-R without Docker, but make sure your CUDA version matches the PyTorch wheel you install (the wheel above targets CUDA 12.1).
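As a quick sanity check after either method, you can confirm that the installed PyTorch build sees CUDA (torch.version.cuda should print 12.1):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"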
- Quick Start

Run main.py in the ./KERAG-R/train+inference/ directory; the output file is saved in the same ./KERAG-R/train+inference/ path:
python main.py pipeline \
--hf_token "hf_xxx" \
--model_name meta-llama/Llama-3.1-8B-Instruct \
--train_data_file ./listwisetrain.jsonl \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 16 \
--infer_model meta-llama/Llama-3.1-8B-Instruct \
--adapter_dir ./trained_model \
--infer_data_file ./test.jsonl \
--batch_size 80
- Load original dataset
User-item interaction files and knowledge graph files are in the ./KERAG-R/dataset directory.
The interaction files for ml-10m and AmazonBook are too large to be included in this repository. Please download them from their official sources before running the code.
- Preprocessing the dataset
All datasets can be processed by following the steps below. Alternatively, users can directly use the preprocessed files provided for the ml-1m dataset in this repository.
Place the ratings file in the ./KERAG-R/data-process/ directory, then run split.py to split the dataset. The input file is ratings.csv. The output files are like.txt, dislike.txt, train_set.txt, valid_set.txt, and test_set.txt, all saved in the ./KERAG-R/data-process/ directory.
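For orientation, here is a minimal sketch of the like/dislike partition this step performs, assuming a MovieLens-style ratings.csv with user,item,rating,timestamp columns and a rating threshold of 4 (split.py's actual threshold and train/valid/test ratios may differ):

import csv

with open("ratings.csv") as f, \
     open("like.txt", "w") as like, open("dislike.txt", "w") as dislike:
    reader = csv.reader(f)
    next(reader)  # skip the header row, if ratings.csv has one
    for user, item, rating, timestamp in reader:
        # ratings at or above the threshold count as likes (threshold is an assumption)
        out = like if float(rating) >= 4 else dislike
        out.write(f"{user} {item}\n")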
To obtain the .pkl files required to build prompts, as well as the initial recommendation list and the ground truth, KERAG-R first needs to run the initial recommendation model.
Use ./KERAG-R/data-process/processing-format.ipynb to convert the dislike.txt of the corresponding dataset into a space-delimited file that matches the input format of the initial recommendation model. The input file is dislike.txt, obtained from step 3. The output file is dislike_set.txt for the corresponding dataset.
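The conversion itself amounts to a delimiter swap; a minimal sketch, assuming the original dislike.txt is comma-delimited:

with open("dislike.txt") as fin, open("dislike_set.txt", "w") as fout:
    for line in fin:
        # replace commas with spaces so the recommender can parse the file
        fout.write(line.strip().replace(",", " ") + "\n")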
Run ./KERAG-R/top-k-recommendation.py to run the initial recommendation model and obtain the files needed to build prompts later.
python top-k-recommendation.py

The input files are train_set.txt, test_set.txt, valid_set.txt, and dislike_set.txt in the corresponding dataset folder under ./KERAG-R/dataset/, obtained from step 3. The output files are: the initial recommendation list LightGCNrec_save_dict1.csv and the ground-truth file LightGCNgt_save_dict1.csv, saved in the model_result subfolder of the corresponding dataset folder under ./KERAG-R/dataset/; and user.pkl, user_id_mapping.pkl, rating_matrix.pkl, pred.pkl, item.pkl, item_id_mapping.pkl, and item_id_mapping-all.pkl, saved in the ./KERAG-R/ directory.
The model's parameters and configuration files are in the ./KERAG-R/conf directory.
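If you want to sanity-check the generated artifacts before moving on, the .pkl files from the step above can be inspected directly (their exact contents are repository-specific, so only container types and sizes are printed here):

import pickle

for name in ["user.pkl", "user_id_mapping.pkl", "pred.pkl", "item_id_mapping.pkl"]:
    with open(f"./KERAG-R/{name}", "rb") as f:
        obj = pickle.load(f)
    # print the container type and size to confirm the files were written correctly
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(name, type(obj).__name__, size)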
Run graphrag.py in the ./KERAG-R/ directory to obtain the retrieved KG triples. Copy processed_kg_id.tsv from ./KERAG-R/ml-1m/ into the current directory to serve as the input file. The output file is pretrain-output_kg_id.tsv.

python graphrag.py
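For reference, the triple files are tab-separated; a minimal sketch of loading one, assuming one (head, relation, tail) triple per row:

import csv

with open("processed_kg_id.tsv") as f:
    triples = [tuple(row) for row in csv.reader(f, delimiter="\t")]
print(len(triples), triples[:3])  # number of triples and a few samples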
Run make-train-prompt.py in the ./KERAG-R/make-prompt/ directory to generate listwisetrain.jsonl for training. The input files are: train_set.txt, dislike.txt, and movie_info.csv, obtained from step 3; processed_kg_text.tsv and pretrain-output_kg_id.tsv, obtained from step 5; and user.pkl, user_id_mapping.pkl, rating_matrix.pkl, pred.pkl, item.pkl, item_id_mapping.pkl, and item_id_mapping-all.pkl, obtained from step 4. The output file is listwisetrain.jsonl in the ./KERAG-R/make-prompt/ directory.

python make-train-prompt.py
Run make-test-prompt.py in the ./KERAG-R/make-prompt/ directory to generate test.jsonl for inference. The input files are: train_set.txt, dislike.txt, and movie_info.csv, obtained from step 3; user.pkl, user_id_mapping.pkl, rating_matrix.pkl, pred.pkl, item.pkl, item_id_mapping.pkl, item_id_mapping-all.pkl, the item information file, LightGCNrec_save_dict1.csv, and LightGCNgt_save_dict1.csv, obtained from step 4; and processed_kg_text.tsv and pretrain-output_kg_id.tsv, obtained from step 5. The output file is test.jsonl in the ./KERAG-R/make-prompt/ directory.

python make-test-prompt.py
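Before training, you can spot-check the generated prompt files with the jsonlines package installed earlier (the record fields are repository-specific, so only the keys of the first record are printed, assuming each line is a JSON object):

import jsonlines

for path in ["./KERAG-R/make-prompt/listwisetrain.jsonl",
             "./KERAG-R/make-prompt/test.jsonl"]:
    with jsonlines.open(path) as reader:
        first = reader.read()  # read the first record
    print(path, "->", list(first.keys()))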
Place the previously generated training and test prompts into the ./KERAG-R/train+inference/ directory. Then run train.py and inference.py in the ./KERAG-R/train+inference/ directory to perform instruction tuning of the LLM and to run inference (as shown in the Quick Start). The output file is inference.txt.
python train.py \
--hf_token "" \
--model_name meta-llama/Llama-3.1-8B-Instruct \
--data_file ./listwisetrain.jsonl \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 16

python inference.py \
--model meta-llama/Llama-3.1-8B-Instruct \
--adapter_dir ./trained_model \
--data_file ./test.jsonl \
--batch_size 80 \
--hf_token ""After inference is completed, run the ./KERAG-R/train+inference/evaluation.ipynb script to process the data and calculate the metrics.