leo-bpark/DocSourceLLM

Source Identification of LLM

🌻Official Repo for ICPRAI 2024 - Identifying the Source of Generation for Large Language Models

pip install -e . 

Example Code

First, run the following script, which trains an identifier for the Llama 2 model.

bash shells/example.sh

Option 1

Check the saved output in:

outputs/train_identifier/cut_labels_100/llama2_7b/tiny/bigram/seed_0/layer_26/generated
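The save path appears to encode the run settings as nested directory names. The following is a minimal Python sketch under that assumption: it splits the path into its components and lists whatever files the example run produced. The field names (`task`, `label_setting`, `size`, and so on) are hypothetical labels chosen for illustration and are not taken from the repository.

```python
from pathlib import Path

# Path copied from the README; the meaning of each component is an assumption.
SAVE_DIR = Path(
    "outputs/train_identifier/cut_labels_100/llama2_7b/tiny/bigram/seed_0/layer_26/generated"
)

# Hypothetical names for the nine path components, in order.
FIELDS = ("root", "task", "label_setting", "model", "size", "ngram", "seed", "layer", "kind")


def describe(path: Path) -> dict:
    """Pair each path component with a hypothetical field name."""
    return dict(zip(FIELDS, path.parts))


def list_outputs(path: Path) -> list:
    """Return the sorted files under `path`, or an empty list if it is missing."""
    if not path.is_dir():
        return []
    return sorted(p for p in path.rglob("*") if p.is_file())


if __name__ == "__main__":
    for name, value in describe(SAVE_DIR).items():
        print(f"{name:>14}: {value}")
    for f in list_outputs(SAVE_DIR):
        print(f)
```

Run it from the repository root after `bash shells/example.sh` has finished; if the directory does not exist yet, no files are listed.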

Option 2

Open the Jupyter notebook visualize for interactive code.

Code Structure

--scripts
    -- gather_activation.py     # gathers activations from the LLM
    -- train_identifier.py      # trains an identifier
    -- generate_and_identify.py # generates text from a prompt and identifies labels
--sip_lib
    -- data          # document processing
    -- hooks         # hooks for gathering activations
    -- identifiers   # torch modules for the FFNs
    -- utils         # utility functions
    -- make_llm.py   # LLM loading and info

Run all

To run every combination of LLM, MLP type, and n-gram setting, run the following shell scripts:

bash shells/gather_activaiton.sh
bash shells/train_identifiers.sh
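The two scripts can also be driven from Python. The sketch below assumes they must run in the order listed (activations gathered before identifiers are trained) and that a failure in the first should stop the second; `run_all` and its `runner` parameter are hypothetical helpers for illustration, not part of the repository.

```python
import subprocess

# Script paths copied from the README, in the order they are listed.
SCRIPTS = ["shells/gather_activaiton.sh", "shells/train_identifiers.sh"]


def run_all(scripts, runner=subprocess.run):
    """Run each script with bash, in order, and stop at the first failure."""
    completed = []
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError
        # when the script exits with a non-zero status.
        runner(["bash", script], check=True)
        completed.append(script)
    return completed


if __name__ == "__main__":
    run_all(SCRIPTS)
```

The `runner` argument exists only so the function can be exercised without launching the real jobs; in normal use the default `subprocess.run` is enough.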

Citation

TBD
