Merge safety-tuned and multilingual-tuned LLMs by analyzing per-layer/module weight changes.

Swap at layer granularity (entire transformer layers):

```bash
python layer_swap.py \
  -b meta-llama/Llama-3.1-8B-Instruct \
  -s ./checkpoint/safety_model \
  -m ./checkpoint/multi_model \
  -o ./checkpoint/merged
```

Swap at module granularity (attention vs. FFN separately):

```bash
python module_swap.py \
  -b meta-llama/Llama-3.1-8B-Instruct \
  -s ./checkpoint/safety_model \
  -m ./checkpoint/multi_model \
  -o ./checkpoint/merged
```

| Argument | Description | Default |
|---|---|---|
| `-b, --base-model` | Base model (HuggingFace ID or path) | Required |
| `-s, --safety-model` | Safety-tuned model path | Required |
| `-m, --multi-model` | Multilingual-tuned model path | Required |
| `-o, --output` | Output path | Required |
| `--tau` | Threshold for the per-layer/module decision | `0.001` |
| `--alpha` | Blend ratio | `0.5` |
| `--figure-dir` | Directory for figures | `<output>/figures` |
- Compute relative weight changes (ΔW) of each fine-tuned model against the base model
- For each layer/module, compare the safety change against the multilingual change
- Decision per layer/module (with `diff` = safety change − multilingual change):
  - `diff > tau` → use the safety weights
  - `diff < -tau` → use the multilingual weights
  - otherwise → blend the two with ratio α
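The per-tensor decision rule can be sketched as below. This is a minimal illustration, not the actual implementation in `layer_swap.py`/`module_swap.py`: the relative-change metric (Frobenius-norm ratio) and the function names `relative_change`/`merge_tensor` are assumptions for the sketch.

```python
import torch

def relative_change(base: torch.Tensor, tuned: torch.Tensor) -> float:
    # Relative weight change ||W_tuned - W_base|| / ||W_base|| (assumed metric)
    return (tuned - base).norm().item() / (base.norm().item() + 1e-12)

def merge_tensor(base: torch.Tensor, safety: torch.Tensor, multi: torch.Tensor,
                 tau: float = 0.001, alpha: float = 0.5) -> torch.Tensor:
    """Pick or blend one layer/module tensor per the decision rule."""
    diff = relative_change(base, safety) - relative_change(base, multi)
    if diff > tau:       # safety model changed this tensor noticeably more
        return safety.clone()
    if diff < -tau:      # multilingual model changed it noticeably more
        return multi.clone()
    # Comparable changes: blend with ratio alpha
    return alpha * safety + (1 - alpha) * multi
```

In practice the same rule would be applied to every parameter tensor of a layer (layer granularity) or to the attention and FFN tensors separately (module granularity).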
See data/README.md for preparing multilingual SFT datasets.