This is the official repository for the paper:
Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
Deokhyung Kang, Seonjeong Hwang, Daehui Kim, Hyounghun Kim, Gary Geunbae Lee
📄 [arXiv:2510.27269](https://arxiv.org/abs/2510.27269)
This repository provides the analysis, detection, and mitigation framework proposed in the paper: Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
It is designed as a companion to the paper, so that the paper's main experimental results can be reproduced directly and faithfully.
Specifically, this repo supports:
- Residual-based attribution analysis of multilingual reasoning gaps (Section 3)
- Detection of understanding failures from reasoning traces (Section 4)
- Mitigation via Selective Translation guided by failure detection (Section 5)
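The three components fit together at inference time roughly as sketched below. This shows only the control flow implied by failure-guided Selective Translation, under stated assumptions. `generate`, `detect_failure`, `translate`, and the 0.5 threshold are hypothetical stand-ins, not functions from this repository. The sketch also assumes that the detector scores a reasoning trace produced for the original-language query. The actual pipeline is documented in `experiments/5_selective_translation/`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutingResult:
    """Outcome of one (hypothetical) selective-translation step."""
    answer_trace: str
    translated: bool
    failure_score: float


def selective_translation(
    query: str,
    generate: Callable[[str], str],          # hypothetical: query -> reasoning trace
    detect_failure: Callable[[str], float],  # hypothetical: trace -> failure score in [0, 1]
    translate: Callable[[str], str],         # hypothetical: query -> translated query
    threshold: float = 0.5,                  # hypothetical decision threshold
) -> RoutingResult:
    # 1) Reason over the query in its original language.
    trace = generate(query)
    # 2) Score that trace for signs of an understanding failure.
    score = detect_failure(trace)
    if score < threshold:
        # No failure flagged: keep the original-language trace.
        return RoutingResult(trace, translated=False, failure_score=score)
    # 3) Failure flagged: translate the query and reason over it again.
    return RoutingResult(generate(translate(query)), translated=True, failure_score=score)


if __name__ == "__main__":
    # Toy stubs standing in for the reasoning model, detector, and translator.
    result = selective_translation(
        "¿Cuántos lados tiene un triángulo?",
        generate=lambda q: f"<trace for: {q}>",
        detect_failure=lambda trace: 0.8,
        translate=lambda q: "How many sides does a triangle have?",
    )
    print(result)
```

Routing this way means the cost of translation is paid only for queries that the detector flags.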
Each major section of the paper is mapped to a corresponding directory with a dedicated README explaining how to reproduce the results step by step.
All experiments can be run in a Conda environment with Python 3.12. Create the environment and install this repository in editable mode:
conda create -n rlm_analysis python=3.12
conda activate rlm_analysis
pip install -e .
Next, install a modified version of fastText that is compatible with numpy>=2.0.
git clone https://github.com/deokhk/fastText_numpy_2.0.git
cd fastText_numpy_2.0
pip install -e .
cd ..
We use fastText for language identification. From the repository root, download the fastText language identification model (lid.176.ftz) and move it to the misc/ directory:
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz
mv lid.176.ftz ./misc/
The repository structure is organized to closely mirror the paper's experimental pipeline.
RLM_analysis/
├── experiments/ # Step-by-step guides for reproducing the paper
├── misc/ # Where the lid.176.ftz file should be placed
├── outputs/ # Where outputs for all experiments will be saved
├── src/ # Source code
├── translated_data/ # Translated queries
└── pyproject.toml # Package definition and dependencies
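With `lid.176.ftz` placed in `misc/` as shown in the layout above, the model can be loaded through the fastText Python bindings. The following is a minimal sketch, not code from this repository: `identify_language` is a hypothetical helper. It handles two fastText behaviors. First, `predict` accepts a single line, and input containing a newline raises an error. Second, labels are returned with a `__label__` prefix.

```python
from typing import Tuple

LID_MODEL_PATH = "./misc/lid.176.ftz"
LABEL_PREFIX = "__label__"


def identify_language(model, text: str) -> Tuple[str, float]:
    """Return (language code, confidence) for the most likely language of `text`.

    `model` is any object with a fastText-style `predict(text, k)` method
    (hypothetical helper, not part of this repository).
    """
    # Collapse all whitespace, including newlines, into single spaces,
    # because fastText's predict() only accepts one line of text.
    single_line = " ".join(text.split())
    labels, probs = model.predict(single_line, k=1)
    # Labels look like "__label__en"; strip the prefix to get the code.
    return labels[0].removeprefix(LABEL_PREFIX), float(probs[0])


if __name__ == "__main__":
    import fasttext  # provided by the numpy>=2.0-compatible fork installed above

    model = fasttext.load_model(LID_MODEL_PATH)
    print(identify_language(model, "Combien font deux plus deux ?\nRéponds en français."))
```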
Each major component under the ./experiments directory corresponds to a section of the paper:
| Paper Section | Topic | Code Entry Point |
|---|---|---|
| §3 | Why Does the Multilingual Reasoning Gap Emerge? | 3_multilingual_reasoning_gap_attribution/ |
| §4 | Detecting Understanding Failures | 4_detecting_understanding_failures/ |
| §5 | Selective Translation | 5_selective_translation/ |
Each directory contains its own README with detailed instructions for reproducing the corresponding experiments, including required inputs, scripts, and expected outputs.
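Before running a section, it can help to confirm that everything above is in place. Below is a hypothetical checker (not shipped with the repository) that looks for the paths named in the layout and the section table, relative to the repository root. `outputs/` is deliberately left out, since it is populated by the experiments or by the Hugging Face download described below.

```python
from pathlib import Path
from typing import List

# Paths taken from the repository layout and the section table above.
REQUIRED_PATHS = [
    "pyproject.toml",
    "src",
    "translated_data",
    "misc/lid.176.ftz",
    "experiments/3_multilingual_reasoning_gap_attribution",
    "experiments/4_detecting_understanding_failures",
    "experiments/5_selective_translation",
]


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return the required paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Repository layout looks complete.")
```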
To facilitate reproducibility, we provide trained checkpoints and experiment outputs via the Hugging Face Hub.
We release experiment outputs for Qwen3-4B, including prober checkpoints and intermediate results, as a Hugging Face dataset: deokhk/multilingual_reasoning_gap_outputs
Download these files and place them directly under:
./outputs/
For example, using huggingface_hub:
from huggingface_hub import snapshot_download
repo_id = "deokhk/multilingual_reasoning_gap_outputs"
snapshot_download(
repo_id=repo_id,
repo_type="dataset",
local_dir="./outputs/",
)
The fine-tuned mmBERT understanding failure detector used in Section 4 is released on the Hugging Face Hub: deokhk/mmbert_ft_understandability_Qwen3-4B
After downloading, place the contents under:
./outputs/experiments/mmbert_ft_understandability/Qwen3-4B/
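The detector can be fetched with the same `snapshot_download` call used for the outputs. This sketch assumes `deokhk/mmbert_ft_understandability_Qwen3-4B` is a model repository (the Hub's default `repo_type`) and writes its files straight into the directory above. `detector_download_kwargs` is a hypothetical helper, added so the arguments can be inspected without network access.

```python
DETECTOR_REPO_ID = "deokhk/mmbert_ft_understandability_Qwen3-4B"
DETECTOR_DIR = "./outputs/experiments/mmbert_ft_understandability/Qwen3-4B/"


def detector_download_kwargs() -> dict:
    """Arguments for huggingface_hub.snapshot_download (assumes a model repo)."""
    return {"repo_id": DETECTOR_REPO_ID, "repo_type": "model", "local_dir": DETECTOR_DIR}


if __name__ == "__main__":
    from huggingface_hub import snapshot_download

    # Downloads every file in the repo into DETECTOR_DIR and returns the local folder path.
    path = snapshot_download(**detector_download_kwargs())
    print("Detector files saved to", path)
```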
To reproduce the full experimental pipeline, we recommend running the sections in order:
- Section 3 – Multilingual reasoning gap attribution
- Section 4 – Understanding failure detection
- Section 5 – Selective Translation with failure-aware routing
Please refer to the README inside each section directory for detailed, step-by-step instructions.
If you find this work useful, please cite:
@misc{kang2025multilingualreasoninggapsemerge,
title={Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?},
author={Deokhyung Kang and Seonjeong Hwang and Daehui Kim and Hyounghun Kim and Gary Geunbae Lee},
year={2025},
eprint={2510.27269},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.27269},
}
This project is released under the Apache 2.0 License. Please refer to the licenses of the individual datasets and models for their respective terms.
