
Feature/calibration data device #2421

Open
avtc wants to merge 7 commits into ModelCloud:main from avtc:feature/calibration-data-device

Conversation

@avtc
Contributor

@avtc avtc commented Feb 19, 2026

@Qubitium Hi, this feature allows specifying in the config the device where calibration data inputs/outputs will be stored. This makes it possible to use more calibration data samples for quantization, because the calibration data can be placed on a device other than cuda:0, which already stores all layer modules.

Before this feature, the initial calibration data was stored on CPU, and after the first pass it was stored on DEVICE_0 (usually cuda:0).
With this feature, if calibration_data_device is not set, the original behavior is preserved.
calibration_data_device can be set to "cpu", "cuda:1" (or any other torch device), or to "balanced"; in "balanced" mode the calibration data is distributed among the available compute devices: DEVICE_0 .. DEVICE_N.
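For illustration, the "balanced" placement described above could be sketched as a simple round-robin over the available compute devices. This is only a sketch of the idea; the function and variable names here are hypothetical, not the actual GPT-QModel API.

```python
# Hypothetical sketch: distribute calibration samples across
# DEVICE_0 .. DEVICE_N in round-robin fashion ("balanced" mode).
# assign_calibration_device and the devices list are illustrative names.

def assign_calibration_device(sample_index: int, devices: list[str]) -> str:
    """Pick the device string for one calibration sample by round-robin."""
    return devices[sample_index % len(devices)]

devices = ["cuda:0", "cuda:1", "cuda:2"]
placements = [assign_calibration_device(i, devices) for i in range(5)]
print(placements)  # ['cuda:0', 'cuda:1', 'cuda:2', 'cuda:0', 'cuda:1']
```

A fixed device string such as "cpu" or "cuda:1" would instead place every sample on that single device.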

P.S. I have used this feature several times before, but on another, older branch. This PR is based on the latest master.
I have also fixed the examples in the config file for using the moe parameter, and fixed the sys.abiflags typo which broke the build.

Note: the handling of layers with all modules excluded from quantization was also fixed, as the current main code did not appear to do forward replay for them.

I have run several small tests (the first few layers), ensuring nothing fails with auto_forward_data_parallel enabled and disabled, on qwen3-30b-a3b with calibration_data_device set to cpu, cuda:1, balanced, and removed from the config.
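For illustration, usage might look roughly like the config fragment below. Only the calibration_data_device key and its values come from this PR; the surrounding keys and the dict shape are assumptions.

```python
# Hypothetical config fragment showing the new option; other keys are
# assumed placeholders, not taken from the actual GPT-QModel config.
quant_config = {
    "bits": 4,          # assumed existing quantization option
    "group_size": 128,  # assumed existing quantization option
    # "cpu", any torch device string such as "cuda:1", or "balanced";
    # omit the key entirely to keep the pre-PR behavior.
    "calibration_data_device": "balanced",
}
print(quant_config["calibration_data_device"])  # balanced
```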

@Qubitium
Collaborator

@avtc Thanks again for another gem! Can you whip up some unit tests so there is good test coverage on the diffs, so I can run it on our GPUs and check for regressions?

@avtc avtc marked this pull request as draft February 23, 2026 15:08
@avtc avtc marked this pull request as ready for review February 23, 2026 18:38
@avtc
Contributor Author

avtc commented Feb 23, 2026

@Qubitium I have added tests with the help of GLM-5, please review whether they are OK.

@Qubitium
Collaborator

@avtc Will be checking and merging in the next 48 hours.

@Qubitium
Collaborator

@avtc Sorry this took so long. I am currently doing a refactor which should add some hooks to make this PR less code-heavy. Since GPT-QModel will support gguf, among other weight-only (no calibration required) quantization methods, I need to heavily refactor the current API so we can handle lots of hooks with less messing with lifecycles.

@Qubitium
Collaborator

@avtc Please add/invite me to your PR branch so I can push changes. I will need to merge with v6.0/main once it's ready and do the refactoring.

@avtc
Contributor Author

avtc commented Mar 10, 2026

@Qubitium I have 'Allow edits from maintainers' enabled, so you should have write access to this branch. Feel free to push the refactoring and merge with v6.0/main whenever you're ready!

@Qubitium
Collaborator

Status: refactor/merge pending this v6.0 branch becoming stable:

#2456
