Conversation
…anced to distribute between compute devices
@avtc Thanks again for another gem! Can you whip up some unit tests so there is good test coverage on the diffs, so I can run it on our GPUs and check for regressions.
…ss" - filter only by compute_device_filter
@Qubitium I have added tests with the help of GLM-5, please review if they are OK.
@avtc Will be checking and merging in the next 48 hours.
@avtc Sorry this took so long. I am currently doing a refactor which should add some hooks to make this PR less code-heavy. Since GPT-QModel will support gguf among other weight-only (no calibration required) quantization methods, I need to heavily refactor the current API so we can handle lots of hooks with less messing with lifecycles.
@avtc Please add/invite me to your PR branch so I can push changes. I will need to merge with v6.0/main once it's ready and do the refactoring.
@Qubitium I have 'Allow edits from maintainers' enabled, so you should have write access to this branch. Feel free to push the refactoring and merge with v6.0/main whenever you're ready!
Status: refactor/merge pending until this v6.0 branch is stable:
@Qubitium Hi, this feature allows specifying in the config the device where calibration data inputs/outputs will be stored, allowing more calibration data samples to be used for quantization, because the calibration data can be placed on a device different from `cuda:0`, which already stores all layer modules. Before this feature, initial calibration data was stored on CPU, and after the first pass it was stored on `DEVICE_0` (`cuda:0` usually). With this feature, if `calibration_data_device` is not set, the initial behavior is preserved. `calibration_data_device` can be set to `"cpu"`, `"cuda:1"` (or any other torch device), or to `"balanced"`; in "balanced" mode the calibration data is distributed between the available compute devices: `DEVICE_0`..`DEVICE_N`. P.S. I have used this feature several times previously, but on another, old branch. This PR is based on the latest master.
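The placement logic described above could be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `resolve_calibration_devices` and its signature are hypothetical, and the "balanced" mode is assumed to distribute batches round-robin over the compute devices.

```python
def resolve_calibration_devices(setting, compute_devices, num_batches):
    """Map each calibration batch to a storage device (hypothetical sketch).

    setting: None, "cpu", "cuda:1" (any torch device string), or "balanced".
    compute_devices: ordered list of device strings, DEVICE_0..DEVICE_N.
    """
    if setting is None:
        # Option unset: preserve the original behavior, everything on DEVICE_0.
        return [compute_devices[0]] * num_batches
    if setting == "balanced":
        # Spread batches round-robin over all compute devices.
        return [compute_devices[i % len(compute_devices)] for i in range(num_batches)]
    # A fixed device such as "cpu" or "cuda:1".
    return [setting] * num_batches
```

For example, with two GPUs and four batches, "balanced" mode would alternate `cuda:0`, `cuda:1`, `cuda:0`, `cuda:1`, while `"cpu"` pins every batch to host memory.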
Also I have fixed the examples in the config file for using the `moe` parameter, and fixed a `sys.abiflags` typo which failed the build. Note: the handling of a layer with all modules excluded from quantization was also fixed, as the current main code did not do forward replay, it seems.
I have run several small tests (first few layers) ensuring nothing fails with `auto_forward_data_parallel` enabled and disabled, on qwen3-30b-a3b with `calibration_data_device` set to `cpu`, `cuda:1`, `balanced`, and removed from the config.