@qti-ashimaj
Contributor

Describe your changes

Add a flag that controls whether DeduplicateHashedInitializersPass is applied after graph surgery.
With DeduplicateHashedInitializersPass applied, VRAM usage during ONNX static quantization increases several-fold, so this change makes the pass optional.
For the Qwen2.5-1.5B-Instruct model, static quantization needs ~58 GB of VRAM with DeduplicateHashedInitializersPass and only ~14 GB without it.

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

@qti-ashimaj force-pushed the dev/qti-ashimaj/dedupinit branch from d8dc7cc to 6851d82 on December 24, 2025 05:59
@qti-ashimaj marked this pull request as ready for review on December 24, 2025 06:07
@jambayk requested a review from xiaoyu-work on January 1, 2026 12:55