v0 add autoquant #402
Conversation
PR Summary
This PR introduces support for 'autoquant', a new automatic quantization feature in the Infinity project. The changes span multiple files and include implementation, documentation, and testing updates.
- Added 'autoquant' as a new option in the Dtype enum and CLI documentation, enabling automatic quantization for improved model performance
- Implemented 'autoquant' support in the SentenceTransformerPatched class and quantization interface
- Added 'torchao' dependency to pyproject.toml, likely to support the new autoquant functionality
- Created a new test function to verify the autoquant feature's effectiveness and accuracy
- Updated README with information on new multi-modal support (CLIP, CLAP) and text classification capabilities
9 file(s) reviewed, 4 comment(s)
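For context, a minimal sketch of how torchao's automatic quantization entry point is typically invoked. The helper name and the `torch.compile` wrapping are illustrative assumptions and may not match the exact wiring in this PR:

```python
import torch
import torchao


def quantize_with_autoquant(model: torch.nn.Module) -> torch.nn.Module:
    # Assumed usage: autoquant wraps supported layers, profiles candidate
    # quantized kernels during the first forward passes, and keeps the
    # fastest variant per layer. Compiling first lets the chosen kernels
    # be fused into the compiled graph.
    return torchao.autoquant(torch.compile(model, mode="max-autotune"))
```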
```python
import numpy as np
import requests  # type: ignore
import torch.ao.quantization
```
style: This import is unused in the current file. Consider removing it if not needed.
```python
model = torch.quantization.quantize_dynamic(
    model.to("cpu"),  # the original model
    {torch.nn.Linear},  # a set of layers to dynamically quantize
    dtype=torch.qint8,
)
model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```
logic: Two quantization methods are applied sequentially. This might lead to unexpected behavior or reduced model performance. Consider using only one method or clarify why both are necessary.
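A minimal sketch of the single-pass alternative the comment suggests, assuming only the `torch.ao.quantization` entry point is kept (the legacy `torch.quantization` name is an alias for the same implementation):

```python
import torch


def quantize_dynamic_once(model: torch.nn.Module) -> torch.nn.Module:
    # Apply dynamic int8 quantization exactly once.
    return torch.ao.quantization.quantize_dynamic(
        model.to("cpu"),    # dynamic quantization targets CPU inference
        {torch.nn.Linear},  # layer types to quantize dynamically
        dtype=torch.qint8,  # 8-bit integer weights
    )
```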
```python
        bettertransformer=False,
    )
)
sentence = "This is a test sentence."
```
style: This line is unused and can be removed.
```python
if __name__ == "__main__":
    test_autoquant_quantization()
```
style: Running a single test function in main might not be ideal. Consider using a test runner or removing this block if not necessary.
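One way to address this while keeping a direct entry point is to delegate to pytest; a short sketch, assuming the test lives in the same file:

```python
import sys

import pytest

if __name__ == "__main__":
    # Run this file's tests through pytest instead of calling one test by hand.
    sys.exit(pytest.main([__file__, "-v"]))
```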
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

| Coverage Diff | main | #402 | +/- |
|---|---:|---:|---:|
| Coverage | 79.01% | 73.24% | -5.77% |
| Files | 40 | 40 | |
| Lines | 3173 | 3184 | +11 |
| Hits | 2507 | 2332 | -175 |
| Misses | 666 | 852 | +186 |

View full report in Codecov by Sentry.
This pull request introduces several changes to the `infinity_emb` library, focusing on adding support for a new `autoquant` data type, updating documentation, and improving the quantization process. The most important changes include adding the `autoquant` data type, updating the CLI documentation, modifying the quantization logic, and adding unit tests for `autoquant` quantization.

New Features:
- Added the `autoquant` data type to the `Dtype` enum in `libs/infinity_emb/infinity_emb/primitives.py`.
- Implemented quantization support for `autoquant` in `libs/infinity_emb/infinity_emb/transformer/quantization/interface.py` and `libs/infinity_emb/infinity_emb/transformer/quantization/quant.py` [1] [2].

Documentation Updates:
- Documented the new `autoquant` option in `docs/docs/cli_v2.md`.

Codebase Improvements:
- Updated the `Makefile` to use `poetry run` when generating the OpenAPI and CLI v2 documentation in `libs/infinity_emb/Makefile` [1] [2].

Dependency Updates:
- Added `torchao` as an optional dependency in `libs/infinity_emb/pyproject.toml` [1] [2].

Testing Enhancements:
- Added unit tests for `autoquant` quantization in `libs/infinity_emb/tests/unit_test/transformer/quantization/test_interface.py`.
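As a rough illustration of the enum change described above, an `autoquant` member might look like the sketch below; the surrounding member names are placeholders and are not taken from the actual `primitives.py`:

```python
from enum import Enum


class Dtype(str, Enum):
    float32 = "float32"
    float16 = "float16"
    int8 = "int8"
    autoquant = "autoquant"  # new: let torchao pick the quantization scheme per layer
```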