
Conversation

@michaelfeil
Owner

Related Issue

Checklist

  • I have read the CONTRIBUTING guidelines.
  • I have added tests to cover my changes.
  • I have updated the documentation (docs folder) accordingly.

Additional Notes

Add any other context about the PR here.

@greptile-apps bot (Contributor) left a comment

PR Summary

This PR refactors the model selection and batch handling system to improve multiprocessing capabilities and support for new models like nomic-embed-text-v1.5.

  • Replaced direct model instantiation with factory functions in /libs/infinity_emb/infinity_emb/inference/batch_handler.py for better multiprocessing support
  • Added CallableReturningBaseTypeHint Protocol in /libs/infinity_emb/infinity_emb/transformer/abstract.py to improve type safety
  • Simplified select_model() in /libs/infinity_emb/infinity_emb/inference/select_model.py to return callable engine functions instead of tuples with timing info
  • Added tiktoken as required dependency in /libs/infinity_emb/pyproject.toml for nomic-embed-text-v1.5 support
  • Modified telemetry in /libs/infinity_emb/infinity_emb/infinity_server.py to use empty dicts instead of engine capabilities
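
The factory-function change in the first bullet can be illustrated with a minimal, hypothetical sketch (the names `load_model` and `make_model_factory` are illustrative, not the PR's actual API): a callable built from picklable arguments can cross process boundaries, while an already-loaded model often cannot.

```python
from functools import partial

def load_model(model_name: str, device: str):
    # stand-in for the real model-loading logic (hypothetical)
    return {"name": model_name, "device": device}

def make_model_factory(model_name: str, device: str):
    # partial() captures only picklable arguments, so the callable can be
    # sent to a worker process; the heavy load happens inside the worker.
    return partial(load_model, model_name, device)

factory = make_model_factory("nomic-ai/nomic-embed-text-v1.5", "cpu")
model = factory()  # in the refactor, each process calls its factory itself
```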

9 file(s) reviewed, 8 comment(s)

```python
max_inference_t = 4e-3

# TODO: Can be parallelized
for device_map in engine_args._loading_strategy.device_mapping:  # type: ignore
```

style: the `# type: ignore` on the `device_mapping` access should be replaced with a proper type annotation.
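
One way the suppression could be avoided, sketched with hypothetical stand-ins for the project's dataclasses: give the loading strategy an explicitly typed field so the type checker can resolve `device_mapping` on its own.

```python
from dataclasses import dataclass, field

@dataclass
class LoadingStrategy:
    # explicit element type lets mypy resolve iteration without an ignore
    device_mapping: list[str] = field(default_factory=list)

@dataclass
class EngineArgs:
    _loading_strategy: LoadingStrategy = field(default_factory=LoadingStrategy)

args = EngineArgs(LoadingStrategy(device_mapping=["cuda:0", "cuda:1"]))
devices = list(args._loading_strategy.device_mapping)  # no ignore needed
```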

```diff
 assert len(engine_replicas) > 0, "No engine replicas were loaded"

-return engine_replicas, min_inference_t, max_inference_t
+return engine_replicas  # type: ignore
```

style: the `# type: ignore` on the return is unnecessary since the return type matches the annotation.

```python
return EmbedderEngine.from_inference_engine(engine_args.engine)


def _get_engine_replica(unloaded_engine, engine_args, device_map) -> "BaseTypeHint":
```

style: the function lacks type hints for the `unloaded_engine` and `device_map` parameters.
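
A hedged sketch of the annotated signature the reviewer asks for; `Any` stands in for the project's `EngineArgs` and `BaseTypeHint` types, and the body is illustrative only.

```python
from typing import Any, Callable

def _get_engine_replica(
    unloaded_engine: Callable[..., Any],  # engine class/factory (assumed)
    engine_args: Any,                     # the project's EngineArgs (assumed)
    device_map: str,                      # e.g. "cuda:0"
) -> Any:
    """Sketch only; the real body loads one replica onto `device_map`."""
    return {"engine": unloaded_engine, "device": device_map}

replica = _get_engine_replica(object, None, "cuda:0")
```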

```diff
 th = threading.Thread(
     target=send_telemetry_start,
-    args=(engine_args_list, [e.capabilities for e in app.engine_array]),  # type: ignore
+    args=(engine_args_list, [{} for e in app.engine_array]),  # type: ignore
```

logic: passing empty dictionaries instead of the actual engine capabilities loses telemetry data about model capabilities.
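
A possible middle ground, sketched with a hypothetical `snapshot_capabilities` helper and a fake engine: snapshot the capabilities into plain dicts before handing them to the telemetry thread, so the data survives without passing live engine objects around.

```python
import threading

def snapshot_capabilities(engines) -> list:
    # plain dicts are cheap to serialize and hold no engine references
    return [{"capabilities": sorted(e.capabilities)} for e in engines]

class FakeEngine:  # stand-in for an entry of app.engine_array
    capabilities = {"embed", "rerank"}

payload = snapshot_capabilities([FakeEngine()])
received = []
th = threading.Thread(target=received.append, args=(payload,))
th.start()
th.join()
```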

```toml
onnxruntime-gpu = {version = "1.19.*", optional=true}
tensorrt = {version = "^10.6.0", optional=true}
soundfile = {version="^0.12.1", optional=true}
tiktoken = "^0.8.0"
```

logic: `tiktoken` should be marked as optional since it is only needed for specific models. Add `optional=true` to the dependency.
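
A hedged sketch of what the suggested change could look like in a Poetry `pyproject.toml`; the extra name `tokenizer` is hypothetical, and an optional dependency must be listed under an extra to be installable.

```toml
tiktoken = {version = "^0.8.0", optional = true}

[tool.poetry.extras]
# hypothetical extra name; installed via `pip install infinity-emb[tokenizer]`
tokenizer = ["tiktoken"]
```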

```python
model_warmup=False,
)
)
[model_func() for model_func in model_funcs]
```

style: consider catching potential exceptions when calling the model functions; initialization could fail for various reasons.
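
The reviewer's suggestion could look roughly like this (the helper name `call_model_funcs` is illustrative): wrap each factory call so one failing replica does not abort the others, and surface the errors afterwards.

```python
def call_model_funcs(model_funcs):
    loaded, errors = [], []
    for model_func in model_funcs:
        try:
            loaded.append(model_func())
        except Exception as exc:  # model init can fail for many reasons
            errors.append(exc)
    return loaded, errors

loaded, errors = call_model_funcs([lambda: "ok", lambda: 1 / 0])
```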

@codecov-commenter commented Nov 16, 2024

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 89.18919% with 8 lines in your changes missing coverage. Please review.

Project coverage is 79.56%. Comparing base (8ac0b3c) to head (80d65be).
Report is 33 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...finity_emb/infinity_emb/inference/batch_handler.py | 90.90% | 3 Missing ⚠️ |
| ...nfinity_emb/infinity_emb/inference/select_model.py | 90.00% | 2 Missing ⚠️ |
| libs/infinity_emb/infinity_emb/engine.py | 66.66% | 1 Missing ⚠️ |
| .../infinity_emb/infinity_emb/transformer/abstract.py | 75.00% | 1 Missing ⚠️ |
| ...ty_emb/infinity_emb/transformer/embedder/neuron.py | 50.00% | 1 Missing ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #468      +/-   ##
==========================================
+ Coverage   79.51%   79.56%   +0.05%
==========================================
  Files          41       41
  Lines        3417     3441      +24
==========================================
+ Hits         2717     2738      +21
- Misses        700      703       +3
```
☔ View full report in Codecov by Sentry.
