Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the micro-learning flow to support multiple embedding models per collection/language by propagating lists of embedding model IDs through helper functions, DB queries, and API endpoints.
Changes:
- Updated micro-learning endpoints to derive and use a list of embedding model IDs when fetching subjects and context documents.
- Refactored SQL query helpers to accept
embedding_models_idslists and changed the embedding-model lookup to return multiple matches. - Updated tests and search subject-vector retrieval to work with the new “multiple models” behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/app/tests/api/api_v1/test_micro_learning.py | Updates mocks for new list-of-models return type from helper. |
| src/app/services/sql_db/queries.py | Refactors subject/context queries to filter by multiple embedding model IDs; changes model lookup to return a list. |
| src/app/services/search.py | Updates subject-vector lookup to use model name → embedding models → list of IDs. |
| src/app/services/helpers.py | Refactors collection_and_model_id_according_lang to return a list of EmbeddingModel objects. |
| src/app/api/api_v1/endpoints/micro_learning.py | Updates endpoints to pass lists of embedding model IDs into DB query helpers. |
Comments suppressed due to low confidence (1)
src/app/services/helpers.py:197
collection_and_model_id_according_langnow returns a list ofEmbeddingModelobjects, but the function name/docstring still refer to a single "model id". Rename the function (and update the docstring) to reflect the new semantics (e.g., "models"/"embedding_models") so callers don’t assume a single ID.
async def collection_and_model_id_according_lang(
lang: str | None, sp
) -> tuple[Collection, list[EmbeddingModel]]:
"""
Get the collection info and model id according to the language.
Args:
sp: The search service.
lang: The language to get the collection info and model id for. If None, the default language is used wich is multilingual.
Returns:
A tuple of the collection info and model id.
"""
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/app/services/sql_db/queries.py
Outdated
| from sqlalchemy import func, select | ||
| from welearn_database.data.enumeration import Step | ||
| from welearn_database.data.models import ( | ||
| from welearn_database.data.models import ( # FilterType,; FilterUsedInQuery, |
There was a problem hiding this comment.
The inline comment in the welearn_database.data.models import (# FilterType,; FilterUsedInQuery,) looks like leftover editing noise and makes the import harder to read. Please remove it (and if those symbols are still needed, import them cleanly or document the reason separately).
| from welearn_database.data.models import ( # FilterType,; FilterUsedInQuery, | |
| from welearn_database.data.models import ( |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request refactors the handling of embedding model IDs throughout the micro-learning API and related services. The main change is to support multiple embedding models for a given collection or language, instead of just one, which improves flexibility and accuracy in querying and retrieving subject and context documents. Several functions and endpoints now accept and process lists of embedding model IDs, and related tests have been updated accordingly.
API and Endpoint Changes:
get_subject_listandget_full_journeyendpoints to retrieve and use a list of embedding model IDs instead of a single ID, ensuring all relevant models are considered when fetching subjects and context documents. [1] [2]Service and Helper Refactoring:
collection_and_model_id_according_langto return a list ofEmbeddingModelobjects, and updated its consumers to handle lists of IDs. [1] [2]get_subject_vectorto accept a model name and retrieve vectors using all corresponding embedding models, improving subject vector retrieval accuracy. [1] [2] [3]Database Query Updates:
get_subject,get_subjects, andget_context_documentsto accept and filter by lists of embedding model IDs, ensuring queries return results for all applicable models. [1] [2] [3] [4]get_embeddings_model_id_according_nameto return a list ofEmbeddingModelobjects instead of a single ID, removing caching logic and simplifying the query. [1] [2]Testing Adjustments: