Fix/micro learning by lpi-tn · Pull Request #137 · CyberCRI/WeLearn-api

lpi-tn · 2026-03-10T09:36:27Z

This pull request refactors the handling of embedding model IDs throughout the micro-learning API and related services. The main change is to support multiple embedding models for a given collection or language, instead of just one, which improves flexibility and accuracy in querying and retrieving subject and context documents. Several functions and endpoints now accept and process lists of embedding model IDs, and related tests have been updated accordingly.

API and Endpoint Changes:

Updated get_subject_list and get_full_journey endpoints to retrieve and use a list of embedding model IDs instead of a single ID, ensuring all relevant models are considered when fetching subjects and context documents. [1] [2]

Service and Helper Refactoring:

Refactored collection_and_model_id_according_lang to return a list of EmbeddingModel objects, and updated its consumers to handle lists of IDs. [1] [2]
Modified get_subject_vector to accept a model name and retrieve vectors using all corresponding embedding models, improving subject vector retrieval accuracy. [1] [2] [3]

Database Query Updates:

Changed get_subject, get_subjects, and get_context_documents to accept and filter by lists of embedding model IDs, ensuring queries return results for all applicable models. [1] [2] [3] [4]
Updated get_embeddings_model_id_according_name to return a list of EmbeddingModel objects instead of a single ID, removing caching logic and simplifying the query. [1] [2]

Testing Adjustments:

Modified tests to mock and validate the new behavior with lists of embedding models, ensuring correctness of the updated endpoints and service logic. [1] [2] [3]

…iple models

…models

…iple models

…models

…ro-learning

Copilot

Pull request overview

This PR refactors the micro-learning flow to support multiple embedding models per collection/language by propagating lists of embedding model IDs through helper functions, DB queries, and API endpoints.

Changes:

Updated micro-learning endpoints to derive and use a list of embedding model IDs when fetching subjects and context documents.
Refactored SQL query helpers to accept embedding_models_ids lists and changed the embedding-model lookup to return multiple matches.
Updated tests and search subject-vector retrieval to work with the new “multiple models” behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
src/app/tests/api/api_v1/test_micro_learning.py	Updates mocks for new list-of-models return type from helper.
src/app/services/sql_db/queries.py	Refactors subject/context queries to filter by multiple embedding model IDs; changes model lookup to return a list.
src/app/services/search.py	Updates subject-vector lookup to use model name → embedding models → list of IDs.
src/app/services/helpers.py	Refactors `collection_and_model_id_according_lang` to return a list of `EmbeddingModel` objects.
src/app/api/api_v1/endpoints/micro_learning.py	Updates endpoints to pass lists of embedding model IDs into DB query helpers.

Comments suppressed due to low confidence (1)

src/app/services/helpers.py:197

collection_and_model_id_according_lang now returns a list of EmbeddingModel objects, but the function name/docstring still refer to a single "model id". Rename the function (and update the docstring) to reflect the new semantics (e.g., "models"/"embedding_models") so callers don’t assume a single ID.

async def collection_and_model_id_according_lang(
    lang: str | None, sp
) -> tuple[Collection, list[EmbeddingModel]]:
    """
    Get the collection info and model id according to the language.
    Args:
        sp: The search service.
        lang: The language to get the collection info and model id for. If None, the default language is used wich is multilingual.

    Returns:
        A tuple of the collection info and model id.
    """

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/app/services/search.py

src/app/api/api_v1/endpoints/micro_learning.py

src/app/tests/api/api_v1/test_micro_learning.py

src/app/services/sql_db/queries.py

Copilot · 2026-03-10T09:41:03Z

src/app/services/sql_db/queries.py

 from sqlalchemy import func, select
 from welearn_database.data.enumeration import Step
-from welearn_database.data.models import (
+from welearn_database.data.models import (  # FilterType,; FilterUsedInQuery,


The inline comment in the welearn_database.data.models import (# FilterType,; FilterUsedInQuery,) looks like leftover editing noise and makes the import harder to read. Please remove it (and if those symbols are still needed, import them cleanly or document the reason separately).

Suggested change

from welearn_database.data.models import ( # FilterType,; FilterUsedInQuery,

from welearn_database.data.models import (

src/app/services/sql_db/queries.py

src/app/api/api_v1/endpoints/micro_learning.py

src/app/services/sql_db/queries.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

lpi-tn added 13 commits March 5, 2026 12:04

Remove unused cache dict

8b5b2f6

feat(micro-learning): update embedding model handling to support mult…

2727965

…iple models

feat(micro-learning): update functions to support multiple embedding …

dbcfde0

…models

type ignore

cf5b0a2

refactor(queries): clean up unused imports in queries.py

df64c55

Remove unused cache dict

d5b2a66

feat(micro-learning): update embedding model handling to support mult…

f8b14cd

…iple models

feat(micro-learning): update functions to support multiple embedding …

cd3a57a

…models

type ignore

94291d9

refactor(queries): clean up unused imports in queries.py

c04374e

Merge remote-tracking branch 'origin/Fix/micro-learning' into Fix/mic…

9c89346

…ro-learning

Fix test

ebe5004

fix(micro-learning): pass model to get_subject_vector in search.py

be8668e

lpi-tn requested review from Copilot and sandragjacinto March 10, 2026 09:36

Copilot started reviewing on behalf of lpi-tn March 10, 2026 09:36 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

refactor: organize imports in agents.py, helpers.py, and queries.py

23908cf

sandragjacinto reviewed Mar 10, 2026

View reviewed changes

src/app/services/sql_db/queries.py Show resolved Hide resolved

sandragjacinto approved these changes Mar 10, 2026

View reviewed changes

Apply suggestions from code review

e0435fe

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

lpi-tn merged commit 5e18f51 into main Mar 10, 2026
3 checks passed

lpi-tn deleted the Fix/micro-learning branch March 10, 2026 10:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/micro learning#137

Fix/micro learning#137
lpi-tn merged 15 commits intomainfrom
Fix/micro-learning

lpi-tn commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	from welearn_database.data.models import ( # FilterType,; FilterUsedInQuery,
	from welearn_database.data.models import (

Conversation

lpi-tn commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants