Conversation
…ormance improvements included
…ls.py; added caching for API functions
There was a problem hiding this comment.
Pull request overview
Refactors the USDM JSON generator scripts by centralizing shared helper logic into a new usdm_utils.py, while also introducing additional caching/concurrency around CDISC API usage and making related web-app background enrichment adjustments.
Changes:
- Added
src/usdm/usdm_utils.pyto host shared DB/API helper functions (many now cached viafunctools.lru_cache). - Updated USDM generator modules to import shared helpers instead of duplicating local functions.
- Adjusted CDISC enrichment/property-population background workflows and enrichment concurrency in the FastAPI app.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/usdm/usdm_utils.py | New centralized helper module for USDM generators (DB lookups, CDISC calls, caching, loader helpers). |
| src/usdm/generate_usdm.py | Uses shared metadata helper and _nz from web utils instead of local duplicates. |
| src/usdm/generate_study_timings.py | Drops duplicated helpers; relies on usdm_utils + shared _connect / _nz. |
| src/usdm/generate_study_epochs.py | Drops duplicated epoch-term resolution logic; uses usdm_utils helper. |
| src/usdm/generate_study_cells.py | Drops duplicated element-id helper; uses usdm_utils. |
| src/usdm/generate_scheduled_decision_instances.py | Drops duplicated condition-assignment helper; uses usdm_utils. |
| src/usdm/generate_scheduled_activity_instances.py | Drops duplicated activity-id helper; uses usdm_utils. |
| src/usdm/generate_schedule_timelines.py | Moves dynamic generator-loader helpers to usdm_utils. |
| src/usdm/generate_encounters.py | Uses shared timing/rule/code helpers from usdm_utils; tweaks codeSystem output. |
| src/usdm/generate_elements.py | Uses shared transition-rule helpers from usdm_utils. |
| src/usdm/generate_biomedical_concepts.py | Refactors biomedical concept export to DB-driven joins and parallel synonym prefetch. |
| src/usdm/generate_arms.py | Uses shared type/data-origin terminology helpers from usdm_utils. |
| src/usdm/generate_activities.py | Uses shared biomedical concept ID helper from usdm_utils. |
| src/soa_builder/web/utils.py | Removes a generator-specific helper that has moved into usdm_utils. |
| src/soa_builder/web/app.py | Adjusts enrichment concurrency and refactors BC property population background task behavior. |
You can also share your feedback on Copilot code review. Take the survey.
Ci configuration pre-commit-config removed black and falke8 hooks and replaced with ruff check and ruff format
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 25 changed files in this pull request and generated 6 comments.
You can also share your feedback on Copilot code review. Take the survey.
| cur.execute( | ||
| "SELECT MAX(CAST(SUBSTR(biomedical_concept_property_uid, 27) AS INTEGER))" | ||
| " FROM biomedical_concept_property" | ||
| " WHERE soa_id=?" | ||
| " AND biomedical_concept_property_uid LIKE 'BiomedicalConceptProperty_%'", | ||
| (soa_id,), | ||
| ) | ||
| n = (cur.fetchone()[0] or 0) + 1 | ||
| bcp_uid = f"BiomedicalConceptProperty_{n}" |
There was a problem hiding this comment.
Inside _populate_bc_properties_bg, the UID generation does a SELECT MAX(...) for every variable in properties_to_insert. That makes the transaction O(n²) in the number of DSS variables and needlessly increases lock time. Compute the starting max once before the loop and increment a local counter as you insert each property UID.
| @@ -115,25 +35,6 @@ def _get_type_code_tuple(soa_id: int, code_uid: str) -> Tuple[str, str, str, str | |||
| return code_code, code_decode, code_system, code_system_version | |||
There was a problem hiding this comment.
_get_type_code_tuple is annotated as returning Tuple[str, str, str, str], but the implementation returns four lists (e.g., code_code = [r[1] for r in rows]). Please align the return type annotation (and any callers) with the actual return shape (e.g., Tuple[List[str], List[str], List[str], List[str]]) to avoid masking real bugs.
Refactor of all generators to move helper functions to usdm_utils.py.
Added functools caching to API calls.