Conversation
Merging branch ui-layout to apply ui style changes.
code_uid column should be 'unique within an SOA' so added a uniqueness constraint on (soa_id, code_uid) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
removed logger info statement Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
removed logger INFO statement Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
corrected spelling Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR enhances arm functionality by adding support for type classification and data origin tracking. The changes introduce a code junction table to maintain immutable identifiers (Code_N) that map arms to terminology codelists, allowing arms to be classified by type (C174222) and data origin type (C188727). Additionally, cache management was improved for biomedical concept categories and concepts by category, with new force-refresh UI controls.
Key Changes
- Added
codetable to store immutable Code_uid identifiers linking arms to terminology codelists - Enhanced arm forms with dropdown selections for type (C174222) and data origin type (C188727)
- Implemented caching with force-refresh capability for biomedical concept categories and per-category concepts
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/soa_builder/web/app.py |
Core changes: added code table logic, enriched arm UI with terminology dropdowns, implemented category/concept caching with force parameter, added audit trail display for arms |
src/soa_builder/web/templates/edit.html |
Restructured layout moving epochs section, added arm type/data origin type dropdowns, added arm audit section |
src/soa_builder/web/templates/concept_category_detail.html |
Added force-refresh link for bypassing cache |
src/soa_builder/web/templates/concept_categories.html |
Added force-refresh link and restructured table columns |
src/soa_builder/web/initialize_database.py |
Added code table schema with unique constraint on soa_id+code_uid |
src/soa_builder/web/db.py |
Enhanced connection with WAL mode and timeout settings for concurrency |
src/soa_builder/web/static/style.css |
Updated font-family to simpler stack |
tests/test_concept_categories.py |
Updated tests to pass force=True parameter |
tests/test_categories_cache.py |
New tests for categories cache with force bypass |
tests/test_categories_ui_force.py |
New UI tests for categories force refresh |
tests/test_concept_category_force_refresh.py |
New tests for per-category concept force refresh |
tests/test_concepts_by_category_ui_force.py |
New UI tests for concepts by category force bypass |
tests/test_code_uid_generation.py |
New tests for Code_N generation logic |
normalize_soa.py |
Removed unused imports |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert({json.dumps('Unknown Arm Type selection: ' + arm_type_submission)});window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To eliminate the XSS vulnerability, any use of user input in server-rendered HTML/JS should be safely escaped both for HTML and JS contexts. Although json.dumps is used for JavaScript string encoding, we must ensure any HTML context (such as inside <script>/HTML) also receives proper escaping. For the fix:
- Use FastAPI's
html.escape()(or fallback to standard libraryhtml.escape()) to escapearm_type_submissionbefore interpolating in the HTML response. - Ensure that
soa_idis safely encoded for inclusion in the redirect URL (as it's an integer, casting tostrand using f-strings is safe, but explicitly escape). - Only edit the lines generating the HTMLResponse at line 3762 in
src/soa_builder/web/app.py. - Add any necessary imports (e.g.,
import htmlifhtml.escapeis not already imported).
This will prevent malicious content in arm_type_submission from being interpreted as HTML/JS if injected, and ensure no untrusted data can break out of the intended string contexts.
| @@ -36,7 +36,7 @@ | ||
| from fastapi.staticfiles import StaticFiles | ||
| from fastapi.templating import Jinja2Templates | ||
| from pydantic import BaseModel | ||
|
|
||
| import html | ||
| from ..normalization import normalize_soa | ||
| from .initialize_database import _connect, _init_db | ||
| from .migrate_database import ( | ||
| @@ -3759,7 +3759,7 @@ | ||
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert({json.dumps('Unknown Arm Type selection: ' + arm_type_submission)});window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert({json.dumps('Unknown Arm Type selection: ' + html.escape(arm_type_submission))});window.location='/ui/soa/{html.escape(str(soa_id))}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
|
|
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert({json.dumps(f'Unknown Data Origin Type selection: {data_origin_type_submission}')});window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix the issue, we must ensure that any user-provided data that is included in the returned HTML (particularly inside inline script blocks) is properly escaped so that it cannot prematurely terminate literals or execute arbitrary JavaScript. The safest solution here is to not return user data in a script block at all, or to ensure robust encoding for all user data, especially if it is interpolated inside a JS string. Since this code aims to show an error via an alert and redirect, we can encode the data more defensively:
- Escape the user-provided value using
html.escape()to prevent breaking out of HTML/script contexts. - Rely on
json.dumps()to escape problematic JS characters, but also validate and sanitize inputs to ensure safety. - If the value might contain
</script>, replace it to avoid closing the script tag accidentally (rare, but possible). - If the underlying intent is to present an error, consider rendering the error in a non-script HTML context (e.g., a rendered page).
In context, the safe short-term patch is to wrap the interpolated value in both html.escape() before json.dumps(), ensuring any problematic HTML characters are neutralized prior to rendering the script. Required change is in src/soa_builder/web/app.py at line 3822—add the import for html if needed and modify the response construction.
| @@ -27,6 +27,7 @@ | ||
| from contextlib import asynccontextmanager | ||
| from datetime import datetime, timezone | ||
| from typing import Any, List, Optional | ||
| import html | ||
|
|
||
| import pandas as pd | ||
| import requests | ||
| @@ -3818,8 +3819,11 @@ | ||
| arm_id, | ||
| ) | ||
| conn.close() | ||
| # Escape user input for HTML context before passing to json.dumps for JS string escaping | ||
| safe_data_origin_type = html.escape(data_origin_type_submission) | ||
| alert_msg = f"Unknown Data Origin Type selection: {safe_data_origin_type}" | ||
| return HTMLResponse( | ||
| f"<script>alert({json.dumps(f'Unknown Data Origin Type selection: {data_origin_type_submission}')});window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert({json.dumps(alert_msg)});window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
| # Maintain/Upsert immutable Code_N for DDF mapping |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -2642,6 +2685,7 @@ def _reindex(table: str, soa_id: int): | |||
|
|
|||
| @app.delete("/soa/{soa_id}/visits/{visit_id}") | |||
| def delete_visit(soa_id: int, visit_id: int): | |||
There was a problem hiding this comment.
[nitpick] The docstring format is inconsistent. Most endpoint docstrings are single-line imperative (e.g., 'Delete Visit from an SoA.'), but several new/updated functions use passive voice or differ in style. Consider standardizing all docstrings to use the imperative mood for consistency.
| <button style="background:#1976d2;color:#fff;border:none;padding:4px 10px;border-radius:4px;cursor:pointer;">Add Arm (auto StudyArm_N)</button> | ||
| </form> | ||
| </details> | ||
|
|
There was a problem hiding this comment.
[nitpick] Extra blank line before closing div. While not functionally problematic, this creates inconsistent whitespace in the template that could be removed for cleaner formatting.
src/soa_builder/web/app.py
Outdated
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Arm Type selection: {html.escape(str(arm_type_submission))}');window.location='/ui/soa/{int(soa_id)}/edit';</script>", |
There was a problem hiding this comment.
While html.escape is used, the escaped content is placed inside a JavaScript string literal within an alert. This approach is vulnerable to XSS if the submission contains single quotes or backslashes. Use json.dumps for proper JavaScript string escaping, as done correctly at line 3767.
| f"<script>alert('Unknown Arm Type selection: {html.escape(str(arm_type_submission))}');window.location='/ui/soa/{int(soa_id)}/edit';</script>", | |
| f"<script>alert('Unknown Arm Type selection: {json.dumps(str(arm_type_submission))});window.location='/ui/soa/{int(soa_id)}/edit';</script>", |
src/soa_builder/web/app.py
Outdated
| escaped_selection = ( | ||
| html.escape(data_origin_type_submission, quote=True) | ||
| .replace("\\", "\\\\") | ||
| .replace("'", "\\'") | ||
| ) | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Data Origin Type selection: {escaped_selection}');window.location='/ui/soa/{soa_id}/edit';</script>", |
There was a problem hiding this comment.
Manual JavaScript string escaping is error-prone and incomplete. This doesn't handle all edge cases (e.g., newlines, Unicode escapes). Use json.dumps() instead, which properly escapes strings for JavaScript contexts, as done at lines 3767 and 3827.
| escaped_selection = ( | |
| html.escape(data_origin_type_submission, quote=True) | |
| .replace("\\", "\\\\") | |
| .replace("'", "\\'") | |
| ) | |
| return HTMLResponse( | |
| f"<script>alert('Unknown Data Origin Type selection: {escaped_selection}');window.location='/ui/soa/{soa_id}/edit';</script>", | |
| escaped_selection = json.dumps(data_origin_type_submission) | |
| return HTMLResponse( | |
| f"<script>alert({escaped_selection});window.location='/ui/soa/{soa_id}/edit';</script>", |
| row_info = cur_info.fetchone() | ||
| if row_info: | ||
| soa_name_val, study_id_val, study_label_val, study_desc_val, created_at_val = ( | ||
| row_info | ||
| ) |
There was a problem hiding this comment.
[nitpick] The unpacking logic is repeated in both the if and else branches with different fallback values. Consider extracting this into a helper function or simplifying the pattern to reduce duplication and improve maintainability.
src/soa_builder/web/app.py
Outdated
| type = a.get("type") | ||
| type_display = code_to_submission.get(type) | ||
| data_origin_type = a.get("data_origin_type") | ||
| data_origin_type_display = ddf_code_to_submission.get(data_origin_type) | ||
| if type_display is None and type: | ||
| type_display = type if type in submission_values else None |
There was a problem hiding this comment.
Variable name 'type' shadows the built-in type() function. Use a more descriptive name like 'arm_type' or 'type_code' to avoid shadowing built-ins.
| type = a.get("type") | |
| type_display = code_to_submission.get(type) | |
| data_origin_type = a.get("data_origin_type") | |
| data_origin_type_display = ddf_code_to_submission.get(data_origin_type) | |
| if type_display is None and type: | |
| type_display = type if type in submission_values else None | |
| arm_type = a.get("type") | |
| type_display = code_to_submission.get(arm_type) | |
| data_origin_type = a.get("data_origin_type") | |
| data_origin_type_display = ddf_code_to_submission.get(data_origin_type) | |
| if type_display is None and arm_type: | |
| type_display = arm_type if arm_type in submission_values else None |
Consistent capitalization Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| cur = conn.cursor() | ||
| cur.execute( | ||
| "SELECT id,name,label,description,order_index FROM arm WHERE soa_id=? ORDER BY order_index", | ||
| "SELECT id,name,label,description,order_index,COALESCE(type,''),COALESCE(data_origin_type,'') FROM arm WHERE soa_id=? ORDER BY order_index", |
There was a problem hiding this comment.
The SELECT statement returns 7 columns but the code constructs dict entries using hardcoded indices (r[0] through r[6]). This is fragile and error-prone if column order changes. Consider using column names with cur.description or aliasing columns to make the mapping more explicit and maintainable.
| cur_info = conn_info.cursor() | ||
| cur_info.execute( | ||
| "SELECT name, created_at, study_id, study_label, study_description FROM soa WHERE id=?", | ||
| "SELECT name, study_id, study_label, study_description, created_at FROM soa WHERE id=?", |
There was a problem hiding this comment.
The SELECT column order was changed from the original (name, created_at, study_id, study_label, study_description) to (name, study_id, study_label, study_description, created_at), but the variable assignment at line 2260 expects the new order. While the code is correct, this reordering introduces unnecessary risk. Consider keeping the original column order for consistency with other queries in the codebase.
| <button style="background:#1976d2;color:#fff;border:none;padding:4px 10px;border-radius:4px;cursor:pointer;">Add Epoch</button> | ||
| </form> | ||
| </details> | ||
| <details id="elements-section" open class="collapsible"> |
There was a problem hiding this comment.
[nitpick] The removal of style=\"margin-top:10px;\" from this line (previously line 137) while keeping it on line 200 for arms-section creates inconsistent spacing between sections. Consider applying consistent spacing to all section details elements.
| <details id="elements-section" open class="collapsible"> | |
| <details id="elements-section" open class="collapsible" style="margin-top:10px;"> |
| @@ -14,7 +18,7 @@ <h2>Biomedical Concept Categories (<span id="conceptTotal">{{ count }}</span>)</ | |||
| {% if rows %} | |||
| <table border="1" cellspacing="0" cellpadding="4" id="categoriesTable"> | |||
| <thead> | |||
There was a problem hiding this comment.
The table header structure is incomplete. Line 21 adds a new 'Concepts' header, but the opening <tr> tag that should precede it was removed. This creates invalid HTML.
| <thead> | |
| <thead> | |
| <tr> |
| @@ -259,7 +258,6 @@ def build_visit_activities( | |||
| vas: List[VisitActivity] = [] | |||
| next_id = 1 | |||
| for a_idx, r in enumerate(rows, start=1): | |||
There was a problem hiding this comment.
[nitpick] The variable 'activity_name' was removed but 'a_idx' is still present in the enumerate(). If 'activity_name' is truly unused, consider also removing 'a_idx' or using '_' for the unused index variable: for _, r in enumerate(rows, start=1):
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Arm Type selection: {json.dumps(str(arm_type_submission))});window.location='/ui/soa/{int(soa_id)}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix the issue, all instances where user-supplied input from form submissions (arm_type_submission and similarly, data_origin_type_submission if used in an alert) are interpolated into HTML/JavaScript must be properly escaped to prevent XSS.
- The best practice is to escape the value for safe inclusion inside a JavaScript string inside an HTML page. Since it's being injected into a single-quoted JavaScript string, escaping single quotes and other relevant characters is important.
- The Python standard library's
html.escape()function should be used to escape values before including them in any HTML output. - Additionally, for use inside a
<script>/JS context, it's even more robust to use a dedicated function to safely encode for JavaScript (by escaping',",\,<,>, and possibly more). However,html.escape()will suffice in most cases, as its primary role is to prevent HTML-breaking, and using double quotes inside single quotes mitigates some risk. - In this block, before interpolating values into the HTMLResponse, pass them through
html.escape(requires importinghtml). - Update the block in src/soa_builder/web/app.py, replacing direct use of the user-supplied value with its escaped form.
- Add
import htmlat the top of the file if not already present.
| @@ -22,6 +22,7 @@ | ||
| import re | ||
| import re as _re | ||
| import urllib.parse | ||
| import html | ||
| import tempfile | ||
| import time | ||
| from contextlib import asynccontextmanager | ||
| @@ -3592,8 +3593,10 @@ | ||
| soa_id, | ||
| ) | ||
| conn.close() | ||
| # Escape user input for safe HTML/JavaScript inclusion | ||
| safe_arm_type_submission = html.escape(str(arm_type_submission)) | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Arm Type selection: {json.dumps(str(arm_type_submission))});window.location='/ui/soa/{int(soa_id)}/edit';</script>", | ||
| f"<script>alert('Unknown Arm Type selection: {json.dumps(safe_arm_type_submission)}');window.location='/ui/soa/{int(soa_id)}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
| # Create Code_N |
| # Properly escape the value for safety in HTML/JS context | ||
| escaped_selection = json.dumps(data_origin_type_submission) | ||
| return HTMLResponse( | ||
| f"<script>alert({escaped_selection});window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
The main risk is that interpolating untrusted data into a <script> block could allow an attacker to inject arbitrary JavaScript code if escaping is incomplete or broken. To ensure safety:
- User-submitted values displayed via JS must be JSON-encoded (which is already being done), but also ensure no other context leaks (e.g., HTML attributes, URLs).
- Route parameters (
soa_id) should be type-checked, but should also always be escaped when re-injected into HTML or JS contexts—even if FastAPI type conversion is used. - Use Python's
html.escape()(standard library from Python 3.2+) for HTML escaping and continue usingjson.dumps()for JS context. - Thus, change the return line on error (line 3635) to:
- Escape
soa_idfor HTML context (e.g., usinghtml.escape(str(soa_id))). - Leave
escaped_selectionasjson.dumps(data_origin_type_submission), which is correct for placing into JS. - Use in the return string:
<script>alert({escaped_selection});window.location='/ui/soa/{escaped_soaid}/edit';</script>.
- Escape
- Add the appropriate import:
import html. - If
soa_idis always an int due to FastAPI's type system, escaping viahtml.escape(str(soa_id))is defensive and correct.
These changes should be applied in src/soa_builder/web/app.py only in the error response fragment.
| @@ -24,6 +24,7 @@ | ||
| import urllib.parse | ||
| import tempfile | ||
| import time | ||
| import html | ||
| from contextlib import asynccontextmanager | ||
| from datetime import datetime, timezone | ||
| from typing import Any, List, Optional | ||
| @@ -3629,10 +3630,11 @@ | ||
| soa_id, | ||
| ) | ||
| conn.close() | ||
| # Properly escape the value for safety in HTML/JS context | ||
| # Properly escape the value for safety in HTML/JS and URL context | ||
| escaped_selection = json.dumps(data_origin_type_submission) | ||
| escaped_soaid = html.escape(str(soa_id)) | ||
| return HTMLResponse( | ||
| f"<script>alert({escaped_selection});window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert({escaped_selection});window.location='/ui/soa/{escaped_soaid}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
| # Create Code_N (continue numbering) |
Added arm functionality