Conversation
Merging branch ui-layout to apply ui style changes.
| new_arm_id = None | ||
| if not new_arm_id: | ||
| return HTMLResponse( | ||
| f"<script>alert('Failed to create arm');window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
The vulnerability arises from directly embedding soa_id, a user-provided parameter (though currently validated as an int), into JavaScript code returned to the client. To fully mitigate any XSS risk, soa_id should be safely encoded before embedding it in HTML or JavaScript sent to the user.
The best fix in this case is to use html.escape() (from the standard library) or fastapi.escape() to ensure that if soa_id ever contains unexpected characters, they are properly escaped. However, since soa_id is expected to be an integer, you can safely convert it using str() before interpolation, and/or escape it for extra safety. Since only shown code snippets may be changed, import html if not present and use html.escape(str(soa_id)) when writing to HTML/JavaScript.
Changes to make:
- In
src/soa_builder/web/app.py, importhtmlnear other imports. - In the HTMLResponse containing
window.location='/ui/soa/{soa_id}/edit', usehtml.escape(str(soa_id))in the f-string to escapesoa_idbefore using it.
| @@ -36,7 +36,7 @@ | ||
| from fastapi.staticfiles import StaticFiles | ||
| from fastapi.templating import Jinja2Templates | ||
| from pydantic import BaseModel | ||
|
|
||
| import html | ||
| from ..normalization import normalize_soa | ||
| from .initialize_database import _connect, _init_db | ||
| from .migrate_database import ( | ||
| @@ -3552,8 +3552,9 @@ | ||
| except Exception: | ||
| new_arm_id = None | ||
| if not new_arm_id: | ||
| escaped_soa_id = html.escape(str(soa_id)) | ||
| return HTMLResponse( | ||
| f"<script>alert('Failed to create arm');window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert('Failed to create arm');window.location='/ui/soa/{escaped_soa_id}/edit';</script>", | ||
| status_code=500, | ||
| ) | ||
| # Read optional type fields with hyphenated names |
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Arm Type selection: {arm_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix this issue, user-controlled input used inside an HTML (and specifically, JavaScript) context must be escaped to prevent arbitrary script execution. The most robust solution is to ensure arm_type_submission is properly escaped before insertion. Since the alert message is delivered inside a script block (not as HTML element content), you must encode the value so that quotes and special characters do not terminate the string or break out of the alert. The safest approach is to serialize the message argument to JavaScript using json.dumps, which will handle escaping of all special characters and quotes, making it safe for embedding in a JS string.
Additionally, other interpolated variables in the script (like soa_id) should ideally be validated or at least stringified/escaped, though if they only ever come from path parameters coerced as int, risk is limited.
You'll need to:
- Import
jsonat the top if not already available (it already is). - Replace the string interpolation for the alert argument with a call to
json.dumps(). - For maximum safety, interpolate variables for the JS location assignment either outside the script or encode with
urllib.parse.quote()—but sincesoa_idis always an int (from the route declaration), direct string interpolation does not present an XSS risk.
Replace the vulnerable line:
return HTMLResponse(
f"<script>alert('Unknown Arm Type selection: {arm_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>",
status_code=400,
)With:
return HTMLResponse(
f"<script>alert({json.dumps('Unknown Arm Type selection: ' + arm_type_submission)});window.location='/ui/soa/{soa_id}/edit';</script>",
status_code=400,
)This approach ensures the injected string is safely quoted and escaped in the rendered JS, preventing XSS.
| @@ -3593,7 +3593,7 @@ | ||
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Arm Type selection: {arm_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert({json.dumps('Unknown Arm Type selection: ' + arm_type_submission)});window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
| # Create Code_N |
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Data Origin Type selection: {data_origin_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix this reflected XSS vulnerability, we need to ensure that all user-provided data written into HTML/JavaScript contexts is properly escaped. For HTML responses (especially when embedding data inside inline <script> blocks), use html.escape() or the framework's equivalent (fastapi.escape is not available; use Python's standard library). For this case, both data_origin_type_submission and soa_id should be safely converted/escaped before interpolating them into the response.
- For
data_origin_type_submission, usehtml.escape()to sanitize any user input before insertion. - For
soa_id, since it's an integer type, casting tostr()and escaping is safe and recommended. - Ensure
htmlfrom the standard library is imported.
Required changes:
- In the
src/soa_builder/web/app.pyfile, at the alert location:- Escape
data_origin_type_submissionbefore including it in thealertJavaScript. - (Optionally) Escape/cast
soa_idwhen used in the URL in the same response string.
- Escape
- Add
import htmlto the imports at the top.
| @@ -19,6 +19,7 @@ | ||
| import json | ||
| import logging | ||
| import os | ||
| import html | ||
| import re | ||
| import re as _re | ||
| import urllib.parse | ||
| @@ -3630,7 +3631,7 @@ | ||
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Data Origin Type selection: {data_origin_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert('Unknown Data Origin Type selection: {html.escape(str(data_origin_type_submission))}');window.location='/ui/soa/{html.escape(str(soa_id))}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
| # Create Code_N (continue numbering) |
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Arm Type selection: {arm_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To remediate this XSS vulnerability, all user input rendered into HTML or JavaScript must be properly escaped or sanitized. In this case, since we're injecting user input inside a JavaScript string (inside <script>alert('...')</script>), it's necessary to escape or sanitize arm_type_submission so that special characters can't break out of the JavaScript string or introduce extra code.
The best approach is:
- Escape dangerous characters in
arm_type_submissionthat could terminate the string or escape into further script execution. This means escaping characters such as quotes, backslashes, angle brackets, line breaks, etc. - Use a helper function to safely encode such strings for JavaScript string contexts. Alternatively, escape the string for HTML context, which will protect against HTML-based injection but not necessarily for JavaScript contexts.
- For Python/FastAPI, you can use
html.escape()for HTML, and for JavaScript, ensure everything including apostrophes and backslashes is escaped. One robust and readable way is to usejson.dumpsto serialize the string, which correctly escapes characters for use in JavaScript strings, as:json.dumps(arm_type_submission). (This produces a valid JS string literal.)
Files/regions to change:
- In
src/soa_builder/web/app.py, within theui_update_armfunction, specifically the block returning the HTMLResponse containing the unescapedarm_type_submissionat line 3760.
Required imports/methods:
- Import
json(already present). - Replace the use of f-string interpolation with proper JavaScript string escaping, ideally using
json.dumps.
| @@ -3756,8 +3756,10 @@ | ||
| arm_id, | ||
| ) | ||
| conn.close() | ||
| # Escape arm_type_submission for JavaScript string context | ||
| safe_type = json.dumps(str(arm_type_submission)) | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Arm Type selection: {arm_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert('Unknown Arm Type selection: ' + {safe_type});window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
|
|
| ) | ||
| conn.close() | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Data Origin Type selection: {data_origin_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", |
Check warning
Code scanning / CodeQL
Reflected server-side cross-site scripting Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix the problem, user-supplied data (data_origin_type_submission and soa_id) must be escaped before it is interpolated into the JavaScript string literal in the response. Specifically, we should escape both for HTML (if they're interpolated outside tags) and for safe inclusion in JavaScript string literals, which is stricter (must escape single quotes, double quotes, newlines, backslashes, and closing script tag sequences). The best fix is to use the standard library or framework escaping functions to sanitize values both for HTML and JavaScript context.
- The recommended approach is to use the
html.escape()function from the standard library for HTML, and additionally to escape single quotes and prevent closing</script>tag injection within the interpolated values. - For the
alertstring (JS context, inside'...'), escape any single quote (') with a backslash, and for general XSS protection, also encode dangerous characters, which might be most easily done with a helper function. - For
soa_idused in the URL within JavaScript, ensure it's safe for both the HTML attribute and JS string – typically restrict to digits (it's an int, which is safe), but sanitize just in case.
Implementation plan:
- Define a helper function, e.g.,
js_string_escape, that escapes single quotes, backslashes, and</script>sequences. - In the affected line, use this function for both
data_origin_type_submissionandsoa_idbefore formatting them into the response string. - Import the required standard library (
html), and define the helper function within the file src/soa_builder/web/app.py.
Files/regions/lines to change:
- src/soa_builder/web/app.py:
- Add
import htmlat top if not present. - Add a helper
js_string_escape()function near the top or above its use. - Replace line 3820 to escape
data_origin_type_submissionandsoa_id.
- Add
| @@ -22,6 +22,7 @@ | ||
| import re | ||
| import re as _re | ||
| import urllib.parse | ||
| import html | ||
| import tempfile | ||
| import time | ||
| from contextlib import asynccontextmanager | ||
| @@ -3816,8 +3817,20 @@ | ||
| arm_id, | ||
| ) | ||
| conn.close() | ||
| # Helper to safely escape for JS string, avoiding XSS | ||
| def js_string_escape(s): | ||
| """Escapes string for safe JS single-quoted string inclusion.""" | ||
| # HTML escape first (for embedded in HTML) | ||
| esc = html.escape(str(s)) | ||
| # Escape single quotes and backslashes | ||
| esc = esc.replace('\\', '\\\\').replace("'", "\\'") | ||
| # Prevent closing script tag broken out (</script>) | ||
| esc = esc.replace('</script>', '<\\/script>') | ||
| return esc | ||
| safe_data_origin_type = js_string_escape(data_origin_type_submission) | ||
| safe_soa_id = js_string_escape(soa_id) | ||
| return HTMLResponse( | ||
| f"<script>alert('Unknown Data Origin Type selection: {data_origin_type_submission}');window.location='/ui/soa/{soa_id}/edit';</script>", | ||
| f"<script>alert('Unknown Data Origin Type selection: {safe_data_origin_type}');window.location='/ui/soa/{safe_soa_id}/edit';</script>", | ||
| status_code=400, | ||
| ) | ||
| # Maintain/Upsert immutable Code_N for DDF mapping |
There was a problem hiding this comment.
Pull request overview
This PR enhances arm management by adding dropdown selections for arm type and data origin type, implementing a full audit trail for arm operations, and displaying arm audit history in the edit interface. The changes also introduce caching mechanisms for biomedical concept categories to improve performance.
Key Changes:
- Added code mapping table and arm audit tracking infrastructure
- Implemented dropdowns for arm type (C174222) and data origin type (C188727) selection
- Added cache mechanisms with force refresh capability for concept categories
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/soa_builder/web/initialize_database.py | Creates new code table for storing terminology mappings |
| src/soa_builder/web/app.py | Adds caching for concept categories, enriches arm data with type/data_origin_type mappings, implements audit tracking for arm updates |
| src/soa_builder/web/templates/edit.html | Adds arm type and data origin type dropdown selectors, displays arm audit report, reorganizes layout |
| src/soa_builder/web/templates/concept_categories.html | Adds force refresh link and reorganizes table columns |
| src/soa_builder/web/templates/concept_category_detail.html | Adds force refresh link with cache bypass indicator |
| src/soa_builder/web/static/style.css | Updates font family |
| tests/test_concept_categories.py | Updates tests to use force=True parameter |
| tests/test_categories_cache.py | New test file for category caching behavior |
| tests/test_categories_ui_force.py | New test file for UI force refresh functionality |
| tests/test_concept_category_force_refresh.py | New test file for concept category force refresh |
| tests/test_concepts_by_category_ui_force.py | New test file for concepts by category UI force refresh |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| <table border="1" cellspacing="0" cellpadding="4" id="categoriesTable"> | ||
| <thead> | ||
| <tr> | ||
| <th style="text-align:left;">Concepts</th> |
There was a problem hiding this comment.
[nitpick] The first column header 'Concepts' is ambiguous. Since the column contains a 'View Concepts' link, consider renaming to 'Actions' or 'View' for clarity.
| <th style="text-align:left;">Concepts</th> | |
| <th style="text-align:left;">View</th> |
| if existing: | ||
| try: | ||
| n = max(int(x.split("_")[1]) for x in existing) + 1 | ||
| except Exception: |
There was a problem hiding this comment.
The bare except clause catches all exceptions without logging or context. Consider logging the exception or being more specific about which exceptions to catch (e.g., ValueError, IndexError) to aid debugging.
| except Exception: | |
| except Exception as e: | |
| logger.exception("Failed to parse code_uid for existing arm types; falling back to len(existing) + 1") |
| cur.execute( | ||
| "SELECT code_uid FROM code WHERE soa_id=? AND code_uid LIKE 'Code_%'", | ||
| (soa_id,), | ||
| ) | ||
| existing = [x[0] for x in cur.fetchall() if x[0]] | ||
| n = 1 | ||
| if existing: | ||
| try: | ||
| n = max(int(x.split("_")[1]) for x in existing) + 1 | ||
| except Exception: | ||
| n = len(existing) + 1 | ||
| new_data_origin_uid = f"Code_{n}" | ||
| cur.execute( |
There was a problem hiding this comment.
This code block for computing the next Code_N identifier is duplicated in multiple places (lines 3582-3591, 3629-3640, 3784-3795, 3852-3863). Extract this logic into a helper function like _get_next_code_uid(cur, soa_id) to reduce duplication and improve maintainability.
| cur.execute( | |
| "SELECT code_uid FROM code WHERE soa_id=? AND code_uid LIKE 'Code_%'", | |
| (soa_id,), | |
| ) | |
| existing = [x[0] for x in cur.fetchall() if x[0]] | |
| n = 1 | |
| if existing: | |
| try: | |
| n = max(int(x.split("_")[1]) for x in existing) + 1 | |
| except Exception: | |
| n = len(existing) + 1 | |
| new_data_origin_uid = f"Code_{n}" | |
| cur.execute( | |
| new_data_origin_uid = _get_next_code_uid(cur, soa_id) | |
| cur.execute( |
code_uid column should be 'unique within an SOA' so added a uniqueness constraint on (soa_id, code_uid) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
removed logger info statement Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
removed logger INFO statement Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
corrected spelling Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| monkeypatch.setattr("requests.get", fake_get) | ||
| fetch_biomedical_concepts_by_category(raw_category) | ||
| fetch_biomedical_concepts_by_category(raw_category, force=True) |
There was a problem hiding this comment.
The updated tests now always use force=True, which bypasses cache behavior. Consider adding tests that verify normal cache hit behavior (force=False) to ensure the caching mechanism works correctly in production scenarios.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| # Build mapping code_uid -> submission value (Arm dataOriginType C188727) | ||
| conn_ddf_map = _connect() |
There was a problem hiding this comment.
Commented-out debug logging statement should be removed. If this logging is needed for troubleshooting, consider uncommenting it or removing it entirely to keep the codebase clean.
| # Build mapping code_uid -> submission value (Arm dataOriginType C188727) | |
| conn_ddf_map = _connect() |
| } | ||
|
|
||
| base_arms = _fetch_arms_for_edit(soa_id) | ||
| arms_enriched = [] |
There was a problem hiding this comment.
Commented-out debug logging statement should be removed. If this logging is needed for troubleshooting, consider uncommenting it or removing it entirely to keep the codebase clean.
| arms_enriched = [] | |
Added arm enhancements