pendingintent · Jan 20, 2026 · Jan 19, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,146 @@
+# SoA Workbench - Copilot Instructions
+
+## Project Overview
+Clinical trial Schedule of Activities (SoA) workbench: FastAPI web app + CLI tools for normalizing, expanding, and validating study visit matrices against USDM (Unified Study Definitions Model).
+
+**Core Architecture:**
+- **Web Layer** (`src/soa_builder/web/`): FastAPI app with router-based endpoints, HTMX UI, SQLite persistence
+- **Core Logic** (`src/soa_builder/`): Normalization, schedule expansion, validation modules
+- **USDM Generators** (`src/usdm/`): Transform database state → USDM JSON artifacts
+- **Data Model**: SQLite schema with audit trails, versioning (freezes), and biomedical concept linking
+
+**USDM Model Entities & Relationships** (critical for understanding the domain):
+- **StudyDesign**: Top-level container with arrays of: encounters, activities, arms, epochs, elements, studyCells, scheduleTimelines
+- **StudyElement** (`element` table): Structural design components (e.g., treatment periods, cohorts, crossover phases)
+  - UIDs: `StudyElement_N`; generated by `generate_elements.py`
+  - Purpose: Define "what study structure exists" (design-time components)
+  - Grouped via: **StudyCells** (arm + epoch + elementIds array)
+  - Attributes: transitionStartRule, transitionEndRule, studyInterventionIds
+- **ScheduledActivityInstance** (`instances` table): Temporal visit/timepoint occurrences where activities happen
+  - UIDs: `ScheduledActivityInstance_N`; generated by `generate_scheduled_activity_instances.py`
+  - Purpose: Define "when/where activities occur" (schedule-specific)
+  - Relationships: references epochId, encounterId, activityIds[], timelineId
+  - Contained by: **ScheduleTimeline** (with mainTimeline flag, entryCondition, timings[], instances[])
+- **StudyCell** (`study_cell` table): Junction entity combining armId + epochId + elementIds[]
+  - Defines which study elements apply to which arm/epoch combinations
+  - UID pattern: `StudyCell_N`
+- **ScheduleTimeline** (`schedule_timelines` table): Container for temporal scheduling
+  - Contains: instances[] (ScheduledActivityInstance or ScheduledDecisionInstance)
+  - Contains: timings[] (relative timing definitions), exits[]
+  - Attributes: mainTimeline (boolean), entryCondition, entryId
+- **Encounter** (`visit` table via encounter_uid): Physical/virtual visit where activities occur
+  - Referenced by: ScheduledActivityInstance.encounterId
+  - Linked to: Activities via matrix_cells
+- **Key Distinction**: Elements = structural design (periods, cohorts) | Instances = temporal schedule (visits, timepoints)
+
+## Critical Patterns
+
+### Database & Testing
+- **Test isolation**: Tests run against `soa_builder_web_tests.db` (set via `SOA_BUILDER_DB` env). `tests/conftest.py` enforces isolation by removing WAL/SHM files pre-session
+- **Connection pattern**: Always use `from .db import _connect` (handles pytest detection, WAL mode, busy timeouts)
+- **Schema migrations**: Lifespan event in `app.py` runs migrations in sequence—add new ones to `migrate_database.py`
+
+### Router Architecture
+Endpoints organized by domain in `src/soa_builder/web/routers/`:
+- Each router (visits, activities, epochs, arms, elements, etc.) handles JSON API + HTMX UI variants
+- Pattern: `@router.post("/soa/{soa_id}/visits")` for API, `@router.post("/ui/soa/{soa_id}/visits/create")` for forms
+- Audit trail via `_record_{entity}_audit()` helpers in `audit.py`
+
+### HTMX UI Conventions
+- Templates in `templates/` use `base.html` inheritance
+- Form submissions return HTML partials for HTMX swaps
+- Matrix edit interface (`edit.html`): drag-drop reordering, cell toggling with status rotation (blank → X → O → blank)
+- Modal pattern: target `#modal-host` for freeze/rollback/audit overlays
+
+### External API Integration
+**CDISC Library API** (biomedical concepts):
+- Requires `CDISC_SUBSCRIPTION_KEY` or `CDISC_API_KEY` env vars
+- Caching: `fetch_biomedical_concepts()` with TTL; force refresh via `POST /ui/soa/{id}/concepts_refresh`
+- Override for tests: `CDISC_CONCEPTS_JSON` env (file path or inline JSON)
+- Specializations: SDTM codelists via `fetch_sdtm_specializations()`
+
+### USDM Generation Pipeline
+Scripts in `src/usdm/` convert SoA database → USDM JSON:
+- `generate_activities.py`, `generate_arms.py`, `generate_study_epochs.py`, etc.
+- Each reads from SQLite, constructs USDM objects with UIDs, references, and terminology codes
+- Run via CLI: `python -m usdm.generate_activities --soa-id 1 --output-file output/activities.json`
+- Relies on junction tables (e.g., `activity_concept`, `code_junction_timings`) for terminology linkage
+
+## Key Development Workflows
+
+### Starting the Web Server
+```bash
+source .venv/bin/activate
+soa-builder-web  # or uvicorn soa_builder.web.app:app --reload --port 8000
+```
+Access at `http://localhost:8000`
+
+### Running Tests
+```bash
+pytest  # uses soa_builder_web_tests.db
+pytest tests/test_specific.py -v
+```
+**Important**: Test DB auto-cleans at session start. Manual cleanup if needed:
+```bash
+rm -f soa_builder_web_tests.db*
+```
+
+### Pre-commit Hooks
+```bash
+pre-commit install
+pre-commit run --all-files  # runs black + pytest + flake8
+```
+
+### CLI Commands
+```bash
+# Normalize wide CSV → relational tables
+soa-builder normalize --input files/SoA.csv --out-dir normalized/
+
+# Expand repeating rules → calendar instances
+soa-builder expand --normalized-dir normalized/ --start-date 2025-01-01
+
+# Validate imaging intervals
+soa-builder validate --normalized-dir normalized/
+```
+
+## Code Conventions
+
+### UID Generation
+- Auto-generated UIDs follow pattern: `{EntityName}_{incrementing_id}`
+- Use `get_next_code_uid()` / `get_next_concept_uid()` from `utils.py`
+- Once assigned, UIDs are immutable (e.g., `arm_uid`, `element_uid`)
+
+### Audit Pattern
+All entity mutations log before/after state:
+```python
+from .audit import _record_element_audit
+_record_element_audit(soa_id, "update", element_id, before=old_state, after=new_state)
+```
+
+### Reorder Operations
+- Client sends `order: List[int]` (entity IDs in new sequence)
+- Server recomputes `sequence_index` field for all items
+- Audit entry written to the `entity_reorder_audit` table
+
+### Freeze & Rollback
+- **Freeze**: Snapshot visits/activities/cells/epochs/arms to `{entity}_freeze` tables
+- **Rollback**: Restore from freeze, track diffs in `rollback_audit`
+- UI: Modal shows diff summary, confirms restore
+
+## Common Gotchas
+
+1. **Always activate venv first**: `source .venv/bin/activate` before any command
+2. **Test DB separation**: Don't run tests against prod DB—conftest enforces `SOA_BUILDER_DB`
+3. **HTMX partial responses**: UI endpoints must return HTML fragments, not full pages
+4. **SQLite WAL mode**: Production uses WAL; tests use DELETE for simpler cleanup
+5. **Concept API 401s in browser**: Direct API URLs fail (no auth headers)—use internal detail pages
+6. **Migration order matters**: New migrations go at end of lifespan event sequence
+7. **Pydantic schemas**: Use `schemas.py` models for request validation, not raw dicts
+8. **Router imports**: Import routers at top of `app.py`, mount with `app.include_router()`
+
+## Reference Files
+- **API endpoints catalog**: `docs/api_endpoints.csv` (165 endpoints: method, path, type, description, response format)
+- **Full API docs**: `README_endpoints.md` (curl examples, response schemas)
+- **Main README**: Installation, server start, test setup
+- **Database schema**: Infer from `initialize_database.py` + migrations in `migrate_database.py`
+- **Test patterns**: See `tests/test_bulk_import.py` for matrix operations, `test_element_audit_endpoint.py` for audit trails
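
The reorder convention described under "Reorder Operations" in the new instructions file (client sends entity IDs in their new order, server recomputes `sequence_index`) can be sketched roughly as below. This is an illustration only, not the project's code: the function name `reorder_visits`, the `id` and `soa_id` column names, and the 1-based index are assumptions, and the audit write is left out.

```python
import sqlite3
from typing import List


def reorder_visits(conn: sqlite3.Connection, soa_id: int, order: List[int]) -> None:
    """Rewrite sequence_index so it matches the client-supplied ID order.

    Hypothetical helper: the table layout and the 1-based index are assumptions.
    """
    existing = {
        row[0]
        for row in conn.execute("SELECT id FROM visit WHERE soa_id = ?", (soa_id,))
    }
    # Reject partial or duplicated lists so no row keeps a stale index.
    if len(order) != len(set(order)) or set(order) != existing:
        raise ValueError("order must list every visit ID exactly once")
    with conn:  # commits on success, rolls back if an UPDATE fails
        conn.executemany(
            "UPDATE visit SET sequence_index = ? WHERE id = ? AND soa_id = ?",
            [(index, visit_id, soa_id) for index, visit_id in enumerate(order, start=1)],
        )


# Demo against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visit (id INTEGER PRIMARY KEY, soa_id INTEGER, name TEXT, sequence_index INTEGER)"
)
conn.executemany(
    "INSERT INTO visit VALUES (?, ?, ?, ?)",
    [(1, 1, "Screening", 1), (2, 1, "Baseline", 2), (3, 1, "Week 4", 3)],
)
reorder_visits(conn, 1, [3, 1, 2])
reordered = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM visit WHERE soa_id = 1 ORDER BY sequence_index"
    )
]
```

Validating the list before touching any row means a malformed request leaves the existing order intact, which also keeps any before/after audit snapshot meaningful.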
diff --git a/README.md b/README.md
@@ -61,69 +61,45 @@ pytest
 rm -f soa_builder_web_tests.db soa_builder_web_tests.db-wal soa_builder_web_tests.db-shm
 ```
 
-> Full, updated endpoint reference (including Elements, freezes, audits, JSON CRUD and UI helpers) lives in `README_endpoints.md`. Consult that file for detailed request/response examples, curl snippets, and future enhancement notes.
+> **Full API Documentation**: See `README_endpoints.md` for complete endpoint reference with curl examples, request/response schemas, and usage patterns.
+>
+> **Endpoint Catalog**: See `docs/api_endpoints.csv` for a sortable, filterable list of all 165+ endpoints.
 
-Endpoints:
+## USDM Export
+Export USDM-compliant JSON for integration with external systems:
+```bash
+# Get normalized USDM JSON for a study
+curl http://localhost:8000/soa/1/normalized
 
-See **docs/api_endpoints.xlsx**
+# Or use the USDM generator scripts directly
+python -m usdm.generate_activities --soa-id 1 --output-file activities.json
+python -m usdm.generate_encounters --soa-id 1 --output-file encounters.json
+python -m usdm.generate_study_epochs --soa-id 1 --output-file epochs.json
+# See src/usdm/ for all generator scripts
+```
 
-## Experimental (not yet supported)
-After populating data, retrieve normalized artifacts:
+## CLI Tools (Legacy)
+Command-line tools for CSV normalization and validation:
 ```bash
-curl http://localhost:8000/soa/1/normalized
+# Normalize wide CSV → relational tables
+soa-builder normalize --input files/SoA.csv --out-dir normalized/
+
+# Expand repeating rules → calendar instances
+soa-builder expand --normalized-dir normalized/ --start-date 2025-01-01
+
+# Validate imaging intervals
+soa-builder validate --normalized-dir normalized/
 ```
-### Source
-Input format: first column `Activity`, subsequent columns are visit/timepoint headers. Cells contain markers `X`, `Optional`, `If indicated`, or repeating patterns (`Every 2 cycles`, `q12w`).
-
-### Output Artifacts
-Running the script produces (in `--out-dir`):
-- `visits.csv` — One row per visit/timepoint with parsed window info, inferred category, repeat pattern.
-- `activities.csv` — Unique activities (one per original row).
-- `visit_activities.csv` — Junction table mapping activities to visits with status and flags.
-- `activity_categories.csv` — Heuristic classification of each activity (labs, imaging, dosing, admin, etc.).
-- `schedule_rules.csv` — Extracted repeating schedule logic from headers and cells (e.g., `q12w`, `Every 2 cycles`).
-- Optional: SQLite database (`--sqlite path`) containing all tables.
-
-### visits.csv Columns
-- `visit_id`: Sequential numeric id.
-- `label`: Original header text.
-- `visit_name`: Header stripped of parenthetical codes.
-- `visit_code`: Code extracted from parentheses (e.g., `C1D1`, `EOT`).
-- `sequence_index`: Positional order.
-- `window_lower` / `window_upper`: Parsed day offsets if available.
-- `repeat_pattern`: Detected repeating pattern (e.g., `every 2 cycles`).
-- `category`: Heuristic classification (screening, baseline, treatment, follow_up, eot).
-
-### activities.csv Columns
-- `activity_id`: Sequential id.
-- `activity_name`: Name from first column.
-
-### visit_activities.csv Columns
-- `id`: Junction id.
-- `visit_id`: FK to visits.
-- `activity_id`: FK to activities.
-- `status`: Raw cell content.
-- `required_flag`: 1 if cell starts with `X`.
-- `conditional_flag`: 1 if cell contains `Optional` or `If indicated`.
-
-### activity_categories.csv Columns
-- `activity_id`: FK to activities.
-- `category`: Assigned heuristic category label.
-
-### schedule_rules.csv Columns
-- `rule_id`: Unique rule id.
-- `pattern`: Normalized repeating pattern token (e.g., `q12w`).
-- `description`: Human readable description of pattern source.
-- `source_type`: `header` or `cell` origin.
-- `activity_id`: Populated if pattern came from a cell (else null).
-- `visit_id`: Populated if pattern came from a header.
-- `raw_text`: Original text fragment containing the pattern.
-
-
-
-# Notes:
-- HTMX is loaded via CDN; no build step required.
-- For production, configure a persistent DB path via SOA_BUILDER_DB env variable.
-
-Artifacts stored under `normalized/soa_{id}/`.
+See `.github/copilot-instructions.md` for detailed CLI usage patterns.
+
+---
+
+## Architecture Notes
+- **Web UI**: HTMX loaded via CDN; no build step required
+- **Database**: SQLite with WAL mode (production) or DELETE mode (tests)
+- **Test Isolation**: Tests use `soa_builder_web_tests.db` (set via `SOA_BUILDER_DB` env var)
+- **Production Config**: Set `SOA_BUILDER_DB` environment variable for persistent DB path
+- **USDM Generators**: Python scripts in `src/usdm/` transform database state → USDM JSON artifacts
+
+For detailed architectural patterns, USDM entity relationships, and development workflows, see `.github/copilot-instructions.md`.
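
The database notes in the README changes above (path taken from `SOA_BUILDER_DB`, WAL journal mode in production, DELETE mode for tests) could look something like the following. It is a minimal sketch under stated assumptions, not the project's `_connect`: the function name `connect`, the default file name, the 5-second timeout, and detecting the test database by file name are all hypothetical.

```python
import os
import sqlite3


def connect(default_path: str = "soa_builder_web.db") -> sqlite3.Connection:
    """Open the SoA database with a journal mode suited to its role.

    Hypothetical sketch: default_path and the file-name check are assumptions.
    """
    db_path = os.environ.get("SOA_BUILDER_DB", default_path)
    # Assumption: the test database is recognised by its file name.
    is_test_db = os.path.basename(db_path).startswith("soa_builder_web_tests")
    # timeout is how long sqlite3 waits on a locked database before raising.
    conn = sqlite3.connect(db_path, timeout=5.0)
    # DELETE mode leaves no -wal/-shm side files, so test cleanup is simpler.
    mode = "DELETE" if is_test_db else "WAL"
    conn.execute(f"PRAGMA journal_mode={mode}")
    return conn
```

With `SOA_BUILDER_DB` pointing at `soa_builder_web_tests.db`, the connection uses the rollback journal; any other path gets WAL.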