A Python utility that validates USDM (Unified Study Definitions Model) JSON files using the CDISC Rules Engine (CORE).
- Python 3.10+
- CDISC Library API key (obtain from CDISC Library)
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install cdisc-rules-engine

Set your CDISC Library API key:
export CDISC_API_KEY="your-api-key-here"
# or
export CDISC_LIBRARY_API_KEY="your-api-key-here"

The utility automatically maps CDISC_API_KEY to CDISC_LIBRARY_API_KEY if needed.
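The mapping can be sketched as a small helper. This is a minimal illustration, not the utility's actual code: the function name `resolve_api_key` and the injectable `env` argument are assumptions made here so the logic can be exercised without touching the real environment.

```python
import os


def resolve_api_key(env=None):
    """Mirror CDISC_API_KEY into CDISC_LIBRARY_API_KEY when only the former is set.

    `env` is a hypothetical injection point for testing; it defaults to os.environ.
    Returns the resolved key, or None if neither variable is set.
    """
    env = os.environ if env is None else env
    if "CDISC_API_KEY" in env and "CDISC_LIBRARY_API_KEY" not in env:
        env["CDISC_LIBRARY_API_KEY"] = env["CDISC_API_KEY"]
    return env.get("CDISC_LIBRARY_API_KEY")
```

An explicitly set CDISC_LIBRARY_API_KEY is never overwritten, so it takes precedence when both variables exist.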
python usdm_validate.py <usdm_file.json> [options]

| Option | Description |
|---|---|
| -v, --version | USDM version: 3-0 or 4-0 (default: 4-0) |
| -o, --output | Output file for validation results (default: stdout) |
| -f, --format | Output format: json or text (default: text) |
| --verbose | Show verbose output during validation |
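For readers who want to reproduce this command-line interface, the options above could be declared with `argparse` roughly as follows. This is a sketch based only on the option table; the function name `build_parser` and the help strings are assumptions, and the real script may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser matching the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        prog="usdm_validate.py",
        description="Validate a USDM JSON file with the CDISC Rules Engine.",
    )
    parser.add_argument("usdm_file", help="Path to the USDM JSON file")
    parser.add_argument("-v", "--version", choices=["3-0", "4-0"], default="4-0",
                        help="USDM version (default: 4-0)")
    parser.add_argument("-o", "--output", default=None,
                        help="Output file for results (default: stdout)")
    parser.add_argument("-f", "--format", choices=["json", "text"], default="text",
                        help="Output format (default: text)")
    parser.add_argument("--verbose", action="store_true",
                        help="Show verbose output during validation")
    return parser
```

Note that `-v` selects the USDM version here, so it is not available as a shorthand for `--verbose`.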
# Basic validation (USDM 4.0, text output)
python usdm_validate.py study.json
# Validate USDM 3.0 file
python usdm_validate.py study.json -v 3-0
# Output to JSON file
python usdm_validate.py study.json -o results.json -f json
# Verbose mode (shows library logging)
python usdm_validate.py study.json --verbose

| Code | Meaning |
|---|---|
| 0 | Validation passed (no issues found) |
| 1 | Validation completed with issues found |
| 2 | Error during validation (file not found, invalid JSON, etc.) |
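The exit-code convention in the table can be expressed as a tiny function. This is only a sketch of the convention; how `usdm_validate.py` actually computes its exit code is not shown in this document, and the name `exit_code_for` and its parameters are assumptions.

```python
def exit_code_for(findings, failed=False):
    """Map a validation outcome to the documented exit codes.

    findings: list of validation issues (empty means the file passed).
    failed:   True if validation itself could not run (missing file, bad JSON, ...).
    """
    if failed:
        return 2
    return 1 if findings else 0
```

A caller would typically pass the result to `sys.exit()` so that shell scripts and CI jobs can branch on it.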
============================================================
USDM Validation Report
============================================================
File: study.json
Rules executed: 205
CT packages available: 200
CT packages loaded: sdtmct-2025-09-26, ddfct-2025-09-26
Found 609 validation issue(s):
(Plus 225 rule execution errors from 4 rules)
------------------------------------------------------------
Rule: CORE-000996
Message: The planned sex includes more than a single entry...
Errors (1):
- {'value': {...}, 'dataset': 'StudyDesignPopulation', ...}
{
"file": "study.json",
"rules_executed": 205,
"ct_packages_available": 200,
"ct_packages_loaded": ["sdtmct-2025-09-26", "ddfct-2025-09-26"],
"ct_packages": ["adamct-2014-09-26", ...],
"results": [...]
}

- Initialization: Sets up the CDISC Rules Engine with an in-memory cache
- CT Package Loading: Extracts codeSystemVersion values from the USDM file and loads the corresponding SDTM and DDF CT packages
- Rule Download: Downloads USDM validation rules from CDISC Library (cached after first run)
- JSONata Setup: Downloads required JSONata custom functions from GitHub
- Validation: Executes each rule against the USDM data
- Reporting: Formats and outputs results, separating validation issues from execution errors
usdm_validate.py
├── setup_ct_packages() # Load CT package list and service
├── load_ct_package_data() # Load actual codelist terms
├── get_ct_versions_from_usdm() # Extract CT versions from data
├── setup_jsonata_resources() # Download JSONata functions
├── load_rules_from_library() # Download USDM rules
├── validate_usdm() # Main validation entry point
├── _run_validation() # Internal validation logic
├── format_results_text() # Text output formatter
└── format_results_json() # JSON output formatter
The CDISC Rules Engine produces extensive logging and print output that overwhelms the terminal. The solution:
# Disable all logging globally
logging.disable(logging.CRITICAL)
# Suppress stdout/stderr during validation
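class SuppressOutput:
    def __enter__(self):
        # Remember the real streams so they can be restored on exit
        self._stdout, self._stderr = sys.stdout, sys.stderr
        sys.stdout = io.StringIO()
        sys.stderr = io.StringIO()
        return self
    def __exit__(self, *args):
        # Restore original streams
        sys.stdout, sys.stderr = self._stdout, self._stderr

The rules engine expects resources in the current working directory. The utility changes to the site-packages directory during validation: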
_CDISC_PACKAGE_DIR = Path(cdisc_rules_engine.__file__).parent.parent
os.chdir(_CDISC_PACKAGE_DIR) # Before validation
os.chdir(original_cwd) # After validation

Problem: CT validation fails with "codeSystemVersion is not a valid terminology package date" even when the packages exist.
Root Cause: LibraryMetadataContainer needs both:
- published_ct_packages: List of available package names (for date validation)
- ct_package_metadata: Actual codelist content (for code/decode validation)
Solution: Load both from CDISC Library:
# Get package list
ct_packages = library_service.get_all_ct_packages()
# Load actual codelist data for versions used in the USDM file
for ct_version in get_ct_versions_from_usdm(usdm_data):
    for ct_type in ["sdtmct", "ddfct"]:
        package_name = f"{ct_type}-{ct_version}"
        ct_data = library_service.get_codelist_terms_map(package_name)
        ct_package_metadata[package_name] = ct_data
# Pass both to the container
library_metadata = LibraryMetadataContainer(
    published_ct_packages=ct_packages,
    ct_package_metadata=ct_package_metadata,
)

For USDM validation, the relevant CT package types are (from valid_codelist_dates.py):
- sdtmct - SDTM Controlled Terminology
- ddfct - DDF (Digital Data Flow) Controlled Terminology
Some CORE rules use JSONata expressions that require custom functions. These are downloaded from GitHub:
JSONATA_FILES = [
    "https://raw.githubusercontent.com/cdisc-org/cdisc-rules-engine/main/resources/jsonata/parse_refs.jsonata",
    "https://raw.githubusercontent.com/cdisc-org/cdisc-rules-engine/main/resources/jsonata/sift_tree.jsonata",
]

Problem: Many errors reported as validation issues are actually rule execution errors: they occur because the rule does not apply to this particular USDM file's structure.
Types of Execution Errors:
- Column not found in data: Rules like CORE-000414, CORE-000416, CORE-000419, CORE-000849 are configured to apply to ALL entities but check for fields like nextId and previousId that only exist on certain entity types (Encounter, ScheduledActivityInstance, etc.).
- Preprocessing failed: Rules like CORE-000815 and CORE-000875 require joining with a StudyCohort dataset, so preprocessing fails for studies that define no cohorts. The error message is: Failed to find related dataset for 'StudyCohort' in preprocessor.
Solution: Separate execution errors from validation findings:
def _is_execution_error(error: dict) -> bool:
    if isinstance(error, dict):
        error_type = error.get("error", "")
        # Column not found - rule checks fields that don't exist on entity
        if error_type == "Column not found in data":
            return True
        # Preprocessing failed - rule requires dataset not in USDM file
        if error_type == "Error occurred during dataset preprocessing":
            return True
    return False

The output now shows:
Found 609 validation issue(s):
(Plus 227 rule execution errors from 6 rules)
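The two summary lines above can be derived by counting errors on each side of the split. The following is a minimal sketch under an assumed input shape (a dict mapping each rule ID to a flat list of its error dicts); the name `summarize` and that input shape are illustrative, not taken from the utility.

```python
EXECUTION_ERROR_TYPES = {
    "Column not found in data",
    "Error occurred during dataset preprocessing",
}


def summarize(errors_by_rule):
    """Return the report's summary lines for {rule_id: [error_dict, ...]}."""
    issue_count = 0
    exec_count = 0
    exec_rules = set()
    for rule_id, errors in errors_by_rule.items():
        for err in errors:
            if isinstance(err, dict) and err.get("error") in EXECUTION_ERROR_TYPES:
                exec_count += 1
                exec_rules.add(rule_id)
            else:
                issue_count += 1
    lines = [f"Found {issue_count} validation issue(s):"]
    if exec_count:
        lines.append(
            f"(Plus {exec_count} rule execution errors from {len(exec_rules)} rules)"
        )
    return lines
```

Counting distinct rule IDs separately from individual errors is what lets the report say "N errors from M rules".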
Problem: Rules that validate XHTML content (CORE-000945, CORE-001069) fail with:
XSD file could not be found: Error reading file 'resources/schema/xml/cdisc-usdm-xhtml-1.0/usdm-xhtml-1.0.xsd'
Root Cause: The XSD schema files required for XHTML validation are not included in the pip package. The XHTML 1.1 schema is complex with 70+ interdependent files.
Solution: The utility automatically downloads the required XSD schema files from the CDISC Rules Engine GitHub repository on first run:
# GitHub base URL for XSD schema files
XSD_GITHUB_BASE = "https://raw.githubusercontent.com/cdisc-org/cdisc-rules-engine/main/resources/schema/xml"
# USDM XHTML schema files (required for XHTML validation rules)
USDM_XHTML_SCHEMA_FILES = [
    "cdisc-usdm-xhtml-1.0/usdm-xhtml-1.0.xsd",
    "cdisc-usdm-xhtml-1.0/usdm-xhtml-extension.xsd",
    "cdisc-usdm-xhtml-1.0/usdm-xhtml-ns.xsd",
]
# Core XHTML 1.1 schema files (70+ files with complex dependencies)
XHTML_SCHEMA_FILES = [
    "xhtml-1.1/xhtml11.xsd",
    "xhtml-1.1/xhtml-hypertext-1.xsd",
    # ... all 70+ XHTML 1.1 schema files
]
def setup_xsd_schema_resources():
    """Download XSD schema files from GitHub if they don't exist locally."""
    # Schemas live under resources/schema/xml in the site-packages directory
    schema_dir = _CDISC_PACKAGE_DIR / "resources" / "schema" / "xml"
    for schema_path in USDM_XHTML_SCHEMA_FILES + XHTML_SCHEMA_FILES:
        filepath = schema_dir / schema_path
        if filepath.exists():
            continue  # Already downloaded on a previous run
        filepath.parent.mkdir(parents=True, exist_ok=True)
        url = f"{XSD_GITHUB_BASE}/{schema_path}"
        urllib.request.urlretrieve(url, filepath)

Key Insight: The XHTML 1.1 schema uses xs:redefine and xs:include extensively, requiring all schema files to be present. Partial downloads fail with errors like:
Failed to load the document 'xhtml-hypertext-1.xsd' for redefinition
Result: Rules CORE-000945 and CORE-001069 now execute successfully and validate XHTML content in eligibility criteria and narrative content items.
Some rules have bugs in the CDISC Rules Engine (JSONata/NoneType errors):
EXCLUDED_RULES = {
    "CORE-000955", # JSONata bug
    "CORE-000956", # JSONata bug
}

Problem: from cdisc_rules_engine.services.cache.cache_populator_service import CachePopulator fails with ModuleNotFoundError: No module named 'scripts'.
Root Cause: CachePopulator imports from a scripts module that isn't part of the pip package.
Solution: Implement CT package loading directly using CDISCLibraryService.get_codelist_terms_map() instead of using CachePopulator.
| Rule | Description |
|---|---|
| CORE-001013 | Duplicate names across instances of the same class |
| CORE-001051 | NarrativeContent missing child or content item |
| CORE-000427 | Code/decode mismatch within codeSystem/version |
| CORE-000996 | Invalid planned sex configuration |
| CORE-001077 | Study intervention count mismatch for model type |
| Category | Rules | Description |
|---|---|---|
| Column not found | CORE-000414, CORE-000416, CORE-000419, CORE-000849 | Rules checking nextId/previousId on entities that don't have these fields |
| Preprocessing failed | CORE-000815, CORE-000875 | Rules requiring StudyCohort dataset for studies without cohorts defined |
Rules CORE-000945 and CORE-001069 validate XHTML content against the USDM XHTML schema. Common validation errors include:
| Error Type | Example |
|---|---|
| Invalid attribute | Element 'ol', attribute 'type': The attribute 'type' is not allowed |
| Unexpected element | Element 'style': This element is not expected |
| Tag mismatch | Opening and ending tag mismatch: p line 33 and td |
Set the environment variable:
export CDISC_API_KEY="your-key"

The utility suppresses most output by default. Use --verbose only when debugging.
Ensure your USDM file's codeSystemVersion values correspond to published CT packages. Check the "CT packages loaded" line in the output.
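One way to run that check before validation is to compare the versions found in the file with the published package names. The helper below is a hypothetical diagnostic, not part of the utility; it assumes package names follow the `<type>-<version>` pattern used elsewhere in this document (for example `sdtmct-2025-09-26`).

```python
def versions_without_packages(versions, published_packages,
                              ct_types=("sdtmct", "ddfct")):
    """Return the versions for which no sdtmct/ddfct package is published.

    versions:           iterable of codeSystemVersion strings from the USDM file
    published_packages: iterable of published package names
    """
    published = set(published_packages)
    return sorted(
        v for v in versions
        if not any(f"{t}-{v}" in published for t in ct_types)
    )
```

Any version returned here has no matching package of either type, so it would not appear in the "CT packages loaded" line.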
The first run downloads rules and CT packages from CDISC Library. Subsequent runs use cached data.
This section documents everything needed to embed USDM CORE validation into another Python program.
import io
import json
import logging
import os
import sys
import urllib.request
from pathlib import Path
# CRITICAL: Map API key BEFORE importing the engine
if "CDISC_API_KEY" in os.environ and "CDISC_LIBRARY_API_KEY" not in os.environ:
os.environ["CDISC_LIBRARY_API_KEY"] = os.environ["CDISC_API_KEY"]
# CRITICAL: Disable logging BEFORE importing the engine
# The engine attaches a ConsoleLogger to the root logger on import.
# logging.getLogger().setLevel() does NOT work - it must be disabled globally.
logging.disable(logging.CRITICAL)
import cdisc_rules_engine
from cdisc_rules_engine.config import config
from cdisc_rules_engine.services.cache import CacheServiceFactory
from cdisc_rules_engine.rules_engine import RulesEngine
from cdisc_rules_engine.utilities.utils import get_rules_cache_key
from cdisc_rules_engine.services.cdisc_library_service import CDISCLibraryService
from cdisc_rules_engine.constants.cache_constants import PUBLISHED_CT_PACKAGES
from cdisc_rules_engine.models.library_metadata_container import LibraryMetadataContainer

The rules engine resolves resource paths (JSONata functions, XSD schemas) relative to os.getcwd(). It expects a resources/ directory in the current working directory. The pip-installed package places resources under the site-packages root:
_CDISC_PACKAGE_DIR = Path(cdisc_rules_engine.__file__).parent.parent
# e.g. /path/to/site-packages/
# You MUST chdir before running validation:
original_cwd = os.getcwd()
os.chdir(_CDISC_PACKAGE_DIR)
try:
    ...  # run validation
finally:
    os.chdir(original_cwd)
# File paths passed to the engine must be absolute because of the chdir.
# Resolve them while the original working directory is still in effect:
abs_path = os.path.abspath(file_path)

The engine writes large amounts of output to stdout/stderr (progress bars, debug prints, library logs). You must suppress it:
class SuppressOutput:
    """Context manager to suppress stdout/stderr from the rules engine."""
    def __init__(self, suppress=True):
        self.suppress = suppress
        self._stdout = self._stderr = self._devnull = None
    def __enter__(self):
        if self.suppress:
            self._devnull = io.StringIO()
            self._stdout, self._stderr = sys.stdout, sys.stderr
            sys.stdout = sys.stderr = self._devnull
        return self
    def __exit__(self, *args):
        if self.suppress:
            sys.stdout, sys.stderr = self._stdout, self._stderr
            self._devnull.close()

This is the most subtle gotcha. LibraryMetadataContainer has TWO separate mechanisms:
- published_ct_packages - a list of package name strings (e.g. ["sdtmct-2025-09-26", ...]). Used to validate that codeSystemVersion dates are valid published dates.
- ct_package_metadata - a dict mapping package names to actual codelist content. Used to validate that codes/decodes are correct within a codelist.
If you only set published_ct_packages, date validation works but code/decode validation silently fails, producing false positives (marking valid codes as errors).
# Step 1: Get the full list of published CT packages
library_service = CDISCLibraryService(api_key, cache)
packages = library_service.get_all_ct_packages()
ct_packages = [p.get("href", "").split("/")[-1] for p in packages]
cache.add(PUBLISHED_CT_PACKAGES, ct_packages)
# Step 2: Determine which CT versions the USDM file actually uses
def get_ct_versions_from_usdm(usdm_data: dict) -> set:
    versions = set()
    def extract(obj):
        if isinstance(obj, dict):
            if "codeSystemVersion" in obj:
                versions.add(obj["codeSystemVersion"])
            for v in obj.values():
                extract(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item)
    extract(usdm_data)
    return versions
# Step 3: Load actual codelist data for each needed version
# USDM uses TWO CT package types: sdtmct and ddfct
ct_package_metadata = {}
for ct_version in get_ct_versions_from_usdm(usdm_data):
    for ct_type in ["sdtmct", "ddfct"]:
        package_name = f"{ct_type}-{ct_version}"
        if package_name in ct_packages:
            data = library_service.get_codelist_terms_map(package_name)
            if data:
                ct_package_metadata[package_name] = data
# Step 4: Create the container with BOTH
library_metadata = LibraryMetadataContainer(
    published_ct_packages=ct_packages,
    ct_package_metadata=ct_package_metadata,
)

Three sets of resources are NOT included in the pip package and must be downloaded from the CDISC Rules Engine GitHub repository on first use:
1. JSONata custom functions (2 files):
resources/jsonata/parse_refs.jsonata
resources/jsonata/sift_tree.jsonata
Source: https://raw.githubusercontent.com/cdisc-org/cdisc-rules-engine/main/resources/jsonata/
2. USDM XHTML schema (3 files):
resources/schema/xml/cdisc-usdm-xhtml-1.0/usdm-xhtml-1.0.xsd
resources/schema/xml/cdisc-usdm-xhtml-1.0/usdm-xhtml-extension.xsd
resources/schema/xml/cdisc-usdm-xhtml-1.0/usdm-xhtml-ns.xsd
3. XHTML 1.1 schema (70+ files - ALL are required):
resources/schema/xml/xhtml-1.1/*.xsd
resources/schema/xml/xhtml-1.1/*.ent
Source: https://raw.githubusercontent.com/cdisc-org/cdisc-rules-engine/main/resources/schema/xml/xhtml-1.1/
WARNING: The XHTML 1.1 schema uses xs:redefine and xs:include extensively. If ANY file is missing, the schema parser fails with errors like Failed to load the document 'xhtml-hypertext-1.xsd' for redefinition. You must download ALL files.
These are downloaded into the site-packages directory (alongside the engine's existing resources/ directory). Check for existence before downloading to avoid re-downloading on every run.
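The existence check described above might look like the following sketch. The function name `ensure_resources` and the injectable `fetch` parameter are assumptions introduced here so the logic can be tested offline; by default it uses `urllib.request.urlretrieve`, as the rest of this document does.

```python
import urllib.request
from pathlib import Path


def ensure_resources(root, base_url, relative_paths,
                     fetch=urllib.request.urlretrieve):
    """Download every file in relative_paths that is missing under root.

    Returns the list of relative paths that were actually downloaded.
    `fetch(url, target_path)` is a hypothetical hook for swapping the downloader.
    """
    downloaded = []
    for rel in relative_paths:
        target = Path(root) / rel
        if target.exists():
            continue  # Already present from an earlier run
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(f"{base_url}/{rel}", target)
        downloaded.append(rel)
    return downloaded
```

Because each file is checked individually, a run that was interrupted partway through will fetch only the files still missing on the next attempt.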
# Initialize cache (in-memory by default)
cache = CacheServiceFactory(config).get_cache_service()
# Initialize the rules engine
rules_engine = RulesEngine(
    cache=cache,
    standard="usdm", # Must be "usdm"
    standard_version="4-0", # "3-0" or "4-0"
    dataset_paths=[abs_path], # List of absolute paths to USDM JSON files
    library_metadata=library_metadata,
)
# Get datasets (the engine's internal representation of the USDM data)
datasets = rules_engine.data_service.get_datasets()
# Load rules from cache or download from CDISC Library
cache_key = get_rules_cache_key("usdm", "4-0")
rules = cache.get_all_by_prefix(cache_key)
if not rules:
    result = library_service.get_rules_by_catalog("usdm", "4-0")
    rules = result.get("rules", [])
    cache_key = result.get("key_prefix", cache_key)
    for rule in rules:
        cache.add(f"{cache_key}/{rule.get('core_id', 'unknown')}", rule)
# Execute each rule
for rule in rules:
    rule_id = rule.get("core_id", "unknown")
    try:
        rule_results = rules_engine.validate_single_rule(rule, datasets)
        # rule_results is a dict: {dataset_name: result_dict_or_list}
    except Exception:
        # Some rules crash with JSONata/NoneType errors
        pass

Each rule downloaded from CDISC Library has this structure:
{
    "core_id": "CORE-000996", # Unique rule identifier
    "description": "A planned sex ...", # Human-readable rule description
    "rule_type": "Record Data", # Type of rule
    "executability": "fully executable", # Whether the rule can be run
    "status": "Published", # Publication status
    "entities": { # Which USDM entities the rule applies to
        "Include": ["StudyDesignPopulation"]
    },
    "conditions": { ... }, # Rule logic (all/any/operator tree)
    "actions": [ # What to do on failure
        {
            "name": "generate_dataset_error_objects",
            "params": {
                "message": "The planned sex includes..." # Error message text
            }
        }
    ],
    "datasets": [ ... ], # Optional: datasets to join (e.g. StudyCohort)
    "output_variables": [ ... ], # Fields included in error output
    "authorities": [ ... ], # Standards references
    "standards": [
        {"Name": "USDM", "Version": "4.0"}
    ]
}

Key fields for integration:
- rule.get("description") - the rule description (what it checks)
- rule.get("actions", [{}])[0].get("params", {}).get("message", "") - the error message (what it reports on failure)
- rule.get("core_id") - the unique rule ID
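These three lookups can be bundled into one helper that tolerates missing or empty fields. The name `summarize_rule` is hypothetical; the sketch only relies on the rule structure shown above.

```python
def summarize_rule(rule):
    """Extract the ID, description, and failure message from a rule dict."""
    # `or [{}]` covers both a missing "actions" key and an empty list
    actions = rule.get("actions") or [{}]
    return {
        "rule_id": rule.get("core_id", ""),
        "description": rule.get("description", ""),
        "message": actions[0].get("params", {}).get("message", ""),
    }
```

Using `rule.get("actions") or [{}]` rather than `rule.get("actions", [{}])[0]` avoids an IndexError when a rule carries an empty actions list.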
validate_single_rule() returns a dict. The values are lists or dicts containing:
{
    "errors": [ # List of error dicts
        {
            "value": { # The data that triggered the error
                "instanceType": "StudyDesignPopulation",
                "id": "StudyDesignPopulation_1",
                "path": "/study/versions/0/studyDesigns/0/population",
                "name": "POP1",
                # ... other fields from output_variables
            },
            "dataset": "StudyDesignPopulation",
            "entity": "StudyDesignPopulation",
            "instance_id": "StudyDesignPopulation_1",
            "path": "/study/versions/0/studyDesigns/0/population"
        }
    ],
    "message": "..." # May or may not be present
}

Execution error dicts (not data issues: the rule does not apply) have a different shape:
# "Column not found" - rule checks fields that don't exist on the entity
{
    "error": "Column not found in data",
    "dataset": "Code.json",
    "row": 1,
    ...
}
# "Preprocessing failed" - rule requires a dataset not in the USDM file
{
    "error": "Error occurred during dataset preprocessing",
    "message": "Failed to find related dataset for 'StudyCohort' in preprocessor",
    "dataset": "StudyDesignPopulation.json"
}

Separating the two kinds of error is essential for clean output. The error field distinguishes execution errors from real findings:
def _is_execution_error(error: dict) -> bool:
    if isinstance(error, dict):
        error_type = error.get("error", "")
        if error_type == "Column not found in data":
            return True
        if error_type == "Error occurred during dataset preprocessing":
            return True
    return False

Known execution error rules:
| Rule | Error Type | Reason |
|---|---|---|
| CORE-000414 | Column not found | Checks nextId on ALL entities; only exists on ordered types |
| CORE-000416 | Column not found | Checks previousId on ALL entities |
| CORE-000419 | Column not found | Checks nextId on ALL entities |
| CORE-000849 | Column not found | Checks nextId/previousId on ALL entities |
| CORE-000815 | Preprocessing failed | Requires StudyCohort dataset join |
| CORE-000875 | Preprocessing failed | Requires StudyCohort dataset join |
These rules crash the engine with unrecoverable errors:
EXCLUDED_RULES = {
    "CORE-000955", # JSONata bug - crashes with NoneType error
    "CORE-000956", # JSONata bug - crashes with NoneType error
}

from cdisc_rules_engine.services.cache.cache_populator_service import CachePopulator fails with ModuleNotFoundError: No module named 'scripts'. The CachePopulator class imports from a scripts module that is part of the CLI tool, not the pip package. Use CDISCLibraryService methods directly instead.
import io, json, logging, os, sys, urllib.request
from pathlib import Path
if "CDISC_API_KEY" in os.environ and "CDISC_LIBRARY_API_KEY" not in os.environ:
os.environ["CDISC_LIBRARY_API_KEY"] = os.environ["CDISC_API_KEY"]
logging.disable(logging.CRITICAL)
import cdisc_rules_engine
from cdisc_rules_engine.config import config
from cdisc_rules_engine.services.cache import CacheServiceFactory
from cdisc_rules_engine.rules_engine import RulesEngine
from cdisc_rules_engine.utilities.utils import get_rules_cache_key
from cdisc_rules_engine.services.cdisc_library_service import CDISCLibraryService
from cdisc_rules_engine.constants.cache_constants import PUBLISHED_CT_PACKAGES
from cdisc_rules_engine.models.library_metadata_container import LibraryMetadataContainer
CDISC_PKG_DIR = Path(cdisc_rules_engine.__file__).parent.parent
EXCLUDED = {"CORE-000955", "CORE-000956"}
def validate(usdm_path: str, version: str = "4-0") -> list[dict]:
    """
    Validate a USDM JSON file. Returns list of dicts:
    [{"rule_id", "description", "message", "errors": [...]}]
    """
    abs_path = os.path.abspath(usdm_path)
    with open(abs_path) as f:
        usdm_data = json.load(f)
    original_cwd = os.getcwd()
    os.chdir(CDISC_PKG_DIR)
    _stdout, _stderr = sys.stdout, sys.stderr
    sys.stdout = sys.stderr = io.StringIO()
    try:
        # Download resources if needed (JSONata + XSD schemas)
        _setup_resources()
        cache = CacheServiceFactory(config).get_cache_service()
        api_key = os.environ.get("CDISC_LIBRARY_API_KEY", "")
        library_service = CDISCLibraryService(api_key, cache)
        # Load CT packages
        packages = library_service.get_all_ct_packages()
        ct_packages = [p.get("href", "").split("/")[-1] for p in packages]
        cache.add(PUBLISHED_CT_PACKAGES, ct_packages)
        # Load CT data for versions used in file
        ct_metadata = {}
        versions = _extract_ct_versions(usdm_data)
        for v in versions:
            for ct_type in ["sdtmct", "ddfct"]:
                pkg = f"{ct_type}-{v}"
                if pkg in ct_packages:
                    data = library_service.get_codelist_terms_map(pkg)
                    if data:
                        ct_metadata[pkg] = data
        metadata = LibraryMetadataContainer(
            published_ct_packages=ct_packages,
            ct_package_metadata=ct_metadata,
        )
        engine = RulesEngine(
            cache=cache, standard="usdm", standard_version=version,
            dataset_paths=[abs_path], library_metadata=metadata,
        )
        datasets = engine.data_service.get_datasets()
        # Load rules
        cache_key = get_rules_cache_key("usdm", version)
        rules = cache.get_all_by_prefix(cache_key)
        if not rules:
            result = library_service.get_rules_by_catalog("usdm", version)
            rules = result.get("rules", [])
            pfx = result.get("key_prefix", cache_key)
            for r in rules:
                cache.add(f"{pfx}/{r.get('core_id')}", r)
        # Execute rules and collect findings
        findings = []
        for rule in rules:
            rid = rule.get("core_id", "")
            if rid in EXCLUDED:
                continue
            desc = rule.get("description", "")
            msg = ""
            actions = rule.get("actions", [])
            if actions:
                msg = actions[0].get("params", {}).get("message", "")
            try:
                results = engine.validate_single_rule(rule, datasets)
                errors = _extract_errors(results)
                if errors:
                    findings.append({
                        "rule_id": rid, "description": desc,
                        "message": msg, "errors": errors,
                    })
            except Exception:
                pass
        return findings
    finally:
        sys.stdout, sys.stderr = _stdout, _stderr
        os.chdir(original_cwd)
def _extract_ct_versions(data):
    versions = set()
    def walk(obj):
        if isinstance(obj, dict):
            if "codeSystemVersion" in obj:
                versions.add(obj["codeSystemVersion"])
            for v in obj.values():
                walk(v)
        elif isinstance(obj, list):
            for i in obj:
                walk(i)
    walk(data)
    return versions
def _extract_errors(results):
    """Extract real validation errors, filtering out execution errors."""
    real = []
    for val in (results or {}).values():
        items = val if isinstance(val, list) else [val]
        for item in items:
            if not isinstance(item, dict):
                continue
            for err in item.get("errors", []):
                if isinstance(err, dict):
                    etype = err.get("error", "")
                    if etype in ("Column not found in data",
                                 "Error occurred during dataset preprocessing"):
                        continue
                real.append(err)
    return real
def _setup_resources():
    """Download JSONata + XSD resources if not already present."""
    # JSONata
    jdir = CDISC_PKG_DIR / "resources" / "jsonata"
    if not (jdir.exists() and any(jdir.glob("*.jsonata"))):
        jdir.mkdir(parents=True, exist_ok=True)
        for name in ["parse_refs.jsonata", "sift_tree.jsonata"]:
            url = f"https://raw.githubusercontent.com/cdisc-org/cdisc-rules-engine/main/resources/jsonata/{name}"
            urllib.request.urlretrieve(url, jdir / name)
    # XSD schemas (check for USDM XHTML schema as sentinel)
    xdir = CDISC_PKG_DIR / "resources" / "schema" / "xml"
    sentinel = xdir / "cdisc-usdm-xhtml-1.0" / "usdm-xhtml-1.0.xsd"
    if not sentinel.exists():
        base = "https://raw.githubusercontent.com/cdisc-org/cdisc-rules-engine/main/resources/schema/xml"
        # See USDM_XHTML_SCHEMA_FILES and XHTML_SCHEMA_FILES constants
        # in usdm_validate.py for the full list of 70+ files
        for subdir in ["cdisc-usdm-xhtml-1.0", "xhtml-1.1"]:
            (xdir / subdir).mkdir(parents=True, exist_ok=True)
        for path in USDM_XHTML_SCHEMA_FILES + XHTML_SCHEMA_FILES:
            try:
                urllib.request.urlretrieve(f"{base}/{path}", xdir / path)
            except Exception:
                pass

The USDM XHTML schemas (3 files):
cdisc-usdm-xhtml-1.0/usdm-xhtml-1.0.xsd
cdisc-usdm-xhtml-1.0/usdm-xhtml-extension.xsd
cdisc-usdm-xhtml-1.0/usdm-xhtml-ns.xsd
The XHTML 1.1 schemas (all required due to xs:redefine/xs:include chains):
xhtml-1.1/aria-attributes-1.xsd
xhtml-1.1/xframes-1.xsd
xhtml-1.1/xhtml-access-1.xsd
xhtml-1.1/xhtml-applet-1.xsd
xhtml-1.1/xhtml-attribs-1.xsd
xhtml-1.1/xhtml-base-1.xsd
xhtml-1.1/xhtml-basic-form-1.xsd
xhtml-1.1/xhtml-basic-table-1.xsd
xhtml-1.1/xhtml-basic10-model-1.xsd
xhtml-1.1/xhtml-basic10-modules-1.xsd
xhtml-1.1/xhtml-basic10.xsd
xhtml-1.1/xhtml-basic11-model-1.xsd
xhtml-1.1/xhtml-basic11-modules-1.xsd
xhtml-1.1/xhtml-basic11.xsd
xhtml-1.1/xhtml-bdo-1.xsd
xhtml-1.1/xhtml-blkphras-1.xsd
xhtml-1.1/xhtml-blkpres-1.xsd
xhtml-1.1/xhtml-blkstruct-1.xsd
xhtml-1.1/xhtml-charent-1.xsd
xhtml-1.1/xhtml-csismap-1.xsd
xhtml-1.1/xhtml-datatypes-1.xsd
xhtml-1.1/xhtml-edit-1.xsd
xhtml-1.1/xhtml-events-1.xsd
xhtml-1.1/xhtml-form-1.xsd
xhtml-1.1/xhtml-frames-1.xsd
xhtml-1.1/xhtml-framework-1.xsd
xhtml-1.1/xhtml-hypertext-1.xsd
xhtml-1.1/xhtml-iframe-1.xsd
xhtml-1.1/xhtml-image-1.xsd
xhtml-1.1/xhtml-inlphras-1.xsd
xhtml-1.1/xhtml-inlpres-1.xsd
xhtml-1.1/xhtml-inlstruct-1.xsd
xhtml-1.1/xhtml-inlstyle-1.xsd
xhtml-1.1/xhtml-inputmode-1.xsd
xhtml-1.1/xhtml-lat1.ent
xhtml-1.1/xhtml-legacy-1.xsd
xhtml-1.1/xhtml-legacy-redecl-1.xsd
xhtml-1.1/xhtml-link-1.xsd
xhtml-1.1/xhtml-list-1.xsd
xhtml-1.1/xhtml-meta-1.xsd
xhtml-1.1/xhtml-metaAttributes-1.xsd
xhtml-1.1/xhtml-misc-1.xsd
xhtml-1.1/xhtml-mobile10-model-1.xsd
xhtml-1.1/xhtml-mobile10.xsd
xhtml-1.1/xhtml-nameident-1.xsd
xhtml-1.1/xhtml-notations-1.xsd
xhtml-1.1/xhtml-object-1.xsd
xhtml-1.1/xhtml-param-1.xsd
xhtml-1.1/xhtml-pres-1.xsd
xhtml-1.1/xhtml-print-model-1.xsd
xhtml-1.1/xhtml-print.xsd
xhtml-1.1/xhtml-rdfa-1.xsd
xhtml-1.1/xhtml-rdfa-model-1.xsd
xhtml-1.1/xhtml-rdfa-modules-1.xsd
xhtml-1.1/xhtml-role-1.xsd
xhtml-1.1/xhtml-role-attrib-1.xsd
xhtml-1.1/xhtml-ruby-1.xsd
xhtml-1.1/xhtml-script-1.xsd
xhtml-1.1/xhtml-simple-1.xsd
xhtml-1.1/xhtml-special.ent
xhtml-1.1/xhtml-ssismap-1.xsd
xhtml-1.1/xhtml-struct-1.xsd
xhtml-1.1/xhtml-style-1.xsd
xhtml-1.1/xhtml-symbol.ent
xhtml-1.1/xhtml-table-1.xsd
xhtml-1.1/xhtml-target-1.xsd
xhtml-1.1/xhtml-text-1.xsd
xhtml-1.1/xhtml-uri-1.xsd
xhtml-1.1/xhtml1-frameset.xsd
xhtml-1.1/xhtml1-strict.xsd
xhtml-1.1/xhtml1-transitional.xsd
xhtml-1.1/xhtml11-flat.xsd
xhtml-1.1/xhtml11-model-1.xsd
xhtml-1.1/xhtml11-modules-1.xsd
xhtml-1.1/xhtml11.xsd
xhtml-1.1/xml.xsd
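Since a single missing file breaks schema loading, and the sentinel check in the complete example only looks at one file, it can help to verify the whole list before running validation. The following sketch is a hypothetical pre-flight check; `missing_schema_files` is not part of the utility, and `required_paths` would be the relative paths listed above.

```python
from pathlib import Path


def missing_schema_files(schema_root, required_paths):
    """Return the required relative paths that are not present under schema_root."""
    root = Path(schema_root)
    return [p for p in required_paths if not (root / p).is_file()]
```

An empty result means every listed file is on disk; a non-empty result names exactly what still needs to be downloaded.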
- cdisc-rules-engine - CDISC CORE validation engine
- jsonata - JSONata expression evaluator (transitive)
- Standard library: argparse, json, logging, os, sys, io, urllib
This utility is provided as-is for validating USDM files against CDISC conformance rules.