From 14018be873e1ce95220f8ec43b847aa06a43bd85 Mon Sep 17 00:00:00 2001 From: Gerry Campion Date: Thu, 13 Nov 2025 12:04:21 -0500 Subject: [PATCH 1/2] reorganize readme --- README.md | 260 +++++++++++++++++++++++++++--------------------------- 1 file changed, 128 insertions(+), 132 deletions(-) diff --git a/README.md b/README.md index b2070e719..963240e9c 100644 --- a/README.md +++ b/README.md @@ -1,27 +1,20 @@ -### Supported python versions +[![](https://www.cdisc.org/themes/custom/cdiscd8/logo.svg)](https://www.cdisc.org) -[![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120) - -### Windows Command Compatibility - -Note: The Windows commands provided in this README are written for PowerShell. While most commands are compatible with both PowerShell and Command Prompt, some adjustments may be necessary when using Command Prompt. If you encounter any issues running these commands in Command Prompt, try using PowerShell or consult the Command Prompt documentation for equivalent commands. +[![](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120) [![](https://img.shields.io/pypi/v/cdisc-rules-engine.svg)](https://pypi.org/project/cdisc-rules-engine) [![](https://img.shields.io/docker/v/cdiscdocker/cdisc-rules-engine?label=docker)](https://hub.docker.com/r/cdiscdocker/cdisc-rules-engine) # cdisc-rules-engine Open source offering of the CDISC Rules Engine, a tool designed for validating clinical trial data against data standards. -To learn more, visit our official CDISC website or for other implementation options, see our DockerHub repository: -
-[CDISC Website](https://www.cdisc.org/) -
-[CDISC Rules Engine on DockerHub](https://hub.docker.com/repository/docker/cdiscdocker/cdisc-rules-engine/general)

-### **Quick start**
+## Quick start

-To quickly get up and running with CORE, users can download the latest executable version of the engine for their operating system from here:
+Note: The Windows commands provided in this README are written for PowerShell. While most commands are compatible with both PowerShell and Command Prompt, some adjustments may be necessary when using Command Prompt. If you encounter any issues running these commands in Command Prompt, try using PowerShell or consult the Command Prompt documentation for equivalent commands.
+
+To quickly get up and running with CORE, users can download the latest executable version of the engine for their operating system from the [Releases](https://github.com/cdisc-org/cdisc-rules-engine/releases) page.

Once downloaded, simply unzip the file and run the following command based on your Operating System:

-Windows:
+### Windows:

```
.\core.exe validate -s -v -d path/to/datasets

# ex: .\core.exe validate -s sdtmig -v 3-4 -d .\xpt\
```

@@ -29,7 +22,7 @@ Windows:

-Linux/Mac:
+### Linux/Mac:

```
./core validate -s -v -d path/to/datasets

@@ -50,68 +43,9 @@ Linux/Mac:
> chmod +x ./core
> ```

-### **Command-line Interface**
-
-### **Getting Started**
-
-In the terminal, navigate to the directory you intend to install CORE rules engine in
-
-1. Clone the repository:
-
-   ```
-   git clone https://github.com/cdisc-org/cdisc-rules-engine
-   ```
-
-2. Ensure you have Python 3.12 installed:
-   You can check your Python version with:
-   ```
-   python --version
-   ```
-   If you don't have Python 3.12, please download and install it from [python.org](https://www.python.org/downloads/) or using your system's package manager.
-
-### **Code formatter**
+## Command-line Interface

-This project uses the `black` code formatter, `flake8` linter for python and `prettier` for JSON, YAML and MD.
-It also uses `pre-commit` to run `black`, `flake8` and `prettier` when you commit. -Both dependencies are added to _requirements-dev.txt_. - -**Required** - -Setting up `pre-commit` requires one extra step. After installing it you have to run - -`pre-commit install` - -This installs `pre-commit` in your `.git/hooks` directory. - -### **Installing dependencies** - -These steps should be run before running any tests or core commands using the non compiled version. - -- Create a virtual environment: - - `python -m venv ` - -NOTE: if you have multiple versions of python on your machine, you can call python 3.12 for the virtual environment's creation instead of the above command: -`python3.12 -m venv ` - -- Activate the virtual environment: - -`.//bin/activate` -- on linux/mac
-`.\\Scripts\Activate` -- on windows - -- Install the requirements. - -`python -m pip install -r requirements-dev.txt` # From the root directory - -### **Running The Tests** - -From the root of the project run the following command (this will run both the unit and regression tests): - -`python -m pytest tests` - -### **Running a validation** - -#### From the command line +### Running a validation (`validate`) Clone the repository and run `python core.py --help` to see the full list of commands. @@ -213,7 +147,7 @@ Run `python core.py validate --help` to see the list of validation options. --help Show this message and exit. ``` -##### Available log levels +#### Available log levels - `debug` - Display all logs - `info` - Display info, warnings, and error logs @@ -221,7 +155,7 @@ Run `python core.py validate --help` to see the list of validation options. - `error` - Display only error logs - `critical` - Display critical logs -##### **Validate folder** +#### Validate folder To validate a folder using rules for SDTM-IG version 3.4 use the following command: @@ -243,9 +177,10 @@ CORE supports the following dataset file formats for validation: - Define-XML files should be provided via the `--define-xml-path` (or `-dxp`) option, not through the dataset directory (`-d` or `-dp`). - If you point to a folder containing unsupported file formats, CORE will display an error message indicating which formats are supported. 
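The folder-screening behavior described above can be approximated before a run. Below is a minimal sketch, not part of the CORE engine itself: the helper name and the extension set are assumptions for illustration, so check the set against the supported dataset file formats listed above before relying on it.

```python
from pathlib import Path

# Illustrative only -- this helper is NOT part of the CORE engine.
# The extension set is an assumption; consult the supported dataset
# file formats listed above for the authoritative list.
SUPPORTED_EXTENSIONS = {".xpt", ".json"}

def screen_dataset_folder(folder):
    """Split a dataset folder into supported and unsupported files."""
    supported, unsupported = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        bucket = supported if path.suffix.lower() in SUPPORTED_EXTENSIONS else unsupported
        bucket.append(path.name)
    return supported, unsupported
```

Screening like this avoids the unsupported-format error described above; Define-XML files would still be passed separately via `--define-xml-path`.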
-##### **Validate single rule**
+#### Validate single rule

`python core.py validate -s sdtmig -v 3-4 -dp -lr --meddra ./meddra/ --whodrug ./whodrug/`
+
Note: JSON dataset should match the format provided by the rule editor:

```json
{
@@ -271,7 +206,7 @@ Note: JSON dataset should match the format provided by the rule editor:
}
```

-##### **Understanding the Rules Report**
+#### Understanding the Rules Report

The rules report tab displays the run status of each rule selected for validation.

@@ -280,9 +215,45 @@ The possible rule run statuses are:

- `SUCCESS` - The rule ran and data was validated against the rule. May or may not produce results
- `SKIPPED` - The rule was unable to be run. Usually due to missing required data, but could also be caused by rule execution errors.

-# Additional Core Commands
+#### Setting DATASET_SIZE_THRESHOLD for Large Datasets
+
+The CDISC Rules Engine respects the `DATASET_SIZE_THRESHOLD` environment variable to determine when to use Dask for large dataset processing. Setting this to 0 forces Dask to be used instead of Pandas. A `.env` file in the root directory with this variable set has the same effect when running the CLI. This can also be done with the executable releases via multiple methods:
+
+##### Windows (Command Prompt)
+
+```cmd
+set DATASET_SIZE_THRESHOLD=0 && core.exe validate -rest -of -config -commands
+```

-**- update-cache** - update locally stored cache data (Requires an environment variable - `CDISC_LIBRARY_API_KEY`) This is stored in the .env folder in the root directory, the API key does not need quotations around it.
+##### Windows (PowerShell)
+
+```powershell
+$env:DATASET_SIZE_THRESHOLD=0; core.exe validate -rest -of -config -commands
+```
+
+##### Linux/Mac (Bash)
+
+```bash
+DATASET_SIZE_THRESHOLD=0 ./core validate -rest -of -config -commands
+```
+
+##### .env File (Alternative)
+
+Create a `.env` file in the root directory of the release containing:
+
+```
+DATASET_SIZE_THRESHOLD=0
+```
+
+Then run normally: `core.exe validate -rest -of -config -commands`
+
+---
+
+**Note:** Setting `DATASET_SIZE_THRESHOLD=0` tells the engine to use Dask processing for all datasets regardless of size. The threshold defaults to 1/4 of available RAM, so datasets larger than that will use Dask by default. See env.example to see what the CLI .env file should look like.
+
+### Updating the Cache (`update-cache`)
+
+Update locally stored cache data. This requires the `CDISC_LIBRARY_API_KEY` environment variable, stored in the .env file in the root directory; the API key does not need quotation marks around it.

```bash
python core.py update-cache
```

@@ -292,14 +263,14 @@ To obtain an api key, please follow the instructions found here: . Please note it can take up to an hour after sign up to have an api key issued

-# Custom Standards and Rules
+##### Custom Standards and Rules

-## Custom Rules Management
+###### Custom Rules Management

- **Custom rules** are stored in a flat file in the cache, indexed by their core ID (e.g., 'COMPANY-000123' or 'CUSTOM-000123').
- Each rule is stored independently in this file, allowing for efficient lookup and management.

-## Custom Standards Management
+###### Custom Standards Management

- **Custom standards** act as a lookup mechanism that maps a standard identifier to a list of applicable rule IDs.
- When adding a custom standard, you need to provide a JSON file with the following structure: @@ -345,13 +316,13 @@ To obtain an api key, please follow the instructions found here: ` + +NOTE: if you have multiple versions of python on your machine, you can call python 3.12 for the virtual environment's creation instead of the above command: +`python3.12 -m venv ` + +- Activate the virtual environment: + +`.//bin/activate` -- on linux/mac
+`.\\Scripts\Activate` -- on windows + +- Install the requirements. + +`python -m pip install -r requirements-dev.txt` # From the root directory + +### Creating an executable version **Note:** Further directions to create your own executable are contained in [README_Build_Executable.md](README_Build_Executable.md) if you wish to build an unofficial release executable for your own use. @@ -467,7 +481,7 @@ _Note .venv should be replaced with path to python installation or virtual envir This will create an executable version in the `dist` folder. The version does not require having Python installed and can be launched by running `core` script with all necessary CLI arguments. -### **Creating .whl file** +### Creating .whl file All non-python files should be listed in `MANIFEST.in` to be included in the distribution. Files must be in python package. @@ -498,58 +512,40 @@ To upload built distributive to pypi `py -m pip install --upgrade twine` `py -m twine upload --repository {repository_name} dist/*` -## Submit an Issue - -If you encounter any bugs, have feature requests, or need assistance, please submit an issue on our GitHub repository: - -[https://github.com/cdisc-org/cdisc-rules-engine/issues](https://github.com/cdisc-org/cdisc-rules-engine/issues) - -When submitting an issue, please include: - -- A clear description of the problem or request -- Steps to reproduce the issue (for bugs) -- Your operating system and environment details -- Any relevant logs or error messages - -# Setting DATASET_SIZE_THRESHOLD for Large Datasets - -The CDISC Rules Engine respects the `DATASET_SIZE_THRESHOLD` environment variable to determine when to use Dask for large dataset processing. Setting this to 0 coerces Dask usage over Pandas. A .env in the root directory with this variable set will cause this implementation coercion for the CLI. 
This can also be done with the executable releases via multiple methods: +## Contributing -## Quick Commands +### Code formatter -### Windows (Command Prompt) +This project uses the `black` code formatter, `flake8` linter for python and `prettier` for JSON, YAML and MD. +It also uses `pre-commit` to run `black`, `flake8` and `prettier` when you commit. +Both dependencies are added to _requirements-dev.txt_. -```cmd -set DATASET_SIZE_THRESHOLD=0 && core.exe validate -rest -of -config -commands -``` +Setting up `pre-commit` requires one extra step. After installing it you have to run -### Windows (PowerShell) +`pre-commit install` -```powershell -$env:DATASET_SIZE_THRESHOLD=0; core.exe validate -rest -of -config -commands -``` +This installs `pre-commit` in your `.git/hooks` directory. -### Linux/Mac (Bash) +### Running The Tests -```bash -DATASET_SIZE_THRESHOLD=0 ./core -rest -of -config -commands -``` +From the root of the project run the following command (this will run both the unit and regression tests): -## .env File (Alternative) +`python -m pytest tests` -Create a `.env` file in the root directory of the release containing: +### Submit an Issue -``` -DATASET_SIZE_THRESHOLD=0 -``` +If you encounter any bugs, have feature requests, or need assistance, please submit an issue on our GitHub repository: -Then run normally: `core.exe validate -rest -of -config -commands +[https://github.com/cdisc-org/cdisc-rules-engine/issues](https://github.com/cdisc-org/cdisc-rules-engine/issues) ---- +When submitting an issue, please include: -**Note:** Setting `DATASET_SIZE_THRESHOLD=0` tells the engine to use Dask processing for all datasets regardless of size, size threshold defaults to 1/4 of available RAM so datasets larger than this will use Dask. 
See env.example to see what the CLI .env file should look like

-## Updating USDM JSON Schema
+### Updating USDM JSON Schema

Currently, the engine supports USDM JSON Schema validation against versions 3.0 and 4.0. The schema definition files are located at:

From 83f4bb3c8e561d2716602ef1e1a1b34bc4a35f41 Mon Sep 17 00:00:00 2001
From: Gerry Campion
Date: Thu, 13 Nov 2025 13:00:23 -0500
Subject: [PATCH 2/2] cli clarification

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 963240e9c..0238a505a 100644
--- a/README.md
+++ b/README.md
@@ -45,6 +45,8 @@ Once downloaded, simply unzip the file and run the following command based on yo

 ## Command-line Interface

+**Note**: the following examples apply to the source code and reference "`python core.py`". When using the executable version as described in the [Quick Start](#quick-start) above, instances of "`python core.py`" should be replaced with "`.\core.exe`" (Windows) or "`./core`" (Linux/Mac). You can also run directly on the source code by following the [Cloning](#cloning) instructions.
+
 ### Running a validation (`validate`)

 Clone the repository and run `python core.py --help` to see the full list of commands.
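The `DATASET_SIZE_THRESHOLD` semantics described in the patch above can be summarized in a short sketch. This is illustrative only: the function name is hypothetical and this is not the engine's actual implementation, just the documented behavior (0 coerces Dask; otherwise the threshold defaults to 1/4 of available RAM).

```python
import os

def should_use_dask(dataset_size_bytes, available_ram_bytes):
    """Illustrative sketch of the documented DATASET_SIZE_THRESHOLD
    behavior -- not the engine's actual code."""
    raw = os.environ.get("DATASET_SIZE_THRESHOLD")
    # Default threshold: 1/4 of available RAM.
    threshold = int(raw) if raw is not None else available_ram_bytes // 4
    if threshold == 0:
        return True  # 0 coerces Dask for every dataset, regardless of size
    return dataset_size_bytes > threshold
```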