
#1616: User-friendly errors for missing/wrong Datasets tab and columns, standard validation, and library metadata lookup failures (#1626)

Merged: SFJohnson24 merged 18 commits into main from 1616-datasets-library-validation-errors on Mar 10, 2026
Conversation

@RakeshBobba03 (Collaborator)

This PR improves error handling and messages for test data and standard/version usage (issue #1616).

Datasets: When the Excel test data workbook is missing the "Datasets" sheet or required column headers (e.g. "Filename", "Label"), the engine now raises a clear, user-facing error (e.g. ExcelTestDataError or InvalidDatasetFormat) that explains what is missing and that names are case-sensitive, instead of a generic KeyError such as "one or more datasets missing the following keys {'label'}". The API (TestRule) validates the incoming datasets payload and returns BadRequestError with guidance when required keys are missing. Reader calls in the validation script are wrapped so that malformed or unreadable data files produce a single, consistent error instead of raw exceptions.
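As a sketch of the kind of validation described above (class, column, and message names here are illustrative, not the engine's actual identifiers):

```python
# Hypothetical sketch: validate the "Datasets" sheet of a test data workbook
# up front and raise a user-facing error instead of a raw KeyError.
# REQUIRED_COLUMNS and ExcelTestDataError are assumed names for illustration.

REQUIRED_COLUMNS = ("Filename", "Label")  # case-sensitive, per the PR


class ExcelTestDataError(Exception):
    """Raised when the test data workbook is missing required structure."""


def check_datasets_sheet(sheet_names, header_row):
    """Check sheet presence and required headers, with a clear message."""
    if "Datasets" not in sheet_names:
        raise ExcelTestDataError(
            "The test data workbook has no 'Datasets' sheet. "
            "Sheet names are case-sensitive."
        )
    missing = [c for c in REQUIRED_COLUMNS if c not in header_row]
    if missing:
        raise ExcelTestDataError(
            f"The 'Datasets' sheet is missing required columns: {missing}. "
            "Column names are case-sensitive."
        )
```

The point of the pattern is that the check runs before any row is consumed, so the user sees one actionable message rather than a KeyError from deep inside the engine.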

Standard and library metadata: At the start of validation in core.py, the standard is checked against the supported list (StandardTypes enum); invalid values produce a clear error and point users to the --custom-standard flag. The --custom-standard flag correctly skips this check. When the standard/version combination has no library metadata (e.g. invalid Library tab or CLI arguments), the library lookup is wrapped in a dedicated catch that raises a specific error (LibraryMetadataNotFoundError) with a message that explains the invalid Standard/Version combination and hints at checking the Library tab or CLI arguments.
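A minimal sketch of the up-front standard check, assuming illustrative enum members (the real StandardTypes values live in the engine):

```python
# Sketch of the early standard validation described above. The enum members
# and flag name follow the PR description; member values are assumptions.
from enum import Enum


class StandardTypes(Enum):
    SDTMIG = "sdtmig"
    ADAMIG = "adamig"
    SENDIG = "sendig"


def validate_standard(standard: str, custom_standard: bool = False) -> None:
    """Reject unsupported standards early, unless --custom-standard is set."""
    if custom_standard:
        return  # --custom-standard skips the supported-list check
    supported = {s.value for s in StandardTypes}
    if standard.lower() not in supported:
        raise ValueError(
            f"Unsupported standard '{standard}'. Supported: {sorted(supported)}. "
            "Use the --custom-standard flag for a non-standard ruleset."
        )
```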

```python
f"one or more datasets missing the following keys {missing_keys}"
missing_list = sorted(missing_keys)
raise BadRequestError(
    f"Test data is missing required dataset properties: {missing_list}. "
```
Collaborator


Reporting the keys doesn't work--can we remove this logic? If the dataset workbook is missing, it just says the label is missing. Since filename and label match with the corresponding dataset workbooks, it doesn't work exactly as implemented. I do think checking and telling them about the Datasets tab works, but the reporting logic with missing keys is not functioning correctly and could be skipped.

Collaborator Author


The logic that reported which keys were missing is removed. We still check for required dataset properties and raise a single message that tells users to check the Datasets tab and column headers (case-sensitive), without listing key names.



```python
_DATASETS_TAB_GUIDANCE = (
    "Make sure there is a 'Datasets' tab in your test data workbook (name is "
```
Collaborator


I am not wild about putting this at the top of the file since it is only needed by one function and only referenced once. This should just be part of the error raised on line 53.

Collaborator Author


The _DATASETS_TAB_GUIDANCE constant is removed. That text is now inlined in the single BadRequestError message in validate_datasets_payload (at the raise), so it’s only defined where it’s used.

```python
for file_name in ct_files:
    ct_version = file_name.split(".")[0]
    published_ct_packages.add(ct_version)
    if (
```
Collaborator


I believe we will potentially need some logic here to address point D from my comment, for when incorrect CT packages are given.

Collaborator Author


Done. (1) In script_utils.get_library_metadata_from_cache, right after the CT loop, we now validate that every requested -ct package exists in the cache; if any are missing we raise CTPackageNotFoundError with the missing list, available packages, and an update-cache hint. (2) In run_single_rule_validation we added a check so that when codelists are requested but not found in the cache we raise CTPackageNotFoundError with a message to check the Library tab or codelist names (covers incorrect CT/standard/version from the editor). TestRule’s handle_exception returns 400 with the CT error message for both paths.
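The missing-package check described in (1) can be sketched roughly as follows (function and error names follow the reply; signatures are assumptions, not the engine's actual API):

```python
# Sketch: after collecting published CT packages from the cache, verify that
# every requested -ct package exists; if any are missing, raise one error
# listing the missing and the available packages.


class CTPackageNotFoundError(Exception):
    """Raised when a requested CT package is not present in the cache."""


def check_ct_packages(requested, cached_packages):
    """Compare requested CT packages against what the cache actually holds."""
    missing = sorted(set(requested) - set(cached_packages))
    if missing:
        raise CTPackageNotFoundError(
            f"CT packages not found in cache: {', '.join(missing)}. "
            f"Available packages: {', '.join(sorted(cached_packages))}."
        )
```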

@SFJohnson24 (Collaborator), Mar 2, 2026


I just tested and your logic in get_library_metadata_from_cache actually catches bad CT from the define--can we revert the changes below?

Collaborator

@SFJohnson24 SFJohnson24 left a comment


This looks good--can we address the comment about CT and the issue in TestRule? Tested well otherwise. Nice work.

```python
description = "Dataset data is malformed."


INVALID_DATASET_FORMAT_REASON = (
```
Collaborator


Both this and LIBRARY_METADATA_NOT_FOUND_HINT seem out of place in a file for exceptions. I understand they are reused, but they also seem unnecessary. The exception is telling the user what went wrong, not how to fix it. Telling the user their data at a path can't be read, and that the standard/version they entered failed to find library metadata, is enough without these constants.

```python
metadata = datasets_worksheet[
    datasets_worksheet[DATASET_FILENAME_COLUMN] == dataset_name
]
_worksheet = kwargs.get("_worksheet")
```
Collaborator


What is the purpose of this new worksheet argument and the if/else handling? I am assuming this is to prevent repeat Excel reads? We are already caching the result with @cached_dataset(DatasetTypes.RAW_METADATA.value).

Collaborator Author


The _worksheet argument is there so we don’t re-read the same Excel sheet multiple times. In get_datasets() we parse the Datasets sheet once, then for each row we call get_raw_dataset_metadata(dataset_name, _worksheet=worksheet). Without passing the worksheet, each of those calls would read the sheet again before the cache is populated. The @cached_dataset decorator helps when the same dataset is requested again later, but on the first pass we’d still do N reads of the same sheet. Passing the already-parsed DataFrame avoids that. I can remove it and rely only on the cache if you’d prefer simpler code and are okay with that first-pass cost.

Collaborator


Good catch--missed that. Could we clean up the implementation a little and utilize caching instead of an additional argument? Something like:

```python
@functools.lru_cache(maxsize=None)
def _get_datasets_worksheet(self) -> pd.DataFrame:
    return pd.read_excel(
        self.dataset_path,
        sheet_name=DATASETS_SHEET_NAME,
        na_values=[""],
        keep_default_na=False,
    )
```

then call self._get_datasets_worksheet() inside get_raw_dataset_metadata.
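A minimal, self-contained demonstration of why this works (the expensive pd.read_excel call is replaced by a counter so the sketch runs standalone; class and method names here are stand-ins): an lru_cache-decorated method is evaluated once per instance, so repeated calls no longer re-read the sheet.

```python
# Demonstration: lru_cache on a method memoizes per instance (keyed by self),
# so the underlying "read" runs only once however many times it is called.
import functools


class FakeDataService:
    def __init__(self):
        self.reads = 0

    @functools.lru_cache(maxsize=None)
    def _get_datasets_worksheet(self):
        self.reads += 1  # stands in for the expensive pd.read_excel call
        return {"Filename": ["ae.xpt"], "Label": ["Adverse Events"]}


svc = FakeDataService()
for _ in range(5):
    svc._get_datasets_worksheet()
# svc.reads is 1: the workbook was "read" only on the first call
```

One trade-off worth knowing: because the cache is keyed by self, lru_cache on a method holds a reference to each instance it has seen, which is harmless for a long-lived data service but can surprise in other contexts.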

```python
engine_logger.setLevel(log_level)


def _get_datasets_or_raise(data_service):
```
Collaborator


this is a script helper and should be moved to ./script_utils.py

```python
f"{CT_PACKAGE_NOT_FOUND_PREFIX} in cache: "
f"{', '.join(str(c) for c in sorted_missing)}. "
f"Available packages: {available}. "
"Run 'core.py update-cache' to refresh the cache."
```
@SFJohnson24 (Collaborator), Feb 25, 2026


I entered an incorrect CT and landed in this error, but updating the cache will not resolve my incorrect CT package. Again, exceptions are for identifying what went wrong; can we please remove the hints about what the user needs to do to fix things.

```python
f"one or more datasets missing the following keys {missing_keys}"
raise BadRequestError(
    "Test data is missing required dataset properties. "
    "This usually means the 'Datasets' sheet in your Excel file is missing "
```
Collaborator


'test data is incorrect and missing required formatting'

```python
code = 400
description = (
    f"{CT_PACKAGE_NOT_FOUND_PREFIX} in cache. "
    "Check package names and run 'core.py update-cache' if needed."
```
Collaborator


Again, please remove hints. Telling them their CT is incorrect is enough.

```python
codelists = codelists or []
for codelist in codelists:
    ct_package_metadata[codelist] = cache.get(codelist)
if codelists:
```
Collaborator


It seems I was mistaken about this. Bad CT in TestRule gets caught [here](https://github.com/cdisc-org/cdisc-rules-engine/blob/fb8a8d5f123beaeba254e98b24604bcfb5a7f6a4/TestRule/__init__.py#L105) and errors out, so this codeblock is never used.

@SFJohnson24 (Collaborator), Mar 2, 2026


Re: this comment--what is the intention of this codeblock? The CLI catches bad CT in get_library_metadata_from_cache() at line 165 in this file, and the editor never sends the request, as it crashes when trying to populate the cache here:

```python
asyncio.run(cache_populator.load_codelists(codelists))
```

```python
dataset.full_path = new_file
dataset.record_count = num_rows
dataset.original_path = file_path
datasets = _get_datasets_or_raise(data_service)
```
@SFJohnson24 (Collaborator), Mar 2, 2026


I incorrectly said this should be in scripts--I apologize. Is there a reason this was changed so that, instead of just adding the raises to data_service.get_datasets(), a new function was made that is just get_datasets with a raise?

Collaborator Author


There wasn’t a strong reason for the wrapper; it was just the approach I took at the time. Your approach is better. I’ve removed _get_datasets_or_raise and now call data_service.get_datasets() directly. The raise logic lives in the data services: ExcelDataService.get_datasets() and LocalDataService.get_datasets() both use try/except and raise the appropriate errors (ExcelTestDataError / InvalidDatasetFormat). No wrapper anymore.
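The pattern the reply describes can be sketched as follows. This is a hedged illustration, not the engine's code: the error name follows the thread, while the reader callable and constructor are stand-ins for the real Excel-reading logic.

```python
# Sketch: the data service's own get_datasets() translates low-level read
# failures into a user-facing error, so callers need no wrapper function.


class InvalidDatasetFormat(Exception):
    """Raised when the test data cannot be read into datasets."""


class ExcelDataService:
    def __init__(self, reader):
        self._reader = reader  # callable returning the parsed Datasets rows

    def get_datasets(self):
        try:
            return self._reader()
        except (KeyError, ValueError) as exc:
            raise InvalidDatasetFormat(
                "Test data could not be read. Check the 'Datasets' tab and "
                "its column headers (names are case-sensitive)."
            ) from exc
```

Keeping the try/except inside the service means every caller, CLI or API, gets the same consistent error without remembering to wrap the call.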

@SFJohnson24 (Collaborator) left a comment


This looks good--just a few housekeeping things:

  • The logic in get_library_metadata_from_cache is quite robust and is catching the bad CT from the CLI and from define, so we can revert some logic to keep our codebase clean.
  • Caching over a new argument--that way the caller doesn't need to remember to pass a workbook; the optimization is included in the call to the function.
  • Adding the raise to the original data_service function versus making a new function.

@SFJohnson24 (Collaborator) left a comment



PR correctly adds handling laid out in initial Issue

@SFJohnson24 merged commit c7c0a62 into main on Mar 10, 2026
13 of 14 checks passed
@SFJohnson24 deleted the 1616-datasets-library-validation-errors branch on March 10, 2026 at 19:20


Development

Successfully merging this pull request may close these issues.

For test data, better error message when the "Library" or "Datasets" tab, or the column headers within them are not found or malformed

2 participants