#1267 changed full path to original path in stdm reporting#1618

Merged
SFJohnson24 merged 5 commits into main from 1267-update-parquet-reporting
Mar 2, 2026

Conversation

@alexfurmenkov
Collaborator

No description provided.

@alexfurmenkov alexfurmenkov linked an issue Feb 16, 2026 that may be closed by this pull request
@alexfurmenkov alexfurmenkov marked this pull request as ready for review February 16, 2026 17:25
"label": dataset.label,
"path": str(Path(dataset.full_path or "").parent),
"path": str(
Path(dataset.original_path or dataset.full_path or "").parent
Collaborator

I think our intent is to always show the original path and never the parquet file path, so it should be dataset.original_path or "" only. @SFJohnson24 Could you please confirm which one you prefer?

Collaborator

@SFJohnson24 SFJohnson24 Feb 19, 2026

@RamilCDISC Yes, we are not trying to show the path to the parquet files in the reporting; we want the original path from the submitted data.

Collaborator

@gerrycampion gerrycampion Feb 19, 2026

@alexfurmenkov @SFJohnson24 @RamilCDISC
EDIT
I think I partially resolved this issue for the issue reporting tabs:
https://github.com/cdisc-org/cdisc-rules-engine/pull/1551/changes#diff-61bbdda116cb3d9e8bce503b115f0c16fe154902e296785925d922821eae917b

I don't think I fixed the datasets tabs. I agree with Ramil and Sam.

Collaborator Author

Hi @RamilCDISC, I updated the code as discussed.
I had to change the dataset metadata in test_get_export so that it contains original_path.

Collaborator Author

UPD: we should keep dataset.original_path or dataset.full_path or "" since dataset.original_path is only filled when we have large_dataset_validation
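
The fallback described here can be illustrated with a minimal sketch. The `DatasetMetadata` class and the `reported_path` helper are hypothetical stand-ins written for this example, not the engine's actual classes; only the expression inside `reported_path` is taken from the diff in this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical stand-in for the engine's dataset metadata object."""

    label: str
    full_path: Optional[str] = None
    # Per the comment above, only filled during large dataset validation.
    original_path: Optional[str] = None


def reported_path(dataset: DatasetMetadata) -> str:
    """Directory shown in the report (hypothetical helper name).

    Prefers the submitted file's location, then falls back to full_path
    when original_path was never populated.
    """
    return str(Path(dataset.original_path or dataset.full_path or "").parent)


# Large dataset: full_path points at a converted copy, original_path at the input.
large = DatasetMetadata(
    label="DM",
    full_path="/tmp/abc/dm.parquet",
    original_path="/data/study/dm.xpt",
)
# Regular dataset: original_path is not set, so full_path is used.
small = DatasetMetadata(label="DM", full_path="/data/study/dm.xpt")

print(reported_path(large))
print(reported_path(small))
```

Dropping the `dataset.full_path` fallback would make `reported_path(small)` resolve to `"."`, which appears to be the reason for keeping both operands.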

Collaborator

@RamilCDISC RamilCDISC left a comment

I ran validations locally with DATASET_SIZE_THRESHOLD=0 set in .env, and the following reports still show the path "/var/folders/7t/g8jmyzxn6k797cg9xt0sd9z40000gn/T". One report was generated using an xpt file and the other using an Excel file as the dataset.

CORE-Report-2026-02-24T13-46-47.xlsx

CORE-Report-2026-02-24T13-48-23.xlsx

@alexfurmenkov
Collaborator Author

Hi @RamilCDISC, can you please share your run configuration?
I've tried to recreate it on macOS, but the output for this configuration

validate -s sdtmig -v 3-4 -dv 2-1 -dp tests/resources/test_dataset.json -r CORE-000107 -l error -ps 1 -of json -of xlsx

contains the proper path.

CORE-Report-2026-02-26T17-48-09.xlsx

@RamilCDISC
Collaborator

@alexfurmenkov I am running validation on a Mac. In my .env I have set DATASET_SIZE_THRESHOLD=0. In the terminal, from the root of the project, I execute the command
python core.py validate -s sdtmig -v 3.4 -dp tests/resources/test_dataset.xpt
and then a second command
python core.py validate -s sdtmig -v 3.4 -dp tests/resources/test_dataset.json
You can execute these commands directly, as the datasets are in the repository test resources.

For the xpt dataset the location is shown as /var/folders/7t/g8jmyzxn6k797cg9xt0sd9z40000gn/T, and for the json dataset as /var/folders/7t/g8jmyzxn6k797cg9xt0sd9z40000gn/T.

I have pulled the latest changes from the branch.
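
A path of the form /var/folders/.../T is what Python's `tempfile.gettempdir()` typically returns on macOS. The sketch below shows how taking `.parent` of a converted copy would yield that directory, under the assumption that the parquet copy is written to the system temporary directory. The thread does not confirm where the engine writes it, and `converted_copy_path` is a hypothetical helper, not an engine function.

```python
import tempfile
from pathlib import Path


def converted_copy_path(original: str) -> Path:
    """Hypothetical: where a parquet copy of `original` might be written."""
    return Path(tempfile.gettempdir()) / (Path(original).stem + ".parquet")


original = "tests/resources/test_dataset.xpt"
converted = converted_copy_path(original)

# Parent of the converted copy: the system temp directory.
print(converted.parent)
# Parent of the submitted file: the directory the report is meant to show.
print(Path(original).parent)
```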

@alexfurmenkov
Collaborator Author

@RamilCDISC, I've tried your configuration on a Mac

python core.py validate -s sdtmig -v 3.4 -dp tests/resources/test_dataset.xpt

and got the following report:

CORE-Report-2026-02-27T15-29-52.xlsx

Can you please contact me in Slack so we can find the root of this issue on your machine?

Collaborator

@RamilCDISC RamilCDISC left a comment

The PR updates the reporting to show the correct path for each dataset rather than the parquet file path. Validation was done by ensuring the tests were updated and all tests pass. Manual testing was done by generating reports using the Dask implementation with xpt, xlsx, json and ndjson datasets.

@SFJohnson24 SFJohnson24 merged commit e1b3a47 into main Mar 2, 2026
11 checks passed
@SFJohnson24 SFJohnson24 deleted the 1267-update-parquet-reporting branch March 2, 2026 21:26

Development

Successfully merging this pull request may close these issues.

Dask Reporting

4 participants