Skip to content

1558 csv data reader#1640

Open
alexfurmenkov wants to merge 18 commits intomainfrom
1558-csv-data-reader
Open

1558 csv data reader#1640
alexfurmenkov wants to merge 18 commits intomainfrom
1558-csv-data-reader

Conversation

@alexfurmenkov
Copy link
Collaborator

No description provided.

@alexfurmenkov alexfurmenkov marked this pull request as ready for review March 2, 2026 08:56
Copy link
Collaborator

@SFJohnson24 SFJohnson24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good so far-- can you see messages on slack with me and Gerry re: the .env and taking that into account as well with csv runs. https://github.com/cdisc-org/cdisc-rules-engine/issues/1559 My parsing script matches your reader logic--there is a description of the .env structure there.
It would want to get the standard/version/substandard/CT/define xml path from the .env

@alexfurmenkov alexfurmenkov linked an issue Mar 5, 2026 that may be closed by this pull request
raise NotImplementedError

def from_file(self, file_path):
with open(file_path, "r", encoding=self.encoding) as fp:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need encoding + general error handling wrapping similar to json reader here.

first_record = {}

with open(self.file_path, encoding=self.encoding) as f:
dataset_length = sum(1 for _ in f) - 1 # subtract header
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can result in dataset_length = -1 for empty csv files. For those cases I guess it should be zero. So we should clamp it to zero.

description = "JSON data is malformed."


class InvalidCSVFormat(EngineError):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think better naming for this would be InvalidCSVFile instead of format because this error is for when csv is malformed or encoding failed which makes it invalid csv file. What do you think?

Copy link
Collaborator

@SFJohnson24 SFJohnson24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @alexfurmenkov can you please start adding some descriptions to your PRs? It would help with the reviews. Thanks

I tried running a CSV with no standard / version given per the discussion in slack and core.py still has standard and version as required for their click arguments. This needs to be changed. I tried changing them myself locally and it still fails-- it appears it is not grabbing them from the env.

  File "C:\Users\SamJohnson\Documents\workspaces\cdisc-rules-engine/core.py", line 537, in validate
    standard = standard.lower()
Image it seems the click strategy you implemented isnt grabbing things because envvar only works in the process environment and the .env we want is in the -d path. note from the slack screenshot above: --use-case also needs to be implemented and extracted from the env as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support CSV datasets in engine

3 participants