Skip to content

#1248 Specify define xml path in operations#1649

Open
alexfurmenkov wants to merge 9 commits intomainfrom
1248-fix-define-variable-metadata
Open

#1248 Specify define xml path in operations#1649
alexfurmenkov wants to merge 9 commits intomainfrom
1248-fix-define-variable-metadata

Conversation

@alexfurmenkov
Copy link
Collaborator

@alexfurmenkov alexfurmenkov commented Mar 6, 2026

Added define xml path parameter to operation.
Found an issue in ExcelDataService ruining define.xml loading, because we use one path to look for both data and define.xml. Also, when we use ExcelDataService, dataset metadata doesn't contain real path to the *.xlsx file.

@alexfurmenkov alexfurmenkov requested review from RamilCDISC, SFJohnson24 and gerrycampion and removed request for gerrycampion March 9, 2026 12:23
@alexfurmenkov alexfurmenkov marked this pull request as ready for review March 9, 2026 12:23
@alexfurmenkov alexfurmenkov changed the title #1248 Draft solution to bypass Excel sheet processing #1248 Specify define xml path in operations Mar 9, 2026
"-dxp",
"tests/resources/CoreIssue1248/define_subfolder/define.xml",
"-dp",
"tests/resources/CoreIssue1248/relrec.json",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better would be to use excel dataset here instead of json. The PR updates are for excel dataset. Using json will not test the updated changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add a test with no -dxp flag supplied to verify the case of getting define from dataset path also.

Copy link
Collaborator Author

@alexfurmenkov alexfurmenkov Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work without the -dxp flag until changes are made to ExcelDataService related to comment

or changes to self.rule_processor.perform_rule_operations call should be made, so dataset_metadata.full_path containing in our case relrec.xpt value will pass something else

filename=dataset_name,
full_path=dataset_name,
full_path=self.dataset_path,
# full_path=dataset_name,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove this old commented out code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll revert it back to full_path=dataset_name for now.
Ideally, it should be full_path=self.dataset_path so the name matches and the report shows the actual path.

However, changing this will break the current code because get_dataset expects a dataset name, while all service clients are currently sending full_path.

Possible solutions:

  1. Detect the service type and send the filename field instead of full_path. This would work but doesn’t feel very elegant.
  2. Change get_dataset to accept a dataset metadata object instead of a path parameter. Then each service can decide what field from the metadata it needs to retrieve the dataset.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AC in the attached issue requires solution of the bug in excel dataset use path. i think we should resolve the issue in this PR before merge. You can discuss with @SFJohnson24 and @gerrycampion to decide what solution to use.

else os.path.join(self.params.directory_path, DEFINE_XML_FILE_NAME)
)
if not os.path.exists(define_path):
raise FileNotFoundError("Define XML file %s not found", define_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string formatting is incorrect. Use f"Define XML file {define_path} not found" instead.

Comment on lines +36 to 46
define_path = (
self.params.define_xml_path
if self.params.define_xml_path
else os.path.join(self.params.directory_path, DEFINE_XML_FILE_NAME)
)
if not os.path.exists(define_path):
raise FileNotFoundError(f"Define XML file {define_path} not found")
define_contents = self.data_service.get_define_xml_contents(
dataset_name=os.path.join(self.params.directory_path, DEFINE_XML_FILE_NAME)
dataset_name=define_path
)
define_reader = DefineXMLReaderFactory.from_file_contents(define_contents)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk should be moved to a BaseOperation function so that it can be reused

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Operation "define_variable_metadata" doesn't work for Excel data

3 participants