Enhance UTF Handling in JSON Dataset Reader

The dataset reader currently assumes UTF-8/ASCII input, which causes errors when parsing JSON files that contain international characters. JSON text is Unicode (UTF-8 by default), so the dataset reader must correctly handle UTF-8, UTF-16, and UTF-32 encoded data.
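JSON text always begins with ASCII characters, so a reader can infer the encoding from a byte-order mark (BOM) or, when there is none, from where the zero bytes fall in the first four bytes. The sketch below illustrates that idea. `detect_json_encoding` is a hypothetical helper, not part of the rules engine; it follows the same approach Python's `json` module uses internally when `json.loads` is given `bytes`.

```python
import codecs


def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of raw JSON bytes (hypothetical helper).

    Checks for a byte-order mark first, then falls back to the zero-byte
    pattern of the first four bytes: JSON text starts with ASCII characters,
    so zero bytes reveal UTF-16 or UTF-32 code units.
    """
    # UTF-32 BOMs must be tested before UTF-16 BOMs, because the UTF-32-LE
    # BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"  # the "utf-32" codec reads and strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the "utf-16" codec reads and strips the BOM
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # strips the UTF-8 BOM while decoding

    # No BOM: look at which of the first four bytes are zero.
    head = data[:4]
    if len(head) >= 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0:
            return "utf-32-be"  # 00 00 00 XX
        if head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"  # XX 00 00 00
    if len(head) >= 2:
        if head[0] == 0:
            return "utf-16-be"  # 00 XX
        if head[1] == 0:
            return "utf-16-le"  # XX 00
    return "utf-8"
```

Decoding then becomes `json.loads(raw.decode(detect_json_encoding(raw)))`. Passing the bytes straight to `json.loads` performs equivalent detection, so a helper like this is only worth having where a reader must decode the text itself.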

A/C: The dataset reader should parse and process datasets containing international characters without errors.
- Update all dataset readers to handle Unicode encodings explicitly.
- Decode files correctly when reading them.
- Add test cases with non-ASCII characters (e.g., Chinese, Japanese).
- Both the `test` and `validate` commands should work with these datasets.
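A minimal sketch of the reading and testing criteria, assuming the fix is to stop forcing `encoding="utf-8"` and let Python's `json` module detect the encoding: since Python 3.6, `json.loads` accepts `bytes` and detects UTF-8 (with or without a BOM), UTF-16, or UTF-32 on its own. `read_json_dataset` and the test function are hypothetical names. The sample record is a flat, illustrative dictionary rather than the full Dataset-JSON structure, and it is not taken from the attached files.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def read_json_dataset(path: str) -> Any:
    """Load a JSON dataset in any UTF encoding (hypothetical helper).

    Reading raw bytes lets json.loads detect UTF-8, UTF-16 or UTF-32
    itself, instead of decoding everything as UTF-8 up front.
    """
    raw = Path(path).read_bytes()
    return json.loads(raw)


def test_reads_non_ascii_in_all_utf_encodings() -> None:
    # Illustrative values: Chinese, Japanese and accented Latin text.
    expected = {"AETERM": "头痛", "AEDECOD": "頭痛", "AELOC": "Région"}
    text = json.dumps(expected, ensure_ascii=False)
    encodings = ("utf-8", "utf-8-sig", "utf-16", "utf-16-le",
                 "utf-16-be", "utf-32", "utf-32-le", "utf-32-be")
    for encoding in encodings:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "ae.json"
            path.write_bytes(text.encode(encoding))
            # The message names the encoding that failed, if any.
            assert read_json_dataset(str(path)) == expected, encoding
```

The test function follows pytest naming, so it can be collected by `pytest` as-is; the attached datasets would be better fixtures once the readers are fixed.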

Errors:
Error from `validate`:
<img width="1326" src="https://api.zenhub.com/attachedFiles/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBBM2ZiQ0E9PSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--a5a744dababddbf9199c17b333b851436ec48141/image.png" alt="image.png" />
Error from `test`:
<img width="1172" src="https://api.zenhub.com/attachedFiles/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBBM2piQ0E9PSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--064b32fbf7f6c0795dcee9eb5da9b127d4cea8d7/image.png" alt="image.png" />


Dataset-JSON file:
[ae_nonascii.zip](https://api.zenhub.com/attachedFiles/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBBMmZiQ0E9PSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--44b07d0f80fa06337841ddc9d4e8ffb101093202/ae_nonascii.zip)
JSON for the `test` command:
[TestDatasets.zip](https://api.zenhub.com/attachedFiles/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBBM0hiQ0E9PSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--6ff2a3cbedabb063c569cec0ea475cbc01fc302a/TestDatasets.zip)

Code:
- The reader class (and the metadata reader classes) use `utf-8`: https://github.com/cdisc-org/cdisc-rules-engine/tree/main/cdisc_rules_engine/services/data_readers
- `dummy_data_service.py` reads JSON (from the `-lr` flag) as `utf-8`: https://github.com/cdisc-org/cdisc-rules-engine/blob/b93b1b60e4deb9cd9aadbaee3030994c3cd2a293/cdisc_rules_engine/services/data_services/dummy_data_service.py#L60
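If the readers above follow the common `with open(path, encoding="utf-8") as f: json.load(f)` pattern (an assumption; the actual classes have not been checked here), the smallest change is to open the file in binary mode. `json.load` passes whatever `f.read()` returns to `json.loads`, and `json.loads` detects the UTF encoding when it receives bytes. `load_json_file` is a hypothetical name, and wrapping decode failures in a `ValueError` that names the file is only a suggestion for clearer `validate`/`test` output.

```python
import json
from typing import Any


def load_json_file(file_path: str) -> Any:
    """Hypothetical replacement for open(..., encoding="utf-8") + json.load.

    Binary mode hands bytes to json.load, so UTF-8 (with or without BOM),
    UTF-16 and UTF-32 files are all detected and decoded.
    """
    with open(file_path, "rb") as f:
        try:
            return json.load(f)
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            # Re-raise with the file name so the error points at the
            # offending dataset instead of a bare codec message.
            raise ValueError(
                f"Could not decode JSON file {file_path!r}: {exc}"
            ) from exc
```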

Enhance UTF Handling in JSON Dataset Reader #1022