1022 enhance utf handling #1456
Merged
45 commits
c6a8c77 (RakeshBobba03) UTF Encoding Enhancement Implementation
c8f5c32 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
9bbbe48 (RakeshBobba03) add dataset_implementation to DatasetJSONReader and encoding paramete…
e92be69 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
2d4f6ac (RakeshBobba03) move imports to top and add encoding parameter to test_validate
2316d95 (RamilCDISC) Merge branch 'main' into 1022-Enhance-UTF-Handling
a970b73 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
ee0d5ac (RakeshBobba03) Add short form flag (-e) for encoding option with validation and upda…
273ee05 (RakeshBobba03) Merge branch '1022-Enhance-UTF-Handling' of https://github.com/cdisc-…
0d1c9c6 (RakeshBobba03) Fix encoding error handling fallback and add missing dataset_implemen…
277aca7 (RakeshBobba03) Fix XPT encoding detection order and add graceful error handling for …
da87cef (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
b237486 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
8ff517c (RakeshBobba03) Default to UTF-8 encoding with explicit -e flag support, remove autom…
11accb9 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
fdce8c3 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
c7599e7 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
68e25fc (RakeshBobba03) Refactor encoding handling: centralize utf-8 default in DataReaderInt…
2dba53c (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
77e1129 (RakeshBobba03) Remove encoding parameter from from_file() call
52bee07 (SFJohnson24) Auto-updated branch with latest changes from main
b66e1e1 (SFJohnson24) Auto-updated branch with latest changes from main
642ffc9 (SFJohnson24) Auto-updated branch with latest changes from main
0a15ebc (SFJohnson24) Auto-updated branch with latest changes from main
fbe28d5 (SFJohnson24) Auto-updated branch with latest changes from main
d483725 (SFJohnson24) Auto-updated branch with latest changes from main
fe8971a (SFJohnson24) Auto-updated branch with latest changes from main
c6ed0d4 (SFJohnson24) Auto-updated branch with latest changes from main
9eadd82 (SFJohnson24) Auto-updated branch with latest changes from main
59756ac (SFJohnson24) Auto-updated branch with latest changes from main
8a57788 (SFJohnson24) Auto-updated branch with latest changes from main
3f0caf8 (SFJohnson24) Auto-updated branch with latest changes from main
4265595 (SFJohnson24) Auto-updated branch with latest changes from main
819f7f5 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
697d867 (RakeshBobba03) Merge branch '1022-Enhance-UTF-Handling' of https://github.com/cdisc-…
e988c74 (RakeshBobba03) Fix schema loading to always use UTF-8 instead of user encoding
07f546e (SFJohnson24) Auto-updated branch with latest changes from main
ccd5c0e (SFJohnson24) Auto-updated branch with latest changes from main
8991b94 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
2416516 (RakeshBobba03) Merge branch '1022-Enhance-UTF-Handling' of https://github.com/cdisc-…
5f410b9 (RakeshBobba03) Use DEFAULT_ENCODING everywhere and make encoding handling consistent
73e1aba (RakeshBobba03) Add parametrized tests for each README encoding
bcc6b44 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
d8272ed (RakeshBobba03) Use hardcoded utf-8 for schema files, inline pyreadstat calls in XPT …
bfdac45 (RakeshBobba03) Merge branch 'main' into 1022-Enhance-UTF-Handling
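The commit messages above mention an `-e` encoding option "with validation" and a UTF-8 default. The validation code itself is not visible in this view, so the following is only a minimal sketch of one way such a check could work, using the standard library's `codecs.lookup`. The helper name `resolve_encoding` is hypothetical, and the value of `DEFAULT_ENCODING` is assumed from the commit messages rather than read from the project.

```python
import codecs
from typing import Optional

# Assumed value: the commits describe UTF-8 as the default encoding.
DEFAULT_ENCODING = "utf-8"


def resolve_encoding(value: Optional[str] = None) -> str:
    """Return a normalized codec name, or the default when none is given.

    Hypothetical helper, not taken from the PR.
    """
    if not value:
        return DEFAULT_ENCODING
    try:
        # codecs.lookup raises LookupError for names Python does not know.
        return codecs.lookup(value).name
    except LookupError as exc:
        raise ValueError(f"Unsupported encoding: {value!r}") from exc
```

Rejecting an unknown name up front, before any dataset is opened, keeps the failure close to the command-line input instead of surfacing it later inside a reader.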
@@ -28,5 +28,6 @@
         "jsonata_custom_functions",
         "max_report_rows",
         "max_errors_per_rule",
+        "encoding",
     ],
 )
@@ -15,6 +15,7 @@
 from cdisc_rules_engine.services.data_readers.json_reader import JSONReader
 from cdisc_rules_engine.enums.dataformat_types import DataFormatTypes
 from cdisc_rules_engine.models.dataset import PandasDataset
+from cdisc_rules_engine.constants import DEFAULT_ENCODING


 class DataReaderFactory(FactoryInterface):
@@ -26,9 +27,15 @@ class DataReaderFactory(FactoryInterface):
         DataFormatTypes.USDM.value: JSONReader,
     }

-    def __init__(self, service_name: str = None, dataset_implementation=PandasDataset):
+    def __init__(
+        self,
+        service_name: str = None,
+        dataset_implementation=PandasDataset,
+        encoding: str = None,
+    ):
         self._default_service_name = service_name
         self.dataset_implementation = dataset_implementation
+        self.encoding = encoding

     @classmethod
     def register_service(cls, name: str, service: Type[DataReaderInterface]):
@@ -47,7 +54,9 @@ def get_service(self, name: str = None, **kwargs) -> DataReaderInterface:
         """
         service_name = name or self._default_service_name
         if service_name in self._reader_map:
-            return self._reader_map[service_name](self.dataset_implementation)
+            reader_class = self._reader_map[service_name]
+            encoding = self.encoding or DEFAULT_ENCODING
+            return reader_class(self.dataset_implementation, encoding=encoding)
         raise ValueError(
             f"Service name must be in {list(self._reader_map.keys())}, "
             f"given service name is {service_name}"

Collaborator (review comment on the line `reader_class = self._reader_map[service_name]`):
To answer the question, I think the simplest solution is to just add this to the …
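The last hunk above changes `get_service` so that the factory passes an encoding to every reader, falling back to `DEFAULT_ENCODING` when none was given. The following is a simplified, self-contained sketch of that fallback, not the project's code: `StubReader` and `ReaderFactorySketch` are hypothetical stand-ins for the real reader classes and `DataReaderFactory`, and the value `"utf-8"` for `DEFAULT_ENCODING` is assumed from the commit messages.

```python
from typing import Dict, Optional, Type

# Assumed value; the diff imports this constant from cdisc_rules_engine.constants.
DEFAULT_ENCODING = "utf-8"


class StubReader:
    """Hypothetical stand-in for a DataReaderInterface implementation."""

    def __init__(self, dataset_implementation, encoding: str = DEFAULT_ENCODING):
        self.dataset_implementation = dataset_implementation
        self.encoding = encoding


class ReaderFactorySketch:
    """Simplified illustration of the encoding fallback in get_service."""

    _reader_map: Dict[str, Type[StubReader]] = {"XPT": StubReader}

    def __init__(
        self,
        service_name: Optional[str] = None,
        dataset_implementation=dict,  # placeholder for PandasDataset
        encoding: Optional[str] = None,
    ):
        self._default_service_name = service_name
        self.dataset_implementation = dataset_implementation
        self.encoding = encoding

    def get_service(self, name: Optional[str] = None) -> StubReader:
        service_name = name or self._default_service_name
        if service_name in self._reader_map:
            reader_class = self._reader_map[service_name]
            # `or` treats both None and "" as "not provided".
            encoding = self.encoding or DEFAULT_ENCODING
            return reader_class(self.dataset_implementation, encoding=encoding)
        raise ValueError(
            f"Service name must be in {list(self._reader_map.keys())}, "
            f"given service name is {service_name}"
        )


default_reader = ReaderFactorySketch("XPT").get_service()
custom_reader = ReaderFactorySketch("XPT", encoding="cp1252").get_service()
```

Here `default_reader.encoding` is `"utf-8"` and `custom_reader.encoding` is `"cp1252"`. One consequence of using `or` for the fallback is that an empty string is silently replaced by the default rather than rejected.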