Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR refactors all data models from pydantic.dataclasses.dataclass to pydantic.BaseModel to provide more idiomatic usage of Pydantic features. It also centralizes relative path handling for DataFrameInformation objects and simplifies serialization logic.
- Replace
@dataclass(frozen=True)withBaseModelandmodel_config = ConfigDict(frozen=True)across all models - Migrate from
TypeAdapterto directBaseModel.model_dump_json()andBaseModel.model_validate()calls - Centralize
DataFrameInformationpath relativization logic into a dedicatedserialize/dataframe_information.pymodule
Reviewed Changes
Copilot reviewed 42 out of 42 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| Multiple test files | Updated test fixtures and expectations to work with new BaseModel approach |
| Multiple metadata model files | Converted from pydantic dataclass to BaseModel with frozen configuration |
| Multiple serialize modules | Replaced TypeAdapter usage with direct model methods and centralized path logic |
| Integration test files | Updated model instantiation and validation to use BaseModel patterns |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request is a significant and well-executed refactoring. It successfully migrates all Pydantic data models from pydantic.dataclasses.dataclass to pydantic.BaseModel, aligning with modern Pydantic idioms. The centralization of path relativization logic is a great improvement, cleaning up the model definitions and improving separation of concerns. The code is consistent, and the tests have been thoroughly updated to reflect these substantial changes. Overall, this is an excellent enhancement to the codebase.
srnnkls
left a comment
There was a problem hiding this comment.
A small comment on best practices for BaseModel configuration. Otherwise lgtm.
awaismirza92
left a comment
There was a problem hiding this comment.
LGTM.
Minor ruff check error
(getml-io) awais@Admins-MacBook-Air:~/code17/projects/getML/github/getml-io% uv run ruff check --extend-ignore FIX .
src/getml_io/serialize/dataframe_or_view.py:172:10: COM812 [*] Trailing comma missing
|
170 | role,
171 | column_type,
172 | )
| ^ COM812
173 | )
174 | if column_statistics_type is None:
|
= help: Add trailing comma
Found 1 error.
[*] 1 fixable with the --fix option.
ruff format unformatted file
awais@Admins-MacBook-Air:~/code17/projects/getML/github/getml-io% uv run ruff format --check .
warning: The following rule may cause conflicts when used with the formatter: COM812. To avoid unexpected behavior, we recommend disabling this rule, either by removing it from the lint.select or lint.extend-select configuration, or adding it to the lint.ignore configuration.
Would reformat: src/getml_io/serialize/dataframe_or_view.py
1 file would be reformatted, 80 files already formatted
da5401c to
0ccae38
Compare
…igrate from pydantic dataclasses to BaseModel This commit refactors all data models from `pydantic.dataclasses.dataclass` to `pydantic.BaseModel`. This change provides a more idiomatic and powerful way to use Pydantic's features, such as `model_config` and `model_dump/validate`. Key changes include: - Replaced `@dataclass(frozen=True)` with `BaseModel` and `model_config = ConfigDict(frozen=True)`. - Simplified serialization/deserialization logic by replacing `TypeAdapter` with direct `Model.model_dump_json()` and `Model.model_validate()` calls. - Centralized path relativization logic for `DataFrameInformation` into a new `serialize/dataframe_information.py` module. This removes the need for custom `_serialize_model` methods and `path` fields on `ContainerInformation` and `PipelineInformation`, cleaning up the model definitions. - Renamed `ROLE_TYPE_ADAPTER_MAPPING` to `ROLE_TO_COLUMN_STATISTICS_TYPE_MAPPING` and now stores model types instead of `TypeAdapter` instances, simplifying validation logic. - Removed obsolete type alias files (`feature_sets.py`, `prediction_results.py`).
be86507 to
7e4293c
Compare
…igrate from pydantic dataclasses to BaseModel
This commit refactors all data models from
pydantic.dataclasses.dataclasstopydantic.BaseModel. This change provides a more idiomatic and powerful way to use Pydantic's features, such asmodel_configandmodel_dump/validate.Key changes include:
@dataclass(frozen=True)withBaseModelandmodel_config = ConfigDict(frozen=True).TypeAdapterwith directModel.model_dump_json()andModel.model_validate()calls.DataFrameInformationinto a newserialize/dataframe_information.pymodule. This removes the need for custom_serialize_modelmethods andpathfields onContainerInformationandPipelineInformation, cleaning up the model definitions.ROLE_TYPE_ADAPTER_MAPPINGtoROLE_TO_COLUMN_STATISTICS_TYPE_MAPPINGand now stores model types instead ofTypeAdapterinstances, simplifying validation logic.feature_sets.py,prediction_results.py).