Skip to content

GH-53: refactor: Centralize relative dataframe-information-path and M…#58

Merged
Urfoex merged 1 commit intodevelopfrom
refactor/GH-53-relative-dataframe-information-path-at-serialization
Aug 29, 2025
Merged

GH-53: refactor: Centralize relative dataframe-information-path and M…#58
Urfoex merged 1 commit intodevelopfrom
refactor/GH-53-relative-dataframe-information-path-at-serialization

Conversation

@Urfoex
Copy link
Collaborator

@Urfoex Urfoex commented Aug 20, 2025

…igrate from pydantic dataclasses to BaseModel

This commit refactors all data models from pydantic.dataclasses.dataclass to pydantic.BaseModel. This change provides a more idiomatic and powerful way to use Pydantic's features, such as model_config and model_dump/validate.

Key changes include:

  • Replaced @dataclass(frozen=True) with BaseModel and model_config = ConfigDict(frozen=True).
  • Simplified serialization/deserialization logic by replacing TypeAdapter with direct Model.model_dump_json() and Model.model_validate() calls.
  • Centralized path relativization logic for DataFrameInformation into a new serialize/dataframe_information.py module. This removes the need for custom _serialize_model methods and path fields on ContainerInformation and PipelineInformation, cleaning up the model definitions.
  • Renamed ROLE_TYPE_ADAPTER_MAPPING to ROLE_TO_COLUMN_STATISTICS_TYPE_MAPPING and now stores model types instead of TypeAdapter instances, simplifying validation logic.
  • Removed obsolete type alias files (feature_sets.py, prediction_results.py).

@Urfoex Urfoex requested a review from Copilot August 20, 2025 15:44
@Urfoex Urfoex self-assigned this Aug 20, 2025
@Urfoex Urfoex added the enhancement New feature or request label Aug 20, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors all data models from pydantic.dataclasses.dataclass to pydantic.BaseModel to provide more idiomatic usage of Pydantic features. It also centralizes relative path handling for DataFrameInformation objects and simplifies serialization logic.

  • Replace @dataclass(frozen=True) with BaseModel and model_config = ConfigDict(frozen=True) across all models
  • Migrate from TypeAdapter to direct BaseModel.model_dump_json() and BaseModel.model_validate() calls
  • Centralize DataFrameInformation path relativization logic into a dedicated serialize/dataframe_information.py module

Reviewed Changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 1 comment.

File Description
Multiple test files Updated test fixtures and expectations to work with new BaseModel approach
Multiple metadata model files Converted from pydantic dataclass to BaseModel with frozen configuration
Multiple serialize modules Replaced TypeAdapter usage with direct model methods and centralized path logic
Integration test files Updated model instantiation and validation to use BaseModel patterns

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a significant and well-executed refactoring. It successfully migrates all Pydantic data models from pydantic.dataclasses.dataclass to pydantic.BaseModel, aligning with modern Pydantic idioms. The centralization of path relativization logic is a great improvement, cleaning up the model definitions and improving separation of concerns. The code is consistent, and the tests have been thoroughly updated to reflect these substantial changes. Overall, this is an excellent enhancement to the codebase.

@Urfoex Urfoex marked this pull request as ready for review August 20, 2025 15:49
@Urfoex Urfoex requested a review from srnnkls August 20, 2025 15:49
@Urfoex Urfoex requested a review from awaismirza92 August 25, 2025 08:43
Copy link
Collaborator

@srnnkls srnnkls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small comment on best practices for BaseModel configuration. Otherwise lgtm.

Copy link

@awaismirza92 awaismirza92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Minor ruff check error

(getml-io) awais@Admins-MacBook-Air:~/code17/projects/getML/github/getml-io% uv run ruff check --extend-ignore FIX .
src/getml_io/serialize/dataframe_or_view.py:172:10: COM812 [*] Trailing comma missing
|
170 | role,
171 | column_type,
172 | )
| ^ COM812
173 | )
174 | if column_statistics_type is None:
|
= help: Add trailing comma

Found 1 error.
[*] 1 fixable with the --fix option.

ruff format unformatted file

awais@Admins-MacBook-Air:~/code17/projects/getML/github/getml-io% uv run ruff format --check .
warning: The following rule may cause conflicts when used with the formatter: COM812. To avoid unexpected behavior, we recommend disabling this rule, either by removing it from the lint.select or lint.extend-select configuration, or adding it to the lint.ignore configuration.
Would reformat: src/getml_io/serialize/dataframe_or_view.py
1 file would be reformatted, 80 files already formatted

@Urfoex Urfoex force-pushed the feature/GH-54-convert-predict-ndarray-to-dataframe branch from da5401c to 0ccae38 Compare August 29, 2025 20:34
Base automatically changed from feature/GH-54-convert-predict-ndarray-to-dataframe to develop August 29, 2025 20:36
…igrate from pydantic dataclasses to BaseModel

This commit refactors all data models from `pydantic.dataclasses.dataclass` to `pydantic.BaseModel`. This change provides a more idiomatic and powerful way to use Pydantic's features, such as `model_config` and `model_dump/validate`.

Key changes include:
- Replaced `@dataclass(frozen=True)` with `BaseModel` and `model_config = ConfigDict(frozen=True)`.
- Simplified serialization/deserialization logic by replacing `TypeAdapter` with direct `Model.model_dump_json()` and `Model.model_validate()` calls.
- Centralized path relativization logic for `DataFrameInformation` into a new `serialize/dataframe_information.py` module. This removes the need for custom `_serialize_model` methods and `path` fields on `ContainerInformation` and `PipelineInformation`, cleaning up the model definitions.
- Renamed `ROLE_TYPE_ADAPTER_MAPPING` to `ROLE_TO_COLUMN_STATISTICS_TYPE_MAPPING` and now stores model types instead of `TypeAdapter` instances, simplifying validation logic.
- Removed obsolete type alias files (`feature_sets.py`, `prediction_results.py`).
@Urfoex Urfoex force-pushed the refactor/GH-53-relative-dataframe-information-path-at-serialization branch from be86507 to 7e4293c Compare August 29, 2025 20:36
@Urfoex Urfoex merged commit c013a44 into develop Aug 29, 2025
0 of 3 checks passed
@Urfoex Urfoex deleted the refactor/GH-53-relative-dataframe-information-path-at-serialization branch August 29, 2025 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adjust dataframe-information path to relative in serialization function, not via pydantic

3 participants

Comments