GH-51: feat: Add pipeline metadata serialization #65
Introduces Pydantic models for `DataFrameMetaData` and `PipelineMetaData` to represent the data frames used in a pipeline.

The `PipelineInformation` model is updated to include this new `metadata` field, which captures the names and roles of the population and peripheral data frames. This resolves the TODO for issue #51.

Serialization logic is added to convert `getml.pipeline.metadata.AllMetadata` into the new `PipelineMetaData` model. All related tests, fixtures, and expected outputs have been updated accordingly.

Bug-Fix: The new code now supports a value in `data_model.peripheral` being either a single `Placeholder` or a `list[Placeholder]`.
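For readers skimming the bug fix, here is a minimal sketch of how a single-or-list value could be normalized. It is not the code from this PR: the `Placeholder` class below is a hypothetical stand-in for getML's placeholder type, `as_placeholder_list` is an illustrative helper name, and treating `data_model.peripheral` as a mapping from names to values is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for getML's Placeholder; only a name is modeled."""

    name: str


def as_placeholder_list(
    value: Union[Placeholder, List[Placeholder]],
) -> List[Placeholder]:
    """Return a list whether the input is one Placeholder or a list of them."""
    if isinstance(value, Placeholder):
        return [value]
    # Copy so that callers cannot mutate the original list by accident.
    return list(value)


def normalize_peripheral(
    peripheral: Dict[str, Union[Placeholder, List[Placeholder]]],
) -> Dict[str, List[Placeholder]]:
    """Assumed shape: a mapping whose values are single or listed placeholders."""
    return {key: as_placeholder_list(value) for key, value in peripheral.items()}
```

Normalizing once at the boundary means the rest of the serialization code only has to handle the list case.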
/gemini review
Code Review
This pull request successfully introduces metadata serialization for pipelines by adding new Pydantic models and integrating them into the PipelineInformation model. The serialization logic for converting getml.pipeline.metadata.AllMetadata is well-implemented, and the accompanying bug fix for handling both single and list Placeholder instances is a valuable addition. The tests, fixtures, and expected outputs have been updated comprehensively, and the refactoring of test fixtures improves code clarity. I've included one comment regarding the stability of generated test data, which could lead to flaky tests.
Pull Request Overview
This PR introduces pipeline metadata serialization by adding Pydantic models for DataFrameMetaData and PipelineMetaData to capture data frame names and roles used in pipelines. This addresses issue #51 by replacing the TODO comment in PipelineInformation with actual metadata functionality.
- Adds new Pydantic models `DataFrameMetaData` and `PipelineMetaData` for metadata representation
- Implements serialization logic to convert getML metadata into the new models
- Updates `PipelineInformation` to include the metadata field and fixes data model peripheral handling
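As a rough illustration of the changes listed above, the sketch below shows what such models and the conversion step might look like. The field names (`name`, `roles`, `population`, `peripheral`), the shape of `roles`, and the function name `serialize_metadata` are assumptions made for this example, and the `Source*` tuples are hypothetical stand-ins for the getML objects, not the real `getml.pipeline.metadata.AllMetadata` API.

```python
from typing import Dict, List, NamedTuple

from pydantic import BaseModel


class SourceMetadata(NamedTuple):
    """Hypothetical stand-in for the per-data-frame metadata in getML."""

    name: str
    roles: Dict[str, List[str]]


class SourceAllMetadata(NamedTuple):
    """Hypothetical stand-in for getml.pipeline.metadata.AllMetadata."""

    population: SourceMetadata
    peripheral: List[SourceMetadata]


class DataFrameMetaData(BaseModel):
    """Name and roles of one data frame (field names are assumed)."""

    name: str
    roles: Dict[str, List[str]]


class PipelineMetaData(BaseModel):
    """Metadata for the population frame and all peripheral frames."""

    population: DataFrameMetaData
    peripheral: List[DataFrameMetaData]


def serialize_metadata(source: SourceAllMetadata) -> PipelineMetaData:
    """Convert the stand-in source objects into the Pydantic models."""

    def convert(frame: SourceMetadata) -> DataFrameMetaData:
        return DataFrameMetaData(name=frame.name, roles=dict(frame.roles))

    return PipelineMetaData(
        population=convert(source.population),
        peripheral=[convert(frame) for frame in source.peripheral],
    )
```

Keeping the conversion in one function gives tests a single place to compare expected and actual metadata output.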
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/getml_io/getml/metadatas.py | Defines new Pydantic models for DataFrame and Pipeline metadata |
| src/getml_io/serialize/pipeline.py | Adds metadata serialization functions and integrates them into pipeline serialization |
| src/getml_io/serialize/data_model.py | Fixes data model peripheral handling to support both single Placeholder and list of Placeholders |
| src/getml_io/metadata/pipeline_information.py | Updates PipelineInformation to include metadata field |
| tests/unit/types.py | Adds type definitions for the new metadata structures |
| tests/unit/conftest.py | Updates test fixtures to include metadata in pipeline information |