feat: add model registry client#186
Conversation
|
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
Pull Request Test Coverage Report for Build 21182264396Details
💛 - Coveralls |
fd5f82f to
33efe23
Compare
kramaranya
left a comment
There was a problem hiding this comment.
Thank you for this amazing work, @jonburdo!
Overall looks good, I left my initial comments
/assign @kubeflow/kubeflow-sdk-team
| ci | ||
| docs | ||
| examples | ||
| hub |
There was a problem hiding this comment.
Sorry I'm still a bit confused with hub vs model registry, @johnugeorge would you mind elaborating on this, specifically what do we expose to Kubeflow SDK users?
Shall this be model-registry or mr instead, since we only include only ModelRegistryClient?
There was a problem hiding this comment.
I'm thinking kubeflow.hub is the appropriate submodule given:
- We plan to rename from "Kubeflow Model Registry" to "Kubeflow Hub" assuming KEP-907 goes through.
- The scope of the project is already broader than just model registry and it is conceivable that we could offer other clients in the future.
To highlight the point, we already have a model catalog and some basic experiment tracking. Now we don't plan to have kubeflow sdk clients for these afaik, but we are positioned to be able to extend the project for other AI/ML asset types (mcp, prompt management). We also have some storage helpers and integrations as outlined here.
Note my assumption here is basically that we want to have roughly one submodule per kubeflow sub-project. We could decide not to do this and then have kubeflow.model_registry and add separate submodules for any new clients or functionality we add later.
| hub = [ | ||
| "model-registry>=0.3.0", | ||
| ] |
There was a problem hiding this comment.
I think we should include model-registry by default. wdyt @kubeflow/kubeflow-sdk-team
The issue is that Model Registry and other components already have their own extras, for example https://github.com/kubeflow/model-registry/blob/main/clients/python/pyproject.toml#L33-L36. We can't do "extras of extras", so if we tried to consolidate them we'd lose that functionality.
I think we should include core Kubeflow components by default (including model-registry) and reserve extras for third-party integrations like OpenTelemetry, MLflow, Hugging Face, etc.
There was a problem hiding this comment.
I think the takeaway in the Kubeflow SDK & ML Experience WG call was that we will keep model-registry optional for now. I'm thinking that means we leave this as is - lmk if not.
A few other things to note:
- The subset of model registry functions we are exposing here does not rely on the
model-registryextras if I'm not mistaken - This doesn't prevent users from doing
pip install 'model-registry[hf,boto3,olot]'if needed - Other options could include having
kubeflow[hub]formodel-registryandkubeflow[hub-all]formodel-registry[hf,boto3,olot]if we really needed it (but I don't think we do)
There was a problem hiding this comment.
Yes, let's keep the hub extras for now as we discussed. Later we can figure out a way how to manage extras of extras.
|
|
||
| [dependency-groups] | ||
| dev = [ | ||
| "kubeflow[hub]", |
There was a problem hiding this comment.
| "kubeflow[hub]", | |
| "model-registry>=0.3.0", |
There was a problem hiding this comment.
I think we just want to include anything in the hub extra rather than re-listing the packages.
33efe23 to
7c78065
Compare
There was a problem hiding this comment.
I've made some updates and simplifications since the first iteration:
- simplify
__init__andregister_model - remove
from_registryclassmethod - We can have something that does this inModelRegistryinstead of here I think - return
ModelVersioninstead ofModelVersion | Nonefromget_model_version- same for otherget_*functions. Error if missing
Makefile
Outdated
| @uv lock --check | ||
| @uv run ruff check --show-fixes --output-format=github . | ||
| @uv run ruff format --check kubeflow | ||
| @uv run mypy kubeflow/hub --exclude '_test\.py$$' |
There was a problem hiding this comment.
I've added mypy - for now just on kubeflow/hub. It would simplify linting if we used the typical pattern of a separate test dir in the root of the repo for tests instead of mixing in *_test.py files, but I stuck with the existing pattern for now.
There was a problem hiding this comment.
Can we use ruff for linting ? Using unified tool will allow us to reduce number of dependencies users have to install.
There was a problem hiding this comment.
We are using ruff for linting, but it doesn't include type-checking, which we really want, so we also need mypy (e.g. if we're passing an int to str arg ruff won't catch that). The same org behind ruff is working on a type checker, ty, but it's relatively new and would still be a separate tool: https://github.com/astral-sh/ty
There was a problem hiding this comment.
@kramaranya @szaher @astefanutti What are your thoughts on this? Shall we adopt ty for type checking?
@HumairAK @mprahl @hbelmiro @nsingla Are you planning to use any typing tool for Pipelines library?
There was a problem hiding this comment.
@jonburdo have you tried to compare ty and mypy? does it cover everything you're using mypy for?
There was a problem hiding this comment.
@andreyvelich No concrete plans at the moment. For the components/pipelines themselves, KFP doesn't really enforce or leverage Python type hints.
Adding a type checker for our scripts/CLIs could still be valuable. Though I'm guessing that's not the part you're asking about.
There was a problem hiding this comment.
Looking again at ty, I think it's a good option. It will be faster and align with our use of uv and ruff. Replacing mypy with ty
1218458 to
affcdc5
Compare
|
I've addressed initial feedback and believe this is ready for review by @kubeflow/kubeflow-sdk-team |
86564f2 to
854e6ba
Compare
andreyvelich
left a comment
There was a problem hiding this comment.
Thank you for this @jonburdo! Overall, lgtm, I left a few comments.
cc @kubeflow/kubeflow-sdk-team
Makefile
Outdated
| @uv lock --check | ||
| @uv run ruff check --show-fixes --output-format=github . | ||
| @uv run ruff format --check kubeflow | ||
| @uv run mypy kubeflow/hub --exclude '_test\.py$$' |
There was a problem hiding this comment.
Can we use ruff for linting ? Using unified tool will allow us to reduce number of dependencies users have to install.
| ``` | ||
|
|
||
| ```python | ||
| from kubeflow.hub import ModelRegistryClient |
There was a problem hiding this comment.
Are we going to have 2 clients in Hub? ModelRegistry and Catalog.
There was a problem hiding this comment.
We have been discussing in KF Hub biweekly meeting that any new user facing client will be developed in KF SDK.
👉 So when the user need comes for a user facing py client for the Catalog, we'd directly develop this in KF SDK.
Hope this is satisfactory answer for ML WG (this group) and from KSC pov.
Detail note: from a E2E testing point of view, we need python based code generated clients for all the endpoints, which is something that for example ML WG (this group) mentioned is not interested in exposing, etc. So infrastructure code we need for the purpose of E2E, fuzz, and coverage testing of the endpoints, we're continuing maintaining in the KF/model-registry repo. We're open to revisit this setup too.
There was a problem hiding this comment.
Great, thank you for sharing recording @tarilabs!
we need python based code generated clients for all the endpoints, which is something that for example ML WG (this group) mentioned is not interested in exposing, etc.
Yeah, that make sense. I guess, the conclusion was that we expose minimal set of APIs via kubeflow.hub package, and gather initial users feedback.
If users need other APIs that we don't expose via kubeflow.hub, they can always use it via this parameter:
self._registry()
Or import the client directly import model_registry.
In the future, we can potentially add more APIs to the kubeflow.hub directly.
The only question that I have, do users need two separate clients for Catalog and Model Registry APIs inside the kubeflow.hub sub package ?
e.g.:
from kubeflow.hub import ModelRegistryClient
from kubeflow.hub import CatalogClient
or just
from kubeflow.hub import HubClient
There was a problem hiding this comment.
Great, thank you for sharing recording @tarilabs!
we need python based code generated clients for all the endpoints, which is something that for example ML WG (this group) mentioned is not interested in exposing, etc.
Yeah, that make sense. I guess, the conclusion was that we expose minimal set of APIs via
kubeflow.hubpackage, and gather initial users feedback. If users need other APIs that we don't expose viakubeflow.hub, they can always use it via this parameter:self._registry()Or import the client directly
import model_registry. In the future, we can potentially add more APIs to thekubeflow.hubdirectly.
Great, sounds we are all aligned, personally glad and great to hear 👍
As for the latter question, my current assessment is:
from kubeflow.hub import ModelRegistryClient
from kubeflow.hub import CatalogClient
as they're different patterns, and in turn different use-cases.
However as we'll define it in more concrete terms, we can refine the case more specifically then.
Hope this is understandable and fair.
There was a problem hiding this comment.
Sure, that sounds great, thank you!
| hub = [ | ||
| "model-registry>=0.3.0", | ||
| ] |
There was a problem hiding this comment.
Yes, let's keep the hub extras for now as we discussed. Later we can figure out a way how to manage extras of extras.
| """Skip these tests if model-registry not installed.""" | ||
| pytest.importorskip("model_registry") | ||
|
|
||
| def test_init_raises_helpful_error_when_not_installed(self, monkeypatch): |
There was a problem hiding this comment.
Please can you refactor these tests to align with this:
E.g. every test should align with the function name: test_register_model(test_case) and we create test cases via :
@pytest.mark.parametrize(
"test_case",
[
TestCase()
]That will allow contributors to easily understand code and help with changes for Hub SDKs.
There was a problem hiding this comment.
Sure, I've made this change. This does we are importing TestCase from kubeflow.trainer. I also have mixed feelings about this, but if it's something we want to standardize in the sdk, then I guess it works.
There was a problem hiding this comment.
Thank you! Yes, eventually, we should have common types/helper function that we can re-use across packages.
This is yet another benefit to use common SDK to reduce code duplication and standardize on the code practices.
| from collections.abc import Iterator, Mapping | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| if TYPE_CHECKING: |
There was a problem hiding this comment.
Why do we need type checking here?
There was a problem hiding this comment.
We want type annotations on the functions for better type-safety. LSPs will use them too (docs/checks in code editors). The if TYPE_CHECKING allows us to import things we may not have a runtime. mypy and LSPs will use the imports but if model-registry is missing we don't get an import error at this point.
Since hub is an optional dependency, we check for model-registry upon usage of the client ModelRegistryClient() rather than upon import as discussed:
#186 (comment)
Otherwise in a new env, this would error:
pip install kubeflow
python -c 'import kubeflow.hub'
| - http:// defaults to 8080 | ||
| - no scheme defaults to 443 | ||
| author: Name of the author. | ||
| is_secure: Whether to use a secure connection. If not provided, inferred from base_url: |
There was a problem hiding this comment.
Same question, why not check it from the URL directly ?
There was a problem hiding this comment.
is_secure is needed when you want to skip TLS verification, ie you have self-signed certs in a local development environment
We will default to is_secure=True for https and is_secure=False for http, but I believe we need to allow users to disable it too (is_secure=False with https) for development (shouldn't be done in production).
There was a problem hiding this comment.
If it is only for development why not use http? Did you have use-cases when users want to verify https locally without secure connection?
There was a problem hiding this comment.
Discussed in community call today. @tarilabs noted that this is important for certain use-cases. Users may use this library for development and have a need to use unsigned certificates.
Fwiw, we do leave this out of our primary quick-start examples in the README
| self._registry = ModelRegistry( | ||
| server_address=base_url, | ||
| port=port, | ||
| author=author, # type: ignore[arg-type] |
There was a problem hiding this comment.
Since author is part of register_model API, does client still require it:
?There was a problem hiding this comment.
This provides a default value so it doesn't have to be passed to register_model every time it's called.
| name: Name of the model. | ||
| uri: URI of the model. | ||
|
|
||
| Keyword Args: |
There was a problem hiding this comment.
@kramaranya Do you know if we can auto-generate Kubeflow SDK docs from such format of docstring: Keyword Args?
There was a problem hiding this comment.
Yes, I think it's configurable. I'll soon open a PR for the website and will verify it
| name: str, | ||
| uri: str, | ||
| *, | ||
| version: str, |
There was a problem hiding this comment.
Since version is mandatory parameter why you put it after "*" as keyword args?
There was a problem hiding this comment.
It matches what we have in ModelRegistry and provides a little more flexibility if we want to re-order args after name, uri. I do wonder if we want to make version optional in the future and support an autogenerated version string if not provided.
There was a problem hiding this comment.
I think, this is good idea. I would suggest to make it Optional.
There was a problem hiding this comment.
Discussed in the community meeting. I believe we agreed to leave it as is for now. Apparently it becomes complicated if we try to provide a default as users won't always agree on if it should be uuid or incrementing v1, v2, ...
| TypeError: If model is not a RegisteredModel instance. | ||
| model_registry.exceptions.StoreError: If model does not have an ID. | ||
| """ | ||
| from model_registry.types import RegisteredModel |
There was a problem hiding this comment.
Can we move all imports to the top of this file?
There was a problem hiding this comment.
This would cause an immediate error upon import for import kubeflow.hub if the user has done:
pip install kubeflow
but not
pip install 'kubeflow[hub]'
We could opt for that. Or we could do a lazy import. I would suggest leaving it as-is though.
There was a problem hiding this comment.
But since we put it under TYPE_CHECKING that should be fine, isn't?
There was a problem hiding this comment.
In this function if not isinstance(model, RegisteredModel): needs to have RegisteredModel at runtime.
For example, if you use from __future__ import annotations or python>=3.10 then you can use:
if TYPE_CHECKING:
... import NonRuntimeVar
my_var: NonRuntimeVar = ...mypy and ty will still analyze use of NonRuntimeVar here, but you couldn't actually do:
NonRuntimeVar()
# or
isinstance(something, NonRuntimeVar)There was a problem hiding this comment.
Oh, I see, thanks for clarifying!
854e6ba to
7f41fa7
Compare
jonburdo
left a comment
There was a problem hiding this comment.
Thanks @andreyvelich for the review and all the thoughtful feedback! I believe I've addressed it. The one thing I'm not sure about is the question about docs: #186 (comment)
Makefile
Outdated
| @uv lock --check | ||
| @uv run ruff check --show-fixes --output-format=github . | ||
| @uv run ruff format --check kubeflow | ||
| @uv run mypy kubeflow/hub --exclude '_test\.py$$' |
There was a problem hiding this comment.
We are using ruff for linting, but it doesn't include type-checking, which we really want, so we also need mypy (e.g. if we're passing an int to str arg ruff won't catch that). The same org behind ruff is working on a type checker, ty, but it's relatively new and would still be a separate tool: https://github.com/astral-sh/ty
| from collections.abc import Iterator, Mapping | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| if TYPE_CHECKING: |
There was a problem hiding this comment.
We want type annotations on the functions for better type-safety. LSPs will use them too (docs/checks in code editors). The if TYPE_CHECKING allows us to import things we may not have a runtime. mypy and LSPs will use the imports but if model-registry is missing we don't get an import error at this point.
Since hub is an optional dependency, we check for model-registry upon usage of the client ModelRegistryClient() rather than upon import as discussed:
#186 (comment)
Otherwise in a new env, this would error:
pip install kubeflow
python -c 'import kubeflow.hub'
| def __init__( | ||
| self, | ||
| base_url: str, | ||
| port: int | None = None, |
There was a problem hiding this comment.
I've added from __future__ import annotations at the top in order to allow postponed evaluation of annotations. This allows things like:
|for union- missing-at-runtime variables like return type
-> RegisteredModelinstead of-> "RegisteredModel"
I've double checked this works on python 3.9 by ensuring from kubeflow.hub import ModelRegistryClient works as-is. Removing the from __future__ import annotations would result in:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
| The scheme is used to infer is_secure and port if not explicitly provided. | ||
|
|
||
| Keyword Args: | ||
| port: Server port. If not provided, inferred from base_url scheme: |
There was a problem hiding this comment.
Good idea. Maybe we can do this as a follow-up?
I'm thinking we should keep the option to use the separate port arg, but if it's not used, we should also allow users to pass it via base_url, so these are all valid and equivalent:
ModelRegistryClient(base_url="https://example.com")
ModelRegistryClient(base_url="https://example.com", port=443)
ModelRegistryClient(base_url="https://example.com:443")This can also be a change made in the base developer client (ModelRegistry from model-registry package) instead of here.
| - http:// defaults to 8080 | ||
| - no scheme defaults to 443 | ||
| author: Name of the author. | ||
| is_secure: Whether to use a secure connection. If not provided, inferred from base_url: |
There was a problem hiding this comment.
is_secure is needed when you want to skip TLS verification, ie you have self-signed certs in a local development environment
We will default to is_secure=True for https and is_secure=False for http, but I believe we need to allow users to disable it too (is_secure=False with https) for development (shouldn't be done in production).
| self._registry = ModelRegistry( | ||
| server_address=base_url, | ||
| port=port, | ||
| author=author, # type: ignore[arg-type] |
There was a problem hiding this comment.
This provides a default value so it doesn't have to be passed to register_model every time it's called.
| name: str, | ||
| uri: str, | ||
| *, | ||
| version: str, |
There was a problem hiding this comment.
It matches what we have in ModelRegistry and provides a little more flexibility if we want to re-order args after name, uri. I do wonder if we want to make version optional in the future and support an autogenerated version string if not provided.
| TypeError: If model is not a RegisteredModel instance. | ||
| model_registry.exceptions.StoreError: If model does not have an ID. | ||
| """ | ||
| from model_registry.types import RegisteredModel |
There was a problem hiding this comment.
This would cause an immediate error upon import for import kubeflow.hub if the user has done:
pip install kubeflow
but not
pip install 'kubeflow[hub]'
We could opt for that. Or we could do a lazy import. I would suggest leaving it as-is though.
| """Skip these tests if model-registry not installed.""" | ||
| pytest.importorskip("model_registry") | ||
|
|
||
| def test_init_raises_helpful_error_when_not_installed(self, monkeypatch): |
There was a problem hiding this comment.
Sure, I've made this change. This does we are importing TestCase from kubeflow.trainer. I also have mixed feelings about this, but if it's something we want to standardize in the sdk, then I guess it works.
kramaranya
left a comment
There was a problem hiding this comment.
Thank you, @jonburdo!
Overall looks great to me 👍
| ci | ||
| docs | ||
| examples | ||
| hub |
Makefile
Outdated
| @uv lock --check | ||
| @uv run ruff check --show-fixes --output-format=github . | ||
| @uv run ruff format --check kubeflow | ||
| @uv run mypy kubeflow/hub --exclude '_test\.py$$' |
There was a problem hiding this comment.
@jonburdo have you tried to compare ty and mypy? does it cover everything you're using mypy for?
kubeflow/hub/__init__.py
Outdated
| @@ -0,0 +1,5 @@ | |||
| """Kubeflow Hub API.""" | |||
There was a problem hiding this comment.
Can you add copyright header please to all new files?
| The scheme is used to infer is_secure and port if not explicitly provided. | ||
|
|
||
| Keyword Args: | ||
| port: Server port. If not provided, inferred from base_url scheme: |
There was a problem hiding this comment.
I agree with @andreyvelich, this will keep our APIs consistent for users
| name: Name of the model. | ||
| uri: URI of the model. | ||
|
|
||
| Keyword Args: |
There was a problem hiding this comment.
Yes, I think it's configurable. I'll soon open a PR for the website and will verify it
| def list_models(self) -> Iterator[RegisteredModel]: | ||
| """Get an iterator for registered models. | ||
|
|
||
| Yields: | ||
| Registered models. | ||
| """ | ||
| yield from self._registry.get_registered_models() |
There was a problem hiding this comment.
We return an Iterator in Model Registry with list_models, however, we return lists for both OptimizerClient and TrainerClient in APIs like list_jobs().
@andreyvelich @jonburdo to provide consistent APIs to our users, shall we consider aligning OptimizerClient and TrainerClient with ModelRegistryClient and return an Iterator, alternatively we can return list in ModelRegistryClient?
There was a problem hiding this comment.
There was a problem hiding this comment.
Discussed in the community call today. We definitely want to keep this as a generator.
We use pagination, so eagerly fetching and returning a list would make extra API calls, and it is certainly possible to have enough entries for significant overhead.
I'm not sure I'd consider it real inconsistency if some functions in a library return a list and some return a generator - especially if it's done in a logical way based on amount of data or usefulness of evaluating lazily.
7f41fa7 to
a8c20a7
Compare
|
Okay, I've rebased and updated the code. I think I covered our discussion points! Here is a succinct summary covering most of the points: I changed:
I kept:
|
andreyvelich
left a comment
There was a problem hiding this comment.
Thanks for the updates @jonburdo!
I left a few comments, we should be good to go!
/assign @kramaranya @astefanutti @tarilabs @szaher in case you want to add more comments.
|
|
||
| Args: | ||
| base_url: Base URL of the model registry server including scheme. | ||
| Examples: "https://registry.example.com", "http://localhost" |
There was a problem hiding this comment.
Please can you add example with port as well? e.g. https://registry.example.com:8888
There was a problem hiding this comment.
I think we want to fix this in the developer (model-registry) client. I've created an issue for this: kubeflow/model-registry#2110
Otherwise we'll have url parsing here and then again there (in ModelRegistry)
There was a problem hiding this comment.
Sounds good @jonburdo!
Please can you create an issue in kubeflow/sdk to also track this update once Model Registry client supports it upstream?
| "test_case", | ||
| [ | ||
| TestCase( | ||
| name="http URL infers port=8080 and is_secure=False", |
There was a problem hiding this comment.
Could we add a test case when port is taken from the base_url?
"base_url": "http://localhost:1234",| ), | ||
| ], | ||
| ) | ||
| def test_get_model(test_case, client, mock_registry): |
There was a problem hiding this comment.
Could we add tests for test_get_model_version and test_get_model_artifact ?
| ), | ||
| ], | ||
| ) | ||
| def test_update_model_type_check(test_case, client): |
There was a problem hiding this comment.
Can you add all of these test cases to the test_update_model ? Similar to other tests, we can have one function with test cases for each public API we expose.
| ), | ||
| ], | ||
| ) | ||
| def test_update_model_version_type_check(test_case, client): |
There was a problem hiding this comment.
Same suggestion to move them to the test_update_model_version
| ), | ||
| ], | ||
| ) | ||
| def test_update_model_artifact_type_check(test_case, client): |
| ``` | ||
|
|
||
| ### Manage models with Model Registry | ||
|
|
There was a problem hiding this comment.
Can we also point users to the Model Registry documentation which says how to install control plane: https://www.kubeflow.org/docs/components/model-registry/installation/#standalone-installation
In the future, we might also want to introduce public ConfigMap check that verifies that Kubeflow Hub control plane is available on the cluster: #221
We can do this after the initial release.
9179042 to
4816b1d
Compare
andreyvelich
left a comment
There was a problem hiding this comment.
Thanks for the updates @jonburdo!
/lgtm
/cc @kramaranya in case you want to add more comments.
As we discussed, we will merge this to the main branch once we release Kubeflow SDK 0.3
/hold
Add an api for model registry under a new kubeflow.hub submodule. This takes the form of a ModelRegistryClient class and provides a user-facing api in a style similar to existing Kubeflow SDKs. AI-assisted: Claude Code 2.0.58 (claude-sonnet-4-5@20250929) Signed-off-by: Jon Burdo <jon@jonburdo.com>
4816b1d to
db12a29
Compare
|
I've rebased to fix the merge conflicts. To see the conflict resolution changes only, you can use: It's just a slight change to pyproject.toml and changes to uv.lock With the release of Kubeflow SDK 0.3 are we ready to merge this @andreyvelich ? |
andreyvelich
left a comment
There was a problem hiding this comment.
Thank you for this @jonburdo!
/lgtm
/assign @kubeflow/kubeflow-sdk-team @kramaranya
kramaranya
left a comment
There was a problem hiding this comment.
Thank you so much for this amazing contribution, @jonburdo!
/lgtm
|
@jonburdo feel free to add OWNERS file to https://github.com/jonburdo/sdk/tree/kubeflow-hub-api/kubeflow/hub with Model Registry WG. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/milestone v0.4 |
|
/unhold |
Add an api for model registry under a new kubeflow.hub submodule.
This takes the form of a ModelRegistryClient class which provides a kubeflow-style api for model-registry.
🤖 Generated with Claude Code
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...format, will close the issue(s) when PR gets merged):Fixes #59
Checklist:
details
This provides a possible api following this discussion. Also see the linked issue and the following: