
Manage ML models using MLflow #4198

@aicam

Description


We propose enabling a standardized experience for users to bring and utilize their own Machine Learning (ML) models within the Texera platform. To achieve this, we need to adopt a unified protocol for the entire lifecycle of model saving, loading, and execution.

After evaluating several standards, we recommend integrating MLflow as a starting protocol for model management in Texera.

Motivation & User Persona

Texera currently serves two user groups with distinct needs:

  1. Students, who use the platform to learn the fundamentals of Machine Learning and Data Science.
  2. Bioinformatics researchers, who require heavy computation for tasks such as sequence alignment and "shallow" machine learning (e.g., Scikit-Learn, classic statistical models).

Currently, there is no standardized way for these users to import and run pre-trained models seamlessly. Implementing a standard protocol will streamline this workflow and enhance Texera's extensibility.

Evaluation of Alternatives

We explored several options before selecting MLflow:

  • Hugging Face:
    • Pros: Excellent standards and ease of use; industry standard for LLMs.
    • Cons: Primarily focused on LLMs and Deep Learning. It does not offer a comprehensive solution for managing the full lifecycle (storage to loading) of general-purpose or "shallow" ML models often used by our target audience.
  • ONNX (Open Neural Network Exchange):
    • Pros: Great interoperability for deep learning models.
    • Cons: Heavily focused on Neural Networks, making it less suitable for the broad range of general ML libraries (like Scikit-Learn) that our bioinformatics users rely on.
  • MLflow (Selected):
    • Pros: Supports a wide variety of libraries including TensorFlow, PyTorch, and Scikit-Learn. Crucially, it manages the entire lifecycle from standardizing the storage format to loading the model for inference.

Proposed Implementation

The integration will leverage two existing architectural features within Texera:

1. Model Storage (via LakeFS)

  • We will utilize our existing LakeFS integration to store MLflow artifacts.
  • Models will be stored similarly to how we handle datasets, but with a key difference: we will enforce the MLflow protocol/structure on the files during upload to ensure compatibility (see the sketch below).
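
A minimal sketch of this save-and-upload flow is shown below. It is illustrative only: it uses scikit-learn as the example library, reaches LakeFS through its S3-compatible gateway, and uses placeholder repository/branch names ("models", "main"); in practice the upload would go through Texera's existing dataset-upload path, and the validation step is where we would enforce the MLflow structure.

```python
# Sketch: save a model in MLflow format and push the artifact directory to LakeFS.
# Assumptions: LakeFS is reached through its S3-compatible gateway, and the
# repository/branch names ("models", "main") are placeholders, not Texera's real layout.
import os

import boto3
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

local_dir = "/tmp/iris_model"
mlflow.sklearn.save_model(model, path=local_dir)  # writes MLmodel, model.pkl, requirements.txt, ...

# Enforce the MLflow structure before upload: every valid artifact has an MLmodel descriptor.
if not os.path.exists(os.path.join(local_dir, "MLmodel")):
    raise ValueError("Not a valid MLflow model directory")

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",          # LakeFS S3 gateway (placeholder URL)
    aws_access_key_id="<lakefs-access-key>",
    aws_secret_access_key="<lakefs-secret-key>",
)
for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, local_dir)
        # "models" = LakeFS repository, "main/iris_model/..." = branch + object path
        s3.upload_file(path, "models", f"main/iris_model/{key}")
```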

2. Model Execution (New Operator)

  • We will introduce a new operator type: MLflow.
  • This will be built upon our existing Python Native Operator infrastructure.
  • The operator will automatically handle loading the model using the standard mlflow library and executing inference against the input data stream (see the sketch below).
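
A rough sketch of what the operator body could look like follows. The class name, the open/process_tuple/close hooks, and the way the model URI and feature columns are passed in are all assumptions for illustration; the real implementation would follow Texera's Python Native Operator interface. The key point is that mlflow.pyfunc gives a library-agnostic predict() regardless of which flavor (scikit-learn, PyTorch, TensorFlow, ...) the model was saved with.

```python
# Sketch of the MLflow operator body, assuming a Texera-style Python UDF with
# open()/process_tuple()/close() hooks. The class and method names here are
# illustrative; the real operator would plug into the Python Native Operator API.
import mlflow.pyfunc
import pandas as pd


class MLflowOperator:
    def __init__(self, model_uri: str, feature_columns: list[str]):
        # model_uri points at the MLflow artifact directory fetched from LakeFS,
        # e.g. a local path the engine downloaded before execution (placeholder).
        self.model_uri = model_uri
        self.feature_columns = feature_columns
        self.model = None

    def open(self) -> None:
        # pyfunc loads any MLflow flavor behind a uniform predict() interface.
        self.model = mlflow.pyfunc.load_model(self.model_uri)

    def process_tuple(self, tuple_: dict) -> dict:
        # Build a one-row DataFrame from the configured feature columns and
        # append the model's prediction to the outgoing tuple.
        features = pd.DataFrame([{c: tuple_[c] for c in self.feature_columns}])
        tuple_["prediction"] = self.model.predict(features)[0]
        return tuple_

    def close(self) -> None:
        self.model = None
```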


Impact / Priority

(P2) Medium – useful enhancement

Affected Area

Workflow Engine (Amber)
