
Manage ML models using MLflow #4198

@aicam

Description


We propose enabling a standardized experience for users to bring and utilize their own Machine Learning (ML) models within the Texera platform. To achieve this, we need to adopt a unified protocol for the entire lifecycle of model saving, loading, and execution.

After evaluating several standards, we recommend integrating MLflow as a starting protocol for model management in Texera.

Motivation & User Persona

Texera currently serves two user groups with distinct needs:

  1. Students, who use the platform to learn the fundamentals of Machine Learning and Data Science.
  2. Bioinformatics researchers, who require heavy computation for tasks such as sequence alignment and "shallow" machine learning (e.g., Scikit-Learn, classic statistical models).

Currently, there is no standardized way for these users to import and run pre-trained models seamlessly. Implementing a standard protocol will streamline this workflow and enhance Texera's extensibility.

Evaluation of Alternatives

We explored several options before selecting MLflow:

  • Hugging Face:
    • Pros: Excellent standards and ease of use; industry standard for LLMs.
    • Cons: Primarily focused on LLMs and Deep Learning. It does not offer a comprehensive solution for managing the full lifecycle (storage to loading) of general-purpose or "shallow" ML models often used by our target audience.
  • ONNX (Open Neural Network Exchange):
    • Pros: Great interoperability for deep learning models.
    • Cons: Heavily focused on Neural Networks, making it less suitable for the broad range of general ML libraries (like Scikit-Learn) that our bioinformatics users rely on.
  • MLflow (Selected):
    • Pros: Supports a wide variety of libraries including TensorFlow, PyTorch, and Scikit-Learn. Crucially, it manages the entire lifecycle from standardizing the storage format to loading the model for inference.

Proposed Implementation

The integration will leverage two existing architectural features within Texera:

1. Model Storage (via LakeFS)

  • We will utilize our existing LakeFS integration to store MLflow artifacts.
  • Models will be stored similarly to how we handle datasets, but with a key difference: we will enforce the MLflow protocol/structure on the files during upload to ensure compatibility (see the sketch below).
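
A minimal sketch of this save-and-upload flow is shown below. It is illustrative only: it uses scikit-learn as the example library, reaches LakeFS through its S3-compatible gateway, and uses placeholder repository/branch names ("models", "main"); in practice the upload would go through Texera's existing dataset-upload path, and the validation step is where we would enforce the MLflow structure.

```python
# Sketch: save a model in MLflow format and push the artifact directory to LakeFS.
# Assumptions: LakeFS is reached through its S3-compatible gateway, and the
# repository/branch names ("models", "main") are placeholders, not Texera's real layout.
import os

import boto3
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

local_dir = "/tmp/iris_model"
mlflow.sklearn.save_model(model, path=local_dir)  # writes MLmodel, model.pkl, requirements.txt, ...

# Enforce the MLflow structure before upload: every valid artifact has an MLmodel descriptor.
if not os.path.exists(os.path.join(local_dir, "MLmodel")):
    raise ValueError("Not a valid MLflow model directory")

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",          # LakeFS S3 gateway (placeholder URL)
    aws_access_key_id="<lakefs-access-key>",
    aws_secret_access_key="<lakefs-secret-key>",
)
for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, local_dir)
        # "models" = LakeFS repository, "main/iris_model/..." = branch + object path
        s3.upload_file(path, "models", f"main/iris_model/{key}")
```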

2. Model Execution (New Operator)

  • We will introduce a new operator type: MLflow.
  • This will be built upon our existing Python Native Operator infrastructure.
  • The operator will automatically handle loading the model using the standard mlflow library and executing inference against the input data stream (see the sketch below).
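
A rough sketch of what the operator body could look like follows. The class name, the open/process_tuple/close hooks, and the way the model URI and feature columns are passed in are all assumptions for illustration; the real implementation would follow Texera's Python Native Operator interface. The key point is that mlflow.pyfunc gives a library-agnostic predict() regardless of which flavor (scikit-learn, PyTorch, TensorFlow, ...) the model was saved with.

```python
# Sketch of the MLflow operator body, assuming a Texera-style Python UDF with
# open()/process_tuple()/close() hooks. The class and method names here are
# illustrative; the real operator would plug into the Python Native Operator API.
import mlflow.pyfunc
import pandas as pd


class MLflowOperator:
    def __init__(self, model_uri: str, feature_columns: list[str]):
        # model_uri points at the MLflow artifact directory fetched from LakeFS,
        # e.g. a local path the engine downloaded before execution (placeholder).
        self.model_uri = model_uri
        self.feature_columns = feature_columns
        self.model = None

    def open(self) -> None:
        # pyfunc loads any MLflow flavor behind a uniform predict() interface.
        self.model = mlflow.pyfunc.load_model(self.model_uri)

    def process_tuple(self, tuple_: dict) -> dict:
        # Build a one-row DataFrame from the configured feature columns and
        # append the model's prediction to the outgoing tuple.
        features = pd.DataFrame([{c: tuple_[c] for c in self.feature_columns}])
        tuple_["prediction"] = self.model.predict(features)[0]
        return tuple_

    def close(self) -> None:
        self.model = None
```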


Impact / Priority

(P2) Medium – useful enhancement

Affected Area

Workflow Engine (Amber)
