MLRun Hub

A centralized repository of open-source MLRun functions, modules, and steps that serve as reusable components in ML pipelines.

📋 Table of Contents

Prerequisites
Installation
Make Commands
CLI Commands
Contributing
Testing
Code Standards
Resources

Prerequisites

Before you begin, ensure you have the following installed:

Python 3.11+ - Required (or let UV manage it for you)
UV - Fast Python package manager (required)
Git - For version control
Make (optional) - For convenient command shortcuts

Note: This project uses UV as the baseline package manager. All dependency management and installation is done through UV.
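To confirm the prerequisites before installing, a small script along the following lines may help. It is a minimal sketch, not part of this repository: the helper name `check_prerequisites` is hypothetical, and it assumes the tools are discoverable on `PATH` under the names `uv`, `git`, and `make`. If you let UV manage Python for you, the interpreter version reported here may not be the one UV ends up using.

```python
import shutil
import sys

# Tool names taken from the prerequisites list; Make is optional there.
REQUIRED_TOOLS = ("uv", "git")
OPTIONAL_TOOLS = ("make",)
MIN_PYTHON = (3, 11)


def check_prerequisites(which=shutil.which, version=None):
    """Return a dict describing which prerequisites look satisfied.

    `which` and `version` are injectable so the check can be tested
    without depending on the local machine.
    """
    version = tuple(version or sys.version_info[:2])
    return {
        "python_ok": version >= MIN_PYTHON,
        "missing_required": [t for t in REQUIRED_TOOLS if which(t) is None],
        "missing_optional": [t for t in OPTIONAL_TOOLS if which(t) is None],
    }


if __name__ == "__main__":
    print(check_prerequisites())
```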

Installation

Clone the repository:

git clone https://github.com/mlrun/functions.git
cd functions

Install dependencies:
```
uv sync
```

Note: Your asset's tests may need extra dependencies. If your asset has a requirements.txt file, install it with:

uv pip install -r path/to/your/asset/requirements.txt

Make Commands

The project includes a Makefile for convenient command shortcuts:

| Command | Description |
|---------|-------------|
| `make help` | Show all available commands |
| `make sync` | Sync dependencies from lockfile |
| `make format` | Format code with Ruff |
| `make lint` | Run code linters with Ruff |
| `make test NAME=<name> [TYPE=functions]` | Run tests for a specific asset (functions/modules/steps) |
| `make cli ARGS="<args>"` | Run CLI with custom arguments |

Examples

# Sync dependencies
make sync

# Format code
make format

# Run tests for a function (default type)
make test NAME=aggregate

# Run tests for a module
make test NAME=count_events TYPE=modules

# Run tests for a step
make test NAME=mystep TYPE=steps

# Run CLI command
make cli ARGS="generate-item-yaml function my_function"
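The `NAME` and `TYPE` arguments appear to identify an asset directory of the form `<TYPE>/src/<NAME>`, judging by the paths used elsewhere in this README. The sketch below illustrates that mapping. It is an assumption about the layout, not a description of what the Makefile actually does, and `asset_dir` is a hypothetical helper.

```python
from pathlib import Path

# Asset types named in the `make test` description.
ASSET_TYPES = ("functions", "modules", "steps")


def asset_dir(name: str, asset_type: str = "functions", root: Path = Path(".")) -> Path:
    """Map NAME and TYPE to the assumed asset directory <TYPE>/src/<NAME>."""
    if asset_type not in ASSET_TYPES:
        raise ValueError(f"TYPE must be one of {ASSET_TYPES}, got {asset_type!r}")
    return root / asset_type / "src" / name


if __name__ == "__main__":
    print(asset_dir("aggregate"))
    print(asset_dir("count_events", "modules"))
```

Defaulting `asset_type` to `"functions"` mirrors the `[TYPE=functions]` default shown in the table.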

CLI Commands

The project includes a CLI tool for managing MLRun functions, modules, and steps.

Usage

python -m cli.cli [COMMAND] [OPTIONS]

Available Commands

1. generate-item-yaml

Generate an item.yaml file from a Jinja2 template.

Syntax:

python -m cli.cli generate-item-yaml TYPE NAME

Examples:

# Generate item.yaml for a function
python -m cli.cli generate-item-yaml function aggregate

# Generate item.yaml for a module
python -m cli.cli generate-item-yaml module my_module

# Generate item.yaml for a step
python -m cli.cli generate-item-yaml step my_step

2. item-to-function

Create a function.yaml file from an item.yaml file.

Syntax:

python -m cli.cli item-to-function --item-path PATH

Example:

python -m cli.cli item-to-function --item-path functions/src/aggregate

3. function-to-item

Create an item.yaml file from a function.yaml file.

Syntax:

python -m cli.cli function-to-item --path PATH

Example:

python -m cli.cli function-to-item --path functions/src/aggregate

4. run-tests

Run the test suite for a specific asset.

Syntax:

python -m cli.cli run-tests -r PATH -s TYPE -fn NAME

Example:

python -m cli.cli run-tests -r functions/src/aggregate -s py -fn aggregate
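If you want to drive `run-tests` from a script, for instance to test several assets in a row, a thin wrapper is one option. This is a sketch that only reuses the flags shown in the syntax line above (`-r`, `-s`, `-fn`); the helper names are hypothetical, and it must be run from the project root so that `cli.cli` is importable.

```python
import subprocess
import sys


def run_tests_command(asset_path: str, name: str, suite: str = "py") -> list[str]:
    """Build the argv for `python -m cli.cli run-tests`, using the documented flags."""
    return [
        sys.executable, "-m", "cli.cli", "run-tests",
        "-r", asset_path,
        "-s", suite,
        "-fn", name,
    ]


def run_tests(asset_path: str, name: str, suite: str = "py") -> int:
    """Run the CLI and return its exit code (non-zero is assumed to mean failure)."""
    return subprocess.run(run_tests_command(asset_path, name, suite), check=False).returncode


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(run_tests_command("functions/src/aggregate", "aggregate")))
```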

Contributing

We welcome contributions! Follow these steps to contribute:

Types of Assets You Can Contribute

Functions - MLRun runtime functions (job or serving)
Modules - Generic modules or model monitoring applications
Steps - MLRun steps to be used in serving graphs

How to Contribute

Fork the repository on GitHub
Create a new branch for your asset:
```
git checkout -b feature/my-new-item
```
Add your asset under the appropriate directory:
- Functions: functions/src/
- Modules: modules/src/
- Steps: steps/src/
Follow the asset structure (see below)
Test your asset thoroughly
Open a pull request to the development branch

Asset Structure

Functions

functions/src/your_function_name/
├── item.yaml                      # Metadata (required)
├── function.yaml                  # MLRun function definition (required)
├── your_function_name.py          # Main code file (required)
├── your_function_name.ipynb       # Demo notebook (required)
├── test_your_function_name.py     # Unit tests (required)
└── requirements.txt               # Test dependencies (optional)

Steps to create a function:

Generate the item.yaml template:

python -m cli.cli generate-item-yaml function your_function_name

Fill in the item.yaml with:
- kind: job or serving
- categories: Browse MLRun Hub for existing categories
- version, description, and other metadata

Generate the function.yaml:

python -m cli.cli item-to-function --item-path functions/src/your_function_name

Implement your function in your_function_name.py:
- Keep code well-documented (docstrings are used in the hub UI)
- Specify function dependencies in item.yaml, not in requirements.txt
Create a demo notebook (your_function_name.ipynb):
- Must run end-to-end automatically
- Demonstrate the function's usage
Write unit tests (test_your_function_name.py):
- Cover functionality as much as possible
- Tests run automatically on each change

Modules

modules/src/your_module_name/
├── item.yaml                      # Metadata (required)
├── your_module_name.py            # Main code file (required)
├── your_module_name.ipynb         # Demo notebook (required)
├── test_your_module_name.py       # Unit tests (required)
└── requirements.txt               # Test dependencies (optional)

Steps to create a module:

Generate the item.yaml:

python -m cli.cli generate-item-yaml module your_module_name

Fill in the item.yaml with:
- kind: generic or monitoring_application
- categories: Browse MLRun Hub for existing categories
- version, description, and other metadata
Implement, document, and test your module

For model monitoring modules, see the MLRun model monitoring guidelines.
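The `kind` values listed for functions and modules can be checked mechanically before opening a pull request. The following is a minimal sketch that works on an already-parsed item.yaml. It assumes `kind`, `categories`, `version`, and `description` are top-level keys, which this README does not confirm, and `validate_item` is a hypothetical helper, not part of the CLI.

```python
# Allowed kinds per asset type, as listed in the steps above.
ALLOWED_KINDS = {
    "function": {"job", "serving"},
    "module": {"generic", "monitoring_application"},
}
# Assumed to be top-level keys of item.yaml; the real schema may differ.
REQUIRED_FIELDS = ("kind", "categories", "version", "description")


def validate_item(item: dict, asset_type: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if not item.get(field)]
    allowed = ALLOWED_KINDS.get(asset_type)
    kind = item.get("kind")
    if allowed is None:
        problems.append(f"unknown asset type: {asset_type}")
    elif kind and kind not in allowed:
        problems.append(
            f"kind {kind!r} is not valid for a {asset_type}; expected one of {sorted(allowed)}"
        )
    return problems


if __name__ == "__main__":
    item = {"kind": "serving", "categories": ["utils"], "version": "1.0.0", "description": "demo"}
    print(validate_item(item, "module"))
```

Steps are left out of `ALLOWED_KINDS` because this README does not list their kinds.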

Contribution Checklist

Asset follows the proper directory structure
item.yaml is complete and accurate
Code is well-documented with docstrings
Demo notebook runs end-to-end without errors
Unit tests cover the functionality
Code is formatted with Ruff (make format)
All tests pass locally
PR targets the development branch
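The first checklist item can be verified with a short script. The sketch below encodes the required files from the Functions and Modules layouts shown under Asset Structure; the helper names are hypothetical. No layout is documented here for steps, so the sketch treats them like modules, which is an assumption.

```python
from pathlib import Path


def required_files(name: str, asset_type: str) -> list[str]:
    """List the files marked (required) in the asset layouts."""
    files = ["item.yaml", f"{name}.py", f"{name}.ipynb", f"test_{name}.py"]
    if asset_type == "functions":
        # Only the functions layout lists function.yaml.
        files.insert(1, "function.yaml")
    return files


def missing_files(asset_dir: Path, asset_type: str) -> list[str]:
    """Return the required files that are absent; the asset name is the directory name."""
    return [
        filename
        for filename in required_files(asset_dir.name, asset_type)
        if not (asset_dir / filename).is_file()
    ]


if __name__ == "__main__":
    print(missing_files(Path("functions/src/aggregate"), "functions"))
```

requirements.txt is optional in both layouts, so it is not checked.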

Testing

Running Tests

Test a specific asset:

# Test a function (default)
make test NAME=aggregate

# Test a module
make test NAME=mymodule TYPE=modules

# Test a step
make test NAME=mystep TYPE=steps

# Or use the CLI directly
python -m cli.cli run-tests -r functions/src/aggregate -s py -fn aggregate

Run tests manually with pytest:

cd functions/src/aggregate
pytest test_aggregate.py -v

Writing Unit Tests

Place tests in test_<asset_name>.py
Use pytest as the testing framework
Mock external dependencies when necessary
Test edge cases and error conditions
Ensure tests are reproducible

Note: The CI pipeline runs the tests automatically on each change.

Example test structure:

import pytest
from your_function_name import your_function

def test_basic_functionality():
    result = your_function(param1="value1")
    assert result is not None
    assert result.status == "success"

def test_error_handling():
    with pytest.raises(ValueError):
        your_function(invalid_param="bad_value")
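The guideline about mocking external dependencies can be illustrated with the standard library's `unittest.mock`. In this sketch, `count_rows` and its `client` argument are hypothetical stand-ins for a function that calls an external service. The tests are plain functions, so pytest can collect them; inside a real test file, `pytest.raises` would be the more idiomatic way to write the error check.

```python
from unittest.mock import MagicMock


# Hypothetical function under test: it depends on an external client object.
def count_rows(client, table: str) -> int:
    rows = client.fetch(table)
    if rows is None:
        raise ValueError(f"table {table!r} not found")
    return len(rows)


def test_count_rows_with_mocked_client():
    # Replace the external client with a mock that returns canned data.
    client = MagicMock()
    client.fetch.return_value = [{"id": 1}, {"id": 2}]
    assert count_rows(client, "events") == 2
    client.fetch.assert_called_once_with("events")


def test_count_rows_missing_table():
    # Simulate the error condition without touching a real service.
    client = MagicMock()
    client.fetch.return_value = None
    try:
        count_rows(client, "missing")
    except ValueError as exc:
        assert "missing" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

Because the mock returns fixed data, the tests stay reproducible regardless of network or service state.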

Code Standards

Python Style Guide

We follow PEP 8 style guidelines with the following configuration:

Line length: 120 characters
Imports: Automatically sorted and organized
Type hints: Encouraged for function signatures
Formatter: Ruff is used for formatting and linting

Formatting Tools

Ruff - Fast Python formatter and linter:

Format code automatically:

make format
# or
uv run ruff format .
uv run ruff check --fix .

Check formatting and lint rules without modifying files:

make lint
# or
uv run ruff format --check .
uv run ruff check .

Documentation Standards

Docstrings are mandatory for all public hub items
Use clear, concise descriptions
Include parameter types and return types
Provide usage examples when helpful

Example (function auto_trainer):

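from typing import List, Optional, Union

from mlrun.datastore import DataItem
from mlrun.execution import MLClientCtx


def train(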
    context: MLClientCtx,
    dataset: DataItem,
    model_class: str,
    label_columns: Optional[Union[str, List[str]]] = None,
    drop_columns: List[str] = None,
    model_name: str = "model",
    tag: str = "",
    sample_set: DataItem = None,
    test_set: DataItem = None,
    train_test_split_size: float = None,
    random_state: int = None,
    labels: dict = None,
    **kwargs,
):
    """
    Training a model with the given dataset.

    example::

        import mlrun
        project = mlrun.get_or_create_project("my-project")
        project.set_function("hub://auto_trainer", "train")
        trainer_run = project.run(
            name="train",
            handler="train",
            inputs={"dataset": "./path/to/dataset.csv"},
            params={
                "model_class": "sklearn.linear_model.LogisticRegression",
                "label_columns": "label",
                "drop_columns": "id",
                "model_name": "my-model",
                "tag": "v1.0.0",
                "sample_set": "./path/to/sample_set.csv",
                "test_set": "./path/to/test_set.csv",
                "CLASS_solver": "liblinear",
            },
        )

    :param context:                 MLRun context
    :param dataset:                 The dataset to train the model on. Can be either a URI or a FeatureVector
    :param model_class:             The class of the model, e.g. `sklearn.linear_model.LogisticRegression`
    :param label_columns:           The target label column(s) in the dataset, for regression or
                                    classification tasks. Mandatory when dataset is not a FeatureVector.
    :param drop_columns:            str or a list of strings that represent the columns to drop
    :param model_name:              The model's name to use for storing the model artifact, default to 'model'
    :param tag:                     The model's tag to log with
    :param sample_set:              A sample set of inputs for the model for logging its stats along the model in favour
                                    of model monitoring. Can be either a URI or a FeatureVector
    :param test_set:                The test set to train the model with.
    :param train_test_split_size:   if test_set was provided then this argument is ignored.
                                    Should be between 0.0 and 1.0 and represent the proportion of the dataset to include
                                    in the test split. The size of the Training set is set to the complement of this
                                    value. Default = 0.2
    :param random_state:            Relevant only when using train_test_split_size.
                                    A random state seed to shuffle the data. For more information, see:
                                    https://scikit-learn.org/stable/glossary.html#term-random_state
                                    Notice that here we only pass integer values.
    :param labels:                  Labels to log with the model
    :param kwargs:                  Here you can pass keyword arguments with prefixes,
                                    that will be parsed and passed to the relevant function, by the following prefixes:
                                    - `CLASS_` - for the model class arguments
                                    - `FIT_` - for the `fit` function arguments
                                    - `TRAIN_` - for the `train` function (in xgb or lgbm train function - future)

    """
    # Implementation here

Troubleshooting

Common Issues

CLI Commands Not Working

Problem: python -m cli.cli fails

Solution:

# Check if you're in the right directory
pwd  # Should be the project root

# Ensure dependencies are installed
make sync

Tests Failing

Problem: Tests fail when running locally

Solution:

# Install test dependencies if the item has a requirements.txt
cd functions/src/your_function
uv pip install -r requirements.txt

# Run tests with verbose output
pytest test_your_function.py -v -s

# Check for missing environment variables or configuration

Getting Help

If you encounter issues:

Check the MLRun documentation
Search GitHub issues
Open a new issue with:
- Error message
- Steps to reproduce
- Environment details (OS, Python version, UV version)
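The environment details requested above can be gathered with a few standard-library calls. This is a minimal sketch: `environment_details` is a hypothetical helper, and it assumes UV is on `PATH` as `uv` and reports its version through `uv --version`.

```python
import platform
import shutil
import subprocess


def environment_details() -> dict[str, str]:
    """Collect OS, Python version, and UV version for an issue report."""
    uv_path = shutil.which("uv")
    if uv_path is None:
        uv_version = "not found"
    else:
        result = subprocess.run(
            [uv_path, "--version"], capture_output=True, text=True, check=False
        )
        uv_version = result.stdout.strip() or "unknown"
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "uv": uv_version,
    }


if __name__ == "__main__":
    for key, value in environment_details().items():
        print(f"{key}: {value}")
```

Paste the printed lines into the issue alongside the error message and the steps to reproduce.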

Resources

MLRun Hub UI: https://www.mlrun.org/hub/
MLRun Documentation: https://docs.mlrun.org/
MLRun Marketplace Repository: http://github.com/mlrun/marketplace
MLRun Community: https://github.com/mlrun/mlrun

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Support

For questions and support:

Open an issue on GitHub
Join the MLRun community discussions
Check the MLRun documentation

Happy Contributing! 🚀
