Contributing to openml-python

This document describes the workflow for contributing to the openml-python package. If you are interested in connecting a machine learning package with OpenML (i.e., writing an openml-python extension) or want to find other ways to contribute, see this page.

Scope of the package

The scope of the OpenML Python package is to provide a Python interface to the OpenML platform that integrates well with Python's scientific stack, most notably numpy, scipy and pandas. To reduce opportunity costs and demonstrate the usage of the package, it also implements an interface to scikit-learn, the most popular machine learning package written in Python. As a result, it is automatically compatible with the many Python machine learning libraries that follow the scikit-learn interface.

We aim to keep the package as lightweight as possible and to keep the number of installation dependencies as low as possible. Therefore, connections to other machine learning libraries such as PyTorch, Keras or TensorFlow should not be implemented directly inside this package, but in a separate package using the OpenML Python connector. More information on OpenML Python connectors can be found here.

Determine what contribution to make

Great! You've decided you want to help out. Now what? All contributions should be linked to issues on the GitHub issue tracker. In particular for new contributors, the good first issue label should help you find issues which are suitable for beginners. Resolving these issues allows you to start contributing to the project without much prior knowledge. Your assistance in this area will be greatly appreciated by the more experienced developers as it helps free up their time to concentrate on other issues.

If you encounter a particular part of the documentation or code that you want to improve, but there is no related open issue yet, open one first. This is important since you can first get feedback or pointers from experienced contributors.

To let everyone know you are working on an issue, please leave a comment that states you will work on the issue (or, if you have the permission, assign yourself to the issue). This avoids double work!

Contributing Workflow Overview

To contribute to the openml-python package, follow these steps:

  1. Determine how you want to contribute (see above).
  2. Set up your local development environment.
    1. Fork and clone the openml-python repository. Then, create a new branch from the main branch. If you are new to git, see our detailed documentation, or rely on your favorite IDE.
    2. Install the local dependencies to run the tests for your contribution.
    3. Test your installation to ensure everything is set up correctly.
  3. Implement your contribution. If contributing to the documentation, see here.
  4. Create a pull request.

Install Local Dependencies

We recommend following the instructions below to install all requirements locally. However, it is also possible to use the openml-python docker image for testing and building documentation. Moreover, feel free to use any alternative package managers, such as pip.

  1. To ensure a smooth development experience, we recommend using the uv package manager. Thus, first install uv. If any Python version already exists on your system, follow the steps below, otherwise see here.
    pip install uv
  2. Create a virtual environment using uv and activate it. This will ensure that the dependencies for openml-python do not interfere with other Python projects on your system.
    uv venv --seed --python 3.8 ~/.venvs/openml-python
    source ~/.venvs/openml-python/bin/activate
    pip install uv # Install uv within the virtual environment
  3. Then install openml with its test dependencies by running
    uv pip install -e ".[test]"
    from the repository folder (the quotes keep shells such as zsh from interpreting the square brackets). Then install the pre-commit hooks, so that the style checks run automatically before each commit:
    pre-commit install
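As a quick sanity check after the installation steps above, a small script can confirm that the expected tools are importable from the active virtual environment. This is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list assumes that the `test` extra provides `pytest` and `pre_commit` (the import name of pre-commit).

```python
import importlib.util
from typing import List, Sequence

# Assumed import names: the package itself, the test runner, and pre-commit.
REQUIRED_MODULES = ("openml", "pytest", "pre_commit")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing from the active environment:", ", ".join(missing))
    else:
        print("All development dependencies are importable.")
```

If the script reports missing modules, the most likely cause is that the virtual environment was not activated before running the install command.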

Testing (Your Installation)

To test your installation and run the tests for the first time, run the following from the repository folder:

pytest tests

For Windows systems, you may need to add pytest to PATH before executing the command.

To run only part of the test suite, specify a module, a test case, or a single unit test, respectively:

pytest tests/test_datasets/test_dataset.py
pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest
pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data
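The three commands above follow pytest's node ID scheme, in which the module path, test case, and test name are joined with `::`. The sketch below builds such an ID and a matching command line. The helper names `pytest_node_id` and `pytest_command` are hypothetical. Running pytest through `python -m pytest` is one way to avoid needing the `pytest` executable on PATH, which may help on Windows.

```python
import sys
from typing import List, Optional


def pytest_node_id(
    module: str, test_case: Optional[str] = None, test: Optional[str] = None
) -> str:
    """Join a module path, test case, and test name into a pytest node ID."""
    return "::".join(part for part in (module, test_case, test) if part is not None)


def pytest_command(node_id: str) -> List[str]:
    """Build a command that runs pytest through the current interpreter."""
    return [sys.executable, "-m", "pytest", node_id]


if __name__ == "__main__":
    node_id = pytest_node_id(
        "tests/test_datasets/test_dataset.py", "OpenMLDatasetTest", "test_get_data"
    )
    # Pass the list to subprocess.run(...) to actually execute the test.
    print(" ".join(pytest_command(node_id)))
```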

To test your new contribution, add unit tests, and, if needed, examples for any new functionality being introduced. Some notes on unit tests and examples:

  • If a unit test uploads anything to the test server, please mark the uploaded entity for removal right after the upload, so that the test server does not fill up with test artifacts. For example, TestBase._mark_entity_for_removal('data', dataset.dataset_id), TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name)).
  • Please ensure that the example is run on the test server by beginning with the call to openml.config.start_using_configuration_for_example(), which is done by default for tests derived from TestBase.
  • Add the @pytest.mark.sklearn marker to your unit tests if they have a dependency on scikit-learn.

Running Tests That Require Admin Privileges

Some tests require admin privileges on the test server and will be automatically skipped unless you provide an admin API key. For regular contributors, the tests will skip gracefully. For core contributors who need to run these tests locally, you can set up the key by exporting the variable as below before running the tests:

# For Windows (PowerShell)
$env:OPENML_TEST_SERVER_ADMIN_KEY = "admin-key"
# For Linux/macOS
export OPENML_TEST_SERVER_ADMIN_KEY="admin-key"
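To illustrate how a test can skip gracefully when the variable is absent, here is a minimal sketch using only the standard library. It is not the project's actual mechanism, which this guide does not show: the helper `admin_key_is_set` and the test class are hypothetical, and the real test suite is pytest-based.

```python
import os
import unittest
from typing import Mapping, Optional

ADMIN_KEY_VARIABLE = "OPENML_TEST_SERVER_ADMIN_KEY"


def admin_key_is_set(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if a non-empty admin key is present in the environment."""
    env = os.environ if environ is None else environ
    return bool(env.get(ADMIN_KEY_VARIABLE))


class AdminOnlyExample(unittest.TestCase):
    # The condition is evaluated once, when the class body is executed.
    @unittest.skipUnless(admin_key_is_set(), f"{ADMIN_KEY_VARIABLE} is not set")
    def test_requires_admin(self) -> None:
        # A real test would call the test server with the admin key here.
        self.assertTrue(admin_key_is_set())


if __name__ == "__main__":
    unittest.main()
```

Without the variable, the test is reported as skipped rather than failed, which matches the behavior described above for regular contributors.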

Pull Request Checklist

You can go to the openml-python GitHub repository to create the pull request by comparing the branch from your fork with the main branch of the openml-python repository. When creating a pull request, make sure to follow the comments and structure provided by the template on GitHub.

An incomplete contribution -- where you expect to do more work before receiving a full review -- should be submitted as a draft. These may be useful to: indicate you are working on something to avoid duplicated work, request broad review of functionality or API, or seek collaborators. Drafts often benefit from the inclusion of a task list in the PR description.


Appendix

Basic git Workflow

The preferred workflow for contributing to openml-python is to fork the main repository on GitHub, clone it, check out the main branch, and develop on a new feature branch. Steps:

  1. Make sure you have git installed, and a GitHub account.

  2. Fork the project repository by clicking on the 'Fork' button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.

  3. Clone your fork of the openml-python repo from your GitHub account to your local disk:

    git clone git@github.com:YourLogin/openml-python.git
    cd openml-python
  4. Switch to the main branch:

    git checkout main
  5. Create a feature branch to hold your development changes:

    git checkout -b feature/my-feature

    Always use a feature branch. It's good practice to never work on the main branch! To make the nature of your pull request easily visible, please prepend the name of the branch with the type of changes you want to merge, such as feature if it contains a new feature, fix for a bugfix, doc for documentation and maint for other maintenance on the package.

  6. Develop the feature on your feature branch. Stage the changed files with git add, then commit them with git commit:

    git add modified_files
    git commit

    to record your changes in Git, then push the changes to your GitHub account with:

    git push -u origin feature/my-feature
  7. Follow these instructions to create a pull request from your fork.

(If any of the above seems like magic to you, please look up the Git documentation on the web, or ask a friend or another contributor for help.)
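The branch naming convention from step 5 can be checked mechanically. The following sketch is a hypothetical helper, not part of the repository. It assumes that the prefix is separated from the rest of the name by a `/`, as in the `feature/my-feature` example above.

```python
from typing import Tuple

# Prefixes named in step 5: new feature, bugfix, documentation, maintenance.
ALLOWED_PREFIXES: Tuple[str, ...] = ("feature", "fix", "doc", "maint")


def has_valid_prefix(branch: str) -> bool:
    """Return True if `branch` looks like '<allowed prefix>/<non-empty name>'."""
    prefix, separator, rest = branch.partition("/")
    return prefix in ALLOWED_PREFIXES and separator == "/" and rest != ""


if __name__ == "__main__":
    for name in ("feature/my-feature", "fix/typo-in-docs", "my-feature", "main"):
        print(f"{name}: {'ok' if has_valid_prefix(name) else 'rename suggested'}")
```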

Pre-commit Details

Pre-commit is used for various style checking and code formatting. Before each commit, it will automatically run:

  • ruff, a code formatter and linter. This will automatically format your code. Take a second look after any formatting takes place; if the resulting code is very bloated, consider a (small) refactor.
  • mypy, a static type checker. In particular, make sure each function you work on has type hints.
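As an illustration of the type hints that mypy checks, here is a small, fully annotated function. The function `unique_sorted_ids` is a hypothetical example and does not exist in the package. It uses the `typing` module's generics so that it also runs on older Python versions.

```python
from typing import Iterable, List


def unique_sorted_ids(ids: Iterable[int]) -> List[int]:
    """Return the distinct IDs in ascending order."""
    return sorted(set(ids))


if __name__ == "__main__":
    print(unique_sorted_ids([3, 1, 3, 2]))
```

With these annotations in place, a static type checker can flag a call such as `unique_sorted_ids("abc")` before the code is ever run.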

If you want to run the pre-commit tests without doing a commit, run:

make check

or on a system without make, like Windows:

pre-commit run --all-files

Make sure to do this at least once before your first commit to check your setup works.

Contributing to the Documentation

We welcome all forms of documentation contributions — whether it's Markdown docstrings, tutorials, guides, or general improvements.

Our documentation is written either in Markdown or as Jupyter notebooks and lives in the docs/ and examples/ directories of the source code repository.

To preview the documentation locally, you will need to install a few additional dependencies:

uv pip install -e ".[examples,docs]"

When dependencies are installed, run

mkdocs serve

This starts a local development server with a preview of the website, by default at http://127.0.0.1:8000.