MuSEEK [muˈziːk] is a flexible and easy-to-extend data processing pipeline for multi-instrument autocorrelation radio telescopes currently under development by the MeerKLASS collaboration. It takes the observed (or simulated) TOD in the time-frequency domain as an input and processes it into HEALPix maps while applying calibration and automatically masking RFI. Several Jupyter notebook templates are also provided for data inspection and verification.
If you simply want to run the UHF pipeline and post-calibration notebooks on Ilifu, you can skip to Processing UHF data on Ilifu with the museek_run_process_uhf_band Script and Running the Notebook with the museek_run_notebook Script.
The MuSEEK package has been developed at the Centre for Radio Cosmology at the University of the Western Cape and at the Jodrell Bank Centre for Astrophysics at the University of Manchester (UoM). It is inspired by SEEK, developed by the Software Lab of the Cosmology Research Group of the ETH Institute of Astronomy.
The development is coordinated on GitHub and contributions are welcome.
- Installation
- Available Commands
- Running A Pipeline
- Anatomy of Pipeline and Plugins
- Notebooks
- Contributing
- Maintainers
If you only need to run pre-existing MuSEEK pipelines or notebooks, without developing additional code or pipelines, there are two simple installation options.
If you are on Ilifu, you can use the shared meerklass Python virtual environment, which has been pre-configured with most MeerKLASS-related Python modules, including MuSEEK, as well as other modules you might need. Simply source the activation file as below.
source /idia/projects/meerklass/virtualenv/meerklass/bin/activate
If you need any other modules installed in this shared environment, please contact Boom (@piyanatk).
You will need python>=3.10,<3.13 and pip for this.
Then you can simply do
pip install git+https://github.com/meerklass/museek.git
This will install the latest version of MuSEEK that has been released on the GitHub repository.
You will probably want to do this inside a Python virtual environment. Read the next section for details.
If you need to develop new features, plugins, or pipelines, or make modifications to MuSEEK, you will have to set up your own Python virtual environment and manually install MuSEEK, preferably in editable mode. Please follow the guide below and also check the Contributing section.
On Ilifu, the virtualenv command can be used, although you may also use other tools (e.g. conda or venv).
module load python/3.10.16
virtualenv /path/to/virtualenv/museek
This will create a virtual environment named "museek" at the specified path. The environment can then be activated with,
source /path/to/virtualenv/museek/bin/activate
You will see that your command prompt is prepended with the name of the virtual environment,
(museek) $
It is a good idea to check that you are using the Python in the virtual environment at this point,
which python
# Should return /path/to/virtualenv/museek/bin/python
Installing MuSEEK requires pip, so it is a good time to upgrade it now.
python -m pip install --upgrade pip
You are now ready to install MuSEEK.
First, clone the package,
git clone https://github.com/meerklass/museek.git
MuSEEK can then be installed with pip. It is recommended that the package is installed with the --editable (or short-hand -e) flag when developing new features, as all changes will be reflected without having to re-install the package.
cd museek
python -m pip install -e .[test,dev]
The [test,dev] option tells pip to install all optional dependencies, including those for unit testing (test) and development tools (dev). This will also install pre-commit and ruff to help with code formatting.
If you want to use MuSEEK on one of the Jupyter nodes on Ilifu (or a local Jupyter installation), or run the data inspection notebooks with the museek_run_notebook script, you will have to additionally install MuSEEK as a Jupyter kernel. After completing the standard installation above, do:
python -m pip install ipykernel
python -m ipykernel install --name "museek_kernel" --user
museek_kernel should now be selectable from Jupyter (you may need to refresh the page).
Installing MuSEEK will install the museek command and console scripts museek_run_process_uhf_band and museek_run_notebook.
which museek
# should return /path/to/virtualenv/museek/bin/museek
which museek_run_process_uhf_band
# should return /path/to/virtualenv/museek/bin/museek_run_process_uhf_band
which museek_run_notebook
# should return /path/to/virtualenv/museek/bin/museek_run_notebook
The following sections describe how they work.
A MuSEEK pipeline usually consists of several plugins defined in the museek/plugin directory.
Running a pipeline requires a configuration file, which defines the order of the plugins to run and their parameters.
The configuration files must be in the museek/config path of this package. These configuration files will likely need to be edited (hence the recommendation to install the package with the --editable flag).
Several pipelines have been defined. Notably, there is one for demonstration purposes, and ones for L-band and UHF-band data processing.
Pipelines are run with the Ivory workflow engine, which should have been installed alongside MuSEEK and symlinked to the museek command.
To run a pipeline on your local machine or a compute/Jupyter node on Ilifu, simply execute the museek command, providing a relative path to the configuration file within the MuSEEK directory.
For example, to run the Demo pipeline defined in museek/museek/config/demo.py, execute the following command,
museek museek.config.demo
Note that the path to the configuration file must be provided in the Python import-style format, i.e. replacing / with . and discarding the .py extension. This is due to a restriction in the Ivory workflow engine, which we hope to fix in future releases.
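The path-to-import-string conversion described above is mechanical. As a sketch (the function name is illustrative, not part of MuSEEK), it can be expressed as:

```python
from pathlib import Path

def config_to_import_style(config_path: str) -> str:
    """Convert a config file path, relative to the package root, into the
    import-style string that the museek command expects."""
    # Drop the .py suffix, then join the remaining path components with dots.
    return ".".join(Path(config_path).with_suffix("").parts)

print(config_to_import_style("museek/config/demo.py"))  # museek.config.demo
```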
To change a large number of plugins' parameters within a pipeline, it is recommended to copy and edit the configuration file.
Alternatively, parameters can be overridden by passing extra flags to the museek command. For example, the following command will run the Demo pipeline, overriding the output folder to ./demo_results.
mkdir demo_results
museek --DemoLoadPlugin-context-folder=./demo_results museek.config.demo
For actual data processing on Ilifu, you will want to submit a Slurm job to run the pipeline.
You can use the sbatch command to schedule a job:
sbatch demo.sh
You can find an sbatch script to run the Demo pipeline as an example below, but remember to change /path/to/virtualenv to your own environment. The allocated resources in this script are minimal and for demonstration only; see below for a brief guideline on resource usage.
#!/bin/bash
#SBATCH --job-name='MuSEEK-demo'
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --mem=4GB
#SBATCH --output=museek-stdout.log
#SBATCH --error=museek-stderr.log
#SBATCH --time=00:05:00
source /path/to/virtualenv/museek/bin/activate
echo "Using a Python virtual environment: $(which python)"
echo "Creating output directory"
mkdir -p demo_results
echo "Submitting Slurm job"
museek --DemoLoadPlugin-context-folder=demo_results museek.config.demo
Once the job is finished, you can check the results of the demo pipeline in your working directory and in demo_results.
To adapt this script to a real pipeline, you will need to change museek.config.demo to the config you want to use, e.g. museek.config.process_l_band. You also need to adjust the resources in the sbatch script depending on the config. As a rough estimate, processing an entire MeerKAT observation block may be done with --cpus-per-task=32, --mem=128GB and --time=03:00:00.
The museek_run_process_uhf_band script provides a streamlined command-line interface for running the process UHF pipeline on Ilifu. It automatically creates and submits SLURM jobs for you.
$ museek_run_process_uhf_band --help
MuSEEK UHF Band Processing Script
This script generates and submits a Slurm job to process UHF band data using
the MuSEEK pipeline.
USAGE:
museek_run_process_uhf_band --block-name <block_name> --box <box_number>
[--base-context-folder <path>] [--data-folder <path>]
[--slurm-options <options>] [--dry-run]
OPTIONS:
--block-name <block_name>
(required) Block name or observation ID (e.g., 1675632179)
--box <box_number>
(required) Box number of this block name (e.g., 6)
--base-context-folder <path>
(optional) Path to the base context/output folder
The final context folder will be <base-context-folder>/BOX<box>/<block-name>
Default: /idia/projects/meerklass/MEERKLASS-1/uhf_data/XLP2025/pipeline
--data-folder <path>
(optional) Path to raw data folder
Default: /idia/projects/meerklass/MEERKLASS-1/uhf_data/XLP2025/raw
--slurm-options <options>
(optional) Additional SLURM options to pass to sbatch
Each --slurm-options takes ONE flag (e.g., --mail-user=user@domain.com)
Multiple --slurm-options can be specified for multiple flags
Examples:
Single: --slurm-options --time=72:00:00
Multiple: --slurm-options --mail-user=user@domain.com --slurm-options --mail-type=ALL
--dry-run
(optional) Show the generated sbatch script without submitting
--help
Display this help message
EXAMPLES:
museek_run_process_uhf_band --block-name 1675632179 --box 6
museek_run_process_uhf_band --block-name 1675632179 --box 6 --base-context-folder /custom/pipeline
museek_run_process_uhf_band --block-name 1675632179 --box 6 --dry-run
museek_run_process_uhf_band --block-name 1675632179 --box 6 --slurm-options --mail-user=user@uni.edu --slurm-options --mail-type=ALL --slurm-options --time=72:00:00
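The repeatable --slurm-options flag follows a common command-line pattern: each occurrence contributes one value to a list. A minimal Python sketch of that pattern using argparse (independent of the actual script's implementation, which may differ):

```python
import argparse

parser = argparse.ArgumentParser()
# action="append" collects one value per occurrence of the flag.
parser.add_argument("--slurm-options", action="append", default=[],
                    help="Repeatable; each occurrence adds one sbatch flag")

# Values that themselves start with "-" need the --flag=value form in argparse.
args = parser.parse_args([
    "--slurm-options=--mail-user=user@domain.com",
    "--slurm-options=--mail-type=ALL",
])
print(args.slurm_options)  # ['--mail-user=user@domain.com', '--mail-type=ALL']
```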
To access results stored by the pipeline as pickle files, the class ContextLoader can be used.
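ContextLoader's exact interface is documented in its class; as a plain illustration of the underlying mechanism (not the ContextLoader API itself), a stored context pickle can also be opened directly with Python's pickle module:

```python
import pickle
from pathlib import Path

def load_pipeline_result(path: str):
    """Load a result object that a plugin stored as a pickle file.
    museek's ContextLoader provides a richer interface around data like this."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)

# Hypothetical usage; the file name mirrors the demo config shown in this README.
# context = load_pipeline_result("demo_results/context.pickle")
```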
A pipeline is defined by its configuration file, which is technically a Python module file, and usually consists of several ConfigSection() instances.
One instance should be called Pipeline, which defines the entire pipeline, i.e. the order of the plugins to run. Other instances need to be named after the plugins they belong to. The workflow manager will hand over the correct configuration parameters to each plugin.
For example, the configuration file for the Demo pipeline (museek/config/demo.py) looks like the following,
from ivory.utils.config_section import ConfigSection
Pipeline = ConfigSection(
plugins=[
"museek.plugin.demo.demo_load_plugin",
"museek.plugin.demo.demo_flip_plugin",
"museek.plugin.demo.demo_plot_plugin",
"museek.plugin.demo.demo_joblib_plugin",
],
)
DemoLoadPlugin = ConfigSection(
url="https://cdn.openai.com/dall-e-2/demos/text2im/astronaut/horse/photo/9.jpg",
context_file_name="context.pickle",
context_folder="./"
)
DemoPlotPlugin = ConfigSection(do_show=False, do_save=True)
DemoFlipPlugin = ConfigSection(do_flip_right_left=True, do_flip_top_bottom=True)
DemoJoblibPlugin = ConfigSection(n_iter=10, n_jobs=2, verbose=0)
Here, the DemoLoadPlugin, DemoFlipPlugin, DemoPlotPlugin, and DemoJoblibPlugin will be run in that order.
Plugins can be implemented by creating a class inheriting from Ivory's AbstractPlugin abstract class. The inherited class must override the run() and set_requirements() methods.
In addition, take note of the following requirements:
- Only one plugin per file is allowed. One plugin cannot import another plugin.
- Naming: CamelCase ending in "Plugin", for example "GainCalibrationPlugin".
- To have the plugin included in the pipeline, the config file's "Pipeline" entry needs to include the plugin under "plugins".
- If the plugin requires configuration (most do), the config file needs to contain a section with the same name as the plugin.
- Plugins need to define their requirements in self.set_requirements(). The workflow engine will compare these to the set of results already produced when the plugin starts and hands them over to the run() method. The requirements are encapsulated as Requirement objects, which are mere NamedTuples. See the docstring of the Requirement class for more information.
- Plugins need to define a self.run() method, which is executed by the workflow engine.
- Plugins need to define and run self.set_result() to hand their results back to the workflow engine for storage; these will be passed to the next plugin in the pipeline. Plugin results need to be defined as Result objects. See the Result class doc for more information.
More information on these is included in their class documentation.
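The requirements above can be illustrated with a schematic plugin skeleton. The base class and the Requirement/Result types below are simplified stand-ins for the real ones provided by Ivory and MuSEEK; only the overall shape (set_requirements, run, set_result) follows the rules listed above:

```python
from typing import NamedTuple

# Simplified stand-ins for Ivory's Requirement/Result NamedTuples.
class Requirement(NamedTuple):
    location: str   # name of a previously produced result
    variable: str   # argument name it is handed over as

class Result(NamedTuple):
    location: str
    result: object

class DemoDoublingPlugin:  # the real class would inherit Ivory's AbstractPlugin
    def __init__(self):
        self.requirements = []
        self.results = []

    def set_requirements(self):
        # Declare which earlier results this plugin needs.
        self.requirements = [Requirement(location="raw_data", variable="raw_data")]

    def set_result(self, result: Result):
        # Hand results back to the workflow engine for storage.
        self.results.append(result)

    def run(self, raw_data):
        # The workflow engine calls run() with the declared requirements.
        self.set_result(Result(location="doubled_data",
                               result=[2 * x for x in raw_data]))

plugin = DemoDoublingPlugin()
plugin.set_requirements()
plugin.run([1, 2, 3])
print(plugin.results[0].result)  # [2, 4, 6]
```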
- Demonstration plugins: DemoFlipPlugin, DemoLoadPlugin & DemoPlotPlugin
- InPlugin
- OutPlugin
- NoiseDiodeFlaggerPlugin
- KnownRfiPlugin
- RawdataFlaggerPlugin
- ScanTrackSplitPlugin
- PointSourceFlaggerPlugin
- AoflaggerPlugin
- AoflaggerSecondRunPlugin
- AntennaFlaggerPlugin
- NoiseDiodePlugin
- GainCalibrationPlugin
- AoflaggerPostCalibrationPlugin
- SanityCheckObservationPlugin
- Other plugins for 'calibrator', 'zebra', and 'standing wave' exist, but they are not finished.
MuSEEK now comes with a few Jupyter notebook templates for data inspection, which are run after the process_uhf_band pipeline has been run on the data.
The notebooks can be copied and run on Jupyter, for example with the Jupyter Hub on Ilifu (jupyter.ilifu.ac.za). This is only recommended for experimenting with the notebooks.
Since version 0.3.0, we have adopted papermill as an engine for executing the notebook templates through a command-line interface. This allows the notebooks to be run outside Jupyter while modifying the required parameters dynamically, which is more suitable, for example, for running the post-calibration notebooks on hundreds of data blocks. The papermill command should be automatically installed when you install MuSEEK, as it is one of the requirements.
To execute a notebook, first make sure that you have installed a Jupyter kernel. For example, if you are using the shared meerklass environment on Ilifu, running the two commands below will install a Jupyter kernel named meerklass for use with papermill (and jupyter.ilifu.ac.za).
source /idia/projects/meerklass/virtualenv/meerklass/bin/activate
python -m ipykernel install --user --name meerklass
Then to execute a notebook, simply do, for example,
papermill -k meerklass -p block_name 12345678 notebooks/calibrated_data_check-postcali.ipynb output_notebook.ipynb
Here, we tell papermill to run the calibrated_data_check-postcali.ipynb notebook using the meerklass kernel that we just installed, overriding the default block_name parameter in the notebook with 12345678 and saving the output notebook as output_notebook.ipynb.
To figure out which parameters in the notebook can be passed through papermill, use the --help-notebook flag. For example,
$ papermill --help-notebook notebooks/calibrated_data_check-postcali.ipynb
Usage: papermill [OPTIONS] NOTEBOOK_PATH [OUTPUT_PATH]
Parameters inferred for notebook 'notebooks/calibrated_data_check-postcali.ipynb':
block_name: str (default "1708972386")
data_name: str (default "aoflagger_plugin_postcalibration.pickle")
data_path: str (default "/idia/projects/hi_im/uhf_2024/pipeline/")
Check out the papermill documentation and its CLI help text for more information.
papermill --help
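papermill infers these parameters from the notebook cell tagged "parameters". A small self-contained sketch (not papermill's actual implementation) shows how such a cell can be located in a notebook's JSON:

```python
import json

def parameter_cell_source(notebook_json: str) -> list:
    """Return the source lines of the cell tagged 'parameters';
    papermill inspects this cell to infer overridable parameters."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return [line for line in cell["source"] if line.strip()]
    return []

# Minimal hand-written notebook fragment for illustration.
toy_notebook = json.dumps({
    "cells": [
        {"cell_type": "code",
         "metadata": {"tags": ["parameters"]},
         "source": ['block_name = "1708972386"\n']},
    ]
})
print(parameter_cell_source(toy_notebook))  # ['block_name = "1708972386"\n']
```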
The museek_run_notebook script further streamlines the execution of the notebook via papermill on Ilifu or a compute cluster. It provides a wrapper to the papermill command and dynamically generates and submits SLURM jobs. It will find the notebook "template" in the MuSEEK package with name matching --notebook option.
$ museek_run_notebook --help
MuSEEK Notebook Execution Script
This script generates and submits a Slurm job to execute a MuSEEK Jupyter
notebook using papermill.
USAGE:
museek_run_notebook --notebook <notebook_name> --block-name <block_name> --box <box_number>
[--output-path <path>] [--kernel <kernel_name>]
[-p <param_name> <param_value>] ... [-p <param_name> <param_value>]
[--slurm-options <options>] ... [--slurm-options <options>]
[--dry-run]
OPTIONS:
--notebook <notebook_name>
(required) Name of the notebook to run (e.g., calibrated_data_check-postcali)
--block-name <block_name>
(required) Block name or observation ID (e.g., 1708972386)
--box <box_number>
(required) Box number of this block name (e.g., 6)
--output-path <path>
(optional) Base directory for notebook output
The final output folder will be <output_path>/BOX<box>/<block_name>/
Default: /idia/projects/meerklass/MEERKLASS-1/uhf_data/XLP2025/pipeline
--kernel <kernel_name>
(optional) Jupyter kernel to use for execution
Default: meerklass
-p <param_name> <param_value> | --parameters <param_name> <param_value>
(optional, repeatable) Parameters to pass to the notebook via papermill
These override notebook defaults
Examples: -p data_path /custom/path/ -p data_name custom.pickle
--slurm-options <options>
(optional) Additional SLURM options to pass to sbatch
Each --slurm-options takes ONE flag (e.g., --mail-user=user@domain.com)
Multiple --slurm-options can be specified for multiple flags
Examples:
Single: --slurm-options --time=02:00:00
Multiple: --slurm-options --mail-user=user@domain.com --slurm-options --mail-type=ALL
--dry-run
(optional) Show the generated sbatch script without submitting
--help
Display this help message
EXAMPLES:
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6 -p data_path /custom/path/
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6 --dry-run
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6 --slurm-options --mail-user=user@uni.edu --slurm-options --mail-type=ALL
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
- Report Bugs: Please report potential bugs by opening an issue on MuSEEK GitHub repository, and include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
- Fix Bugs, Implement Features, or Write Documentation: Fork the repository (or ask to be added as a collaborator) and contribute a Pull Request.
- Submit Feedback: If you are proposing a feature:
- Explain in detail how it should work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project
Please try to follow standard Python code formatting (PEP8). The easiest way to achieve that is to use ruff, which should be available when installing MuSEEK in development mode.
ruff can check and automatically format code to align with the recommended Python style. Simply run these two commands in the root directory of the repository.
ruff check --statistics .
ruff format .
The first will print a summary of violations of the style guide. The second will auto-format the code. Note that the auto-formatting will not work on lines with long strings or docstrings; you will have to format those manually.
A pre-commit hook, which is basically a small automated script that runs when git makes a new commit, is also provided in the repository to automatically run ruff. To use it, simply install the pre-commit hooks:
pre-commit install
Then, every time you run git commit, ruff will automatically format and lint your code.
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests.
- The docs should be updated or extended accordingly. Add any new plugins to the list in README.md.
- The pull request should work for Python 3.10.
- Make sure that the tests pass.
- Code must be formatted with ruff format and pass ruff check.
The current maintainers of MuSEEK are:
- Mario Santos (@mariogrs)
- Wenkai Hu (@wkhu-astro)
- Piyanat Kittiwisit (@piyanatk)
- Geoff Murphy (@GeoffMurphy)