MuSEEK [muˈziːk] is a flexible and easy-to-extend data processing pipeline for multi-instrument autocorrelation radio telescopes currently under development by the MeerKLASS collaboration. It takes the observed (or simulated) TOD in the time-frequency domain as an input and processes it into HEALPix maps while applying calibration and automatically masking RFI. Several Jupyter notebook templates are also provided for data inspection and verification.
If you simply want to run the UHF pipeline and post-calibration notebooks on Ilifu, you can skip to Processing UHF data on Ilifu with the museek_run_process_uhf_band Script and Running the Notebook with the museek_run_notebook Script.
The MuSEEK package has been developed at the Centre for Radio Cosmology at the University of the Western Cape and at the Jodrell Bank Centre for Astrophysics at the University of Manchester (UoM). It is inspired by SEEK, developed by the Software Lab of the Cosmology Research Group of the ETH Institute of Astronomy.
The development is coordinated on GitHub and contributions are welcome.
- Installation
- Available Commands
- Running A Pipeline
- Anatomy of Pipeline and Plugins
- Notebooks
- Contributing
- Maintainers
If you only need to run pre-existing MuSEEK pipelines or notebooks, without developing additional code or pipelines, there are two simple installation options.
If you are on Ilifu, you can use the shared meerklass Python virtual environment, which has been pre-configured with most MeerKLASS-related Python modules, including MuSEEK, as well as other modules you might need. Simply source the activation file as below.
source /idia/projects/meerklass/virtualenv/meerklass/bin/activate
If you need any other modules installed in this shared environment, please contact Boom (@piyanatk).
You will need python>=3.10,<3.13 and pip for this.
Then you can simply do
pip install git+https://github.com/meerklass/museek.git
This will install the latest version of MuSEEK that has been released on the GitHub repository.
You will probably want to do this inside a Python virtual environment. Read the next section for details.
If you need to develop new features, plugins, or pipelines, or make modifications to MuSEEK, you will have to set up your own Python virtual environment and manually install MuSEEK, preferably in editable mode. Please follow the guide below and also check the Contributing section.
On Ilifu, the virtualenv command can be used, although you may also use other tools (e.g. conda or venv).
module load python/3.10.16
virtualenv /path/to/virtualenv/museek
This will create a virtual environment named "museek" at the specified path. The environment can then be activated with,
source /path/to/virtualenv/museek/bin/activate
You will see that your command prompt is prepended with the name of the virtual environment,
(museek) $
It is a good idea to check that you are using the Python in the virtual environment at this point,
which python
# Should return /path/to/virtualenv/museek/bin/python
Installing MuSEEK requires pip, so it is a good time to upgrade it now.
python -m pip install --upgrade pip
You are now ready to install MuSEEK.
First, clone the package,
git clone https://github.com/meerklass/museek.git
MuSEEK can then be installed with pip. It is recommended that the package is installed with the --editable (or short-hand -e) flag when developing new features, as all changes will be reflected without having to re-install the package.
cd museek
python -m pip install -e .[test,dev]
The [test,dev] option tells pip to install all optional dependencies, including those for unit testing (test) and development tools (dev). This will also install pre-commit and ruff to help with code formatting.
If you want to use MuSEEK on one of the Jupyter nodes on Ilifu (or a local Jupyter installation), or run the data inspection notebooks with the museek_run_notebook script, you will have to additionally install MuSEEK as a Jupyter kernel. After completing the standard installation above, do:
python -m pip install ipykernel
python -m ipykernel install --name "museek_kernel" --user
museek_kernel should now be selectable from Jupyter (you may need to refresh the page).
Installing MuSEEK will install the museek command and console scripts museek_run_process_uhf_band and museek_run_notebook.
which museek
# should return /path/to/virtualenv/museek/bin/museek
which museek_run_process_uhf_band
# should return /path/to/virtualenv/museek/bin/museek_run_process_uhf_band
which museek_run_notebook
# should return /path/to/virtualenv/museek/bin/museek_run_notebook
The following sections describe how they work.
A MuSEEK pipeline usually consists of several plugins defined in the museek/plugin directory.
Running a pipeline requires a configuration file, which defines the order of the plugins to run and their parameters.
The configuration files must be in the museek/config path of this package. These configuration files will likely need to be edited (hence the recommendation to install the package with the --editable flag).
Several pipelines have been defined. Notably, there is one for demonstration purposes, and ones for L-band and UHF-band data processing.
Pipelines are run with the Ivory workflow engine, which should have been installed alongside MuSEEK and symlinked to the museek command.
To run a pipeline on your local machine or a compute/Jupyter node on Ilifu, simply execute the museek command, providing a relative path to the configuration file within the MuSEEK directory.
For example, to run the Demo pipeline defined in museek/museek/config/demo.py, execute the following command,
museek museek.config.demo
Note that the path to the configuration file must be provided in the Python import-style format, i.e. replacing / with . and discarding the .py extension. This is due to a restriction in the Ivory workflow engine, which we hope to fix in future releases.
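The path-to-import-string conversion described above is mechanical. As a sketch (the function name is illustrative, not part of MuSEEK), it can be expressed as:

```python
from pathlib import Path

def config_to_import_style(config_path: str) -> str:
    """Convert a config file path, relative to the package root, into the
    import-style string that the museek command expects."""
    # Drop the .py suffix, then join the remaining path components with dots.
    return ".".join(Path(config_path).with_suffix("").parts)

print(config_to_import_style("museek/config/demo.py"))  # museek.config.demo
```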
To change a large number of plugins' parameters within a pipeline, it is recommended to copy and edit the configuration file.
Alternatively, parameters can be overridden by passing extra flags to the museek command. For example, the following command will run the Demo pipeline, overriding the output folder to ./demo_results.
mkdir demo_results
museek --DemoLoadPlugin-context-folder=./demo_results museek.config.demo
For actual data processing on Ilifu, you will want to submit a Slurm job to run the pipeline.
You can use the sbatch command to schedule a job:
sbatch demo.sh
You can find an sbatch script to run the Demo pipeline as an example below, but remember to change /path/to/virtualenv to your own environment. The allocated resources in this script are minimal and for demonstration only; see below for a brief guideline on resource usage.
#!/bin/bash
#SBATCH --job-name='MuSEEK-demo'
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --mem=4GB
#SBATCH --output=museek-stdout.log
#SBATCH --error=museek-stderr.log
#SBATCH --time=00:05:00
source /path/to/virtualenv/museek/bin/activate
echo "Using a Python virtual environment: $(which python)"
echo "Creating output directory"
mkdir -p demo_results
echo "Submitting Slurm job"
museek --DemoLoadPlugin-context-folder=demo_results museek.config.demo
Once the job is finished, you can check the results of the demo pipeline in your working directory and in demo_results.
To adapt this script to a real pipeline, you will need to change museek.config.demo to the config you want to use, e.g. museek.config.process_l_band. You also need to adjust the resources in the sbatch script depending on the config. As a rough estimate, processing an entire MeerKAT observation block may be done with --cpus-per-task=32, --mem=128GB and --time=03:00:00.
The museek_run_process_uhf_band script provides a streamlined command-line interface for running the process UHF pipeline on Ilifu. It automatically creates and submits SLURM jobs for you.
$ museek_run_process_uhf_band --help
MuSEEK UHF Band Processing Script
This script generates and submits a Slurm job to process UHF band data using
the MuSEEK pipeline.
USAGE:
museek_run_process_uhf_band --block-name <block_name> --box <box_number>
[--base-context-folder <path>] [--data-folder <path>]
[--slurm-options <options>] [--dry-run]
OPTIONS:
--block-name <block_name>
(required) Block name or observation ID (e.g., 1675632179)
--box <box_number>
(required) Box number of this block name (e.g., 6)
--base-context-folder <path>
(optional) Path to the base context/output folder
The final context folder will be <base-context-folder>/BOX<box>/<block-name>
Default: /idia/projects/meerklass/MEERKLASS-1/uhf_data/XLP2025/pipeline
--data-folder <path>
(optional) Path to raw data folder
Default: /idia/projects/meerklass/MEERKLASS-1/uhf_data/XLP2025/raw
--slurm-options <options>
(optional) Additional SLURM options to pass to sbatch
Each --slurm-options takes ONE flag (e.g., --mail-user=user@domain.com)
Multiple --slurm-options can be specified for multiple flags
Examples:
Single: --slurm-options --time=72:00:00
Multiple: --slurm-options --mail-user=user@domain.com --slurm-options --mail-type=ALL
--dry-run
(optional) Show the generated sbatch script without submitting
--help
Display this help message
EXAMPLES:
museek_run_process_uhf_band --block-name 1675632179 --box 6
museek_run_process_uhf_band --block-name 1675632179 --box 6 --base-context-folder /custom/pipeline
museek_run_process_uhf_band --block-name 1675632179 --box 6 --dry-run
museek_run_process_uhf_band --block-name 1675632179 --box 6 --slurm-options --mail-user=user@uni.edu --slurm-options --mail-type=ALL --slurm-options --time=72:00:00
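The repeatable --slurm-options flag follows a common command-line pattern: each occurrence contributes one value to a list. A minimal Python sketch of that pattern using argparse (independent of the actual script's implementation, which may differ):

```python
import argparse

parser = argparse.ArgumentParser()
# action="append" collects one value per occurrence of the flag.
parser.add_argument("--slurm-options", action="append", default=[],
                    help="Repeatable; each occurrence adds one sbatch flag")

# Values that themselves start with "-" need the --flag=value form in argparse.
args = parser.parse_args([
    "--slurm-options=--mail-user=user@domain.com",
    "--slurm-options=--mail-type=ALL",
])
print(args.slurm_options)  # ['--mail-user=user@domain.com', '--mail-type=ALL']
```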
To access results stored by the pipeline as pickle files, the class ContextLoader can be used.
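ContextLoader's exact interface is documented in its class; as a plain illustration of the underlying mechanism (not the ContextLoader API itself), a stored context pickle can also be opened directly with Python's pickle module:

```python
import pickle
from pathlib import Path

def load_pipeline_result(path: str):
    """Load a result object that a plugin stored as a pickle file.
    museek's ContextLoader provides a richer interface around data like this."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)

# Hypothetical usage; the file name mirrors the demo config shown in this README.
# context = load_pipeline_result("demo_results/context.pickle")
```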
A pipeline is defined by its configuration file, which is technically a Python module file, and usually consists of several ConfigSection() instances.
One instance should be called Pipeline, which defines the entire pipeline, i.e. the order of the plugins to run. Other instances need to be named after the plugins they belong to. The workflow manager will hand over the correct configuration parameters to each plugin.
For example, the configuration file for the Demo pipeline (museek/config/demo.py) looks like the following,
from ivory.utils.config_section import ConfigSection
Pipeline = ConfigSection(
plugins=[
"museek.plugin.demo.demo_load_plugin",
"museek.plugin.demo.demo_flip_plugin",
"museek.plugin.demo.demo_plot_plugin",
"museek.plugin.demo.demo_joblib_plugin",
],
)
DemoLoadPlugin = ConfigSection(
url="https://cdn.openai.com/dall-e-2/demos/text2im/astronaut/horse/photo/9.jpg",
context_file_name="context.pickle",
context_folder="./"
)
DemoPlotPlugin = ConfigSection(do_show=False, do_save=True)
DemoFlipPlugin = ConfigSection(do_flip_right_left=True, do_flip_top_bottom=True)
DemoJoblibPlugin = ConfigSection(n_iter=10, n_jobs=2, verbose=0)
Here, the DemoLoadPlugin, DemoFlipPlugin, DemoPlotPlugin, and DemoJoblibPlugin will be run in that order.
Plugins can be implemented by creating a class inheriting from Ivory's AbstractPlugin abstract class. The inherited class must override the run() and set_requirements() methods.
In addition, take note of the following requirements:
- Only one plugin per file is allowed. One plugin cannot import another plugin.
- Naming: CamelCase ending in "Plugin", for example "GainCalibrationPlugin".
- To have the plugin included in the pipeline, the config file's "Pipeline" entry needs to include the plugin under "plugins".
- If the plugin requires configuration (most do), the config file needs to contain a section with the same name as the plugin.
- Plugins need to define their requirements in self.set_requirements(). The workflow engine will compare these to the set of results already produced when the plugin starts and hands them over to the run() method. The requirements are encapsulated as Requirement objects, which are mere NamedTuples. See the docstring of the Requirement class for more information.
- Plugins need to define a self.run() method, which is executed by the workflow engine.
- Plugins need to define and run self.set_result() to hand their results back to the workflow engine for storage; these will be passed to the next plugin in the pipeline. Plugin results need to be defined as Result objects. See the Result class doc for more information.
More information on these is included in their class documentation.
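The requirements above can be illustrated with a schematic plugin skeleton. The base class and the Requirement/Result types below are simplified stand-ins for the real ones provided by Ivory and MuSEEK; only the overall shape (set_requirements, run, set_result) follows the rules listed above:

```python
from typing import NamedTuple

# Simplified stand-ins for Ivory's Requirement/Result NamedTuples.
class Requirement(NamedTuple):
    location: str   # name of a previously produced result
    variable: str   # argument name it is handed over as

class Result(NamedTuple):
    location: str
    result: object

class DemoDoublingPlugin:  # the real class would inherit Ivory's AbstractPlugin
    def __init__(self):
        self.requirements = []
        self.results = []

    def set_requirements(self):
        # Declare which earlier results this plugin needs.
        self.requirements = [Requirement(location="raw_data", variable="raw_data")]

    def set_result(self, result: Result):
        # Hand results back to the workflow engine for storage.
        self.results.append(result)

    def run(self, raw_data):
        # The workflow engine calls run() with the declared requirements.
        self.set_result(Result(location="doubled_data",
                               result=[2 * x for x in raw_data]))

plugin = DemoDoublingPlugin()
plugin.set_requirements()
plugin.run([1, 2, 3])
print(plugin.results[0].result)  # [2, 4, 6]
```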
- Demonstration plugins: DemoFlipPlugin, DemoLoadPlugin & DemoPlotPlugin
- InPlugin
- OutPlugin
- NoiseDiodeFlaggerPlugin
- KnownRfiPlugin
- RawdataFlaggerPlugin
- ScanTrackSplitPlugin
- PointSourceFlaggerPlugin
- AoflaggerPlugin
- AoflaggerSecondRunPlugin
- AntennaFlaggerPlugin
- NoiseDiodePlugin
- GainCalibrationPlugin
- AoflaggerPostCalibrationPlugin
- SanityCheckObservationPlugin
- Other plugins for 'calibrator', 'zebra', and 'standing wave' exist, but they are not finished.
MuSEEK now comes with a few Jupyter notebook templates for data inspection, which are run after the process_uhf_band pipeline has been run on the data.
The notebooks can be copied and run on Jupyter, for example with the Jupyter Hub on Ilifu (jupyter.ilifu.ac.za). This is only recommended for experimenting with the notebooks.
Since version 0.3.0, we have adopted papermill as an engine for executing the notebook templates through a command-line interface. This allows the notebooks to be run outside Jupyter while modifying the required parameters dynamically, which is more suitable, for example, for running the post-calibration notebooks on hundreds of data blocks. The papermill command should be automatically installed when you install MuSEEK, as it is one of the requirements.
To execute a notebook, first make sure that you have installed a Jupyter kernel. For example, if you are using the shared meerklass environment on Ilifu, running the two commands below will install a Jupyter kernel named meerklass for use with papermill (and jupyter.ilifu.ac.za).
source /idia/projects/meerklass/virtualenv/meerklass/bin/activate
python -m ipykernel install --user --name meerklass
Then to execute a notebook, simply do, for example,
papermill -k meerklass -p block_name 12345678 notebooks/calibrated_data_check-postcali.ipynb output_notebook.ipynb
Here, we tell papermill to run the calibrated_data_check-postcali.ipynb notebook using the meerklass kernel that we just installed, overriding the default block_name parameter in the notebook with 12345678 and saving the output notebook as output_notebook.ipynb.
To figure out which parameters in the notebook can be passed through papermill, use the --help-notebook flag. For example,
$ papermill --help-notebook notebooks/calibrated_data_check-postcali.ipynb
Usage: papermill [OPTIONS] NOTEBOOK_PATH [OUTPUT_PATH]
Parameters inferred for notebook 'notebooks/calibrated_data_check-postcali.ipynb':
block_name: str (default "1708972386")
data_name: str (default "aoflagger_plugin_postcalibration.pickle")
data_path: str (default "/idia/projects/hi_im/uhf_2024/pipeline/")
Check out the papermill documentation and its CLI help text for more information.
papermill --help
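papermill infers these parameters from the notebook cell tagged "parameters". A small self-contained sketch (not papermill's actual implementation) shows how such a cell can be located in a notebook's JSON:

```python
import json

def parameter_cell_source(notebook_json: str) -> list:
    """Return the source lines of the cell tagged 'parameters';
    papermill inspects this cell to infer overridable parameters."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return [line for line in cell["source"] if line.strip()]
    return []

# Minimal hand-written notebook fragment for illustration.
toy_notebook = json.dumps({
    "cells": [
        {"cell_type": "code",
         "metadata": {"tags": ["parameters"]},
         "source": ['block_name = "1708972386"\n']},
    ]
})
print(parameter_cell_source(toy_notebook))  # ['block_name = "1708972386"\n']
```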
The museek_run_notebook script further streamlines the execution of the notebook via papermill on Ilifu or a compute cluster. It provides a wrapper to the papermill command and dynamically generates and submits SLURM jobs. It will find the notebook "template" in the MuSEEK package with name matching --notebook option.
$ museek_run_notebook --help
MuSEEK Notebook Execution Script
This script generates and submits a Slurm job to execute a MuSEEK Jupyter
notebook using papermill.
USAGE:
museek_run_notebook --notebook <notebook_name> --block-name <block_name> --box <box_number>
[--output-path <path>] [--kernel <kernel_name>]
[-p <param_name> <param_value>] ... [-p <param_name> <param_value>]
[--slurm-options <options>] ... [--slurm-options <options>]
[--dry-run]
OPTIONS:
--notebook <notebook_name>
(required) Name of the notebook to run (e.g., calibrated_data_check-postcali)
--block-name <block_name>
(required) Block name or observation ID (e.g., 1708972386)
--box <box_number>
(required) Box number of this block name (e.g., 6)
--output-path <path>
(optional) Base directory for notebook output
The final output folder will be <output_path>/BOX<box>/<block_name>/
Default: /idia/projects/meerklass/MEERKLASS-1/uhf_data/XLP2025/pipeline
--kernel <kernel_name>
(optional) Jupyter kernel to use for execution
Default: meerklass
-p <param_name> <param_value> | --parameters <param_name> <param_value>
(optional, repeatable) Parameters to pass to the notebook via papermill
These override notebook defaults
Examples: -p data_path /custom/path/ -p data_name custom.pickle
--slurm-options <options>
(optional) Additional SLURM options to pass to sbatch
Each --slurm-options takes ONE flag (e.g., --mail-user=user@domain.com)
Multiple --slurm-options can be specified for multiple flags
Examples:
Single: --slurm-options --time=02:00:00
Multiple: --slurm-options --mail-user=user@domain.com --slurm-options --mail-type=ALL
--dry-run
(optional) Show the generated sbatch script without submitting
--help
Display this help message
EXAMPLES:
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6 -p data_path /custom/path/
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6 --dry-run
museek_run_notebook --notebook calibrated_data_check-postcali --block-name 1708972386 --box 6 --slurm-options --mail-user=user@uni.edu --slurm-options --mail-type=ALL
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
- Report Bugs: Please report potential bugs by opening an issue on MuSEEK GitHub repository, and include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
- Fix Bugs, Implement Features, or Write Documentation: Fork the repository (or ask to be added as a collaborator) and contribute a Pull Request.
- Submit Feedback: If you are proposing a feature:
- Explain in detail how it should work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project
Please try to follow standard Python code formatting (PEP8). The easiest way to achieve that is to use ruff, which should be available when installing MuSEEK in development mode.
ruff can check and automatically format code to align with the recommended Python style. Simply run these two commands in the root directory of the repository.
ruff check --statistics .
ruff format .
The first will print a summary of violations of the style guide. The second will auto-format the code. Note that the auto-formatting will not work on lines with long strings or docstrings; you will have to format those manually.
A pre-commit hook, which is basically a small automated script that runs when git makes a new commit, is also provided in the repository to automatically run ruff. To use it, simply install the pre-commit hooks:
pre-commit install
Then, every time you run git commit, ruff will automatically format and lint your code.
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests.
- The docs should be updated or extended accordingly. Add any new plugins to the list in README.md.
- The pull request should work for Python 3.10.
- Make sure that the tests pass.
- Code must be formatted with ruff format and pass ruff check.
The current maintainers of MuSEEK are:
- Mario Santos (@mariogrs)
- Wenkai Hu (@wkhu-astro)
- Piyanat Kittiwisit (@piyanatk)
- Geoff Murphy (@GeoffMurphy)