DwarForge is an automated pipeline for detecting and classifying dwarf galaxy candidates in wide-field imaging surveys. It is tuned for data from the Ultraviolet Near-Infrared Optical Northern Survey (UNIONS). The pipeline combines classical source detection with MTObjects (MTO; see the MTO GitHub repository) and deep-learning classification with a fine-tuned model from the Zoobot project. This repository was used to produce Galaxies OBserved as Low-luminosity Identified Nebulae (GOBLIN), a catalog of 43,000 dwarf galaxy candidates in the UNIONS survey (see the GOBLIN paper and the GOBLIN catalog). If you use DwarForge in academic work, please cite the GOBLIN catalog paper and this repository; see “Citing” below.
Requirements: a CANFAR VOSpace account, Python 3.11.5+, the HDF5 development libraries, and common scientific packages (see pyproject.toml).
Install
Clone the repository using
git clone https://gitlab.com/nick-main-group/dwarforge.git
Change into the local repository via
cd dwarforge
Then install the package in editable mode, so that changes you make to the code on your local machine or cluster take effect without reinstalling:
pip install -e .
vos X509/SSL certificate
To download UNIONS data, you need to be a member of the collaboration (as long as the data are not public), have a CANFAR account, and have a valid X509/SSL certificate on your machine. The certificate is valid for 10 days and must be renewed after that. To generate a certificate, run the following command:
cadc-get-cert -u yourusername
Enter your CANFAR password. If successful, you should see a confirmation message like this:
DONE. 10 day certificate saved in /home/yourusername/.ssl/cadcproxy.pem
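Because the certificate is only valid for 10 days, a long run can start failing once it expires. The following minimal sketch, which is not part of DwarForge, checks whether the certificate should be renewed before a run. It assumes the default path shown in the message above and uses the file's modification time as a stand-in for the issue date; `cert_age_days` and `cert_needs_renewal` are hypothetical helper names.

```python
import time
from pathlib import Path

# Default location reported by cadc-get-cert (assumed; adjust if yours differs).
CERT_PATH = Path.home() / ".ssl" / "cadcproxy.pem"
VALIDITY_DAYS = 10.0  # the certificate is valid for 10 days


def cert_age_days(path: Path = CERT_PATH) -> float:
    """Return the age of the certificate file in days, based on its modification time."""
    return (time.time() - path.stat().st_mtime) / 86400.0


def cert_needs_renewal(path: Path = CERT_PATH, margin_days: float = 0.5) -> bool:
    """Return True if the certificate file is missing or expires within margin_days.

    Assumes the file was written when the certificate was issued, so its
    modification time approximates the issue date.
    """
    if not path.exists():
        return True
    remaining_days = VALIDITY_DAYS - cert_age_days(path)
    return remaining_days < margin_days


if __name__ == "__main__":
    if cert_needs_renewal():
        print("Certificate missing or about to expire; run: cadc-get-cert -u yourusername")
    else:
        print(f"Certificate OK ({cert_age_days():.1f} days old)")
```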
- Open the detection config file (configs/detection_config.yaml), set your machine (local | canfar | narval), and edit your paths.
- Set the number of cores to use for parallel processing. One core is dedicated to continuously downloading data and queueing it for processing.
- Set the band to process as "anchor_band" (cfis-u | whigs-g | cfis_lsb-r | ps-i | wishes-z).
- Define your input (tiles | coordinates | dataframe | all_available). For a quick test, use either a set of tile numbers or RA/Dec coordinates.
- If you are running the script for the first time, set update_tiles and build_new_kd_tree to true. Afterwards, you can set both to false and switch them back to true only when you know new tiles have been reduced.
- Run
python scripts/detection.py
on the command line. This downloads the image tiles corresponding to your input and adds them to a processing queue. Several workers then preprocess the images, run MTO detection, filter the detections, and save the rebinned image (binned in 4x4 pixel blocks) together with the MTO detections.
- Rerun the script for two more bands, changing anchor_band each time.
- Run
python scripts/combination.py
on the command line. This cross-matches detections across bands to remove false detections caused by single-band artifacts or by objects that are not low surface brightness, creates a per-tile catalog of detections, creates RGB cutouts (256x256 pixels) of these objects, and stores them along with their metadata in .h5 files.
- Run
python scripts/inference.py
on the command line to apply the fine-tuned Zoobot model to the object cutouts and assign each object a probability of being a dwarf galaxy.
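Before launching a run, the settings from the configuration steps above can be sanity-checked. This is a minimal sketch that works on a plain dictionary: the allowed values come from the steps above, and `anchor_band`, `update_tiles`, and `build_new_kd_tree` are the names used there, while `machine`, `input`, and `num_cores` are hypothetical key names. The real configs/detection_config.yaml may name or nest these keys differently. The `num_cores` check assumes the dedicated download core counts toward that number.

```python
# Allowed values as listed in the configuration steps above.
# "machine" and "input" are hypothetical key names.
ALLOWED = {
    "machine": {"local", "canfar", "narval"},
    "anchor_band": {"cfis-u", "whigs-g", "cfis_lsb-r", "ps-i", "wishes-z"},
    "input": {"tiles", "coordinates", "dataframe", "all_available"},
}
BOOLEAN_KEYS = ("update_tiles", "build_new_kd_tree")


def validate_detection_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a flat detection-config dictionary."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = cfg.get(key)
        if not (isinstance(value, str) and value in allowed):
            problems.append(f"{key}={value!r} not in {sorted(allowed)}")
    for key in BOOLEAN_KEYS:
        if not isinstance(cfg.get(key), bool):
            problems.append(f"{key} must be true or false")
    # Hypothetical key: assuming one core is reserved for downloading,
    # at least two are needed so that one worker can run detection.
    cores = cfg.get("num_cores")
    if not isinstance(cores, int) or cores < 2:
        problems.append("num_cores must be an integer >= 2")
    return problems
```

A YAML file can be loaded into such a dictionary with PyYAML's `yaml.safe_load` before calling the check.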
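The download-and-process scheme described in the steps above (one core continuously downloading tiles into a queue while several workers consume it) is a producer/consumer pattern. The sketch below illustrates the pattern with threads and a bounded `queue.Queue`; it is not DwarForge's implementation. `fetch` and `process` are hypothetical placeholders for downloading a tile and for preprocessing plus MTO detection. Because detection is CPU-bound, a real pipeline would more likely use processes (for example via `multiprocessing`) than threads.

```python
import queue
import threading

SENTINEL = None  # tells a worker that no more tiles are coming


def downloader(tiles, q, n_workers, fetch):
    """Producer: fetch each tile and queue it, then send one sentinel per worker."""
    for tile in tiles:
        q.put((tile, fetch(tile)))  # blocks while the queue is full
    for _ in range(n_workers):
        q.put(SENTINEL)


def worker(q, process, results, lock):
    """Consumer: process queued tiles until a sentinel arrives."""
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        tile, data = item
        out = process(tile, data)
        with lock:
            results[tile] = out


def run_pipeline(tiles, n_cores, fetch, process, maxsize=4):
    """Reserve one thread for downloading and use n_cores - 1 worker threads."""
    n_workers = max(1, n_cores - 1)
    q = queue.Queue(maxsize=maxsize)  # bounded: limits tiles held in memory
    results, lock = {}, threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(q, process, results, lock))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    downloader(tiles, q, n_workers, fetch)  # the main thread acts as the downloader
    for w in workers:
        w.join()
    return results
```

The bounded queue keeps the downloader from pulling in tiles faster than the workers can process them, and one sentinel per worker lets every worker exit once all tiles have been handled. In this sketch an unhandled exception in `process` ends that worker's thread, and if all workers fail the downloader can block on the full queue, so production code would need error handling.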
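The detection step above saves a rebinned image with 4x4 pixel binning. A minimal NumPy sketch of block binning follows. Whether DwarForge averages or sums the pixels in each block, and how it treats image edges, is not stated here, so the `how` option and the edge trimming are assumptions.

```python
import numpy as np


def rebin(image: np.ndarray, factor: int = 4, how: str = "mean") -> np.ndarray:
    """Bin a 2D image into factor x factor pixel blocks.

    Rows and columns that do not fill a complete block are trimmed (an assumption).
    """
    ny, nx = image.shape
    ny, nx = ny - ny % factor, nx - nx % factor
    # Axes 1 and 3 index the pixels inside each block.
    blocks = image[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    if how == "mean":
        return blocks.mean(axis=(1, 3))
    if how == "sum":
        return blocks.sum(axis=(1, 3))
    raise ValueError(f"unknown how={how!r}")
```

For images that contain NaNs (for example masked regions), `np.nanmean` or `np.nansum` over the same axes would skip the missing pixels.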
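The cross-matching step in scripts/combination.py removes single-band artifacts by requiring that a detection also appears in other bands. The sketch below shows the general idea with a brute-force NumPy distance check. The matching radius, the minimum number of bands, and the use of pixel coordinates on a common tile grid are assumptions, not DwarForge's actual criteria, and `multiband_mask` is a hypothetical name.

```python
import numpy as np


def multiband_mask(anchor_xy, other_xy, radius=5.0, min_bands=2):
    """Flag anchor-band detections that are seen in at least min_bands bands.

    anchor_xy: (N, 2) positions in the anchor band.
    other_xy:  list of (M_i, 2) position arrays, one per additional band.
    Positions are assumed to share one pixel grid; radius is in pixels.
    """
    anchor_xy = np.asarray(anchor_xy, dtype=float).reshape(-1, 2)
    n_bands = np.ones(len(anchor_xy), dtype=int)  # each detection counts its own band
    for xy in other_xy:
        xy = np.asarray(xy, dtype=float).reshape(-1, 2)
        if len(xy) == 0 or len(anchor_xy) == 0:
            continue
        # Pairwise distances, shape (N, M_i).
        d = np.linalg.norm(anchor_xy[:, None, :] - xy[None, :, :], axis=-1)
        n_bands += d.min(axis=1) <= radius  # add 1 where a counterpart exists
    return n_bands >= min_bands
```

The distance matrix grows as N x M, which is fine for a sketch; for large catalogs a k-d tree (e.g. `scipy.spatial.cKDTree`) or, for sky coordinates, astropy's `SkyCoord.match_to_catalog_sky` would be more appropriate.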
To combine cutouts of detections from multiple tiles, for example to train a machine learning or deep learning model, edit the config file (configs/h5_aggregation_config.yaml) and run python scripts/h5_aggregation.py on the command line. This creates a new HDF5 file containing the combined cutouts.
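Conceptually, aggregation concatenates the per-tile cutout arrays along the first axis while recording which tile each cutout came from. The NumPy sketch below illustrates this. `aggregate_cutouts` is a hypothetical helper; it assumes every tile's cutouts share the same shape (for example 256x256 RGB) and does not reproduce the metadata handling of scripts/h5_aggregation.py.

```python
import numpy as np


def aggregate_cutouts(per_tile: dict) -> tuple[np.ndarray, np.ndarray]:
    """Concatenate per-tile cutout stacks into one array plus a tile-label array.

    per_tile maps a tile identifier to an array of shape (n_i, ...), where the
    trailing dimensions are identical for all tiles.
    """
    stacks, labels = [], []
    for tile, cutouts in per_tile.items():
        cutouts = np.asarray(cutouts)
        stacks.append(cutouts)
        labels.extend([tile] * len(cutouts))  # one label per cutout
    if not stacks:
        raise ValueError("no cutouts to aggregate")
    return np.concatenate(stacks, axis=0), np.array(labels)
```

The resulting arrays could be written with h5py, e.g. `f.create_dataset("images", data=images)` on an `h5py.File` opened in write mode. h5py does not store NumPy Unicode string arrays directly, so the tile labels would first need converting, for example with `tiles.astype("S")`.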
Edit and run python scripts/file_transfer.py on the command line to upload the created HDF5 files to Google Drive for further processing, e.g., on Google Colab.
Citing
If you use DwarForge or results produced with it, please cite both the GOBLIN catalog paper and this repository.
- UNIONS survey website: https://www.skysurvey.cc
- MTObjects (paper 1): Teeninga, P., Moschini, U., Trager, S. C., & Wilkinson, M. H. 2015, Lect. Notes Comput. Sci., 9082, 157
- MTObjects (paper 2): Teeninga, P., Moschini, U., Trager, S. C., & Wilkinson, M. H. 2016, MMTA, 1, 100
- Zoobot paper: Walmsley, M., Allen, C., Aussel, B., et al. 2023, JOSS, 8, 5312
- Zoobot GitHub repository: https://github.com/mwalmsley/zoobot
- Python implementation of MTObjects: https://github.com/CarolineHaigh/mtobjects
- Visual classification tool used to generate the training dataset: https://github.com/heesters-nick/DwarfClass
@ARTICLE{2025A&A...699A.232H,
author = {{Heesters}, Nick and {Chemaly}, David and {M{\"u}ller}, Oliver and {Sola}, Elisabeth and {Fabbro}, S{\'e}bastien and {Ferreira}, Ashley and {McConnachie}, Alan W. and {Magnier}, Eugene and {Hudson}, Michael J. and {Chambers}, Kenneth and {Hammer}, Fran{\c{c}}ois and {Sanchez-Janssen}, Ruben},
title = "{Galaxies OBserved as Low-luminosity Identified Nebulae (GOBLIN): Catalog of 43 000 high-probability dwarf galaxy candidates in the UNIONS survey}",
journal = {\aap},
keywords = {methods: observational, techniques: image processing, catalogs, surveys, galaxies: abundances, galaxies: dwarf, Astrophysics of Galaxies},
year = 2025,
month = jul,
volume = {699},
eid = {A232},
pages = {A232},
doi = {10.1051/0004-6361/202554501},
archivePrefix = {arXiv},
eprint = {2505.18307},
primaryClass = {astro-ph.GA},
adsurl = {https://ui.adsabs.harvard.edu/abs/2025A&A...699A.232H},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}