Detects and classifies dwarf galaxy candidates at survey scale.

heesters-nick/DwarForge

DwarForge

Code style: ruff. Type checked: mypy. License: MIT.

DwarForge is an automated pipeline for detecting and classifying dwarf galaxy candidates in wide-field imaging surveys. It is tuned for data from the Ultraviolet Near-Infrared Optical Northern Survey (UNIONS). The pipeline combines classical detection using MTObjects (MTO; see the MTO GitHub repo) with deep learning classification by a fine-tuned model from the Zoobot project. This repository was used to produce Galaxies OBserved as Low-luminosity Identified Nebulae (GOBLIN), a catalog of 43,000 dwarf galaxy candidates in the UNIONS survey (see the GOBLIN paper and GOBLIN catalog). If you use DwarForge in academic work, please cite the GOBLIN catalog paper and this repository; see "Citing" below.

Installation

Requirements: A CANFAR VOSpace account, Python 3.11.5+, HDF5 development libs, and common scientific packages (see pyproject.toml).

Install

Clone the repository using

git clone https://gitlab.com/nick-main-group/dwarforge.git

Change into the local repository via

cd dwarforge

Then install the repository in editable mode so you can easily change the code on your local machine or cluster:

pip install -e .

vos X509/SSL certificate

To download UNIONS data you need to be a member of the collaboration (as long as the data are not public), have a CANFAR account, and have a valid X509/SSL certificate on your machine. The certificate must be renewed every 10 days. To generate one, run the following command:

cadc-get-cert -u *yourusername*

Enter your CANFAR password. If successful, you should see a confirmation message like this:

DONE. 10 day certificate saved in /home/yourusername/.ssl/cadcproxy.pem
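
Because the certificate expires after 10 days, it can be handy to check its age before starting a long download run. The following is a minimal sketch, not part of DwarForge: the function name cert_days_remaining is hypothetical, and it assumes that the file's modification time marks when the certificate was issued. The default path follows the confirmation message above.

```python
from __future__ import annotations

import time
from pathlib import Path

# Validity period stated in this README; renew the certificate after this.
CERT_LIFETIME_DAYS = 10.0


def cert_days_remaining(cert_path: Path, now: float | None = None) -> float:
    """Estimate how many days are left before the certificate expires.

    Assumption: the file's modification time is the moment the certificate
    was issued. Returns -1.0 if the file is missing and a value <= 0 if the
    certificate is older than CERT_LIFETIME_DAYS.
    """
    if not cert_path.is_file():
        return -1.0
    now = time.time() if now is None else now
    age_days = (now - cert_path.stat().st_mtime) / 86400.0
    return CERT_LIFETIME_DAYS - age_days


if __name__ == "__main__":
    path = Path.home() / ".ssl" / "cadcproxy.pem"
    remaining = cert_days_remaining(path)
    if remaining <= 0:
        print("Certificate missing or expired; rerun cadc-get-cert.")
    else:
        print(f"Certificate valid for about {remaining:.1f} more days.")
```

This only inspects the file timestamp; it does not parse the certificate itself, so treat the result as an estimate.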

Quick start

  • Open the detection config file (configs/detection_config.yaml), define your machine (local | canfar | narval) and edit your paths.
  • Set the number of processing cores that should be used for parallel processing. One core is dedicated to continuously downloading data and queueing it for processing.
  • Set the band filter that should be processed as "anchor_band" (cfis-u | whigs-g | cfis_lsb-r | ps-i | wishes-z).
  • Define your input (tiles | coordinates | dataframe | all_available). For a quick test use either a set of tile numbers or coordinates in RA/Dec.
  • If you are running the script for the first time, set update_tiles and build_new_kd_tree to true. After that you can set both to false and only update when you know new tiles have been reduced.
  • Run python scripts/detection.py on the command line. This downloads the image tiles corresponding to your input and adds them to a processing queue. Several workers then preprocess the images, run MTO detection, filter the detections, and save the rebinned image (4x4 pixels) and the MTO detections.
  • Rerun the script for two more bands.
  • Run python scripts/combination.py on the command line. This cross-matches detections in different bands to filter out false detections caused by single-band artifacts or non-low-surface-brightness objects, creates a per-tile catalog of detections, creates RGB cutouts (256x256 pixels) of these, and stores them along with their metadata in .h5 files.
  • Run python scripts/inference.py on the command line to apply the trained deep learning Zoobot model to the created object cutouts and assign a probability that a given object is a dwarf galaxy.
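
To illustrate the idea behind the cross-matching step, here is a minimal sketch of a positional match between bands. It is not the code in scripts/combination.py: the function names, the 10 arcsecond matching radius, and the min_bands threshold are all assumptions chosen for illustration. A detection in the anchor band is kept only if a counterpart lies within the matching radius in enough other bands.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def cross_match(anchor, others, max_sep_arcsec=10.0, min_bands=2):
    """Keep anchor-band detections confirmed in enough bands.

    anchor: list of (ra, dec) tuples in degrees for the anchor band.
    others: dict mapping a band name to a list of (ra, dec) tuples.
    min_bands: number of bands (anchor included) in which the object
        must be detected. Hypothetical default, not taken from DwarForge.
    """
    kept = []
    for ra, dec in anchor:
        n_bands = 1  # the anchor band itself
        for positions in others.values():
            if any(angular_sep_arcsec(ra, dec, r, d) <= max_sep_arcsec
                   for r, d in positions):
                n_bands += 1
        if n_bands >= min_bands:
            kept.append((ra, dec))
    return kept


if __name__ == "__main__":
    anchor = [(150.0, 30.0), (150.1, 30.0)]
    others = {
        "whigs-g": [(150.0005, 30.0)],
        "ps-i": [(150.0, 30.0005), (150.1, 30.0001)],
    }
    # Only the first detection has a counterpart in both other bands.
    print(cross_match(anchor, others, min_bands=3))
```

The brute-force loop compares every pair of positions, which is fine for a handful of detections per tile; at survey scale a spatial index would be the more practical choice.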

HDF5 combination

If you want to combine cutouts of detections from multiple tiles to train an ML/DL model, edit the config file (configs/h5_aggregation_config.yaml) and run python scripts/h5_aggregation.py on the command line. This will create a new HDF5 file storing the cutouts.

File upload to Google Drive

Edit scripts/file_transfer.py, then run python scripts/file_transfer.py on the command line to upload the created HDF5 files to Google Drive for further processing on, e.g., Google Colab.

Citing

If you use DwarForge or results produced with it, please cite both the GOBLIN catalog paper and this repository.

References

Paper BibTeX

@ARTICLE{2025A&A...699A.232H,
       author = {{Heesters}, Nick and {Chemaly}, David and {M{\"u}ller}, Oliver and {Sola}, Elisabeth and {Fabbro}, S{\'e}bastien and {Ferreira}, Ashley and {McConnachie}, Alan W. and {Magnier}, Eugene and {Hudson}, Michael J. and {Chambers}, Kenneth and {Hammer}, Fran{\c{c}}ois and {Sanchez-Janssen}, Ruben},
        title = "{Galaxies OBserved as Low-luminosity Identified Nebulae (GOBLIN): Catalog of 43 000 high-probability dwarf galaxy candidates in the UNIONS survey}",
      journal = {\aap},
     keywords = {methods: observational, techniques: image processing, catalogs, surveys, galaxies: abundances, galaxies: dwarf, Astrophysics of Galaxies},
         year = 2025,
        month = jul,
       volume = {699},
          eid = {A232},
        pages = {A232},
          doi = {10.1051/0004-6361/202554501},
archivePrefix = {arXiv},
       eprint = {2505.18307},
 primaryClass = {astro-ph.GA},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2025A&A...699A.232H},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
