Lacuna Malaria Detection Challenge

This is Team Epoch's solution to the Lacuna Malaria Detection Challenge, hosted by Zindi.

The solution achieved a public score of 0.92801233 and a private score of 0.92472582.

A technical report will be written and uploaded after the competition finale.

Overview and objectives

This solution aims to tackle malaria challenges in Africa by assisting doctors in rapidly diagnosing infections with minimal equipment. By automating the analysis of blood cell images, it reduces the need for manual examination of large datasets, allowing healthcare professionals to focus on treatment and care.

The dataset, provided by Zindi and the Lacuna Fund, includes approximately 3,000 microscope images of blood cells. The competition's goal is to identify and localize key objects within these images by drawing bounding boxes around three classes:

White Blood Cells (WBCs)
Trophozoites
Negative (NEG)

This project strives to make a meaningful impact on malaria diagnostics, particularly in resource-limited settings.

End-to-End Machine Learning Workflow

This section outlines the process of extracting, transforming, modeling, and preparing predictions for submission in a machine learning workflow.

Preprocessing

Dataset Preparation:
- Convert the dataset into YOLO format.
- Filter bounding boxes using an IoU threshold to remove duplicate boxes.

Data Extraction:
- Source data is extracted from the provided CSV files.
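
The two dataset preparation steps can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the corner-box layout (xmin, ymin, xmax, ymax), the helper names, and the 0.9 IoU threshold are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]


def to_yolo(box: Box, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a pixel-space corner box to normalized YOLO (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / img_w,
        (ymin + ymax) / 2 / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def drop_duplicates(boxes: List[Box], iou_thr: float = 0.9) -> List[Box]:
    """Keep a box only if it overlaps no already kept box at or above iou_thr."""
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


# Hypothetical annotations for one 200x200 image and one class.
boxes = [(10, 20, 50, 60), (11, 20, 50, 60), (100, 100, 140, 150)]
unique = drop_duplicates(boxes)  # the second box is a near-duplicate of the first
yolo_rows = [to_yolo(b, img_w=200, img_h=200) for b in unique]
```

In practice the duplicate filter would run per image and per class, so that overlapping boxes of different classes are not merged by accident.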

Modeling

Three models are trained to perform specific tasks:

YOLO (11m): For general object detection.
DETR: Another object detection model to complement YOLO.
NEG Model: Specialized in identifying "NEG" images in predictions.

Test-Time Augmentation (TTA)

After training, each model undergoes test-time augmentation using the following techniques:

Horizontal Flip
Vertical Flip
Horizontal and Vertical Flip
No Flip

The predictions for each augmentation are ensembled for every model.
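
Predictions made on a flipped image have to be mapped back to the original coordinate frame before they can be ensembled. The sketch below shows that step only. The box layout (xmin, ymin, xmax, ymax) and the names unflip_box, TTA_FLIPS, and collect_tta_boxes are assumptions for illustration and are not taken from the repository.

```python
from typing import Dict, List, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]

# The four TTA variants listed above, as (horizontal flip, vertical flip).
TTA_FLIPS: Dict[str, Tuple[bool, bool]] = {
    "none": (False, False),
    "hflip": (True, False),
    "vflip": (False, True),
    "hvflip": (True, True),
}


def unflip_box(box: Box, img_w: float, img_h: float, hflip: bool, vflip: bool) -> Box:
    """Map a box predicted on a flipped image back to original coordinates."""
    xmin, ymin, xmax, ymax = box
    if hflip:
        # Mirroring swaps the roles of the left and right edges.
        xmin, xmax = img_w - xmax, img_w - xmin
    if vflip:
        ymin, ymax = img_h - ymax, img_h - ymin
    return (xmin, ymin, xmax, ymax)


def collect_tta_boxes(
    preds_per_aug: Dict[str, List[Box]], img_w: float, img_h: float
) -> List[Box]:
    """Pool the boxes of all TTA variants in the original image frame."""
    pooled: List[Box] = []
    for name, boxes in preds_per_aug.items():
        hflip, vflip = TTA_FLIPS[name]
        pooled.extend(unflip_box(b, img_w, img_h, hflip, vflip) for b in boxes)
    return pooled


# A cell at (10, 20, 30, 40) in a 100x200 image appears at (70, 20, 90, 40)
# after a horizontal flip; un-flipping recovers the original box.
pooled = collect_tta_boxes(
    {"none": [(10, 20, 30, 40)], "hflip": [(70, 20, 90, 40)]}, img_w=100, img_h=200
)
```

Because a flip is its own inverse, the same function serves for flipping annotations forward and for mapping predictions back.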

Ensembling

Ensemble the TTA predictions for each individual model.
Combine the ensembled predictions of all models into a unified set.
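
The README does not state which fusion method is used, so the following is only one plausible way to merge pooled predictions: a simplified, confidence-weighted box fusion. The tuple layout, the function name fuse_predictions, and the 0.55 IoU threshold are assumptions for this sketch.

```python
from typing import List, Sequence, Tuple

# Assumed layout: (label, confidence, xmin, ymin, xmax, ymax).
Pred = Tuple[str, float, float, float, float, float]


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_predictions(preds: List[Pred], iou_thr: float = 0.55) -> List[Pred]:
    """Cluster same-label boxes by IoU and average each cluster by confidence."""
    clusters: List[List[Pred]] = []
    # Visit confident boxes first so they seed the clusters.
    ranked = sorted((p for p in preds if p[1] > 0), key=lambda p: p[1], reverse=True)
    for pred in ranked:
        for cluster in clusters:
            seed = cluster[0]
            if seed[0] == pred[0] and iou(seed[2:], pred[2:]) >= iou_thr:
                cluster.append(pred)
                break
        else:
            clusters.append([pred])

    fused: List[Pred] = []
    for cluster in clusters:
        total = sum(p[1] for p in cluster)
        # Coordinates are averaged with confidence as the weight.
        coords = tuple(sum(p[1] * p[i] for p in cluster) / total for i in range(2, 6))
        fused.append((cluster[0][0], total / len(cluster), *coords))
    return fused


# Hypothetical pooled predictions from two models for one image.
pooled = [
    ("WBC", 0.8, 10, 10, 50, 50),
    ("WBC", 0.4, 12, 10, 52, 50),
    ("Trophozoite", 0.9, 10, 10, 50, 50),
]
fused = fuse_predictions(pooled)
```

The same function can be applied twice: first to the TTA variants of a single model, then to the per-model results.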

Postprocessing

Apply techniques defined in postprocessing.py:
- Use the NEG Model to adjust predictions by converting "NEG" predictions as needed.
- Refine bounding boxes, labels, and confidence scores.
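
How the NEG Model adjusts predictions is defined in the repository's postprocessing code and is not spelled out here. As a rough sketch of the idea, assume the NEG Model yields one probability per image and that an image judged negative should carry a single "NEG" row with an all-zero box. The function name, the threshold, and the placeholder row are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: (label, confidence, xmin, ymin, xmax, ymax).
Pred = Tuple[str, float, float, float, float, float]

# Hypothetical placeholder row for an image without parasites or WBCs.
NEG_ROW: Pred = ("NEG", 1.0, 0.0, 0.0, 0.0, 0.0)


def apply_neg_model(
    preds_by_image: Dict[str, List[Pred]],
    neg_prob_by_image: Dict[str, float],
    neg_thr: float = 0.5,
) -> Dict[str, List[Pred]]:
    """Replace an image's detections with one NEG row when it looks negative."""
    adjusted: Dict[str, List[Pred]] = {}
    for image_id, preds in preds_by_image.items():
        detections = [p for p in preds if p[0] != "NEG"]
        is_negative = neg_prob_by_image.get(image_id, 0.0) >= neg_thr
        # An image with no remaining detections is also treated as negative.
        adjusted[image_id] = [NEG_ROW] if is_negative or not detections else detections
    return adjusted


preds = {
    "a.jpg": [("WBC", 0.9, 1, 2, 3, 4)],
    "b.jpg": [("WBC", 0.3, 1, 2, 3, 4)],
    "c.jpg": [],
}
neg_probs = {"a.jpg": 0.1, "b.jpg": 0.95, "c.jpg": 0.2}
adjusted = apply_neg_model(preds, neg_probs)
```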

Submission

The final predictions are exported to submission.csv, ready for submission.
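
Writing the file itself is straightforward with the standard library. In the sketch below the column names and their order are assumptions; they should be checked against the sample submission file provided with the competition data.

```python
import csv
import io
from typing import Dict, List

# Assumed column layout; verify against the competition's sample submission.
COLUMNS = ["Image_ID", "class", "confidence", "ymin", "xmin", "ymax", "xmax"]


def submission_text(rows: List[Dict[str, object]]) -> str:
    """Render prediction rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_submission(rows: List[Dict[str, object]], path: str = "submission.csv") -> None:
    """Write the rows to disk; newline='' keeps the csv module's line endings."""
    with open(path, "w", newline="") as handle:
        handle.write(submission_text(rows))


# Hypothetical prediction row for illustration.
rows = [
    {
        "Image_ID": "id_xxxxxxxx.jpg",
        "class": "WBC",
        "confidence": 0.91,
        "ymin": 20,
        "xmin": 10,
        "ymax": 60,
        "xmax": 50,
    }
]
text = submission_text(rows)
```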

Notes

For detailed parameter settings and methods, refer to the postprocessing.py script.
For hyperparameters, refer to the config files located in config_files/detr_train_config_files and config_files/yolo_train_config_files.
The pipeline ensures an organized flow from raw data to final predictions.
Models were validated using our own mAP calculation, located in util/mAP_zindi.py.
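
For readers unfamiliar with the metric, the sketch below computes average precision for one class on one image. It is not the code in util/mAP_zindi.py: the 0.5 IoU threshold, the greedy matching rule, and the function names are assumptions, and a full mAP would pool predictions over all images and average over classes.

```python
from typing import List, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[float, Box]], gts: List[Box], iou_thr: float = 0.5
) -> float:
    """AP for one class; preds are (confidence, box) pairs."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits: List[bool] = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily match each prediction to its best still-unmatched ground truth.
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[i] and overlap > best_iou:
                best_iou, best_idx = overlap, i
        hit = best_idx >= 0 and best_iou >= iou_thr
        if hit:
            matched[best_idx] = True
        hits.append(hit)

    precisions: List[float] = []
    recalls: List[float] = []
    true_pos = 0
    for rank, hit in enumerate(hits, start=1):
        true_pos += hit
        precisions.append(true_pos / rank)
        recalls.append(true_pos / len(gts))
    # Precision envelope: make the curve non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
ap = average_precision(preds, gts)
```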

Getting started

This section describes the steps needed to set up the project and fully reproduce our best submission on the public and private leaderboards. The project was developed on Ubuntu 22.04 with Python 3.10.

Prerequisites

Models were trained on a machine with the following specifications:

CPU: AMD Ryzen 9 7950X 16-Core Processor
GPU: NVIDIA Quadro RTX 6000
RAM: 96GB
OS: Ubuntu 22.04
Python: 3.10.12
Estimated training time: 7-8 hours for the DETR, 2-3 hours for YOLO.

Clone the repository

Clone the repository with your favourite Git client or with the following command:

git clone https://github.com/TeamEpochGithub/ZindiLacunaMalaria.git

Install Python 3.10

You can download the required Python version here: Python 3.10

Install the required packages

Install the required packages (a virtual environment is recommended) using the following commands:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Set up the competition data

The data of the competition can be downloaded here: Lacuna Malaria Detection Challenge

Unzip all CSV files into the data/csv_files directory and all images into the data/img directory.

The structure should look like this:

data/
├── csv_files/
│   ├── Train.csv
│   └── Test.csv
└── img/
    ├── id_xxxxxxxx.jpg
    └── ...
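
Before training, it can help to confirm that the layout above is in place. The following is a small optional check, not part of the repository; the function name missing_paths is hypothetical and the expected paths are taken from the tree above.

```python
from pathlib import Path
from typing import List

# Paths expected relative to the repository root, per the tree above.
EXPECTED = [
    Path("data/csv_files/Train.csv"),
    Path("data/csv_files/Test.csv"),
    Path("data/img"),
]


def missing_paths(root: Path) -> List[Path]:
    """Return the expected paths that do not exist under root."""
    return [p for p in EXPECTED if not (root / p).exists()]


missing = missing_paths(Path("."))
if missing:
    print("Missing:", ", ".join(str(p) for p in missing))
else:
    print("Data layout looks complete.")
```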

Main file explanation

main.py: Runs the entire end-to-end solution described above.

Inference and Deployment

The script inference.py enables quick deployment and prediction.

Maintenance and monitoring

Regularly audit ETL pipelines for data integrity and scalability, and monitor model metrics (e.g., precision, recall, mAP) to detect performance drift. Automate retraining triggers and validate postprocessing logic, such as NEG model adjustments, to ensure consistent outputs.

Leverage cloud platforms for scalability, and use CI/CD pipelines for efficient model updates. Integrate new data into workflows and retrain models as needed. Implement drift detection and maintain documentation to support long-term usability and accessibility for healthcare practitioners. Note: if deploying in a phone app, use only the YOLO models.

Contributors

This repository was created by Team Epoch V, based in the Dream Hall of the Delft University of Technology.
