This is Team Epoch's solution to the Lacuna Malaria Detection Challenge, hosted by Zindi.
The public leaderboard score was 0.92801233 and the private leaderboard score was 0.92472582.
A technical report will be written and uploaded after the competition finale.
This solution aims to tackle malaria challenges in Africa by assisting doctors in rapidly diagnosing infections with minimal equipment. By automating the analysis of blood cell images, it reduces the need for manual examination of large datasets, allowing healthcare professionals to focus on treatment and care.
The dataset, provided by Zindi and the Lacuna Fund, includes approximately 3,000 microscope images of blood cells. The competition's goal is to identify and localize key objects within these images by drawing bounding boxes around three classes:
- White Blood Cells (WBCs)
- Trophozoites
- Negative (NEG)
This project strives to make a meaningful impact on malaria diagnostics, particularly in resource-limited settings.
This section outlines the process of extracting, transforming, modeling, and preparing predictions for submission in a machine learning workflow.
- Dataset Preparation:
  - Convert the dataset into YOLO format.
  - Filter bounding boxes using an IoU threshold to remove duplicate boxes.
- Data Extraction:
  - Source data is extracted from the provided CSV files.
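The two preparation steps can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes boxes arrive as pixel corner coordinates (xmin, ymin, xmax, ymax), and the function names and the 0.9 threshold are hypothetical placeholders.

```python
from typing import List, Tuple

# Assumed input layout: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def drop_duplicates(boxes: List[Box], iou_threshold: float = 0.9) -> List[Box]:
    """Keep a box only if it overlaps no already-kept box at or above the threshold.

    The default threshold is a placeholder, not the value used by the team.
    """
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, other) < iou_threshold for other in kept):
            kept.append(box)
    return kept


def to_yolo(box: Box, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a pixel corner box to YOLO's normalized (x_center, y_center, width, height)."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / img_w,
        (ymin + ymax) / 2 / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )
```

In a YOLO label file, each converted box would be written on its own line, preceded by the integer class index.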
Three models are trained to perform specific tasks:
- YOLO (11m): For general object detection.
- DETR: Another object detection model to complement YOLO.
- NEG Model: Specialized in identifying "NEG" images in predictions.
After training, each model is run with test-time augmentation (TTA) using the following transforms:
- Horizontal Flip
- Vertical Flip
- Horizontal and Vertical Flip
- No Flip
The predictions for each augmentation are ensembled for every model.
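For flip-based TTA, boxes predicted on a flipped image must be mapped back to the original frame before they can be ensembled. The sketch below shows one way to do that with NumPy. It assumes boxes are (xmin, ymin, xmax, ymax) in pixels, and `detect` is a hypothetical stand-in for any model's inference call. It only collects the boxes from all four views; the actual ensembling method used in this repository is not shown here.

```python
import numpy as np

# The four views listed above, as (flip_horizontal, flip_vertical).
TTA_VIEWS = [(False, False), (True, False), (False, True), (True, True)]


def flip_image(image: np.ndarray, horizontal: bool, vertical: bool) -> np.ndarray:
    """Flip an H x W (x C) image along the requested axes."""
    if horizontal:
        image = image[:, ::-1]
    if vertical:
        image = image[::-1, :]
    return image


def unflip_boxes(boxes: np.ndarray, width: int, height: int,
                 horizontal: bool, vertical: bool) -> np.ndarray:
    """Map boxes predicted on a flipped view back to the original image frame."""
    boxes = boxes.astype(float).copy()
    if horizontal:
        # x becomes width - x, so the old xmax yields the new xmin and vice versa.
        boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
    if vertical:
        boxes[:, [1, 3]] = height - boxes[:, [3, 1]]
    return boxes


def predict_with_tta(image: np.ndarray, detect) -> np.ndarray:
    """Run `detect` (hypothetical: image -> (N, 4) array) on every view.

    Returns all boxes stacked together in the original frame.
    """
    height, width = image.shape[:2]
    collected = []
    for horizontal, vertical in TTA_VIEWS:
        view = flip_image(image, horizontal, vertical)
        boxes = np.asarray(detect(view), dtype=float).reshape(-1, 4)
        collected.append(unflip_boxes(boxes, width, height, horizontal, vertical))
    return np.concatenate(collected, axis=0)
```

The collected boxes still contain up to four copies of each object, so a merging step (for example non-maximum suppression or box fusion) would follow.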
- Ensemble the TTA predictions for each individual model.
- Combine the ensembled predictions of all models into a unified set.
- Apply the techniques defined in postprocessing.py:
  - Use the NEG Model to adjust predictions by converting "NEG" predictions as needed.
  - Refine bounding boxes, labels, and confidence scores.
The final predictions are exported to submission.csv, ready for submission.
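The NEG adjustment and the export step might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the rule (an image flagged by the NEG model, or left without boxes, gets a single NEG row), the 0.5 threshold, the confidence of 1.0 for NEG rows, and the column layout. The actual logic lives in postprocessing.py.

```python
import csv
from typing import Dict, List

# Assumed submission layout; check the competition's sample submission.
COLUMNS = ["Image_ID", "class", "confidence", "ymin", "xmin", "ymax", "xmax"]


def apply_neg_model(detections: Dict[str, List[dict]],
                    neg_scores: Dict[str, float],
                    neg_threshold: float = 0.5) -> List[dict]:
    """Build submission rows, replacing flagged images with a single NEG row.

    detections: image id -> list of box dicts holding every column except Image_ID.
    neg_scores: image id -> hypothetical NEG-model probability that the image is negative.
    """
    rows: List[dict] = []
    for image_id, score in neg_scores.items():
        boxes = detections.get(image_id, [])
        if score >= neg_threshold or not boxes:
            # Placeholder row: zero-sized box and a confidence of 1.0 (assumption).
            rows.append({"Image_ID": image_id, "class": "NEG", "confidence": 1.0,
                         "ymin": 0, "xmin": 0, "ymax": 0, "xmax": 0})
        else:
            rows.extend({"Image_ID": image_id, **box} for box in boxes)
    return rows


def write_submission(rows: List[dict], path: str = "submission.csv") -> None:
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```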
- For detailed parameter settings and methods, refer to the postprocessing.py script.
- For hyperparameters, refer to the config files located in config_files/detr_train_config_files and config_files/yolo_train_config_files.
- The pipeline ensures an organized flow from raw data to final predictions.
- Models were validated using our own mAP calculation, located in util/mAP_zindi.py.
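For readers unfamiliar with the metric, the sketch below computes average precision (AP) for one class at a fixed IoU threshold; mAP is the mean of this value over classes. It is a generic textbook formulation, not the contents of util/mAP_zindi.py, and the 0.5 threshold and all function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], truths: List[Box],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class: area under the interpolated precision-recall curve.

    preds: (confidence, box) pairs; truths: ground-truth boxes of the same class.
    """
    if not truths:
        return 0.0
    matched = [False] * len(truths)
    hits: List[bool] = []
    # Greedily match predictions to unmatched ground truth, most confident first.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_iou = -1, iou_threshold
        for i, truth in enumerate(truths):
            overlap = iou(box, truth)
            if not matched[i] and overlap >= best_iou:
                best, best_iou = i, overlap
        if best >= 0:
            matched[best] = True
        hits.append(best >= 0)
    # Precision and recall after each prediction in rank order.
    precisions, recalls, true_pos = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        true_pos += int(hit)
        precisions.append(true_pos / rank)
        recalls.append(true_pos / len(truths))
    # Interpolate: precision at a rank is the best precision at that rank or later.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```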
This section contains the steps needed to get started with our project and fully reproduce our best submission on the public and private leaderboards. The project was developed on Ubuntu 22.04 with Python 3.10.
Models were trained on a machine with the following specifications:
- CPU: AMD Ryzen 9 7950X 16-Core Processor
- GPU: NVIDIA RTX Quadro 6000
- RAM: 96GB
- OS: Ubuntu 22.04
- Python: 3.10.12
- Estimated training time: 7-8 hours for DETR, 2-3 hours for YOLO.
Clone the repository with your favourite git client or with the following command:
git clone https://github.com/TeamEpochGithub/ZindiLacunaMalaria.git
You can install the required Python version here: Python 3.10
Install the required packages (a virtual environment is recommended) using the following commands:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The data of the competition can be downloaded here: Lacuna Malaria Detection Challenge
Unzip all CSV files into the data/csv_files directory and all images into the data/img directory.
The structure should look like this:
data/
├── csv_files/
│   ├── Train.csv
│   └── Test.csv
└── img/
    ├── id_xxxxxxxx.jpg
    └── ...
main.py runs the entire end-to-end solution described above.
The script inference.py enables quick deployment and prediction.
Regularly audit ETL pipelines for data integrity and scalability, and monitor model metrics (e.g., precision, recall, mAP) to detect performance drift. Automate retraining triggers and validate postprocessing logic, such as NEG model adjustments, to ensure consistent outputs.
Leverage cloud platforms for scalability, and use CI/CD pipelines for efficient model updates. Integrate new data into workflows and retrain models as needed. Implement drift detection and maintain documentation to support long-term usability and accessibility for healthcare practitioners. Note: use only the YOLO models if implementing for a phone app.
This repository was created by Team Epoch V, based in the Dream Hall of the Delft University of Technology.
Read more about this competition here.
Marcin Jarosz, Madhav Venkatesh, Felipe Bononi Bello, Kenzo Heijman.


