
Deep Learning Image Colorization Project

This project develops deep learning models that automatically colorize grayscale bird images. We explore two main approaches: a Naive CNN Model and an advanced GAN-based Model that produces more realistic colorized images.

Dataset

  • Source: UK Garden Birds Dataset from Kaggle
  • Content: 20 species of UK garden birds in natural settings
  • Image Size: All images resized to 128×128 pixels
  • Task: Convert grayscale input → RGB color output
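
As a reference for the task above, here is a minimal TensorFlow preprocessing sketch that builds a (grayscale input, RGB target) pair at 128×128; the exact pipeline lives in the notebooks:

```python
import tensorflow as tf

def make_pair(image):
    # Resize to 128x128 and normalize to [0, 1]
    image = tf.image.resize(image, (128, 128)) / 255.0
    gray = tf.image.rgb_to_grayscale(image)  # 128x128x1 model input
    return gray, image                       # 128x128x3 RGB target
```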

Project Structure

The project consists of three Jupyter notebooks:

1. Architecture 1 - Naive Model (Arch-1-ATDL-FinalProject.ipynb)

Contains three CNN autoencoder experiments with progressive improvements:

  • Experiment 1: Basic encoder-decoder with Conv2D and upsampling layers
  • Experiment 2: Enhanced with dropout layers to prevent overfitting
  • Experiment 3: Advanced model with batch normalization (best performing naive model)

Each model converts grayscale (1-channel) input to RGB (3-channel) output using MSE loss.
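
For orientation, a minimal Keras sketch in the style of Experiment 1's encoder-decoder; the filter counts and depth here are illustrative assumptions, not the notebook's exact values:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_naive_colorizer():
    inputs = keras.Input(shape=(128, 128, 1))  # grayscale input
    # Encoder: downsample with strided convolutions
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    # Decoder: upsample back to full resolution
    x = layers.UpSampling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # RGB output
    return keras.Model(inputs, outputs)
```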

2. Architecture 2 - GAN Model (Arc-2-ATDL-FinalProject.ipynb)

Advanced conditional GAN (CGAN) approach with two different generator architectures:

  • Model 1: Pix2Pix U-Net generator with PatchGAN discriminator (47 epochs)
  • Model 2: MobileNetV2 transfer learning generator with improved discriminator (115 epochs)

Uses adversarial loss + L1 reconstruction loss for realistic colorization.

Based on: Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134. https://arxiv.org/abs/1611.07004
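
A minimal sketch of that combined objective, assuming the standard Pix2Pix formulation with the λ=100 weight noted under Key Features:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # L1 weight, following Isola et al. (2017)

def generator_loss(disc_fake_logits, generated, target):
    adv = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)  # fool the discriminator
    l1 = tf.reduce_mean(tf.abs(target - generated))              # reconstruction term
    return adv + LAMBDA * l1
```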

3. Test Environment (Test-ATDL-FinalProject.ipynb)

Evaluation notebook that compares the two best models:

  • Architecture 1 Model 3: Best naive CNN model
  • Architecture 2 GAN Model: Best GAN-based model

Downloads pre-trained weights and provides side-by-side comparison on test images.
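
For illustration, the download step amounts to a gdown call along these lines; the file ID below is a placeholder, and the notebook sets the project's actual Google Drive IDs:

```python
import gdown

# Placeholder ID; the real IDs are defined inside Test-ATDL-FinalProject.ipynb
url = "https://drive.google.com/uc?id=<FILE_ID>"
gdown.download(url, "pretrained_model.h5", quiet=False)
```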

How to Run

Prerequisites

  • Google Colab Account (Recommended environment)
  • Kaggle Account for dataset access
  • Libraries are automatically installed via !pip install commands in the notebooks

Running the Notebooks in Google Colab

  1. Upload notebooks to Google Drive or open directly in Colab

  2. Start with Architecture 1 (Naive Model):

    • Open Arch-1-ATDL-FinalProject.ipynb in Google Colab
    • Run the installation cells to install required libraries
    • Authenticate with Kaggle when prompted (the dataset download is sketched just after this list)
    • Run cells sequentially to train all 3 experiments
  3. Run Architecture 2 (GAN Model):

    • Open Arc-2-ATDL-FinalProject.ipynb in Google Colab
    • Run installation and authentication cells
    • Train the Pix2Pix U-Net model (Model 1) for approximately 47 epochs
    • Train the MobileNetV2 model (Model 2) for approximately 115 epochs
  4. Test and Compare Models:

    • Open Test-ATDL-FinalProject.ipynb in Google Colab
    • Pre-trained models are automatically downloaded via gdown
    • Upload test images for colorization comparison
    • View side-by-side results of both architectures
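
The Kaggle dataset download referenced in the steps above typically reduces to a single kagglehub call; a sketch, assuming the dataset handle from the References section:

```python
import kagglehub

# Requires Kaggle authentication; downloads and caches the dataset locally
path = kagglehub.dataset_download("davemahony/20-uk-garden-birds")
print("Dataset files at:", path)
```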

Local Environment Setup (Alternative)

If running locally, ensure you have:

  • Python 3.7+
  • TensorFlow/Keras
  • Required libraries: numpy, matplotlib, pillow, scipy, kagglehub, gdown
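
An unpinned install covering the list above might look like this (the repo does not pin exact versions):

```bash
pip install tensorflow numpy matplotlib pillow scipy kagglehub gdown
```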

Key Features

Naive CNN Models

  • Simple Architecture: Basic encoder-decoder structure
  • Progressive Enhancement: Adding dropout and batch normalization
  • Fast Training: 20-50 epochs depending on experiment
  • Loss Function: Mean Squared Error (MSE)

GAN Models

  • Advanced Architecture: U-Net generators with discriminators
  • Transfer Learning: MobileNetV2 pre-trained encoder
  • Sophisticated Loss: Adversarial + L1 reconstruction loss (λ=100)
  • Extended Training: 47-115 epochs for optimal results
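
For reference, a minimal sketch of a PatchGAN-style conditional discriminator consistent with this description; layer sizes are assumptions, and the notebooks define the exact architectures:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_patchgan_discriminator():
    gray = keras.Input(shape=(128, 128, 1))   # conditioning grayscale input
    color = keras.Input(shape=(128, 128, 3))  # real or generated RGB image
    x = layers.Concatenate()([gray, color])
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    # One logit per patch rather than a single scalar per image
    logits = layers.Conv2D(1, 4, padding="same")(x)
    return keras.Model([gray, color], logits)
```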

Training Details

Architecture 1 Training

  • Optimizer: Adam (lr=2e-4)
  • Batch Size: 32
  • Epochs: 20-50 per experiment
  • Data Augmentation: Basic preprocessing

Architecture 2 Training

  • Generator Optimizer: Adam (lr=2e-4, β₁=0.5)
  • Discriminator Optimizer: Adam (lr=1e-4, β₁=0.5)
  • Batch Size: 10
  • Advanced Augmentation: Random jitter, horizontal flip, rotations
  • Training Strategy: Progressive phases with manual evaluation
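
A minimal sketch of the jitter-and-flip part of this augmentation, following the Pix2Pix recipe scaled to 128×128; the resize margin is an assumption, and rotations are omitted here:

```python
import tensorflow as tf

def augment(gray, color):
    # Concatenate so input and target receive identical random transforms
    pair = tf.concat([gray, color], axis=-1)
    pair = tf.image.resize(pair, (144, 144))               # upscale for jitter
    pair = tf.image.random_crop(pair, size=(128, 128, 4))  # random 128x128 crop
    pair = tf.image.random_flip_left_right(pair)
    return pair[..., :1], pair[..., 1:]
```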

Results Summary

Best Performing Models

  1. Architecture 1 Experiment 3: Best naive approach with batch normalization
  2. Architecture 2 Model 2: MobileNetV2-based generator (115 epochs)

Key Observations

  • Naive Models: Fast training but limited color saturation and detail
  • GAN Models: Superior visual quality with more realistic and vibrant colorization
  • Transfer Learning: MobileNetV2 encoder significantly improved feature extraction
  • Training Balance: Careful discriminator-generator balance crucial for stability

Interface Preview

(Image: Colorization Interface Demo)

Files Structure

BirdsColoring/
├── Arch-1-ATDL-FinalProject.ipynb    # Naive CNN models (Google Colab)
├── Arc-2-ATDL-FinalProject.ipynb     # GAN-based models (Google Colab)  
├── Test-ATDL-FinalProject.ipynb      # Model comparison (Google Colab)
├── README.md                          # This file
├── LICENSE                            # MIT License
├── THIRD_PARTY_LICENSES.md            # Third-party library licenses
└── .gitignore                         # Git ignore rules

Important Notes

  • Environment: Notebooks are optimized for Google Colab with GPU support
  • Model Downloads: Pre-trained models are hosted on Google Drive and downloaded via gdown
  • Dataset Access: Requires Kaggle authentication for UK Garden Birds dataset
  • Training Time: GAN models require significant computational resources (GPU recommended)

References

  • Pix2Pix: Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134. https://arxiv.org/abs/1611.07004

  • MobileNetV2: Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510-4519. https://arxiv.org/abs/1801.04381

  • Dataset: Mahony, D. (2020). 20 UK Garden Birds. Kaggle. https://www.kaggle.com/davemahony/20-uk-garden-birds

Authors

  • Inon Elgabsi
  • Nati Forish
  • Iyar Gadolov
  • Roy Edri

License

This project is licensed under the MIT License - see the LICENSE file for details.

For third-party library licenses, see THIRD_PARTY_LICENSES.md.


This project demonstrates the evolution from basic CNN autoencoders to sophisticated GAN architectures for image colorization, showing significant improvements in output quality through advanced deep learning techniques.
