
Machine Learning Competitive Projects

Machine Learning projects for the competitive AI course. Both concern mixed-image separation: one focuses on classification, the other on reconstruction.

Project 1: Image Reconstruction (MNIST + Fashion-MNIST)

Description

This project aims to separate an image, obtained as the sum of two images, into its original components. The two source images, img1 and img2, come from different datasets: MNIST and Fashion-MNIST, respectively. No preprocessing is allowed. The neural network receives the combined image (img1 + img2) as input and returns the predictions (hat_img1 and hat_img2). Performance is evaluated using the mean squared error (MSE) between the predicted and reference images. Both datasets are grayscale; for simplicity, all samples are brought to a resolution of 32x32.
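The pairing of samples into a single network input can be sketched as follows. This is a minimal NumPy sketch, not the repository's actual data pipeline: the function name `combine` and the choice of zero-padding the 28x28 sources to 32x32 are assumptions made for illustration.

```python
import numpy as np

def combine(img1, img2):
    """Combine an MNIST digit and a Fashion-MNIST sample into one input.

    Both images are assumed to be 28x28 grayscale arrays in [0, 1];
    they are zero-padded to 32x32 (a hypothetical choice) and summed,
    matching the (img1 + img2) setup described above.
    """
    def pad(img):
        out = np.zeros((32, 32), dtype=np.float32)
        out[2:30, 2:30] = img  # center the 28x28 image in a 32x32 canvas
        return out

    x1, x2 = pad(img1), pad(img2)
    combined = x1 + x2           # the network input
    return combined, (x1, x2)    # hat_img1 / hat_img2 are trained against x1, x2
```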

Architecture

A CNN with an encoder-decoder structure, enhanced by residual blocks, attention modules, and dilated convolutions. Inspired by U-Net, it integrates double residual blocks for richer representations and attention mechanisms for improved focus on critical details.
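The README does not spell out the block design, so the following is only a plausible Keras sketch of one "double residual block": two conv-BN-ReLU pairs with a dilated convolution option, gated by a simple squeeze-and-excitation style channel attention. The exact attention mechanism and layer counts are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def double_residual_block(x, filters, dilation=1):
    """Hypothetical double residual block with dilated convolutions
    and a squeeze-and-excitation style channel-attention gate."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = x
    for _ in range(2):  # "double" residual: two conv-BN-ReLU pairs
        y = layers.Conv2D(filters, 3, padding="same", dilation_rate=dilation)(y)
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
    # channel attention: reweight feature maps by global context
    w = layers.GlobalAveragePooling2D()(y)
    w = layers.Dense(filters // 4, activation="relu")(w)
    w = layers.Dense(filters, activation="sigmoid")(w)
    y = layers.Multiply()([y, layers.Reshape((1, 1, filters))(w)])
    return layers.Add()([shortcut, y])
```

In a U-Net-style encoder-decoder, blocks like this would sit between the down- and up-sampling stages, with skip connections carrying encoder features to the decoder.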

Lambda Layer for Fashion-MNIST

One output is derived from the other using input * 2 - output, allowing the model to focus on optimizing a single output.
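This trick can be expressed as a Keras Lambda layer. Note the `input * 2 - output` relation implies the network input is scaled as the average of the two sources; the stand-in Conv2D decoder below is purely illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(32, 32, 1))
# stand-in for the real encoder-decoder that predicts hat_img1
hat_img1 = layers.Conv2D(1, 3, padding="same")(inp)
# derive the second output from the first: input * 2 - output
hat_img2 = layers.Lambda(lambda t: t[0] * 2 - t[1])([inp, hat_img1])
model = tf.keras.Model(inp, [hat_img1, hat_img2])
```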

Loss Function – FocalEdgeLoss

A combination of focally weighted MSE (to emphasize harder errors) and edge loss (using Sobel filters) to preserve structural details, crucial for Fashion-MNIST.
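A minimal sketch of such a loss in TensorFlow, assuming the focal weighting raises the per-pixel error magnitude to a power gamma; the function name, gamma, and the edge-term weight are illustrative, not the repository's actual values.

```python
import tensorflow as tf

def focal_edge_loss(y_true, y_pred, gamma=1.0, edge_weight=0.5):
    """Focally weighted MSE plus a Sobel-based edge loss (sketch).

    Inputs are 4D tensors [batch, height, width, channels]."""
    diff = y_true - y_pred
    # focal weighting: larger errors get proportionally larger weight
    focal_mse = tf.reduce_mean(tf.pow(tf.abs(diff), gamma) * tf.square(diff))
    # edge loss: match Sobel gradients to preserve structural detail
    e_true = tf.image.sobel_edges(y_true)
    e_pred = tf.image.sobel_edges(y_pred)
    edge = tf.reduce_mean(tf.square(e_true - e_pred))
    return focal_mse + edge_weight * edge
```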

Results

MSE: 0.0002627953857881948
Standard deviation: 4.547221980832357e-06

Best model in the competition


Project 2: Classification (CIFAR-10)

Description

The model takes as input an image obtained as the average of two random samples from CIFAR-10 and must predict the categories of the two components. The first image belongs to the first 5 categories (airplane, automobile, bird, cat, deer), while the second belongs to the remaining 5 (dog, frog, horse, ship, truck). The model must return two labels, each taking one of 5 values. The evaluation metric is the classification accuracy computed for each of the two component images, then averaged. It is evaluated on 10000 inputs generated from the test data; the calculation is repeated 10 times and the standard deviation is reported.
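The evaluation protocol above can be sketched in NumPy. Both callables here (`model_predict` and `make_test_batch`) are placeholders for the actual trained model and test-set generator, which the README does not show.

```python
import numpy as np

def competition_metric(model_predict, make_test_batch, n_inputs=10000, n_runs=10):
    """Mean per-output accuracy averaged over the two heads,
    repeated n_runs times; returns (mean, standard deviation).

    model_predict(x) -> (labels1, labels2); make_test_batch(n) -> (x, y1, y2).
    """
    scores = []
    for _ in range(n_runs):
        x, y1, y2 = make_test_batch(n_inputs)
        p1, p2 = model_predict(x)
        acc1 = np.mean(p1 == y1)   # accuracy on the first component
        acc2 = np.mean(p2 == y2)   # accuracy on the second component
        scores.append((acc1 + acc2) / 2)
    return float(np.mean(scores)), float(np.std(scores))
```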

Overview

The model follows a VGG16-style CNN for 32x32x3 inputs, using 3x3 convolutions, ReLU, BatchNorm, Dropout (0.3-0.5), and MaxPooling to extract features.

Architecture:

  • Blocks 1-2: 64 and 128 filters, Dropout → MaxPooling.
  • Block 3: 256 filters, three convolutions, Dropout → MaxPooling.
  • Blocks 4-5: 512 filters, six convolutions, Dropout → MaxPooling.
  • Final layers: Flatten → Dense (512, ReLU, BatchNorm, Dropout) → two Softmax outputs (output_1, output_2).

Trained with categorical crossentropy and early stopping for up to 500 epochs, achieving ~84% accuracy. A lighter version (32 → 256 filters) reduces parameters while maintaining ~79% accuracy.
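The overall shape of the lighter variant can be sketched in Keras. The per-block convolution counts and dropout rates below are illustrative guesses within the ranges stated above, not the repository's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(dropout=0.3):
    """Sketch of the lighter VGG-style variant (32 -> 256 filters)
    with two 5-way softmax heads (output_1, output_2)."""
    inp = layers.Input(shape=(32, 32, 3))
    x = inp
    for filters, n_convs in [(32, 2), (64, 2), (128, 3), (256, 3)]:
        for _ in range(n_convs):
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.BatchNormalization()(x)
        x = layers.Dropout(dropout)(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    out1 = layers.Dense(5, activation="softmax", name="output_1")(x)
    out2 = layers.Dense(5, activation="softmax", name="output_2")(x)
    model = tf.keras.Model(inp, [out1, out2])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```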

Results

Mean accuracy: 0.832255
Standard deviation: 0.002102552020759534

Not the best model in the competition, but in the top five; the best entries used transfer learning from large pretrained models such as EfficientNet.
