Open-World Semantic Segmentation Including Class Similarity

This is the code repository of the paper Open-World Semantic Segmentation Including Class Similarity, accepted to the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024.

You can find the paper here.

Objective & Goals of Research

In this project, we aim to improve Open-World Segmentation (OWS) on the Cityscapes and BDDAnomaly datasets.

We reproduce and extend the work of Sodano et al.

Installation

Install the libraries listed in requirements.yml, or create a conda environment with conda env create -f requirements.yml and activate it with conda activate openworld.

The weights of ResNet34 with NonBottleneck 1D block pretrained on ImageNet are available here.

Training

You can set your preferred hyperparameter configuration in args.py. To train, run python train.py --id <your_id> --dataset_dir <your_data_dir> --num_classes <N> --batch_size 8.

The expected data structure follows that of Cityscapes. BDDAnomaly has been converted to the Cityscapes format.
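As a quick sanity check before training, the sketch below pairs images with their label maps under a dataset root. It assumes the standard Cityscapes layout (leftImg8bit/<split>/<city>/*_leftImg8bit.png and gtFine/<split>/<city>/*_gtFine_labelIds.png). The layout that this repository's loaders actually expect may differ (see src/datasets/cityscapes), and find_pairs is a hypothetical helper that is not part of the codebase.

```python
from pathlib import Path


def find_pairs(root, split="train"):
    """Return (image, label) path pairs, assuming the standard Cityscapes layout."""
    root = Path(root)
    suffix = "_leftImg8bit.png"
    pairs = []
    # Images live in <root>/leftImg8bit/<split>/<city>/<name>_leftImg8bit.png
    for img in sorted((root / "leftImg8bit" / split).glob(f"*/*{suffix}")):
        city = img.parent.name
        stem = img.name[: -len(suffix)]
        # The matching label map is assumed to sit under gtFine with the same city and stem.
        label = root / "gtFine" / split / city / f"{stem}_gtFine_labelIds.png"
        if label.exists():
            pairs.append((img, label))
    return pairs
```

Calling find_pairs(<your_data_dir>) and checking that the returned list is non-empty is a cheap way to catch a wrong --dataset_dir before a long training run.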

Repo structure:

📦 
├─ .gitignore
├─ README.md
├─ fyp
│  ├─ .gitignore
│  ├─ .vscode
│  │  └─ launch.json
│  ├─ README.md
│  ├─ mavs.pickle
│  ├─ overfitall.sh
│  ├─ proprocessing_unknown_known.py
│  ├─ requirements.yml
│  ├─ scripts
│  │  └─ run_all.pbs
│  ├─ src
│  │  ├─ __init__.py
│  │  ├─ args.py
│  │  ├─ build_model.py
│  │  ├─ datasets
│  │  │  ├─ __init__.py
│  │  │  ├─ cityscapes
│  │  │  │  ├─ README.md
│  │  │  │  ├─ __init__.py
│  │  │  │  ├─ cityscapes.py
│  │  │  │  ├─ prepare_dataset.py
│  │  │  │  ├─ pytorch_dataset.py
│  │  │  │  ├─ requirements.txt
│  │  │  │  ├─ weighting_linear_1+16_val.pickle
│  │  │  │  ├─ weighting_linear_1+17_val.pickle
│  │  │  │  ├─ weighting_linear_1+19_train.pickle
│  │  │  │  ├─ weighting_linear_1+19_val.pickle
│  │  │  │  ├─ weighting_linear_1+19_valid.pickle
│  │  │  │  └─ weighting_median_frequency_1+19_train.pickle
│  │  │  └─ dataset_base.py
│  │  ├─ losses
│  │  │  ├─ __init__.py
│  │  │  ├─ ce_loss.py
│  │  │  ├─ contrastive_loss.py
│  │  │  ├─ dice_loss.py
│  │  │  ├─ focal_loss.py
│  │  │  ├─ objectosphere_loss.py
│  │  │  └─ ow_loss.py
│  │  ├─ models_v2
│  │  │  ├─ __init__.py
│  │  │  ├─ context_modules.py
│  │  │  ├─ decoder.py
│  │  │  ├─ model.py
│  │  │  ├─ model_utils.py
│  │  │  ├─ neck.py
│  │  │  ├─ resnet.py
│  │  │  └─ tru_for_decoder.py
│  │  ├─ prepare_data.py
│  │  ├─ preprocessing.py
│  │  └─ utils.py
│  ├─ start.sh
│  ├─ train.py
│  └─ vars.pickle
├─ scripts
│  ├─ downlioad_cityscapes.sh
│  ├─ run.sh
│  └─ run_all.pbs
└─ vars.pickle

Original Model

We implement the encoder-decoder architecture defined in the ContMAV paper. We initially use ResNet34 and train our model for 500 epochs with a learning rate of 0.004, as in the paper, on the Cityscapes dataset. Cityscapes is a large dataset of street-level images with pixel-level annotations for 19 classes, split into training, validation, and test sets.

TruFor

We aim to improve the performance of the model by changing the architecture. We add a second decoder, taken from Guillaro et al., and combine the logits from both decoders.
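The text does not say how the two sets of logits are combined. One simple possibility, shown here purely as an illustration, is a learnable convex combination. FusedDecoders and its mix parameter are hypothetical names, and the 1x1 convolutions stand in for the real decoders.

```python
import torch
import torch.nn as nn


class FusedDecoders(nn.Module):
    """Blend the logits of two decoders that see the same features (illustrative only)."""

    def __init__(self, decoder_a, decoder_b):
        super().__init__()
        self.decoder_a = decoder_a
        self.decoder_b = decoder_b
        # Learnable scalar; the sigmoid keeps the mixing weight in (0, 1), starting at 0.5.
        self.mix = nn.Parameter(torch.zeros(1))

    def forward(self, features):
        logits_a = self.decoder_a(features)
        logits_b = self.decoder_b(features)
        w = torch.sigmoid(self.mix)
        return w * logits_a + (1.0 - w) * logits_b


# Stand-in decoders: 1x1 convolutions mapping 8 feature channels to 19 classes.
model = FusedDecoders(nn.Conv2d(8, 19, kernel_size=1), nn.Conv2d(8, 19, kernel_size=1))
fused_logits = model(torch.randn(2, 8, 4, 4))  # shape (2, 19, 4, 4)
```

A fixed average or a concatenation followed by a 1x1 convolution would be equally plausible choices; the learnable scalar is used here only because it adds a single parameter.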

Different Loss Functions

We also try using different loss functions:

Focal Loss
Focal Loss with Dice Loss
InfoNCE Loss for Anomaly Detection
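The first two options can be sketched as follows. This is a minimal version of the standard focal and soft Dice formulations, not the code in src/losses; the function names, the default gamma, and the dice_weight parameter are assumptions, and void or ignore labels are not handled.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss. logits: (N, C, H, W); target: (N, H, W) with class ids."""
    log_p = F.log_softmax(logits, dim=1)
    # Log-probability assigned to the true class of every pixel.
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    # Well-classified pixels (pt close to 1) are down-weighted by (1 - pt) ** gamma.
    return (-(1.0 - pt) ** gamma * log_pt).mean()


def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss averaged over classes. Assumes every target id is in [0, C)."""
    num_classes = logits.size(1)
    probs = logits.softmax(dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # sum over batch and spatial dimensions, keep the class dimension
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()


def focal_dice_loss(logits, target, gamma=2.0, dice_weight=1.0):
    """Focal loss plus a weighted Dice term."""
    return focal_loss(logits, target, gamma) + dice_weight * dice_loss(logits, target)
```

With gamma set to 0 the focal term reduces to ordinary cross-entropy, which is a convenient check when wiring it into a training loop.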

InfoNCE Loss

We extract crops from the image, separating regions of known and unknown classes. We then use the InfoNCE loss to train the model to distinguish between the two. The InfoNCE loss is defined as:

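import torch
import torch.nn.functional as F

def infonce_loss(known, unknown):
    # L2-normalise so that the dot products below are cosine similarities.
    known = F.normalize(known, dim=1)
    unknown = F.normalize(unknown, dim=1)
    # logits[i, j] is the similarity between known[i] and unknown[j].
    logits = torch.mm(known, unknown.t())
    # Column i is the target for row i, so unknown needs at least as many rows as known.
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)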

This lets the model learn to distinguish between known and unknown classes, improving its performance on the OWS task. It is especially useful because samples of unknown classes are usually imbalanced.
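For comparison, the more common formulation of InfoNCE takes an anchor, a positive, and a set of negatives, and scales the similarities by a temperature. The sketch below shows that general form and is not the repository's implementation. The name info_nce, the temperature default, and the idea of using unknown-crop embeddings as the negatives for known-crop anchors are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def info_nce(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (N, D); negatives: (K, D), shared by all anchors."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    negatives = F.normalize(negatives, dim=1)
    # Cosine similarity of each anchor with its own positive: shape (N, 1).
    pos_logit = (anchor * positive).sum(dim=1, keepdim=True)
    # Cosine similarity of each anchor with every negative: shape (N, K).
    neg_logits = anchor @ negatives.t()
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    # The positive always sits in column 0.
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

In this form the loss falls as anchors move towards their positives and away from the negatives, and the number of negatives does not have to match the batch size.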

SynCo

From the SynCo paper, we adopt the idea of generating hard negatives from the known classes.

Which is better?

Using the known classes to generate hard negatives for the unknown classes.
Using the unknown classes to generate hard negatives for the known classes.

Both variants are meant to help the model distinguish between known and unknown classes, improving its performance on the OWS task.
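One way to synthesise hard negatives in feature space is to interpolate between a query embedding and its most similar negatives. The sketch below shows only this interpolation strategy and is an assumption about how the idea could be applied here, not the repository's code. The name synthesize_hard_negatives and the parameters num_hard and alpha_max are hypothetical.

```python
import torch
import torch.nn.functional as F


def synthesize_hard_negatives(query, negatives, num_hard=4, alpha_max=0.5):
    """query: (D,); negatives: (K, D). Returns up to num_hard unit-norm synthetic negatives."""
    query = F.normalize(query, dim=0)
    negatives = F.normalize(negatives, dim=1)
    # The hardest negatives are the ones most similar to the query.
    sims = negatives @ query
    k = min(num_hard, negatives.size(0))
    hard = negatives[sims.topk(k).indices]
    # Move each hard negative a random fraction (below alpha_max) of the way to the query.
    alpha = torch.rand(k, 1, device=query.device) * alpha_max
    synthetic = alpha * query + (1.0 - alpha) * hard
    return F.normalize(synthetic, dim=1)
```

For the first direction above, query would be an unknown-class embedding and negatives the known-class embeddings; swapping the two roles gives the second direction.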
