Note: The link to our main AAAI 2025 paper is here. Supplementary materials are available in this project directory. Dataset access requires an application and a signed usage agreement.
In this work, we first built a large dataset called SLP34K. The dataset was collected from a real-world waterway intelligent traffic surveillance system: images were captured by eight surveillance cameras at eight different locations over 42 months. Thanks to the multiple camera locations and the long capture period, the resulting dataset contains 34,385 SLP images of high diversity.
We also propose a simple yet effective SLP recognition baseline built around a strong visual encoder. The strong encoder is obtained through two strategies. First, we pre-train the visual encoder with self-supervised learning, which enables it to learn robust visual representations without labeled data. Second, we refine the encoder with a semantic enhancement module, which fine-tunes it via a contrastive learning mechanism to enhance the semantic consistency between an SLP image and its corresponding text label.
- Towards Ship License Plate Recognition in the Wild: A Large Benchmark and Strong Baseline
- Table of Contents
- 1. Runtime environment
- 2. Dataset and pre-trained models download
- 3. Model training
- 4. Local online testing with Gradio
git clone https://github.com/HuiGuanLab/SLP34K.git
We use Anaconda as the virtual environment tool for model training. Below we introduce the environment setup steps for self-supervised pre-training and SLP recognition fine-tuning, together with the required installation configuration files. Because some package versions conflict, pre-training and fine-tuning are carried out in two separate virtual environments.
cd SLP34K/mae
conda create -n slk34k_mae python=3.8
conda activate slk34k_mae
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
Attention: our pre-training code requires timm version 0.3.2 (timm==0.3.2). Therefore, if your PyTorch version is higher than 1.8.1, you may need to make some corresponding code modifications. For more details, please refer to this issue.
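For reference, a workaround commonly reported for this incompatibility (please verify it against the issue above) is to patch the `torch._six` import inside the installed timm 0.3.2 package, roughly as follows:

```python
# timm/models/layers/helpers.py (timm==0.3.2)
# On newer PyTorch versions, torch._six no longer provides container_abcs,
# so the original import fails:
#   from torch._six import container_abcs
# A commonly suggested replacement is the standard-library module:
import collections.abc as container_abcs
```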
cd SLP34K/ocr_training
conda create --name slk34k_rec python=3.9
conda activate slk34k_rec
pip install -r requirements.txt
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
Before formally accessing the dataset, applicants must sign a dataset license agreement (applicants within mainland China should sign and send the Chinese version of the dataset license agreement).
Please print the license agreement, sign it by hand, scan the signed document, and email it to us from your official institutional email address to apply for dataset access (mailto: liubaolongx@gmail.com or dongjf24@gmail.com).
Since our data is stored on Google Drive, you will first need to register for a Google Drive account.
Then, please send the dataset license agreement along with your Google Drive account details to us via the application email. Upon approval of the license agreement, we will grant your account access to the dataset.
| Dataset | File Size | Format |
|---|---|---|
| SLP34K.7z | 153.8 MB | JPG |
| data.7z | 6.36 GB | lmdb |
SLP34K.7z contains the raw SLP sample images, and data.7z contains the corresponding samples in LMDB format.
Download the data.7z file, extract it, and place the contents in ocr_training/data. The directory structure of the dataset is as follows.
|--data
|--test
|--SixCommon_benchmark
|--CUTE80
|--IC13_1015
|--IC13_2077
|--III5K
|--SVT
|--SVTP
|--SLP34K_lmdb_train
|--IV_lmdb
|--low_lmdb
|--multi-lines_lmdb
|--normal_lmdb
|--OOV_lmdb
|--single-line_lmdb
|--vertical_lmdb
|--Union14m_benchmark
|--artistic
|--...
|--train
|--SLP34K_lmdb_train
|--Union14m-L
|--...
|--val
|--SLP34K_lmdb_test
|--Union14m_benchmark
|--..
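If you want to sanity-check an extracted LMDB shard before training, the sketch below assumes the STR-style LMDB layout that is widely used by text recognition codebases (keys `num-samples`, `image-%09d`, `label-%09d`); the path and key names are assumptions, so adjust them if your copy of the data differs:

```python
import io
import lmdb
from PIL import Image

# Hypothetical path; point this at any extracted shard under ocr_training/data.
db_path = "ocr_training/data/train/SLP34K_lmdb_train"

env = lmdb.open(db_path, readonly=True, lock=False, readahead=False, meminit=False)
with env.begin(write=False) as txn:
    num_samples = int(txn.get(b"num-samples"))
    print(f"{db_path}: {num_samples} samples")

    # Inspect the first sample (indices are 1-based in this layout).
    img_bytes = txn.get(b"image-000000001")
    label = txn.get(b"label-000000001").decode("utf-8")
    img = Image.open(io.BytesIO(img_bytes)).convert("RGB")
    print("first label:", label, "| image size:", img.size)
```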
| Checkpoint | File Size | Usage | Google Drive |
|---|---|---|---|
| pretrain_model.7z | 2.2GB | Pre-trained weights | Google Drive |
| checkpoint.7z | 2.7GB | SLP recognition | Google Drive |
Download the weights above, unzip them, and place them in the ocr_training/ directory.
Our baseline method was trained and evaluated on both the SLP34K dataset and the publicly available text recognition dataset Union14M. Below we describe how to train and test the model on these two datasets.
Please make sure that data.7z has been successfully extracted.
cd ./mae
mkdir pretrain_data
cp -r ../ocr_training/data/train/SLP34K_lmdb_train ./pretrain_data
CUDA_VISIBLE_DEVICES=0,1 python3 -m torch.distributed.launch --nproc_per_node=2 \
main_pretrain.py --data_path pretrain_data/SLP34K_lmdb_train \
--mask_ratio 0.75 --model mae_vit_base_patch16_224x224 --output_dir pretrain_SLP34K_patch16_224x224 \
--log_dir pretrain_SLP34K_patch16_224x224 --batch_size 128 --norm_pix_loss --epochs 1500 \
--warmup_epochs 40 --blr 1.5e-4 --weight_decay 0.05
cd ./mae
CUDA_VISIBLE_DEVICES=1,2 python3 -m torch.distributed.launch --nproc_per_node=2 \
main_pretrain.py --data_path pretrain_data/Union14M-U/book32_lmdb \
pretrain_data/Union14M-U/cc_lmdb/ pretrain_data/Union14M-U/openvino_lmdb/ \
--mask_ratio 0.75 --model mae_vit_base_patch4_32x128 --output_dir pretrain_Union14M_patch4_32x128 \
--log_dir pretrain_Union14M_patch4_32x128 --batch_size 50 --norm_pix_loss --epochs 20 \
--warmup_epochs 40 --blr 1.5e-4 --weight_decay 0.05
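For intuition about the `--mask_ratio 0.75` flag used in both pre-training commands above, the sketch below illustrates MAE-style random patch masking (keep 25% of the patches visible to the encoder and reconstruct the rest). It is a simplified illustration, not the code in `main_pretrain.py`:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (batch, num_patches, dim). Returns visible patches and the binary mask."""
    b, n, d = patches.shape
    num_keep = int(n * (1.0 - mask_ratio))

    # Random permutation per sample; keep the first num_keep patches.
    noise = torch.rand(b, n)
    ids_shuffle = torch.argsort(noise, dim=1)
    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).repeat(1, 1, d))

    # mask: 1 = masked (to be reconstructed), 0 = visible to the encoder.
    mask = torch.ones(b, n)
    mask.scatter_(1, ids_keep, 0.0)
    return visible, mask

# Example: a 224x224 image with 16x16 patches gives 196 patches; 49 stay visible.
dummy = torch.randn(2, 196, 768)
visible, mask = random_masking(dummy, 0.75)
print(visible.shape, mask.sum(dim=1))  # torch.Size([2, 49, 768]), 147 masked per sample
```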
After the first stage of pre-training is complete, we move to the second stage: fine-tuning for SLP recognition. Please follow the commands below to complete the fine-tuning.
cd ./ocr_training
HYDRA_FULL_ERROR=1 CUDA_VISIBLE_DEVICES=0,1 python train.py model.img_size=[224,224] \
charset=SLP34K_568 dataset=SLP34K model.batch_size=60 trainer.gpus=2 \
trainer.val_check_interval=200 model=maevit_infonce_plm trainer.max_epochs=100 \
hydra.run.dir=outputs/ship/maevit_infonce_plm model.max_label_length=50 \
+trainer.accumulate_grad_batches=5 model.mae_pretrained_path=./pretrain_model/ship/224x224_pretrain_ship_vit_checkpoint-1499.pth
cd ./ocr_training
HYDRA_FULL_ERROR=1 CUDA_VISIBLE_DEVICES=0,1 python train.py model.img_size=[32,128] \
charset=union14m dataset=union14m model.batch_size=90 trainer.gpus=2 \
trainer.val_check_interval=200 model=maevit_infonce_plm trainer.max_epochs=10 \
hydra.run.dir=outputs/union14m/maevit_infonce_plm model.max_label_length=25 \
+trainer.accumulate_grad_batches=5 model.mae_pretrained_path=./pretrain_model/union14m/32x128_pretrain_union14m_vit_checkpoint_20.pth
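In both fine-tuning commands, `model=maevit_infonce_plm` selects the configuration with the semantic enhancement module described earlier, which aligns the image representation with the embedding of its text label through a contrastive (InfoNCE) objective. The sketch below shows that objective in a generic, simplified form; the actual module in this repository may differ in its details:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    """Symmetric InfoNCE over a batch of paired image/label embeddings, shape (batch, dim)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)

    # Similarity of every image to every label in the batch; the diagonal holds the true pairs.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)

    # Contrast in both directions (image-to-text and text-to-image).
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example with random embeddings standing in for encoder outputs and label embeddings.
loss = info_nce_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```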
After model training is complete, we can evaluate the model's performance. Below we provide test scripts for evaluating performance on the different datasets.
cd ./ocr_training
./test.py outputs/ship/maevit_infonce_plm/checkpoints/last.ckpt --batch_size 700 \
--test_data SLP34K --test_dir SLP34K_lmdb_benchmark
cd ./ocr_training
./test.py outputs/union14m/maevit_infonce_plm/checkpoints/last.ckpt --batch_size 700 \
--test_data TEST_UNION14M --test_dir Union14m_benchmark
cd ./ocr_training
./test.py checkpoint/Union14M_maevit_infonce_plm_SOTA/checkpoints/last.ckpt --batch_size 700 \
--test_data TEST_BENCHMARK --test_dir SixCommon_benchmark
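Scene-text recognition benchmarks such as these typically report word accuracy, i.e. the fraction of test images whose predicted string exactly matches the ground-truth label. The sketch below shows the metric in its simplest form (whether `test.py` additionally normalizes case or punctuation is something to check in the script itself):

```python
def word_accuracy(predictions, labels):
    """Exact-match accuracy over paired prediction/label strings."""
    assert len(predictions) == len(labels) and len(labels) > 0
    correct = sum(pred == gt for pred, gt in zip(predictions, labels))
    return correct / len(labels)

# Example: one of two predictions matches its label exactly.
print(word_accuracy(["boat123", "harbor7"], ["boat123", "harbor9"]))  # 0.5
```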
We provide a local SLP recognition performance testing script based on Gradio. Using the dataset we provide and either our released model or a model you trained yourself, you can test SLP recognition performance locally.
cd ./ocr_training
python gradio_SLP34K.py
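If you would like to adapt the demo, a minimal Gradio interface for a recognition model looks roughly like the sketch below; `recognize` is a hypothetical placeholder for inference with your trained checkpoint, not a function provided by this repository:

```python
import gradio as gr
from PIL import Image

def recognize(image: Image.Image) -> str:
    # Hypothetical placeholder: load your fine-tuned checkpoint once at startup
    # and run inference on the uploaded SLP image here.
    return "predicted ship license plate text"

demo = gr.Interface(fn=recognize, inputs=gr.Image(type="pil"), outputs="text",
                    title="SLP34K ship license plate recognition")
demo.launch()
```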

