This is the official implementation of the paper **Image Hash Minimization for Tamper Detection** by S. Maity and R. K. Karsh, published at ICAPR 2017.
| | Pun et al. | Ours |
|---|---|---|
| Hash Length | 634 digits | 64 bits |
| Robustness against Noise & Compression | Yes | Yes |
| Detection Accuracy | ≈60% | 77% |
✅ Requirements

- MathWorks MATLAB R2016b or later
✅ Dataset
- To evaluate the accuracy of our model, we used the CASIA 2.0 dataset, which is no longer available from its official source. However, both the official dataset and a correctly annotated version can be downloaded from here.
- The dataset we curated, consisting of 200 tampered images with a tampered area of less than 5%, is private and not publicly available.
- Extract the dataset so that it has the following structure:
```
dataset                # Dataset root directory
└── CASIAv2            # CASIAv2.0 dataset root directory
    ├── original       # Directory containing original images
    │   ├── 1.jpg
    │   ├── 2.jpg
    │   └── ...
    └── tampered       # Directory containing tampered images
        ├── (1).jpg
        ├── (2).jpg
        └── ...
```
- Images in the `original` directory are named `<image_number>.jpg`, and images in the `tampered` directory are named `(<image_number>).jpg`, so that an original image and its tampered counterpart share the same number. The image numbers must be consecutive, without gaps.
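As an illustration of this convention, the sketch below builds the file paths of each original/tampered pair and reports any missing pair, which would indicate a gap in the numbering. `dataset_path` and `count` mirror the parameters used by the scripts, but the values are placeholders, and the check itself is a hypothetical helper rather than part of the repository.

```matlab
% Placeholder values; set these as in data_from_original.m / data_from_tampered.m.
dataset_path = 'path/to/dataset/root/CASIAv2/';
count = 3;   % number of original/tampered image pairs

original_files = cell(count, 1);
tampered_files = cell(count, 1);
for i = 1:count
    % Pair i: original/<i>.jpg  <->  tampered/(<i>).jpg
    original_files{i} = fullfile(dataset_path, 'original', sprintf('%d.jpg', i));
    tampered_files{i} = fullfile(dataset_path, 'tampered', sprintf('(%d).jpg', i));
end

% A missing file for some i in 1..count means the numbering has a gap
% (or count is larger than the number of pairs actually present).
is_file = @(f) exist(f, 'file') == 2;
missing = find(~cellfun(is_file, original_files) | ~cellfun(is_file, tampered_files));
if isempty(missing)
    disp('All image pairs found.');
else
    fprintf('Missing image pair(s): %s\n', mat2str(missing'));
end
```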
✅ Usage

- Open the `codes` directory in MATLAB.
- Set the path and hyper-parameters in `data_from_original.m` and `data_from_tampered.m`. To reproduce our process, set `K=1`, as we used a single cluster to determine the deviation of the centroid. The threshold `thres` on the strength of the SURF features detected in the images is a hyper-parameter that needs to be tuned for the dataset. Set `dataset_path` to the CASIAv2.0 root directory and `count` to the total number of original and tampered image pairs.
```matlab
count = 30;                                   % number of samples <n> in dataset
K = 1;                                        % setting the number of clusters to be formed
thres = 1000;                                 % setting the threshold for SURF feature strength
dataset_path = 'path/to/dataset/root/CASIAv2/'; % setting the dataset path
maxiter_k = 1000000;                          % setting up the maximum iterations for clustering
```
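To make the roles of `K = 1` and `thres` concrete, here is a minimal sketch that assumes the centroid is computed over the (x, y) locations of the SURF keypoints whose strength exceeds `thres`. The keypoint values are hypothetical stand-ins. In MATLAB they could come from `detectSURFFeatures(I, 'MetricThreshold', thres)` (Computer Vision Toolbox), whose `Location` and `Metric` properties hold positions and strengths, and the explicit filter below plays the same role as `MetricThreshold`. Whether the repository's scripts work exactly this way is an assumption. With a single cluster, the k-means objective (the sum of squared distances to the center) is minimised by the mean of the points, so the centroid reduces to a plain average.

```matlab
% Hypothetical SURF output for one image: one row per keypoint.
locations = [10 20; 30 40; 50 60; 90 100];   % (x, y) keypoint positions
strengths = [1500; 800; 1200; 2000];         % keypoint strength values
thres = 1000;                                 % same role as in the scripts

% Keep only keypoints whose strength exceeds the threshold.
strong = locations(strengths > thres, :);

% With K = 1, the k-means center minimising the sum of squared distances
% is the mean of the retained points.
center = mean(strong, 1);
fprintf('Centroid: (%.2f, %.2f) from %d keypoints\n', ...
        center(1), center(2), size(strong, 1));
```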
- Run the `data_from_original.m` script and make sure that the centroids are saved as `centers_original.mat` in the `codes` directory. The script also visualizes the SURF features extracted from each original image.
- Run the `data_from_tampered.m` script and make sure that the centroids are saved as `centers_tampered.mat` in the `codes` directory. The script also visualizes the SURF features extracted from each tampered image.
- Set the relevant parameters in `tampered.m`. As in `data_from_original.m` and `data_from_tampered.m`, set `count` to the total number of original and tampered image pairs, and set `K=1` to reproduce the method described in the paper.
```matlab
count = 30; % number of samples <n> in dataset
K = 1;      % setting the number of clusters to be formed
```
- Run the `tampered.m` script. It prints the tampered or not-tampered status of each sample in the dataset and saves the Euclidean distance matrix to a file named `distance.mat`, in which `NaN` marks images that are not tampered.
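The comparison step can be pictured with the following hedged sketch, which is not the code of `tampered.m`. It assumes one 2-D centroid per image, computes the Euclidean distance between the centroids of each original/tampered pair, and treats a pair as not tampered (stored as `NaN`) when that distance does not exceed a small tolerance `tol`. Both `tol` and the centroid values are hypothetical; the exact decision rule used by `tampered.m` is not stated here.

```matlab
% Hypothetical centroids, one row per image pair (same order in both arrays).
centers_original = [50 60; 120 80; 200 150];
centers_tampered = [50 60; 126 88; 200 150];
tol = 1e-6;   % hypothetical tolerance, not taken from the repository

count = size(centers_original, 1);
distance = NaN(count, 1);   % NaN marks pairs judged not tampered
for i = 1:count
    % Euclidean distance between the two centroids of pair i
    d = sqrt(sum((centers_tampered(i, :) - centers_original(i, :)).^2));
    if d > tol
        distance(i) = d;
        fprintf('Image %d: tampered (distance = %.4f)\n', i, d);
    else
        fprintf('Image %d: not tampered\n', i);
    end
end
```

Calling `save('distance.mat', 'distance')` afterwards would produce a file analogous to the one described above.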
📌 FAQ

- The initial seeds for k-means clustering are chosen with the k-means++ algorithm; they can also be chosen at random.
- Different seeds, whether from k-means++ or random initialization, may cause minor deviations from the reported accuracy.
- We recommend k-means++, as it produces more stable seeds than random initialization.
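For reference, k-means++ picks the first seed uniformly at random and each subsequent seed with probability proportional to its squared distance from the nearest seed already chosen, which tends to spread the seeds apart. The sketch below implements that seeding rule in base MATLAB on toy data; `X`, `K` and the fixed `rng(0)` are illustrative only. MATLAB's built-in `kmeans` uses k-means++ seeding by default (`'Start','plus'`), whereas `'Start','sample'` picks the initial centers at random from the data. How the repository's scripts expose this choice is not shown here.

```matlab
rng(0);                                   % fixed seed for repeatable output
X = [0 0; 0 1; 10 10; 10 11; 20 0];       % toy data, one point per row
K = 2;                                    % number of seeds to pick

n = size(X, 1);
centers = zeros(K, size(X, 2));
centers(1, :) = X(randi(n), :);           % first seed: uniform random point

for k = 2:K
    % Squared distance from every point to its nearest already-chosen seed
    D2 = inf(n, 1);
    for j = 1:k-1
        D2 = min(D2, sum((X - centers(j, :)).^2, 2));
    end
    % Draw the next seed with probability proportional to D2
    % (points already chosen have D2 = 0 and cannot be drawn again)
    c = cumsum(D2);
    idx = find(c >= rand * c(end), 1, 'first');
    centers(k, :) = X(idx, :);
end
disp(centers);
```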
📌 Citation

If you use our code in your research, please cite our paper. Many thanks!

```bibtex
@inproceedings{maity2017image,
  title={Image Hash Minimization for Tamper Detection},
  author={Maity, Subhajit and Karsh, Ram Kumar},
  booktitle={Ninth International Conference on Advances in Pattern Recognition (ICAPR)},
  year={2017}
}
```



