Deepfake Video Detection with Vision Transformer (ViT)

This project detects deepfake videos using a Vision Transformer (ViT) model, classifying frames as real or manipulated with high accuracy.

Table of Contents

  • Dataset Preparation
  • Model Architecture
  • Training Process
  • Validation and Metrics
  • Video Prediction
  • Installation and Setup
  • Results
  • Website Usage
  • Contributors

Dataset Preparation

Video Directories:

  • Real Videos: /DFD_original_sequences
  • Manipulated Videos: /DFD_manipulated_sequences

Frame Extraction:

Extract frames at 1 frame per second for model input.
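
One way to implement the 1-fps extraction with OpenCV (a minimal sketch; the helper name and output layout are assumptions, not from the original):

import os
import cv2

def extract_frames(video_path, out_dir, fps=1):
    # Hypothetical helper: save one frame per second of video as JPEGs.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS metadata is missing
    step = max(1, int(round(video_fps / fps)))
    frame_idx = saved = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        frame_idx += 1
    cap.release()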

Model Architecture

Model Architecture Diagram

  • Base Model: ViT (vit_base_patch16_224)
  • Input Size: 224x224 pixels
  • Classes: 2 (Real, Manipulated)
  • Pretrained Weights: Yes (ImageNet)

Model Initialization:

import timm
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = timm.create_model('vit_base_patch16_224', pretrained=True, num_classes=2)
model.to(device)
model = nn.DataParallel(model)  # replicate across all visible GPUs
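
As a quick sanity check (not part of the original README), a single dummy 224x224 RGB tensor should yield one logit per class:

dummy = torch.randn(1, 3, 224, 224).to(device)  # one fake RGB frame
with torch.no_grad():
    logits = model(dummy)
print(logits.shape)  # expected: torch.Size([1, 2])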

Training Process

Transformations:

from torchvision import transforms

# Light augmentation plus ImageNet normalization, matching the ViT's pretraining
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
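
The training loop below reads batches from a train_loader; one way to build it from the extracted frames (the paths, batch size, and ImageFolder layout are assumptions):

from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Assumed layout: one subfolder per class, e.g. frames/train/real and frames/train/manipulated.
train_dataset = ImageFolder('frames/train', transform=transform)
print(train_dataset.class_to_idx)  # verify the mapping matches Real=0, Manipulated=1 used later
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)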

Training Loop:

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer choice and lr are assumed, not stated in the original
num_epochs = 10  # example value

for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

Validation and Metrics

Classification Report:

from sklearn.metrics import classification_report

print(classification_report(all_labels, all_predictions, target_names=['Real', 'Manipulated']))
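
The report above assumes all_labels and all_predictions have been gathered over the validation set; a minimal sketch (val_loader is assumed to be built like train_loader):

model.eval()
all_labels, all_predictions = [], []
with torch.no_grad():
    for images, labels in val_loader:
        outputs = model(images.to(device))
        all_predictions.extend(outputs.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.tolist())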

Confusion Matrix:

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(all_labels, all_predictions)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

Video Prediction

import cv2
import torch
from PIL import Image

def predict_video(video_path, model, transform, device):
    # Classify every frame, then label the whole video by majority vote.
    cap = cv2.VideoCapture(video_path)
    real_count, manipulated_count = 0, 0
    model.eval()
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        image = transform(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))).unsqueeze(0).to(device)
        with torch.no_grad():
            outputs = model(image)
            _, predicted = torch.max(outputs, 1)
        real_count += (predicted.item() == 0)          # class 0: Real
        manipulated_count += (predicted.item() == 1)   # class 1: Manipulated
    cap.release()
    return 'Manipulated' if manipulated_count > real_count else 'Real'
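
Example call (the file name is hypothetical):

label = predict_video('sample_clip.mp4', model, transform, device)
print('Prediction:', label)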

Installation and Setup

Install Packages:

pip install timm torch torchvision opencv-python pillow scikit-learn seaborn matplotlib

CUDA Verification:

print("CUDA Available:", torch.cuda.is_available())
print("GPU Name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU")

Results

  • Training Accuracy: ~89.71%
  • Validation Accuracy: ~87.77%

Website Usage

Screenshots:

  • Website Landing Page
  • Upload Interface
  • Processing Results

Contributors

Yadeesh T

Email: yadeesh005@gmail.com

LinkedIn: Profile

Gokul Ram K

Email: gokul.ram.kannan210905@gmail.com

LinkedIn: Profile

Rohit N

Email: rohit84.official@gmail.com

LinkedIn: Profile

Rahul B

Email: rahulbalachandar24@gmail.com

LinkedIn: Profile
