Review Fraud Detection System

Overview

This project implements a machine learning system that classifies product reviews as either fraudulent or non-fraudulent. The goal is to help consumers identify authentic reviews that provide reliable information about products they're considering purchasing.

The system uses a two-stage approach:

Sentiment Classification: Determines whether a review is positive or negative
Fraud Detection: Identifies reviews that are likely to be fraudulent (computer-generated)

Key Features

BERT-based deep learning models fine-tuned for review classification
Custom data loaders and preprocessing pipelines
Two sentiment models: one trained with review titles and one without
CLI tool for testing fraud detection on custom reviews
Comprehensive evaluation metrics and visualization

Project Structure

Fraud Classification/
├── App/
│   ├── config/              # Holds config.yaml (hyperparameters, etc.)
│   ├── model/               # Holds the trained fraud-classification model
│   ├── cli.py               # Interactive command-line interface to the model
│   └── model_def.py
├── BERT_model/
│   ├── Fraud_Classification/   # Training code for the fraud model
│   └── Sentiment_Analysis/     # Training code for the sentiment model
├── Datasets/                # Training datasets
├── requirements.txt         # Project dependencies
└── README.md                # This file

Installation

Requirements

Python 3.8+
CUDA 12.8+ (optional, for GPU acceleration)

Dependencies

The project requires the following libraries:

torch>=1.7.0  
transformers>=4.5.0  
pandas>=1.0.0  
numpy>=1.19.0  
scikit-learn>=0.24.0  
matplotlib>=3.3.0  
seaborn>=0.11.0  
pyyaml>=5.4.0  
huggingface_hub[hf_xet]>=0.23.0

Standard Installation

pip install -r requirements.txt

Manual Installation

pip install 'torch>=1.7.0' \
            'transformers>=4.5.0' \
            'pandas>=1.0.0' \
            'numpy>=1.19.0' \
            'scikit-learn>=0.24.0' \
            'matplotlib>=3.3.0' \
            'seaborn>=0.11.0' \
            'pyyaml>=5.4.0' \
            'huggingface_hub[hf_xet]>=0.23.0'
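Because the requirements only set lower bounds, a small script like the one below can confirm after installation that each dependency is present and new enough. It is a hypothetical helper, not part of the repository, and uses only the standard library's importlib.metadata; the version parsing is deliberately rough.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above
MINIMUM_VERSIONS = {
    "torch": "1.7.0",
    "transformers": "4.5.0",
    "pandas": "1.0.0",
    "numpy": "1.19.0",
    "scikit-learn": "0.24.0",
    "matplotlib": "3.3.0",
    "seaborn": "0.11.0",
    "PyYAML": "5.4.0",
    "huggingface_hub": "0.23.0",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.1.0+cu121' or '0.24.0rc1' into a comparable tuple like (2, 1, 0).

    Only the leading digits of each dotted segment are kept, so pre-release
    and local tags are ignored; good enough for a rough minimum check."""
    parts: list[int] = []
    for piece in text.split("+", 1)[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that "1.7" compares equal to "1.7.0"
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums: dict[str, str]) -> dict[str, str]:
    """Map each package name to 'ok', 'too old (<installed>)', or 'missing'."""
    report: dict[str, str] = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for pkg, status in check_requirements(MINIMUM_VERSIONS).items():
        print(f"{pkg:<18} {status}")
```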

GPU Support

For GPU acceleration, install a CUDA-enabled build of PyTorch:

# First, install CUDA 12.8 from NVIDIA's website
# Then install the matching PyTorch build, for example:
pip install torch --index-url https://download.pytorch.org/whl/cu128
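After installing a CUDA-enabled build, a short check like the following (a hypothetical helper, not part of the repository) confirms that PyTorch can see the GPU. torch.cuda.is_available, torch.cuda.get_device_name and torch.version.cuda are standard PyTorch attributes.

```python
def describe_device() -> str:
    """Report whether PyTorch is installed and whether it can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Device 0 is the first visible GPU; torch.version.cuda is the CUDA version torch was built with
        return f"CUDA available: {torch.cuda.get_device_name(0)} (built for CUDA {torch.version.cuda})"
    return "CUDA not available; training will run on the CPU"


if __name__ == "__main__":
    print(describe_device())
```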

Usage

Dataset Preparation

Sentiment Analysis Dataset:
- Download the Amazon Review Polarity CSV dataset
- Place it in the Datasets/ subdirectory for amazon_review_sentiment
Fraud Classification Dataset:
- A sample dataset is provided in Datasets/fraud_data.csv; it is needed only for training the fraud classifier
- To run any of the files in processed_datasets/amazon_review_dataset_2018, or for full reproduction, download the 2018 Amazon Review Dataset
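The Amazon Review Polarity CSV files have no header row and three columns: class index (1 = negative, 2 = positive), review title, and review text. The sketch below reads one file with the standard csv module and shows how the with-title and without-title variants could differ. The function name, the way the title is joined to the body, and the path are assumptions, not the repository's actual loader.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_polarity_csv(path: str | Path, include_title: bool = True) -> tuple[list[str], list[int]]:
    """Hypothetical loader: return (texts, labels) with labels 0 = negative, 1 = positive."""
    texts: list[str] = []
    labels: list[int] = []
    with open(path, newline="", encoding="utf-8") as f:
        for class_index, title, body in csv.reader(f):
            # Map 1 -> 0 (negative) and 2 -> 1 (positive) for a 0-based classifier
            labels.append(int(class_index) - 1)
            # Either prepend the title to the body or drop it entirely
            texts.append(f"{title}. {body}" if include_title and title else body)
    return texts, labels


if __name__ == "__main__":
    # Hypothetical path; point this at wherever the downloaded train.csv lives
    texts, labels = load_polarity_csv("Datasets/amazon_review_sentiment/train.csv")
    print(len(texts), "reviews,", sum(labels), "positive")
```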

Training Models

Sentiment Classification

# Train the model with review titles
cd BERT_model/Sentiment_Analysis
python train.py

# Train the model without review titles (from the same directory)
python train_no_title.py

Fraud Classification

cd BERT_model/Fraud_Classification
python train.py

After training, select the best model checkpoint (typically the one with the highest validation accuracy) and move it to the App/model/ directory.
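Checkpoint selection can be scripted. The sketch below assumes you have already collected each checkpoint's validation accuracy (for example from the training output) into a dictionary; the checkpoint file names and helper names are hypothetical. It copies rather than moves the file so the original checkpoint is kept.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def best_checkpoint(val_accuracy: dict[str, float]) -> str:
    """Return the checkpoint path with the highest validation accuracy.

    On ties, max() keeps the first entry, i.e. the earliest checkpoint listed."""
    if not val_accuracy:
        raise ValueError("no checkpoints to choose from")
    return max(val_accuracy, key=val_accuracy.get)


def install_checkpoint(checkpoint: str | Path, model_dir: str | Path = "App/model") -> Path:
    """Copy the chosen checkpoint into the directory the CLI loads from."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata and returns the destination path
    return Path(shutil.copy2(checkpoint, model_dir))


if __name__ == "__main__":
    scores = {"epoch_1.pt": 0.95, "epoch_2.pt": 0.98, "epoch_3.pt": 1.0}  # hypothetical values
    print("best:", best_checkpoint(scores))
```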

Testing Fraud Detection

cd App
python cli.py

This launches an interactive CLI where you can enter a review's text and rating and see whether the review is classified as fraudulent.
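The interaction described above can be sketched as a simple prompt loop. This is not the contents of cli.py: classify_review is a hypothetical stand-in for the model call, and the 1 to 5 rating range is an assumption based on Amazon's star ratings.

```python
from __future__ import annotations

from typing import Callable


def parse_rating(raw: str) -> int | None:
    """Return the rating as an int if it is a whole number from 1 to 5, else None."""
    try:
        value = int(raw.strip())
    except ValueError:
        return None
    return value if 1 <= value <= 5 else None


def run_cli(classify_review: Callable[[str, int], bool]) -> None:
    """Prompt for reviews until a blank line is entered."""
    while True:
        text = input("Review text (blank to quit): ").strip()
        if not text:
            break
        rating = parse_rating(input("Rating (1-5): "))
        if rating is None:
            print("Please enter a whole number from 1 to 5.")
            continue
        verdict = "fraudulent" if classify_review(text, rating) else "not fraudulent"
        print(f"Prediction: {verdict}")


if __name__ == "__main__":
    # Placeholder classifier so the loop can be tried without a trained model
    run_cli(lambda text, rating: False)
```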

Results

The current implementation achieves:

94.3% accuracy on sentiment classification with review titles
92.1% accuracy on sentiment classification without titles
100% validation accuracy on fraud detection (epoch 3 model)
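The Key Features list mentions evaluation metrics. scikit-learn, already a dependency, provides accuracy_score and precision_recall_fscore_support in sklearn.metrics; the dependency-free sketch below shows what those compute for the binary fraud task, assuming labels 1 = fraudulent and 0 = genuine. With a small validation set, reporting precision and recall for the fraud class alongside accuracy gives a fuller picture. binary_metrics is a hypothetical name.

```python
from __future__ import annotations


def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy plus precision, recall and F1 for the `positive` (fraud) class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # fraud caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # genuine flagged as fraud
    fn = sum(t == positive and p != positive for t, p in pairs)  # fraud missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # accuracy 0.6; precision, recall and F1 all 2/3
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```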

Future Improvements

Increase the training dataset size for fraud classification
Incorporate product category stratification
Deploy as a web service with API
Experiment with alternative model architectures
Detect human-written fraudulent reviews (not just computer-generated)

Note

This project is an initial proof of concept with satisfactory results. Due to computational resource constraints, the models were trained on a limited dataset. The approach demonstrates viability but would benefit from additional data and computing resources.
