Duplicate Question Detection using Machine Learning

📌 Overview

Duplicate Question Detection is a Machine Learning and Natural Language Processing (NLP) project that identifies whether two given questions are semantically identical, even if they are phrased differently. This system helps reduce redundancy in question–answer platforms, forums, search engines, and customer support systems.

🎯 Problem Statement

Given a pair of questions, predict whether they have the same meaning.

1 → Duplicate questions
0 → Non-duplicate questions

🎯 Objectives

Apply NLP techniques to real-world text data
Perform text preprocessing and feature extraction
Train a supervised machine learning model
Evaluate performance using standard classification metrics

🗂 Dataset

The dataset contains question pairs with a binary label indicating duplication.

Columns:

Question 1
Question 2
Label (0 or 1)

⚙️ Technologies Used

Python
Jupyter Notebook
NumPy
Pandas
Scikit-learn
NLTK / SpaCy
Matplotlib
Seaborn

🔄 Project Workflow

Load the dataset
Text preprocessing
- Lowercasing
- Removing punctuation
- Stopword removal
- Tokenization
Feature engineering (Bag of Words / TF-IDF)
Model training
Model evaluation

📊 Evaluation Metrics

Accuracy
Precision
Recall
F1 Score

▶️ How to Run

Clone the repository:
```
git clone <repository-url>
```

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Duplicate_Question_Detection.ipynb		Duplicate_Question_Detection.ipynb
README.md		README.md
app.txt		app.txt
duplicate_model.pkl		duplicate_model.pkl
vectorizer.pkl		vectorizer.pkl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Duplicate Question Detection using Machine Learning

📌 Overview

🎯 Problem Statement

🎯 Objectives

🗂 Dataset

⚙️ Technologies Used

🔄 Project Workflow

📊 Evaluation Metrics

▶️ How to Run

About

Uh oh!

Releases

Packages

Languages

rishig47-dev/Duplicate-Question-Detection

Folders and files

Latest commit

History

Repository files navigation

Duplicate Question Detection using Machine Learning

📌 Overview

🎯 Problem Statement

🎯 Objectives

🗂 Dataset

⚙️ Technologies Used

🔄 Project Workflow

📊 Evaluation Metrics

▶️ How to Run

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages