Generating insights from Movie Ratings dataset using MongoDB queries

federicatopazio/MongoDB_data_analysis

🎬 MongoLens: Insights into Movie Trends with MongoDB & Python

This project explores the MovieLens dataset using MongoDB for non-relational data handling and Python for querying, analysis, and visualization. It aims to uncover patterns in movie popularity, user preferences, genre evolution, and rating behaviors over time.

📦 Project Structure

├── notebooks/
│   ├── dataWrangling.ipynb                # Data loading, merging, transformation & JSON export
│   └── extra.ipynb                        # MongoDB queries & visualizations using Python
├── merged_movies_ratings.json             # Final processed dataset
├── dump/                                  # Intermediate datasets
├── Pictures/
├── SMBUD Project - Federica Topazio.pdf   # Full report
├── README.txt                             # Citations of dataset
└── README.md                              # This file

🗃 Dataset

MovieLens small dataset (about 100k ratings; the counts below match the ml-latest-small release), containing:

  • ~100,000 ratings from 610 users
  • 9,742 movies with genre metadata
  • Timestamps for all ratings (from 1996 to 2018)

Main fields:

  • userId, movieId, title, genres, rating, timestamp, release year

🔧 Technologies Used

  • MongoDB: flexible document-based schema & aggregation queries
  • Python (pandas, matplotlib, seaborn, pymongo): data wrangling and visualization
  • Jupyter Notebooks: analysis and execution of Python code

🔍 Analysis Topics

✅ Data Wrangling (See: dataWrangling.ipynb)

  • CSV loading, cleaning & transformation
  • Genre parsing and timestamp formatting
  • Normalization of ratings
  • JSON export for MongoDB
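The wrangling steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses only the standard library (the notebook itself relies on pandas), the inline CSV rows are illustrative stand-ins for `movies.csv` and `ratings.csv`, and the field name `release_year` and the helper names are assumptions. Rating normalization is left out because the README does not say how it is done.

```python
import csv
import io
import json
import re
from datetime import datetime, timezone

# Tiny inline stand-ins for movies.csv and ratings.csv (illustrative rows only).
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
"""
RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
5,1,4.5,847434962
6,2,3.0,845553522
"""

# MovieLens titles usually end with the release year in parentheses.
YEAR_RE = re.compile(r"\((\d{4})\)\s*$")


def parse_movie(row):
    """Split the pipe-delimited genres and pull the release year from the title."""
    match = YEAR_RE.search(row["title"])
    return {
        "movieId": int(row["movieId"]),
        "title": row["title"],
        "genres": row["genres"].split("|"),
        "release_year": int(match.group(1)) if match else None,  # assumed field name
    }


def merge_ratings(movies_csv, ratings_csv):
    """Join each rating with its movie and convert the Unix timestamp to ISO 8601 (UTC)."""
    movies = {}
    for row in csv.DictReader(io.StringIO(movies_csv)):
        movie = parse_movie(row)
        movies[movie["movieId"]] = movie

    merged = []
    for row in csv.DictReader(io.StringIO(ratings_csv)):
        movie = movies[int(row["movieId"])]
        rated_at = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
        merged.append({
            "userId": int(row["userId"]),
            "rating": float(row["rating"]),
            "timestamp": rated_at.isoformat(),
            **movie,
        })
    return merged


documents = merge_ratings(MOVIES_CSV, RATINGS_CSV)
# One JSON document per line, a layout that mongoimport accepts by default.
json_lines = "\n".join(json.dumps(doc) for doc in documents)
```

Keeping `genres` as a list in each document is what later lets MongoDB expand it with `$unwind` instead of splitting strings at query time.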

📊 MongoDB Queries & Visualizations (See: extra.ipynb)

  • Average rating per genre
  • User behavior based on rating activity
  • Most polarizing movies (variance in ratings)
  • Rating trends over time (e.g., for Toy Story)
  • Genre popularity and evolution over the years
  • Temporal patterns (monthly ratings)
  • Correlation between number of ratings and average rating
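Two of the queries listed above might look like the pipelines below. This is a hedged sketch rather than the notebook's code: it assumes each document holds a single rating with `title`, `genres` (a list) and `rating` fields, and the threshold of 50 ratings is arbitrary. MongoDB has no variance accumulator, so the sketch ranks by `$stdDevPop`, which yields the same ordering as variance. The pure-Python function is only a reference for what the first pipeline computes.

```python
from collections import defaultdict
from statistics import mean

# Average rating per genre: $unwind emits one document per genre in the
# "genres" array, then $group averages the ratings within each genre.
AVG_RATING_PER_GENRE = [
    {"$unwind": "$genres"},
    {"$group": {
        "_id": "$genres",
        "avgRating": {"$avg": "$rating"},
        "numRatings": {"$sum": 1},
    }},
    {"$sort": {"avgRating": -1}},
]

# Most polarizing movies: population standard deviation of ratings per title,
# restricted to titles with enough ratings (the threshold of 50 is arbitrary).
MOST_POLARIZING = [
    {"$group": {
        "_id": "$title",
        "ratingStdDev": {"$stdDevPop": "$rating"},
        "numRatings": {"$sum": 1},
    }},
    {"$match": {"numRatings": {"$gte": 50}}},
    {"$sort": {"ratingStdDev": -1}},
    {"$limit": 10},
]


def avg_rating_per_genre(docs):
    """Pure-Python reference for AVG_RATING_PER_GENRE, handy for sanity checks."""
    by_genre = defaultdict(list)
    for doc in docs:
        for genre in doc["genres"]:  # same effect as $unwind
            by_genre[genre].append(doc["rating"])
    rows = [
        {"_id": genre, "avgRating": mean(ratings), "numRatings": len(ratings)}
        for genre, ratings in by_genre.items()
    ]
    return sorted(rows, key=lambda row: row["avgRating"], reverse=True)


# Made-up documents, only to exercise the reference implementation.
sample = [
    {"title": "A", "genres": ["Comedy", "Drama"], "rating": 4.0},
    {"title": "B", "genres": ["Comedy"], "rating": 2.0},
]
result = avg_rating_per_genre(sample)
```

With pymongo, a pipeline runs as `collection.aggregate(AVG_RATING_PER_GENRE)`, where `collection` could be something like `MongoClient()["movielens"]["ratings"]` (database and collection names here are hypothetical).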

📈 Example Visualizations

  • 🎭 Bar chart: average rating per genre
  • 👀 Scatter plot: user engagement vs average rating
  • 🧨 Variance bar chart: most polarizing movies
  • 📆 Line plot: rating trends for specific movies over time
  • 🧊 Heatmap: genre popularity changes by year
  • 📉 Monthly rating trends
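As an illustration of how the genre heatmap could be prepared, the sketch below counts ratings per genre and year and arranges them as a matrix. It is an assumption-laden example, not the notebook's code: it takes "year" to mean the year of the rating, expects ISO 8601 `timestamp` strings and a `genres` list on each document, and all function names are hypothetical. The plotting helper imports seaborn and matplotlib lazily, so the data preparation runs without them.

```python
from collections import Counter


def genre_year_counts(docs):
    """Count ratings per (genre, year) pair from ISO 8601 timestamp strings."""
    counts = Counter()
    for doc in docs:
        year = int(doc["timestamp"][:4])  # first four characters are the year
        for genre in doc["genres"]:
            counts[(genre, year)] += 1
    return counts


def to_matrix(counts):
    """Arrange the counts as rows=genres, columns=years, filling gaps with 0."""
    genres = sorted({genre for genre, _ in counts})
    years = sorted({year for _, year in counts})
    matrix = [[counts.get((g, y), 0) for y in years] for g in genres]
    return genres, years, matrix


def plot_heatmap(genres, years, matrix):
    """Draw the matrix with seaborn; imported lazily so the prep code runs without it."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(matrix, xticklabels=years, yticklabels=genres, annot=True, fmt="d")
    ax.set(xlabel="Year of rating", ylabel="Genre")
    plt.show()


# Made-up documents, only to exercise the data preparation.
sample = [
    {"genres": ["Comedy", "Drama"], "timestamp": "2000-07-30T18:45:03+00:00"},
    {"genres": ["Comedy"], "timestamp": "2001-01-02T00:00:00+00:00"},
]
genres, years, matrix = to_matrix(genre_year_counts(sample))
```

Calling `plot_heatmap(genres, years, matrix)` would then render the chart; on the full dataset the same counts could also come from a MongoDB `$group` stage rather than Python.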

🧠 Key Learnings

  • MongoDB is well-suited for semi-structured datasets with nested lists (e.g., genres)
  • Aggregation pipelines enable complex, efficient querying
  • Visual exploration of rating patterns can inform recommender systems

✍️ Author

Federica Topazio
Politecnico di Milano | Systems and Methods for Big and Unstructured Data (2023–2024)

📜 License

This project is licensed for academic and research purposes.
