Data Analysis: Association Rule Mining & Sales Forecasting

elhamabedi/data-analysis

This project applies two data mining techniques to retail transaction data from a UK-based online retail company: Association Rule Mining to discover product purchasing patterns, and Time Series Forecasting to predict future sales.

Dataset

Source: UCI Machine Learning Repository - Online Retail Dataset

The dataset contains all transactions occurring between 01/12/2010 and 09/12/2011 (DD/MM/YYYY, i.e. 1 December 2010 to 9 December 2011) for a UK-based online retail company specializing in unique all-occasion gifts.

Methodology

Data Preprocessing

  • Handle missing values and invalid records
  • Convert data types and create derived features
  • Prepare data for both association rules and time series analysis
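The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code. The column names (InvoiceNo, Description, Quantity, InvoiceDate, UnitPrice) come from the UCI dataset; the sample rows, the specific filtering rules (dropping cancelled invoices, empty descriptions, and non-positive quantities or prices), the derived TotalPrice field, and the helper names are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical sample rows that use the UCI Online Retail column names.
raw_rows = [
    {"InvoiceNo": "536365", "Description": "WHITE HANGING HEART T-LIGHT HOLDER",
     "Quantity": "6", "InvoiceDate": "2010-12-01 08:26:00", "UnitPrice": "2.55"},
    {"InvoiceNo": "C536379", "Description": "Discount",
     "Quantity": "-1", "InvoiceDate": "2010-12-01 09:41:00", "UnitPrice": "27.50"},
    {"InvoiceNo": "536414", "Description": "",
     "Quantity": "56", "InvoiceDate": "2010-12-01 11:52:00", "UnitPrice": "0.00"},
    {"InvoiceNo": "536365", "Description": " WHITE METAL LANTERN ",
     "Quantity": "6", "InvoiceDate": "2010-12-01 08:26:00", "UnitPrice": "3.39"},
]


def clean_rows(rows):
    """Drop invalid records, convert types, and add a derived TotalPrice field."""
    cleaned = []
    for row in rows:
        description = (row.get("Description") or "").strip()
        if not description:
            continue  # missing product description
        if row["InvoiceNo"].startswith("C"):
            continue  # assumed rule: invoices prefixed with "C" are cancellations
        quantity = int(row["Quantity"])
        unit_price = float(row["UnitPrice"])
        if quantity <= 0 or unit_price <= 0:
            continue  # assumed rule: non-positive values are invalid
        timestamp = datetime.strptime(row["InvoiceDate"], "%Y-%m-%d %H:%M:%S")
        cleaned.append({
            "InvoiceNo": row["InvoiceNo"],
            "Description": description,
            "Quantity": quantity,
            "UnitPrice": unit_price,
            "InvoiceDate": timestamp,
            "TotalPrice": round(quantity * unit_price, 2),  # derived feature
        })
    return cleaned


def build_baskets(cleaned):
    """Group product descriptions by invoice (input for association rules)."""
    baskets = {}
    for row in cleaned:
        baskets.setdefault(row["InvoiceNo"], set()).add(row["Description"])
    return baskets


def build_daily_sales(cleaned):
    """Sum TotalPrice per calendar day (input for time series analysis)."""
    daily = {}
    for row in cleaned:
        day = row["InvoiceDate"].date()
        daily[day] = daily.get(day, 0.0) + row["TotalPrice"]
    return dict(sorted(daily.items()))


cleaned = clean_rows(raw_rows)
baskets = build_baskets(cleaned)
daily_sales = build_daily_sales(cleaned)
```

The two derived structures play the same roles as the baskets.pkl and daily_sales.csv files in the Output folder, although the notebook may build them differently.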

Exploratory Data Analysis (EDA)

  • Summary statistics for numerical columns
  • Distribution plots for quantity and unit price
  • Top products analysis (most frequently purchased)
  • Geographic distribution (sales by country)
  • Basket size distribution
  • Time series visualization with moving averages
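Two of the EDA summaries above, top products and basket size distribution, reduce to simple counting. The sketch below shows one way to compute them with the standard library; the basket contents and invoice labels are hypothetical, and the notebook may well use a plotting or dataframe library instead.

```python
from collections import Counter

# Hypothetical baskets: invoice number -> set of product descriptions.
baskets = {
    "A1": {"mug", "teapot", "candle"},
    "A2": {"mug", "candle"},
    "A3": {"mug"},
    "A4": {"teapot", "candle"},
}

# Top products: how many baskets each product appears in.
product_counts = Counter(item for items in baskets.values() for item in items)

# Sort by count (descending), then by name, so ties are ordered deterministically.
top_products = sorted(product_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:2]

# Basket size distribution: basket size -> number of baskets of that size.
basket_size_distribution = Counter(len(items) for items in baskets.values())
```

These counts are what a bar chart of top products or a histogram of basket sizes would plot.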

Association Rule Mining

  • Apriori Algorithm implementation
  • Frequent itemset discovery (min_support = 0.02)
  • Association rule generation (min_confidence = 0.3)
  • Rule evaluation metrics: Support, Confidence, Lift
  • Identification of the top 10 most interesting rules
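To make the metrics concrete: for a rule A → B, support is the fraction of baskets containing both A and B, confidence is support(A ∪ B) / support(A), and lift is confidence / support(B). The following is a small self-contained sketch of Apriori and rule generation in plain Python. It is an illustration of the technique, not the notebook's implementation (which may rely on a library); the function names, the toy transactions, the toy support threshold, and the choice to rank rules by lift are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Build rules A -> B from frequent itemsets, sorted by lift (descending)."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append({
                        "antecedent": antecedent,
                        "consequent": consequent,
                        "support": supp,
                        "confidence": confidence,
                        "lift": confidence / frequent[consequent],
                    })
    rules.sort(key=lambda rule: rule["lift"], reverse=True)
    return rules


# Toy transactions (hypothetical); the project itself uses min_support = 0.02.
transactions = [
    {"mug", "teapot"},
    {"mug", "teapot", "candle"},
    {"mug", "candle"},
    {"teapot"},
    {"mug", "teapot"},
]
frequent = apriori(transactions, min_support=0.4)
rules = generate_rules(frequent, min_confidence=0.3)
top_rules = rules[:10]  # keep at most the 10 highest-lift rules
```

On the full dataset the thresholds listed above (min_support = 0.02, min_confidence = 0.3) would replace the toy values; a higher support threshold is used here only because the example has five baskets.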

Time Series Forecasting

  • 7-Day Simple Moving Average model
  • Train/Test split (80%/20%)
  • Model evaluation with MAE and MAPE metrics
  • 14-day future sales prediction
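The forecasting steps above can be sketched as follows. This is one plausible reading, not the notebook's exact procedure: it assumes a chronological 80/20 split, one-step-ahead (walk-forward) evaluation on the test portion, and a recursive 14-day forecast in which each predicted value is fed back into the averaging window. The function names and the synthetic series are hypothetical.

```python
def sma_forecast(history, window=7):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def evaluate_sma(series, window=7, train_fraction=0.8):
    """Chronological split, walk-forward predictions, then MAE and MAPE (%)."""
    split = int(len(series) * train_fraction)
    train, test = series[:split], series[split:]
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(sma_forecast(history, window))
        history.append(actual)  # reveal the true value only after predicting it
    errors = [abs(a - p) for a, p in zip(test, predictions)]
    mae = sum(errors) / len(test)
    # MAPE is undefined for zero actuals; this sketch assumes none occur.
    mape = 100 * sum(e / abs(a) for e, a in zip(errors, test)) / len(test)
    return predictions, mae, mape


def forecast_future(series, horizon=14, window=7):
    """Recursive forecast: each prediction joins the window for the next one."""
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        next_value = sma_forecast(history, window)
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


# Synthetic daily sales with a repeating weekly pattern (hypothetical data).
daily_sales = [float(100 + 10 * (day % 7)) for day in range(35)]

predictions, mae, mape = evaluate_sma(daily_sales)
future = forecast_future(daily_sales, horizon=14)
```

Because a 7-day average spans exactly one weekly cycle, it smooths out any day-of-week pattern, so the forecast tracks the overall level rather than daily peaks. That is a known limitation of a simple moving average as a baseline model.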

Visualizations

  • Distribution Plots: Quantity and Unit Price histograms
  • Top Products: Bar chart of most frequently purchased products
  • Geographic Distribution: Sales by country
  • Basket Size: Distribution of items per transaction
  • Time Series: Daily sales with 7-day moving average
  • Association Rules: Scatter plot (Support vs Confidence colored by Lift)
  • Forecasting: Actual vs Predicted sales comparison

Project Structure

data-analysis/
├── code.ipynb
├── Output/
│   ├── baskets.pkl               # Preprocessed transaction baskets for association rules
│   ├── daily_sales.csv           # Aggregated daily sales data
│   ├── top_10_association_rules.csv  # Top 10 association rules with metrics
│   └── time_series_metrics.txt   # Forecasting model performance metrics
├── dataset/
│   └── Online Retail.xlsx        # Original dataset
└── README.md                     # This file

References

  1. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487-499.
  2. Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice (2nd ed.). OTexts. Available at: https://otexts.com/fpp2/
  3. Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
  4. UCI Machine Learning Repository: Online Retail Data Set. https://archive.ics.uci.edu/ml/datasets/Online+Retail