This project applies two data mining techniques to retail transaction data from a UK-based online retail company: Association Rule Mining to discover product purchasing patterns, and Time Series Forecasting to predict future sales.
Source: UCI Machine Learning Repository - Online Retail Dataset
The dataset contains all transactions that occurred between 1 December 2010 and 9 December 2011 (01/12/2010 to 09/12/2011 in DD/MM/YYYY format) for a UK-based online retail company specializing in unique all-occasion gifts.
- Handle missing values and invalid records
- Convert data types and create derived features
- Prepare data for both association rules and time series analysis
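The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the column names follow the UCI Online Retail schema, while the specific cleaning rules (dropping rows with a missing `Description` or `CustomerID`, excluding invoices whose number starts with `C`, keeping only positive quantities and prices), the `TotalPrice` feature, and the function name `clean_transactions` are assumptions. The sample rows are made up.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning routine; the rules below are assumptions."""
    # Missing values: drop rows without a product description or customer ID.
    df = df.dropna(subset=["Description", "CustomerID"]).copy()
    # Invalid records: cancelled invoices (number starts with "C").
    df["InvoiceNo"] = df["InvoiceNo"].astype(str)
    df = df[~df["InvoiceNo"].str.startswith("C")]
    # Invalid records: non-positive quantities or unit prices.
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)].copy()
    # Type conversion and a derived feature.
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
    return df


# Made-up sample rows in the UCI column layout.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "536381"],
    "Description": ["MUG", "LAMP", "MUG", None, "CLOCK"],
    "Quantity": [6, 2, -1, 3, 0],
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-01 08:26",
                    "2010-12-01 09:41", "2010-12-02 10:00",
                    "2010-12-02 11:00"],
    "UnitPrice": [2.5, 10.0, 2.5, 1.0, 4.0],
    "CustomerID": [17850.0, 17850.0, 14527.0, 13000.0, None],
})

cleaned = clean_transactions(sample)

# Input for association rules: one list of products per invoice.
baskets = cleaned.groupby("InvoiceNo")["Description"].apply(list)

# Input for time series analysis: total sales per calendar day.
daily_sales = cleaned.set_index("InvoiceDate")["TotalPrice"].resample("D").sum()
```

The two derived objects correspond in spirit to `baskets.pkl` and `daily_sales.csv` in the output folder, although the exact layout of those files may differ.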
- Summary statistics for numerical columns
- Distribution plots for quantity and unit price
- Top products analysis (most frequently purchased)
- Geographic distribution (sales by country)
- Basket size distribution
- Time series visualization with moving averages
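As a rough illustration of the top-products and basket-size analyses listed above, the counts behind those plots could be computed as below. The sketch assumes that "most frequently purchased" means the number of invoice lines a product appears on (not total quantity) and that basket size means the number of line items per invoice. The sample tuples are made up.

```python
from collections import Counter

# Made-up (invoice_no, description, quantity) rows.
transactions = [
    ("536365", "MUG", 6),
    ("536365", "LAMP", 2),
    ("536366", "MUG", 1),
    ("536367", "CLOCK", 4),
    ("536367", "MUG", 3),
    ("536367", "LAMP", 1),
]

# Top products: how many invoice lines mention each product.
product_counts = Counter(desc for _inv, desc, _qty in transactions)
top_products = product_counts.most_common(2)

# Basket size: number of line items on each invoice.
basket_sizes = Counter(inv for inv, _desc, _qty in transactions)

# Distribution: how many invoices have each basket size.
size_distribution = Counter(basket_sizes.values())
```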
- Apriori Algorithm implementation
- Frequent itemset discovery (min_support = 0.02)
- Association rule generation (min_confidence = 0.3)
- Rule evaluation metrics: Support, Confidence, Lift
- Top 10 most interesting rules identification
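To make the steps above concrete, here is a small, dependency-free sketch of level-wise Apriori plus rule generation with support, confidence, and lift. It is not the notebook's implementation (which may rely on a library); the function names `apriori` and `generate_rules`, the toy baskets, and the choice to rank rules by lift are assumptions. The defaults mirror the thresholds listed above, and the toy run uses larger ones so the tiny sample stays readable.

```python
from itertools import combinations


def apriori(baskets, min_support=0.02):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(b) for b in baskets]
    n = len(transactions)
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence=0.3):
    """Build antecedent -> consequent rules, sorted by lift (descending)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append({
                        "antecedent": antecedent,
                        "consequent": consequent,
                        "support": support,
                        "confidence": confidence,
                        "lift": confidence / frequent[consequent],
                    })
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


# Made-up baskets; thresholds raised for this tiny sample.
toy_baskets = [
    ["tea", "cup", "saucer"],
    ["tea", "cup"],
    ["cup", "saucer"],
    ["tea", "cup", "saucer"],
    ["candle"],
]
frequent = apriori(toy_baskets, min_support=0.3)
rules = generate_rules(frequent, min_confidence=0.6)
top_rules = rules[:10]  # "top 10" here means highest lift (an assumption)
```

A lift above 1 indicates that the antecedent and consequent appear together more often than they would if they were independent, which is why lift is a common ranking criterion for "interesting" rules.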
- 7-Day Simple Moving Average model
- Train/Test split (80%/20%)
- Model evaluation with MAE and MAPE metrics
- 14-day future sales prediction
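The forecasting workflow above might look roughly like the following sketch. Several details are assumptions rather than facts about the notebook: the split is chronological, evaluation is one-step-ahead (each test-day forecast uses the actual values of the preceding 7 days), the 14-day forecast is recursive (each prediction is fed back into the window), and MAPE skips days with zero actual sales. The function names and the toy series are hypothetical.

```python
def sma_forecast(history, window=7):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def evaluate(series, window=7, train_frac=0.8):
    """Chronological train/test split with one-step-ahead SMA forecasts."""
    split = int(len(series) * train_frac)
    train, test = series[:split], series[split:]
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(sma_forecast(history, window))
        history.append(actual)  # the actual value becomes known afterwards
    errors = [abs(a - p) for a, p in zip(test, predictions)]
    mae = sum(errors) / len(errors)
    # MAPE is undefined for zero actuals, so those days are skipped (assumption).
    pct = [abs(a - p) / abs(a) for a, p in zip(test, predictions) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return mae, mape, predictions


def forecast_future(series, horizon=14, window=7):
    """Recursive forecast: each prediction is appended to the window."""
    history = list(series)
    future = []
    for _ in range(horizon):
        nxt = sma_forecast(history, window)
        future.append(nxt)
        history.append(nxt)
    return future


# Made-up daily sales totals.
daily = [120.0, 135.0, 128.0, 140.0, 150.0, 90.0, 80.0,
         125.0, 138.0, 131.0, 145.0, 155.0, 95.0, 85.0,
         130.0, 142.0, 136.0, 149.0, 160.0, 99.0]
mae, mape, predictions = evaluate(daily)
next_14_days = forecast_future(daily)
```

Because a recursive moving-average forecast only averages values it has already seen, it flattens out quickly and cannot capture trend or weekly seasonality beyond what is present in the last window.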
- Distribution Plots: Quantity and Unit Price histograms
- Top Products: Bar chart of most frequently purchased products
- Geographic Distribution: Sales by country
- Basket Size: Distribution of items per transaction
- Time Series: Daily sales with 7-day moving average
- Association Rules: Scatter plot (Support vs Confidence colored by Lift)
- Forecasting: Actual vs Predicted sales comparison
data-analysis/
├── code.ipynb
├── Output/
│ ├── baskets.pkl # Preprocessed transaction baskets for association rules
│ ├── daily_sales.csv # Aggregated daily sales data
│ ├── top_10_association_rules.csv # Top 10 association rules with metrics
│ └── time_series_metrics.txt # Forecasting model performance metrics
├── dataset/
│ └── Online Retail.xlsx # Original dataset
└── README.md # This file
- Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of VLDB, 487-499.
- Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice (2nd ed.). OTexts. https://otexts.com/fpp2/
- Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
- UCI Machine Learning Repository: Online Retail Data Set. https://archive.ics.uci.edu/ml/datasets/Online+Retail