This project presents an end-to-end SQL exploratory data analysis (EDA) of a fictional tech-based business across a two-year period (2024–2025). The dataset is modelled using a star schema (fact + dimension tables) to reflect real-world analytics workflows.
The analysis focuses on revenue performance, customer behaviour, seasonality, churn/reactivation, pricing behaviour, and operational anomalies. The dataset is clean enough for immediate exploration but intentionally includes a small number of missing values and pricing outliers to support anomaly investigation and validation of assumptions.
- Understand overall revenue performance and identify key products driving sales
- Analyse customer behaviour by segment and location to support targeting strategies
- Identify seasonal demand patterns and product-level seasonality
- Assess churn and reactivation trends over time
- Detect operational anomalies (outliers + missing data), validate assumptions, and assess impact on conclusions
- Demonstrate SQL capability in PostgreSQL (CTEs, window functions, time analysis) and business-focused interpretation
- Which products generate the most sales?
- Which customer segments (SMB, Consumer, Enterprise) are most valuable?
- Are revenues growing over time (month/quarter/year)?
- Which cities or segments are purchasing the most?
- Are any customer segments paying more per transaction? If so, for which products?
- Do certain customer segments buy certain product types?
- Which periods spike in sales?
- Are some products more seasonal?
- Do customers churn or reappear based on time?
- Do outliers represent bulk deals, pricing errors, or special contracts? Is missing data affecting visibility?
The dataset follows a star schema design with one central fact table (fact_sales) and supporting dimensions (dim_customer, dim_product, dim_date). This structure enables efficient aggregation, segmentation, and time-series analysis.
A: Products 37 and 38 generated the most sales by a significant margin, outperforming the third-best product by approximately £1.5–£2.5 million. These are clear revenue drivers and should remain a strategic focus.
A: The Consumer segment is the most valuable, contributing roughly 2× total sales compared to the next highest segment (SMB).
A: Revenue decreased slightly year-on-year, which is worth monitoring to ensure it does not become a sustained trend. 2024 showed stronger seasonality (notable peaks in March and December). 2025 was more consistent across the year, although Q2 underperformed relative to other quarters and would benefit from deeper investigation into potential causes.
Business impact:
➡️ Identify where the business should invest, promote, or expand.
Relevant departments:
- Finance
- Commercial / Strategy
- Marketing
These insights support revenue forecasting, strategic investment decisions, and prioritisation of high-performing products and customer segments.
A: Bristol recorded the highest volume of orders (633). The customer segment placing the most orders was Consumer with 2,295 total orders.
A: Initial analysis suggested Enterprise customers paid significantly more than other segments. However, after excluding extreme pricing anomalies, segment-level differences narrowed considerably. The refined analysis shows that pricing premiums are small and largely consistent across segments, with Consumers paying a slightly higher average premium. Importantly, price deviations from list are driven by specific products rather than customer segments, highlighting product-level pricing opportunities over broad segment-based pricing strategies.
Note: This highlights the importance of validating assumptions against outliers, as a small number of extreme transactions can materially distort pricing conclusions.
A: All three segments purchase more SaaS Subscription products than any other category, with the Consumer segment ordering a significantly higher number of orders and quantity. This suggests an opportunity to explore ways to increase profitability and monetisation within the SaaS subscription portfolio.
Business impact:
➡️ Marketing targeting, tailored product offers, and regional strategies.
Relevant departments:
- Marketing
- Sales
- Product Management
This analysis helps guide marketing, sales priorities, and product positioning by examining how different customer segments purchase and generate value.
A: Sales show recurring spikes in Q4, particularly October and December, which are the highest-volume months in both years. Outside of Q4, demand also lifts in mid-year (Q2–Q3)—notably June 2024 and July/September 2025—with a smaller but consistent bump in early Q1 (January/March). Overall, the pattern suggests a strong year-end peak plus secondary peaks in mid-year and early-year periods.
A: Several products show clear seasonal demand, with peak quarterly sales often 1.5–2.2× higher than their average quarterly sales. The strongest seasonality is observed in Product 3 (Q3 2025), where sales were 2.21× the typical quarterly level, followed by Product 39 (Q4 2025) and Product 9 (Q3 2024), both approaching 2× average demand. Seasonal spikes are most commonly concentrated in Q2–Q4.
A: Customer churn is largely balanced by reactivations, indicating stable demand driven by returning customers rather than sustained new customer growth.
Business impact:
➡️ Inventory planning, subscription renewal focus, and demand forecasting.
Relevant departments:
- Operations / Supply Chain
- Finance
- Marketing
- Customer Success
Understanding seasonal demand patterns enables better inventory planning, cash-flow forecasting, campaign timing, and proactive customer engagement.
Q: Do outliers represent bulk deals, pricing errors, or special contracts? Is missing data affecting visibility?
A: Further anomaly analysis identified a small number of extreme pricing events that occurred only once per product. These transactions were deemed likely data quality issues rather than legitimate pricing behaviour and were excluded from pricing analysis.
Quantifying the impact of these exclusions showed that removing just three anomalous transactions reduced the maximum unit price from £120,000 to £18,901, while the average unit price changed only marginally (£1,442 → £1,402). At the product level, each affected product contained a single extreme transaction, inflating both average and maximum prices, while normal pricing behaviour remained stable once removed. This confirms the outliers materially distorted pricing conclusions without representing genuine commercial patterns.
In contrast, repeated high-value transactions for other products were retained as they likely reflect legitimate bundled or contract-level pricing.
Records with missing customer city or segment information were excluded from segmentation and geographic analyses to ensure accurate comparisons. These transactions were retained in the dataset for overall performance metrics but omitted where attribution was required.
Missing data summary:
- fact_sales: 10 missing
sales_amountvalues - dim_product: 0 missing values
- dim_date: 0 missing values
- dim_customer: 3 missing
cityvalues and 3 missingsegmentvalues
Business impact:
➡️ Improve data capture systems and increase confidence in reporting and decision-making.
Relevant departments:
- Data / Analytics
- Finance
- Sales Operations
These findings improve data reliability, protect reporting accuracy, and support better governance around pricing, contracts, and operational reporting.
- Pricing analysis excludes a small number of one-off extreme transactions identified as data anomalies.
- Segment and city analyses exclude records with missing customer attributes to avoid misclassification.
- Some transactions may represent bundled or contract-level pricing, meaning quantity does not always reflect discrete units sold.
- PostgreSQL
- CTEs and window functions (RANK, ROW_NUMBER)
- Time-based analysis (month, quarter, year)
- Customer segmentation, geographic analysis, and pricing analysis
- Anomaly detection, validation of assumptions, and data-quality interpretation
- Business-focused insight generation and stakeholder relevance mapping
This project demonstrates how SQL-based exploratory analysis can support multiple business functions, from strategic decision-making and pricing optimisation to operational planning and data quality improvement.
Project created by Sean Worrall as part of a SQL analytics portfolio.

