Welcome to the Data Warehouse & Analytics Project repository! Design, as part of my Portfolio Projects, showcases industry best practices in data engineering and data analytics. This Project demonstrates a comprehensive Data Warehousing and Analytics solution, building a data warehouse to generate actionable insights.
The data architecture for this project follows Medallion Architecture Bronze, Silver, and Gold layers:
Stores raw data as-is from the source systems. Data is ingested from CSV Files into SQL Server Database.
This layer includes data cleansing, standardisation, and normalisation processes to prepare data for analysis.
Houses business-ready data modelled into a star schema required for reporting and analytics.
This project involves:
-Data Architecture:
--Designing a Modern Data Warehouse Using Medallion Architecture Bronze, Silver, and Gold layers.
-ETL Pipelines:
--Extracting, transforming, and loading data from source systems into the warehouse.
-Data Modelling:
--Developing fact and dimension tables optimised for analytical queries.
-Analytics & Reporting:
--Creating SQL-based reports and dashboards for actionable insights.
Develop a modern Data Warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.
-Data Sources:
--Import data from two source systems (ERP and CRM) provided as CSV files.
-Data Quality:
--Cleanse and resolve data quality issues prior to analysis.
-Integration:
--Combine both sources into a single, user-friendly data model designed for analytical queries.
-Scope:
--Focus on the latest dataset only; historization of data is not required.
-Documentation:
--Provide clear documentation of the data model to support both business stakeholders and analytics teams.
Objective
Develop SQL-based analytics to deliver detailed insights into: -Customer Behaviour -Product Performance -Sales Trends
These insights empower stakeholders with key business metrics, enabling strategic decision-making.
data-warehouse-project/
datasets/
- Raw datasets used for the project (ERP and CRM data)
docs/
- Project documentation and architecture details
- etl.drawio -- Draw.io file shows all different techniques and methods of ETL
- data_architecture.drawio -- Draw.io file shows the project's architecture
- data_catalog.md -- Catalogue of datasets, including field descriptions and metadata
- data_flow.drawio -- Draw.io file for the data flow diagram
- data_models.drawio -- Draw.io file for data models (star schema)
- naming-conventions.md -- Consistent naming guidelines for tables, columns, and files
scripts/
- SQL scripts for ETL and transformations
bronze/
- Scripts for extracting and loading raw data
silver/
- Scripts for cleaning and transforming data
gold/
- Scripts for creating analytical models
tests/
-Test scripts and quality files
README.md/
- Project overview and instructions
LICENSE/
- License information for the repository
gitignore/
- Files and directories to be ignored by Git
requirements.txt/
- Dependencies and requirements for the project