This project provides Apache Airflow-based ETL pipelines for processing game event data. Player actions are loaded incrementally each day and transformed into analytics tables in Google BigQuery, with built-in data quality checks.
- Apache Airflow: For managing ETL workflows
- Python: For data processing and transformations
- Google BigQuery: For data storage and analytics
- Git: Version control
```plaintext
game_data_assignment/
├── dag_fmg.py # Airflow DAG definition
├── files_to_deploy.cfg # Configuration file
├── requirements.txt # Python dependencies
├── .gitattributes # Git attributes
├── .idea/ # IDE project files
├── fmg_packages/ # Custom Python packages
└── test/ # Test files
```
- Clone and set up:

```shell
git clone https://github.com/efesabanogluu/game_data_assignment.git
cd game_data_assignment
```

- Install dependencies:

```shell
pip install -r requirements.txt
```

- Initialize Airflow (the `users create` command also requires a name, email, and role):

```shell
airflow db init
airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
```

- Run services:

```shell
airflow webserver --port 8080 &
airflow scheduler
```
- Incremental Daily Loads: Only new data is processed each day, reducing load on source databases.
- Data Quality Checks: Predefined checks ensure data accuracy and integrity.
- Modular Design: Code is organized for reusability and easy maintenance.
- Testable Code: Included tests ensure code correctness and reliability.
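An incremental daily load typically filters the source data to a single day's window derived from the DAG's run date. The sketch below illustrates the idea; the helper name and the `event_date` column are assumptions for illustration, not the project's actual schema.

```python
from datetime import date, timedelta

def incremental_where_clause(execution_date: date) -> str:
    """Build a WHERE clause selecting only the previous day's events.

    Illustrative helper: the column name `event_date` and the one-day
    window are assumptions, not this project's actual schema.
    """
    start = execution_date - timedelta(days=1)
    return (
        f"event_date >= '{start.isoformat()}' "
        f"AND event_date < '{execution_date.isoformat()}'"
    )
```

For a run dated 2024-01-02, this yields `event_date >= '2024-01-01' AND event_date < '2024-01-02'`, so only one day of rows is read from the source on each run.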
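Data quality checks usually boil down to simple predicates evaluated after a load. A minimal sketch of two such checks; the helper names and row shape are hypothetical, not taken from `fmg_packages`:

```python
def check_not_null(rows, column):
    """Pass if no loaded row has a missing value in `column`.

    Hypothetical check: rows are modeled as dicts for illustration.
    """
    return all(row.get(column) is not None for row in rows)

def check_row_count(rows, minimum=1):
    """Pass if at least `minimum` rows were loaded for the day."""
    return len(rows) >= minimum
```

In an Airflow setting, checks like these would typically run as a task downstream of the load and fail the DAG run when a predicate returns False.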
The project uses standard Python testing tools:

- PyTest: For unit testing
- unittest: Python's built-in testing module
Run tests with:

```shell
pytest
```
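A test under `test/` might look like the following. Both the transformation and the test are hypothetical examples of the style, not the project's actual code:

```python
def daily_active_users(events):
    """Count distinct players that produced at least one event.

    Illustrative transformation; the `player_id` field is an assumed
    event attribute, not the project's confirmed schema.
    """
    return len({e["player_id"] for e in events})

def test_daily_active_users():
    # pytest discovers functions named test_* and runs their asserts.
    events = [
        {"player_id": 1, "action": "login"},
        {"player_id": 1, "action": "purchase"},
        {"player_id": 2, "action": "login"},
    ]
    assert daily_active_users(events) == 2
```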
This ETL pipeline suits data engineers and analysts working on game analytics, such as:

- User Behavior Analysis: Analyzing player activities within the game.
- Performance Monitoring: Monitoring game server performance metrics.
- Revenue Analysis: Analyzing in-game purchase data.
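As a flavor of the revenue use case, a daily revenue rollup is a simple group-and-sum over purchase events. A minimal sketch, assuming hypothetical `event_date` and `amount` fields on each purchase event:

```python
from collections import defaultdict

def revenue_by_day(purchases):
    """Sum purchase amounts per event date.

    Illustrative aggregation; in the pipeline itself this kind of
    rollup would be expressed as a BigQuery GROUP BY query.
    """
    totals = defaultdict(float)
    for p in purchases:
        totals[p["event_date"]] += p["amount"]
    return dict(totals)
```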