📚 Level 5 Data Engineer Learning Journal

🌟 Overview and Purpose

This repository serves as my official Learning Journal and portfolio for the Level 5 Data Engineer Apprenticeship.

It documents my practical experience, knowledge gained, project work, and personal reflections, aligned with the apprenticeship curriculum and the Data Engineer (Level 5) standard.

Repository Structure

The journal is organized into modules/folders reflecting the core areas of data engineering practice, ensuring all competencies required for the End-Point Assessment (EPA) are logged and demonstrable.

  • ./01_Core_Concepts/: Notes, definitions, and foundational knowledge (e.g., Data Architecture, Ethics, Governance).
  • ./02_SQL_Data_Modelling/: SQL scripts, data modelling diagrams, and database practice.
  • ./03_Python_ETL/: Python scripts for data manipulation (Pandas), scripting, and basic ETL/ELT processes.
  • ./04_Data_Pipelines_Orchestration/: Code and configurations for building, automating, and monitoring data workflows (e.g., Airflow, Azure Data Factory).
  • ./05_Cloud_Infrastructure/: Notes and setup scripts (IaC) related to cloud platforms (AWS/Azure/GCP) for data solutions.
  • ./06_Capstone_Project/: The final, significant project used for EPA preparation (e.g., building a complete, end-to-end data platform).
  • ./Documentation/: Technical documentation, requirements gathering, and professional discussion preparation.

🛠️ Key Skills and Technologies Demonstrated

This journal documents hands-on mastery in the following core areas:

1. Programming & Data Processing

  • Python: Advanced scripting, data manipulation with Pandas/NumPy, and software development best practices (testing, version control); a short Pandas sketch follows this list.
  • SQL: Complex queries, stored procedures, performance tuning, and database administration, including data definition and manipulation (DDL/DML).
  • PySpark/Scala (Optional): Working with distributed computing frameworks for Big Data processing.
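
For illustration, a minimal sketch of the kind of Pandas transformation captured under ./03_Python_ETL/; the file path and column names here are placeholders rather than actual journal artefacts:

```python
import pandas as pd

# Illustrative only: the CSV layout and column names are assumptions, not journal artefacts.
def clean_transactions(csv_path: str) -> pd.DataFrame:
    """Load raw transactions, cleanse them, and derive a monthly spend summary."""
    df = pd.read_csv(csv_path, parse_dates=["transaction_date"])

    # Basic cleansing: drop exact duplicates and rows missing the amount.
    df = df.drop_duplicates().dropna(subset=["amount"])

    # Derive a month period column and aggregate spend per customer per month.
    df["month"] = df["transaction_date"].dt.to_period("M")
    summary = (
        df.groupby(["customer_id", "month"], as_index=False)["amount"]
          .sum()
          .rename(columns={"amount": "monthly_spend"})
    )
    return summary
```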

2. Data Infrastructure & Storage

  • Data Modelling: Relational (3NF/Dimensional) and NoSQL modelling for different use cases; a small star-schema sketch follows this list.
  • Data Warehousing: Concepts of Data Warehouses, Data Lakes, Data Lakehouses, and Data Marts (e.g., Snowflake, Microsoft Fabric).
  • Cloud Platforms: Implementation of data solutions using [AWS / Azure / GCP] services (e.g., S3/Blob Storage, EC2/VMs, RDS/Managed Databases).
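
For illustration, a minimal star-schema sketch (one fact table, two dimensions) built with Python's standard-library sqlite3 module; the table and column names are placeholders, not a model taken from the journal:

```python
import sqlite3

# One fact table referencing two dimension tables: the core dimensional pattern.
DDL = """
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_id  TEXT NOT NULL,
    region       TEXT
);

CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    date     TEXT NOT NULL,
    year     INTEGER,
    month    INTEGER
);

CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    amount       REAL NOT NULL
);
"""

# Create the schema in an in-memory database to check the DDL is valid.
with sqlite3.connect(":memory:") as conn:
    conn.executescript(DDL)
```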

3. Data Flow & Automation (ETL/ELT)

  • Pipeline Tools: Building and managing robust data pipelines using orchestrators such as Apache Airflow and Azure Data Factory; a minimal DAG sketch follows this list.
  • Streaming: Experience with batch, micro-batch, and real-time processing concepts (e.g., Kafka, Azure Stream Analytics).
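
For illustration, a minimal Airflow DAG sketch, assuming Apache Airflow 2.x; the DAG id, task names, and callables are placeholders rather than a pipeline from the journal:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract() -> None:
    print("extract step: pull data from the source system")

def load() -> None:
    print("load step: write data to the warehouse")

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # one scheduled run per day
    catchup=False,               # do not backfill runs before today
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare the dependency so load only runs after extract succeeds.
    extract_task >> load_task
```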

4. Software Engineering & DevOps

  • Version Control: Professional usage of Git and GitHub for collaborative development.
  • Containerization: Introduction to Docker for dependency management and reproducible environments.
  • CI/CD: Implementing basic Continuous Integration and Continuous Deployment for data pipelines (e.g., GitHub Actions); a sample automated check that CI could run follows this list.
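
For illustration, a minimal pytest-style check of the kind a CI workflow (e.g., GitHub Actions) could run on every push; the transform function and its expected values are placeholders, not code from the journal:

```python
import pandas as pd

# Illustrative transform under test; in practice the real function would live
# in a pipeline module and be imported by the test file.
def add_total_column(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["total"] = out["quantity"] * out["unit_price"]
    return out

def test_add_total_column() -> None:
    df = pd.DataFrame({"quantity": [2, 3], "unit_price": [5.0, 1.5]})
    result = add_total_column(df)

    # The derived column should be the row-wise product of quantity and price.
    assert list(result["total"]) == [10.0, 4.5]
    # The original frame must not be mutated by the transform.
    assert "total" not in df.columns
```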
