Welcome to my personal notes for the Data Engineer Handbook Bootcamp.
This repository is my learning journal, containing summaries, key concepts, and lab solutions for the 6-week bootcamp. It complements my forked Data Engineer Handbook repo.
```
data-engineer-notes/
├── README.md
├── resources.md
├── assets/
│   └── images/
├── week00/
│   ├── summary.md
│   ├── key-concepts.md
│   ├── lab-notes.md
│   └── lab00/
│       ├── solution.ipynb
│       └── ...                # Artifacts from bootcamp materials
├── week01/
│   └── (similar structure)
├── ...
└── week06/
    └── (similar structure)
```

Each week contains:
- Summary: Key takeaways from the week
- Key Concepts: Detailed explanations and examples of core ideas
- Lab Notes: Observations, detailed notes, and troubleshooting during labs
- Labs: Solutions for each lab
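A new week folder with this layout could be scaffolded with a few lines of Python. This is a minimal sketch, not a script that exists in this repository: `scaffold_week` and `WEEK_FILES` are hypothetical names, and lab subfolders are left out because their naming beyond `lab00/` is not specified here.

```python
from pathlib import Path

# File names taken from the week00/ layout in the directory tree.
WEEK_FILES = ("summary.md", "key-concepts.md", "lab-notes.md")


def scaffold_week(root: Path, week: int) -> Path:
    """Create weekNN/ under root with empty note files (hypothetical helper)."""
    week_dir = root / f"week{week:02d}"
    week_dir.mkdir(parents=True, exist_ok=True)
    for name in WEEK_FILES:
        # touch() leaves existing notes untouched, so re-running is safe.
        (week_dir / name).touch(exist_ok=True)
    return week_dir


# Example: scaffold_week(Path("."), 1) creates ./week01/ with the three files.
```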

Related repositories:
- Original Handbook: DataExpert-io/data-engineer-handbook
- My Fork: pizofreude/data-engineer-handbook

Bootcamp modules and lessons:
- Module 1: Bootcamp Orientation - Database setup and Boot Camp Kickoff [Week 0]
  - Bootcamp Kickoff | 20 min
  - Boot Camp Database Setup | 20 min
- Module 2: Dimensional Data Modeling [Week 1]
  - Dimensional Data Modeling Complex Data Type and Cumulation Day 1 Lecture | 43 min
  - Dimensional Data Modeling Complex Data Type and Cumulation Day 1 Lab | 41 min
  - Dimensional Data Modeling: Building Slowly Changing Dimensions Day 2 Lecture | 40 min
  - Dimensional Data Modeling: Building Slowly Changing Dimensions Day 2 Lab | 45 min
  - Dimensional Data Modeling: Graph Data Modeling Day 3 Lecture | 34 min
  - Dimensional Data Modeling: Graph Data Modeling Day 3 Lab | 46 min
  - Dimensional Data Modeling - Week 1 Assignment
- Module 3: Fact Data Modeling [Week 2]
  - Fact Data Modeling: Core Concepts, Deduplication Day 1 Lecture | 52 min
  - Fact Data Modeling: Practical Insights into Data Modeling Day 1 Lab | 40 min
  - Fact Data Modeling: Core Elements in Data Modeling Day 2 Lecture | 31 min
  - Fact Data Modeling: Compact Tables for Efficient Data Representation Day 2 Lab | 45 min
  - Fact Data Modeling: Minimizing Shuffle and Reducing Facts Day 3 Lecture | 32 min
  - Fact Data Modeling: Practical Guide to Formatting and Aggregating Data Day 3 Lab | 30 min
  - Fact Data Modeling - Week 2 Assignment
- Module 4: Apache Spark Fundamentals [Week 3]
  - Apache Spark: Architecture, Optimization, and Best Practices Day 1 Lecture | 48 min
  - Apache Spark: Hands-On for Broadcast and Hash Joins Day 1 Lab | 26 min
  - Apache Spark: Managing Spark Jobs and Notebooks Day 2 Lecture | 34 min
  - Apache Spark: User-Defined Functions and Broadcast Join Day 2 Lab | 36 min
  - Unit Testing Spark Jobs: Importance, Challenges, and Leadership Perspectives Lecture | 41 min
  - Unit Testing Spark Jobs: Mastering Spark and PySpark Testing Lab | 27 min
  - Spark Fundamentals - Week 3 Assignment
- Module 5: Applying Analytical Patterns [Week 4]
  - Applying Analytical Patterns: Exploring SQL, Scaling Projects and Aggregation Analysis Day 1 Lecture | 52 min
  - Applying Analytical Patterns: Mastering Growth Accounting and Retention Analysis Day 1 Lab | 34 min
  - Applying Analytical Patterns: Recursive CTEs and Window Functions Day 2 Lecture | 44 min
  - Applying Analytical Patterns: Aggregations and Cardinality Reduction Day 2 Lab | 33 min
  - Applying Analytical Patterns - Week 4 Assignment
- Module 6: Real-time pipelines with Flink and Kafka [Week 5]
  - Flink Lab Setup | 7 min
  - Streaming Pipelines: Mastering Streaming and Real-time Pipelines Day 1 Lecture | 50 min
  - Streaming Pipelines: Setting up Streaming Pipelines Day 1 Lab | 40 min
  - Streaming Pipelines: Exploring Data Collection and Processing Day 2 Lecture | 31 min
  - Streaming Pipelines: Kafka, Postgres, Spark Integrations and Parallelism Day 2 Lab | 39 min
  - Flink - Week 5 Assignment
- Module 7: Data Visualization and Impact [Week 6 Part 1]
  - Data Visualization and Impact: Mastering Data Engineering Day 1 Lecture | 39 min
  - Data Visualization and Impact: Hands-On with the CSV files Day 1 Lab | 8 min
  - Data Visualization and Impact: Insights and Best Practices Day 2 Lecture | 23 min
  - Data Visualization and Impact: Exploring Data Visualization and Aggregation Techniques Day 2 Lab | 37 min
  - Data Visualization - Week 6 1st Assignment
- Module 8: Data Pipeline Maintenance [Week 6 Part 2]
  - Data Pipeline Maintenance: Navigating the Complexities of Data Engineering Day 1 Lecture | 67 min
  - Data Pipeline Maintenance: Strategies for Maintenance and Dock Building Day 2 Lecture | 77 min
  - Data Pipeline Maintenance - Week 6 2nd Assignment
- Module 9: KPIs and Experimentation [Week 6 Part 3]
  - KPIs and Experimentation: Decoding Business Success: Metrics, Growth Strategies and Collaborative Approaches Day 1 Lecture | 55 min
  - KPIs and Experimentation: Setting up and Analysing Experiments Day 1 Lab | 36 min
  - KPIs and Experimentation: Leading and Lagging Metrics Day 2 Lecture | 65 min
  - KPIs and Experimentation - Week 6 3rd Assignment
- Module 10: Data Quality Patterns [Week 7]
  - Data Quality Patterns: MIDAS Process from Airbnb Day 1 Lecture | 45 min
  - Data Quality Patterns: Spec-Building Document Day 1 Lab | 33 min
  - Data Quality Patterns: WAP Patterns Day 2 Lecture | 27 min

This repository now includes a comprehensive practice tracking system to organize daily coding practice across multiple platforms:
- practice/ - Platform-organized coding problems (LeetCode, StrataScratch, HackerRank, NeetCode, Codewars, etc.)
- concepts/ - Reference notes on data structures, algorithms, SQL patterns, and system design
- interview-prep/ - Interview-specific preparation materials (behavioral, technical, system design)
- logs/ - Daily practice logs and progress tracking with statistics dashboard

```bash
# Create today's log entry
./scripts/new-day.sh

# Start a new problem
# Usage: ./scripts/create-problem.sh <platform> <difficulty> "problem-name"
./scripts/create-problem.sh leetcode medium "problem-name"

# Create a concept note
# Usage: ./scripts/link-concept.sh "concept-name" <category>
./scripts/link-concept.sh "Window Functions" sql-patterns

# Generate weekly stats
python scripts/generate-stats.py
```

Here's your streamlined checklist for logging each problem.

Run once per day (first thing in the morning):

```bash
./scripts/new-day.sh
```

✓ Done! No manual edits needed for this step.
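
The daily log in this README follows a date-based naming pattern (a monthly file such as `logs/2026/01-january.md`, with one `### <weekday>, <month> <day>, <year>` heading per day). The sketch below only shows how such names can be derived from a date in Python. It is an assumption about the convention, not the actual implementation of `new-day.sh`, and `log_path` and `day_heading` are hypothetical names.

```python
from datetime import date
from pathlib import Path


def log_path(day: date, root: Path = Path("logs")) -> Path:
    """Monthly log file for a date, e.g. logs/2026/01-january.md."""
    # %B is the full month name; it is English under Python's default C locale.
    return root / day.strftime("%Y") / f"{day.strftime('%m-%B').lower()}.md"


def day_heading(day: date) -> str:
    """Markdown heading for one day's section of the monthly log."""
    return day.strftime("### %A, %B %d, %Y")


today = date(2026, 1, 3)
print(log_path(today).as_posix())  # logs/2026/01-january.md
print(day_heading(today))          # ### Saturday, January 03, 2026
```

Deriving both strings from one `date` object keeps the weekday, file name, and heading from drifting apart when entries are created by hand.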

For each new problem you're about to solve:

```bash
./scripts/create-problem.sh <platform> <difficulty> "<problem-slug>"
```

Examples:

```bash
./scripts/create-problem.sh codewars easy "absolute-value-log-base"
./scripts/create-problem.sh leetcode medium "rank-scores"
./scripts/create-problem.sh stratascratch hard "revenue-analysis"
```

✓ Done! Folder created, template copied, ready to code.
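
For readers curious what "folder created, template copied" amounts to, here is a rough Python equivalent of the scaffolding step. It is a sketch under assumptions: the real `create-problem.sh` is a shell script whose internals are not shown in this README, the template location is passed in by the caller, and `slugify` and `create_problem` are hypothetical helpers.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

DIFFICULTIES = {"easy", "medium", "hard"}


def slugify(name: str) -> str:
    """Lowercase a problem name and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def create_problem(root: Path, platform: str, difficulty: str, name: str,
                   template: Optional[Path] = None) -> Path:
    """Create practice/<platform>/<difficulty>/<slug>/ containing notes.md."""
    if difficulty not in DIFFICULTIES:
        raise ValueError(f"difficulty must be one of {sorted(DIFFICULTIES)}")
    folder = root / "practice" / platform.lower() / difficulty / slugify(name)
    folder.mkdir(parents=True)  # raises FileExistsError if already scaffolded
    notes = folder / "notes.md"
    if template is not None:
        shutil.copy(template, notes)  # start from the notes template
    else:
        notes.write_text(f"# {name}\n", encoding="utf-8")  # bare fallback
    return folder
```

Failing when the folder already exists is a deliberate choice in this sketch: it avoids overwriting notes for a problem that was scaffolded earlier.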

Navigate to the problem folder:

```bash
cd practice/<platform>/<difficulty>/<problem-slug>
```

Open and write your solution:

```bash
code solution.sql   # For SQL problems
# OR
code solution.py    # For Python/algorithm problems
```

**What to do:**
- ✏️ Paste your working solution code
- ✏️ Add comments explaining key logic (optional but recommended)
- 💾 Save the file

Example:

```sql
-- Calculate absolute value and logarithm base 64
SELECT
    ABS(number1) AS abs,
    LOG(64, number2) AS log
FROM decimals;
```

Open the notes file:

```bash
code notes.md
```

You need to manually update these sections:

#### **A. Title and Metadata**

```markdown
# [Problem Name]  ← CHANGE THIS

## 📋 Metadata
- **Platform:** [Platform name]  ← CHANGE THIS
- **Difficulty:** [Easy/Medium/Hard] (Optional: add platform rating like "7 kyu")  ← CHANGE THIS
- **Date Solved:** 2026-01-03  ← ✅ ALREADY FILLED BY SCRIPT
- **Time Spent:** XX minutes  ← CHANGE THIS
- **Status:** [✅ Solved | 🔄 Revisit | ❌ Stuck]  ← CHANGE THIS
```

Example:

```markdown
# Absolute Value and Log to Base

## 📋 Metadata
- **Platform:** Codewars
- **Difficulty:** Easy (7 kyu)
- **Date Solved:** 2026-01-03  ← Script filled this
- **Time Spent:** 15 minutes
- **Status:** ✅ Solved
```

#### **B. Links**

```markdown
## 🔗 Links
- [Problem URL]  ← PASTE THE ACTUAL URL HERE
```

Example:

```markdown
## 🔗 Links
- https://www.codewars.com/kata/594a8f2f7ca3c692a4000041/train/sql
```

#### **C. Topics & Tags**

```markdown
## 📌 Topics & Tags
- [ ] SQL
- [ ] Window Functions
- [ ] Joins
- [ ] CTEs
- [ ] Python
- [ ] Dynamic Programming
```

Check the relevant ones:

```markdown
## 📌 Topics & Tags
- [x] SQL  ← Put 'x' inside
- [x] Mathematical Functions
- [ ] Window Functions
- [ ] Joins
```

#### **D. Problem Statement**

````markdown
## 📝 Problem Statement
[Paste the problem description here]

### Example Input/Output
```markdown
Input:
Output:
```
````

**What to do:**
- ✏️ Copy-paste the problem description from the platform
- ✏️ Add example input/output (if provided)

#### **E. Approach**

```markdown
## 💡 Approach

### Initial Thoughts
[What was your first idea? What patterns did you recognize?]

### Solution Strategy
1. Step 1
2. Step 2
3. Step 3
```

**What to do:**
- ✏️ Write your thought process (2-3 sentences)
- ✏️ List the steps you took (numbered list)

Example:

```markdown
## 💡 Approach

### Initial Thoughts
Straightforward application of SQL math functions: ABS for absolute value, LOG for logarithm with custom base.

### Solution Strategy
1. Use `ABS(number1)` to get absolute values
2. Use `LOG(64, number2)` for logarithm base 64
3. Alias columns as required (`abs`, `log`)
```

#### **F. Solution**

````markdown
## 🖥️ Solution

### Attempt 1 (Initial)
```sql
-- Your first solution here
```
Result: [Passed/Failed/Timeout]
````

**What to do:**
- ✏️ Paste your solution code (can be same as `solution.sql`)
- ✏️ Note if it passed or failed

**If you optimized it, add:**

````markdown
### Attempt 2 (Optimized) ⭐
```sql
-- Improved solution
```
Result: ✓ Passed with better performance
````

---

#### **G. Complexity Analysis**

```markdown
## ⚡ Complexity Analysis
- **Time Complexity:** O(?)
- **Space Complexity:** O(?)
```

**What to do:**
- ✏️ Fill in the Big O notation
- ✏️ If you're unsure, add a short justification in words, e.g. "Time: O(n) - single pass through table"

Example:

```markdown
## ⚡ Complexity Analysis
- **Time Complexity:** O(n) - single pass through table
- **Space Complexity:** O(n) - result set same size as input
```

#### **H. Key Learnings**

```markdown
## 🎓 Key Learnings
1.
2.
3.
```

**What to do:**
- ✏️ Write 2-4 things you learned (this is THE MOST IMPORTANT SECTION!)

Example:

```markdown
## 🎓 Key Learnings
1. **ABS()** - Returns absolute value (distance from zero)
2. **LOG(base, value)** - PostgreSQL syntax for custom base logarithm
3. Single-argument `LOG(value)` is base 10 in PostgreSQL but the natural log in MySQL; both accept the two-argument `LOG(base, value)` form
4. Base-64 logarithm: `LOG(64, 4096) = 2` because 64² = 4096
```

#### **I. Related Concepts**

```markdown
## 🏷️ Related Concepts
See: `concepts/sql-patterns/[concept-file].md`
```

**What to do:**
- ✏️ If you created a concept note, link it here
- ✏️ Skip if you haven't created a concept yet

Example:

```markdown
## 🏷️ Related Concepts
See: `concepts/sql-patterns/sql-mathematical-functions.md`
```

Open today's log:

```bash
cd ../../../../   # Return to repo root
code logs/2026/01-january.md
```

Find today's date section and fill in:

```markdown
### Saturday, January 03, 2026
⏱️ Time: X hours  ← CHANGE THIS
```

Example:

```markdown
⏱️ Time: 1.5 hours
```

```markdown
#### ✅ Completed
1. Add each problem with:
   - Problem name and difficulty
   - Key topics
   - Link to your solution
   - One-line key learning
```

Example:

```markdown
#### ✅ Completed
1. **Codewars - Absolute Value and Log to Base** (Easy/7kyu)
   - Topics: ABS(), LOG(), Mathematical functions
   - [Solution](../../practice/codewars/easy/absolute-value-log-base/)
   - Key learning: single-argument LOG() is base 10 in PostgreSQL but natural log in MySQL
2. **LeetCode 178 - Rank Scores** (Medium)
   - Topics: Window functions, DENSE_RANK
   - [Solution](../../practice/leetcode/medium/178-rank-scores/)
   - Key learning: DENSE_RANK vs RANK vs ROW_NUMBER differences
```

```markdown
#### 💡 Learnings
- Write 2-4 broader learnings from today:
```

Example:

```markdown
#### 💡 Learnings
- Mathematical functions in SQL are database-specific (PostgreSQL vs MySQL syntax)
- Always check for NULL values when using LOG() with user input
- ABS() is useful for calculating distances and differences
- Created concept note: `concepts/sql-patterns/sql-mathematical-functions.md`
```

```markdown
#### 🎯 Tomorrow
- [ ] Plan 2-3 things for tomorrow:
```

Example:

```markdown
#### 🎯 Tomorrow
- [ ] LeetCode 180 - Consecutive Numbers (Window functions practice)
- [ ] StrataScratch - Revenue analysis problem
- [ ] Review: Self-joins pattern
```

Commit and push your work:

```bash
git status

# Add your changes
git add practice/<platform>/<difficulty>/<problem-slug>/
git add logs/2026/01-january.md

# If you created a concept note, add it too
git add concepts/

# Commit with descriptive message
git commit -m "✅ [Platform]: [Problem Name] - [Key Topic]"

# Push to GitHub
git push
```

Example commit messages:

```bash
git commit -m "✅ Codewars: Absolute Value and Log to Base - SQL math functions"
git commit -m "✅ LeetCode 178: Rank Scores - Window functions"
git commit -m "✅ StrataScratch: Revenue Analysis - CTEs and aggregations"
```

Run every Sunday (or end of week):

```bash
python scripts/generate-stats.py
```

Copy the output:

```markdown
## 📊 All-Time Stats

| Platform  | Easy   | Medium | Hard  | Total  |
|-----------|--------|--------|-------|--------|
| Codewars  | 5      | 2      | 0     | 7      |
| Leetcode  | 12     | 8      | 1     | 21     |
| **Total** | **17** | **10** | **1** | **28** |

📅 Last Updated: 2026-01-05 20:30
```

Paste it into:

```bash
code logs/README.md
```

Replace the old stats section with the new output.

Also update:

```markdown
## 🔥 Current Streaks
- **Daily Practice:** X days  ← UPDATE THIS MANUALLY
```

Commit:

```bash
git add logs/README.md
git commit -m "📊 Update weekly practice stats"
git push
```

Print this and keep it next to you:

```
□ Step 1: ./scripts/new-day.sh (once per day)

For each problem:
□ Step 2: ./scripts/create-problem.sh <platform> <difficulty> "<slug>"
□ Step 3: Write solution in solution.sql or solution.py
□ Step 4: Fill in notes.md:
    □ Change title
    □ Update metadata (platform, difficulty, time, status)
    □ Paste problem URL
    □ Check topic tags
    □ Paste problem statement
    □ Write approach & strategy
    □ Paste solution code
    □ Add complexity analysis
    □ Write key learnings (MOST IMPORTANT!)
    □ Link concept note (if created)
□ Step 5: Update logs/2026/01-january.md:
    □ Time spent today
    □ Add problem to "Completed" list
    □ Write today's learnings
    □ Plan tomorrow's focus
□ Step 6: git add → commit → push

Weekly:
□ Sunday: Run generate-stats.py
□ Update logs/README.md & the monthly log + practice/README.md with new stats
```
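
Step 4 is the easiest one to leave half-done, so a small check for leftover template placeholders can help before committing. The sketch below is not part of the repository's scripts: `unfilled_placeholders` is a hypothetical helper, and the placeholder strings are copied from the template excerpts in this README, so adjust them if the real template differs.

```python
# Placeholder strings as they appear in the notes.md template excerpts.
PLACEHOLDERS = (
    "[Problem Name]",
    "[Platform name]",
    "[Easy/Medium/Hard]",
    "XX minutes",
    "[Problem URL]",
    "[Paste the problem description here]",
    "O(?)",
)


def unfilled_placeholders(notes_text: str) -> list:
    """Return the template placeholders still present in a notes.md body."""
    return [p for p in PLACEHOLDERS if p in notes_text]


draft = "# Rank Scores\n- **Time Spent:** XX minutes\n- **Time Complexity:** O(?)\n"
print(unfilled_placeholders(draft))  # ['XX minutes', 'O(?)']
```

An empty list means every placeholder in the tuple has been replaced; it says nothing about the quality of what was written in its place.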

If you're short on time, focus on:
- ✅ Solution code (`solution.sql`)
- ✅ Key learnings in `notes.md`
- ✅ Daily log entry

Skip the rest for now and come back later to fill it in.

If you solve multiple problems:
- Scaffold all problems first
- Solve all problems
- Update all `notes.md` files
- Update daily log once (list all problems)
- Single commit at the end

Create editor snippets for repetitive sections like complexity analysis, common tags, etc.

| File | What to Update |
|---|---|
| solution.sql | Your code |
| notes.md | Title, metadata, URL, approach, learnings |
| logs/YYYY/MM-month.md | Time, problems list, learnings, tomorrow's plan |
| logs/README.md | Weekly stats (copy from script output) |

Everything else is automated! 🎉

- Automation Scripts: Quickly scaffold new problems and logs with templates
- Platform-Agnostic: Automatically discovers and tracks any coding platform
- Comprehensive Templates: Detailed templates for problems, concepts, and daily logs
- Progress Tracking: Statistics generation and progress dashboards
- Knowledge Base: Structured concept notes linked to practice problems
See practice/README.md for detailed usage instructions and workflow.
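
To illustrate the "Platform-Agnostic" and "Progress Tracking" points, the sketch below counts problem folders under `practice/<platform>/<difficulty>/` and renders a table shaped like the All-Time Stats example. It assumes one folder per solved problem and is not the actual `generate-stats.py`; `count_problems` and `render_table` are hypothetical names.

```python
from collections import Counter
from pathlib import Path

DIFFICULTIES = ("easy", "medium", "hard")


def count_problems(practice_root: Path) -> dict:
    """Map each platform folder name to a Counter of problems per difficulty."""
    counts = {}
    # Any directory directly under practice/ is treated as a platform.
    for platform_dir in sorted(p for p in practice_root.iterdir() if p.is_dir()):
        per_difficulty = Counter()
        for difficulty in DIFFICULTIES:
            folder = platform_dir / difficulty
            if folder.is_dir():
                per_difficulty[difficulty] = sum(
                    1 for item in folder.iterdir() if item.is_dir()
                )
        counts[platform_dir.name] = per_difficulty
    return counts


def render_table(counts: dict) -> str:
    """Render the counts as a Markdown table with a bold totals row."""
    lines = [
        "| Platform | Easy | Medium | Hard | Total |",
        "|---|---|---|---|---|",
    ]
    totals = Counter()
    for platform, per_difficulty in counts.items():
        row = [per_difficulty[d] for d in DIFFICULTIES]
        totals.update(per_difficulty)
        lines.append(
            f"| {platform.capitalize()} | {row[0]} | {row[1]} | {row[2]} | {sum(row)} |"
        )
    total_row = [totals[d] for d in DIFFICULTIES]
    lines.append(
        f"| **Total** | **{total_row[0]}** | **{total_row[1]}** "
        f"| **{total_row[2]}** | **{sum(total_row)}** |"
    )
    return "\n".join(lines)


# Example: print(render_table(count_problems(Path("practice"))))
```

Because platforms are discovered from the directory listing rather than a hard-coded list, adding a new platform folder is enough for it to appear in the table.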