Production-style AWS/EKS DevOps learning platform.
Timeline: Part-time weekends (12 hrs/week) | Budget: $250/month | Progress: Weeks 0-13 and 15 ✅ (Weeks 14 and 16-18 skipped)
make up # Create infrastructure + configure kubectl
make down # Destroy everything

| Week | Topic | Status |
|---|---|---|
| 0 | AWS Setup, Billing, Terraform State | ✅ |
| 1 | VPC Foundation | ✅ |
| 2 | EKS Cluster | ✅ |
| 3 | GitOps (Argo CD) | ✅ |
| 4 | AWS Load Balancer Controller | ✅ |
| 5 | ExternalDNS | ✅ |
| 6 | TLS (cert-manager) | ✅ |
| 7 | CI/CD Build (ECR, GitHub Actions) | ✅ |
| 8 | CI/CD Deploy (GitOps flow) | ✅ |
| 9 | Observability: Metrics (Container Insights) | ✅ |
| 10 | Observability: Logs & Traces | ✅ |
| 11 | Scaling: Karpenter | ✅ |
| 12 | Stateful: DynamoDB | ✅ |
| 13 | Security & Policy Enforcement | ✅ |
| 14 | Async: SQS/SNS Workers | ⏭️ |
| 15 | Resilience & Chaos | ✅ |
| 16 | EKS Upgrade | ⏭️ |
| 17 | Multi-Region & DR | ⏭️ |
| 18 | Cost Optimization & Wrap-Up | ⏭️ |
State Backend: S3 ryan-eks-lab-tfstate + DynamoDB eks-lab-tfstate-lock
Goal: AMP + AMG + ADOT (descoped to CloudWatch Container Insights)
AMP was ~$20/day (too expensive). Switched to Container Insights via amazon-cloudwatch-observability addon.
Note: Before descoping, hit ADOT bug where relabel_configs couldn't construct __address__ for custom app metrics. See docs/week10-guestbook-metrics-investigation.md.
Cost: ~$3-5/month
Goal: Centralized logging and distributed tracing
- Fluent Bit via `amazon-cloudwatch-observability` addon → CloudWatch Logs
- Structured JSON logging in guestbook app (user, action, trace_id, client_ip)
- X-Ray tracing configured via CloudWatch Agent OTLP endpoints
- Log retention set to 7 days (Terraform-managed)
- Security review: GuardDuty findings triage, CloudTrail review
Cost: ~$5/session (CloudWatch Logs ~$0.50/GB)
Goal: Automatic node provisioning
- Create Karpenter IAM role (Pod Identity, not IRSA)
- Install Karpenter Helm chart (v1.0.8)
- Create NodePool (Spot + On-Demand, t4g/m6g/c6g Graviton families)
- Create EC2NodeClass (AL2023, 20GB gp3, IMDSv2)
- SQS queue for Spot interruption handling
- Consolidation policy for cost optimization
Cost: ~$4/session (potential Spot savings)
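
The NodePool and EC2NodeClass bullets above could look roughly like the following under the Karpenter v1 APIs. This is a minimal sketch, not the lab's actual manifests: the resource names, the `eks-lab` discovery tag, the node role name, the CPU limit, and the consolidation delay are all assumptions made for illustration.

```yaml
# Sketch only: names, tags, role, and limits are hypothetical.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                      # assumed name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                # must match the EC2NodeClass below
      requirements:
        # Allow both Spot and On-Demand capacity.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Restrict to the Graviton families listed above.
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t4g", "m6g", "c6g"]
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
  disruption:
    # Consolidation: remove or replace nodes that are empty or underused.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m             # assumed delay
  limits:
    cpu: "16"                        # assumed cap to bound lab spend
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default                      # assumed name
spec:
  amiSelectorTerms:
    - alias: al2023@latest           # AL2023 AMI family
  role: KarpenterNodeRole-eks-lab    # hypothetical node IAM role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: eks-lab   # hypothetical discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: eks-lab   # hypothetical discovery tag
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 20Gi             # 20GB gp3 root volume
        volumeType: gp3
        encrypted: true              # assumption, not stated above
  metadataOptions:
    httpTokens: required             # enforce IMDSv2
    httpPutResponseHopLimit: 1       # assumption: block pod access to IMDS
```

Pinning `al2023@latest` is convenient for a lab; a pinned alias version would be the more cautious choice where unplanned node replacement matters.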
Goal: Admission control and secrets management
- Install Kyverno
- Create policies: no `:latest`, require limits, no privileged, require labels
- Add Trivy to CI pipeline (fail on HIGH/CRITICAL)
- Install External Secrets Operator
- Sync secret from Secrets Manager → K8s
Cost: ~$5/session (Secrets Manager ~$0.40/secret/month)
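
Two of the Kyverno policies listed above (no `:latest`, require limits) might be expressed as a single ClusterPolicy along these lines. This is a sketch modeled on Kyverno's published sample policies, not the lab's actual policy set; the policy name and messages are assumptions, and newer Kyverno releases prefer a per-rule `failureAction` over the spec-level `validationFailureAction` shown here.

```yaml
# Sketch only: policy name and messages are hypothetical.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: guestbook-baseline           # assumed name
spec:
  # Reject non-compliant Pods at admission instead of only auditing them.
  validationFailureAction: Enforce
  rules:
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must not use the ':latest' tag."
        pattern:
          spec:
            containers:
              # '!' negates the wildcard match on the image string.
              - image: "!*:latest"
    - name: require-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    # '?*' means the field must exist and be non-empty.
                    cpu: "?*"
                    memory: "?*"
```

Starting with `Audit` and switching to `Enforce` once existing workloads pass is a common rollout order, since an enforcing policy can also block system or add-on Pods that predate it.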
Goal: Event-driven architecture
Status: Skipped - Already familiar with SQS/SNS patterns; doesn't add meaningful functionality to guestbook app.
Cost: $0 (skipped)
Goal: Understand failure modes
- Add PodDisruptionBudgets (minAvailable: 2 for guestbook)
- Manual chaos: delete pods, drain nodes (runbook)
- AWS FIS experiment: terminate EC2 instance
- Document runbooks (node failure, crashloop debugging)
Cost: ~$12/session
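
The PodDisruptionBudget from the first bullet above could be written as follows. This is a minimal sketch: the namespace and the `app: guestbook` selector label are assumptions about how the guestbook Deployment is labeled.

```yaml
# Sketch only: namespace and selector label are hypothetical.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: guestbook
  namespace: guestbook               # assumed namespace
spec:
  # Voluntary disruptions (drains, consolidation) must leave 2 pods running.
  minAvailable: 2
  selector:
    matchLabels:
      app: guestbook                 # assumed pod label
```

With `minAvailable: 2`, the Deployment needs at least three replicas for a node drain to make progress; with exactly two replicas, every eviction would be refused. A PDB only governs voluntary evictions, so the FIS instance-termination experiment bypasses it.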
Goal: Safe upgrade procedures
Status: Skipped - New role doesn't require EKS; lab objectives met.
Cost: $0 (skipped)
Goal: Basic disaster recovery
Status: Skipped - New role doesn't require EKS; lab objectives met.
Cost: $0 (skipped)
Goal: Production-ready cost controls and documentation
Status: Skipped - New role doesn't require EKS; lab objectives met.
Cost: $0 (skipped)
✅ Spin up full platform in <30 min
✅ Deploy via GitOps, zero manual kubectl
✅ Automatic HTTPS + DNS
✅ Metrics, logs, traces observable
✅ Graceful failure handling
✅ Security policies enforced
✅ DB + queue connectivity
✅ Destroy cleanly with one command
✅ Understand and explain every component
infra/ # Terraform
k8s/ # Helm/manifests
argocd/ # Argo CD config + Applications
guestbook/ # Sample app
scripts/ # up.sh, down.sh
docs/ # Week-specific notes
All resources tagged with: project, env, owner, created_at, ttl_hours
- MFA on root, IAM user for daily work
- Security Hub, GuardDuty, Config enabled
- IRSA for pod-to-AWS access
- HTTPS for public endpoints
- No 0.0.0.0/0 except documented ALB