Final project for the DevOps Upskill course.
This project addresses an issue I encountered while working on my bachelor's thesis. I was developing ML models and had to compare their results and performance. The process was predominantly manual, involving extensive use of Excel sheets for results analysis. This approach was not only time-consuming but also prone to errors and inefficiencies.
This is an approximation of how working on my thesis went. The goal was to compare the performance of several different models on a specific task. The total time available for me to develop and write the thesis was 4 months. According to the Value Stream Map above, developing 3 models (the number that were developed in the end) would take 130 days, or about 4.33 months. The main bottleneck was tracking the performance of the models: collecting the results and comparing them manually.
To address the issue of tracking and comparing models, this project introduces an automated process for ML model training and result tracking on a local Kubernetes cluster. It allows a Python developer to submit their training jobs to the local cluster and to track and compare their performance in the MLflow UI. Additionally, a CI pipeline is available for the Python code, which also builds the Docker image and pushes it to Dockerhub.
Topics from the DevOps course that are covered:
- Value stream mapping
- Source control
- Branching strategies: Trunk-based development
- Building Pipelines
- Continuous Integration
- Security
- Docker
- Kubernetes
- Infrastructure as code
- Secrets management
## Repository structure

- The `infrastructure` directory contains the Terraform code for setting up the infrastructure. It assumes that a Minikube cluster has already been started locally.
  - `main.tf` contains the provider versions.
  - `mlflow.tf` contains the resources related to the MLflow tracking server.
  - `postgres.tf` contains the resources related to the Postgres DB, which is needed by the tracking server.
  - `variables.tf` contains the variables.
- The `training` directory contains the Python code for training and registering a model with MLflow and sklearn. It also contains one example manifest for running the example training code (a minimal sketch of what such a training script might look like is shown below).
- The `.github/workflows` directory contains 3 different GitHub Actions pipelines:
  - `infrastructure-commit.yaml` runs whenever changes are made to the `/infrastructure` directory. It checks the Terraform formatting, lints the Terraform code, validates it, and performs a Checkov check.
  - `python-commit.yml` runs whenever changes are made to the `/training` directory or the main `Dockerfile`. It runs black, pylint, gitleaks, and SonarCloud, builds the Docker image, checks it for vulnerabilities with Trivy, and then pushes it to Dockerhub.
  - `mlflow-commit.yml` runs whenever changes are made to `mlflow.Dockerfile` or its `requirements.txt`. It builds the Docker image for the MLflow tracking server, checks it for vulnerabilities with Trivy, and pushes it to Dockerhub.
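For illustration, here is a minimal sketch of what such a training script might look like. The dataset, hyperparameter values, and the `elasticnet-demo` model name are hypothetical; the actual code in `training` may differ:

```python
# Hedged sketch of an MLflow + sklearn training job, not the repo's actual script.
# Assumes the MLFLOW_TRACKING_URI env var points at the tracking server
# (inside the cluster this would be set in the job manifest).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical hyperparameters -- the real job may use different values.
ALPHA, L1_RATIO = 0.5, 0.5

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = ElasticNet(alpha=ALPHA, l1_ratio=L1_RATIO)
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

    # Log parameters and metrics so runs can be compared in the MLflow UI.
    mlflow.log_param("alpha", ALPHA)
    mlflow.log_param("l1_ratio", L1_RATIO)
    mlflow.log_metric("rmse", rmse)

    # Register the model; the registry is backed by the Postgres DB above.
    mlflow.sklearn.log_model(model, "model", registered_model_name="elasticnet-demo")
```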
## Requirements

- Minikube cluster running on the local machine, with the Minikube ingress addon enabled, and kubectl installed.
- Terraform >= 1.7.0.
- Choose a password for the database and set it as a Terraform variable in your development environment (Terraform reads any `TF_VAR_<name>` environment variable as the value of the variable `<name>`):

  ```shell
  export TF_VAR_db_password=<password>
  ```
## Setup

- Make sure you have everything from the 'Requirements' section set up already.
- From the `infrastructure` directory, run:

  ```shell
  terraform init
  terraform plan
  ```

- If the plan looks good to you, run:

  ```shell
  terraform apply
  ```

- Get the name of the MLflow server pod; it is the one whose name starts with `mlflow-server`. We need it for the next step.

  ```shell
  kubectl get pods -n mlflow
  ```

- Forward the port to be able to view the MLflow UI:

  ```shell
  kubectl port-forward <mlflow-server-pod-name> -n mlflow 5000:5000
  ```

  You can now access the MLflow UI at http://localhost:5000/.

- Apply the demo training manifest from `/training` in the `mlflow` namespace. The job finishes in several seconds, and then you can see the results in the MLflow UI.

  ```shell
  kubectl apply -f training/elasticnet_manifest.yml -n mlflow
  ```
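Beyond the UI, you can also compare finished runs programmatically while the port-forward from the previous step is active. A minimal sketch, assuming runs landed in the default MLflow experiment and logged the hypothetical `rmse`, `alpha`, and `l1_ratio` names from the training sketch above:

```python
# Hedged sketch: list runs and rank them by a metric instead of comparing by hand.
# Assumes the port-forward to the tracking server is active on localhost:5000.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# search_runs returns a pandas DataFrame with params and metrics as columns.
runs = mlflow.search_runs(experiment_names=["Default"])
best = runs.sort_values("metrics.rmse").head(5)
print(best[["run_id", "params.alpha", "params.l1_ratio", "metrics.rmse"]])
```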
