@boniek83 boniek83 commented Apr 13, 2022

This is an implementation of per-pod GPU monitoring from #1.

The example in example/kubernetes assumes that rdc already contains the rdc_prometheus_py patch (if you want to test it right now, you can of course just ADD a prepatched rdc_prometheus_py in the Dockerfile).

You need to build the container image, push it to your container image repository, and modify a few things in the rdc.yaml file: the location of both container images, the nodeSelector (to match the label of the worker nodes that contain AMD GPUs), and the podresources-api volume location, which in my case was on the host machine.
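
For readers unfamiliar with why the podresources-api volume is needed, here is a minimal sketch of the general idea, not necessarily the code in this PR: the kubelet's pod resources API reports which device IDs are assigned to which container, and that can be turned into a lookup from GPU to pod labels. The dict layout below mirrors the API's field names (`pod_resources`, `containers`, `devices`, `resource_name`, `device_ids`) but is an assumption for illustration, as are the helper names and the `amd.com/gpu` resource name.

```python
from typing import Dict, List, Tuple

# Assumption: the resource name under which the AMD GPUs are advertised.
GPU_RESOURCE = "amd.com/gpu"


def device_to_pod(
    pod_resources: List[dict], resource_name: str = GPU_RESOURCE
) -> Dict[str, Tuple[str, str, str]]:
    """Map each assigned device ID to (namespace, pod, container).

    `pod_resources` is a plain-dict stand-in for the list of pods returned
    by the pod resources API (hypothetical shape, see the note above).
    """
    mapping: Dict[str, Tuple[str, str, str]] = {}
    for pod in pod_resources:
        for container in pod.get("containers", []):
            for dev in container.get("devices", []):
                # Skip devices that are not GPUs (e.g. other device plugins).
                if dev.get("resource_name") != resource_name:
                    continue
                for dev_id in dev.get("device_ids", []):
                    mapping[dev_id] = (
                        pod["namespace"],
                        pod["name"],
                        container["name"],
                    )
    return mapping


def pod_labels(
    mapping: Dict[str, Tuple[str, str, str]], device_id: str
) -> Dict[str, str]:
    """Return metric labels for a device; empty strings if no pod uses it."""
    namespace, pod, container = mapping.get(device_id, ("", "", ""))
    return {"namespace": namespace, "pod": pod, "container": container}
```

An exporter could attach the labels from `pod_labels` to each per-GPU metric it publishes, so that a GPU not assigned to any pod still gets a series, just with empty pod labels.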

rdc and rdc_prometheus.py don't have to run inside Kubernetes for this to work; it was just easier to build the example that way.

Tested and working in production on Kubernetes 1.21. Example output:
https://gist.github.com/boniek83/7eaefe7f46edad1ef28046118c354c17
