Conversation
ethangraham2001
left a comment
There was a problem hiding this comment.
Looks good but a few concerns to address before merging.
Also, I suggest changing the base of the PR. This PR contains commits from your other PR (the one for cross-platform compatibility). These commits shouldn't be present in a PR that is unrelated.
I think we should also discuss whether we want the docker files within this repo or another one dedicated to docker files and environment setup. It's probably worth separating code and deployment moving forward.
There was a problem hiding this comment.
Great work. I have a few questions relating to this file.
- Firstly, we are able to run the daemon within the docker container, which is fantastic. What does the workflow look like so that a user can then run their code inside of the container as well?
- Have you confirmed that prometheus metrics can be accessed from outside of the docker container through port 9000? I'm not sure if docker works over localhost in Linux or if it has special internal addresses. Need to check
- Have you checked how we can build this into a container image? That would be good for releases or other, but not necessarily top-priority right now.
| EXPOSE 9000 | ||
|
|
||
| # Run the GPU probe binary with some default arguments | ||
| CMD ["/usr/local/bin/gpu_probe", "--memleak", "--metrics-addr", "0.0.0.0:9000"] |
There was a problem hiding this comment.
flags are hard-coded. Is there any way to allow the user to configure this on the fly?
Not necessarily a problem if we are sharing pre-rolled docker files to make deployment easier with recommended configs.
There was a problem hiding this comment.
There shouldn't be any rust files in this PR. I suggest making a new branch from main, adding the docker file, and then only adding that file to the PR.
This base docker file makes the program work inside of docker. Might need to change some things in there depending on the host system. Please note that vmlinux.h should be created by the host system (for now at least) because docker shares the kernel with the host (I think).
To build run:
docker build -t gpuprobe:latest .
To run:
docker run --rm -it --privileged --cap-add=SYS_ADMIN --gpus all -p 9000:9000 gpuprobe:latest
This will start the memleak checker.