-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add OSRM_LOCK_DIR environment variable for containerized deployments (memory leak fix) #7312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Allows specifying a custom directory for shared memory lock files via the OSRM_LOCK_DIR environment variable. This enables containerized deployments to persist lock files across container restarts, preventing crashes when using shared memory (osrm-routed -s)
TheMarex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense. Please add an entry to the changelog. Let me know if you have time to do the small edits yourself, otherwise I can try to put this on my TODO for the week.
include/storage/shared_memory.hpp
Outdated
| std::filesystem::path dir(lock_dir); | ||
| if (!std::filesystem::exists(dir)) | ||
| { | ||
| std::filesystem::create_directories(dir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this should error here. The directory should exist.
include/storage/shared_memory.hpp
Outdated
| // Returns directory for OSRM lock files (OSRM_LOCK_DIR env var or system temp) | ||
| inline std::filesystem::path getLockDir() | ||
| { | ||
| if (const char *lock_dir = std::getenv("OSRM_LOCK_DIR")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking I think a prefix like OSRM_ makes sense, but it is inconsistent with the other env variables. I would recommend SHM_LOCK_DIR.
019fbb9 to
26eb039
Compare
Thanks for the review @TheMarex. changes pushed. |
Problem
OSRM crashes with
"lock file does not exist"when using shared memory (osrm-routed -s) in containerized environments (Kubernetes) after a container restart.The crash triggers a restart/crash loop, and each restart accumulates orphaned shm segments, leaking memory proportional to the graph size until the node is exhausted.
Related issues: #5134, #5703
Reproduction
See gist: https://gist.github.com/wes4m/719cb69b72e26c09c7ff57ed71cf33d9
Why This Happens in Containers
OSRM's shared memory implementation assumes
/tmpand System V shared memory have the same lifecycle. This is true on traditional systems where both are cleared on reboot, but not in containers:/tmpfilesystemWhen a container restarts:
/tmpis reset and lock files get deletedSharedRegionRegister(in shm) still references old segmentsSolution
This PR adds
OSRM_LOCK_DIRenvironment variable to specify a custom directory for lock files. This allows containerized deployments to place lock files in a volume that persists across container restarts (e.g. kubernetesemptyDir)Other Approaches
Mounting
/tmpto a persistent volume fixes the issue but will persist all temporary files from the container, not just lock files. SettingTMPDIRwill do the same, and cleaning orphaned shm segments on startup is a workaround that doesn't address the root cause.OSRM_LOCK_DIRonly affects lock file location and is backward compatible (tmp dir fallback), behavior is unchanged.