
Conversation


@wes4m wes4m commented Jan 2, 2026

Problem

OSRM crashes with "lock file does not exist" when using shared memory (osrm-routed -s) in containerized environments (Kubernetes) after a container restart.

The crash triggers a restart/crash loop, and each restart accumulates orphaned shm segments, leaking memory proportional to the graph size until the node's memory is exhausted.

Related issues: #5134, #5703

Reproduction

See gist: https://gist.github.com/wes4m/719cb69b72e26c09c7ff57ed71cf33d9

Why This Happens in Containers

OSRM's shared memory implementation assumes /tmp and System V shared memory have the same lifecycle. This is true on traditional systems where both are cleared on reboot, but not in containers:

| Component | Traditional system | Container |
| --- | --- | --- |
| /tmp filesystem | Cleared on reboot | Cleared on container restart |
| System V shared memory | Cleared on reboot | Survives container restart (IPC namespace persists) |

When a container restarts:

  1. /tmp is reset and lock files get deleted
  2. IPC namespace survives with shm segments persisting (regardless of hostIPC setting)
  3. SharedRegionRegister (in shm) still references old segments
  4. OSRM checks for the old segment's lock file before attaching, doesn't find it, and crashes
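
To make the failure concrete, here is a minimal sketch (not OSRM's actual code; attachToSharedRegion is a hypothetical helper) of the pre-attach check from step 4: the shm segment from the previous container still exists, but its lock file lived in /tmp and is gone.

```cpp
#include <filesystem>
#include <stdexcept>

// Hypothetical helper illustrating the pre-attach check from step 4.
void attachToSharedRegion(const std::filesystem::path &lock_file)
{
    // The System V shm segment from the previous container still exists,
    // but its companion lock file lived in /tmp and was wiped on restart.
    if (!std::filesystem::exists(lock_file))
    {
        // "lock file does not exist": the process exits, the container is
        // restarted, and the orphaned segment stays behind.
        throw std::runtime_error("lock file does not exist");
    }
    // ... attach to the existing segment ...
}
```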

Solution

This PR adds an OSRM_LOCK_DIR environment variable to specify a custom directory for lock files. This allows containerized deployments to place lock files in a volume that persists across container restarts (e.g. a Kubernetes emptyDir).

Other Approaches

Mounting /tmp to a persistent volume fixes the issue, but it persists all temporary files from the container, not just lock files. Setting TMPDIR has the same drawback, and cleaning orphaned shm segments on startup is a workaround that doesn't address the root cause.

OSRM_LOCK_DIR only affects the lock file location and is backward compatible: when the variable is unset, OSRM falls back to the system temp directory as before, so existing behavior is unchanged.
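
As a rough sketch of that fallback behavior (illustrative only; the actual helper is the getLockDir shown in the review below):

```cpp
#include <cstdlib>
#include <filesystem>

// Sketch: prefer OSRM_LOCK_DIR when set, otherwise fall back to the system
// temp directory, matching the pre-PR behavior.
inline std::filesystem::path getLockDir()
{
    if (const char *lock_dir = std::getenv("OSRM_LOCK_DIR"))
    {
        return std::filesystem::path(lock_dir);
    }
    return std::filesystem::temp_directory_path();
}
```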

Allows specifying a custom directory for shared memory lock files via
the OSRM_LOCK_DIR environment variable. This enables containerized
deployments to persist lock files across container restarts, preventing
crashes when using shared memory (osrm-routed -s)
@wes4m wes4m changed the title Add OSRM_LOCK_DIR environment variable for containerized deployments Add OSRM_LOCK_DIR environment variable for containerized deployments (memory leak fix) Jan 4, 2026

@TheMarex TheMarex left a comment


This change makes sense. Please add an entry to the changelog. Let me know if you have time to do the small edits yourself, otherwise I can try to put this on my TODO for the week.

```cpp
std::filesystem::path dir(lock_dir);
if (!std::filesystem::exists(dir))
{
    std::filesystem::create_directories(dir);
}
```

IMHO this should error here. The directory should exist.

```cpp
// Returns directory for OSRM lock files (OSRM_LOCK_DIR env var or system temp)
inline std::filesystem::path getLockDir()
{
    if (const char *lock_dir = std::getenv("OSRM_LOCK_DIR"))
```

Generally speaking I think a prefix like OSRM_ makes sense, but it is inconsistent with the other env variables. I would recommend SHM_LOCK_DIR.

@wes4m wes4m force-pushed the fix/container-shm-lock branch from 019fbb9 to 26eb039 Compare January 7, 2026 18:14

wes4m commented Jan 7, 2026

> This change makes sense. Please add an entry to the changelog. Let me know if you have time to do the small edits yourself, otherwise I can try to put this on my TODO for the week.

Thanks for the review @TheMarex. Changes pushed.

@wes4m wes4m requested a review from TheMarex January 7, 2026 18:17
