Description
spent some time with this in a POC this week. looked like a great solution.
it has heartbeats, a ttl, etc.
but as best I can tell, it depends exclusively on dynamodb to remove the record based on the TTL, and that deletion is not immediate: the documentation only says expired items are typically deleted within a few days of expiration.
so, as an example:
a running container dies and setddblock isn't notified to shut down. the last heartbeat set the ttl to now +15s.
now another container starts before the 15s expire - checks the lock, it's locked, all good.
next attempt - after the 15s expire. the record is expired, and available to be deleted by ddb, but still exists.
this code doesn't check the ttl, only that the record exists (and GetItem still returns an expired record until ddb actually deletes it).
further checks continue to fail until the eventual cleanup happens (in my testing, about 10 minutes, but the documentation only says typically within a few days of expiration, so we could be locked and dead for a long time).
It feels to me like we should be validating against the TTL, not just the existence of the record.
the eventual removal is great but it's not an automatic unlocker.