Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 37 additions & 1 deletion dimos/robot/unitree/connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,10 +111,46 @@ def start_background_loop() -> None:
self.task = self.loop.create_task(async_connect())
self.loop.run_forever()

CONNECT_TIMEOUT = 30 # seconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONNECT_TIMEOUT should be a class-level attribute

Defining the timeout as a method-local constant makes it invisible at the class API level and impossible for callers to customise without patching the source. Users on slower or congested networks might need a longer window; users in CI want a shorter one.

A class attribute is both more discoverable and overridable:

Suggested change
CONNECT_TIMEOUT = 30 # seconds
CONNECT_TIMEOUT = 30 # seconds — override on the class or instance if needed

Or better, promote it to the class body:

class UnitreeWebRTCConnection(Resource):
    CONNECT_TIMEOUT: int = 30  # seconds
    ...

and reference it as self.CONNECT_TIMEOUT inside connect().


self.loop = asyncio.new_event_loop()
self.thread = threading.Thread(target=start_background_loop, daemon=True)
self.thread.start()
self.connection_ready.wait()

if not self.connection_ready.wait(timeout=CONNECT_TIMEOUT):
# Cancel the async connection task and disconnect to avoid leaving
# a half-open WebRTC session on the robot side (which would cause
# the next connection attempt to hang as well).
async def _cleanup() -> None:
if self.task is not None:
self.task.cancel()
try:
await self.task
except (asyncio.CancelledError, Exception):
pass
if hasattr(self, "conn") and self.conn is not None:
try:
await self.conn.disconnect()
except Exception:
pass

if self.loop.is_running():
future = asyncio.run_coroutine_threadsafe(_cleanup(), self.loop)
try:
future.result(timeout=5.0)
except Exception:
pass
self.loop.call_soon_threadsafe(self.loop.stop)
if self.thread.is_alive():
self.thread.join(timeout=2.0)
Comment on lines +144 to +145
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Background thread may still be alive when TimeoutError is raised

thread.join(timeout=2.0) is non-blocking past its deadline — if the thread has not stopped within 2 s (e.g. a blocking C-extension call inside conn.disconnect()), execution falls through silently and TimeoutError is raised while the daemon thread is still running.

If a caller catches the TimeoutError and immediately tries to reconnect, the still-alive thread sharing self.loop / self.conn could cause a race condition. Consider logging a warning when the join times out:

if self.thread.is_alive():
    self.thread.join(timeout=2.0)
    if self.thread.is_alive():
        import warnings
        warnings.warn(
            "Background WebRTC thread did not stop within 2s after timeout cleanup.",
            RuntimeWarning,
            stacklevel=2,
        )

raise TimeoutError(
f"WebRTC connection to {self.ip} timed out after {CONNECT_TIMEOUT}s. "
"Common causes:\n"
" - Another WebRTC client is connected (close the Unitree mobile app)\n"
" - Robot is unreachable on the network\n"
" - Port 9991 (encrypted SDP) is not responding\n"
"Tip: only one WebRTC client can connect to the Go2 at a time."
)
Comment on lines +120 to +153
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing conn.disconnect() and task.cancel() in timeout cleanup

When the timeout fires, the cleanup stops the event loop but never cancels the pending async_connect() task or calls self.conn.disconnect(). Compare this with the existing stop() and disconnect() methods, which always cancel the task and schedule await self.conn.disconnect() before stopping the loop.

Without calling self.conn.disconnect(), the robot-side WebRTC peer connection may remain open or in a "connecting" state, potentially blocking the very retry attempt the user will make after seeing the TimeoutError — the same root cause as the original bug. It can also emit a Python runtime warning: Task was destroyed but it is pending!.

The cleanup block should mirror the pattern used in stop():

if not self.connection_ready.wait(timeout=CONNECT_TIMEOUT):
    if self.loop.is_running():
        if self.task is not None:
            self.loop.call_soon_threadsafe(self.task.cancel)

        async def async_disconnect() -> None:
            try:
                await self.conn.disconnect()
            except Exception:
                pass

        asyncio.run_coroutine_threadsafe(async_disconnect(), self.loop)
        self.loop.call_soon_threadsafe(self.loop.stop)

    if self.thread.is_alive():
        self.thread.join(timeout=2.0)

    raise TimeoutError(...)


def start(self) -> None:
pass
Expand Down