
@pperanich (Collaborator) commented Jan 15, 2026

Currently, ezmsg systems written with the high-level API (units, collections, etc.) are run with ez.run, which is a synchronous, blocking run function. Wouldn't it be neat if we had a non-blocking run function too?

Enter GraphRunner. This new feature allows users to create a GraphRunner object and call either a blocking or non-blocking run!

Are you working on software that orchestrates a bunch of subgraphs that all run on the same graph server? GraphRunner is your friend: it gives you a nice handle to start and programmatically stop (!) your ezmsg systems.

Did you start your graph without a GraphServer running on the canonical ezmsg port, and now your system is running on a "sandboxed" random port? Still want to add more stuff to it? You're in luck, because GraphRunner makes the graph's address available as a property!
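
To make the blocking vs. non-blocking distinction concrete, here is a minimal, self-contained sketch of the general pattern. ToyRunner and its run/start/stop/running names are hypothetical stand-ins invented for illustration; this is not GraphRunner's actual interface, which this thread does not spell out.

```python
import asyncio
import threading
import time


class ToyRunner:
    """Hypothetical stand-in, NOT ezmsg's GraphRunner. Illustrates the pattern only."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self._thread = None
        self.ticks = 0

    async def _main(self) -> None:
        # Stand-in workload: tick until someone asks us to stop.
        while not self._stop.is_set():
            self.ticks += 1
            await asyncio.sleep(0.01)

    def run(self) -> None:
        """Blocking run: returns only after stop() is called from elsewhere."""
        asyncio.run(self._main())

    def start(self) -> None:
        """Non-blocking run: the same workload, driven from a background thread."""
        self._thread = threading.Thread(target=self.run, daemon=True)
        self._thread.start()

    def stop(self, timeout: float = 2.0) -> None:
        """Programmatic stop, callable from any thread."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout)

    @property
    def running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()


if __name__ == "__main__":
    runner = ToyRunner()
    runner.start()       # returns immediately
    time.sleep(0.05)     # the caller is free to do other work here
    runner.stop()        # shut the workload down on demand
```

The point of the handle is that the caller keeps control after starting the workload, which a single blocking call cannot offer.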

All silliness aside, this feature represents a refactor of some pretty gnarly code that has existed in ezmsg since public release.

# FIXME: This function is the last major re-implementation needed to make this
# codebase more maintainable.

@griffinmilsap (Collaborator) commented Jan 21, 2026

The awful code that this refactor replaced was properly handling a lot of terrible edge cases relating to the way interrupts percolate to threads and child subprocesses across Win/Mac/Linux. There are currently two important regressions we have to fix before this can get merged. These regressions do not appear in the unit tests because they require signal injection into a running system and I'm not sure how to automate a test like that.
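
One possible way to automate signal injection, sketched here as a POSIX-only starting point: launch the system under test as a child process, wait for it to report readiness, send it SIGINT, and assert that it exits cleanly. The CHILD script and the interrupt_child helper are hypothetical placeholders, not part of ezmsg. On Windows, Popen.send_signal does not accept SIGINT; the subprocess documentation describes CTRL_C_EVENT / CTRL_BREAK_EVENT for children started with CREATE_NEW_PROCESS_GROUP, which would need a separate code path.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for "a running system": loops until interrupted.
CHILD = r"""
import signal, time
# Make sure Ctrl-C semantics are active even if the parent ignores SIGINT.
signal.signal(signal.SIGINT, signal.default_int_handler)
try:
    print("ready", flush=True)
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    print("clean shutdown", flush=True)
"""


def interrupt_child(timeout: float = 10.0):
    """Start the child, inject SIGINT once it is ready, return (returncode, output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Block until the child has installed its handler and entered its loop.
        if proc.stdout.readline().strip() != "ready":
            raise RuntimeError("child did not start")
        proc.send_signal(signal.SIGINT)  # POSIX only
        proc.wait(timeout=timeout)       # a hang here is the regression we want to catch
        output = proc.stdout.read()
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()
    return proc.returncode, output


if __name__ == "__main__":
    print(interrupt_child())
```

A timeout in proc.wait raises subprocess.TimeoutExpired, so a system that never shuts down fails the test instead of hanging the suite.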

Regressions I've noticed so far:

  • examples/ezmsg_count.py uses a VERY disused ezmsg feature: @ez.thread. This flags a coroutine to run in a new thread, and is just a convenience shortcut to loop.run_in_executor to make threads a little less intimidating to people unfamiliar with `asyncio`. On Windows, KeyboardInterrupt does not get propagated to this thread, and it runs forever with no way to kill the process once started. On Mac/Linux, KeyboardInterrupt propagates to the thread and the system shuts down.
    • ez.thread is very disused, and is ultimately a feature that we could and maybe should consider deprecating. It provides the user no mechanism to detect shutdown (like a termination event or boolean) and is not fully implemented/thought through in its current state.
    • loop.run_in_executor is used in MANY places -- basically any time you want to run a long-running synchronous blocking function (e.g. inferencing a model in torch) -- and will still hang any system's shutdown until the thread ends on its own.
  • Running any system with an @ez.main function does not properly terminate until the @ez.main function returns on its own. This is a similar issue to the above @ez.thread one, but affects Win/Mac/Linux. A simple test is to modify the @ez.thread in examples/ezmsg_count.py to @ez.main and attempt to terminate the system with a KeyboardInterrupt. This appears to be an existing bug that I hadn't caught until now. It's sporadic: the system frequently shuts down properly on dev/ as well as on this feature branch. It probably has to do with interrupting at the precise moment that a polling check is running.
    • @ez.main was a necessary feature when we only had blocking ez.run and needed to wrap main. The functionality was particularly important whenever code needed to run in the main thread (like PyQt, PyGame, and others). Although we could consider retiring @ez.main in addition to @ez.thread, this would be more painful as @ez.main has percolated into more code. I also think it adds value, letting you specify a main thread function for a child subprocess.
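
For reference, the loop.run_in_executor hang and the "termination event" idea mentioned above can be sketched with plain asyncio and threading. This is a generic illustration, not ezmsg code: blocking_worker and run_then_stop are hypothetical names, and the worker only stops because it cooperatively polls the event between chunks of work.

```python
import asyncio
import threading
import time


def blocking_worker(stop: threading.Event) -> int:
    """Stand-in for long synchronous work that checks a stop flag between chunks."""
    chunks = 0
    while not stop.is_set():
        time.sleep(0.01)  # pretend this is one chunk of blocking work
        chunks += 1
    return chunks


async def run_then_stop(run_for: float = 0.1) -> int:
    stop = threading.Event()
    loop = asyncio.get_running_loop()
    # Hand the blocking function to the default thread pool.
    future = loop.run_in_executor(None, blocking_worker, stop)
    try:
        await asyncio.sleep(run_for)  # the rest of the system would run here
    finally:
        # Without this signal the worker thread never returns, and shutdown
        # waits on it for as long as it keeps running.
        stop.set()
    return await future


if __name__ == "__main__":
    print(asyncio.run(run_then_stop()))
```

A thread started this way cannot be cancelled from the outside, so some explicit signal like the event above is what lets a shutdown complete promptly.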

@griffinmilsap (Collaborator)

I've sorted out the issue with the Windows shutdown regression. There's still an issue (unrelated to this PR) with unreliable termination of software that makes use of @ez.main.

@griffinmilsap griffinmilsap merged commit 901b2d8 into dev Jan 21, 2026
8 checks passed