feat: Add GraphRunner API to replace legacy run method. #207

pperanich · 2026-01-15T14:42:00Z

Currently, ezmsg systems written with the high-level API (units, collections, etc.) are run with ez.run, which is a synchronous, blocking run function. Wouldn't it be neat if we had a non-blocking run function too?

Enter GraphRunner. This new feature allows users to create a GraphRunner object and call either a blocking or non-blocking run!

Are you working on software that orchestrates a bunch of subgraphs that all run on the same graph server? GraphRunner is your friend: it gives you a nice handle to start and programmatically stop (!) your ezmsg systems.
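The description doesn't spell out GraphRunner's interface, so the following is only a stdlib sketch of the general pattern it describes: one object that offers both a blocking run and a non-blocking start with a programmatic stop. ToyRunner, start(), stop(), and the ticking workload are hypothetical names chosen for illustration; they are not ezmsg's GraphRunner API.

```python
import asyncio
import threading
from typing import Optional


class ToyRunner:
    """Hypothetical runner (NOT ezmsg's GraphRunner) exposing both styles of run."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.ticks = 0

    async def _workload(self) -> None:
        # Placeholder for "run the graph": do periodic work until stop is requested.
        while not self._stop.is_set():
            self.ticks += 1
            await asyncio.sleep(0.01)

    def run(self) -> None:
        """Blocking run: returns only once stop() is called from another thread."""
        self._stop.clear()
        asyncio.run(self._workload())

    def start(self) -> None:
        """Non-blocking run: the same workload, driven from a background thread."""
        self._stop.clear()
        self._thread = threading.Thread(
            target=asyncio.run, args=(self._workload(),), daemon=True
        )
        self._thread.start()

    def stop(self, timeout: Optional[float] = None) -> None:
        """Request shutdown, then wait for the background thread (if any) to exit."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout)
```

Calling runner.start() returns immediately, leaving the caller free to orchestrate other subgraphs, and runner.stop() shuts the workload down on demand; runner.run() is the blocking variant, which some other thread has to stop.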

Did you start your graph without a GraphServer running on the canonical ezmsg port, and now your system is running on a "sandboxed" random port? Still want to add more stuff to it? You're in luck, because GraphRunner makes the graph's address available as a property!
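Why the address has to be exposed can be shown with a small stdlib sketch: binding to port 0 lets the operating system pick any free port, so the only reliable way for other code to find the server afterwards is to ask the bound socket. ToyServer and its address property are hypothetical and only mirror the idea above; they are not ezmsg's GraphServer or GraphRunner API.

```python
import socket
from typing import Tuple


class ToyServer:
    """Hypothetical server listening on an OS-assigned ("sandboxed") port.

    Not ezmsg's GraphServer: it only shows why the bound address must be
    exposed when the port is chosen at runtime.
    """

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Port 0 asks the OS to pick any free port.
        self._sock.bind((host, port))
        self._sock.listen()

    @property
    def address(self) -> Tuple[str, int]:
        # getsockname() reports the port the OS actually assigned.
        return self._sock.getsockname()

    def close(self) -> None:
        self._sock.close()
```

Other code (another subgraph, for instance) can then connect to server.address instead of a hard-coded port.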

All silliness aside, this feature represents a refactor of some pretty gnarly code that has existed in ezmsg since public release.

ezmsg/src/ezmsg/core/backend.py, lines 227 to 228 in eaf8068:

    # FIXME: This function is the last major re-implementation needed to make this
    # codebase more maintainable.

griffinmilsap · 2026-01-21T14:23:19Z

The awful code that this refactor replaced was correctly handling a lot of terrible edge cases related to the way interrupts percolate to threads and child subprocesses across Win/Mac/Linux. There are currently two important regressions we have to fix before this can be merged. These regressions do not appear in the unit tests because they require injecting signals into a running system, and I'm not sure how to automate a test like that.
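One possible way to automate such a test, sketched with the standard library and assuming a POSIX system: start the program under test in a child process, wait until it reports that it is up, inject SIGINT, and fail if it does not exit cleanly within a timeout. CHILD_SCRIPT and interrupt_and_wait are hypothetical; the script is a stand-in for a real system such as examples/ezmsg_count.py, which would need its own readiness signal (or a startup delay) in place of the "ready" line. Windows would need a different mechanism (such as signal.CTRL_BREAK_EVENT with a new process group), which this sketch does not cover.

```python
import signal
import subprocess
import sys
import textwrap
from typing import Tuple

# Hypothetical stand-in for "a running system": blocks until interrupted,
# then reports whether it shut down cleanly.
CHILD_SCRIPT = textwrap.dedent(
    """
    import signal, sys, time

    # Ensure SIGINT raises KeyboardInterrupt even if the parent ignores SIGINT.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        print("ready", flush=True)
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("clean shutdown", flush=True)
        sys.exit(0)
    """
)


def interrupt_and_wait(script: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Run `script` in a child process, send SIGINT once it prints "ready",
    and return (returncode, remaining stdout). POSIX only."""
    proc = subprocess.Popen(
        [sys.executable, "-c", script], stdout=subprocess.PIPE, text=True
    )
    try:
        # Wait for the readiness line so the signal isn't injected too early.
        first_line = proc.stdout.readline().strip()
        if first_line != "ready":
            raise RuntimeError(f"child did not start cleanly: {first_line!r}")
        proc.send_signal(signal.SIGINT)
        # Raises subprocess.TimeoutExpired if shutdown hangs: the regression to catch.
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out
    finally:
        # Never leave a hung child behind, whatever happened above.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
```

A test would then assert that interrupt_and_wait(...) returns exit code 0 and the expected shutdown output; a hang surfaces as a TimeoutExpired failure instead of a stuck CI job.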

Regressions I've noticed so far:

examples/ezmsg_count.py uses a VERY rarely used ezmsg feature: @ez.thread. This flags a coroutine to run in a new thread and is just a convenience shortcut to loop.run_in_executor, meant to make threads a little less intimidating to people unfamiliar with `asyncio`. On Windows, KeyboardInterrupt does not get propagated to this thread, so it runs forever and there is no way to kill the process once it has started. On Mac/Linux, KeyboardInterrupt propagates to the thread and the system shuts down.
- @ez.thread is rarely used and is ultimately a feature that we could, and maybe should, consider deprecating. It gives the user no mechanism to detect shutdown (such as a termination event or boolean) and is not fully implemented or thought through in its current state.
- loop.run_in_executor is used in MANY places (basically any time you want to run a long-running, synchronous, blocking function, e.g. running inference on a torch model), and it will still hang any system's shutdown until the thread ends on its own.
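Because Python offers no way to forcibly stop an executor thread from outside, one common workaround (a sketch, not ezmsg's implementation) is to hand the blocking function a threading.Event that it polls, and to set that event when the async side shuts down. This gives the thread exactly the kind of termination event the first bullet says @ez.thread lacks; blocking_job and main are hypothetical names.

```python
import asyncio
import threading
import time


def blocking_job(stop: threading.Event) -> int:
    """Long-running synchronous job (think: model inference in a loop).

    The thread cannot be killed from outside, so the job checks `stop`
    between chunks of work and returns as soon as it is set.
    """
    chunks = 0
    while not stop.is_set():
        time.sleep(0.01)  # stand-in for one chunk of blocking work
        chunks += 1
    return chunks


async def main(run_for: float = 0.1) -> int:
    loop = asyncio.get_running_loop()
    stop = threading.Event()
    # @ez.thread is described above as a shortcut to this call.
    future = loop.run_in_executor(None, blocking_job, stop)
    try:
        await asyncio.sleep(run_for)  # the rest of the system runs here
    finally:
        # Runs on normal exit and if this task is cancelled during shutdown,
        # so the worker is told to finish instead of holding shutdown open
        # until it ends on its own.
        stop.set()
    return await future
```

The cost of this design is that every blocking function has to be written cooperatively, checking the event between chunks; a single call that blocks for minutes will still delay shutdown by that long.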
Any system with an @ez.main function does not terminate properly until the @ez.main function returns on its own. This is similar to the @ez.thread issue above, but it affects Win/Mac/Linux. A simple test is to change the @ez.thread in examples/ezmsg_count.py to @ez.main and attempt to terminate the system with a KeyboardInterrupt. This appears to be an existing bug that I hadn't caught until now. It's sporadic: the system frequently shuts down properly on dev as well as on this feature branch. It probably has to do with interrupting at the precise moment that a polling check is running.
- @ez.main was a necessary feature when we only had a blocking ez.run and needed to wrap main. The functionality was particularly important whenever code needed to run in the main thread (e.g. PyQt, PyGame, and others). Although we could consider retiring @ez.main in addition to @ez.thread, this would be more painful, because @ez.main has percolated into more code. I also think it adds value by letting you specify a main-thread function for a child subprocess.
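The main-thread requirement can also be sketched with the standard library: keep the main thread for the code that needs it (PyQt, PyGame, ...) and move the asyncio side to a background thread, with a shared threading.Event so that either side can end the run. This is an assumed structure for illustration, not how ezmsg implements @ez.main; async_system, gui_like_main, and run_with_main are hypothetical.

```python
import asyncio
import threading
from typing import Callable


async def async_system(shutdown: threading.Event, run_for: float) -> None:
    """Placeholder for the async part of a system: runs until `shutdown`
    is set or `run_for` seconds elapse, whichever comes first."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + run_for
    while not shutdown.is_set() and loop.time() < deadline:
        await asyncio.sleep(0.01)
    shutdown.set()  # tell the main-thread function the system is done


def gui_like_main(shutdown: threading.Event) -> None:
    """Hypothetical main-thread-only loop (e.g. a GUI event loop).

    It polls `shutdown` every iteration, so it returns when the system stops
    instead of keeping the process alive until it returns on its own.
    """
    while not shutdown.wait(timeout=0.01):
        pass  # process one batch of GUI events here


def run_with_main(
    main_fn: Callable[[threading.Event], None], run_for: float = 0.1
) -> None:
    shutdown = threading.Event()
    # The async system runs in a background thread...
    worker = threading.Thread(
        target=asyncio.run, args=(async_system(shutdown, run_for),)
    )
    worker.start()
    try:
        # ...which leaves the main thread free for code that must live there.
        main_fn(shutdown)
    except KeyboardInterrupt:
        pass  # treat Ctrl+C as a request to shut down
    finally:
        shutdown.set()
        worker.join()
```

Shutdown works in both directions here: if the system finishes first, the main-thread function sees the event and returns; if the main-thread function returns (or is interrupted) first, the event tells the async side to stop.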

griffinmilsap · 2026-01-21T20:08:17Z

I've sorted out the issue with the Windows shutdown regression. There's still an issue (unrelated to this PR) with unreliable termination of software that makes use of @ez.main.

feat: Add GraphRunner API to replace legacy run method. (11d48a4)

griffinmilsap added 2 commits on January 21, 2026, 11:42:
- better handling of component kwarg debug messages (6f5bba3)
- attempt to fix shutdown issues on windows (b409309)

griffinmilsap merged commit 901b2d8 into dev on Jan 21, 2026. 8 checks passed.
