draft PR to discuss MAKER impl in effectful-llm #404
base: staging-llm
Conversation
eb8680 left a comment:
This is neat, it seems like there are several takeaways. Alongside this iterative solver, it would also be cool to implement a recursive solver that is allowed to call itself as a tool.
tests/test_maker.py (outdated):

```python
    return steps


class MicroAgent:
```
It seems like MicroAgent should just be a single stateless Template with a structured output type annotation?
```python
@Template.define
def predict_next_state(state: GameState) -> Step:
    """..."""
```

Here the behavior of parse_response, has_no_red_flags and get_vote should all be implicit in the annotation and the semantics of structured output generation.
Yes, that makes sense. parse_response and has_no_red_flags could be expressed as annotations on the return type, probably raising some kind of exception, and get_no_vote could be a handler, like Dat's LLMRetryHandler.
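A minimal stdlib sketch of that idea, with hypothetical names (`RedFlagError`, `no_red_flags`, and `run_validated` are stand-ins, not effectful-llm APIs): the validator rides along on the return type as `Annotated` metadata and raises on a red flag, while a small retry wrapper plays the role of something like LLMRetryHandler.

```python
from dataclasses import dataclass
from typing import Annotated, Callable, get_args, get_type_hints


class RedFlagError(Exception):
    """Raised when a generated Step fails a red-flag check."""


@dataclass
class Step:
    source: int
    target: int


def no_red_flags(step: Step) -> Step:
    # Hypothetical check: pegs must be distinct and in range.
    if step.source == step.target or not {step.source, step.target} <= {0, 1, 2}:
        raise RedFlagError(f"red flag: {step}")
    return step


# The return annotation carries the validator as Annotated metadata.
def predict_next_state() -> Annotated[Step, no_red_flags]:
    return Step(source=0, target=2)  # stand-in for the actual LLM call


def run_validated(fn: Callable[[], Step], retries: int = 3) -> Step:
    """Toy analogue of a retry handler: rerun fn until validation passes."""
    # Pull the validator back out of the Annotated return type.
    _, validator = get_args(get_type_hints(fn, include_extras=True)["return"])
    last_error = None
    for _ in range(retries):
        try:
            return validator(fn())
        except RedFlagError as e:
            last_error = e
    raise last_error
```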
parse_response should just be the default behavior of structured output generation. has_no_red_flags would follow either from a more specific static Step type or from making it a Pydantic model with the runtime validation logic from has_no_red_flags.
So part of the reason I didn't use the default structured output generation functionality is that the paper seems to claim that whether the model fails to produce syntactically valid output provides signal about whether it's reasoning correctly. Constrained output decoding perturbs the distribution, so even if the model is reasoning incorrectly it will still produce syntactically well-formed output.
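To make that concrete, here is a stdlib sketch of the unconstrained-decoding version (`parse_response` and `vote` are illustrative names, not the PR's actual code): samples that fail to parse abstain from the vote, so syntactic failure carries signal instead of being masked by constrained decoding.

```python
import json
from collections import Counter


def parse_response(text: str) -> tuple[int, int]:
    """Strict, unconstrained parse: malformed output raises instead of
    being coerced into a syntactically valid shape."""
    data = json.loads(text)  # raises on syntactically invalid output
    return (data["source"], data["target"])


def vote(samples: list[str]) -> tuple[int, int]:
    """Failed parses are treated as a red flag and simply don't vote."""
    ballots = []
    for s in samples:
        try:
            ballots.append(parse_response(s))
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # abstain: the failure itself is the signal
    [(winner, _count)] = Counter(ballots).most_common(1)
    return winner


samples = [
    '{"source": 0, "target": 2}',
    'move disk from 0 to 2',        # malformed: abstains
    '{"source": 0, "target": 2}',
    '{"source": 1, "target": 2}',
]
# vote(samples) -> (0, 2)
```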
tests/test_maker.py (outdated):

```python
        else:
            self.visualise_text()

    def apply(self, step: Step) -> Optional["GameState"]:
```
Why is the return type of apply an Optional? What does None mean in this context?
I wrote the GameState class first and fleshed out an API that was maybe overly cautious in what it accepts. Here None represents an invalid move, though elsewhere we check that results are valid moves, so an invalid move is never actually supplied to apply.
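For reference, a self-contained sketch of what that Optional contract might look like for Hanoi (the top-first peg representation here is my own assumption, not the PR's):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    source: int
    target: int


@dataclass(frozen=True)
class GameState:
    # Each peg lists its disks top-first, e.g. (1, 2, 3) has disk 1 on top.
    pegs: tuple[tuple[int, ...], ...]

    def apply(self, step: Step) -> Optional["GameState"]:
        """Return the successor state, or None for an invalid move."""
        src, tgt = self.pegs[step.source], self.pegs[step.target]
        if not src:                   # nothing to move from an empty peg
            return None
        if tgt and tgt[0] < src[0]:   # can't place a disk on a smaller one
            return None
        pegs = list(self.pegs)
        pegs[step.source] = src[1:]
        pegs[step.target] = (src[0],) + tgt
        return GameState(tuple(pegs))
```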
```python
        return total_moves

    def is_done(self) -> bool:
```
I don't see this called anywhere?
Oh, good catch. I simplified the solve_hanoi function before committing everything and in that process removed the check for completion.
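For illustration, the shape of the loop with the completion check restored (a toy sketch; solve_hanoi's real signature and state type are assumptions here, and ToyState is a stand-in for GameState):

```python
def solve_hanoi(state, predict_next_step, max_steps=1000):
    """Iterative solver that stops when the goal is reached,
    rather than running for a fixed move budget."""
    steps = []
    for _ in range(max_steps):
        if state.is_done():  # the completion check that was dropped pre-commit
            return steps
        step = predict_next_step(state)
        state = state.apply(step)
        steps.append(step)
    raise RuntimeError("did not reach the goal within max_steps")


class ToyState:
    """Trivial stand-in state: done after n applications."""
    def __init__(self, n: int):
        self.n = n

    def is_done(self) -> bool:
        return self.n == 0

    def apply(self, step) -> "ToyState":
        return ToyState(self.n - 1)
```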
Force-pushed from 805eaae to 9dbdbd0.
```python
    raise NotImplementedError


def build_validated_model(game_state: GameState) -> type[Step]:
```
I guess this kind of dynamic class creation is allowed, but it's not very Pythonic. If this kind of dependent type checking is something we want to support/encourage, we should think about how to make it easier and more idiomatic to express.
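For comparison, a stdlib version of the pattern under discussion (the names and the valid-move-set argument are my assumptions; a Pydantic `create_model` plus a validator would be the more likely real implementation): the subclass is created dynamically and its constructor rejects moves that are invalid in the current state.

```python
from dataclasses import dataclass


@dataclass
class Step:
    source: int
    target: int


def build_validated_model(valid_moves: set[tuple[int, int]]) -> type[Step]:
    """Dynamically build a Step subclass that only admits valid moves."""

    def __post_init__(self) -> None:
        if (self.source, self.target) not in valid_moves:
            raise ValueError(f"invalid move: {self.source} -> {self.target}")

    # type() creates the subclass; re-applying @dataclass regenerates
    # __init__ so that it calls the new __post_init__ hook.
    return dataclass(type("ValidStep", (Step,), {"__post_init__": __post_init__}))
```

This keeps the dependent check at construction time, so any instance of the generated class is valid for the state it was built from, at the cost of the dynamic class creation being questioned above.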
Yes, that makes sense. I'll open an issue to discuss.
```python
def predict_next_step(game_state: GameState) -> Step:
    ValidStep = build_validated_model(game_state)
```
Perhaps a simpler alternative design would be to make the validation logic accessible as a tool that predict_next_step_inner is expected to call before it returns a result? Then Step would be simpler and predict_next_step_inner could be moved back up to module scope.
That makes sense, though the semantics I had been considering for tool calling is that the list of tools provided to a template is a list of things the LLM may use in its execution, without any guarantee that any of them will be used. Since this invariant is always required for the rest of the code, it makes sense to enforce it outside the LLM call.
Force-pushed from 8a75d14 to 86f5482.
Note that #479 should be reverted in this branch.
Edit: updated with the latest changes to effectful; we can implement the logic in the following snippet:
I don't think this necessarily should even live in effectful, but regardless it's an interesting case study to kick the tires on potential async/concurrent interfaces for effectful, so I'm making this draft PR to collect discussion about it.
Relevant snippets from the implementation (and links to the paper):
- Algorithm 3
- Algorithm 2
- Algorithm 1