Full End-To-End Inference Flow and Gateway Node Implementation #487
base: main
Conversation
Force-pushed from 09b4132 to ad343a2
Force-pushed from d1f1169 to 2dbbe93
pefontana left a comment
Nice @samherring99 !
I noticed two things that may be easy to change:

- The `just` command doesn't work, because the tmux session inherits neither the `nix develop .#dev-python` shell nor the Python venv. To run it I have to run the following in two different terminals:

  Terminal 1:

  ```
  nix develop .#dev-python
  source .venv/bin/activate
  PSYCHE_GATEWAY_BOOTSTRAP_FILE=psyche-gateway-peer.json LIBTORCH_USE_PYTORCH=1 RUST_LOG=info cargo run --bin psyche-inference-node -- --model-name NousResearch/Hermes-4-14B --discovery-mode n0 --relay-kind n0
  ```

  Terminal 2:

  ```
  nix develop .#dev-python
  source .venv/bin/activate
  PSYCHE_GATEWAY_ENDPOINT_FILE=psyche-gateway-peer.json RUST_LOG=info cargo run --bin gateway-node --features gateway -- --discovery-mode n0 --relay-kind n0
  ```

- With the command

  ```
  PSYCHE_GATEWAY_BOOTSTRAP_FILE=psyche-gateway-peer.json LIBTORCH_USE_PYTORCH=1 RUST_LOG=info cargo run --bin psyche-inference-node -- --model-name NousResearch/Hermes-4-14B --discovery-mode n0 --relay-kind n0
  ```

  I am getting a NumPy error:

  ```
  ImportError: Numba needs NumPy 2.2 or less. Got NumPy 2.3
  ```

  I tried to install NumPy 2.2, but the PATH is still set to the Nix-provided version. Maybe we can update Numba to fix this and make it easier to run?
Could you share the tmux errors you're seeing? It might come down to versioning issues between our setups, but the `just` command starts up tmux with

As for the NumPy errors, I think we will resolve this with vLLM included in the nix packaging ;) 🤞 - but I will look into it regardless
Force-pushed from 2dbbe93 to 937d280
Sure. And the tmux outputs:

@samherring99
Force-pushed from 937d280 to 0af54f3
FWIW this looks like a CUDA / NCCL error; I'm guessing this is also related to venv / torch / vLLM issues. Will tag @arilotter for confirmation / final review 🙂
Force-pushed from 0af54f3 to 958793a
```rust
let nodes = state.available_nodes.read().await;
if nodes.is_empty() {
    return Err(AppError::NoNodesAvailable);
}

let node = nodes.values().next().unwrap();
```
Instead of checking that the vec is not empty and then calling unwrap, I think we can do:

```rust
let nodes = state.available_nodes.read().await;
let node = nodes.values().next().ok_or(AppError::NoNodesAvailable)?;
```

It's not really important since we're unlikely to panic, but I think this is more idiomatic.
Sweet, I'll test this out and update.
```rust
                let _ = tx.send(response).await;
            }
        }
        Err(e) => {
```
If something fails in the `send_inference_request` call, we're not cleaning up the `request_id` from `pending_requests`. Is that handled somewhere else? Not sure whether it's correct to remove it if something fails there.
Good callout, I'll ensure this is handled correctly and will update here.
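Something along these lines is probably what it will look like (a sketch only; `pending_requests`, `run_request`, and `send_inference_request` here are assumed names, not the final code):

```rust
use std::{collections::HashMap, sync::Arc};

use tokio::sync::{oneshot, RwLock};

// All names here are illustrative assumptions, not the PR's actual identifiers.
type PendingMap = Arc<RwLock<HashMap<String, oneshot::Sender<String>>>>;

async fn run_request(pending: PendingMap, request_id: String, prompt: String) -> anyhow::Result<()> {
    match send_inference_request(&prompt).await {
        Ok(response) => {
            // Hand the response to whoever registered the request.
            if let Some(tx) = pending.write().await.remove(&request_id) {
                let _ = tx.send(response);
            }
            Ok(())
        }
        Err(e) => {
            // Remove the pending entry on failure too, so failed requests
            // don't accumulate in the map forever.
            pending.write().await.remove(&request_id);
            Err(e)
        }
    }
}

// Stand-in for the real P2P call; the signature is assumed.
async fn send_inference_request(_prompt: &str) -> anyhow::Result<String> {
    Ok(String::from("…"))
}
```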
```rust
    peer_id.fmt_short()
);

tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
```
Is this sleep necessary?
This was a debug measure I forgot to remove 🙃 thanks for catching it

I lied, this is necessary: we need to give the bytes time to flush through the network before the connection is dropped, and we need to wait for the receiver to actually read all the data. I'll add a comment.
Oh okay, maybe you can do `connection.closed().await;`? Not really sure though, I didn't try it. I'm just trying to avoid future problems where the receiver takes more than 100 ms to read things and we end up having the same error, but it's not as high priority 😅
I did think about this, will likely address in a later PR if it becomes an issue.
| info!("Capabilities: {:?}", capabilities); | ||
|
|
||
| // read bootstrap peers from multiple sources in priority order | ||
| let bootstrap_peers: Vec<EndpointAddr> = |
I think we have almost the same logic in the main.rs file of the crate. Can we extract it into an aux function and handle the differences between the implementations there?
Good point, was being lazy about this, will move to a new lib.rs file for shared implementation.
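Roughly the shape I'm thinking of for the shared helper (a sketch; the JSON peer-file format, the signature, and the `iroh::EndpointAddr` import path are assumptions):

```rust
use std::path::Path;

use anyhow::Result;

// `EndpointAddr` is the type the diff above uses; its exact import path and
// whether it derives serde traits are assumptions here.
use iroh::EndpointAddr;

/// Shared helper both binaries could call: prefer peers listed in a bootstrap
/// file (e.g. the one pointed at by PSYCHE_GATEWAY_BOOTSTRAP_FILE), falling
/// back to whatever was passed on the command line.
pub fn resolve_bootstrap_peers(
    bootstrap_file: Option<&Path>,
    cli_peers: Vec<EndpointAddr>,
) -> Result<Vec<EndpointAddr>> {
    if let Some(path) = bootstrap_file {
        let raw = std::fs::read_to_string(path)?;
        // Assumes the peer file is JSON; adjust to the real on-disk format.
        let peers: Vec<EndpointAddr> = serde_json::from_str(&raw)?;
        if !peers.is_empty() {
            return Ok(peers);
        }
    }
    Ok(cli_peers)
}
```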
```rust
pub request_id: String,
pub prompt: String,
pub messages: Vec<ChatMessage>,
#[serde(default = "default_max_tokens")]
```
Not strictly related to this PR, but I think both protocol and gateway-node use the same default functions. Can they be different at some point? Also, you can use the default value directly without using the default functions.
I see, to reduce scope I'll probably tackle this in a later PR, if that's okay.
Yeah, no worries
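For when that later PR happens, one option is a single shared default in the protocol crate (a sketch with an assumed value; serde's `default = "..."` attribute takes a function path rather than a literal, so the function stays but both crates point at the same one):

```rust
use serde::Deserialize;

// Assumed value; the real default would live in the shared protocol crate.
pub const DEFAULT_MAX_TOKENS: u32 = 512;

pub fn default_max_tokens() -> u32 {
    DEFAULT_MAX_TOKENS
}

// Both the protocol types and the gateway-node types can point their
// `#[serde(default = "...")]` attributes at this same function, so the two
// defaults can't silently diverge.
#[derive(Deserialize)]
struct ExampleParams {
    #[serde(default = "default_max_tokens")]
    max_tokens: u32,
}
```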
```rust
let model_name = req.model.clone().unwrap_or_else(|| node.model_name.clone());
info!(
    "Routing request to node: {} (model: {})",
    node.peer_id.fmt_short(),
    node.model_name
);
```
This is more of a question, but here we get an inference node from the list and only use its `model_name`. Then, in the `run_gateway` function, we select another node from the list as `target_node`, which is the one we actually route the request to. I might be misunderstanding something, but wouldn't it be better to select a single node and route the request to that one?
Yeah, I did this because I wasn't passing the peer ID through the channel as part of an InferenceMessage type, but I'll make that change to include it so we select one node only and route the request there.
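Roughly what I have in mind (a sketch; these aren't the final type names or fields):

```rust
use tokio::sync::oneshot;

// Placeholder request/response types; the real ones live in the protocol crate.
pub struct InferenceRequest {
    pub request_id: String,
    pub prompt: String,
    pub max_tokens: u32,
}

pub struct InferenceResponse {
    pub request_id: String,
    pub text: String,
}

/// Message from the HTTP handler to the run_gateway loop: the handler picks a
/// node once and sends its peer id along, so routing goes to that same node.
pub struct InferenceMessage {
    /// String form of the selected node's iroh endpoint id.
    pub target_peer: String,
    pub request: InferenceRequest,
    /// Hands the inference node's response back to the waiting HTTP handler.
    pub respond_to: oneshot::Sender<InferenceResponse>,
}
```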
Force-pushed from 36eea32 to 91248ac
```rust
send.finish()?;

// wait for a moment to let the connection flush all the bytes to the receiver
tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
```
```rust
conn.close(0u32.into(), b"bye!");
endpoint.close().await;
```

This should flush the connection buffer before returning.
See https://github.com/n0-computer/iroh/blob/6ad5ac4238a3cc101791922167aab952d4c99c1e/iroh/examples/echo.rs#L65
Thank you, will test this out!
So, AFAICT there's no async way to wait for QUIC to flush without closing the entire endpoint... I'm fine with a time delay, but would an adaptive delay based on payload size be more reasonable / future-proof? I think we want the endpoint to stay open to accept future requests, and according to the comments in what you linked, that seems like a requirement 🙁
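Something in this direction is what I mean (a sketch only; the scaling constants are made up and would need tuning):

```rust
use std::time::Duration;

// Illustrative only: scale the post-finish() wait with payload size instead of
// a flat 100 ms, with a floor and a ceiling. The constants are placeholders.
fn flush_delay(payload_bytes: usize) -> Duration {
    const MIN_MS: u64 = 50;
    const MAX_MS: u64 = 1_000;
    // Roughly 1 ms per 10 KiB as a stand-in for link throughput.
    let estimate_ms = payload_bytes as u64 / (10 * 1024);
    Duration::from_millis(estimate_ms.clamp(MIN_MS, MAX_MS))
}

// At the send site, after send.finish():
// tokio::time::sleep(flush_delay(response_bytes.len())).await;
```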
```rust
    http::StatusCode,
    response::{IntoResponse, Response},
    routing::post,
};
```
can group these all in one big `use` that's feature-flagged? or just.. rip out the gateway feature IMO
Yes should be no issue 😎
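Something like this (a sketch; the item list is illustrative and should mirror whatever the handler actually needs):

```rust
// One feature-gated import block instead of many scattered cfg'd uses.
#[cfg(feature = "gateway")]
use axum::{
    http::StatusCode,
    response::{IntoResponse, Response},
    routing::post,
    Router,
};
```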
```rust
// Spawn task to handle P2P connection
let endpoint = network.router().endpoint().clone();
let state_clone = state.clone();
tokio::spawn(async move {
```
A task here without any tracking is a little scary - should we keep these in some task pool, add timeouts, monitor, etc.? Once we get a request we simply throw this into the tokio task pool and can't tell if something works or not.
This worked with `tokio::task::JoinSet::new()` 🙂
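For reference, the tracked version looks roughly like this (a sketch with placeholder names and a made-up 120 s cap per connection):

```rust
use std::time::Duration;

use tokio::task::JoinSet;

async fn handle_connection(conn_id: u64) -> anyhow::Result<()> {
    // Placeholder for accepting and serving the iroh connection.
    let _ = conn_id;
    Ok(())
}

#[tokio::main]
async fn main() {
    // Track connection-handling tasks instead of fire-and-forget spawning.
    let mut tasks: JoinSet<anyhow::Result<()>> = JoinSet::new();

    for conn_id in 0..3u64 {
        tasks.spawn(async move {
            // `?` surfaces both the timeout and any handler error to the JoinSet result.
            tokio::time::timeout(Duration::from_secs(120), handle_connection(conn_id)).await??;
            Ok(())
        });
    }

    // Drain results so failures and panics are logged rather than lost.
    while let Some(joined) = tasks.join_next().await {
        match joined {
            Ok(Ok(())) => {}
            Ok(Err(e)) => eprintln!("connection handler failed: {e}"),
            Err(join_err) => eprintln!("connection handler panicked: {join_err}"),
        }
    }
}
```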
Commits:
- …pes, adding initial skeleton of inference-node main loop, wiring inference-node up to iroh gossip updates, updating Cargo toml
- … param type to be optional, single protocol, and generic, adding justfile commands and test script to test inference
- …P response handling
- …rotocolHandler method and custom protocol code path
- …g to single node selection for request routing
Force-pushed from 91248ac to ef35e92
Force-pushed from ef35e92 to e4cb343
This PR provides the following things:

- A gateway node binary at `architectures/inference-only/inference-node/src/bin/gateway-node.rs`, which includes the `handle_inference` method to forward inference requests over the P2P network (using iroh's bidirectional streams). The gateway node performs discovery of available inference nodes through iroh's gossip. The binary is registered in `architectures/inference-only/inference-node/Cargo.toml`.
- `shared/inference/src/protocol_handler.rs`, which implements iroh's `ProtocolHandler` trait to accept incoming inference requests over the direct P2P connection.
- Changes to `shared/inference/src/protocol.rs` to allow for OpenAI-API-style `/v1/chat/completions` messages, plus some tests.
- Changes to `python/python/psyche/vllm/rust_bridge.py` to use OpenAI-API-style `/v1/chat/completions` messages. These changes are reflected in `shared/inference/src/node.rs`, `shared/inference/src/vllm.rs`, and `shared/inference/src/protocol.rs`, with some testing.
- `architectures/inference-only/inference-node/src/main.rs` can now read bootstrap peers from a given file and rebroadcast availability over gossip every 30 seconds.
- Changes to `shared/network/src/lib.rs` and `shared/network/src/router.rs` to use an internal `init_internal` method, plus a method called `init_with_custom_protocol` to use a custom protocol on initialization.
- `axum` and `tower` added to dependencies in `Cargo.toml` and `architectures/inference-only/inference-node/Cargo.toml`.
- `scripts/test-inference-e2e.sh` to test the end-to-end inference flow.
- `just` commands added to the `justfile`: `inference-node`, `gateway-node`, `inference-stack`, `test-inference`, `test-inference-e2e`.

Testing (requires a venv with vLLM installed as of now):
The above commands will start up 1 gateway node and 1 inference node, let the gateway node write its endpoint ID to a temp file where the inference node can read it and bootstrap from it, and then spin up an endpoint at `localhost:8000/v1/chat/completions` to receive requests to be forwarded to the inference node.

As always, any questions, comments, or concerns with how this is set up are welcome 😄 - streaming, checkpoint updating, and load balancing are all on the future roadmap for this effort, as well as discussion on how to correctly bootstrap from our gateway nodes.
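For anyone trying this out, the gateway can be exercised roughly like this (a sketch using `reqwest`; the body fields follow the OpenAI-style chat-completions shape described above and may not match the final schema exactly):

```rust
use serde_json::json;

// Smoke-test sketch against the gateway's HTTP endpoint started by the
// commands above. Field names in the body are assumptions.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let body = json!({
        "model": "NousResearch/Hermes-4-14B",
        "messages": [
            { "role": "user", "content": "Say hello in one sentence." }
        ],
        "max_tokens": 64
    });

    let response_text = reqwest::Client::new()
        .post("http://localhost:8000/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .text()
        .await?;

    println!("{response_text}");
    Ok(())
}
```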
localhost:8000/v1/chat/completionsto receive requests to be forwarded to the inference node.As always, any questions, comments, or concerns with how this is set up are welcome 😄 - streaming, checkpoint updating, and load balancing are all on the future roadmap for this effort, as well as discussion on how to correctly bootstrap from our gateway nodes.