Talk. Delegate. Done.
Hob is a voice-first, multi-agent application built on the OpenAI Realtime API. It supports low-latency streaming voice conversations with AI agents that can call tools, hand off between each other, and delegate complex reasoning to higher-intelligence text models — all in real time.
Hob is a fork of OpenAI's Realtime API Agents Demo, used under the MIT license and now being developed into a new product in the open.
- Node.js 18+
- An OpenAI API key with Realtime API access, or an Azure OpenAI resource with Realtime API enabled
```bash
npm install
```

Copy the sample env file and fill in your credentials:

```bash
cp .env.sample .env.local
```

For OpenAI, set:

```
OPENAI_API_KEY=sk-...
```

For Azure OpenAI, set:

```
LLM_PROVIDER=azure
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-azure-api-key
AZURE_OPENAI_API_VERSION=2025-04-01-preview
AZURE_OPENAI_REALTIME_DEPLOYMENT=gpt-4o-realtime-preview
AZURE_OPENAI_RESPONSES_DEPLOYMENT=gpt-4.1
AZURE_OPENAI_MINI_DEPLOYMENT=gpt-4o-mini
```

The app uses a three-tier strategy to determine which provider to use:
- **Explicit** — set `LLM_PROVIDER` to `openai` or `azure` to force a provider (useful when both sets of credentials are present)
- **Auto-detect** — if `LLM_PROVIDER` is unset, the app checks for `OPENAI_API_KEY` first, then `AZURE_OPENAI_ENDPOINT`
- **Fail** — if no provider can be resolved, the app throws a startup error
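The resolution order above can be sketched as a small helper. This is illustrative only: the function name `resolveProvider` and the `env` parameter shape are assumptions, not the repo's actual code.

```typescript
type Provider = "openai" | "azure";

// Sketch of the three-tier strategy: explicit setting, then auto-detect, then fail.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  // Tier 1: an explicit override wins when both credential sets are present.
  if (env.LLM_PROVIDER === "openai" || env.LLM_PROVIDER === "azure") {
    return env.LLM_PROVIDER;
  }
  // Tier 2: auto-detect. OPENAI_API_KEY is checked first, then AZURE_OPENAI_ENDPOINT.
  if (env.OPENAI_API_KEY) return "openai";
  if (env.AZURE_OPENAI_ENDPOINT) return "azure";
  // Tier 3: nothing usable, so fail at startup.
  throw new Error(
    "No LLM provider configured: set OPENAI_API_KEY or AZURE_OPENAI_ENDPOINT"
  );
}
```

Note that the explicit tier lets you keep both credential sets in `.env.local` and switch providers with a single variable.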
```bash
npm run dev
```

Open http://localhost:3000 in your browser. Click **Connect** to start a voice session.
Hob uses WebRTC to stream audio directly between the browser and the OpenAI Realtime API — the Next.js server is only involved in minting a short-lived session token. Once connected, the conversation is handled by a network of AI agents defined in src/app/agentConfigs/.
In this repo, *agent* and *agent scenario* mean different things:
| Term | What it is | Example in code |
|---|---|---|
| Agent | One `RealtimeAgent` with a single role: instructions, tools, and allowed handoffs | `assistant`, `chatAgent`, `authenticationAgent` |
| Agent scenario | A named `RealtimeAgent[]` set that defines the team used for one session | `defaultAssistantScenario`, `chatSupervisorScenario`, `customerServiceRetailScenario`, `simpleHandoffScenario` |
Put simply:
- An agent is one worker.
- A scenario is the full team configuration and entry point you choose from the UI (`?agentConfig=<name>`).
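The agent/scenario split can be illustrated with simplified stand-in types. These are not the Agents SDK's real `RealtimeAgent` type, and the agent names, tools, and instructions below are invented for the example:

```typescript
// Simplified stand-in for the SDK's RealtimeAgent, for illustration only.
interface AgentSketch {
  name: string;
  instructions: string;
  tools: string[];    // tool names the agent may call (illustrative)
  handoffs: string[]; // names of agents it may hand off to
}

// One agent = one worker with a single role.
const authenticationAgent: AgentSketch = {
  name: "authenticationAgent",
  instructions: "Verify the caller's identity before routing them onward.",
  tools: ["lookupAccount"],
  handoffs: ["returnsAgent"],
};

const returnsAgent: AgentSketch = {
  name: "returnsAgent",
  instructions: "Handle return requests for authenticated customers.",
  tools: ["startReturn"],
  handoffs: [],
};

// One scenario = the full team used for a session.
const retailScenarioSketch: AgentSketch[] = [authenticationAgent, returnsAgent];
```

Selecting a scenario from the UI loads the whole array; handoffs then move the live conversation between the agents inside it.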
Four built-in scenarios are currently included:
| Scenario | Description |
|---|---|
| `defaultAssistant` (default) | Production-oriented single-assistant flow with hosted tools (`webSearch`, `codeInterpreter`, optional `fileSearch`) |
| `chatSupervisor` | Two-layer pattern where a realtime front agent delegates difficult responses/tool use to a stronger supervisor model |
| `customerServiceRetail` | Multi-agent retail example (authentication, returns, sales, simulated human escalation) |
| `simpleHandoff` | Minimal two-agent handoff reference for learning and debugging |
Select a scenario from the dropdown in the top bar, or pass ?agentConfig=<name> as a URL parameter.
defaultAssistant is the current default scenario (defaultAgentSetKey), backed by a single assistant RealtimeAgent defined in src/app/agentConfigs/defaultAssistant/index.ts.
Its behavior is intentionally simple:
- General-purpose voice assistant with concise, practical replies
- Handles lightweight conversation directly
- Uses tools for facts, recent information, calculations, and code-like tasks
- Asks follow-up questions when required parameters are missing
- Does not claim tool usage unless a tool was actually called
The defaultAssistant tools are implemented in src/app/agentConfigs/defaultAssistant/hostedTools.ts using the Agents SDK tool(...) helper.
How tool execution works:
1. The realtime assistant decides to call a tool (`webSearch` or `codeInterpreter`) based on user intent.
2. The tool's `execute(...)` handler calls a shared helper (`callResponses`) that `POST`s to `/api/responses`.
3. `/api/responses` is a server-side proxy that selects OpenAI vs Azure based on env config (`LLM_PROVIDER` + fallback logic).
4. The proxy calls the Responses API with `parallel_tool_calls: false` and the hosted tool definition for that request.
5. Tool output text is normalized by `extractOutputText(...)` and returned to the realtime agent as `{ result: "..." }`.
6. If the API call fails, the tool returns a structured error (for example `web_search_failed` or `code_interpreter_failed`) so the assistant can recover gracefully.
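The shape of an `execute` handler in this flow can be sketched as follows. This is an assumption-heavy illustration, not the repo's `hostedTools.ts`: `callProxy` stands in for the shared `callResponses` helper, and the request body shape is invented.

```typescript
// Result shape mirroring the success/error contract described above.
type ToolResult = { result: string } | { error: string };

// Sketch of a webSearch-style execute handler. The proxy call is injected so
// the flow (delegate, normalize, recover) is visible without network access.
async function executeWebSearch(
  query: string,
  callProxy: (body: object) => Promise<string> // POSTs to /api/responses, returns output text
): Promise<ToolResult> {
  try {
    // Delegate to the server-side proxy, which picks OpenAI vs Azure and
    // attaches the hosted tool definition for this request.
    const text = await callProxy({ tool: "web_search", input: query });
    // Hand the normalized output text back to the realtime agent.
    return { result: text };
  } catch {
    // Structured error instead of a thrown exception, so the voice agent can
    // acknowledge the failure and continue the conversation.
    return { error: "web_search_failed" };
  }
}
```

Returning a structured error rather than throwing keeps the realtime session alive: the agent receives a value it can explain to the user instead of a dead tool call.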
Tool-specific behavior:
- `webSearch` accepts a single `query` string.
- `webSearch` uses the hosted Responses tool `web_search`.
- `webSearch` prompts for an accurate, concise answer and asks for source URLs in plain text.
- `codeInterpreter` accepts a single `task` string.
- `codeInterpreter` uses the hosted Responses tool `code_interpreter` with `container: { type: "auto" }`.
- `codeInterpreter` returns concise computed or derived output from the code execution flow.
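The normalization step (`extractOutputText`) can be sketched against the Responses API's documented payload shape, where `output` is a list of items and message items carry `content` parts of type `output_text`. The function body below is a guess at a typical implementation, not the repo's actual helper:

```typescript
// Minimal types for the parts of a Responses API payload used here.
interface OutputTextPart { type: "output_text"; text: string }
interface MessageItem { type: "message"; content: OutputTextPart[] }
interface ResponsesPayloadSketch { output: MessageItem[] }

// Sketch: collect every output_text part from every message item and join
// them into one plain string for the realtime agent.
function extractOutputTextSketch(payload: ResponsesPayloadSketch): string {
  return payload.output
    .filter((item) => item.type === "message")
    .flatMap((item) => item.content)
    .filter((part) => part.type === "output_text")
    .map((part) => part.text)
    .join("\n")
    .trim();
}
```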
- Voice or Push-to-Talk — server-side VAD by default; PTT mode available via the toolbar
- Multi-agent handoffs — agents transfer the conversation context to each other seamlessly
- Tool calling — agents can look up data, manage carts, check policies, and more
- Output guardrails — every response is asynchronously checked for offensive, off-brand, or violent content
- Audio recording — the full conversation (both sides) can be downloaded as a WAV file
- Developer event log — a live panel shows every SDK and API event with expandable JSON payloads
For a full description of the system design, see docs/hob-architecture-baseline.md.
High-level stack: Next.js 15 · React 19 · TypeScript · OpenAI Agents SDK · WebRTC · Tailwind CSS
Everyone is invited and welcome to contribute: open issues, propose pull requests, share ideas, or help improve documentation.
Participation is open to all, regardless of background or viewpoint.
This project follows the FOSS Pluralism Manifesto,
which affirms respect for people, freedom to critique ideas, and space for diverse perspectives.
Copyright (c) 2025, 2026 OpenAI, Iwan van der Kleijn
This project is licensed under the MIT License. See the LICENSE file for details.

