Talk. Delegate. Done.
Hob is a voice-first, multi-agent application built on the OpenAI Realtime API. It supports low-latency streaming voice conversations with AI agents that can call tools, hand off between each other, and delegate complex reasoning to higher-intelligence text models — all in real time.
Hob is a fork of OpenAI's Realtime API Agents Demo, used under the MIT license and now being developed into a new product in the open.
- Node.js 18+
- An OpenAI API key with Realtime API access, or an Azure OpenAI resource with Realtime API enabled
```bash
npm install
```

Copy the sample env file and fill in your credentials:

```bash
cp .env.sample .env.local
```

For OpenAI, set:

```
OPENAI_API_KEY=sk-...
```

For Azure OpenAI, set:

```
LLM_PROVIDER=azure
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-azure-api-key
AZURE_OPENAI_API_VERSION=2025-04-01-preview
AZURE_OPENAI_REALTIME_DEPLOYMENT=gpt-4o-realtime-preview
AZURE_OPENAI_RESPONSES_DEPLOYMENT=gpt-4.1
AZURE_OPENAI_MINI_DEPLOYMENT=gpt-4o-mini
```

The app uses a three-tier strategy to determine which provider to use:
- **Explicit** — set `LLM_PROVIDER` to `openai` or `azure` to force a provider (useful when both sets of credentials are present)
- **Auto-detect** — if `LLM_PROVIDER` is unset, the app checks for `OPENAI_API_KEY` first, then `AZURE_OPENAI_ENDPOINT`
- **Fail** — if no provider can be resolved, the app throws a startup error
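The resolution order above can be sketched as a small helper. This is illustrative only: the function name `resolveProvider` and the `env` parameter shape are assumptions, not the repo's actual code.

```typescript
type Provider = "openai" | "azure";

// Sketch of the three-tier strategy: explicit setting, then auto-detect, then fail.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  // Tier 1: an explicit override wins when both credential sets are present.
  if (env.LLM_PROVIDER === "openai" || env.LLM_PROVIDER === "azure") {
    return env.LLM_PROVIDER;
  }
  // Tier 2: auto-detect. OPENAI_API_KEY is checked first, then AZURE_OPENAI_ENDPOINT.
  if (env.OPENAI_API_KEY) return "openai";
  if (env.AZURE_OPENAI_ENDPOINT) return "azure";
  // Tier 3: nothing usable, so fail at startup.
  throw new Error(
    "No LLM provider configured: set OPENAI_API_KEY or AZURE_OPENAI_ENDPOINT"
  );
}
```

Note that the explicit tier lets you keep both credential sets in `.env.local` and switch providers with a single variable.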
```bash
npm run dev
```

Open http://localhost:3000 in your browser. Click **Connect** to start a voice session.
Hob uses WebRTC to stream audio directly between the browser and the OpenAI Realtime API — the Next.js server is only involved in minting a short-lived session token. Once connected, the conversation is handled by a network of AI agents defined in src/app/agentConfigs/.
In this repo, *agent* and *agent scenario* mean different things:
| Term | What it is | Example in code |
|---|---|---|
| Agent | One `RealtimeAgent` with a single role: instructions, tools, and allowed handoffs | `assistant`, `chatAgent`, `authenticationAgent` |
| Agent scenario | A named `RealtimeAgent[]` set that defines the team used for one session | `defaultAssistantScenario`, `chatSupervisorScenario`, `customerServiceRetailScenario`, `simpleHandoffScenario` |
Put simply:
- An agent is one worker.
- A scenario is the full team configuration and entry point you choose from the UI (`?agentConfig=<name>`).
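The agent/scenario split can be illustrated with simplified stand-in types. These are not the Agents SDK's real `RealtimeAgent` type, and the agent names, tools, and instructions below are invented for the example:

```typescript
// Simplified stand-in for the SDK's RealtimeAgent, for illustration only.
interface AgentSketch {
  name: string;
  instructions: string;
  tools: string[];    // tool names the agent may call (illustrative)
  handoffs: string[]; // names of agents it may hand off to
}

// One agent = one worker with a single role.
const authenticationAgent: AgentSketch = {
  name: "authenticationAgent",
  instructions: "Verify the caller's identity before routing them onward.",
  tools: ["lookupAccount"],
  handoffs: ["returnsAgent"],
};

const returnsAgent: AgentSketch = {
  name: "returnsAgent",
  instructions: "Handle return requests for authenticated customers.",
  tools: ["startReturn"],
  handoffs: [],
};

// One scenario = the full team used for a session.
const retailScenarioSketch: AgentSketch[] = [authenticationAgent, returnsAgent];
```

Selecting a scenario from the UI loads the whole array; handoffs then move the live conversation between the agents inside it.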
Four built-in scenarios are currently included:
| Scenario | Description |
|---|---|
| `defaultAssistant` (default) | Production-oriented single-assistant flow with hosted tools (`webSearch`, `codeInterpreter`, optional `fileSearch`) |
| `chatSupervisor` | Two-layer pattern where a realtime front agent delegates difficult responses/tool use to a stronger supervisor model |
| `customerServiceRetail` | Multi-agent retail example (authentication, returns, sales, simulated human escalation) |
| `simpleHandoff` | Minimal two-agent handoff reference for learning and debugging |
Select a scenario from the dropdown in the top bar, or pass ?agentConfig=<name> as a URL parameter.
defaultAssistant is the current default scenario (defaultAgentSetKey), backed by a single assistant RealtimeAgent defined in src/app/agentConfigs/defaultAssistant/index.ts.
Its behavior is intentionally simple:
- General-purpose voice assistant with concise, practical replies
- Handles lightweight conversation directly
- Uses tools for facts, recent information, calculations, and code-like tasks
- Asks follow-up questions when required parameters are missing
- Does not claim tool usage unless a tool was actually called
The defaultAssistant tools are implemented in src/app/agentConfigs/defaultAssistant/hostedTools.ts using the Agents SDK tool(...) helper.
How tool execution works:
1. The realtime assistant decides to call a tool (`webSearch` or `codeInterpreter`) based on user intent.
2. The tool's `execute(...)` handler calls a shared helper (`callResponses`) that `POST`s to `/api/responses`.
3. `/api/responses` is a server-side proxy that selects OpenAI vs Azure based on env config (`LLM_PROVIDER` + fallback logic).
4. The proxy calls the Responses API with `parallel_tool_calls: false` and the hosted tool definition for that request.
5. Tool output text is normalized by `extractOutputText(...)` and returned to the realtime agent as `{ result: "..." }`.
6. If the API call fails, the tool returns a structured error (for example `web_search_failed` or `code_interpreter_failed`) so the assistant can recover gracefully.
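The shape of an `execute` handler in this flow can be sketched as follows. This is an assumption-heavy illustration, not the repo's `hostedTools.ts`: `callProxy` stands in for the shared `callResponses` helper, and the request body shape is invented.

```typescript
// Result shape mirroring the success/error contract described above.
type ToolResult = { result: string } | { error: string };

// Sketch of a webSearch-style execute handler. The proxy call is injected so
// the flow (delegate, normalize, recover) is visible without network access.
async function executeWebSearch(
  query: string,
  callProxy: (body: object) => Promise<string> // POSTs to /api/responses, returns output text
): Promise<ToolResult> {
  try {
    // Delegate to the server-side proxy, which picks OpenAI vs Azure and
    // attaches the hosted tool definition for this request.
    const text = await callProxy({ tool: "web_search", input: query });
    // Hand the normalized output text back to the realtime agent.
    return { result: text };
  } catch {
    // Structured error instead of a thrown exception, so the voice agent can
    // acknowledge the failure and continue the conversation.
    return { error: "web_search_failed" };
  }
}
```

Returning a structured error rather than throwing keeps the realtime session alive: the agent receives a value it can explain to the user instead of a dead tool call.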
Tool-specific behavior:
- `webSearch` accepts a single `query` string.
- `webSearch` uses the hosted Responses tool `web_search`.
- `webSearch` prompts for an accurate, concise answer and asks for source URLs in plain text.
- `codeInterpreter` accepts a single `task` string.
- `codeInterpreter` uses the hosted Responses tool `code_interpreter` with `container: { type: "auto" }`.
- `codeInterpreter` returns concise computed or derived output from the code execution flow.
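The normalization step (`extractOutputText`) can be sketched against the Responses API's documented payload shape, where `output` is a list of items and message items carry `content` parts of type `output_text`. The function body below is a guess at a typical implementation, not the repo's actual helper:

```typescript
// Minimal types for the parts of a Responses API payload used here.
interface OutputTextPart { type: "output_text"; text: string }
interface MessageItem { type: "message"; content: OutputTextPart[] }
interface ResponsesPayloadSketch { output: MessageItem[] }

// Sketch: collect every output_text part from every message item and join
// them into one plain string for the realtime agent.
function extractOutputTextSketch(payload: ResponsesPayloadSketch): string {
  return payload.output
    .filter((item) => item.type === "message")
    .flatMap((item) => item.content)
    .filter((part) => part.type === "output_text")
    .map((part) => part.text)
    .join("\n")
    .trim();
}
```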
- Voice or Push-to-Talk — server-side VAD by default; PTT mode available via the toolbar
- Multi-agent handoffs — agents transfer the conversation context to each other seamlessly
- Tool calling — agents can look up data, manage carts, check policies, and more
- Output guardrails — every response is asynchronously checked for offensive, off-brand, or violent content
- Audio recording — the full conversation (both sides) can be downloaded as a WAV file
- Developer event log — a live panel shows every SDK and API event with expandable JSON payloads
For a full description of the system design, see docs/hob-architecture-baseline.md.
High-level stack: Next.js 15 · React 19 · TypeScript · OpenAI Agents SDK · WebRTC · Tailwind CSS
Everyone is invited and welcome to contribute: open issues, propose pull requests, share ideas, or help improve documentation.
Participation is open to all, regardless of background or viewpoint.
This project follows the FOSS Pluralism Manifesto,
which affirms respect for people, freedom to critique ideas, and space for diverse perspectives.
Copyright (c) 2025, 2026 OpenAI, Iwan van der Kleijn
This project is licensed under the MIT License. See the LICENSE file for details.

