LLM Inference for Go

A single interface in Go to get inference from multiple LLM / AI providers using their official SDKs.

Features at a glance

  • Single normalized interface (ProviderSetAPI) for multiple providers. Current support:

    • Anthropic Messages API,
    • OpenAI Chat Completions API,
    • OpenAI Responses API.

  • Normalized data model in spec/:

    • messages (user / assistant / system / developer),
    • text, images, and files,
    • tools (function, custom, built-in tools like web search),
    • reasoning / thinking content,
    • streaming events (text + thinking),
    • usage accounting.
  • Streaming support:

    • Text streaming for all providers that support it.
    • Reasoning / thinking streaming where the provider exposes it (Anthropic, OpenAI Responses).
  • Client and Server Tools:

    • Client tools are supported via Function Calling.
    • Anthropic server-side web search.
    • OpenAI Responses web search tool.
    • OpenAI Chat Completions web search via web_search_options.
  • HTTP-level debugging:

    • Pluggable CompletionDebugger interface.
    • A built-in, ready-to-use implementation, debugclient.HTTPCompletionDebugger, which:
      • wraps SDK HTTP clients,
      • captures request/response metadata,
      • redacts secrets and sensitive content,
      • attaches a scrubbed debug blob to FetchCompletionResponse.DebugDetails.

Installation

# Go 1.25+
go get github.com/ppipada/inference-go

Quickstart

Basic pattern:

  1. Create a ProviderSetAPI.
  2. Add one or more providers. Set their API keys.
  3. Send a FetchCompletionRequest.
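
A minimal sketch of that flow follows. NewProviderSetAPI and spec.ProviderParam appear elsewhere in this README; AddProvider, FetchCompletion, the spec.FetchCompletionRequest type, and the import path for spec are illustrative assumptions, so check the repository's examples for the real names and signatures.

package main

import (
	"context"
	"log"

	"github.com/ppipada/inference-go"
	"github.com/ppipada/inference-go/spec"
)

func main() {
	ctx := context.Background()

	// 1. Create a ProviderSetAPI.
	ps, err := inference.NewProviderSetAPI()
	if err != nil {
		log.Fatal(err)
	}

	// 2. Add a provider and set its API key.
	// NOTE: AddProvider and the ProviderParam fields are assumptions, not confirmed API.
	if err := ps.AddProvider(ctx, spec.ProviderParam{ /* provider name, API key, ... */ }); err != nil {
		log.Fatal(err)
	}

	// 3. Send a FetchCompletionRequest.
	// NOTE: FetchCompletion and the request fields are likewise assumptions.
	resp, err := ps.FetchCompletion(ctx, &spec.FetchCompletionRequest{ /* model, messages, ... */ })
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%+v", resp)
}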

Examples

Supported providers

Anthropic Messages API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | User and assistant messages mapped to text blocks. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Redacted thinking is also supported; not streamed to caller. |
| Streaming thinking | yes | |
| Images (input) | yes | Base64 and URL images mapped to Anthropic image blocks. |
| Files / documents (input) | yes | PDF via base64 or URL. Plain-text base64 is currently skipped. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool use + web search result blocks. |
| Citations | partial | URL citations only. Other stateful citations are not mapped. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached. Anthropic doesn't expose reasoning token usage. |

OpenAI Chat Completions API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | Single assistant message per completion (first choice). |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning effort config only; no separate reasoning messages in the API. |
| Streaming thinking | no | Not exposed by Chat Completions. |
| Images (input) | yes | As data URLs or remote URLs inside user messages. |
| Files / documents (input) | yes | As data URLs only; stateful file IDs / file URLs are not used. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | The API doesn't expose a web search tool; mapped via top-level web_search_options. |
| Citations | yes | URL citations mapped from annotations. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached/Reasoning. |

OpenAI Responses API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | Input/output messages fully supported. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning items mapped to ReasoningContent, including encrypted content. |
| Streaming thinking | yes | |
| Images (input) | yes | As data URLs or remote URLs. |
| Files / documents (input) | yes | As data URLs or remote URLs. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool choice + web search tool call blocks. |
| Citations | yes | URL citations mapped to spec.CitationKindURL. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Store is explicitly disabled (Store: false). |
| Usage data | yes | Input/Output/Cached/Reasoning. |

HTTP debugging

The library exposes a pluggable CompletionDebugger interface:

type CompletionDebugger interface {
    HTTPClient(base *http.Client) *http.Client
    StartSpan(ctx context.Context, info *spec.CompletionSpanStart) (context.Context, spec.CompletionSpan)
}
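
You can also supply your own implementation. Below is a minimal sketch of a wrapper that layers simple request logging on top of another debugger; the wrapper itself and the spec import path are assumptions, and only the CompletionDebugger interface above is confirmed by this README.

import (
	"log"
	"net/http"

	"github.com/ppipada/inference-go/spec" // assumed import path for the spec package
)

// loggingDebugger wraps another CompletionDebugger and layers a simple
// request-logging http.RoundTripper onto the HTTP client it produces.
// StartSpan is delegated to the embedded debugger unchanged.
type loggingDebugger struct {
	spec.CompletionDebugger
}

func (d loggingDebugger) HTTPClient(base *http.Client) *http.Client {
	c := d.CompletionDebugger.HTTPClient(base)
	inner := c.Transport
	if inner == nil {
		inner = http.DefaultTransport
	}
	clone := *c // shallow copy so the wrapped client stays untouched
	clone.Transport = roundTripFunc(func(req *http.Request) (*http.Response, error) {
		log.Printf("%s %s", req.Method, req.URL.Redacted())
		return inner.RoundTrip(req)
	})
	return &clone
}

// roundTripFunc adapts a plain function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }
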
  • The debugclient package includes a ready-to-use implementation, HTTPCompletionDebugger, which:

    • wraps the provider SDK’s *http.Client,
    • captures and scrubs:
      • URL, method, headers (with secret redaction),
      • query params,
      • request/response bodies (optional, scrubbed of LLM text and large base64),
      • curl command for reproduction,
    • attaches a structured HTTPDebugState to FetchCompletionResponse.DebugDetails.
    • You can then inspect resp.DebugDetails for a given call, or just rely on slog output.
  • Use it via WithDebugClientBuilder:

ps, _ := inference.NewProviderSetAPI(
    inference.WithDebugClientBuilder(func(p spec.ProviderParam) spec.CompletionDebugger {
        return debugclient.NewHTTPCompletionDebugger(&debugclient.DebugConfig{
            LogToSlog: false,
        })
    }),
)
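
After a call made through a provider set configured this way, the attached payload can be inspected directly. A tiny sketch (FetchCompletion is the same hypothetical method name used in the Quickstart sketch; only the DebugDetails field name is confirmed by this README):

resp, err := ps.FetchCompletion(ctx, req) // hypothetical method name, as in Quickstart
if err != nil {
	log.Fatal(err)
}
// The scrubbed HTTPDebugState attached by debugclient rides along here.
log.Printf("debug: %+v", resp.DebugDetails)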

Notes

  • Stateless focus. The design focuses on stateless request/response interactions:

    • no conversation IDs,
    • no file IDs.
  • Opaque / provider‑specific fields.

    • Many provider‑specific fields (error details, service tiers, cache metadata, full raw responses) are only available through the debug payload, not in the normalized spec types.
    • A few commonly needed parameters may be added to the normalized types over time, as the need arises.
  • Token counting. Normalized Usage reports what the provider exposes:

    • Anthropic: input vs. cached tokens, output tokens.
    • OpenAI: prompt vs. cached tokens, completion tokens, reasoning tokens where available.
  • Heuristic prompt filtering.

    • ModelParam.MaxPromptLength triggers sdkutil.FilterMessagesByTokenCount, which uses a simple heuristic token counter. It is approximate, not an exact tokenizer.
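
To make the approximation concrete, here is a toy version of such a heuristic over plain strings. The library's actual counter inside sdkutil.FilterMessagesByTokenCount is not documented here; the ~4 characters per token ratio and the drop-oldest-first policy below are illustrative assumptions.

package main

import "fmt"

// approxTokens estimates a token count from raw text length.
// Roughly 4 characters per token is a common rule of thumb for English
// text; this is NOT a real tokenizer.
func approxTokens(text string) int {
	const charsPerToken = 4
	return (len(text) + charsPerToken - 1) / charsPerToken
}

// filterByBudget keeps the most recent messages whose combined estimated
// token count fits within maxPromptTokens, dropping the oldest first.
func filterByBudget(msgs []string, maxPromptTokens int) []string {
	total := 0
	for i := len(msgs) - 1; i >= 0; i-- { // walk from newest to oldest
		total += approxTokens(msgs[i])
		if total > maxPromptTokens {
			return msgs[i+1:]
		}
	}
	return msgs
}

func main() {
	msgs := []string{"old message", "newer message", "newest message"}
	// With a budget of 8 estimated tokens, the oldest message is dropped.
	fmt.Println(filterByBudget(msgs, 8))
}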

Development

  • Formatting follows gofumpt and golines via golangci-lint, which is also used for linting. All rules are in .golangci.yml.
  • Useful tasks are defined in taskfile.yml; running them requires Task.
  • Bug reports and PRs are welcome:
    • Keep the public API (package inference and spec) small and intentional.
    • Avoid leaking provider‑specific types through the public surface; put them under internal/.
    • Please run tests and linters before sending a PR.
