A single Go interface for getting inference from multiple LLM / AI providers through their official SDKs.
- Features at a glance
- Installation
- Quickstart
- Examples
- Supported providers
- HTTP debugging
- Notes
- Development
## Features at a glance

- Single normalized interface (`ProviderSetAPI`) for multiple providers. Current support:
  - Anthropic Messages API (official SDK).
  - OpenAI Chat Completions API (official SDK).
  - OpenAI Responses API (official SDK).
- Normalized data model in `spec/`:
  - messages (user / assistant / system / developer),
  - text, images, and files,
  - tools (function, custom, built-in tools like web search),
  - reasoning / thinking content,
  - streaming events (text + thinking),
  - usage accounting.
- Streaming support:
  - Text streaming for all providers that support it.
  - Reasoning / thinking streaming where the provider exposes it (Anthropic, OpenAI Responses).
- Client and server tools:
  - Client tools are supported via function calling.
  - Anthropic server-side web search.
  - OpenAI Responses web search tool.
  - OpenAI Chat Completions web search via `web_search_options`.
- HTTP-level debugging:
  - Pluggable `CompletionDebugger` interface.
  - A built-in, ready-to-use implementation, `debugclient.HTTPCompletionDebugger`, that:
    - wraps SDK HTTP clients,
    - captures request/response metadata,
    - redacts secrets and sensitive content,
    - attaches a scrubbed debug blob to `FetchCompletionResponse.DebugDetails`.
## Installation

```sh
# Go 1.25+
go get github.com/ppipada/inference-go
```

## Quickstart

Basic pattern:

- Create a `ProviderSetAPI`.
- Add one or more providers and set their API keys.
- Send a `FetchCompletionRequest`.
## Examples

- Basic OpenAI Responses
- Basic OpenAI Chat Completions
- Basic Anthropic Messages
- Extended OpenAI Responses example: demonstrates tools, web search, and file and image attachments.
## Supported providers

### Anthropic Messages API

Feature support:
| Area | Supported? | Notes |
|---|---|---|
| Text input/output | yes | User and assistant messages mapped to text blocks. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Redacted thinking is also supported; not streamed to caller. |
| Streaming thinking | yes | |
| Images (input) | yes | Base64 and URL images mapped to Anthropic image blocks. |
| Files / documents (input) | yes | PDF via base64 or URL. Plain-text base64 is currently skipped. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool use + web search result blocks. |
| Citations | partial | URL citations only. Other stateful citations are not mapped. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached. Anthropic doesn't expose Reasoning tokens usage. |
### OpenAI Chat Completions API

Feature support:
| Area | Supported? | Notes |
|---|---|---|
| Text input/output | yes | Single assistant message per completion (first choice). |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning effort config only; no separate reasoning messages in API. |
| Streaming thinking | no | Not exposed by Chat Completions. |
| Images (input) | yes | As data URLs or remote URLs inside user messages. |
| Files / documents (input) | yes | As data URLs only; stateful file IDs / file URLs are not used. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | API doesn't expose tool; mapped via top-level web_search_options. |
| Citations | yes | URL citations mapped from annotations. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached/Reasoning. |
### OpenAI Responses API

Feature support:
| Area | Supported? | Notes |
|---|---|---|
| Text input/output | yes | Input/output messages fully supported. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning items mapped to ReasoningContent, including encrypted content. |
| Streaming thinking | yes | |
| Images (input) | yes | As data URLs or remote URLs. |
| Files / documents (input) | yes | As data URLs or remote URLs. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool choice + web search tool call blocks. |
| Citations | yes | URL citations mapped to spec.CitationKindURL. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Store is explicitly disabled (`Store: false`). |
| Usage data | yes | Input/Output/Cached/Reasoning. |
## HTTP debugging

The library exposes a pluggable `CompletionDebugger` interface:

```go
type CompletionDebugger interface {
	HTTPClient(base *http.Client) *http.Client
	StartSpan(ctx context.Context, info *spec.CompletionSpanStart) (context.Context, spec.CompletionSpan)
}
```
- Package `debugclient` includes a ready-to-use implementation, `HTTPCompletionDebugger`, which:
  - wraps the provider SDK's `*http.Client`,
  - captures and scrubs:
    - URL, method, and headers (with secret redaction),
    - query params,
    - request/response bodies (optional; scrubbed of LLM text and large base64 payloads),
    - a curl command for reproduction,
  - attaches a structured `HTTPDebugState` to `FetchCompletionResponse.DebugDetails`.
- You can then inspect `resp.DebugDetails` for a given call, or just rely on `slog` output.
- Use it via `WithDebugClientBuilder`:

  ```go
  ps, _ := inference.NewProviderSetAPI(
  	inference.WithDebugClientBuilder(func(p spec.ProviderParam) spec.CompletionDebugger {
  		return debugclient.NewHTTPCompletionDebugger(&debugclient.DebugConfig{
  			LogToSlog: false,
  		})
  	}),
  )
  ```
## Notes

- Stateless focus. The design targets stateless request/response interactions:
  - no conversation IDs,
  - no file IDs.
- Opaque / provider-specific fields.
  - Many provider-specific fields (error details, service tiers, cache metadata, full raw responses) are only available through the debug payload, not in the normalized `spec` types.
  - A few commonly needed parameters may be added over time, as needed.
- Token counting. The normalized `Usage` reports what the provider exposes:
  - Anthropic: input vs. cached tokens, output tokens.
  - OpenAI: prompt vs. cached tokens, completion tokens, reasoning tokens where available.
- Heuristic prompt filtering. Setting `ModelParam.MaxPromptLength` triggers `sdkutil.FilterMessagesByTokenCount`, which uses a simple heuristic token counter. It is approximate, not an exact tokenizer.
## Development

- Formatting follows `gofumpt` and `golines` via `golangci-lint`, which is also used for linting. All rules are in `.golangci.yml`.
- Useful scripts are defined in `taskfile.yml`; requires Task.
- Bug reports and PRs are welcome:
  - Keep the public API (`package inference` and `spec`) small and intentional.
  - Avoid leaking provider-specific types through the public surface; put them under `internal/`.
  - Please run tests and linters before sending a PR.