LLM Inference for Go

A single interface in Go to get inference from multiple LLM / AI providers using their official SDKs.

Features at a glance

  • Single normalized interface (ProviderSetAPI) for multiple providers. Current support:

    • Anthropic Messages API,
    • OpenAI Chat Completions API,
    • OpenAI Responses API.

  • Normalized data model in spec/:

    • messages (user / assistant / system / developer),
    • text, images, and files,
    • tools (function, custom, built-in tools like web search),
    • reasoning / thinking content,
    • streaming events (text + thinking),
    • usage accounting.
  • Streaming support:

    • Text streaming for all providers that support it.
    • Reasoning / thinking streaming where the provider exposes it (Anthropic, OpenAI Responses).
  • Client and Server Tools:

    • Client tools are supported via Function Calling.
    • Anthropic server-side web search.
    • OpenAI Responses web search tool.
    • OpenAI Chat Completions web search via web_search_options.
  • HTTP-level debugging:

    • Pluggable CompletionDebugger interface.
    • A built-in, ready-to-use implementation, debugclient.HTTPCompletionDebugger, which:
      • wraps SDK HTTP clients,
      • captures request/response metadata,
      • redacts secrets and sensitive content,
      • attaches a scrubbed debug blob to FetchCompletionResponse.DebugDetails.

Installation

# Go 1.25+
go get github.com/ppipada/inference-go

Quickstart

Basic pattern:

  1. Create a ProviderSetAPI.
  2. Add one or more providers. Set their API keys.
  3. Send a FetchCompletionRequest.
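
A minimal sketch of that flow follows. NewProviderSetAPI and spec.ProviderParam appear elsewhere in this README; AddProvider, FetchCompletion, the spec.FetchCompletionRequest type, and the import path for spec are illustrative assumptions, so check the repository's examples for the real names and signatures.

package main

import (
	"context"
	"log"

	"github.com/ppipada/inference-go"
	"github.com/ppipada/inference-go/spec"
)

func main() {
	ctx := context.Background()

	// 1. Create a ProviderSetAPI.
	ps, err := inference.NewProviderSetAPI()
	if err != nil {
		log.Fatal(err)
	}

	// 2. Add a provider and set its API key.
	// NOTE: AddProvider and the ProviderParam fields are assumptions, not confirmed API.
	if err := ps.AddProvider(ctx, spec.ProviderParam{ /* provider name, API key, ... */ }); err != nil {
		log.Fatal(err)
	}

	// 3. Send a FetchCompletionRequest.
	// NOTE: FetchCompletion and the request fields are likewise assumptions.
	resp, err := ps.FetchCompletion(ctx, &spec.FetchCompletionRequest{ /* model, messages, ... */ })
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%+v", resp)
}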

Examples

Supported providers

Anthropic Messages API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | User and assistant messages mapped to text blocks. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Redacted thinking is also supported; not streamed to caller. |
| Streaming thinking | yes | |
| Images (input) | yes | Base64 and URL images mapped to Anthropic image blocks. |
| Files / documents (input) | yes | PDF via base64 or URL. Plain-text base64 is currently skipped. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool use + web search result blocks. |
| Citations | partial | URL citations only. Other stateful citations are not mapped. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached. Anthropic doesn't expose reasoning token usage. |

OpenAI Chat Completions API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | Single assistant message per completion (first choice). |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning effort config only; no separate reasoning messages in the API. |
| Streaming thinking | no | Not exposed by Chat Completions. |
| Images (input) | yes | As data URLs or remote URLs inside user messages. |
| Files / documents (input) | yes | As data URLs only; stateful file IDs / file URLs are not used. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | The API doesn't expose a web search tool; mapped via top-level web_search_options. |
| Citations | yes | URL citations mapped from annotations. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached/Reasoning. |

OpenAI Responses API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | Input/output messages fully supported. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning items mapped to ReasoningContent, including encrypted content. |
| Streaming thinking | yes | |
| Images (input) | yes | As data URLs or remote URLs. |
| Files / documents (input) | yes | As data URLs or remote URLs. |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool choice + web search tool call blocks. |
| Citations | yes | URL citations mapped to spec.CitationKindURL. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Store is explicitly disabled (Store: false). |
| Usage data | yes | Input/Output/Cached/Reasoning. |

HTTP debugging

The library exposes a pluggable CompletionDebugger interface:

type CompletionDebugger interface {
    HTTPClient(base *http.Client) *http.Client
    StartSpan(ctx context.Context, info *spec.CompletionSpanStart) (context.Context, spec.CompletionSpan)
}
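
You can also supply your own implementation. Below is a minimal sketch of a wrapper that layers simple request logging on top of another debugger; the wrapper itself and the spec import path are assumptions, and only the CompletionDebugger interface above is confirmed by this README.

import (
	"log"
	"net/http"

	"github.com/ppipada/inference-go/spec" // assumed import path for the spec package
)

// loggingDebugger wraps another CompletionDebugger and layers a simple
// request-logging http.RoundTripper onto the HTTP client it produces.
// StartSpan is delegated to the embedded debugger unchanged.
type loggingDebugger struct {
	spec.CompletionDebugger
}

func (d loggingDebugger) HTTPClient(base *http.Client) *http.Client {
	c := d.CompletionDebugger.HTTPClient(base)
	inner := c.Transport
	if inner == nil {
		inner = http.DefaultTransport
	}
	clone := *c // shallow copy so the wrapped client stays untouched
	clone.Transport = roundTripFunc(func(req *http.Request) (*http.Response, error) {
		log.Printf("%s %s", req.Method, req.URL.Redacted())
		return inner.RoundTrip(req)
	})
	return &clone
}

// roundTripFunc adapts a plain function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }
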
  • The debugclient package includes a ready-to-use implementation, HTTPCompletionDebugger, which:

    • wraps the provider SDK’s *http.Client,
    • captures and scrubs:
      • URL, method, headers (with secret redaction),
      • query params,
      • request/response bodies (optional, scrubbed of LLM text and large base64),
      • curl command for reproduction,
    • attaches a structured HTTPDebugState to FetchCompletionResponse.DebugDetails.
    • You can then inspect resp.DebugDetails for a given call, or just rely on slog output.
  • Use it via WithDebugClientBuilder:

ps, _ := inference.NewProviderSetAPI(
    inference.WithDebugClientBuilder(func(p spec.ProviderParam) spec.CompletionDebugger {
        return debugclient.NewHTTPCompletionDebugger(&debugclient.DebugConfig{
            LogToSlog: false,
        })
    }),
)
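
After a call made through a provider set configured this way, the attached payload can be inspected directly. A tiny sketch (FetchCompletion is the same hypothetical method name used in the Quickstart sketch; only the DebugDetails field name is confirmed by this README):

resp, err := ps.FetchCompletion(ctx, req) // hypothetical method name, as in Quickstart
if err != nil {
	log.Fatal(err)
}
// The scrubbed HTTPDebugState attached by debugclient rides along here.
log.Printf("debug: %+v", resp.DebugDetails)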

Notes

  • Stateless focus. The design focuses on stateless request/response interactions:

    • no conversation IDs,
    • no file IDs.
  • Opaque / provider‑specific fields.

    • Many provider‑specific fields (error details, service tiers, cache metadata, full raw responses) are only available through the debug payload, not in the normalized spec types.
    • A few commonly needed parameters may be added to the normalized types over time, as the need arises.
  • Token counting. Normalized Usage reports what the provider exposes:

    • Anthropic: input vs. cached tokens, output tokens.
    • OpenAI: prompt vs. cached tokens, completion tokens, reasoning tokens where available.
  • Heuristic prompt filtering.

    • ModelParam.MaxPromptLength triggers sdkutil.FilterMessagesByTokenCount, which uses a simple heuristic token counter. It is approximate, not an exact tokenizer.
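
To make the approximation concrete, here is a toy version of such a heuristic over plain strings. The library's actual counter inside sdkutil.FilterMessagesByTokenCount is not documented here; the ~4 characters per token ratio and the drop-oldest-first policy below are illustrative assumptions.

package main

import "fmt"

// approxTokens estimates a token count from raw text length.
// Roughly 4 characters per token is a common rule of thumb for English
// text; this is NOT a real tokenizer.
func approxTokens(text string) int {
	const charsPerToken = 4
	return (len(text) + charsPerToken - 1) / charsPerToken
}

// filterByBudget keeps the most recent messages whose combined estimated
// token count fits within maxPromptTokens, dropping the oldest first.
func filterByBudget(msgs []string, maxPromptTokens int) []string {
	total := 0
	for i := len(msgs) - 1; i >= 0; i-- { // walk from newest to oldest
		total += approxTokens(msgs[i])
		if total > maxPromptTokens {
			return msgs[i+1:]
		}
	}
	return msgs
}

func main() {
	msgs := []string{"old message", "newer message", "newest message"}
	// With a budget of 8 estimated tokens, the oldest message is dropped.
	fmt.Println(filterByBudget(msgs, 8))
}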

Development

  • Formatting follows gofumpt and golines via golangci-lint, which is also used for linting. All rules are in .golangci.yml.
  • Useful tasks are defined in taskfile.yml; running them requires Task.
  • Bug reports and PRs are welcome:
    • Keep the public API (package inference and spec) small and intentional.
    • Avoid leaking provider‑specific types through the public surface; put them under internal/.
    • Please run tests and linters before sending a PR.
