SwiftLlama

SwiftLlama is a wrapper for the llama.cpp library, designed to provide a Swift-native API for developers on iOS, macOS, watchOS, tvOS, and visionOS. It supports both text generation (Llama 3, CodeLlama, etc.) and embedding extraction (Nomic, BERT, etc.).

Installation

Swift Package Manager

Add the following to your Package.swift:

dependencies: [
    .package(url: "https://github.com/graemerycyk/SwiftLlama.git", from: "0.7.0")
]

Usage

1. Initialization

Initialize SwiftLlama with the path to your GGUF model file.

let swiftLlama = try SwiftLlama(modelPath: path)

2. Text Generation

Call without streaming

let response: String = try await swiftLlama.start(for: prompt)

Using AsyncStream for streaming

for try await value in await swiftLlama.start(for: prompt) {
    result += value
}

Using Combine publisher for streaming

await swiftLlama.start(for: prompt)
    .sink { _ in
    } receiveValue: {[weak self] value in
        self?.result += value
    }.store(in: &cancellable)

3. Embedding Extraction

Extract semantic embeddings from text for similarity search, RAG, and other ML tasks.

Quick Start

Download a Model: We recommend nomic-embed-text-v1.5.Q8_0.gguf for the best balance of quality and performance.
```
wget https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q8_0.gguf
```

Extract:

// Initialize with embedding model
let swiftLlama = try SwiftLlama(modelPath: "path/to/nomic-embed-text-v1.5.Q8_0.gguf")

// Extract embedding
let embedding = try await swiftLlama.extractEmbedding(for: "Hello, world!")

print("Dimension: \(embedding.count)") // e.g., 768 for nomic-embed-text-v1.5
print("Magnitude: \(sqrt(embedding.reduce(0) { $0 + $1 * $1 }))") // ~1.0 (Normalized)

API Reference

extractEmbedding(for:) extracts a normalized embedding vector from input text.

@SwiftLlamaActor
public func extractEmbedding(for text: String) async throws -> [Float]

Returns: Normalized [Float] array (L2 norm ≈ 1.0).
Throws: modelNotLoaded, tokenizationFailed, invalidEmbeddingDimension, embeddingExtractionFailed.

Use Cases

Semantic Search: Find documents similar to a query by comparing their embeddings.

let queryEmb = try await swiftLlama.extractEmbedding(for: "I forgot my password")
let docEmb = try await swiftLlama.extractEmbedding(for: "How to reset password")

// Calculate Cosine Similarity (Dot product of normalized vectors)
let similarity = zip(queryEmb, docEmb).reduce(0) { $0 + $1.0 * $1.1 }
print("Similarity: \(similarity)") // High value (0.7-1.0) indicates similarity
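
The pairwise comparison above extends naturally to ranking a set of documents against one query. The following is a minimal sketch, not part of SwiftLlama: `cosineSimilarity` and `topMatches` are hypothetical helper names, and the vectors are assumed to come from earlier `extractEmbedding(for:)` calls. The similarity divides by the norms, so it also works if the inputs are not normalized.

```swift
/// Cosine similarity between two vectors of equal length.
/// Dividing by the norms keeps the result valid for non-normalized input.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "Vectors must have the same dimension")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = (normA * normB).squareRoot()
    // A zero vector has no direction; treat it as not similar to anything.
    return denominator > 0 ? dot / denominator : 0
}

/// Hypothetical helper: returns up to `limit` documents as (index, score), best match first.
func topMatches(for query: [Float],
                in documents: [[Float]],
                limit: Int) -> [(index: Int, score: Float)] {
    let scored = documents.enumerated().map {
        (index: $0.offset, score: cosineSimilarity(query, $0.element))
    }
    return Array(scored.sorted { $0.score > $1.score }.prefix(max(0, limit)))
}
```

The returned indices refer to positions in the `documents` array, so keep the source texts in a parallel array to map results back to content.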

Duplicate Detection: Identify duplicate content by checking for extremely high similarity (>0.95).
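
As a rough sketch of the duplicate check, the function below compares every pair of embeddings and reports those above a threshold. `dotProduct` and `duplicatePairs` are hypothetical names, not SwiftLlama API, and the code assumes the vectors are already L2-normalized (as `extractEmbedding(for:)` returns them), so the dot product equals the cosine similarity.

```swift
/// Dot product; equals cosine similarity when both vectors are L2-normalized.
func dotProduct(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "Vectors must have the same dimension")
    return zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

/// Hypothetical helper: returns index pairs (first < second) whose similarity exceeds `threshold`.
func duplicatePairs(in embeddings: [[Float]],
                    threshold: Float = 0.95) -> [(first: Int, second: Int)] {
    var pairs: [(first: Int, second: Int)] = []
    for i in embeddings.indices {
        for j in embeddings.indices where j > i {
            if dotProduct(embeddings[i], embeddings[j]) > threshold {
                pairs.append((first: i, second: j))
            }
        }
    }
    return pairs
}
```

This compares all pairs, so the cost grows quadratically with the number of texts. That is fine for small collections; larger ones would likely need an approximate nearest-neighbor index instead.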

Recommended Models

nomic-embed-text-v1.5 (768 dimensions): Best for general use.
all-MiniLM-L6-v2 (384 dimensions): Very fast and lightweight.

Performance Tips

Batch Processing: Process texts sequentially, in a loop or with an asyncMap-style helper. Access to the model is serialized by actor isolation (@SwiftLlamaActor), so calls from multiple tasks are safe.
Caching: Cache embeddings for static content to avoid re-computation.
Preprocessing: Trim whitespace and lowercase text for better matching consistency.
Quantization: Use Q4_K_M quantization for mobile devices to reduce memory usage with minimal quality loss.
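
The caching tip can be sketched with a small in-memory store. `EmbeddingCache` is a hypothetical type, not part of SwiftLlama. It keys entries by trimmed, lowercased text, which is an assumption that texts differing only in case or surrounding whitespace should share one embedding.

```swift
import Foundation

/// Hypothetical in-memory cache for embeddings of static content.
struct EmbeddingCache {
    private var storage: [String: [Float]] = [:]

    init() {}

    /// Cache key: the text with surrounding whitespace removed, lowercased.
    static func key(for text: String) -> String {
        text.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    }

    /// Returns the cached embedding, or nil if this text has not been stored.
    func embedding(for text: String) -> [Float]? {
        storage[Self.key(for: text)]
    }

    /// Stores an embedding under the normalized key for `text`.
    mutating func store(_ embedding: [Float], for text: String) {
        storage[Self.key(for: text)] = embedding
    }

    /// Number of distinct cached entries.
    var count: Int { storage.count }
}
```

A caller would check `embedding(for:)` first and only await `extractEmbedding(for:)` on a miss, then `store` the result. The struct is not synchronized, so keep it confined to a single task or wrap it in an actor.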

Troubleshooting

"Invalid embedding dimension": Ensure you loaded an embedding model (e.g., nomic), not a generative chat model (e.g., Llama-3-Instruct).
Low similarity: Try preprocessing your text (lowercasing, removing special characters) or using a domain-specific model.
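
One possible preprocessing step for the low-similarity case is sketched below. `preprocess` is a hypothetical helper, and what counts as a "special character" is an assumption: here, anything that is not a letter or digit is replaced with a space. Stripping punctuation can also remove useful signal for some models, so compare similarity scores before and after adopting it.

```swift
import Foundation

/// Lowercases the text, replaces every non-alphanumeric character with a space,
/// and collapses runs of whitespace into single spaces.
func preprocess(_ text: String) -> String {
    let allowed = CharacterSet.alphanumerics
    let cleaned = text.lowercased().unicodeScalars.map { scalar -> Character in
        allowed.contains(scalar) ? Character(scalar) : " "
    }
    // split drops empty pieces, which trims the ends and collapses repeated spaces.
    return String(cleaned).split(separator: " ").joined(separator: " ")
}
```

Apply the same function to both queries and documents; mixing preprocessed and raw text tends to make scores less comparable.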

Supported Models

SwiftLlama supports models compatible with llama.cpp.

Text Generation: Llama 3, CodeLlama, Mistral, etc.
- Quick test: codellama-7b-instruct.Q4_K_S.gguf
Embeddings: Nomic, BERT, etc.

Test Projects

Refer to the TestProjects folder for iOS/macOS examples.

CLI Tool: TestProjects/TestApp-Commandline contains examples for both text generation and embeddings.
```
swift run test-embedding /path/to/model.gguf
```

Technical Details

llama.cpp Integration:

Uses llama_n_embd, llama_set_embeddings, llama_decode, and llama_get_embeddings.
Embeddings are L2-normalized automatically before returning.
Thread-safe access is managed via @SwiftLlamaActor.
Includes ggml backend support for Metal (GPU) and Accelerate (CPU) for optimal performance on Apple Silicon.
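
For reference, L2 normalization divides each component by the vector's Euclidean length. The sketch below only illustrates the math. It is not SwiftLlama's implementation, and `l2Norm` and `l2Normalized` are hypothetical names.

```swift
/// L2 norm (Euclidean length) of a vector.
func l2Norm(_ v: [Float]) -> Float {
    v.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
}

/// Returns `v` scaled to unit length; a zero vector is returned unchanged.
func l2Normalized(_ v: [Float]) -> [Float] {
    let norm = l2Norm(v)
    guard norm > 0 else { return v }
    return v.map { $0 / norm }
}
```

Because the library already returns unit-length vectors, a helper like this is mainly useful for sanity checks or for vectors that come from other sources.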

Contributing & Testing

Running Unit Tests

To run the embedding tests, you need to provide a path to a valid embedding model via an environment variable.

export EMBEDDING_MODEL_PATH="/path/to/nomic-embed-text-v1.5.Q8_0.gguf"
swift test
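
How the repository's tests read this variable is not shown here. As an illustrative sketch only, a test could resolve the path with a helper like the hypothetical `embeddingModelPath(in:)` below, which treats an unset or empty variable as missing.

```swift
import Foundation

/// Hypothetical helper: returns the model path from the environment,
/// or nil if EMBEDDING_MODEL_PATH is unset or empty.
func embeddingModelPath(
    in environment: [String: String] = ProcessInfo.processInfo.environment
) -> String? {
    guard let path = environment["EMBEDDING_MODEL_PATH"], !path.isEmpty else {
        return nil
    }
    return path
}
```

In an XCTest case, a nil result could be turned into a skipped test by throwing `XCTSkip`, so that machines without a model file do not report failures.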

Manual Verification

Build: swift build
Test Tool: swift run test-embedding /path/to/model.gguf
Checklist:
- ✅ Unit tests pass (testEmbeddingNormalization, testSimilarTextSimilarity).
- ✅ Manual testing with test-embedding tool confirms reasonable cosine similarity values.
- ✅ Memory usage is stable during batch processing.

License

MIT
