SwiftLlama is a wrapper for the llama.cpp library, designed to provide a Swift-native API for developers on iOS, macOS, watchOS, tvOS, and visionOS. It supports both text generation (Llama 3, CodeLlama, etc.) and embedding extraction (Nomic, BERT, etc.).
Add the following to your Package.swift:
dependencies: [
    .package(url: "https://github.com/graemerycyk/SwiftLlama.git", from: "0.7.0")
]

Initialize SwiftLlama with the path to your GGUF model file.
let swiftLlama = try SwiftLlama(modelPath: path)

// Non-streaming: await the complete response
let response: String = try await swiftLlama.start(for: prompt)

// Streaming: append values as they are generated
for try await value in await swiftLlama.start(for: prompt) {
    result += value
}

// Streaming with a Combine publisher
await swiftLlama.start(for: prompt)
    .sink { _ in
    } receiveValue: { [weak self] value in
        self?.result += value
    }.store(in: &cancellable)

Extract semantic embeddings from text for similarity search, RAG, and other ML tasks.
- Download a Model: We recommend nomic-embed-text-v1.5.Q8_0.gguf for the best balance of quality and performance.
  wget https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q8_0.gguf
- Extract:
  // Initialize with embedding model
  let swiftLlama = try SwiftLlama(modelPath: "path/to/nomic-embed-text-v1.5.Q8_0.gguf")
  // Extract embedding
  let embedding = try await swiftLlama.extractEmbedding(for: "Hello, world!")
  print("Dimension: \(embedding.count)") // e.g., 768 for nomic-embed-text-v1.5
  print("Magnitude: \(sqrt(embedding.reduce(0) { $0 + $1 * $1 }))") // ~1.0 (Normalized)
extractEmbedding(for:) extracts a normalized embedding vector from input text.
@SwiftLlamaActor
public func extractEmbedding(for text: String) async throws -> [Float]
- Returns: Normalized [Float] array (L2 norm ≈ 1.0).
- Throws: modelNotLoaded, tokenizationFailed, invalidEmbeddingDimension, embeddingExtractionFailed.
Semantic Search: Find documents similar to a query by comparing their embeddings.
let queryEmb = try await swiftLlama.extractEmbedding(for: "I forgot my password")
let docEmb = try await swiftLlama.extractEmbedding(for: "How to reset password")
// Calculate Cosine Similarity (Dot product of normalized vectors)
let similarity = zip(queryEmb, docEmb).reduce(0) { $0 + $1.0 * $1.1 }
print("Similarity: \(similarity)") // High value (0.7-1.0) indicates similarity

Duplicate Detection: Identify duplicate content by checking for extremely high similarity (>0.95).
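Both use cases reduce to dot products over normalized vectors. The following is a minimal sketch, not part of SwiftLlama: `dot`, `rank`, and `duplicatePairs` are hypothetical helper names, and the embeddings are assumed to be L2-normalized `[Float]` arrays of equal length, such as those returned by `extractEmbedding(for:)`.

```swift
// Dot product; equals cosine similarity when both vectors are L2-normalized.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embedding dimensions must match")
    return zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

// Hypothetical helper: rank documents by similarity to the query, best first.
func rank(query: [Float], documents: [[Float]]) -> [(index: Int, score: Float)] {
    documents.enumerated()
        .map { (index: $0.offset, score: dot(query, $0.element)) }
        .sorted { $0.score > $1.score }
}

// Hypothetical helper: index pairs whose similarity exceeds the threshold.
// The 0.95 default mirrors the duplicate-detection guideline above.
func duplicatePairs(in embeddings: [[Float]], threshold: Float = 0.95) -> [(Int, Int)] {
    var pairs: [(Int, Int)] = []
    for i in embeddings.indices {
        for j in embeddings.indices where j > i {
            if dot(embeddings[i], embeddings[j]) > threshold {
                pairs.append((i, j))
            }
        }
    }
    return pairs
}
```

The pairwise loop is quadratic in the number of documents, which is likely fine for small collections; larger corpora would probably need an approximate nearest-neighbor index instead.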
- nomic-embed-text-v1.5 (768 dimensions): Best for general use.
- all-MiniLM-L6-v2 (384 dimensions): Very fast and lightweight.
- Batch Processing: Process texts sequentially in a loop or asyncMap. The model is thread-safe via Actor isolation.
- Caching: Cache embeddings for static content to avoid re-computation.
- Preprocessing: Trim whitespace and lowercase text for better matching consistency.
- Quantization: Use Q4_K_M quantization for mobile devices to reduce memory usage with minimal quality loss.
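The caching, preprocessing, and sequential batch tips can be combined in one small wrapper. This is a sketch under assumptions: `EmbeddingCache` and `preprocess` are hypothetical names, and the `embed` closure stands in for a call such as `swiftLlama.extractEmbedding(for:)`.

```swift
import Foundation

// Normalize text before embedding: trim whitespace and lowercase.
func preprocess(_ text: String) -> String {
    text.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
}

// Hypothetical in-memory cache keyed by preprocessed text.
actor EmbeddingCache {
    private var storage: [String: [Float]] = [:]
    private let embed: @Sendable (String) async throws -> [Float]

    // `embed` is an assumed stand-in for the real embedding call.
    init(embed: @escaping @Sendable (String) async throws -> [Float]) {
        self.embed = embed
    }

    // Return the cached vector if present; otherwise compute and store it.
    func embedding(for text: String) async throws -> [Float] {
        let key = preprocess(text)
        if let cached = storage[key] { return cached }
        let vector = try await embed(key)
        storage[key] = vector
        return vector
    }

    // Process texts one at a time, reusing cached vectors where possible.
    func embeddings(for texts: [String]) async throws -> [[Float]] {
        var result: [[Float]] = []
        for text in texts {
            let vector = try await embedding(for: text)
            result.append(vector)
        }
        return result
    }
}
```

Because actors are reentrant, two concurrent requests for the same uncached text may both invoke `embed`; the result is still correct, only the work is duplicated.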
- "Invalid embedding dimension": Ensure you loaded an embedding model (e.g., nomic), not a generative chat model (e.g., Llama-3-Instruct).
- Low similarity: Try preprocessing your text (lowercasing, removing special characters) or using a domain-specific model.
SwiftLlama supports models compatible with llama.cpp.
- Text Generation: Llama 3, CodeLlama, Mistral, etc.
- Quick test: codellama-7b-instruct.Q4_K_S.gguf
- Embeddings: Nomic, BERT, etc.
Refer to the TestProjects folder for iOS/macOS examples.
- CLI Tool: TestProjects/TestApp-Commandline contains examples for both text generation and embeddings.
  swift run test-embedding /path/to/model.gguf
llama.cpp Integration:
- Uses llama_n_embd, llama_set_embeddings, llama_decode, and llama_get_embeddings.
- Embeddings are L2-normalized automatically before returning.
- Thread-safe access is managed via @SwiftLlamaActor.
- Includes ggml backend support for Metal (GPU) and Accelerate (CPU) for optimal performance on Apple Silicon.
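For readers unfamiliar with L2 normalization, the step can be sketched in plain Swift as below. This illustrates the general technique only and is not SwiftLlama's internal code; `l2Normalized` is a hypothetical name.

```swift
// Return a copy of `vector` scaled to unit L2 norm.
// A zero vector is returned unchanged to avoid dividing by zero.
func l2Normalized(_ vector: [Float]) -> [Float] {
    let norm = vector.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    guard norm > 0 else { return vector }
    return vector.map { $0 / norm }
}
```

Once every vector has unit length, the cosine similarity of two embeddings is simply their dot product, which is why the similarity examples in this document need no extra division.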
To run the embedding tests, you need to provide a path to a valid embedding model via an environment variable.
export EMBEDDING_MODEL_PATH="/path/to/nomic-embed-text-v1.5.Q8_0.gguf"
swift test

- Build: swift build
- Test Tool: swift run test-embedding /path/to/model.gguf
- Checklist:
  - ✅ Unit tests pass (testEmbeddingNormalization, testSimilarTextSimilarity).
  - ✅ Manual testing with the test-embedding tool confirms reasonable cosine similarity values.
  - ✅ Memory usage is stable during batch processing.
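A test or tool that depends on EMBEDDING_MODEL_PATH can resolve it defensively and skip when no model is available. The helper below is a sketch, not code from the repository: `embeddingModelPath` is a hypothetical name, and its injectable parameters exist only to make it easy to exercise without a real model file.

```swift
import Foundation

// Hypothetical helper: resolve the embedding model path from the environment.
// Returns nil when the variable is unset or empty, or when no file exists at
// that path, so callers can skip embedding tests instead of failing.
func embeddingModelPath(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    fileExists: (String) -> Bool = { FileManager.default.fileExists(atPath: $0) }
) -> String? {
    guard let path = environment["EMBEDDING_MODEL_PATH"],
          !path.isEmpty,
          fileExists(path) else {
        return nil
    }
    return path
}
```

In an XCTest case, a nil result could be turned into a skipped test with `XCTSkip` rather than a failure.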
MIT