JacobMcKenzieSmarty/tfidf

TF-IDF Search Engine in Go

A simple and fast TF-IDF-based text search engine written in Go. It supports tokenization, log-scaled term frequency and inverse document frequency weighting, query vector construction, and cosine similarity ranking.


🛝 Based on the Presentation


🚀 Features

  • Tokenizes and indexes a set of short documents
  • Computes smoothed log TF-IDF vectors
  • Supports vectorized cosine similarity for ranking
  • Returns top-k most relevant documents for a query
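The features above can be sketched in a few dozen lines of Go. This is a minimal illustration, not the repository's actual `pipeline` code: the function names (`tokenize`, `computeIDF`, `vectorize`, `cosine`) are hypothetical, and the weighting is one common smoothed variant, assumed here to be `(1 + ln(tf)) * (ln((1 + N) / (1 + df)) + 1)`. The project may use a different formula, so the scores this sketch prints need not match the example output in this README.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// tokenize lowercases the text and splits it on any rune that is
// neither a letter nor a digit. (Hypothetical helper.)
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// computeIDF returns a smoothed inverse document frequency per term:
// ln((1 + N) / (1 + df)) + 1. This variant is an assumption.
func computeIDF(docs [][]string) map[string]float64 {
	df := make(map[string]int)
	for _, tokens := range docs {
		seen := make(map[string]bool)
		for _, t := range tokens {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	n := float64(len(docs))
	weights := make(map[string]float64, len(df))
	for t, c := range df {
		weights[t] = math.Log((1+n)/(1+float64(c))) + 1
	}
	return weights
}

// vectorize builds a sparse vector with log-scaled term frequency,
// (1 + ln(tf)) * idf. Terms missing from the index are skipped.
func vectorize(tokens []string, weights map[string]float64) map[string]float64 {
	tf := make(map[string]int)
	for _, t := range tokens {
		tf[t]++
	}
	vec := make(map[string]float64, len(tf))
	for t, c := range tf {
		if w, ok := weights[t]; ok {
			vec[t] = (1 + math.Log(float64(c))) * w
		}
	}
	return vec
}

// cosine returns the cosine similarity of two sparse vectors,
// or 0 when either vector has zero length.
func cosine(a, b map[string]float64) float64 {
	var dot, normA, normB float64
	for t, x := range a {
		dot += x * b[t]
		normA += x * x
	}
	for _, y := range b {
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	corpus := []string{"apple orange banana", "banana apple", "computer science and data"}
	tokenized := make([][]string, len(corpus))
	for i, doc := range corpus {
		tokenized[i] = tokenize(doc)
	}
	weights := computeIDF(tokenized)
	query := vectorize(tokenize("banana apple"), weights)
	for i, tokens := range tokenized {
		fmt.Printf("Doc %d: %.4f\n", i, cosine(query, vectorize(tokens, weights)))
	}
}
```

Sparse `map[string]float64` vectors keep the sketch short; a dense slice indexed by vocabulary position would be the usual choice once the vocabulary is fixed.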

🧱 Project Structure

tfidf/
├── go.mod              // Module definition
├── main.go             // Entry point
├── model/              // Shared types (Document, Vector, Score)
├── pipeline/           // Tokenizer, TF-IDF logic, search engine
└── data/               // Used to extract documents from an example corpus

🛠️ Getting Started

1. Clone the repo

git clone https://github.com/JacobMcKenzieSmarty/tfidf.git
cd tfidf

2. Run the project

go run main.go

🔍 Example

docs := []model.Document{
    {0, "apple orange banana"},
    {1, "banana apple"},
    {2, "computer science and data"},
}
query := "banana apple"

Output:

Rank 1: Doc 1 (score: 0.9765)
Rank 2: Doc 0 (score: 0.6123)
Rank 3: Doc 2 (score: 0.0000)
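The ranking step behind this output can be sketched as a sort over scored documents followed by a cut at k. The `Score` type and its field names below are hypothetical (the real type lives in `model/` and may look different), and the input values are simply copied from the example output above rather than computed.

```go
package main

import (
	"fmt"
	"sort"
)

// Score pairs a document ID with its similarity to the query.
// Hypothetical shape; the repository's model.Score may differ.
type Score struct {
	DocID int
	Value float64
}

// topK returns the k highest-scoring entries in descending order.
// It sorts a copy, so the caller's slice is left untouched.
func topK(scores []Score, k int) []Score {
	ranked := make([]Score, len(scores))
	copy(ranked, scores)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Value > ranked[j].Value
	})
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	// Scores taken from the example output, in document order.
	scores := []Score{{0, 0.6123}, {1, 0.9765}, {2, 0.0}}
	for i, s := range topK(scores, 3) {
		fmt.Printf("Rank %d: Doc %d (score: %.4f)\n", i+1, s.DocID, s.Value)
	}
}
```

A stable sort keeps tied documents in their original order, which makes the ranking deterministic for equal scores.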

📦 Dependencies

No external libraries — pure Go!


📜 License

MIT License — feel free to use, modify, and contribute!


🤝 Contributing

PRs welcome! Open issues or feature requests freely.

About

A quick demo of using TF-IDF for information retrieval.
