Node.js CLI tool for local text classification using sentence embeddings.
- 🔄 Convert CSV to JSON embeddings using Xenova/all-MiniLM-L6-v2
- 🧠 Classify unlabelled text via pre-trained embeddings
- 📈 Optional evaluation of dataset performance
- 🗃️ Works with CSVs containing category and comment headers
Ideal for local NLP classification workflows.
git clone https://github.com/allemandi/embed-classify-cli.git
cd embed-classify-cli
yarn install
yarn start
Input CSV files must include:
- category: Label for training data
- comment: Text content to embed or classify
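The header requirement above can be checked before running either command. The following is a minimal sketch, not code from this repo: `missingHeaders` and `REQUIRED_HEADERS` are hypothetical names, and the naive comma split assumes the header row contains no quoted commas.

```javascript
// Hypothetical helper, not part of this repo: reports which required
// headers are absent from the first row of a CSV string.
const REQUIRED_HEADERS = ['category', 'comment'];

function missingHeaders(csvText) {
  // Look at the first line only; strip a UTF-8 BOM if one is present.
  const firstLine = csvText.replace(/^\uFEFF/, '').split(/\r?\n/, 1)[0];
  // Naive split: assumes header names contain no quoted commas.
  const headers = firstLine
    .split(',')
    .map((h) => h.trim().replace(/^"|"$/g, ''));
  return REQUIRED_HEADERS.filter((name) => !headers.includes(name));
}

// Made-up sample rows, for illustration only.
const validCsv = 'category,comment\nbilling,"I was charged twice"\n';
const invalidCsv = 'label,text\nbilling,hello\n';

console.log(missingHeaders(validCsv));   // []
console.log(missingHeaders(invalidCsv)); // [ 'category', 'comment' ]
```

An empty result means both columns are present. A real CSV parser would be a safer choice for files with quoted header fields.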
Generate embedding.json from labeled CSV:
node index.js csv-embedding -i ./data/training.csv
Use trained embeddings to classify new input:
node index.js embedding-classification -i ./data/unclassified.csv -c ./data/embedding.json -o ./data/predicted.csv
Check configurable flags in index.js for more options.
Tune classification behavior in embedding-classification.js with these params:
- --weightedVotes: Use averaged similarity scores
- --comparisonPercentage: % of top similar samples to compare (0–100)
- --maxSamplesToSearch: Limit how many samples are compared
- --similarityThresholdPercent: Minimum cosine similarity to include in comparison
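To make the interaction of these parameters concrete, here is a rough sketch of similarity-based voting. It is an illustration under assumptions, not the code in embedding-classification.js: the function names, the `{ category, embedding }` sample shape, the default values, and the order in which the limits are applied are all hypothetical.

```javascript
// Illustrative sketch only; the repo's actual implementation may differ.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function classify(queryEmbedding, samples, options = {}) {
  // Default values here are assumptions chosen for the example.
  const {
    weightedVotes = true,
    comparisonPercentage = 80,
    maxSamplesToSearch = 40,
    similarityThresholdPercent = 30,
  } = options;

  // Rank every labeled sample by similarity to the query, best first.
  const ranked = samples
    .map((s) => ({
      category: s.category,
      similarity: cosineSimilarity(queryEmbedding, s.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity);

  // Keep the top percentage, capped by the sample limit, then drop weak matches.
  const topCount = Math.min(
    Math.ceil((ranked.length * comparisonPercentage) / 100),
    maxSamplesToSearch
  );
  const candidates = ranked
    .slice(0, topCount)
    .filter((r) => r.similarity * 100 >= similarityThresholdPercent);

  // Tally vote counts and summed similarity per category.
  const tally = new Map();
  for (const { category, similarity } of candidates) {
    const entry = tally.get(category) || { count: 0, total: 0 };
    entry.count += 1;
    entry.total += similarity;
    tally.set(category, entry);
  }

  // Weighted: average similarity per category. Unweighted: plain vote count.
  let best = null;
  for (const [category, { count, total }] of tally) {
    const score = weightedVotes ? total / count : count;
    if (best === null || score > best.score) best = { category, score };
  }
  return best; // null when no sample passes the threshold
}

// Toy 2-D vectors standing in for real embeddings.
const toySamples = [
  { category: 'a', embedding: [1, 0] },
  { category: 'a', embedding: [0.9, 0.1] },
  { category: 'b', embedding: [0, 1] },
];
console.log(classify([1, 0.05], toySamples).category); // a
```

In this sketch, raising similarityThresholdPercent makes the classifier more likely to return no prediction, while lowering comparisonPercentage or maxSamplesToSearch narrows the vote to the closest neighbours.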
Check out these related projects that might interest you:
- Embed Classify Web: Sleek, modern web app for text classification using embeddings.
- @allemandi/embed-utils: Utilities for text classification using cosine similarity embeddings.
- Vector Knowledge Base: A minimalist command-line knowledge system with semantic memory capabilities using vector embeddings for information retrieval.
If you have ideas, improvements, or new features:
- Fork the project
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
If this project has helped you or saved you time, consider buying me a coffee to help fuel more ideas and improvements!
This project was developed with the help of AI tools (e.g., GitHub Copilot, Cursor, v0) for code suggestions, debugging, and optimizations.
MIT