SQLLM

A comparative demonstration of AI-powered tools for databases, including vector search, natural language to SQL conversion, and RAG question-answering.

Overview

This project showcases three key AI capabilities for modern databases:

  1. Vector Search - Semantic search across documents using embeddings
  2. Text-to-SQL - Convert natural language questions into SQL queries
  3. RAG Q&A - Retrieval-Augmented Generation for intelligent question answering

Features

  • Multi-backend vector search comparison (ChromaDB, SQLite-vec, PostgreSQL pgvector, MongoDB)
  • Local LLM integration with Ollama for SQL generation
  • Real-time benchmarking and performance metrics
  • Interactive Streamlit interface

Quick Start

Prerequisites

  • Python 3.8+
  • 2GB RAM minimum
  • Optional: Docker (for PostgreSQL/MongoDB backends)
  • Optional: Ollama (for local LLM features)

Installation

# Clone the repository
git clone https://github.com/yourusername/ai-database-tools-demo.git
cd ai-database-tools-demo

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py

The app will open at http://localhost:8501

With Docker (Optional)

For PostgreSQL and MongoDB backends:

docker-compose up -d

With Ollama (Optional)

For Text-to-SQL and RAG features:

# Install Ollama from https://ollama.ai
ollama pull llama3.2:1b

Project Structure

.
├── app.py                          # Main Streamlit application
├── requirements.txt                # Python dependencies
├── docker-compose.yml              # PostgreSQL & MongoDB setup
├── src/
│   ├── vector_search/              # Vector search implementations
│   │   ├── vector_search_comparative.py
│   │   ├── sqlite_backend.py
│   │   ├── postgres_backend.py
│   │   └── mongodb_backend.py
│   ├── text_to_sql/                # Natural language to SQL
│   │   └── text_to_sql_enhanced.py
│   ├── rag/                        # RAG Q&A system
│   │   └── rag_enhanced.py
│   └── utils/                      # Utilities and benchmarks
│       └── benchmarks.py
└── data/                           # Data storage

Modules

Vector Search

Compare semantic search across different vector database backends:

  • ChromaDB - Lightweight vector database
  • SQLite-vec - SQLite with vector extension
  • PostgreSQL pgvector - PostgreSQL extension (Docker required)
  • MongoDB - Atlas vector search (Docker required)

Features:

  • Side-by-side performance comparison
  • Real-time indexing and search benchmarks
  • Support for 384-dimensional embeddings (all-MiniLM-L6-v2)
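All four backends rank documents by vector similarity. As a minimal illustrative sketch (not the project's code), here is the cosine-similarity ranking they perform, using toy 4-dimensional vectors in place of the 384-dimensional all-MiniLM-L6-v2 embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    # Rank stored (document, vector) pairs by similarity to the query vector.
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy 4-d "embeddings"; the real backends store 384-d model outputs.
index = [
    ("vector databases store embeddings", [0.9, 0.1, 0.0, 0.1]),
    ("SQL joins relate tables",           [0.1, 0.9, 0.2, 0.0]),
    ("pgvector adds vector search",       [0.8, 0.2, 0.1, 0.1]),
]
results = search([1.0, 0.0, 0.0, 0.0], index)
print(results[0][0])  # most similar document
```

A real backend replaces the linear scan in `search` with an approximate index, but the scoring function is the same.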

Text-to-SQL

Convert natural language questions into executable SQL:

  • Powered by Ollama (local LLM)
  • Fallback to predefined queries if Ollama is unavailable
  • SQLite demo database with clients, products, orders
  • Real-time SQL generation and execution
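The prompt-plus-fallback pattern described above can be sketched as follows. This is illustrative only: names like `build_prompt` and `FALLBACK_QUERIES` are hypothetical, not the project's actual API.

```python
# Canned queries used when no local LLM is reachable (illustrative).
FALLBACK_QUERIES = {
    "top 10 customers by revenue": (
        "SELECT c.name, SUM(o.total) AS revenue "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY revenue DESC LIMIT 10"
    ),
}

def build_prompt(question: str, schema: str) -> str:
    # The LLM sees the schema plus the question and must answer with SQL only.
    return (
        f"Given this SQLite schema:\n{schema}\n"
        f"Write a single SQL query answering: {question}\n"
        "Return only the SQL."
    )

def text_to_sql(question: str, schema: str, llm=None) -> str:
    # Use the local LLM when available, otherwise fall back to a canned query.
    if llm is not None:
        return llm(build_prompt(question, schema))
    return FALLBACK_QUERIES.get(question.lower(), "SELECT 1")

print(text_to_sql("Top 10 customers by revenue", "-- schema omitted"))
```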

RAG Q&A

Retrieval-Augmented Generation for intelligent answers:

  • Document indexing with embeddings
  • Semantic search for relevant context
  • LLM-powered answer generation (with Ollama)
  • Fallback to direct passage display
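The retrieve-then-generate flow above can be sketched in a few lines. The names here are illustrative, and the toy relevance score (word overlap) stands in for the embedding-based search the app actually uses:

```python
def retrieve(question, passages):
    # Toy relevance score: count of shared lowercase words.
    # The real system ranks passages by embedding similarity instead.
    q_words = set(question.lower().replace("?", "").split())
    return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

def answer(question, passages, llm=None):
    context = retrieve(question, passages)
    if llm is not None:
        return llm(f"Answer using only this context:\n{context}\n\nQ: {question}")
    return f"Closest passage: {context}"  # fallback: direct passage display

docs = [
    "install pgvector with CREATE EXTENSION vector in PostgreSQL",
    "ChromaDB persists collections on local disk",
]
print(answer("How to install pgvector?", docs))
```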

Configuration

Environment Variables

Create a .env file (optional):

OLLAMA_HOST=http://localhost:11434
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=demo_db
POSTGRES_USER=demo_user
POSTGRES_PASSWORD=demo_pass
MONGODB_URI=mongodb://localhost:27017
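Since the .env file is optional, each setting should fall back to the defaults shown above when the variable is unset. A minimal sketch (a loader such as python-dotenv would populate os.environ from .env first):

```python
import os

# Read optional settings with the documented defaults, so the app
# still runs when no .env file is present.
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
POSTGRES_HOST = os.environ.get("POSTGRES_HOST", "localhost")
POSTGRES_PORT = int(os.environ.get("POSTGRES_PORT", "5432"))
MONGODB_URI = os.environ.get("MONGODB_URI", "mongodb://localhost:27017")

print(OLLAMA_HOST, POSTGRES_HOST, POSTGRES_PORT, MONGODB_URI)
```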

Usage Examples

Vector Search

# Index documents
from src.vector_search import demo_vector_search
demo_vector_search()

# Documents are automatically converted to embeddings
# Search with: "What is a vector database?"

Text-to-SQL

# Generate SQL from natural language
question = "Top 10 customers by revenue"
# Returns: SELECT c.name, SUM(o.total) as revenue FROM customers c...

RAG Q&A

# Ask questions about indexed documents
question = "How to install pgvector?"
# Returns contextualized answer with sources

Performance

Approximate benchmarks on standard hardware:

Backend               Indexing (8 docs)   Search Latency   QPS
ChromaDB              0.2s                10-30ms          ~3000
SQLite-vec            <0.1s               5-15ms           ~1000
PostgreSQL pgvector   0.3s                5-15ms           ~5000
MongoDB               0.4s                10-30ms          ~8000
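To reproduce numbers like these on your own hardware, a simple timing harness is enough. The sketch below times an arbitrary stand-in workload; substitute a real backend's search call for the lambda:

```python
import time

def bench(fn, n=100):
    # Time n calls and report mean latency (ms) and throughput (QPS).
    start = time.perf_counter()
    for _ in range(n):
        fn()
    elapsed = time.perf_counter() - start
    return (elapsed / n) * 1000, n / elapsed

# Stand-in workload; replace with e.g. lambda: backend.search(query_vec).
latency_ms, qps = bench(lambda: sum(i * i for i in range(1000)))
print(f"{latency_ms:.3f} ms/query, {qps:.0f} QPS")
```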

Technologies Used

  • Streamlit - Web interface
  • ChromaDB - Vector database
  • Sentence Transformers - Text embeddings
  • Ollama - Local LLM
  • PostgreSQL + pgvector - Vector search extension
  • MongoDB - Document database with vector search
  • SQLite - Embedded database
  • Pandas - Data manipulation

Troubleshooting

ChromaDB warnings

The app mocks onnxruntime to avoid DLL errors. This is normal and doesn't affect functionality.

Ollama not detected

Make sure Ollama is running: ollama serve

Add to PATH if needed (Windows): C:\Users\YourName\AppData\Local\Programs\Ollama

SQLite threading errors

All SQLite connections use check_same_thread=False for Streamlit compatibility.
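Streamlit reruns the script on each interaction, possibly from a different thread, which is why the flag is needed. A minimal sketch of such a connection (in-memory database and table names are illustrative):

```python
import sqlite3

# check_same_thread=False lets the connection be used from Streamlit's
# rerun threads; guard concurrent writes with a lock in real code.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO clients (name) VALUES (?)", ("Ada",))
rows = conn.execute("SELECT name FROM clients").fetchall()
print(rows)  # [('Ada',)]
conn.close()
```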

Docker services not connecting

# Check Docker status
docker-compose ps

# Restart services
docker-compose restart

# View logs
docker-compose logs -f

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - feel free to use this project for learning and development.

Acknowledgments

  • Built with Streamlit for rapid prototyping
  • Embeddings powered by Sentence Transformers
  • Local LLM capabilities via Ollama
  • Vector search backends: ChromaDB, pgvector, MongoDB Atlas

Contact

For questions or feedback, please open an issue on GitHub.
