TransparentSF

A data analysis and visualization platform for San Francisco city data

Overview

TransparentSF is a web-based application that combined AI agents with public data to answer questions, build interactive visualizations and deep analysis of San Francisco public city data. It includes features for:

Automated analysis of public datasets
AI-powered insights generation
Interactive chat interface for data exploration
Anomaly detection with PostgreSQL storage
AI-powered newsletter narration with ElevenLabs text-to-speech

Technology Stack

Backend: Python
Vector Database: Qdrant
Database: PostgreSQL
APIs:
- OpenAI API for analysis
- ElevenLabs API for text-to-speech narration

Installation

Prerequisites:

Python3 with pip
Docker (for Qdrant)
PostgreSQL (for anomaly storage)
OpenAI API key

Steps:

Clone the repository:

git clone https://github.com/yourusername/transparentSF.git
cd transparentSF

Install Python dependencies:

cd ai
pip install -r requirements.txt
cd ..

Set up environment variables:
Create a .env file in the project root with the following variables:

OPENAI_API_KEY=your_openai_api_key

PERPLEXITY_API_KEY=your_perplexity_api_key # Optional: For automated newsletter generation
ELEVENLABS_API_KEY=your_elevenlabs_api_key # Optional: For newsletter narration
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM # Optional: Default is Rachel voice
DATAWRAPPER_API_KEY=your_datawrapper_api_key # Required: For map and chart generation

# PostgreSQL Connection Details (Defaults are usually sufficient if running locally)
PG_HOST=localhost 
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your_postgres_password # Set this if you configured a password
PG_DBNAME=transparentsf

Set up PostgreSQL:

# For macOS with Homebrew
brew install postgresql
brew services start postgresql

# For Ubuntu/Debian
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib

# For Windows
# Download and install from https://www.postgresql.org/download/windows/

Initialize the PostgreSQL database:
```
python ai/tools/init_postgres_db.py
```
This script will create:
- A database named "transparentsf" (if it doesn't exist)
- An "anomalies" table to store detected anomalies
You can customize the connection parameters as needed:
```
python ai/tools/init_postgres_db.py --host localhost --port 5432 --user postgres --password <pass> --dbname transparentsf
```

Usage

Start all services:

The easiest way to start all required services is to use the included script:
```
./start_services.sh
```
This script will:
- Start the main backend service
- Start Qdrant for vector search
- Ensure all services are ready before proceeding
Access the application:

After starting the services, you can access:
- Chat Interface: http://localhost:8000/
  - Use this for interactive data exploration and queries
- Backend Configuration: http://localhost:8000/backend
  - Use this for system configuration, data analysis setup, and administration tasks
  - From here you can run analysis tasks, configure metrics, and manage the system

Manual service startup (alternative):

If you prefer to start services individually:

# Start Qdrant
docker run -p 6333:6333 qdrant/qdrant

# Start the backend service
cd ai
python main.py

Run Initial Analysis:

To perform initial data analysis, visit the backend configuration at:
```
http://localhost:8000/backend
```
From the backend interface, you can:
- Configure data sources
- Run analysis on specific metrics
- Schedule automatic analysis
- View analysis results

Project Structure

/ai: Core analysis and processing scripts
- backend.py: Initial data analysis pipeline
- webChat.py: Interactive chat interface
- load_analysis_2_vec.py: Vector database loader
- /agents: AI agent system components
  - explainer_agent.py: Agent for explaining metric changes and anomalies
  - __init__.py: Agent package initialization
- /tools: Utility scripts and tools
  - anomaly_detection.py: Anomaly detection with PostgreSQL storage
  - init_postgres_db.py: Database initialization tool
  - view_anomalies.py: Tool for viewing anomalies in the database
  - gen_map_dw.py: Datawrapper map generation
  - genChart.py: Chart generation utilities
  - genAggregate.py: Data aggregation tools
- /output: Generated analysis results
  - /narration: Generated audio narrations of newsletters
- /logs: System and analysis logs
  - /evals: Agent evaluation logs
- /templates: Web interface templates
- /data: Data storage and configuration
  - /prompts: AI prompt templates
  - /datasets: Dataset storage

Agent System

The project uses a sophisticated agent system for data analysis and explanation:

Core Agents

Explainer Agent: Specialized in explaining metric changes and anomalies
- Handles metric analysis and visualization
- Generates natural language explanations
- Creates interactive charts and maps
- Manages metric metadata and categorization

Agent Features

Tool Integration:
- Data querying and analysis
- Chart and map generation
- Anomaly detection
- Metric management
- Vector database search
Context Management:
- Maintains conversation history
- Tracks analysis context
- Manages dataset state
Evaluation System:
- Logs agent interactions
- Tracks tool usage
- Monitors performance metrics

Agent Tools

Data Tools:
- Dataset loading and management
- Data aggregation and transformation
- Time series analysis
Visualization Tools:
- Interactive map generation
- Chart creation
- Data presentation
Analysis Tools:
- Anomaly detection
- Metric tracking
- Trend analysis

For more details on agent configuration and usage, see the Agent System Documentation.

Features

Newsletter Narration

TransparentSF includes an AI-powered newsletter narration feature that converts monthly reports into professional-quality audio using ElevenLabs' text-to-speech API. The process involves:

Audio Transformation: AI optimization of newsletter content for audio consumption
- Converts complex sentences into natural speech patterns
- Writes out numbers and percentages for clear pronunciation
- Removes visual elements (charts, links, emojis)
- Adds smooth transitions between sections
- Optimizes length for a 2-3 minute listening experience
Text-to-Speech: Generation of professional-quality audio narration using ElevenLabs API

To use the narration feature:

Generate a monthly newsletter as usual
Navigate to the Monthly Reports page
Find your desired report in the list
Click the "Generate Narration" button
Wait for processing (30-60 seconds)
The audio file will be saved in output/narration/ with the same naming convention as the report

For more details on customization options, voice settings, and troubleshooting, see the Newsletter Narration Documentation.

Interactive Maps and Charts

TransparentSF integrates with Datawrapper to create professional, interactive visualizations:

Map Types

Supervisor District Maps: Visualize data across San Francisco's 11 supervisor districts
Police District Maps: Show data distribution across SFPD districts
Point Maps: Display specific locations with customizable markers
Address Maps: Plot locations using addresses (automatic geocoding)
Intersection Maps: Highlight specific street intersections

Map Features

Interactive Elements:
- Hover tooltips with detailed information
- Zoom and pan controls
- Responsive design for all devices
- Custom color schemes and legends
Data Visualization:
- Support for both absolute values and percentage changes
- Custom color palettes for different data types
- Automatic scaling and normalization
- Series grouping for multiple data categories

Chart Types

Time Series: Track changes over time
Bar Charts: Compare values across categories
Pie Charts: Show proportional distributions
Line Charts: Display trends and patterns

Customization Options

Visual Styling:
- Custom color schemes
- Adjustable marker sizes
- Configurable tooltips
- Custom legends and labels
Data Presentation:
- Automatic data formatting
- Custom number formatting
- Configurable thresholds
- Series grouping and aggregation

Usage

Generate a visualization through the web interface
Choose the appropriate map or chart type
Configure visualization settings
Preview and adjust as needed
Embed in reports or export as needed

For more details on customization options and advanced features, see the Datawrapper Integration Documentation.

Contributing

Fork the repository
Create your feature branch:
```
git checkout -b feature/AmazingFeature
```
Commit your changes:
```
git commit -m 'Add some AmazingFeature'
```
Push to the branch:
```
git push origin feature/AmazingFeature
```
Open a Pull Request

License

This project is licensed under the ISC License. See the LICENSE file for details.

Acknowledgments

San Francisco's DataSF efforts and all the departments that publish data
OpenAI for AI capabilities

Need help? Feel free to open an issue for support.

Name		Name	Last commit message	Last commit date
Latest commit History 169 Commits
.vscode		.vscode
ai		ai
backup_history		backup_history
.DS_Store		.DS_Store
.cursorrules		.cursorrules
.gitignore		.gitignore
.replit		.replit
GROK_INTEGRATION_EXAMPLE.md		GROK_INTEGRATION_EXAMPLE.md
LICENSE		LICENSE
add_audio_file_column.sql		add_audio_file_column.sql
add_city_id_column.sql		add_city_id_column.sql
add_display_order_column.sql		add_display_order_column.sql
add_map_columns.sql		add_map_columns.sql
add_map_columns_improved.sql		add_map_columns_improved.sql
add_most_recent_data_date_column.sql		add_most_recent_data_date_column.sql
debug_stale_data.py		debug_stale_data.py
metrics.db		metrics.db
metrics_backup.sql		metrics_backup.sql
migrate_map_id_to_integer.sql		migrate_map_id_to_integer.sql
nullify_jail_bookings_map_query.sql		nullify_jail_bookings_map_query.sql
readme.md		readme.md
recreate_metrics.sql		recreate_metrics.sql
reference_symbol_config.json		reference_symbol_config.json
replit.nix		replit.nix
requirements.txt		requirements.txt
start_services.sh		start_services.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TransparentSF

Overview

Technology Stack

Installation

Usage

Project Structure

Agent System

Core Agents

Agent Features

Agent Tools

Features

Newsletter Narration

Interactive Maps and Charts

Map Types

Map Features

Chart Types

Customization Options

Usage

Contributing

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

robjective/transparentSF

Folders and files

Latest commit

History

Repository files navigation

TransparentSF

Overview

Technology Stack

Installation

Usage

Project Structure

Agent System

Core Agents

Agent Features

Agent Tools

Features

Newsletter Narration

Interactive Maps and Charts

Map Types

Map Features

Chart Types

Customization Options

Usage

Contributing

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages