This repository manages the InfoCompanies project's data model and database operations. It provides a complete workflow for company data using a Dockerized PostgreSQL database and modern schema management with Prisma.
- Dockerized PostgreSQL: Easy local setup with persistent volumes and PgAdmin UI.
- Data Enrichment: Python scripts for scraping and loading company data.
- Modern Schema Management: Prisma ORM with type-safe database access and robust migrations.
- Automated Data Loading: Bash scripts to orchestrate pulling, unzipping, and importing CSVs into the database.
- Backup & Restore: Tools for SQL/CSV backup and restore, including gzip support.
- Autocomplete Support: Extraction and indexing of unique values for fast autocomplete APIs.
- Database GUI: Built-in Prisma Studio for visual database exploration and management.
- Type Safety: Auto-generated, fully typed database client for multiple languages.
- CI/CD: GitHub Actions for linting, formatting, and build validation.
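The autocomplete feature listed above amounts to collecting the distinct values of a column and making them searchable by prefix. A minimal sketch of that idea in plain Python, not the repository's implementation: the `city` column name, the sample rows, and both helper functions are assumptions made for illustration.

```python
import bisect
import csv
import io


def unique_values(csv_text: str, column: str) -> list[str]:
    """Return the sorted, de-duplicated, non-empty values of one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = set()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            values.add(value)
    # Sort case-insensitively so prefix lookups can use binary search.
    return sorted(values, key=str.casefold)


def suggest(index: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` entries of a casefold-sorted index that start with prefix."""
    keys = [value.casefold() for value in index]
    needle = prefix.casefold()
    start = bisect.bisect_left(keys, needle)
    matches = []
    # Entries sharing a prefix are contiguous in a sorted list.
    for value in index[start:]:
        if len(matches) == limit or not value.casefold().startswith(needle):
            break
        matches.append(value)
    return matches


# Hypothetical sample data; the last row has an empty city and is skipped.
SAMPLE = "name,city\nAcme,Paris\nBolt,Lyon\nCore,Paris\nDune,Lille\nEcho,\n"
CITIES = unique_values(SAMPLE, "city")
```

With this sample, `suggest(CITIES, "l")` returns the two cities starting with "L". A real index would more likely live in the database, but the lookup logic is the same.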
git clone <repo-url>
cd InfoCompanies-Data-Model
Copy and edit .env from template.env:
cp template.env .env
See schema/README.md for detailed Prisma setup.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.in
docker compose up -d
cd schema
prisma generate
prisma db push # Push schema to database
./db.sh
This will:
- Start Docker containers
- Set up database schema with Prisma
- Load CSVs from the ETL
- Shut down containers
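The four steps above can be sketched as an ordered command plan. This is an illustration only, not the contents of db.sh: the `plan_pipeline` and `render` helpers are hypothetical, `docker compose down` is an assumed shutdown command, and the assumption that db.sh calls scripts/load-csv-to-database.sh is unverified. The sketch builds the commands without running them.

```python
import shlex


def plan_pipeline(schema_dir: str = "schema") -> list[tuple[str, list[str]]]:
    """Return (working directory, command) pairs in the order db.sh is described."""
    return [
        (".", ["docker", "compose", "up", "-d"]),      # start Docker containers
        (schema_dir, ["prisma", "db", "push"]),        # set up the schema with Prisma
        (".", ["scripts/load-csv-to-database.sh"]),    # load CSVs (assumed entry point)
        (".", ["docker", "compose", "down"]),          # shut down (assumed command)
    ]


def render(plan: list[tuple[str, list[str]]]) -> list[str]:
    """Format each step as a shell line for review before running anything."""
    return [f"(cd {cwd} && {shlex.join(cmd)})" for cwd, cmd in plan]


PLAN = plan_pipeline()
LINES = render(PLAN)
```

To execute such a plan rather than print it, each pair could be passed to `subprocess.run(cmd, cwd=cwd, check=True)` so that a failing step stops the run.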
cd schema
pnpm exec prisma studio
- Prisma Schema: Modern database schema defined in schema/prisma/schema.prisma
- Model Documentation: Individual model references in schema/prisma/models/
- Type-Safe Client: Auto-generated Prisma Client for database operations
- Migration System: Robust schema versioning and migration management
- Company: Comprehensive business data with financial information (2018-2023)
- Leader: Company leadership and management information
- Autocomplete Models: City, Industry Sector, Legal Form, Region reference data
- User Management: User quotas and company interaction tracking
- Configuration: System settings and configuration data
- Web scraping and data enrichment in parsing/parsing.py and parsing/parsing_request.py
- Loads and updates company info from Google and other sources
- Database Management: Prisma CLI commands for schema and data management
- Backup/Restore: scripts/backup.sh
- CSV Transfer: scripts/util.sh
- Data Loading: scripts/load-csv-to-database.sh
- Database GUI: Prisma Studio for visual data exploration and editing
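The gzip support mentioned for backups comes down to streaming a dump file through a compressor and back. A minimal sketch using Python's standard library: the file names and the `compress_dump` and `restore_dump` helpers are hypothetical and do not mirror scripts/backup.sh.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def compress_dump(sql_path: Path) -> Path:
    """Write '<name>.gz' next to the dump and return the archive path."""
    gz_path = sql_path.with_name(sql_path.name + ".gz")
    with open(sql_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks instead of loading the whole dump
    return gz_path


def restore_dump(gz_path: Path, out_path: Path) -> Path:
    """Decompress an archive produced by compress_dump into out_path."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


# Round-trip demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "backup.sql"
    dump.write_text("CREATE TABLE company (id INT);\n")
    archive = compress_dump(dump)
    restored = restore_dump(archive, Path(tmp) / "restored.sql")
    ROUND_TRIP_OK = restored.read_text() == dump.read_text()
    ARCHIVE_NAME = archive.name
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters once dumps grow beyond a few hundred megabytes.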
- Prisma Studio: Visual database browser and editor (prisma studio)
- Type Generation: Auto-generated type-safe database client
- Schema Validation: Built-in schema validation and error checking
- Migration Management: Version-controlled database schema evolution
- Linting, formatting, and build checks in .github/workflows/action.yml
- README.md: Main usage and features
- schema/README.md: Prisma schema management and migration guide
- schema/prisma/models/README.md: Database models overview
- docs/AUTOCOMPLETE.md: How to add autocomplete support
# Start database services
docker compose up -d
# Generate Prisma client (after schema changes)
cd schema && pnpm exec prisma generate
# Push schema to database (development)
cd schema && pnpm exec prisma db push
# Create and apply a versioned migration (run in development; apply in production with prisma migrate deploy)
cd schema && pnpm exec prisma migrate dev --name "your_migration_name"
# Open database GUI
cd schema && pnpm exec prisma studio
# Load data
./db.sh
- All scripts assume a Unix-like environment and require Docker.
- Data files (.csv, .dump, etc.) are git-ignored by default.
- For troubleshooting, check logs in the output pane or use docker logs.
- Prisma Client: Generated client is located in schema/generated/prisma/
- Environment: Ensure DATABASE_URL is properly configured in your .env file
- Development: Use pnpm exec prisma db push for quick schema changes, pnpm exec prisma migrate dev for production-ready migrations
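A quick way to catch a misconfigured DATABASE_URL before running Prisma is to parse it and check its parts. The sketch below is a hypothetical helper, not part of this repository. It assumes a PostgreSQL URL of the form postgresql://user:password@host:port/database, and the example database name is made up.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> list[str]:
    """Return a list of problems found in a PostgreSQL connection URL."""
    problems = []
    parts = urlsplit(url)
    # Assumption: only these two scheme spellings are expected for PostgreSQL.
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username:
        problems.append("missing user")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    return problems


# Hypothetical value; a real one would come from the .env file.
EXAMPLE_URL = "postgresql://user:pass@localhost:5432/infocompanies"
EXAMPLE_PROBLEMS = check_database_url(EXAMPLE_URL)
```

In practice the value could be read with `os.environ.get("DATABASE_URL", "")` after the .env file is loaded; an empty result list means the URL has the expected shape, not that the database is reachable.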