-
Notifications
You must be signed in to change notification settings - Fork 0
SQL Database Support
Complete guide to querying databases through API Dock using DuckDB.
API Dock supports SQL queries against Parquet files and other data sources using DuckDB. Define databases in your configuration and query them through REST endpoints.
Database configurations are stored in config/databases/ directory. Each database defines:
- tables: Mapping of table names to file paths (supports S3, GCS, HTTPS, local paths)
- queries: Named SQL queries for reuse
- routes: REST endpoints mapped to SQL queries
# databases/db_example.yaml
name: user_database
description: User management database
tables:
users: s3://my-bucket/data/users.parquet
permissions: https://example.com/data/permissions.parquet
posts: databases/posts.parquet
routes:
- route: users
sql: SELECT [[users]].* FROM [[users]]
- route: users/{{user_id}}
sql: SELECT [[users]].* FROM [[users]] WHERE user_id = {{user_id}}
- route: users/{{user_id}}/permissions
sql: |
SELECT [[permissions]].*
FROM [[permissions]]
WHERE [[permissions]].user_id = {{user_id}}API Dock supports multiple storage backends:
tables:
users: s3://my-bucket/data/users.parquet
logs: s3://logs-bucket/app/*.parquet # Glob patterns supportedtables:
analytics: gcs://my-bucket/analytics/data.parquettables:
public_data: https://data.example.com/dataset.parquettables:
local_users: databases/users.parquet
dev_data: /absolute/path/to/data.parquetUse double square brackets to reference tables defined in the tables section:
SELECT [[users]].* FROM [[users]]This automatically expands to:
SELECT users.* FROM 's3://my-bucket/users.parquet' AS usersUse double curly braces for route parameters:
route: users/{{user_id}}
sql: SELECT [[users]].* FROM [[users]] WHERE user_id = {{user_id}}When accessing /db_example/users/123, {{user_id}} is replaced with '123' (SQL-escaped).
Reference named queries from the queries section:
queries:
get_permissions: |
SELECT p.*
FROM [[permissions]] p
WHERE p.user_id = {{user_id}}
routes:
- route: users/{{user_id}}/permissions
sql: "[[get_permissions]]"queries:
user_with_permissions: |
SELECT
u.*,
p.permissions
FROM [[users]] u
LEFT JOIN [[permissions]] p ON u.user_id = p.user_id
WHERE u.user_id = {{user_id}}
routes:
- route: users/{{user_id}}/full
sql: "[[user_with_permissions]]"routes:
- route: posts/{{post_id}}/comments/{{comment_id}}
sql: |
SELECT *
FROM [[comments]]
WHERE post_id = {{post_id}}
AND comment_id = {{comment_id}}routes:
- route: users/{{user_id}}/stats
sql: |
SELECT
COUNT(*) as post_count,
MAX(created_at) as last_post,
AVG(likes) as avg_likes
FROM [[posts]]
WHERE user_id = {{user_id}}queries:
popular_posts: |
SELECT
p.*,
u.name as author_name,
COUNT(c.comment_id) as comment_count
FROM [[posts]] p
JOIN [[users]] u ON p.user_id = u.user_id
LEFT JOIN [[comments]] c ON p.post_id = c.post_id
WHERE p.user_id = {{user_id}}
GROUP BY p.post_id, u.name
HAVING COUNT(c.comment_id) > 5
ORDER BY comment_count DESC
LIMIT 10
routes:
- route: users/{{user_id}}/popular-posts
sql: "[[popular_posts]]"name: blog_database
description: Blog system with users, posts, and comments
tables:
users: s3://blog-data/users.parquet
posts: s3://blog-data/posts.parquet
comments: s3://blog-data/comments.parquet
tags: databases/tags.parquet
queries:
# Reusable query components
user_posts: |
SELECT p.*
FROM [[posts]] p
WHERE p.user_id = {{user_id}}
post_with_author: |
SELECT
p.*,
u.name as author_name,
u.email as author_email
FROM [[posts]] p
JOIN [[users]] u ON p.user_id = u.user_id
WHERE p.post_id = {{post_id}}
routes:
# Users
- route: users
sql: SELECT * FROM [[users]] LIMIT 100
- route: users/{{user_id}}
sql: SELECT * FROM [[users]] WHERE user_id = {{user_id}}
- route: users/{{user_id}}/posts
sql: "[[user_posts]]"
# Posts
- route: posts
sql: SELECT * FROM [[posts]] ORDER BY created_at DESC LIMIT 50
- route: posts/{{post_id}}
sql: "[[post_with_author]]"
- route: posts/{{post_id}}/comments
sql: |
SELECT c.*, u.name as commenter_name
FROM [[comments]] c
JOIN [[users]] u ON c.user_id = u.user_id
WHERE c.post_id = {{post_id}}
ORDER BY c.created_at ASC
# Tags
- route: tags/{{tag_name}}/posts
sql: |
SELECT p.*, u.name as author_name
FROM [[posts]] p
JOIN [[users]] u ON p.user_id = u.user_id
WHERE p.tags LIKE '%' || {{tag_name}} || '%'Use the describe command to see SQL with expanded table references:
pixi run api-dock describe blog_databaseOutput shows:
- Table paths with full URLs
- Expanded SQL queries
- Parameter placeholders
# List all users
curl http://localhost:8000/blog_database/users
# Get specific user
curl http://localhost:8000/blog_database/users/123
# Get user's posts
curl http://localhost:8000/blog_database/users/123/posts
# Get post with comments
curl http://localhost:8000/blog_database/posts/456/comments- Use Parquet format - Columnar storage is efficient for queries
- Add WHERE clauses - Filter data as early as possible
- Use LIMIT - Restrict result sets for list queries
- Index data properly - Parquet row groups help with filtering
- Cache frequently accessed data - Consider local copies of hot data
API Dock uses DuckDB which supports:
- Full SQL standard - Complex queries, CTEs, window functions
- Parquet optimization - Predicate pushdown, column pruning
- Multiple formats - Parquet, CSV, JSON
- Cloud storage - S3, GCS with automatic credential handling
-
Glob patterns - Query multiple files:
s3://bucket/data/*.parquet
- SQL injection protection - Parameters are properly escaped
- Route restrictions - Control database access via restrictions
- Read-only - Database queries are SELECT only (no INSERT/UPDATE/DELETE)
- Routing and Restrictions - Route configuration
- Versioning - Versioned database configs
- Configuration - Main configuration setup
- Main README - Quick start guide