Skip to content

SQL Database Support

Claude edited this page Oct 14, 2025 · 5 revisions

SQL Database Support

Complete guide to querying databases through API Dock using DuckDB.

Overview

API Dock supports SQL queries against Parquet files and other data sources using DuckDB. Define databases in your configuration and query them through REST endpoints.

Database Configuration

Database configurations are stored in config/databases/ directory. Each database defines:

  • tables: Mapping of table names to file paths (supports S3, GCS, HTTPS, local paths)
  • queries: Named SQL queries for reuse
  • routes: REST endpoints mapped to SQL queries

Basic Example

# databases/db_example.yaml
name: user_database
description: User management database

tables:
  users: s3://my-bucket/data/users.parquet
  permissions: https://example.com/data/permissions.parquet
  posts: databases/posts.parquet

routes:
  - route: users
    sql: SELECT [[users]].* FROM [[users]]

  - route: users/{{user_id}}
    sql: SELECT [[users]].* FROM [[users]] WHERE user_id = {{user_id}}

  - route: users/{{user_id}}/permissions
    sql: |
      SELECT [[permissions]].*
      FROM [[permissions]]
      WHERE [[permissions]].user_id = {{user_id}}

Cloud Storage Support

API Dock supports multiple storage backends:

Amazon S3

tables:
  users: s3://my-bucket/data/users.parquet
  logs: s3://logs-bucket/app/*.parquet  # Glob patterns supported

Google Cloud Storage

tables:
  analytics: gcs://my-bucket/analytics/data.parquet

HTTPS URLs

tables:
  public_data: https://data.example.com/dataset.parquet

Local Files

tables:
  local_users: databases/users.parquet
  dev_data: /absolute/path/to/data.parquet

SQL Syntax

Table References: [[table_name]]

Use double square brackets to reference tables defined in the tables section:

SELECT [[users]].* FROM [[users]]

This automatically expands to:

SELECT users.* FROM 's3://my-bucket/users.parquet' AS users

Path Parameters: {{param_name}}

Use double curly braces for route parameters:

route: users/{{user_id}}
sql: SELECT [[users]].* FROM [[users]] WHERE user_id = {{user_id}}

When accessing /db_example/users/123, {{user_id}} is replaced with '123' (SQL-escaped).

Named Queries: [[query_name]]

Reference named queries from the queries section:

queries:
  get_permissions: |
    SELECT p.*
    FROM [[permissions]] p
    WHERE p.user_id = {{user_id}}

routes:
  - route: users/{{user_id}}/permissions
    sql: "[[get_permissions]]"

Advanced Configuration

Named Queries with JOINs

queries:
  user_with_permissions: |
    SELECT
      u.*,
      p.permissions
    FROM [[users]] u
    LEFT JOIN [[permissions]] p ON u.user_id = p.user_id
    WHERE u.user_id = {{user_id}}

routes:
  - route: users/{{user_id}}/full
    sql: "[[user_with_permissions]]"

Multiple Parameters

routes:
  - route: posts/{{post_id}}/comments/{{comment_id}}
    sql: |
      SELECT *
      FROM [[comments]]
      WHERE post_id = {{post_id}}
        AND comment_id = {{comment_id}}

Aggregations

routes:
  - route: users/{{user_id}}/stats
    sql: |
      SELECT
        COUNT(*) as post_count,
        MAX(created_at) as last_post,
        AVG(likes) as avg_likes
      FROM [[posts]]
      WHERE user_id = {{user_id}}

Complex Queries

queries:
  popular_posts: |
    SELECT
      p.*,
      u.name as author_name,
      COUNT(c.comment_id) as comment_count
    FROM [[posts]] p
    JOIN [[users]] u ON p.user_id = u.user_id
    LEFT JOIN [[comments]] c ON p.post_id = c.post_id
    WHERE p.user_id = {{user_id}}
    GROUP BY p.post_id, u.name
    HAVING COUNT(c.comment_id) > 5
    ORDER BY comment_count DESC
    LIMIT 10

routes:
  - route: users/{{user_id}}/popular-posts
    sql: "[[popular_posts]]"

Complete Example

name: blog_database
description: Blog system with users, posts, and comments

tables:
  users: s3://blog-data/users.parquet
  posts: s3://blog-data/posts.parquet
  comments: s3://blog-data/comments.parquet
  tags: databases/tags.parquet

queries:
  # Reusable query components
  user_posts: |
    SELECT p.*
    FROM [[posts]] p
    WHERE p.user_id = {{user_id}}

  post_with_author: |
    SELECT
      p.*,
      u.name as author_name,
      u.email as author_email
    FROM [[posts]] p
    JOIN [[users]] u ON p.user_id = u.user_id
    WHERE p.post_id = {{post_id}}

routes:
  # Users
  - route: users
    sql: SELECT * FROM [[users]] LIMIT 100

  - route: users/{{user_id}}
    sql: SELECT * FROM [[users]] WHERE user_id = {{user_id}}

  - route: users/{{user_id}}/posts
    sql: "[[user_posts]]"

  # Posts
  - route: posts
    sql: SELECT * FROM [[posts]] ORDER BY created_at DESC LIMIT 50

  - route: posts/{{post_id}}
    sql: "[[post_with_author]]"

  - route: posts/{{post_id}}/comments
    sql: |
      SELECT c.*, u.name as commenter_name
      FROM [[comments]] c
      JOIN [[users]] u ON c.user_id = u.user_id
      WHERE c.post_id = {{post_id}}
      ORDER BY c.created_at ASC

  # Tags
  - route: tags/{{tag_name}}/posts
    sql: |
      SELECT p.*, u.name as author_name
      FROM [[posts]] p
      JOIN [[users]] u ON p.user_id = u.user_id
      WHERE p.tags LIKE '%' || {{tag_name}} || '%'

Viewing Expanded SQL

Use the describe command to see SQL with expanded table references:

pixi run api-dock describe blog_database

Output shows:

  • Table paths with full URLs
  • Expanded SQL queries
  • Parameter placeholders

Testing

# List all users
curl http://localhost:8000/blog_database/users

# Get specific user
curl http://localhost:8000/blog_database/users/123

# Get user's posts
curl http://localhost:8000/blog_database/users/123/posts

# Get post with comments
curl http://localhost:8000/blog_database/posts/456/comments

Performance Tips

  1. Use Parquet format - Columnar storage is efficient for queries
  2. Add WHERE clauses - Filter data as early as possible
  3. Use LIMIT - Restrict result sets for list queries
  4. Index data properly - Parquet row groups help with filtering
  5. Cache frequently accessed data - Consider local copies of hot data

DuckDB Features

API Dock uses DuckDB which supports:

  • Full SQL standard - Complex queries, CTEs, window functions
  • Parquet optimization - Predicate pushdown, column pruning
  • Multiple formats - Parquet, CSV, JSON
  • Cloud storage - S3, GCS with automatic credential handling
  • Glob patterns - Query multiple files: s3://bucket/data/*.parquet

Security

  • SQL injection protection - Parameters are properly escaped
  • Route restrictions - Control database access via restrictions
  • Read-only - Database queries are SELECT only (no INSERT/UPDATE/DELETE)

See Also