This project is under active development and is not yet ready for production use.
pg_ducklake brings a native datalake experience into PostgreSQL, powered by DuckLake (a DuckDB lakehouse format with SQL catalog metadata and open Parquet data files).
- Managed DuckLake tables: Create/Write/Query DuckLake tables in PostgreSQL via SQL (e.g., psql/JDBC).
- DuckDB compatibility: tables created by pg_ducklake are directly queryable from DuckDB clients.
- Cloud storage: store data files in AWS S3 (or GCS, R2) to decouple storage and compute for serverless analytics.
- Fast analytics: columnar storage + DuckDB vectorized execution, with hybrid queries over PostgreSQL heap tables supported.
-- Or use AWS S3 as data storage.
-- SET ducklake.default_table_path = 's3://my-bucket/prefix/';
CREATE TABLE my_table (
id INT,
name TEXT,
age INT
) USING ducklake;
INSERT INTO my_table VALUES (1, 'Alice', 25), (2, 'Bob', 30);
SELECT * FROM my_table;

Query the same table from a DuckDB client:

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:postgres:dbname=postgres host=localhost' AS my_ducklake (METADATA_SCHEMA 'ducklake');
SELECT * FROM my_ducklake.public.my_table;

For cloud storage (AWS S3 or Azure Blob Storage), see the Secrets Management guide.
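As a hedged sketch of what that setup can look like on the DuckDB side (the secret name and all credential values below are placeholders, and the available parameters depend on your DuckDB version), an S3 secret can be registered before attaching:

```sql
-- Hypothetical example: register S3 credentials in DuckDB so data files
-- under s3:// paths can be read. All values are placeholders.
CREATE SECRET my_s3_secret (
    TYPE s3,
    KEY_ID 'AKIA-PLACEHOLDER',
    SECRET 'placeholder-secret',
    REGION 'us-east-1'
);
```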
Run PostgreSQL with pg_ducklake pre-installed in a docker container:
docker run -d -e POSTGRES_PASSWORD=duckdb --name pgducklake pgducklake/pgducklake:18-main
docker exec -it pgducklake psql

Requirements:
- PostgreSQL: 14, 15, 16, 17, 18
- Operating Systems: Ubuntu 22.04-24.04, macOS
git clone https://github.com/relytcloud/pg_ducklake
cd pg_ducklake
# (Optional) install pg_duckdb
# make install-pg_duckdb
make install

See the compilation guide for detailed instructions.
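After installation, the extension can presumably be enabled in the target database before creating DuckLake tables (a sketch; the assumption here is that the extension name matches the project name):

```sql
-- Assumed post-install step: enable the extension in the current database.
CREATE EXTENSION IF NOT EXISTS pg_ducklake;
```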
This pattern performs a one-time ETL copy from row-store (PostgreSQL heap) tables to DuckLake (column-store) tables for fast analytics, while OLTP continues to use the original heap tables.
-- Create a PostgreSQL row-store (heap) table.
CREATE TABLE row_store_table AS
SELECT i AS id, 'hello pg_ducklake' AS msg
FROM generate_series(1, 10000) AS i;
-- Create a DuckLake column-store table via ETL.
CREATE TABLE col_store_table USING ducklake AS
SELECT *
FROM row_store_table;
-- Run analytics against the converted table.
SELECT max(id) FROM col_store_table;

External datasets (e.g., CSV/Parquet) can be ingested with DuckDB readers and materialized as tables for analytics.
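Because DuckLake tables live alongside regular heap tables in the same database, the hybrid-query support mentioned above can be sketched as a join across both stores, reusing the tables from this example:

```sql
-- Sketch of a hybrid query: join the PostgreSQL heap table with the
-- DuckLake column-store table created above.
SELECT r.id, r.msg
FROM row_store_table AS r
JOIN col_store_table AS c ON c.id = r.id
WHERE r.id <= 5;
```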
CREATE TABLE titanic USING ducklake AS
SELECT * FROM read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv');
SELECT "Pclass", "Sex", COUNT(*), AVG("Survived") AS survival_rate
FROM titanic
GROUP BY "Pclass", "Sex";- CREATE / CREATE_TABLE_AS for DuckLake tables
- INSERT / SELECT / DELETE / UPDATE for DuckLake tables
- Online schema evolution (ADD COLUMN / DROP COLUMN / type promotion)
- Time-travel queries
- Partitioned tables
- Read-only pg_ducklake tables referencing shared DuckLake datasets (e.g., frozen DuckLake)
- Table maintenance (e.g., compaction / GC) via PostgreSQL (e.g., VACUUM or UDFs) [^]
- HTAP support for incremental row-store → column-store conversion (PostgreSQL heap → DuckLake)
- Complex types
[^]: Table maintenance can be carried out by standalone DuckDB clients (preferable, since it is serverless and avoids burdening the PostgreSQL server); pg_ducklake still plans to expose these operations for ease of use.
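A minimal sketch of the schema-evolution and time-travel features listed above, assuming standard PostgreSQL ALTER TABLE syntax applies to DuckLake tables; the time-travel query uses DuckDB's AT (VERSION => ...) clause from an attached DuckDB client, since pg_ducklake's own time-travel syntax is not shown here:

```sql
-- Schema evolution on a DuckLake table (assumed to follow standard DDL).
ALTER TABLE my_table ADD COLUMN email TEXT;
ALTER TABLE my_table DROP COLUMN email;

-- Time travel from an attached DuckDB client (DuckDB AT clause; the
-- version number is illustrative).
SELECT * FROM my_ducklake.public.my_table AT (VERSION => 1);
```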
Roadmap:

- Native inlined (heap) table for small writes
- Better transaction concurrency model (based on PostgreSQL XID)
- Faster metadata operations via PostgreSQL native functions (e.g., SPI)
- Access control behavior for DuckLake tables [^]
[^]: DuckLake tables are exposed via PostgreSQL table access methods (AM), so PostgreSQL table/column privileges may already apply; the current behavior and gaps will be reviewed and documented. DuckLake itself relies on its metadata service for ACL management.
We welcome contributions! Please see:
- Issues for bug reports