pg_ducklake

PostgreSQL Extension for DuckLake

This project is under high development and is not yet ready for production use.

pg_ducklake brings a native datalake experience into PostgreSQL, powered by DuckLake (a DuckDB lakehouse format with SQL catalog metadata and open Parquet data files).

Key Features

Managed DuckLake tables: Create/Write/Query DuckLake tables in PostgreSQL via SQL (e.g., psql/JDBC).
DuckDB compatibility: tables created by pg_ducklake are directly queryable from DuckDB clients.
Cloud storage: store data files in AWS S3 (or GCS, R2) to decouple storage and compute for serverless analytics.
Fast analytics: columnar storage + DuckDB vectorized execution, with hybrid queries over PostgreSQL heap tables supported.

See it in action

Your first Data Lake in PostgreSQL

-- Or use AWS S3 as data storage.
-- SET ducklake.default_table_path = 's3://my-bucket/prefix/';

CREATE TABLE my_table (
    id INT,
    name TEXT,
    age INT
) USING ducklake;

INSERT INTO my_table VALUES (1, 'Alice', 25), (2, 'Bob', 30);

SELECT * FROM my_table;

Access your data with DuckDB

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:postgres:dbname=postgres host=localhost' AS my_ducklake (METADATA_SCHEMA 'ducklake');
SELECT * FROM my_ducklake.public.my_table;

For cloud storage (AWS S3 or Azure Blob Storage), see the Secrets Management guide.

Quick Start

Docker

Run PostgreSQL with pg_ducklake pre-installed in a docker container:

docker run -d -e POSTGRES_PASSWORD=duckdb --name pgducklake pgducklake/pgducklake:18-main

docker exec -it pgducklake psql

Compile from source

Requirements:

PostgreSQL: 14, 15, 16, 17, 18
Operating Systems: Ubuntu 22.04-24.04, macOS

git clone https://github.com/relytcloud/pg_ducklake
cd pg_ducklake
# (Optional) install pg_duckdb
# make install-pg_duckdb
make install

See compilation guide for detailed instructions.

Usecases

Convert a PostgreSQL heap table into a DuckLake table

This pattern performs a one-time ETL copy from row-store (PostgreSQL heap) tables to DuckLake (column-store) tables for fast analytics, while OLTP continues to use the original heap tables.

-- Create a PostgreSQL row-store (heap) table.
CREATE TABLE row_store_table AS
SELECT i AS id, 'hello pg_ducklake' AS msg
FROM generate_series(1, 10000) AS i;

-- Create a DuckLake column-store table via ETL.
CREATE TABLE col_store_table USING ducklake AS
SELECT *
FROM row_store_table;

-- Run analytics against the converted table.
SELECT max(id) FROM col_store_table;

Load an external dataset

External datasets (e.g., CSV/Parquet) can be ingested with DuckDB readers and materialized as tables for analytics.

CREATE TABLE titanic USING ducklake AS
SELECT * FROM read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv');

SELECT "Pclass", "Sex", COUNT(*), AVG("Survived") AS survival_rate
FROM titanic
GROUP BY "Pclass", "Sex";

Roadmap

Features

CREATE / CREATE_TABLE_AS for DuckLake tables
INSERT / SELECT / DELETE / UPDATE for DuckLake tables
Online schema evolution (ADD COLUMN / DROP COLUMN / type promotion)
Time-travel queries
Partitioned tables
Read-only pg_ducklake tables referencing shared DuckLake datasets (e.g., frozen DuckLake)
Table maintenance (e.g., compaction / GC) via PostgreSQL (e.g., VACUUM or UDFs) [^]
HTAP support for incremental row-store → column-store conversion (PostgreSQL heap → DuckLake)
Complex types

[^]: Table maintenance can be carried out by standalone DuckDB clients (preferable, since it is serverless and avoids burdening the PostgreSQL server); pg_ducklake still plans to expose these operations for ease of use.

Performance

Native inlined (heap) table for small writes
Better transaction concurrency model (based on PostgreSQL XID)
Faster metadata operations via PostgreSQL native functions (e.g., SPI)

Docs

Access control behavior for DuckLake tables [^]

[^]: DuckLake tables are exposed via PostgreSQL table access methods (AM), so PostgreSQL table/column privileges may already apply; the current behavior and gaps will be reviewed and documented. DuckLake itself relies on its metadata service for ACL management.

Contributing

We welcome contributions! Please see:

Issues for bug reports

Acknowledgments

This project is built with pg_duckdb and ducklake.

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
.agents		.agents
.claude/skills		.claude/skills
.github/workflows		.github/workflows
docker		docker
docs		docs
include		include
scripts		scripts
sql		sql
src		src
test		test
third_party		third_party
.dockerignore		.dockerignore
.gitignore		.gitignore
.gitmodules		.gitmodules
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
Makefile		Makefile
Makefile.global		Makefile.global
README.md		README.md
docker-bake.hcl		docker-bake.hcl
pg_ducklake.control		pg_ducklake.control

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pg_ducklake

Key Features

See it in action

Your first Data Lake in PostgreSQL

Access your data with DuckDB

Quick Start

Docker

Compile from source

Usecases

Convert a PostgreSQL heap table into a DuckLake table

Load an external dataset

Roadmap

Features

Performance

Docs

Contributing

Acknowledgments

About

Uh oh!

Contributors 2

Languages

Folders and files

Latest commit

History

Repository files navigation

pg_ducklake

Key Features

See it in action

Your first Data Lake in PostgreSQL

Access your data with DuckDB

Quick Start

Docker

Compile from source

Usecases

Convert a PostgreSQL heap table into a DuckLake table

Load an external dataset

Roadmap

Features

Performance

Docs

Contributing

Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors 2

Languages