-
Notifications
You must be signed in to change notification settings - Fork 76
Description
Similar request to #204 but in reverse. I have looked for this in the already closed issues but did not find. Kindly redirect me if that is the case.
I was wondering if it is possible to read from a binary dump from PostgreSQL. Currently it is possible to connect to a Postgres instance directly, but when dumps are generated somewhere we cannot connect to, it is not possible to read data with DuckDB.
The reason behind this request is the following:
PostgreSQL's COPY ... TO ... WITH (FORMAT binary) produces compact .bin files. In some production environments, binary dumps are generated on database servers (e.g. via cron jobs, etc.) and shipped to others who don't have direct connection access. Today, the postgres_scanner extension can read binary data from a live Postgres wire connection and can write binary files via COPY ... TO ... (FORMAT postgres_binary), but there's no way to read those files back without a running PostgreSQL instance.
Therefore we propose the following API
Table function:
SELECT * FROM read_postgres_binary('path/to/dump.bin', columns={a: 'BOOLEAN', b: 'INTEGER', c: 'UUID'});(We must provide the columns parameter since the PostgreSQL binary format doesn't encode type information (only field count and raw bytes per field), so the reader needs to be told what types to expect. For COPY FROM, the types are inferred from the target table schema.)
COPY FROM:
CREATE TABLE target (a BOOLEAN, b INTEGER, c UUID);
COPY target FROM 'path/to/dump.bin' (FORMAT postgres_binary);I can try and and take a shot at it if you want:
The plan would be the following:
-
Extract a
PostgresBinaryParserbase class from the existingPostgresBinaryReader, containing all the type-parsing logic (ReadValue,ReadInteger,ReadDecimal,ReadArray,ReadGeometry, etc.) decoupled fromlibpq. -
Have the existing
PostgresBinaryReader(live connection) and a newPostgresBinaryFileReader(file based reader) both inherit fromPostgresBinaryParser, differing only in how they source their data buffers (PQgetCopyDatavsFileHandle::Read). -
Register both a
read_postgres_binarytable function andCOPY FROMsupport on the existingpostgres_binarycopy function.