Skip to content

Support decoding from a file #405

@arnoldwakim

Description

@arnoldwakim

Similar request to #204 but in reverse. I have looked for this in the already closed issues but did not find. Kindly redirect me if that is the case.

I was wondering if it is possible to read from a binary dump from PostgreSQL. Currently it is possible to connect to a Postgres instance directly, but when dumps are generated somewhere we cannot connect to, it is not possible to read data with DuckDB.

The reason behind this request is the following:
PostgreSQL's COPY ... TO ... WITH (FORMAT binary) produces compact .bin files. In some production environments, binary dumps are generated on database servers (e.g. via cron jobs, etc.) and shipped to others who don't have direct connection access. Today, the postgres_scanner extension can read binary data from a live Postgres wire connection and can write binary files via COPY ... TO ... (FORMAT postgres_binary), but there's no way to read those files back without a running PostgreSQL instance.

Therefore we propose the following API

Table function:

SELECT * FROM read_postgres_binary('path/to/dump.bin', columns={a: 'BOOLEAN', b: 'INTEGER', c: 'UUID'});

(We must provide the columns parameter since the PostgreSQL binary format doesn't encode type information (only field count and raw bytes per field), so the reader needs to be told what types to expect. For COPY FROM, the types are inferred from the target table schema.)

COPY FROM:

CREATE TABLE target (a BOOLEAN, b INTEGER, c UUID);
COPY target FROM 'path/to/dump.bin' (FORMAT postgres_binary);

I can try and and take a shot at it if you want:

The plan would be the following:

  • Extract a PostgresBinaryParser base class from the existing PostgresBinaryReader, containing all the type-parsing logic (ReadValue, ReadInteger, ReadDecimal, ReadArray, ReadGeometry, etc.) decoupled from libpq.

  • Have the existing PostgresBinaryReader (live connection) and a new PostgresBinaryFileReader (file based reader) both inherit from PostgresBinaryParser, differing only in how they source their data buffers (PQgetCopyData vs FileHandle::Read).

  • Register both a read_postgres_binary table function and COPY FROM support on the existing postgres_binary copy function.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions