Skip to content

LIMIT 1 returning multiple rows within open transaction #395

@mosiman

Description

@mosiman

What happens?

I'm finding that tables of a sufficient size are returning multiple rows even after specifying LIMIT 1. For me, it only seems to happen if I've tried to create an index within the same transaction. After committing the transaction, the expected behaviour returns.

This seems similar to #380 , #381 but I'm not sure so I opened a new issue.

Notes:

  • On the test table, it did not show up with tables of sizes 5000, 500000.
  • On my "real data", it shows up on a table with ~238000 rows
  • On the test table, a size of 1000000 seems to return two rows, and 3000000 returns 6 rows.
  • I was not able to reproduce this on a standard duckdb table
  • I was not able to reproduce this directly against the postgres database (e.g. via psql).
  • I'm running DuckDB via nix, but

To Reproduce

Start a postgres server. I don't think this is specific to the podman or docker image because the same happens with postgres via a nix flake. I've included a podman command for ease of replication.

podman run -it -e POSTGRES_HOST_AUTH_METHOD=trust -e POSTGRES_USER=duck -e POSTGRES_DB=duck --network=host -p 5432:5432 docker.io/postgres:18.

Attach to the database in duckdb

INSTALL postgres; LOAD postgres;
ATTACH 'user=duck dbname=duck host=127.0.0.1 port=5432' AS pg (TYPE postgres);

Create a table on pg

CREATE TABLE pg.test_table AS
WITH data as (
    SELECT
        unnest(generate_series(1,3000000)) as id,
        floor(random() * 3)::int as grp
)
select
    data.id,
    CASE
        WHEN data.grp = 0 then 'Group A'
        WHEN data.grp = 1 then 'Group B'
        WHEN data.grp = 2 then 'Group C'
    END as grp
from data;
;

Observe that without an index, limit 1 works

SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│  id   │   grp   │
│ int64 │ varchar │
├───────┼─────────┤
│   0   │ Group A │
└───────┴─────────┘

Create an index within a transaction and query limit 1 within the transaction

BEGIN;
CREATE INDEX test_idx ON pg.test_table (grp);
SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌────────┬─────────┐
│   id   │   grp   │
│ int64  │ varchar │
├────────┼─────────┤
│      0 │ Group A │
│ 185000 │ Group A │
│ 370001 │ Group A │
│ 555000 │ Group A │
│ 740010 │ Group A │
│ 925000 │ Group A │
└────────┴─────────┘

I don't think an indexed is being used, but I can't seem to recreate this without the index (EXPLAIN ANALYZE was run inside of the transaction)

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││              Total Time: 0.0015s             ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             id            │
│            grp            │
│                           │
│           6 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│     Table: test_table     │
│                           │
│        Projections:       │
│            grp            │
│             id            │
│                           │
│          Filters:         │
│       grp='Group A'       │
│                           │
│           6 rows          │
│          (0.00s)          │
└───────────────────────────┘

Aborting the transaction yields the correct results

rollback;
SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│  id   │   grp   │
│ int64 │ varchar │
├───────┼─────────┤
│   0   │ Group A │
└───────┴─────────┘

Interestingly, trying to recreate the index outside of the aborted transaction yields an error, but not sure if related. I'm not able to drop the index either,

D CREATE INDEX test_idx ON pg.test_table (grp);
Binder Error:
Index with name "test_idx" already exists in schema "public
D DROP INDEX test_idx;
Catalog Error:
Index with name test_idx does not exist!
Did you mean "pg.test_idx"?
D DROP INDEX pg.test_idx;
Invalid Error:
Failed to execute query "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
DROP Index "public"."test_idx"": ERROR:  index "test_idx" does not exist

If you repeat all the steps above on a fresh db, but commit instead of rolling back, you get the expected behaviour

D INSTALL postgres; LOAD postgres;
D ATTACH 'user=duck dbname=duck host=127.0.0.1 port=5432' AS pg (TYPE postgres);
D CREATE TABLE pg.test_table AS
  WITH data as (
      SELECT
          unnest(generate_series(3000000)) as id,
          floor(random() * 3)::int as grp
  )
  select
      data.id,
      CASE
          WHEN data.grp = 0 then 'Group A'
          WHEN data.grp = 1 then 'Group B'
          WHEN data.grp = 2 then 'Group C'
      END as grp
  from data;
D ;
D SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│  id   │   grp   │
│ int64 │ varchar │
├───────┼─────────┤
│   0   │ Group A │
└───────┴─────────┘
D BEGIN;
D CREATE INDEX test_idx ON pg.test_table (grp);
D SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌────────┬─────────┐
│   id   │   grp   │
│ int64  │ varchar │
├────────┼─────────┤
│      0 │ Group A │
│ 185000 │ Group A │
│ 370002 │ Group A │
│ 555002 │ Group A │
│ 740000 │ Group A │
│ 925000 │ Group A │
└────────┴─────────┘
D commit;
D SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│  id   │   grp   │
│ int64 │ varchar │
├───────┼─────────┤
│   0   │ Group A │
└───────┴─────────┘
D EXPLAIN ANALYZE select * from pg.test_table WHERE grp = 'Group A' limit 1;
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE select * from pg.test_table WHERE grp = 'Group A' limit 1;
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││              Total Time: 0.0028s             ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             id            │
│            grp            │
│                           │
│           1 row           │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      STREAMING_LIMIT      │
│    ────────────────────   │
│           1 row           │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│     Table: test_table     │
│                           │
│        Projections:       │
│            grp            │
│             id            │
│                           │
│          Filters:         │
│       grp='Group A'       │
│                           │
│         4,096 rows        │
│          (0.00s)          │
└───────────────────────────┘

I am also able to reproduce this on the latest main. To do that, I fired up a python docker container and installed via pip.

podman run --network=host -it docker.io/python:3.14-bookworm /bin/bash
pip install duckdb --upgrade --pre

and then running this inside the repl in the container:

import duckdb
con = duckdb.connect(":memory:")
con.sql("SELECT version()").show()
con.sql("INSTALL postgres; LOAD postgres; ATTACH 'user=duck dbname=duck host=127.0.0.1 port=5432' AS pg (TYPE postgres);")

con.sql("""
CREATE TABLE pg.test_table AS
WITH data as (
    SELECT
        unnest(generate_series(1,3000000)) as id,
        floor(random() * 3)::int as grp
)
select
    data.id,
    CASE
        WHEN data.grp = 0 then 'Group A'
        WHEN data.grp = 1 then 'Group B'
        WHEN data.grp = 2 then 'Group C'
    END as grp
from data;
;
""")

con.sql("SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;").show()
con.begin()
con.sql("CREATE INDEX test_idx ON pg.test_table (grp);")
con.sql("SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;").show()
con.rollback()
con.sql("SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;").show()

OS:

Fedora Silverblue 41

PostgreSQL Version:

18.1

DuckDB Version:

1.4.1

DuckDB Client:

CLI

Full Name:

Dillon Chan

Affiliation:

N/A

Have you tried this on the latest main branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions