-
Notifications
You must be signed in to change notification settings - Fork 76
Description
What happens?
I'm finding that tables of a sufficient size are returning multiple rows even after specifying LIMIT 1. For me, it only seems to happen if I've tried to create an index within the same transaction. After committing the transaction, the expected behaviour returns.
This seems similar to #380 , #381 but I'm not sure so I opened a new issue.
Notes:
- On the test table, it did not show up with tables of sizes 5000, 500000.
- On my "real data", it shows up on a table with ~238000 rows
- On the test table, a size of 1000000 seems to return two rows, and 3000000 returns 6 rows.
- I was not able to reproduce this on a standard duckdb table
- I was not able to reproduce this directly against the postgres database (e.g. via
psql). - I'm running DuckDB via
nix, but
To Reproduce
Start a postgres server. I don't think this is specific to the podman or docker image because the same happens with postgres via a nix flake. I've included a podman command for ease of replication.
podman run -it -e POSTGRES_HOST_AUTH_METHOD=trust -e POSTGRES_USER=duck -e POSTGRES_DB=duck --network=host -p 5432:5432 docker.io/postgres:18.
Attach to the database in duckdb
INSTALL postgres; LOAD postgres;
ATTACH 'user=duck dbname=duck host=127.0.0.1 port=5432' AS pg (TYPE postgres);
Create a table on pg
CREATE TABLE pg.test_table AS
WITH data as (
SELECT
unnest(generate_series(1,3000000)) as id,
floor(random() * 3)::int as grp
)
select
data.id,
CASE
WHEN data.grp = 0 then 'Group A'
WHEN data.grp = 1 then 'Group B'
WHEN data.grp = 2 then 'Group C'
END as grp
from data;
;
Observe that without an index, limit 1 works
SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│ id │ grp │
│ int64 │ varchar │
├───────┼─────────┤
│ 0 │ Group A │
└───────┴─────────┘
Create an index within a transaction and query limit 1 within the transaction
BEGIN;
CREATE INDEX test_idx ON pg.test_table (grp);
SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌────────┬─────────┐
│ id │ grp │
│ int64 │ varchar │
├────────┼─────────┤
│ 0 │ Group A │
│ 185000 │ Group A │
│ 370001 │ Group A │
│ 555000 │ Group A │
│ 740010 │ Group A │
│ 925000 │ Group A │
└────────┴─────────┘
I don't think an indexed is being used, but I can't seem to recreate this without the index (EXPLAIN ANALYZE was run inside of the transaction)
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.0015s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ 0 rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ id │
│ grp │
│ │
│ 6 rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ TABLE_SCAN │
│ ──────────────────── │
│ Table: test_table │
│ │
│ Projections: │
│ grp │
│ id │
│ │
│ Filters: │
│ grp='Group A' │
│ │
│ 6 rows │
│ (0.00s) │
└───────────────────────────┘
Aborting the transaction yields the correct results
rollback;
SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│ id │ grp │
│ int64 │ varchar │
├───────┼─────────┤
│ 0 │ Group A │
└───────┴─────────┘
Interestingly, trying to recreate the index outside of the aborted transaction yields an error, but not sure if related. I'm not able to drop the index either,
D CREATE INDEX test_idx ON pg.test_table (grp);
Binder Error:
Index with name "test_idx" already exists in schema "public
D DROP INDEX test_idx;
Catalog Error:
Index with name test_idx does not exist!
Did you mean "pg.test_idx"?
D DROP INDEX pg.test_idx;
Invalid Error:
Failed to execute query "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
DROP Index "public"."test_idx"": ERROR: index "test_idx" does not exist
If you repeat all the steps above on a fresh db, but commit instead of rolling back, you get the expected behaviour
D INSTALL postgres; LOAD postgres;
D ATTACH 'user=duck dbname=duck host=127.0.0.1 port=5432' AS pg (TYPE postgres);
D CREATE TABLE pg.test_table AS
WITH data as (
SELECT
unnest(generate_series(3000000)) as id,
floor(random() * 3)::int as grp
)
select
data.id,
CASE
WHEN data.grp = 0 then 'Group A'
WHEN data.grp = 1 then 'Group B'
WHEN data.grp = 2 then 'Group C'
END as grp
from data;
D ;
D SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│ id │ grp │
│ int64 │ varchar │
├───────┼─────────┤
│ 0 │ Group A │
└───────┴─────────┘
D BEGIN;
D CREATE INDEX test_idx ON pg.test_table (grp);
D SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌────────┬─────────┐
│ id │ grp │
│ int64 │ varchar │
├────────┼─────────┤
│ 0 │ Group A │
│ 185000 │ Group A │
│ 370002 │ Group A │
│ 555002 │ Group A │
│ 740000 │ Group A │
│ 925000 │ Group A │
└────────┴─────────┘
D commit;
D SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;
┌───────┬─────────┐
│ id │ grp │
│ int64 │ varchar │
├───────┼─────────┤
│ 0 │ Group A │
└───────┴─────────┘
D EXPLAIN ANALYZE select * from pg.test_table WHERE grp = 'Group A' limit 1;
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE select * from pg.test_table WHERE grp = 'Group A' limit 1;
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.0028s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ 0 rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ id │
│ grp │
│ │
│ 1 row │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ STREAMING_LIMIT │
│ ──────────────────── │
│ 1 row │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ TABLE_SCAN │
│ ──────────────────── │
│ Table: test_table │
│ │
│ Projections: │
│ grp │
│ id │
│ │
│ Filters: │
│ grp='Group A' │
│ │
│ 4,096 rows │
│ (0.00s) │
└───────────────────────────┘
I am also able to reproduce this on the latest main. To do that, I fired up a python docker container and installed via pip.
podman run --network=host -it docker.io/python:3.14-bookworm /bin/bash
pip install duckdb --upgrade --pre
and then running this inside the repl in the container:
import duckdb
con = duckdb.connect(":memory:")
con.sql("SELECT version()").show()
con.sql("INSTALL postgres; LOAD postgres; ATTACH 'user=duck dbname=duck host=127.0.0.1 port=5432' AS pg (TYPE postgres);")
con.sql("""
CREATE TABLE pg.test_table AS
WITH data as (
SELECT
unnest(generate_series(1,3000000)) as id,
floor(random() * 3)::int as grp
)
select
data.id,
CASE
WHEN data.grp = 0 then 'Group A'
WHEN data.grp = 1 then 'Group B'
WHEN data.grp = 2 then 'Group C'
END as grp
from data;
;
""")
con.sql("SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;").show()
con.begin()
con.sql("CREATE INDEX test_idx ON pg.test_table (grp);")
con.sql("SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;").show()
con.rollback()
con.sql("SELECT * FROM pg.test_table WHERE grp = 'Group A' limit 1;").show()
OS:
Fedora Silverblue 41
PostgreSQL Version:
18.1
DuckDB Version:
1.4.1
DuckDB Client:
CLI
Full Name:
Dillon Chan
Affiliation:
N/A
Have you tried this on the latest main branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree