Benchmarking SQLite query execution time and memory usage across 19 TPC-H standard business intelligence queries — with statistically robust multi-run methodology and process-level isolation.
Standard database benchmarks often report single-run metrics, which are skewed by OS page caching, cold starts, and cross-query memory contamination in long-running processes. This study addresses that gap by:
- Running each TPC-H query 10–40 times across multiple isolated collection sessions
- Using subprocess-per-query isolation to prevent RSS memory pollution between queries
- Capturing both wall-clock time and process RSS memory per execution
- Aggregating into summary statistics and visualizations for direct cross-query comparison
The result is a reproducible, statistically grounded picture of how SQLite handles each of 19 real-world OLAP query patterns.
| Category | Tool | Why |
|---|---|---|
| Database | SQLite | Lightweight, file-based engine; ideal for reproducible local benchmarks without server overhead |
| Memory Profiling | psutil | Captures OS-level RSS (Resident Set Size) rather than Python heap only — catches SQLite's own buffer usage |
| Batch Automation | Bash + Python subprocess | Spawns a fresh Python process per query so each RSS baseline starts clean |
| Data Processing | pandas | Multi-run CSV merging, zero-value filtering, and describe() summary statistics |
| Visualization | matplotlib | Per-query execution bar charts and cross-query summary stat comparisons |
| Query Standard | TPC-H | Industry-standard OLAP benchmark; 22 parameterized business intelligence queries across 8 relational tables |
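To illustrate why the harness reads OS-level RSS through `psutil` rather than a Python-heap profiler, here is a minimal sketch (the 50 MB allocation is purely illustrative, not part of the harness):

```python
import os

import psutil

# RSS (Resident Set Size) is the process's physical memory as seen by the OS.
# It includes SQLite's page cache and other C-level buffers, which Python-heap
# tools such as tracemalloc cannot see.
proc = psutil.Process(os.getpid())
rss_before = proc.memory_info().rss

data = bytearray(50 * 1024 * 1024)  # allocate ~50 MB so the delta is visible

rss_after = proc.memory_info().rss
print(f"RSS delta: {(rss_after - rss_before) / 1024**2:.1f} MiB")
```

The same before/after pattern, wrapped around a query execution instead of an allocation, yields the per-run memory deltas recorded in this study.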
- **Multi-run statistical profiling** — Each query was executed 10–40 times across 4 independent collection runs, yielding datasets with 50–120+ observations per query for robust mean and variance estimates.
- **Process-level isolation** — `automateBash.py` is invoked as a fresh subprocess by `execLoop.bash` for each query. Every process starts with a clean heap and fresh RSS baseline, scoping memory deltas to the target query alone.
- **Dual-metric capture** — Each execution records `time.perf_counter()` wall-clock time and a `psutil` RSS delta. SQLite's `.stats` output and `EXPLAIN QUERY PLAN` are also captured per run for deeper diagnostics.
- **Memory noise filtering** — `combineFiles.py` drops zero-valued memory observations (cache-hit runs that return a near-zero RSS delta) and deduplicates readings before computing statistics, reducing measurement noise.
- **Cross-query comparison** — Summary stats (mean, std, min, 25%, 50%, 75%, max) saved to `time_allStats.csv` and `mem_allStats.csv` for direct comparison across all 19 queries.
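One measurement iteration can be sketched as follows. This is a simplified stand-in for the harness, using an in-memory database and a toy `lineitem` table in place of `tpch.db`; the function name `run_once` is illustrative:

```python
import sqlite3
import time

import psutil

def run_once(conn: sqlite3.Connection, sql: str) -> tuple[float, int]:
    """Execute one query, returning (wall-clock seconds, RSS delta in bytes)."""
    proc = psutil.Process()
    rss_before = proc.memory_info().rss
    t0 = time.perf_counter()
    conn.execute(sql).fetchall()  # materialize the full result set
    elapsed = time.perf_counter() - t0
    rss_delta = proc.memory_info().rss - rss_before
    return elapsed, rss_delta

# Toy stand-in for tpch.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_quantity REAL, l_extendedprice REAL)")
conn.executemany("INSERT INTO lineitem VALUES (?, ?)",
                 [(i % 50, i * 1.5) for i in range(1000)])

# The query plan can be captured the same way for per-run diagnostics
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(l_extendedprice) FROM lineitem"
).fetchall()

t, m = run_once(conn, "SELECT l_quantity, SUM(l_extendedprice) "
                      "FROM lineitem GROUP BY l_quantity")
```

Repeating `run_once` N times inside one fresh subprocess produces the raw per-query CSVs.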
```mermaid
flowchart TD
    A[TPC-H SQL files\ntpchQueriesSQLlTE/] --> B[execLoop.bash]
    B -->|"spawns subprocess\nper query"| C[automateBash.py]
    C --> D[(tpch.db\nSQLite)]
    C -->|N executions| E[Raw CSVs\nfinalMetrics/]
    E --> F[combineFiles.py\nMerge + Filter + Stats]
    F --> G[finalTimes/ & finalMem/\nper-query CSVs]
    F --> H[time_allStats.csv\nmem_allStats.csv]
    G --> I[exploreData.py\nVisualization]
    I --> J[timeGraphs/*.png\nSummary *.png]
```
All times in seconds; collected over 50–120 runs per query on a local SQLite database.
| Query | Description | Mean (s) | Std (s) | Notes |
|---|---|---|---|---|
| Q9 | Product Type Profit | 32.67 | 4.16 | Slowest — 8-table join across full lineitem scan |
| Q13 | Customer Distribution | 23.73 | 2.99 | Correlated subquery over orders with LIKE filter |
| Q8 | National Market Share | 12.50 | 1.89 | Multi-join with region predicate pushdown |
| Q21 | Suppliers Who Kept Orders Waiting | 10.29 | 1.83 | Multiple correlated EXISTS subqueries |
| Q1 | Pricing Summary | 9.59 | 1.36 | Full lineitem scan with multi-column aggregation |
| Q7 | Volume Shipping | 7.49 | 0.99 | Cross-nation join with year extraction |
| Q5 | Local Supplier Volume | 4.85 | 0.49 | 6-table join with region and year filter |
| Q3 | Shipping Priority | 3.48 | 1.24 | Unshipped orders revenue ranking |
| Q19 | Discounted Revenue | 3.34 | 1.02 | Disjunctive OR predicate on lineitem |
| Q10 | Returned Item Reporting | 2.99 | 0.40 | Customer revenue from flagged returns |
| Q14 | Promotion Effect | 2.77 | 0.46 | Conditional SUM over date-filtered lineitem |
| Q18 | Large Volume Customer | 2.73 | 0.96 | HAVING clause over grouped subquery |
| Q12 | Shipping Modes and Order Priority | 2.50 | 0.46 | JOIN on shipmode with CASE aggregation |
| Q15 | Top Supplier | 2.46 | 0.45 | Subquery creates ranked supplier view |
| Q6 | Forecasting Revenue Change | 2.13 | 0.34 | Simple lineitem scan, discount range filter |
| Q4 | Order Priority Checking | 1.52 | 0.34 | EXISTS subquery per order row |
| Q11 | Important Stock Identification | 1.01 | 0.21 | Subquery computes warehouse-wide threshold |
| Q16 | Parts/Supplier Relationship | 0.50 | 0.08 | NOT IN with indexed part filter — fastest join |
| Q2 | Minimum Cost Supplier | 0.47 | 0.50 | Highly selective subquery; fastest overall |
Full statistics in `analysis/time_allStats.csv` and `analysis/mem_allStats.csv`. Per-query bar charts in `analysis/timeGraphs/` and `analysis/*.png`.
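The merge-filter-summarize step can be sketched with a toy version of the per-query memory data (the column name `mem_bytes` and the values are illustrative, not from the repo's CSVs):

```python
import pandas as pd

# Stand-in for two collection runs of the same query. Cache-hit runs report a
# near-zero RSS delta, which would drag the mean down if left in.
runs = [
    pd.DataFrame({"mem_bytes": [1_200_000, 0, 1_150_000]}),
    pd.DataFrame({"mem_bytes": [0, 1_180_000, 1_210_000]}),
]

merged = pd.concat(runs, ignore_index=True)
filtered = merged[merged["mem_bytes"] > 0].drop_duplicates()

# describe() yields the same summary shape as *_allStats.csv:
# count, mean, std, min, 25%, 50%, 75%, max
stats = filtered["mem_bytes"].describe()
```

Running this per query and stacking the `describe()` rows gives a cross-query comparison table.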
**The problem:** The original `automate.py` script ran all queries sequentially inside a single Python process. Because SQLite caches database pages and Python's memory allocator does not return RSS to the OS between queries, the RSS delta measured for query N included residual allocations from queries 1 through N−1. Memory readings were therefore systematically inflated for later-running queries, making cross-query comparison unreliable.

**The solution:** The harness was refactored into `automateBash.py`, a CLI-invocable module that accepts `<db> <sql> <n_runs> <out_csv>` as arguments. `execLoop.bash` invokes it as a separate `python` subprocess for each query. Every process starts with a fresh heap and a clean RSS baseline, so the memory delta is scoped to that query's executions only.

**The trade-off:** Each subprocess adds ~1–2 seconds of Python interpreter startup time. Since the target queries run for 0.5–45 seconds, the startup overhead is negligible relative to the measurement signal and well within the noise of the standard deviation.
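The subprocess-per-query pattern looks roughly like this in Python. This is a sketch of what `execLoop.bash` does, not the repo's actual driver; `build_cmd` and `run_all` are hypothetical names, and the paths are illustrative:

```python
import subprocess
import sys
from pathlib import Path

def build_cmd(db: str, sql_file: Path, n_runs: int, out_csv: Path) -> list[str]:
    """Build the automateBash.py invocation: <db> <sql> <n_runs> <out_csv>."""
    return [sys.executable, "automateBash.py",
            db, str(sql_file), str(n_runs), str(out_csv)]

def run_all(db: str, query_dir: Path, out_dir: Path, n_runs: int = 30) -> None:
    """Spawn a fresh interpreter per query so each RSS baseline starts clean."""
    for sql_file in sorted(query_dir.glob("q*.sql")):
        cmd = build_cmd(db, sql_file, n_runs, out_dir / f"{sql_file.stem}.csv")
        subprocess.run(cmd, check=True)  # blocks until that query's runs finish

# Example (paths are illustrative):
# run_all("tpch.db", Path("tpchQueriesSQLlTE"), Path("finalMetrics/run5"))
```

Because each `subprocess.run` call starts a brand-new interpreter, no page cache or allocator state survives from one query to the next.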
```text
queryEfficiencyStudy/
├── tpchQueriesSQLlTE/         # 22 TPC-H queries adapted for SQLite syntax
│   └── querySource.md         # Business meaning of each query
├── baseline_tests_python/     # Exploratory single-run scripts using tracemalloc (not used in final data)
├── automate.py                # First-generation multi-run harness (single-process; deprecated)
├── automateBash.py            # CLI-driven harness — use this for new data collection
├── execLoop.bash              # Bash driver: spawns automateBash.py once per query
├── finalMetrics/              # Raw CSVs from 4 collection runs (run1, run3, run4, 10run)
└── analysis/
    ├── combineFiles.py        # Merges runs, filters memory noise, computes summary stats
    ├── exploreData.py         # Generates per-query bar charts and cross-query stat plots
    ├── finalTimes/            # Merged execution time CSVs (one file per query)
    ├── finalMem/              # Merged + filtered memory CSVs (one file per query)
    ├── timeGraphs/            # Bar chart PNGs per query
    ├── time_allStats.csv      # Summary statistics table for execution times
    └── mem_allStats.csv       # Summary statistics table for memory usage
```
```bash
pip install pandas matplotlib psutil
```

You will also need a `tpch.db` SQLite database. Generate one using the TPC-H `dbgen` tool.

Benchmark a single query:

```bash
python automateBash.py /path/to/tpch.db tpchQueriesSQLlTE/q1.sql 30 finalMetrics/run4/q1.csv
```

Run the full batch:

```bash
# Edit execLoop.bash to set correct paths, then:
bash execLoop.bash
```

Analyze and visualize:

```bash
cd analysis/
python combineFiles.py   # merges runs → finalTimes/, finalMem/, *_allStats.csv
python exploreData.py    # generates charts → timeGraphs/*.png, *.png
```