

Analyze tables before collecting VIDEX metadata stats (fix unstable TABLE_ROWS after import) #81

Merged: kr11 merged 1 commit into main from fix/fetch_table_rows_zero on Jan 18, 2026



kr11 (Member) commented Jan 11, 2026

Pull Request Summary

When running videx_build_env immediately after importing data into a fresh MariaDB instance (Debian + Docker), table statistics may not be ready yet. In particular, information_schema.TABLES.TABLE_ROWS can temporarily be 0 (or inconsistent) for some InnoDB tables (observed on the supplier table in tpch_tiny; see the example in #79).

This causes VIDEX metadata collection to record incorrect row-count stats and can lead to different EXPLAIN results between mariadb and mariadb-videx.

This PR mitigates the issue by running ANALYZE TABLE on all base tables in the target schema before fetching statistics.

Related Issues

Resolves: #79

Detailed Description

What problem does this PR solve?

In a freshly initialized MariaDB instance, right after bulk importing data, InnoDB table statistics may not have been updated yet. At this moment, querying information_schema.TABLES can return TABLE_ROWS = 0 (or otherwise inconsistent estimates) for some tables (e.g., supplier in tpch_tiny, as seen in #79).

VIDEX metadata collection (fetch_information_schema) uses these statistics as part of its metadata output. If the stats are incorrect at collection time, VIDEX may build metadata with wrong row-count related values, which can impact cost estimation and lead to different EXPLAIN results between mariadb and mariadb-videx.
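To check whether a schema is in this state before collecting metadata, one could compare TABLE_ROWS against a real COUNT(*). The sketch below is illustrative only: it assumes a DB-API 2.0 cursor using the %s paramstyle (for example, a PyMySQL cursor), and `find_zero_row_tables` and `quote_ident` are hypothetical helper names, not part of VIDEX.

```python
from typing import Any, List, Tuple


def quote_ident(name: str) -> str:
    """Quote a MariaDB identifier with backticks, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def find_zero_row_tables(cursor: Any, schema: str) -> List[Tuple[str, int]]:
    """Return (table, real_row_count) for base tables whose TABLE_ROWS is 0 but that hold rows.

    Hypothetical helper; `cursor` is assumed to be a DB-API 2.0 cursor using the
    %s paramstyle (for example, a PyMySQL cursor).
    """
    # TABLE_ROWS is only an estimate for InnoDB; right after a bulk import it may still be 0.
    cursor.execute(
        "SELECT TABLE_NAME FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = %s AND TABLE_TYPE = 'BASE TABLE' AND TABLE_ROWS = 0",
        (schema,),
    )
    candidates = [row[0] for row in cursor.fetchall()]

    suspicious = []
    for table in candidates:
        # COUNT(*) reads the actual data, so it is not affected by stale statistics.
        cursor.execute(f"SELECT COUNT(*) FROM {quote_ident(schema)}.{quote_ident(table)}")
        (real_count,) = cursor.fetchone()
        if real_count > 0:
            suspicious.append((table, real_count))
    return suspicious
```

In this sketch, a non-empty result for tpch_tiny (for instance, one containing supplier) would correspond to the symptom described in #79.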

How does the solution work?

Before fetching any statistics, we proactively refresh table statistics in the target schema by:

  1. Listing tables in the target database (SHOW TABLES FROM <schema>).
  2. Running ANALYZE TABLE <schema>.<table> for each base table.
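The two steps above can be sketched as follows. This is not the code merged in this PR: it assumes a DB-API 2.0 cursor (for example, a PyMySQL cursor), uses the hypothetical names `analyze_schema` and `quote_ident`, and swaps in SHOW FULL TABLES (instead of plain SHOW TABLES) so that the Table_type column can be used to keep only base tables.

```python
from typing import Any, List


def quote_ident(name: str) -> str:
    """Quote a MariaDB identifier with backticks, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def analyze_schema(cursor: Any, schema: str) -> List[str]:
    """Run ANALYZE TABLE on every base table in `schema`; return the analyzed table names.

    Hypothetical helper; `cursor` is assumed to be a DB-API 2.0 cursor
    (for example, a PyMySQL cursor).
    """
    # Step 1: list tables. SHOW FULL TABLES adds a Table_type column, so views
    # (and, in MariaDB, sequences) can be skipped.
    cursor.execute(f"SHOW FULL TABLES FROM {quote_ident(schema)}")
    tables = [name for name, table_type in cursor.fetchall() if table_type == "BASE TABLE"]

    # Step 2: refresh statistics for each base table.
    for table in tables:
        cursor.execute(f"ANALYZE TABLE {quote_ident(schema)}.{quote_ident(table)}")
        # ANALYZE TABLE returns a result set (Table, Op, Msg_type, Msg_text); consume it.
        cursor.fetchall()
    return tables
```

In this sketch, the metadata-fetching step (fetch_information_schema in VIDEX) would run on the same connection only after analyze_schema returns, so it reads the refreshed statistics.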

After ANALYZE TABLE, subsequent reads from information_schema.TABLES (and the related InnoDB statistics tables) reflect freshly computed statistics and are much more stable, so metadata collection produces consistent results even when executed immediately after data import.

Trade-offs / alternatives considered

  • Trade-off: Adds a small upfront cost (one-time ANALYZE per table) during metadata collection / environment build.
  • Alternative: “Wait/retry until stats look sane” — rejected because it is less deterministic and still depends on timing.

Contribution Guidelines

We appreciate your contribution to VIDEX! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Core]: Changes to core engine functionality
  • [Opt]: Changes to VIDEX-Optimizer-Plugin
  • [Stats]: Changes to VIDEX-Statistic-Server
  • [Algo]: Implementation of new algorithms for NDV, cardinality estimation, etc.
  • [Pipe]: Enhancements to the pipeline (e.g., data collection, environment setup)
  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [Test]: Adding or updating tests
  • [Perf]: Performance improvements
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use the most specific prefix or multiple prefixes in order of importance (e.g., [Algo][Stats]).

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Changes have been tested on both Plugin-Mode and Standalone-Mode (if applicable)
  • Statistical accuracy has been verified (for algorithm or optimizer changes)
  • No regression in query plan accuracy compared to InnoDB (if applicable)
  • Performance benchmarks conducted (for performance-sensitive changes)

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

kr11 (Member, Author) commented Jan 11, 2026

@YoungHypo, could you please review this PR when you have a chance? Thanks.

kr11 (Member, Author) commented Jan 18, 2026

No other comments, so I will merge it.

kr11 merged commit 721d3d5 into main on Jan 18, 2026 (3 checks passed)