@YoungHypo, could you please review this PR when you have a chance? Thanks.
No other comments, so I will merge it.
Pull Request Summary
When running videx_build_env immediately after importing data into a fresh MariaDB instance (Debian + Docker), table statistics may not be ready yet. In particular, information_schema.TABLES.TABLE_ROWS can temporarily be 0 (or inconsistent) for some InnoDB tables (observed on supplier in tpch_tiny; see the example in #79).
This causes VIDEX metadata collection to record incorrect row-count stats and can lead to different EXPLAIN results between mariadb and mariadb-videx.
This PR mitigates the issue by running ANALYZE TABLE on all base tables in the target schema before fetching statistics.
Related Issues
Resolves: #79
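The mitigation described in the summary (refresh statistics with ANALYZE TABLE on every base table, then read them) can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the helper names quote_ident and analyze_base_tables are hypothetical, the cursor is assumed to be any Python DB-API style cursor connected to MariaDB, and filtering with SHOW FULL TABLES ... WHERE Table_type = 'BASE TABLE' is an assumption about how views might be skipped.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a MariaDB identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def analyze_base_tables(cursor, schema: str) -> List[str]:
    """Run ANALYZE TABLE on every base table in `schema`.

    `cursor` is assumed to be a DB-API style cursor (hypothetical helper,
    not part of VIDEX). Returns the names of the tables that were analyzed.
    """
    # SHOW FULL TABLES returns (table_name, table_type) rows; keeping only
    # 'BASE TABLE' rows skips views, which ANALYZE TABLE cannot refresh.
    cursor.execute(
        f"SHOW FULL TABLES FROM {quote_ident(schema)} "
        "WHERE Table_type = 'BASE TABLE'"
    )
    tables = [row[0] for row in cursor.fetchall()]

    for table in tables:
        cursor.execute(
            f"ANALYZE TABLE {quote_ident(schema)}.{quote_ident(table)}"
        )
        # ANALYZE TABLE produces a result set; drain it so that drivers
        # which reject unread results accept the next statement.
        cursor.fetchall()

    return tables
```

In this sketch the function would be called once per target schema, before any query against information_schema.TABLES, so that the row-count estimates read afterwards reflect the imported data.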
Detailed Description
What problem does this PR solve?
In a freshly initialized MariaDB instance, right after bulk importing data, InnoDB table statistics may not be updated yet. At this moment, querying information_schema.TABLES can return TABLE_ROWS = 0 (or otherwise inconsistent estimates) for some tables (e.g., supplier in tpch_tiny, as seen in #79).
VIDEX metadata collection (fetch_information_schema) uses these statistics as part of its metadata output. If the stats are incorrect at collection time, VIDEX may build metadata with wrong row-count related values, which can impact cost estimation and lead to different EXPLAIN results between mariadb and mariadb-videx.
How does the solution work?
Before fetching any statistics, we proactively refresh table statistics in the target schema by:
1. Listing the tables in the target schema (SHOW TABLES FROM <schema>).
2. Running ANALYZE TABLE <schema>.<table> for each base table.
After ANALYZE TABLE, subsequent reads from information_schema.TABLES (and related InnoDB stats tables) are much more stable, so metadata collection becomes deterministic even when executed immediately after data import.
Trade-offs / alternatives considered
- Adds extra work (one ANALYZE per table) during metadata collection / environment build.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to VIDEX! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
- [Core]: Changes to core engine functionality
- [Opt]: Changes to VIDEX-Optimizer-Plugin
- [Stats]: Changes to VIDEX-Statistic-Server
- [Algo]: Implementation of new algorithms for NDV, cardinality estimation, etc.
- [Pipe]: Enhancements to the pipeline (e.g., data collection, environment setup)
- [Bug]: Corrections to existing functionality
- [CI]: Changes to build process or CI pipeline
- [Docs]: Updates or additions to documentation
- [Test]: Adding or updating tests
- [Perf]: Performance improvements
- [Misc]: For changes not covered above (use sparingly)
Note: For changes spanning multiple categories, use the most specific prefix or multiple prefixes in order of importance (e.g., [Algorithm][Stats]).
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.