API: Use unsigned byte-wise comparison to fix UUID comparison bug #14500
bodduv wants to merge 11 commits into apache:main
Conversation
Thanks for handling this!
Overall the change LGTM. I just left a few minor comments, but we need to decide in the community how we want to proceed with this change.
@bodduv: Could you please help me understand the effect of this change on current tables?
Am I correct that, after the upgrade, metadata filtering will skip the new file (UUID_MIDDLE < UUID_MAX), i.e. it gets filtered out by the wrong min value?
Thank you for the comment @pvary.
It matters how a query engine prepares min and max values for UUID columns before handing them over for writing the manifest file and manifest list. Some engines could use min and max values as prepared by Parquet Java (which is RFC compliant) during writes.
Yes, if the min and max metrics persisted in the manifest file and manifest list were constructed using the faulty non-RFC-compliant UUID comparisons, then we would not be able to read the new file back with such a filter (on the UUID column) after upgrading. What is even more problematic: even an equality filter on the UUID column can incorrectly skip the file. A remedy would be to migrate the table (doing a full table scan) to rewrite the metrics accurately.
Note: This issue exists only in the Java implementation of the spec. The Go, Rust, and C++ implementations are RFC compliant, which makes the bug more severe: if the same table is read with a filter using the Go implementation, it produces correct, but different, records than what the Java implementation produces.
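For illustration, here is a minimal, self-contained sketch (not part of the PR) of the underlying discrepancy: java.util.UUID.compareTo compares the two halves of a UUID as signed longs, so any UUID whose first byte is 0x80 or above sorts below UUIDs that start with smaller bytes:

import java.util.UUID;

public class UuidCompareDemo {
  public static void main(String[] args) {
    UUID low = UUID.fromString("00000000-0000-0000-0000-000000000000");
    UUID mid = UUID.fromString("7fffffff-ffff-ffff-ffff-ffffffffffff");
    UUID high = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");

    // java.util.UUID.compareTo compares mostSigBits/leastSigBits as signed
    // longs, so 0xff... is negative and sorts below zero.
    System.out.println(high.compareTo(low)); // -1: "high" sorts below "low"
    System.out.println(mid.compareTo(high)); //  1: "mid" sorts above "high"

    // Unsigned comparison of the same leading 8 bytes gives the RFC order.
    System.out.println(Long.compareUnsigned(
        high.getMostSignificantBits(), low.getMostSignificantBits())); // 1
  }
}

Under the unsigned byte-wise order the RFC implies for the textual form, the same three values sort as low < mid < high.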
This is a serious behavioral change which could affect correctness. I would try to resurrect the thread with a summary (a short, easily understandable problem statement) and a more focused, detailed description. Also, I would add this to the next community sync topics.
It is. But I also think this is a serious data correctness bug in the Java implementation of the Iceberg spec. If we would like to preserve the old non-RFC-compliant way of comparing UUIDs, then there is a disparity with the other implementations of the spec. So the actual question is: what does the Iceberg spec say about UUID comparisons, and how should UUID values be compared?
I should clarify regarding ^this. If we stick to RFC compliance, then we do NOT need a solution for implementations other than Java, as the other implementations are not affected by this UUID comparison bug. Let me clarify: if one uses the Go implementation of the spec to create an Iceberg table with a UUID column just like above, then min=UUID_MIN and max=UUID_MAX, compliant with the RFC. No surprises while using a filter on UUID_MIDDLE: the new file is read correctly. Query engines using the Java implementation of the spec might need to revisit UUID comparisons. There is another approach: disabling all manifest entry filtering (data file filtering) and manifest file filtering (partition pruning) so as to not trigger any UUID comparisons (via the Iceberg Java APIs). Trino took this approach [1].
Thank you @pvary for the effort and for taking a closer look into this.
[1] Trino fixed trinodb/trino#12834 with trinodb/trino#12911.
I echo @pvary's concern that this is a pretty significant behavior change, as the ordering won't be stable for UUIDs between versions. I'm digging into the RFC's exact language, but I also came across https://stackoverflow.com/questions/79489893/does-the-uuid-comparison-in-java-violate-the-uuid-standard. It sounds like this sorting behavior is specifically defined for UUIDv6/v7, and sorting is not prescribed for v4? At the same time, the OpenJDK folks acknowledge that this is a bug, so I'm not sure yet (like I said, still digging into it): https://bugs.openjdk.org/browse/JDK-7025832
In general, I think we should close this ambiguity on the sorting of UUIDs in the spec definition; as you pointed out, implementations are inconsistent in how this is performed. Whether this definition should be based off the RFC, or whether there's a good argument to retroactively work from the Java reference implementation behavior, is something I'm still thinking through, and I think we should discuss it on the mailing list.
I have addressed how the RFCs define ordering among UUIDs in #14216. RFC 9562, Section 6.1.1 ("Sorting"), states: "UUID formats created by this specification are intended to be lexicographically sortable while in the textual representation."
Maintaining backward compatibility is an argument in this case. But on the other hand, looking forward, is there a possibility to communicate breaking changes and provide table migration strategies in the release docs? I suppose it's a topic for the mailing list.
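Lexicographic order of the canonical hex text is equivalent to comparing the 16 big-endian bytes as unsigned values. A sketch of such a comparator follows (a hypothetical helper for illustration, not the comparator the PR actually adds):

import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.UUID;

// Compares UUIDs by their 16 big-endian bytes treated as unsigned values,
// which matches lexicographic order of the canonical hex text (RFC 9562,
// Section 6.1.1).
public final class UnsignedUuidComparator implements Comparator<UUID> {
  @Override
  public int compare(UUID a, UUID b) {
    byte[] x = toBytes(a);
    byte[] y = toBytes(b);
    for (int i = 0; i < 16; i++) {
      int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF); // unsigned byte compare
      if (cmp != 0) {
        return cmp;
      }
    }
    return 0;
  }

  private static byte[] toBytes(UUID u) {
    return ByteBuffer.allocate(16)
        .putLong(u.getMostSignificantBits())
        .putLong(u.getLeastSignificantBits())
        .array();
  }
}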
Anything that could be a breaking change needs to be discussed on the dev list and, if needed, on the sync.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
@pvary Thank you for promptly jumping in (during the previous community sync) while I could not recollect the details of why we wanted to pause on the behavior change. It was discussed that we would evaluate expressions once with the RFC-compliant comparator and again with the signed comparator, and then let the filter pass if either of the evaluations is true. I attempt this in fe9faf5. Following is a short description of the changes.
The implementation around expression evaluation was not conducive to such a two-fold evaluation, so I had to force it. I hope the increased code complexity is acceptable.
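The idea, in a hedged sketch (class and method names here are hypothetical; the PR's actual evaluators are more involved):

import java.util.Comparator;
import java.util.UUID;

// Conceptual sketch only; names do not match the PR's evaluator classes.
class DualBoundsCheck {
  // "Might the file contain value, given its [lower, upper] column bounds?"
  static boolean mightContain(UUID value, UUID lower, UUID upper, Comparator<UUID> cmp) {
    return cmp.compare(value, lower) >= 0 && cmp.compare(value, upper) <= 0;
  }

  // Conservative dual check: keep the file if either ordering says it may match.
  static boolean shouldRead(UUID value, UUID lower, UUID upper,
                            Comparator<UUID> rfcUnsigned, Comparator<UUID> legacySigned) {
    return mightContain(value, lower, upper, rfcUnsigned)
        || mightContain(value, lower, upper, legacySigned);
  }
}

OR-ing the two outcomes can only widen the set of files kept, so it is safe for inclusive (read-side) filtering.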
Force-pushed from 1f74764 to fe9faf5.
It would be great to get some additional eyes on this. @amogh-jahagirdar @singhpk234 @huaxingao @RussellSpitzer, in case any of you have some time to take a look at this as well.
Marked as a bug and added to 1.11.0, as I see this could cause cross-engine correctness issues.
Thank you @nastra for taking the time to review. I addressed the comments.
Sorry for the delay in reviewing this, I'll take a look today.
amogh-jahagirdar left a comment:
Thanks @bodduv. From my side I fundamentally agree with the fix; I need a bit more time to go through all the tests.
Thank you @amogh-jahagirdar for taking the time to review. I addressed the comments made so far.
Thank you @stevenzwu for the comments. I addressed them.
stevenzwu left a comment:
Went through the unit test code. Really appreciate the comprehensive coverage. Awesome work!
Review comment on a test diff:
@Test
public void testStrictUuidInWithLegacyInvertedBounds() {
  // With inverted bounds [0x80..., 0x40...] where lower > upper in RFC order,
We don't need to perform the dual check (signed and unsigned) for StrictMetricsEvaluator?
In some sense this is a breaking change for the write/delete path: equipping StrictMetricsEvaluator with a dual comparator would be quite questionable (see the sketch after this list).
Here are the instances where the new unsigned comparator is used, as they relate to write operations:
- BaseOverwriteFiles: ensure that files being added during an overwrite operation match the overwrite filter.
- ManifestFilterManager: avoid data loss by validating data files marked for delete (throw a validation error to make sure none of the data files are deleted based on signed comparisons).
- SparkTable: avoid metadata-based data file removal; force row-level deletions that use the unsigned comparator.
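Why the dual check fits inclusive but not strict evaluation, in a hedged sketch of my own (not the PR's code; the method names are made up): an inclusive evaluator answers "might any row match?", so over-approximating by OR-ing the two orderings is safe; a strict evaluator answers "do all rows match?", where over-claiming on the delete path can drop live data, so only the agreement of both orderings would be a safe "yes".

// Hypothetical illustration, not Iceberg code: how the two orderings could be
// combined conservatively, depending on the evaluator's contract.
class ConservativeCombination {
  // Inclusive ("rows might match"): keeping extra files is safe, so keep the
  // file if either ordering says it may match.
  static boolean inclusiveKeep(boolean unsignedMayMatch, boolean signedMayMatch) {
    return unsignedMayMatch || signedMayMatch;
  }

  // Strict ("all rows match"): a false positive can delete files that still
  // contain non-matching rows, so require agreement of both orderings.
  static boolean strictMatch(boolean unsignedAllMatch, boolean signedAllMatch) {
    return unsignedAllMatch && signedAllMatch;
  }
}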
Thank you @stevenzwu for the thorough review. I addressed them.
Fixes #14216
The problem is described in the issue as well as in the dev mailing list post.
The changes proposed in this PR take a simple approach to fixing the comparison bug. This directly addresses correctness issues in both ManifestEntry and ManifestFile filtering. Note that the changes make UUID value comparisons forward-compatible and compliant with the UUID RFCs.
Additionally, the changes maintain backward compatibility with files written before the RFC 4122/9562-compliant comparison was implemented. This is accomplished by evaluating the predicate expression once with the RFC-compliant comparator and again with the signed comparator. If either of the evaluations is true, then the ManifestEntry or ManifestFile is filtered in. This conservative approach ensures files are not skipped incorrectly.
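To make the backward-compatibility claim concrete, here is a self-contained sketch (hypothetical demo code, not the PR's implementation) showing how the dual evaluation keeps a file whose bounds were written under the legacy signed order:

import java.util.Comparator;
import java.util.UUID;

public class DualEvalDemo {
  public static void main(String[] args) {
    // Bounds written by the legacy signed comparison: inverted under RFC
    // order, since 0x80... > 0x40... when bytes are compared unsigned.
    UUID legacyLower = UUID.fromString("80000000-0000-0000-0000-000000000000");
    UUID legacyUpper = UUID.fromString("40000000-0000-0000-0000-000000000000");
    UUID probe = UUID.fromString("c0000000-0000-0000-0000-000000000000");

    // Legacy order: java.util.UUID's natural (signed) ordering.
    Comparator<UUID> signed = Comparator.naturalOrder();
    // RFC order: unsigned comparison via the sign-bit-flip trick.
    Comparator<UUID> unsigned = Comparator
        .comparingLong((UUID u) -> u.getMostSignificantBits() + Long.MIN_VALUE)
        .thenComparingLong(u -> u.getLeastSignificantBits() + Long.MIN_VALUE);

    boolean signedKeep = inRange(probe, legacyLower, legacyUpper, signed);     // true
    boolean unsignedKeep = inRange(probe, legacyLower, legacyUpper, unsigned); // false
    System.out.println(signedKeep || unsignedKeep); // true: file is not skipped
  }

  static boolean inRange(UUID v, UUID lo, UUID hi, Comparator<UUID> cmp) {
    return cmp.compare(v, lo) >= 0 && cmp.compare(v, hi) <= 0;
  }
}

The RFC-only evaluation alone would skip this file (its bounds are inverted under unsigned order); the signed evaluation still admits it, so the OR keeps the read path correct for old metadata.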