Spark 4.1: Display write metrics on SQL UI#15104
Spark 4.1: Display write metrics on SQL UI#15104manuzhang wants to merge 2 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR ports write metrics display functionality from Spark 4.0 to Spark 4.1, enabling write operation metrics to be shown in the Spark SQL UI. The changes introduce custom metric classes for tracking various write operations and integrate them with Spark's connector API.
Changes:
- Added 25 new custom metric classes extending
CustomSumMetricto track data files, delete files, records, and file sizes for added/removed/total categories - Integrated metrics reporting into
SparkWriteandSparkPositionDeltaWriteby implementingreportDriverMetrics()andsupportedCustomMetrics() - Enhanced
BaseTableto support combining multiple metrics reporters
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| TotalRecords.java | Defines metric for tracking total record count |
| TotalPositionalDeletes.java | Defines metric for tracking total positional deletes |
| TotalFileSizeInBytes.java | Defines metric for tracking total file size |
| TotalEqualityDeletes.java | Defines metric for tracking total equality deletes |
| TotalDeleteFiles.java | Defines metric for tracking total delete files |
| TotalDataFiles.java | Defines metric for tracking total data files |
| RemovedRecords.java | Defines metric for tracking removed records |
| RemovedPositionalDeletes.java | Defines metric for tracking removed positional deletes |
| RemovedPositionalDeleteFiles.java | Defines metric for tracking removed positional delete files |
| RemovedFileSizeInBytes.java | Defines metric for tracking removed file size |
| RemovedEqualityDeletes.java | Defines metric for tracking removed equality deletes |
| RemovedEqualityDeleteFiles.java | Defines metric for tracking removed equality delete files |
| RemovedDeleteFiles.java | Defines metric for tracking removed delete files |
| RemovedDataFiles.java | Defines metric for tracking removed data files |
| AddedRecords.java | Defines metric for tracking added records |
| AddedPositionalDeletes.java | Defines metric for tracking added positional deletes |
| AddedPositionalDeleteFiles.java | Defines metric for tracking added positional delete files |
| AddedFileSizeInBytes.java | Defines metric for tracking added file size |
| AddedEqualityDeletes.java | Defines metric for tracking added equality deletes |
| AddedEqualityDeleteFiles.java | Defines metric for tracking added equality delete files |
| AddedDeleteFiles.java | Defines metric for tracking added delete files |
| AddedDataFiles.java | Defines metric for tracking added data files |
| SparkWriteBuilder.java | Adds custom metrics support to write builder |
| SparkWrite.java | Integrates metrics reporter and implements reportDriverMetrics() |
| SparkPositionDeltaWrite.java | Integrates metrics reporter and implements custom metrics methods |
| SparkWriteUtil.java | Provides utility methods for creating custom metrics and task metrics |
| InMemoryMetricsReporter.java | Adds commitReport() method to retrieve commit metrics |
| BaseTable.java | Adds combineMetricsReporter() method for combining metrics reporters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...k/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/AddedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
...v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/RemovedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/metrics/InMemoryMetricsReporter.java
Show resolved
Hide resolved
...k/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/AddedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
...v4.1/spark/src/main/java/org/apache/iceberg/spark/source/metrics/RemovedFileSizeInBytes.java
Outdated
Show resolved
Hide resolved
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java
Show resolved
Hide resolved
1c63813 to
2da3d86
Compare
nastra
left a comment
There was a problem hiding this comment.
@manuzhang can you please add some tests similar to TestSparkReadMetrics to make sure we actually get those commit metrics
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java
Show resolved
Hide resolved
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWriteBuilder.java
Outdated
Show resolved
Hide resolved
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkWriteUtil.java
Outdated
Show resolved
Hide resolved
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkWriteUtil.java
Outdated
Show resolved
Hide resolved
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkWriteUtil.java
Outdated
Show resolved
Hide resolved
nastra
left a comment
There was a problem hiding this comment.
overall I think this is close but we're. missing some tests and changes to SparkPositionDeletesRewrite, since that one also implements Spark's Write API
c2e3242 to
3e32551
Compare
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeletesRewrite.java
Show resolved
Hide resolved
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkWriteMetrics.java
Outdated
Show resolved
Hide resolved
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkWriteMetrics.java
Outdated
Show resolved
Hide resolved
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkWriteMetrics.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
why not run tests for v3 tables by default?
There was a problem hiding this comment.
The current metrics are intended for v2 tables.
There was a problem hiding this comment.
The metrics should work across different format versions or not? My point is why we're limiting ourselves to a format version that isn't the latest one?
There was a problem hiding this comment.
The main difference is that position delete is replaced by deletion vector. I'd like to target the default table format version first. Besides, this PR is initially opened against Spark 3.5 in 2024.
Co-authored-by: copilot <copilot@github.com>
3e32551 to
c2b27e2
Compare
No description provided.