Backend
VL (Velox)
Bug description
Description
Gluten's columnar writer optimization wraps AdaptiveSparkPlanExec with ColumnarToCarrierRow to avoid unnecessary columnar-to-row conversions. However, this breaks the pattern matching used in Apache Spark PR #51432, which relies on:
queryExecution.executedPlan match {
  case ae: AdaptiveSparkPlanExec =>
    ae.context.shuffleIds.asScala.keys
}
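When AdaptiveSparkPlanExec is wrapped by ColumnarToCarrierRow, the root of executedPlan is a ColumnarToCarrierRow rather than an AdaptiveSparkPlanExec, so the match above never reaches that case and the shuffle IDs become inaccessible.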
Root Cause
In GlutenWriterColumnarRules.injectFakeRowAdaptor(), when the child is an AdaptiveSparkPlanExec, the original implementation:
- Created a new AdaptiveSparkPlanExec with supportsColumnar=true
- Wrapped this with genColumnarToCarrierRow() → ColumnarToCarrierRow(AdaptiveSparkPlanExec(...))
This structure hides AdaptiveSparkPlanExec inside ColumnarToCarrierRow, breaking any external pattern matching.
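The effect can be reproduced without Spark using a toy model. The sketch below is illustrative only: the case classes are simplified stand-ins that share names with the real plan nodes, `Leaf` and `shuffleIdsAtRoot` are hypothetical, and the shuffle IDs are stored directly on the node rather than on `ae.context` as in Spark.

```scala
object OldWrappingDemo {
  // Toy stand-ins for the plan nodes; these are NOT the real Spark/Gluten classes.
  sealed trait Plan
  final case class Leaf(name: String) extends Plan
  final case class AdaptiveSparkPlanExec(
      inputPlan: Plan,
      supportsColumnar: Boolean,
      shuffleIds: Set[Int]) extends Plan
  final case class ColumnarToCarrierRow(child: Plan) extends Plan

  // Same shape as the root-only match quoted in the description,
  // with an explicit fallback so the miss is observable.
  def shuffleIdsAtRoot(executedPlan: Plan): Set[Int] = executedPlan match {
    case ae: AdaptiveSparkPlanExec => ae.shuffleIds
    case _                         => Set.empty[Int]
  }

  val aqe: Plan =
    AdaptiveSparkPlanExec(Leaf("scan"), supportsColumnar = true, shuffleIds = Set(0, 1))

  // Original structure: the AQE node sits underneath the carrier-row wrapper.
  val oldPlan: Plan = ColumnarToCarrierRow(aqe)

  def main(args: Array[String]): Unit = {
    println(shuffleIdsAtRoot(aqe))     // AQE node at the root: IDs are visible
    println(shuffleIdsAtRoot(oldPlan)) // wrapper at the root: the AQE case is never hit
  }
}
```

A match that only inspects the root node cannot see through the wrapper, which is why the ordering of the two nodes matters here.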
Solution
Refactored the wrapping logic to:
- Wrap aqe.inputPlan with genColumnarToCarrierRow() first → ColumnarToCarrierRow(inputPlan)
- Create a new AdaptiveSparkPlanExec with the wrapped child → AdaptiveSparkPlanExec(ColumnarToCarrierRow(...))
- Set supportsColumnar=false since the child is already wrapped
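The three steps above can be sketched on a toy plan model. This is a minimal illustration, not the actual Gluten change: the case classes are simplified stand-ins, `wrapInsideAqe` and `shuffleIdsAtRoot` are hypothetical helpers, and wrapping non-AQE children directly is an assumption about the remaining branch.

```scala
object NewWrappingDemo {
  // Toy stand-ins for the plan nodes; these are NOT the real Spark/Gluten classes.
  sealed trait Plan
  final case class Leaf(name: String) extends Plan
  final case class AdaptiveSparkPlanExec(
      inputPlan: Plan,
      supportsColumnar: Boolean,
      shuffleIds: Set[Int]) extends Plan
  final case class ColumnarToCarrierRow(child: Plan) extends Plan

  // Hypothetical rewrite following the refactored ordering.
  def wrapInsideAqe(child: Plan): Plan = child match {
    case aqe: AdaptiveSparkPlanExec =>
      // Step 1: wrap the AQE input plan first.
      val wrappedInput = ColumnarToCarrierRow(aqe.inputPlan)
      // Steps 2 and 3: rebuild the AQE node around the wrapped input and
      // mark it as row-based, since the conversion now happens inside it.
      aqe.copy(inputPlan = wrappedInput, supportsColumnar = false)
    case other =>
      // Assumed behavior for non-AQE children: wrap them directly.
      ColumnarToCarrierRow(other)
  }

  // Root-only match with the same shape as the one in the description.
  def shuffleIdsAtRoot(executedPlan: Plan): Set[Int] = executedPlan match {
    case ae: AdaptiveSparkPlanExec => ae.shuffleIds
    case _                         => Set.empty[Int]
  }

  val original: Plan =
    AdaptiveSparkPlanExec(Leaf("scan"), supportsColumnar = true, shuffleIds = Set(0, 1))
  val rewritten: Plan = wrapInsideAqe(original)

  def main(args: Array[String]): Unit = {
    println(rewritten)                   // AQE node stays at the root
    println(shuffleIdsAtRoot(rewritten)) // the root-only match finds the IDs again
  }
}
```

Keeping the AQE node outermost means callers that only inspect the root of the executed plan continue to work, while the carrier-row wrapping still applies to the plan that AQE executes.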
Gluten version
main branch
Spark version
spark-4.0.x
Spark configurations
No response
System information
No response
Relevant logs