Releases: apache/auron

v6.0.0-incubating

17 Oct 13:45

New Features

  • New Configurations: Introduced settings for decimal operations, JSON parsing fallback, Parquet reader, native logging, and compression.
  • Memory Management: Improved memory management using Linux RSS (resident set size).
  • Operators: Added operator fusion for Sort -> SortMergeJoin execution, reducing the cost of join key serialization.
  • Enhanced Compatibility: Added support for JDK 17 and Scala 2.13.
  • New Functions: Added support for trim in casts and extended hashing function coverage.
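
As a rough illustration of the new configurations, the sketch below renders two keys that appear in this release's change list (spark.blaze.decimal.arithOp.enabled from #1042 and spark.blaze.parseJsonError.fallback from #1050) as spark-submit arguments. The helper to_submit_args is hypothetical, the value given for the JSON fallback key is an assumption, and the key names may differ in builds published under the Auron name.

```python
def to_submit_args(conf):
    """Render a dict of Spark settings as spark-submit `--conf` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


blaze_conf = {
    # Named in #1042; the change list states that it defaults to false.
    "spark.blaze.decimal.arithOp.enabled": "false",
    # Named in #1050; the value "true" is an assumption for illustration only.
    "spark.blaze.parseJsonError.fallback": "true",
}

print(" ".join(to_submit_args(blaze_conf)))
```

The same key/value pairs could equally be placed in spark-defaults.conf; passing them on the command line just keeps the experiment scoped to a single job.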

Improvements

  • Stability: Improved handling of stage retry on shuffle failures and memory spilling.
  • Modularity: Restructured codebase by extracting Celeborn, Uniffle, and Paimon into separate 3rdparty modules.
  • Observability: Improved logging with Thread IDs and enhanced Spark UI metrics for skew detection.
  • Uniffle Integration: Improved support and documentation for Uniffle shuffle manager.
  • Minor Performance Improvement: Optimized batch serde, array interleaving, and coalescing.
  • Build & CI: Enhanced build scripts, added ARM support, and streamlined the CI process.

Bug Fixes

  • Data Correctness: Fixed critical issues in join logic, value comparisons, and hash calculations.
  • Memory Leaks & Crashes: Resolved memory management issues and NPEs.
  • Execution Engine: Fixed errors in outer generate, UDTF execution, and Parquet sink tasks.
  • Integration: Corrected issues with 3rdparty systems like Celeborn and Uniffle.

NOTE: This release includes a significant number of performance optimizations, memory management improvements, bug fixes, and new features, with notable enhancements in shuffle management, execution engine optimization, and third-party integration. Some minor changes are not included in the above list; please see the commit list for more details.

What's Changed

  • [BLAZE-975] Fix duplicated shuffle data fetch under AQE Rebalance when using Uniffle in Blaze by @merrily01 in #976
  • [DOCS] Add Uniffle integration guide to README.md by @merrily01 in #987
  • improve OnHeapSpillManager concurrency by @richox in #990
  • refactor UDTF: fix sliced binary array + ffi error by @richox in #989
  • fix outer generate by @richox in #991
  • Fix ScalarValue::List comparison by @richox in #994
  • refactor AccGenericColumn for better data type inference and memory usage statistics. by @richox in #997
  • fix incorrect join: rewrite join keys to fit spark's LongHashedRelation by @richox in #1002
  • default SUGGEST_BATCH_MEM_SIZE to 8MB by @richox in #1005
  • refactor SortExec: improve performance and memory statistics by @richox in #1006
  • batch_serde improvements by @richox in #1007
  • OnHeapSpillManager improvement by @richox in #1008
  • disable netty off-heap memory usage in BlazeShuffleManager by @richox in #1009
  • update datafusion dep: cherry pick apache/arrow-rs#7422: improve take_bytes performance and reduce oom by @richox in #1000
  • [BLAZE-1010] Bump Paimon from 1.0.1 to 1.1.1 by @SteNicholas in #1011
  • add TID to logging by @richox in #1014
  • optimize array interleaving by @richox in #1019
  • add memory limit to process resident usage by @richox in #1018
  • fix SpillBuf bug when getting disk size of a closed file by @richox in #1020
  • support normalize nan and zero by @Flyangz in #1016
  • fix GetIndexedField nullable logic by @richox in #1022
  • Script Mode Modification to Reduce CI Time by @lihao712 in #1025
  • fix AccScalarValueColumn with null values by @diliulou in #1023
  • Disable native decimal binary operation by @harveyyue in #1030
  • diable parquet predicate pruning for decimal types by @richox in #1033
  • fix incorrect FirstIgnoresNull logic by @richox in #1034
  • supports stage retry when shuffle read failed by @richox in #1035
  • fix incorrect First & FirstIgnoresNull logic by @Flyangz in #1037
  • fix NPE when initializing non-deterministic UDF wrapper in driver side by @richox in #1039
  • do not convert unsupported aggregate functions to UDAF wrapper by @richox in #1040
  • add spark.blaze.decimal.arithOp.enabled, defaults to false by @richox in #1042
  • fix blaze decimal opt not fallback when arithOp set false as default by @xm0830 in #1045
  • fix NPE in UDAFWrapper.resize by @richox in #1043
  • fix HashJoin rewriteKeyExpr by @richox in #1044
  • update rust edition to 2024 by @diliulou in #1051
  • fix all warnings by @diliulou in #1052
  • format Arc to PhysicalExprRef by @diliulou in #1053
  • move all dependency versions to root cargo.toml by @diliulou in #1055
  • supports spark.blaze.parseJsonError.fallback by @diliulou in #1050
  • [BLAZE-1056] Bump Celeborn version from 0.5.4 to 0.6.0 by @SteNicholas in #1057
  • remove unused cargo deps by @diliulou in #1058
  • java.lang.NoClassDefFoundError on Spark local mode using --conf spark.jars by @XorSum in #1047
  • Exclude log4j-slf4j-impl introduced from rss-client-spark3-shaded by @wForget in #1064
  • Use foldLeft instead of map+sum in SparkUDAFWrapperContext by @cxzl25 in #1074
  • [BLAZE-1071][FOLLOWUP] Fix incorrect mapStatus to prevent Uniffle failures in Blaze by @merrily01 in #1072
  • [MINOR] Fix typo in bloom_filter_might_contain.rs by @merrily01 in #1076
  • Blaze Sort->MergeJoin reduces key row and column conversion by @eden123456789 in #1078
  • Spark UI SMJ skew by @cxzl25 in #1079
  • improve performance of CoalesceStream by @richox in #1081
  • scala-compiler/scala-reflect scope provided by @cxzl25 in #1082
  • [BLAZE-1085] Make native log level configurable by @wForget in #1086
  • Add spotless check and apply in reformat by @cxzl25 in #1083
  • Improve rust log format by @cxzl25 in #1084
  • Spark UI SHJ skew by @cxzl25 in #1088
  • ByteBuddy use contextClassLoader by @XorSum in #1087
  • Use foldLeft instead of map+sum by @XorSum in #1091
  • Fix modules relative path by @turboFei in #1090
  • Avoid traversing Project list by @cxzl25 in #1094
  • SortExec: use merge sort for long keys to reduce number of comparison by @richox in #1096
  • improve DS scan unsupported message by @cxzl25 in #1093
  • CI batch by @cxzl25 in #1102
  • [NIT] Fix some typos by @turboFei in #1099
  • [BLAZE-1100] Fallback shuffle exchange when RoundRobinPartitioning with unsupported MapType by @merrily01 in #1101
  • tokio thread name with tid by @cxzl25 in #1095
  • Support to build project with fixed maven version by @turboFei in #1089
  • Fix and refine build-native.sh by @turboFei in #1097
  • CI protoc token by @cxzl25 in #1103
  • Fix NativeShuffledHash not implemented error by @turboFei in #1098
  • Skip build native for dev/reformat by @turboFei in #1107
  • [BLAZE-1104] Make Parquet maxOverReadSize and metadataCacheSize configurable by @merrily01 in #1105
  • Remove unused dev/.scalafmt.conf by @turboFei in #1110
  • fix execution error in non-native parquet sink tasks by @richox in #1123
  • Move build-native.sh into mvn-build-helper folder by @turboFei in #1106
  • Support to enable/disable blaze for different SparkPlan types during runtime by @turboFei in #1109
  • Make io compression zstd level configurable by @turboFei in #1111
  • Remove log cast key value by @cxzl25 in #1118
  • Shade more packages by @turboFei in #1121
  • Reuse code for ensure jni bridge ...

v5.0.0

28 Apr 09:41
36cb159

New Feature

  • Supports UDAF fallback.
  • Supports native round-robin partitioner.
  • Supports native range partitioner.
  • Supports native WindowGroupLimitExec introduced in Spark-3.5.
  • Supports SHJ falling back to SMJ when the build side is too big.
  • Full support for the Apache Celeborn shuffle service.
  • Initial support for the Apache Uniffle shuffle service.
  • Initial support for the Apache Paimon datasource.

Improvement

  • Improved memory management in AggExec/SortMergeJoinExec, reducing the number of OOMs.
  • Improved metric statistics.

Bug fixes

  • Fixed inconsistent string to data casting.
  • Fixed inconsistent bloom filter join when bloom filter is generated by Spark.
  • Fixed incorrect sort ordering when writing tables with dynamic partitions.
  • Fixed inconsistent sha2x functions.
  • Fixed many bugs that might lead to query failures; see What's Changed.

What's Changed

  • release version v4.0.1 by @richox in #690
  • fix incorrect expression conversion: Days should be DayOfMonth by @richox in #691
  • fix ci: cache spark binaries by @richox in #696
  • Bump smallvec from 2.0.0-alpha.7 to 2.0.0-alpha.8 by @dependabot in #692
  • Bump prost from 0.13.3 to 0.13.4 by @dependabot in #688
  • Dev repartitioning by @gy11233 in #693
  • Add Blaze icon and issue navigation In IDEA by @cxzl25 in #699
  • [BLAZE-700] Minor nit fix for hyperlink by @merrily01 in #701
  • [BLAZE-706] Fix year/month/day functions data type by @wForget in #703
  • [BLAZE-704] Specify name for spark ext function by @wForget in #705
  • Fix some incorrect module name mapping in docker compose file by @harveyyue in #709
  • fix ci: trigger ci when opening PR by @richox in #711
  • Support native scan hive paimon cow table by @harveyyue in #708
  • Automatically use the protoc version downloaded by the maven plugin by @cxzl25 in #702
  • [BLAZE-707][FOLLOWUP] NativePaimonTableScanExec should use shimed PartitionedFile and min partition number by @SteNicholas in #713
  • fix ci: trigger ci when opening/changing PR by @richox in #714
  • [BLAZE-287][FOLLOWUP] BlazeCelebornShuffleWriter should use mapped shuffle id for rerunning stage of fetch failure by @SteNicholas in #712
  • Bump sonic-rs from 0.3.16 to 0.3.17 by @dependabot in #694
  • Bump smallvec from 2.0.0-alpha.8 to 2.0.0-alpha.9 by @dependabot in #698
  • Bump foldhash from 0.1.3 to 0.1.4 by @dependabot in #710
  • feat(spill): Align with the multi IO compression codec in spill by @zuston in #657
  • bug fixes by @richox in #717
  • Fix OrcScan reads missing data column by @ASiegeLion in #716
  • fix test failures by @richox in #720
  • [BLAZE-725] Bump Spark from 3.5.3 to 3.5.4 by @SteNicholas in #726
  • Fix MacOS compile by @cxzl25 in #724
  • Apply spotless by @cxzl25 in #728
  • Remove bug_report unnecessary information by @cxzl25 in #727
  • fix mvn build helper by @richox in #735
  • Duplicated project schema will cause index out of bounds exception in orc_exec by @harveyyue in #723
  • [BLAZE-729] Fix a typo in the Shebang line of the shell script by @merrily01 in #730
  • Fix orc map type entries field naming issue by @harveyyue in #732
  • Bump itertools from 0.13.0 to 0.14.0 by @dependabot in #733
  • close inactive issues by @richox in #738
  • [BLAZE-736] Write time should increment for mapperEnd in CelebornPart… by @HYBG-1126 in #739
  • fix ci: use huaweicloud mirror to download spark binaries by @richox in #742
  • Bump tempfile from 3.14.0 to 3.15.0 by @dependabot in #741
  • Bump async-trait from 0.1.83 to 0.1.84 by @dependabot in #740
  • fix performance issues by @richox in #743
  • [BLAZE-744] Bump Celeborn version from 0.5.2 to 0.5.3 by @SteNicholas in #745
  • Dev repartitioning by @gy11233 in #734
  • Bump Paimon version from 0.9.0 to 1.0.0 by @harveyyue in #751
  • [BLAZE-747] Enhance the ArrowFFIExporter.exportNextBatch method to execute conditionally by @merrily01 in #748
  • fix range repartitioning proto issue by @gy11233 in #752
  • Bump async-trait from 0.1.84 to 0.1.85 by @dependabot in #746
  • Bump uuid from 1.11.0 to 1.11.1 by @dependabot in #754
  • Bump tokio from 1.42.0 to 1.43.0 by @dependabot in #755
  • fix ci: update to actions/upload-artifact@v4 by @richox in #756
  • fix ci: update to actions/upload-artifact@v4 by @richox in #757
  • fix ci: add --all-opens for supporting jdk17 by @richox in #758
  • fix ci: use cached spark-bin directory to walk around permission denied issue by @richox in #766
  • fix-ci: use specified jdk version by @richox in #767
  • fix-ci: adjust memory configuration by @richox in #768
  • Bump uuid from 1.11.1 to 1.12.0 by @dependabot in #765
  • Bump log from 0.4.22 to 0.4.25 by @dependabot in #764
  • [BLAZE-747][FOLLOW-UP] Fix user changed in FFI NextBatch by @Flyangz in #769
  • Bump smallvec from 2.0.0-alpha.9 to 2.0.0-alpha.10 by @dependabot in #770
  • [BLAZE-762] Return null when log function input is negative by @wForget in #763
  • [BLAZE-760] Fallback shuffle exchange when range partitioning with unsupported type by @wForget in #761
  • supports falling back hash join to sort merge join when hash table is too big by @richox in #753
  • fix-ci: remote incorrect cache by @richox in #779
  • fix-ci: rust fmt by @richox in #780
  • Add comma to line in README file by @xleoken in #778
  • bug fixes by @richox in #777
  • [BLAZE-775] Support float type for sum function by @wForget in #776
  • [BLAZE-773] Support long type for floor function by @wForget in #774
  • fix-ci: pull_request_target -> pull_request by @richox in #782
  • fix build error and code style by @wForget in #781
  • Bump uuid from 1.12.0 to 1.12.1 by @dependabot in #783
  • [BLAZE-786] Mark big decimal value convertion as unsupported by @wForget in #787
  • use better aggregate OwnedKey construction by @richox in #784
  • [BLAZE-790] Support LZ4_RAW compression codec for parquet by @SteNicholas in #791
  • Add support of mac aarch64 for tpcds data generator by @zuston in #792
  • Automatic cancel previous CI tests when newly commit comes for per PR by @zuston in #794
  • Add support of pprof dump for rust execution by @zuston in #793
  • Bump tempfile from 3.15.0 to 3.16.0 by @dependabot in #802
  • Bump serde from 1.0.216 to 1.0.217 by @dependabot in #800
  • Bump poem from 1.3.59 to 3.1.6 by @dependabot in #799
  • Bump rand from 0.8.5 to 0.9.0 by @dependabot in #801
  • Bump bytes from 1.9.0 to 1.10.0 by @dependabot in #811
  • Bump async-trait from 0.1.85 to 0.1.86 by @dependabot in #810
  • Add support of Apache Uniffle for remote shuffle service by @zuston in #796
  • use separated thread in ffi exporter by @richox in #788
  • Fix the rootless-docker action failure when building the jar in github action by @zuston in #813
  • Bump uuid from 1.12.1 to 1.13.1 by @dependabot in #814
  • Add support of building native with --features by @zuston in #797
  • Add support of memory profile by @zuston in #798
  • [BLAZE-808] Support statistics of ExecutionPlan for WindowExec by @SteNicholas in #809
  • [BLAZE-805] Support statistics of ExecutionPlan for SortExec by @steni...

v4.0.1

10 Dec 07:08
02082df

New Feature

  • Initial support for the ORC input file format.
  • Initial support for the RSS framework and the Apache Celeborn shuffle service.

Improvement

  • Optimize AggExec by implementing columnar-based aggregation.
  • Use a custom hashmap implementation for aggregation.
  • Supports specialized count(0).
  • Optimize bloom filter by reusing same bloom filter in the same executor.
  • Optimize bloom filter by supporting shrinking.
  • Optimize reading parquet files by supporting parallel reading.
  • Improve spill file deletion logic.

Bug fixes

  • Fix file-not-found errors for paths with URL-encoded characters.
  • Fix HashAggregate conversion throwing ScalaReflectionException.
  • Fix pruning error while reading parquet files with multiple row groups.
  • Fix incorrect number of tasks due to missing shuffleOrigin.
  • Fix record batch creation error when hash joining with empty input.

Other

  • Upgrade datafusion/arrow dependency to v42/v53.
  • Replace gxhash with foldhash for better compatibility on some hardware.
  • Other minor improvements & fixes.

Full Changelog: v4.0.0...v4.0.1

v4.0.0

10 Oct 06:16

New features

  • supports spark3.0/3.1/3.2/3.3/3.4/3.5.
  • supports integrating with Apache Celeborn.
  • supports native ORC input format.
  • supports bloom filter join introduced in spark 3.5.
  • supports forceShuffledHashJoin for running tpch/tpcds benchmarks.
  • new supported native expression/functions: year, month, day, md5.

Bug fixes

  • add missing UDTF.terminate() invokes.
  • fix NPE while executing some native spark physical plans.

Performance

  • use custom implemented hash table for faster joining, supporting SIMD, bulk searching, memory prefetching, etc.
  • improve shuffle write performance.
  • reuse FSDataInputStream for same input file.

Full Changelog: v3.0.1...v4.0.0

v3.0.1

23 Jul 13:48
6f27604

blaze-v3.0.1

Features

  • Supports spark3.0/3.2/3.3.

Performance

  • fix GetJsonObject conversion, supporting faster get_json_object with sonic-rs.

Bugfix

  • fix childOrderingRequiredTag computation logic.

v3.0.0 [yanked]

01 Jul 07:46

blaze-v3.0.0 [yanked]

Features

  • Supports using spark.io.compression.codec for shuffle/broadcast compression
  • Supports date type casting
  • Refactor join implementations to support existence joins and BHJ building hash map on driver side
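
A minimal sketch of the first item, assuming the codec is chosen through Spark's standard spark.io.compression.codec setting: the snippet formats the setting as a spark-defaults.conf line. The helper to_defaults_lines is hypothetical, and zstd is only an example value; which codecs the native engine accepts in this release is not stated here.

```python
def to_defaults_lines(conf):
    """Format settings as `key  value` lines in the style of spark-defaults.conf."""
    width = max(len(key) for key in conf)
    return [f"{key.ljust(width)}  {value}" for key, value in conf.items()]


shuffle_conf = {
    # Standard Spark key; "zstd" is an example value, not a recommendation.
    "spark.io.compression.codec": "zstd",
}

for line in to_defaults_lines(shuffle_conf):
    print(line)
```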

Performance

  • Fixed performance issues when running on spark3 with default configurations
  • Use cached parquet metadata
  • Refactor native broadcast to avoid duplicated broadcast jobs
  • Supports spark333 batch shuffle reading

Bugfix

  • Fix in_list conversion in from_proto.rs

v2.0.9.1

11 May 03:49
4180741

release version 2.0.9.1 (#470)

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>

v2.0.9

11 Apr 08:49
3fc6838

v2.0.9

v2.0.8

02 Feb 12:40
1aecd9c

v2.0.8

v2.0.7

09 Nov 11:01
224697d

update blaze version 2.0.7-SNAPSHOT (#312)

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>