Add ChunkedSeq: chunk-based immutable sequence #795
najuna-brian wants to merge 2 commits into typelevel:master from
Conversation
ChunkedSeq is an immutable sequence backed by a balanced tree of array chunks, offering:

- O(1) amortized prepend/append
- O(log n) indexed access, take, drop, splitAt, updated
- O(n) iteration via chunk-aware iterators
- O(1) uncons/unsnoc
- O(1) size (cached)
- O(log n) concatenation

The internal representation stores elements in contiguous Array[Any] chunks (default size 32) at the leaves of a balanced binary tree. This provides good cache locality during iteration while maintaining persistent data structure semantics.

Includes:

- Full Cats typeclass instances (Monad, Alternative, Traverse, CoflatMap, FunctorFilter, Eq/Order, Monoid, Show)
- ScalaCheck Arbitrary/Cogen instances
- Comprehensive law-based and property-based test suite (158 tests)
- JMH benchmarks comparing against List and Vector

Addresses typelevel#634
3aa621a to 6e453bf
@najuna-brian I've only had a quick look so far, so this is a question on the changeset technicalities: did you copy the copyright headers as is from other sources here?
Yes @gemelen |
johnynek left a comment
Thank you for taking this on!
```scala
private[collections] def concatSafe[A](left: ChunkedSeq[A], right: ChunkedSeq[A]): ChunkedSeq[A] =
  if (left.isEmpty) right
  else if (right.isEmpty) left
  else new Concat(left, right, left.size + right.size)
```
this looks like it is the same as cats.data.Chain. What is the difference?
The idea I had in mind here was something like:
```scala
enum ChunkedSeq[A] {
  case Empty
  case NonEmpty[A](items: Array[Any], idx: Int, tail: ChunkedSeq[A])
}
```

(maybe just `Array[Any]`, `tail`, but benchmarks will show if allocating the same size array every time is a win).
So, to push on, we copy the items with one more on the front. So allocate one longer, put the new item at 0, and copy the rest starting at 1. That's until the items gets too long, in which case we allocate a new NonEmpty wrapper.
That was the idea at least. This will not give you a fast append, but it would be an O(1) prepend.
The theory for why this would work is each block of items gives you cache locality, so less pointer indirection. I would expect maybe a max size of something like 8 to 16 would be a good tradeoff between cache locality and the cost to allocate each time. But I'm just guessing.
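A minimal sketch of that scheme, under the stated assumptions (all names and the chunk bound are illustrative, not from this PR; the `idx` field is dropped for brevity):

```scala
// Hedged sketch of the prepend strategy described above: copy the head
// chunk with the new element at index 0, or start a fresh chunk node once
// the head chunk reaches MaxChunk elements. Names here are hypothetical.
object PrependSketch {
  final val MaxChunk = 16 // a guess; benchmarks would pick the real bound

  sealed trait CSeq[+A]
  case object Empty extends CSeq[Nothing]
  final case class NonEmpty[A](items: Array[Any], tail: CSeq[A]) extends CSeq[A]

  def prepend[A](a: A, cs: CSeq[A]): CSeq[A] =
    cs match {
      case NonEmpty(items, tail) if items.length < MaxChunk =>
        // allocate one longer, put the new item at 0, copy the rest from 1
        val next = new Array[Any](items.length + 1)
        next(0) = a
        System.arraycopy(items, 0, next, 1, items.length)
        NonEmpty(next, tail)
      case _ =>
        // empty sequence, or the head chunk is full: start a new chunk node
        NonEmpty(Array[Any](a), cs)
    }

  def toList[A](cs: CSeq[A]): List[A] =
    cs match {
      case Empty                 => Nil
      case NonEmpty(items, tail) => items.toList.asInstanceOf[List[A]] ::: toList(tail)
    }
}
```

This trades an O(chunk length) copy on most prepends for cache locality within each chunk, which is exactly the tradeoff the comment is weighing.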
Thank you @johnynek!
True! The current Concat tree structure is essentially Chain with array leaves.
I'll rework the implementation to match your original vision: a linked list of fixed-size array chunks (NonEmpty(items, idx, tail)) focused on O(1) prepend and cache-friendly iteration.
And for the chunk size, should I start with 16 and let benchmarks guide the final value?
To be sure....
On prepend semantics: when the current array is full, we allocate a new NonEmpty node with a fresh array containing just the new element. When it's not full, we copy the array with the new element at the front?
Am I getting it right?
```scala
    while (i < n) { cs = cs :+ i; i += 1 }
    bh.consume(cs)
  }
}
```
can you post the results of running this benchmark?
Here are the benchmark results for the current (tree-of-chunks) implementation:
Environment: Linux, JDK 21.0.10 (OpenJDK), JMH 1.37, 3 warmup / 5 measurement iterations, 1 fork
```text
Benchmark (n) Mode Cnt Score Error Units
ChunkedSeqBench.prependList 100 thrpt 5 1404269.542 ± 319406.301 ops/s
ChunkedSeqBench.prependChunkedSeq 100 thrpt 5 260304.083 ± 99441.675 ops/s
ChunkedSeqBench.prependList 1000 thrpt 5 92515.375 ± 16912.586 ops/s
ChunkedSeqBench.prependChunkedSeq 1000 thrpt 5 26580.504 ± 4959.982 ops/s
ChunkedSeqBench.prependList 10000 thrpt 5 9157.360 ± 1389.371 ops/s
ChunkedSeqBench.prependChunkedSeq 10000 thrpt 5 2683.471 ± 658.784 ops/s
ChunkedSeqBench.appendVector 100 thrpt 5 106538.875 ± 30318.740 ops/s
ChunkedSeqBench.appendChunkedSeq 100 thrpt 5 222856.981 ± 56341.714 ops/s
ChunkedSeqBench.appendVector 1000 thrpt 5 10952.448 ± 1249.280 ops/s
ChunkedSeqBench.appendChunkedSeq 1000 thrpt 5 18305.411 ± 3957.131 ops/s
ChunkedSeqBench.appendVector 10000 thrpt 5 1269.500 ± 522.176 ops/s
ChunkedSeqBench.appendChunkedSeq 10000 thrpt 5 2053.364 ± 545.453 ops/s
ChunkedSeqBench.sumList 100 thrpt 5 2607832.084 ± 269022.770 ops/s
ChunkedSeqBench.sumChunkedSeq 100 thrpt 5 1123287.645 ± 132675.176 ops/s
ChunkedSeqBench.sumChunkedSeqFromPrepend 100 thrpt 5 284501.765 ± 38095.094 ops/s
ChunkedSeqBench.sumList 1000 thrpt 5 254294.620 ± 40244.658 ops/s
ChunkedSeqBench.sumChunkedSeq 1000 thrpt 5 116728.143 ± 9345.607 ops/s
ChunkedSeqBench.sumChunkedSeqFromPrepend 1000 thrpt 5 27174.670 ± 4925.705 ops/s
ChunkedSeqBench.sumList 10000 thrpt 5 24305.200 ± 2782.436 ops/s
ChunkedSeqBench.sumChunkedSeq 10000 thrpt 5 10597.345 ± 2107.648 ops/s
ChunkedSeqBench.sumChunkedSeqFromPrepend 10000 thrpt 5 2671.329 ± 905.274 ops/s
ChunkedSeqBench.randomAccessVector 100 thrpt 5 435440.887 ± 35263.835 ops/s
ChunkedSeqBench.randomAccessChunkedSeq 100 thrpt 5 224730.551 ± 38789.073 ops/s
ChunkedSeqBench.randomAccessList 100 thrpt 5 94615.869 ± 16056.867 ops/s
ChunkedSeqBench.randomAccessVector 1000 thrpt 5 420804.935 ± 33169.638 ops/s
ChunkedSeqBench.randomAccessChunkedSeq 1000 thrpt 5 157360.262 ± 32886.195 ops/s
ChunkedSeqBench.randomAccessList 1000 thrpt 5 7131.094 ± 1797.918 ops/s
ChunkedSeqBench.randomAccessVector 10000 thrpt 5 355877.005 ± 37412.779 ops/s
ChunkedSeqBench.randomAccessChunkedSeq 10000 thrpt 5 105170.238 ± 16941.927 ops/s
ChunkedSeqBench.randomAccessList 10000 thrpt 5 664.834 ± 61.015 ops/s
```
Thanks @johnynek, the current Concat-tree design doesn't beat List on prepend or iteration 😞, which is the whole point. The tree overhead negates the cache locality benefit (especially visible in sumChunkedSeqFromPrepend, where the tree built by repeated :: is 9× slower than List's sum).
I'll rework the implementation to the linked-list-of-arrays design you described and re-run these benchmarks.
```scala
var rights: ChunkedSeq[A] = ChunkedSeq.empty
var result: (A, ChunkedSeq[A]) = null
while (result eq null) {
  if (current.isInstanceOf[ChunkedSeq.Chunk[_]]) {
```
I'm not a maintainer, but am on the mail thread that led to this PR, hence my drive-by commenting here...
(Feel free to show me the door if I shouldn't be butting in. 😁)
Why are there so many uses of .asInstanceOf[<subclass of ChunkedSeq>] throughout this code in cascaded if-else statements?
Wouldn't it be more natural and readable to use pattern matching?
Speaking of pattern matching, would it be easier to make Chunk and Concat into case classes and EmptyNode into a case object?
What was the motivation for using the old Java-style approach?
Was this code automatically translated from another language or generated / worked on by an LLM, or was there some specific reason for using this style?
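For concreteness, here is what the two styles look like side by side on a toy version of the node hierarchy (a sketch with hypothetical names, not the PR's actual code):

```scala
// Toy hierarchy mirroring EmptyNode / Chunk / Concat, to contrast the styles.
object StyleDemo {
  sealed trait Node
  case object EmptyNode extends Node
  final case class Chunk(items: Array[Any]) extends Node
  final case class Concat(left: Node, right: Node) extends Node

  // Java-style type tests and casts, as in the current changeset:
  def sizeCasts(n: Node): Int =
    if (n.isInstanceOf[Chunk]) n.asInstanceOf[Chunk].items.length
    else if (n.isInstanceOf[Concat]) {
      val c = n.asInstanceOf[Concat]
      sizeCasts(c.left) + sizeCasts(c.right)
    } else 0

  // The same logic as a pattern match; on a sealed hierarchy the compiler
  // also checks the match for exhaustiveness:
  def sizeMatch(n: Node): Int =
    n match {
      case Chunk(items)        => items.length
      case Concat(left, right) => sizeMatch(left) + sizeMatch(right)
      case EmptyNode           => 0
    }
}
```

The PR description says the cast style follows TreeList and HashMap for performance; a benchmark comparing the two variants would settle whether the match extraction overhead actually matters here.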
Description
Summary
This PR adds `ChunkedSeq[+A]`, an immutable sequence backed by a balanced tree of contiguous array chunks. It fills a gap in the collections ecosystem by combining O(1) concatenation (like `Chain`) with cache-friendly O(N) iteration (like `Vector`) and O(log N) indexed access.

Addresses #634
Motivation
Existing immutable sequences force a trade-off between `List`, `Vector`, and `Chain`. `ChunkedSeq` is useful when you need to build a sequence cheaply via prepend/append/concat (e.g. collecting results from multiple sources) and then consume it via iteration or indexed lookup.

Design
Internally, `ChunkedSeq` is a `sealed abstract class` with three node types:

- `EmptyNode`: the singleton empty sequence
- `Chunk`: a contiguous `Array[Any]` slice holding up to 32 elements
- `Concat`: a binary tree node pairing two subtrees with a cached total size

Elements live in bounded-size array chunks at the leaves. The balanced tree spine keeps structural operations at O(log N), while sequential iteration visits contiguous memory within each chunk for good cache behavior. All operations are stack-safe.
Complexity
- `::` / `:+` / `++`: a new `Concat` node; no copying
- `uncons` / `unsnoc`: O(1)
- `get` / `getUnsafe`: O(log N) `@tailrec` tree walk
- `take` / `drop` / `updated`: O(log N)
- `foldLeft` / `toIterator` / `map`: O(N)
- `size` / `isEmpty`: O(1)

Cats typeclass instances
Provided with proper implicit priority via `ChunkedSeqInstances0` / `ChunkedSeqInstances1`:

- `Monad`, `Alternative`, `Traverse`, `CoflatMap`, `FunctorFilter`
- `Eq` / `PartialOrder` / `Order` (when the element type supports it)
- `Monoid`, `Show`

Implementation patterns
Follows existing conventions from `TreeList` and `HashMap`:

- `sealed abstract class` with `private[collections]` internal subtypes
- `isInstanceOf` / `asInstanceOf` over pattern matching for performance
- `@tailrec` and explicit stacks for stack safety
- `Array[Any]` + `System.arraycopy` for cache-friendly internals
- structural sharing (e.g. `filter` returns `this` when nothing is removed)

What's included
- `core/.../ChunkedSeq.scala`
- `scalacheck/.../ArbitraryChunkedSeq.scala`: `Arbitrary` and `Cogen` instances for property-based testing
- `tests/.../ChunkedSeqSuite.scala`
- `bench/.../ChunkedSeqBench.scala`: JMH benchmarks against `List` and `Vector`
- `docs/chunkedseq.md`: `scala mdoc` examples
- `docs/directory.conf`

Testing
158 tests, all passing across Scala 2.12, 2.13, and 3.
- `checkAll` laws: `MonadTests`, `AlternativeTests`, `TraverseTests`, `CoflatMapTests`, `FunctorFilterTests`, `OrderTests`, `MonoidTests`
- comparison against `List`: every core operation (`map`, `flatMap`, `filter`, `foldLeft`, `take`, `drop`, `get`, `uncons`, `unsnoc`, `reverse`, `updated`, etc.) is verified to produce the same results as the equivalent `List` operation
- construction via `fromList` and repeated prepend

Verification
- `scalafmt` clean
- `docs/mdoc` compiles with 0 errors
- `validateJVM` passes: 944 total tests (all existing + ChunkedSeq)