Add ChunkedSeq: chunk-based immutable sequence#795

Open
najuna-brian wants to merge 2 commits into typelevel:master from najuna-brian:add-chunked-seq

Conversation


@najuna-brian najuna-brian commented Feb 10, 2026

Description

Summary

This PR adds ChunkedSeq[+A], an immutable sequence backed by a balanced tree of contiguous array chunks. It fills a gap in the collections ecosystem by combining O(1) concatenation (like Chain) with cache-friendly O(N) iteration (like Vector) and O(log N) indexed access.
Addresses #634

Motivation

Existing immutable sequences force a trade-off:

| Structure | Prepend | Append | Concat | Index | Iteration cache locality |
|---|---|---|---|---|---|
| `List` | O(1) | O(N) | O(N) | O(N) | Poor (pointer chasing) |
| `Vector` | O(~1) | O(~1) | O(N) | O(~1) | Good |
| `Chain` | O(1) | O(1) | O(1) | O(N) | Poor (unbalanced tree) |
| `ChunkedSeq` | O(1) | O(1) | O(1) | O(log N) | Good (array chunks) |

ChunkedSeq is useful when you need to build a sequence cheaply via prepend/append/concat (e.g. collecting results from multiple sources) and then consume it via iteration or indexed lookup.

Design

Internally, ChunkedSeq is a sealed abstract class with three node types:

  • EmptyNode: the singleton empty sequence
  • Chunk: a contiguous Array[Any] slice holding up to 32 elements
  • Concat: a binary tree node pairing two subtrees with a cached total size

Elements live in bounded-size array chunks at the leaves. The balanced tree spine keeps structural operations at O(log N), while sequential iteration visits contiguous memory within each chunk for good cache behavior. All operations are stack-safe.
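A minimal sketch of that hierarchy (using case classes here for readability; the PR itself uses plain `private[collections]` classes with `isInstanceOf` dispatch, as noted under implementation patterns):

```scala
// Simplified sketch of the three node types described above; not the PR code.
sealed abstract class ChunkedSeq[+A] {
  def size: Int
  def isEmpty: Boolean = size == 0
}

object ChunkedSeq {
  // Singleton empty sequence.
  case object EmptyNode extends ChunkedSeq[Nothing] {
    def size: Int = 0
  }

  // A contiguous slice of an Array[Any], holding up to 32 elements.
  final case class Chunk[+A](values: Array[Any], from: Int, until: Int)
      extends ChunkedSeq[A] {
    def size: Int = until - from
  }

  // Binary tree node caching the total size of both subtrees; the cached
  // size is what makes `size` O(1) and concatenation allocation-only.
  final case class Concat[+A](left: ChunkedSeq[A], right: ChunkedSeq[A], size: Int)
      extends ChunkedSeq[A]
}
```

Concatenation then just wraps both sides in a `Concat` with the summed size, with no copying of chunk contents.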

Complexity

| Operation | Complexity | Notes |
|---|---|---|
| `::` / `:+` / `++` | O(1) | Creates a `Concat` node; no copying |
| `uncons` / `unsnoc` | O(log N) amortized | Iterative, stack-safe |
| `get` / `getUnsafe` | O(log N) | `@tailrec` tree walk |
| `take` / `drop` / `updated` | O(log N) | Shares unaffected subtrees |
| `foldLeft` / `toIterator` / `map` | O(N) | Explicit stack; visits chunks contiguously |
| `size` / `isEmpty` | O(1) | Cached at each node |

Cats typeclass instances

Provided with proper implicit priority via ChunkedSeqInstances0 / ChunkedSeqInstances1:

  • Monad, Alternative, Traverse, CoflatMap, FunctorFilter
  • Eq / PartialOrder / Order (when the element type supports it)
  • Monoid, Show

Implementation patterns

Follows existing conventions from TreeList and HashMap:

  • sealed abstract class with private[collections] internal subtypes
  • isInstanceOf / asInstanceOf over pattern matching for performance
  • @tailrec and explicit stacks for stack safety
  • Array[Any] + System.arraycopy for cache-friendly internals
  • Structure-preserving returns (filter returns this when nothing is removed)
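As a hypothetical illustration of the last point (`Box` is not from the PR), returning `this` when nothing was removed avoids allocation and lets callers detect "no change" by reference equality:

```scala
// Structure-preserving filter: if the predicate removes nothing,
// hand back the existing instance instead of rebuilding it.
final class Box[A](val values: Vector[A]) {
  def filter(p: A => Boolean): Box[A] = {
    val kept = values.filter(p)
    if (kept.length == values.length) this // nothing removed: reuse this instance
    else new Box(kept)
  }
}
```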

What's included

| File | Lines | Purpose |
|---|---|---|
| `core/.../ChunkedSeq.scala` | 982 | Core data structure, companion object, typeclass instances |
| `scalacheck/.../ArbitraryChunkedSeq.scala` | 35 | `Arbitrary` and `Cogen` instances for property-based testing |
| `tests/.../ChunkedSeqSuite.scala` | 311 | 158 tests: law checking + property-based homomorphism tests |
| `bench/.../ChunkedSeqBench.scala` | 149 | JMH benchmarks vs `List` and `Vector` |
| `docs/chunkedseq.md` | 137 | Documentation page with scala mdoc examples |
| `docs/directory.conf` | +1 | Navigation entry for the new page |

Testing

158 tests, all passing across Scala 2.12, 2.13, and 3.

  • Law tests via checkAll: MonadTests, AlternativeTests, TraverseTests, CoflatMapTests, FunctorFilterTests, OrderTests, MonoidTests
  • Property-based homomorphism tests against List: every core operation (map, flatMap, filter, foldLeft, take, drop, get, uncons, unsnoc, reverse, updated, etc.) is verified to produce the same results as the equivalent List operation
  • Stack safety tests: sequences of 10,000+ elements built via both fromList and repeated prepend
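The homomorphism style can be sketched without ScalaCheck; in this hand-rolled stand-in, `Vector` plays the role of `ChunkedSeq` and a seeded `Random` plays the role of `forAll`:

```scala
import scala.util.Random

// Sketch of homomorphism testing: for random inputs, the structure under
// test must agree with the equivalent List operation. The PR does the same
// with ScalaCheck's forAll and a real ChunkedSeq.
def checkAgainstList(trials: Int): Unit = {
  val rnd = new Random(42)
  (1 to trials).foreach { _ =>
    val xs = List.fill(rnd.nextInt(50))(rnd.nextInt(100))
    val sut = xs.toVector // stands in for ChunkedSeq.fromList
    assert(sut.map(_ + 1).toList == xs.map(_ + 1))
    assert(sut.filter(_ % 2 == 0).toList == xs.filter(_ % 2 == 0))
    assert(sut.take(3).toList == xs.take(3))
    assert(sut.drop(3).toList == xs.drop(3))
    assert(sut.foldLeft(0)(_ + _) == xs.foldLeft(0)(_ + _))
  }
}
```

Each operation is verified by round-tripping through `toList` and comparing against the `List` result directly.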

Verification

  • Compiles on Scala 2.12, 2.13, and 3
  • scalafmt clean
  • MiMa passes (new additions only)
  • docs/mdoc compiles with 0 errors
  • validateJVM passes — 944 total tests (all existing + ChunkedSeq)

ChunkedSeq is an immutable sequence backed by a balanced tree of
array chunks, offering:

- O(1) amortized prepend/append
- O(log n) indexed access, take, drop, splitAt, updated
- O(n) iteration via chunk-aware iterators
- O(1) uncons/unsnoc
- O(1) size (cached)
- O(log n) concatenation

The internal representation stores elements in contiguous Array[Any]
chunks (default size 32) at the leaves of a balanced binary tree.
This provides good cache locality during iteration while maintaining
persistent data structure semantics.

Includes:
- Full Cats typeclass instances (Monad, Alternative, Traverse,
  CoflatMap, FunctorFilter, Eq/Order, Monoid, Show)
- ScalaCheck Arbitrary/Cogen instances
- Comprehensive law-based and property-based test suite (158 tests)
- JMH benchmarks comparing against List and Vector

Addresses typelevel#634
@najuna-brian najuna-brian marked this pull request as ready for review February 10, 2026 11:44
@gemelen
Collaborator

gemelen commented Feb 10, 2026

@najuna-brian I only had a quick look yet, so it's a question on the changeset technicalities - did you copy the copyright headers as is from other sources here?

@najuna-brian
Author

Yes @gemelen,
I copied the headers from the existing source files in typelevel (e.g. TreeList.scala, HashMap.scala).
I did this to match the MIT license declared in build.sbt and so the files are auto-verified by the headerCheckAll sbt task.
Is there something I should have taken into consideration?

Contributor

@johnynek johnynek left a comment


Thank you for taking this on!

```scala
private[collections] def concatSafe[A](left: ChunkedSeq[A], right: ChunkedSeq[A]): ChunkedSeq[A] =
  if (left.isEmpty) right
  else if (right.isEmpty) left
  else new Concat(left, right, left.size + right.size)
```
Contributor


this looks like it is the same as cats.data.Chain. What is the difference?

The idea I had in mind here was something like:

```scala
enum ChunkedSeq[A] {
  case Empty
  case NonEmpty[A](items: Array[Any], idx: Int, tail: ChunkedSeq[A])
}
```

(maybe just Array[Any], tail, but benchmarks will show if allocating the same size array every time is a win).

So, to push on an element, we copy items with one more slot at the front: allocate an array one longer, put the new item at index 0, and copy the rest starting at index 1. That continues until items gets too long, at which point we allocate a new NonEmpty wrapper.

That was the idea at least. This will not give you a fast append, but it would be an O(1) prepend.

The theory for why this would work is each block of items gives you cache locality, so less pointer indirection. I would expect maybe a max size of something like 8 to 16 would be a good tradeoff between cache locality and the cost to allocate each time. But I'm just guessing.
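A sketch of that push logic, assuming a max chunk size of 16 and rendered as a sealed hierarchy rather than a Scala 3 enum (so it also compiles on 2.13); the names `MaxChunk` and `prepend` are illustrative, not from the PR:

```scala
// Linked list of array chunks, as suggested above (idx omitted for brevity).
sealed trait ChunkedSeq[+A]
case object Empty extends ChunkedSeq[Nothing]
final case class NonEmpty[A](items: Array[Any], tail: ChunkedSeq[A]) extends ChunkedSeq[A]

val MaxChunk = 16

def prepend[A](a: A, cs: ChunkedSeq[A]): ChunkedSeq[A] = cs match {
  case Empty =>
    NonEmpty(Array[Any](a), Empty)
  case ne @ NonEmpty(items, _) if items.length >= MaxChunk =>
    // Head chunk is full: start a fresh single-element chunk in front.
    NonEmpty(Array[Any](a), ne)
  case NonEmpty(items, tail) =>
    // Copy the head chunk one element longer, new element at index 0.
    val next = new Array[Any](items.length + 1)
    next(0) = a
    System.arraycopy(items, 0, next, 1, items.length)
    NonEmpty(next, tail)
}
```

Each prepend copies at most `MaxChunk` elements, so the cost per push is bounded by the chunk size while iteration still walks contiguous arrays.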

Author


Thank you @johnynek!
True: the current Concat tree structure is essentially Chain with array leaves.
I'll rework the implementation to match your original vision: a linked list of fixed-size array chunks (NonEmpty(items, idx, tail)) focused on O(1) prepend and cache-friendly iteration.

Author


And for the chunk size, should I start with 16 and let benchmarks guide the final value?

Author


To be sure: on prepend semantics, when the current array is full, we allocate a new NonEmpty node with a fresh array containing just the new element; when it's not full, we copy the array with the new element at the front.
Am I getting that right?

```scala
while (i < n) { cs = cs :+ i; i += 1 }
bh.consume(cs)
}
}
```
Contributor


can you post the results of running this benchmark?

Author


Here are the benchmark results for the current (tree-of-chunks) implementation:

Environment: Linux, JDK 21.0.10 (OpenJDK), JMH 1.37, 3 warmup / 5 measurement iterations, 1 fork

```text
Benchmark                                   (n)   Mode  Cnt        Score        Error  Units
ChunkedSeqBench.prependList                  100  thrpt    5  1404269.542 ± 319406.301  ops/s
ChunkedSeqBench.prependChunkedSeq            100  thrpt    5   260304.083 ±  99441.675  ops/s
ChunkedSeqBench.prependList                 1000  thrpt    5    92515.375 ±  16912.586  ops/s
ChunkedSeqBench.prependChunkedSeq           1000  thrpt    5    26580.504 ±   4959.982  ops/s
ChunkedSeqBench.prependList                10000  thrpt    5     9157.360 ±   1389.371  ops/s
ChunkedSeqBench.prependChunkedSeq          10000  thrpt    5     2683.471 ±    658.784  ops/s

ChunkedSeqBench.appendVector                 100  thrpt    5   106538.875 ±  30318.740  ops/s
ChunkedSeqBench.appendChunkedSeq             100  thrpt    5   222856.981 ±  56341.714  ops/s
ChunkedSeqBench.appendVector                1000  thrpt    5    10952.448 ±   1249.280  ops/s
ChunkedSeqBench.appendChunkedSeq            1000  thrpt    5    18305.411 ±   3957.131  ops/s
ChunkedSeqBench.appendVector               10000  thrpt    5     1269.500 ±    522.176  ops/s
ChunkedSeqBench.appendChunkedSeq           10000  thrpt    5     2053.364 ±    545.453  ops/s

ChunkedSeqBench.sumList                      100  thrpt    5  2607832.084 ± 269022.770  ops/s
ChunkedSeqBench.sumChunkedSeq                100  thrpt    5  1123287.645 ± 132675.176  ops/s
ChunkedSeqBench.sumChunkedSeqFromPrepend     100  thrpt    5   284501.765 ±  38095.094  ops/s
ChunkedSeqBench.sumList                     1000  thrpt    5   254294.620 ±  40244.658  ops/s
ChunkedSeqBench.sumChunkedSeq               1000  thrpt    5   116728.143 ±   9345.607  ops/s
ChunkedSeqBench.sumChunkedSeqFromPrepend    1000  thrpt    5    27174.670 ±   4925.705  ops/s
ChunkedSeqBench.sumList                    10000  thrpt    5    24305.200 ±   2782.436  ops/s
ChunkedSeqBench.sumChunkedSeq              10000  thrpt    5    10597.345 ±   2107.648  ops/s
ChunkedSeqBench.sumChunkedSeqFromPrepend   10000  thrpt    5     2671.329 ±    905.274  ops/s

ChunkedSeqBench.randomAccessVector           100  thrpt    5   435440.887 ±  35263.835  ops/s
ChunkedSeqBench.randomAccessChunkedSeq       100  thrpt    5   224730.551 ±  38789.073  ops/s
ChunkedSeqBench.randomAccessList             100  thrpt    5    94615.869 ±  16056.867  ops/s
ChunkedSeqBench.randomAccessVector          1000  thrpt    5   420804.935 ±  33169.638  ops/s
ChunkedSeqBench.randomAccessChunkedSeq      1000  thrpt    5   157360.262 ±  32886.195  ops/s
ChunkedSeqBench.randomAccessList            1000  thrpt    5     7131.094 ±   1797.918  ops/s
ChunkedSeqBench.randomAccessVector         10000  thrpt    5   355877.005 ±  37412.779  ops/s
ChunkedSeqBench.randomAccessChunkedSeq     10000  thrpt    5   105170.238 ±  16941.927  ops/s
ChunkedSeqBench.randomAccessList           10000  thrpt    5      664.834 ±     61.015  ops/s
```

Thanks @johnynek. The current Concat-tree design doesn't beat List on prepend or iteration 😞, which is the whole point. The tree overhead negates the cache-locality benefit (especially visible in sumChunkedSeqFromPrepend, where summing the tree built by repeated :: is roughly 9× slower than summing a List).

I'll rework the implementation to the linked-list-of-arrays design you described and re-run these benchmarks.

```scala
var rights: ChunkedSeq[A] = ChunkedSeq.empty
var result: (A, ChunkedSeq[A]) = null
while (result eq null) {
  if (current.isInstanceOf[ChunkedSeq.Chunk[_]]) {
```

@sageserpent-open sageserpent-open Feb 13, 2026


I'm not a maintainer, but am on the mail thread that led to this PR, hence my drive-by commenting here...

(Feel free to show me the door if I shouldn't be butting in. 😁)

Why are there so many uses of .asInstanceOf[<subclass of ChunkedSeq>] throughout this code in cascaded if-else statements?

Wouldn't it be more natural and readable to use pattern matching?

Speaking of pattern matching, would it be easier to make Chunk and Concat into case classes and EmptyNode into a case object?

What was the motivation for using the old Java-style approach?

Was this code automatically translated from another language or generated / worked on by an LLM, or was there some specific reason for using this style?

