Speed up line index construction #802

reese · 2026-01-18T11:24:19Z

At the start of each file, we construct a LineIndex to record the byte offset at which each line starts, and we also determine which lines contain Ruby code rather than only comments or whitespace. Each pass is expensive because it iterates byte-by-byte over the entire source, and we currently do two passes where one would suffice.

This PR collapses those into a single pass. It also changes lines_with_ruby from a BTreeSet to a sorted Vec. A BTreeSet doesn't make much sense here: lookups are O(log n) either way (Vec + binary_search), but every BTreeSet insertion is also O(log n), whereas appending to a Vec in line order is amortized O(1). We could use a HashSet, but unless we're formatting 100k-line Ruby files, the hashing overhead seems to outweigh the difference in lookup performance -- Vec is better for almost all normal usage.
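To illustrate the trade-off, here is a minimal sketch of a sorted Vec used as a set. The `LinesWithRuby` wrapper and its method names are hypothetical, not the project's actual types; the sketch assumes line numbers are added in ascending order, which is what a single front-to-back pass over the source would produce.

```rust
/// Hypothetical wrapper around a sorted Vec of line numbers.
/// Assumption: lines are pushed in ascending order, so no explicit sort is needed.
struct LinesWithRuby {
    lines: Vec<usize>,
}

impl LinesWithRuby {
    fn new() -> Self {
        Self { lines: Vec::new() }
    }

    /// Amortized O(1) append. Debug builds check the ascending-order assumption.
    fn push(&mut self, line: usize) {
        debug_assert!(self.lines.last().map_or(true, |&last| last < line));
        self.lines.push(line);
    }

    /// O(log n) membership test via binary search over the sorted Vec.
    fn contains(&self, line: usize) -> bool {
        self.lines.binary_search(&line).is_ok()
    }
}
```

Lookups cost the same order as `BTreeSet::contains`, but the data lives in one contiguous allocation and each insertion is a plain `push`.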

This also pulls in the memchr crate, which uses SIMD-accelerated routines for byte searches -- it seems to be a few times faster than a standard .iter() for our use case of searching for newline characters. memchr is already a transitive dependency of several crates we use, so this adds very little to compilation time.
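As a rough sketch of the single-pass idea (not the code in this PR), the function below records line start offsets and code-bearing lines in one walk over the bytes. `LineIndexSketch`, `build_line_index`, and `has_code` are hypothetical names, and the "has code" check is deliberately simplistic: it ignores `=begin`/`=end` blocks, heredocs, and similar cases. It uses only the standard library; with the memchr crate, the newline search would be `memchr::memchr(b'\n', &source[start..])` instead.

```rust
/// Hypothetical result type; field names are illustrative only.
#[derive(Debug, Default, PartialEq)]
struct LineIndexSketch {
    /// Byte offset at which each line starts.
    line_starts: Vec<usize>,
    /// 1-based numbers of lines containing something other than
    /// whitespace or a `#` comment (simplified check, see `has_code`).
    lines_with_ruby: Vec<usize>,
}

/// Builds both vectors in a single pass over `source`.
fn build_line_index(source: &[u8]) -> LineIndexSketch {
    let mut index = LineIndexSketch::default();
    let mut start = 0;
    let mut line_number = 1;
    loop {
        // std-only newline search; memchr::memchr(b'\n', ..) is a drop-in here.
        let newline = source[start..].iter().position(|&b| b == b'\n');
        let end = newline.map_or(source.len(), |offset| start + offset);

        index.line_starts.push(start);
        if has_code(&source[start..end]) {
            // Lines are visited in order, so this Vec stays sorted.
            index.lines_with_ruby.push(line_number);
        }

        match newline {
            Some(_) => {
                start = end + 1;
                line_number += 1;
            }
            None => break,
        }
    }
    index
}

/// Assumption for illustration: a line has code if its first
/// non-whitespace byte exists and is not `#`.
fn has_code(line: &[u8]) -> bool {
    line.iter()
        .find(|b| !b.is_ascii_whitespace())
        .map_or(false, |&b| b != b'#')
}
```

Because each line is classified as soon as its end is found, the source is never scanned a second time.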

In profiling data, line index construction previously took about 3% of total CPU time on average, and this PR cuts that at least in half on my laptop.

froydnj

I was looking at this last night and was going to write up a Prism patch that exposes the newline vector, which will probably make this faster? But until we have that information, this is an excellent fix. (Prism release schedules are also not conducive to getting Rust crate changes out quickly.)

reese · 2026-01-18T12:05:58Z

Yeah that would be an excellent addition to the Prism bindings -- it's always felt a little silly that we have to do this, so it'd be great to have that done during parsing for us.

reese added 2 commits January 18, 2026 10:47

Add memchr crate

9d4fedd

Speed up line index construction

cf3c08c

froydnj approved these changes Jan 18, 2026


reese merged commit 5290e84 into trunk Jan 18, 2026
8 checks passed

reese deleted the reese-comments-single-pass branch January 18, 2026 12:06
