fix: crash when parsing heredocs with identifiers >= 256 chars#287

Open
look wants to merge 1 commit into tree-sitter:master from look:look/fix-heredoc-crash

@look commented Feb 10, 2026

I identified a crash in GitHub's symbol extraction system and traced it to tree-sitter-ruby. It is triggered by very long heredoc identifiers (which are legal in Ruby; I checked).

The heredoc word length was serialized as a single byte, which silently truncated identifiers of >= 256 characters. The deserializer would then read the wrong length, leaving unread bytes in the buffer, and hit the assert(size == length) check.
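
A minimal sketch of the failure mode, for illustration only. The helpers serialize_word_u8 and deserialize_word_u8 are hypothetical, simplified stand-ins for the scanner's serialize/deserialize (they handle just one word, not the full scanner state). The length byte wraps modulo 256, but all of the word's bytes are still written:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the old scheme: the word length is stored in
 * one byte, so any length >= 256 is silently reduced modulo 256. */
static unsigned serialize_word_u8(char *buffer, const char *word, uint32_t len) {
  unsigned size = 0;
  buffer[size++] = (char)(uint8_t)len; /* truncation happens here */
  memcpy(&buffer[size], word, len);    /* ...but all len bytes are written */
  size += len;
  return size;                         /* bytes written */
}

/* Reads the word back using the (possibly truncated) length byte and
 * returns how many bytes it consumed. */
static unsigned deserialize_word_u8(const char *buffer, char *out, uint32_t *out_len) {
  unsigned size = 0;
  uint8_t len = (uint8_t)buffer[size++];
  memcpy(out, &buffer[size], len);
  size += len;
  *out_len = len;
  return size;                         /* bytes consumed */
}
```

With a 300-character word, 301 bytes are written but only 45 are consumed (300 mod 256 = 44, plus the length byte). Those leftover bytes are what the real deserializer's assert(size == length) catches.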

To fix this, I adopted the approach used in tree-sitter-bash and tree-sitter-php: store the heredoc identifier length in a uint32_t.
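
A sketch of the widened encoding, under stated assumptions: the helper names are hypothetical (not functions in scanner.c), and the length is assumed to be copied into the buffer with memcpy as a native-endian uint32_t, which sidesteps unaligned writes. The final assert mirrors the invariant the scanner checks:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: write the length as a full uint32_t, then the word. */
static unsigned serialize_word_u32(char *buffer, const char *word, uint32_t len) {
  unsigned size = 0;
  memcpy(&buffer[size], &len, sizeof(uint32_t));
  size += sizeof(uint32_t);
  memcpy(&buffer[size], word, len);
  size += len;
  return size;
}

/* Reads the length back the same way; `length` is the total number of
 * bytes that serialization produced. */
static unsigned deserialize_word_u32(const char *buffer, unsigned length,
                                     char *out, uint32_t *out_len) {
  unsigned size = 0;
  uint32_t len;
  memcpy(&len, &buffer[size], sizeof(uint32_t));
  size += sizeof(uint32_t);
  memcpy(out, &buffer[size], len);
  size += len;
  *out_len = len;
  /* Same invariant as the scanner: every serialized byte was consumed. */
  assert(size == length);
  return size;
}
```

A 300-character word now round-trips as 304 bytes (4 for the length, 300 for the word), and the reader consumes exactly what the writer produced.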

This PR includes a reproduction test. Running tree-sitter test on master with test/corpus/literals.txt fails with:

Assertion failed: (size == length), function deserialize, file scanner.c, line 160.
Abort trap: 6              tree-sitter test

On this branch, the test passes.

Disclaimer: I used GitHub Copilot to help diagnose and fix this bug.

 for (uint32_t i = 0; i < scanner->open_heredocs.size; i++) {
   Heredoc *heredoc = array_get(&scanner->open_heredocs, i);
-  if (size + 2 + heredoc->word.size >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE) {
+  if (size + 3 + sizeof(uint32_t) + heredoc->word.size >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE) {
@look (author):

I believe the 2 in the old check is incorrect: we write three bools below, not two. The new bound accounts for those three bytes plus the uint32_t length.
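
To make the bounds arithmetic concrete, here is a small sketch that isolates just the per-heredoc record size. All helper names are hypothetical, and the buffer size is assumed to be 1024 (the value tree-sitter's parser.h defines for TREE_SITTER_SERIALIZATION_BUFFER_SIZE, to my knowledge; the #ifndef guard lets a real definition win). The old layout is three flag bytes plus a one-byte length plus the word:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; a real tree_sitter/parser.h definition takes precedence. */
#ifndef TREE_SITTER_SERIALIZATION_BUFFER_SIZE
#define TREE_SITTER_SERIALIZATION_BUFFER_SIZE 1024
#endif

/* Old layout: 3 one-byte flags + a one-byte length + the word. */
static unsigned old_record_size(uint32_t word_size) {
  return 3 + 1 + word_size;
}

/* The old guard reserved only 2 bytes beyond the word. */
static bool old_check_passes(unsigned size, uint32_t word_size) {
  return !(size + 2 + word_size >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE);
}

/* New layout: 3 one-byte flags + a uint32_t length + the word. */
static unsigned new_record_size(uint32_t word_size) {
  return 3 + (unsigned)sizeof(uint32_t) + word_size;
}

/* The patched guard reserves exactly what the new layout writes. */
static bool new_check_passes(unsigned size, uint32_t word_size) {
  return !(size + new_record_size(word_size) >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE);
}
```

For example, with 100 bytes already used and a 921-byte word, the old guard computes 1023 and lets serialization proceed, even though writing the record would bring the total to 1025 bytes, one past a 1024-byte buffer. The patched guard rejects that case.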
