Skip to content

Conversation

@derrickstolee
Copy link

This is also going upstream as part of gitgitgadget#2027.

However, I found out about this segfault due to a reproducible error happening in the 1JS monorepo's PR build pipelines. After trying to uncover a repro by adding tracing to my own fork and using that version within the pipeline, I was finally able to get a reproducer in a 1JS Codespace. Debugging with a locally-build version of Git helped me find the situation to fix this.

1JS has a workaround for this issue by running git status before the necessary build step that runs the git diff command. That refresh of the index prevents the freed diff queue items, preventing the issue. This reduces the urgency somewhat, but I also don't know where else this could be impacting users.

  • This is an early version of work already under review upstream.

I'm recommending this version on top of the vfs-2.52.0 branch so we can get this fixed more quickly for an upcoming release of microsoft/git.

When computing a diff in a partial clone, there is a chance that we
could trigger a prefetch of missing objects at the same time as we are
freeing entries from the global diff queue. This is difficult to
reproduce, as we need to have some objects be freed from the queue
before triggering the prefetch of missing objects. There is a new test
in t4067 that does trigger the segmentation fault that results in this
case.

The fix is to set the queue pointer to NULL after it is freed, and then
to be careful about NULL values in the prefetch.

The more elaborate explanation is that within diffcore_std(), we may
skip the initial prefetch due to the output format (--name-only in the
test) and go straight to diffcore_skip_stat_unmatch(). In that method,
the index entries that have been invalidated by path changes show up as
entries but may be deleted because they are not actually content diffs
and only newer timestamps than expected. As those entries are deleted,
later entries are checked with diff_filespec_check_stat_unmatch(), which
uses diff_queued_diff_prefetch() as the missing_object_cb in its diff
options. That can trigger downloading missing objects if the appropriate
scenario occurs to trigger a call to diff_popoulate_filespec(). It's
finally within that callback to diff_queued_diff_prefetch() that the
segfault occurs.

The test was hard to find because it required some real differences,
some not-different files that had a newer modified time, and the order
of those files alphabetically was important to trigger the deletion
before the prefetch was triggered.

I briefly considered a "lock" member for the diff queue, but it was a
much larger diff and introduced many more possible error scenarios.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant