Description
The problem
Entire stores full AI session transcripts (prompts, responses, tool calls, file paths, commands) on the entire/checkpoints/v1 branch within the same git repository, which means transcripts inherit the repo's access model.
For open source projects, this is especially concerning because the repo is public, so the transcripts are too. Anyone who fetches the branch can read the complete AI conversation history, including the developer's reasoning, mistakes, internal context, and potentially sensitive information that slipped past redaction.
Even for private repos, transcripts become visible to every collaborator with read access, and the trust boundary for "who can see the code" is not the same as "who should see the raw AI session history."
Why a flag doesn't solve this
--skip-push-sessions exists, but it is opt-in rather than the default, so sessions are pushed to the remote on every git push unless you explicitly opt out. Even with the flag, the transcripts are still committed to a local branch that can be inadvertently pushed, forked, or included in mirrors. The fundamental issue is that coupling transcript storage to the git repo means the transcripts will always travel with the code.
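For anyone who wants to check whether their transcripts have already left their machine, here is a minimal sketch. It relies only on the standard `git ls-remote --heads` command and the branch name mentioned above; the function names and the default remote name `origin` are my own assumptions, not part of Entire.

```python
import subprocess

# Branch name as described in this issue.
TRANSCRIPT_BRANCH = "entire/checkpoints/v1"


def branch_in_ls_remote(ls_remote_output: str, branch: str = TRANSCRIPT_BRANCH) -> bool:
    """Return True if `git ls-remote --heads` output lists the given branch.

    Each output line has the form "<sha>\t<refname>".
    """
    wanted = f"refs/heads/{branch}"
    for line in ls_remote_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == wanted:
            return True
    return False


def transcripts_on_remote(remote: str = "origin") -> bool:
    """Ask the remote for its branch heads and look for the transcript branch.

    Hypothetical helper: must be run inside a git repository that has `remote`.
    """
    result = subprocess.run(
        ["git", "ls-remote", "--heads", remote],
        check=True,
        capture_output=True,
        text=True,
    )
    return branch_in_ls_remote(result.stdout)
```

Running `transcripts_on_remote()` inside a clone reports whether the branch is already published. It says nothing about forks or mirrors that copied the branch earlier, which is part of the problem.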
The sensitivity of AI transcripts
AI coding transcripts are uniquely sensitive because they can contain:
- Internal reasoning and architectural decision-making
- Partial secrets or credentials that slip past entropy-based redaction
- Context about proprietary systems shared in prompts
- Debugging discussions that reveal security weaknesses
- File paths and system information
These transcripts capture a complete record of how code was written, including the parts developers would never put in a commit message or PR description.
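To illustrate why entropy-based redaction alone is a weak guarantee, here is a rough sketch of a generic Shannon-entropy check. This is not Entire's actual redaction code; the threshold, the minimum length, and the sample strings are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical threshold in bits per character; real redactors tune this
# and usually combine it with other rules.
THRESHOLD = 4.0


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_secret(token: str, threshold: float = THRESHOLD) -> bool:
    """Flag a token only if it is long enough and its entropy is high."""
    return len(token) >= 8 and shannon_entropy(token) >= threshold


# Made-up sample values, not real credentials.
random_key = "q8Zt3LxV0pRm7YwB2nKd5HsJ"   # long and random-looking
weak_password = "winter2024winter"         # human-chosen, repetitive
key_fragment = random_key[:8]              # a partial secret pasted into a prompt

for token in (random_key, weak_password, key_fragment):
    print(token, round(shannon_entropy(token), 3), looks_like_secret(token))
```

Under these assumptions only the full random key is flagged. The human-chosen password and the eight-character fragment both fall below the threshold, so they would be stored in the transcript as-is.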
Alternative approaches
Projects like AgentLogs decouple transcript storage from the repository entirely. That seems like a sounder architecture for this kind of data: the transcripts don't inherit the repo's access model and can be managed with their own access controls.
Has the team considered an architecture where transcripts are stored outside the git repo (e.g., a local database, a self-hostable server, or an optional remote backend), rather than on a git branch?
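As a rough sketch of the "local database" option, transcripts could be keyed by commit SHA in a SQLite file that lives outside the working tree, so git never sees them. The schema, function names, and storage location below are hypothetical and only meant to show the shape of the idea, not a proposal for Entire's actual design.

```python
import sqlite3
from pathlib import Path


def open_store(db_path: Path) -> sqlite3.Connection:
    """Open (or create) a transcript store at a path outside the repo."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS transcripts (
            commit_sha TEXT NOT NULL,
            session_id TEXT NOT NULL,
            body       TEXT NOT NULL,
            PRIMARY KEY (commit_sha, session_id)
        )
        """
    )
    return conn


def save_transcript(conn: sqlite3.Connection, commit_sha: str,
                    session_id: str, body: str) -> None:
    """Store a transcript, replacing any earlier copy for the same session."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
            (commit_sha, session_id, body),
        )


def load_transcripts(conn: sqlite3.Connection, commit_sha: str) -> list[str]:
    """Return all transcript bodies linked to a commit, ordered by session id."""
    rows = conn.execute(
        "SELECT body FROM transcripts WHERE commit_sha = ? ORDER BY session_id",
        (commit_sha,),
    )
    return [row[0] for row in rows]
```

The commit SHA still links a transcript to the code it produced, but sharing the transcript becomes a separate, deliberate action rather than a side effect of git push.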