Fork registry & network tracking — historical fork visibility for project health #64

@djdarcy

Problem

GitHub's Forks API shows a point-in-time snapshot of current public forks, but forks can disappear — deleted, made private, or transferred. The forks_count on repo metadata includes all forks (public, private, and nested), but the enumerated forks list only returns what's publicly visible. There's no built-in way to see:

  • Who forked your project historically (including forks that no longer exist)
  • When a fork was created vs when it disappeared
  • How many forks are hidden (private or deleted) vs visible

For open-source maintainers, fork visibility is a basic project health signal. Knowing your project's reach — who's building on it, which forks are actively maintained, which ones disappeared — helps with community engagement, downstream collaboration, and license compliance awareness.

What GitHub exposes vs what's missing

| Data point | Available? | Source |
| --- | --- | --- |
| Current public forks (owner, created_at, pushed_at) | Yes | GET /repos/{owner}/{repo}/forks |
| Total fork count (including private/nested) | Yes | repo.forks_count in repo metadata |
| Fork creation events (username + timestamp) | Yes, ~90 day window | Events API ForkEvent |
| Fork deletion events | No (no event fired) | Not available |
| Identity of private fork owners | No (only the count gap is visible) | N/A |
| Clone identity (who ran git clone) | No (architecturally anonymous) | Traffic API gives aggregate counts only |

Key insight: The gap between forks_count and the enumerated forks list reveals hidden forks. And by polling the forks list daily, disappearances can be detected (fork present yesterday, absent today).

Proposed solution

Add a fork registry to GTT's data collection — a historical record of the fork network built from daily snapshots.

Phase 1: Fork list archiving (workflow change)

Add to the daily workflow:

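// Assumes the actions/github-script context, which provides `github`.
// `owner`, `repo`, and `state` (loaded from state.json) are assumed to be
// defined earlier in the workflow script.

// Fetch the current forks list, following pagination so that repos with
// more than 100 forks are listed in full
const forks = await github.paginate(github.rest.repos.listForks, {
  owner, repo, sort: 'newest', per_page: 100
});

// Build today's snapshot
state.forkRegistry = state.forkRegistry || [];
const today = new Date().toISOString().split('T')[0];
const currentForks = forks.map(f => ({
  owner: f.owner.login,
  repo: f.full_name,
  created: f.created_at,
  lastPush: f.pushed_at,
  stars: f.stargazers_count
}));

// Detect new forks (in today's list but not in registry)
// Detect disappeared forks (in registry but not in today's list)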

Store in state.json:

{
  "forkRegistry": [
    {
      "owner": "user123",
      "repo": "user123/project-fork",
      "firstSeen": "2026-02-15",
      "lastSeen": "2026-03-01",
      "created": "2026-02-15T10:00:00Z",
      "status": "active"
    },
    {
      "owner": "user456",
      "repo": "user456/project-fork",
      "firstSeen": "2026-02-20",
      "lastSeen": "2026-02-25",
      "created": "2026-02-20T14:00:00Z",
      "status": "disappeared"
    }
  ],
  "forkSummary": {
    "totalSeen": 14,
    "currentlyVisible": 11,
    "disappeared": 3,
    "hiddenEstimate": 2
  }
}
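
The two "Detect ..." steps in the Phase 1 snippet are left as comments, so here is a minimal sketch of how the merge might work. The function name `updateForkRegistry` is hypothetical, and keying entries by `repo` (the fork's full name) is an assumption; the field names follow the state.json example above.

```javascript
// Hypothetical helper (not part of GTT): merges today's fork snapshot into the
// stored registry and returns a new array. Entries are keyed by `repo`
// (the fork's full_name), which is an assumption of this sketch.
function updateForkRegistry(registry, currentForks, today) {
  // Copy each entry so the stored registry is not mutated in place.
  const byRepo = new Map(registry.map(entry => [entry.repo, { ...entry }]));
  const visible = new Set(currentForks.map(fork => fork.repo));

  for (const fork of currentForks) {
    const existing = byRepo.get(fork.repo);
    if (existing) {
      // Known fork: refresh its fields; a fork that reappears becomes active again.
      Object.assign(existing, fork, { lastSeen: today, status: 'active' });
    } else {
      // New fork: its first sighting is today.
      byRepo.set(fork.repo, { ...fork, firstSeen: today, lastSeen: today, status: 'active' });
    }
  }

  for (const entry of byRepo.values()) {
    // In the registry but absent today: keep the record and flip the status.
    // lastSeen is left alone so it still records the last day the fork was listed.
    if (!visible.has(entry.repo)) entry.status = 'disappeared';
  }

  return [...byRepo.values()];
}
```

In the workflow this could be called as `state.forkRegistry = updateForkRegistry(state.forkRegistry, currentForks, today)`. One limitation of keying by full name: a renamed or transferred fork would show up as one disappearance plus one new fork.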

Phase 2: Dashboard display

Add to the Community tab:

  • Fork registry table — all historically-seen forks with status (active / disappeared / new), last push date, star count
  • Fork count delta: forks_count vs visible forks, showing the hidden fork gap
  • Timeline markers — when forks appeared and disappeared on the Community Trends chart
  • Fork activity indicators — which forks are actively maintained (recent pushes) vs dormant
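
As a rough illustration of the activity indicator, a registry entry could be classified from its last push date. This is only a sketch: the function name `forkActivity`, the returned labels, and the 90-day dormancy threshold are all assumptions, not values defined in this issue.

```javascript
// Hypothetical helper (not part of GTT). The 90-day default threshold is an
// assumption chosen for illustration.
const DAY_MS = 24 * 60 * 60 * 1000;

function forkActivity(entry, now = new Date(), dormantAfterDays = 90) {
  // A fork that is no longer listed cannot be judged by its push date.
  if (entry.status === 'disappeared') return 'disappeared';
  // Older registry entries may not carry lastPush at all.
  if (!entry.lastPush) return 'unknown';
  const ageDays = (now.getTime() - new Date(entry.lastPush).getTime()) / DAY_MS;
  return ageDays <= dormantAfterDays ? 'maintained' : 'dormant';
}
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test against stored snapshots.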

Phase 3: Fork events (stretch)

Poll the Events API for ForkEvent to capture fork creation timestamps even for forks that are deleted before the next daily snapshot. Event retention is limited (roughly 90 days per the table above, and each events timeline is capped at 300 events), so polling must run continuously to capture what the forks list misses.
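
A minimal sketch of how polled events might feed the registry. Both function names are hypothetical. The input is assumed to be the array returned by GET /repos/{owner}/{repo}/events (in github-script, for example, via `github.rest.activity.listRepoEvents`), and the sketch assumes it runs after the daily snapshot merge, so that any fork known only from an event is no longer in the forks list.

```javascript
// Hypothetical helpers (not part of GTT).

// Turns raw repository events into fork-creation records.
function extractForkEvents(events) {
  return events
    .filter(e => e.type === 'ForkEvent' && e.payload && e.payload.forkee)
    .map(e => ({
      owner: e.actor.login,              // the account that created the fork
      repo: e.payload.forkee.full_name,  // full name of the new fork
      created: e.created_at              // event timestamp
    }));
}

// Adds forks that were seen only in events (never in a daily snapshot).
// Assumes the snapshot merge already ran, so these forks are not currently listed.
function addTransientForks(registry, forkEvents) {
  const known = new Set(registry.map(entry => entry.repo));
  const additions = [];
  for (const ev of forkEvents) {
    if (known.has(ev.repo)) continue;
    known.add(ev.repo);
    const day = ev.created.split('T')[0];
    additions.push({ ...ev, firstSeen: day, lastSeen: day, status: 'disappeared' });
  }
  return registry.concat(additions);
}
```

Forks already in the registry are skipped, so repeated polling of the same events does not create duplicates.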

Design considerations

  • Privacy-conscious framing: The fork registry shows publicly available information (the forks list is public). It adds historical memory, not new surveillance. The "disappeared" status is factual (fork was listed, now isn't), not accusatory.
  • API rate limits: The forks endpoint returns up to 100 per page. For repos with 100+ forks, pagination is needed. GTT already runs within Actions' generous rate limits.
  • Schema version: This would be part of a schema v4 or v5 bump alongside #46 and #47 (referrer/path deltas).
  • Stargazer registry: The same approach works for stargazers — polling /stargazers with timestamps to detect un-stars. Could be a companion feature.
  • hiddenEstimate: Calculated as forks_count - len(enumerated_forks). This is an estimate because forks_count includes forks-of-forks in the network.
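
The hiddenEstimate calculation and the rest of the forkSummary block could be derived from the registry roughly as follows. The function name `summarizeForks` is hypothetical, and clamping the estimate at zero is an added assumption for the case where the two API values briefly disagree.

```javascript
// Hypothetical helper (not part of GTT): derives the forkSummary block.
// forksCount is the forks_count value from the repository metadata.
function summarizeForks(registry, forksCount) {
  const disappeared = registry.filter(e => e.status === 'disappeared').length;
  const currentlyVisible = registry.length - disappeared;
  return {
    totalSeen: registry.length,
    currentlyVisible,
    disappeared,
    // forks_count minus the enumerated forks; clamped at zero because the
    // two numbers come from separate API calls (assumption of this sketch).
    hiddenEstimate: Math.max(0, forksCount - currentlyVisible)
  };
}
```

With the numbers from the state.json example (14 forks seen, 3 of them disappeared) and a forks_count of 13, this yields currentlyVisible 11 and hiddenEstimate 2.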

Acceptance criteria

  • Workflow fetches the forks list daily and stores it in forkRegistry[] in state.json
  • New forks detected and timestamped with firstSeen
  • Disappeared forks detected (in registry but not in current list) with lastSeen
  • Fork count gap (forks_count vs visible) surfaced in dashboard
  • Dashboard Community tab shows fork registry table with status indicators
  • Historical fork data accumulates across runs (append-only registry)
  • Stretch: Events API polling for ForkEvent captures transient forks

Related issues
