Tool counts don't update after enabling/disabling servers (stale stats race condition) #285

@Dumbris

Description

After enabling or disabling an upstream server, the tool count displayed in the Web UI (Servers page and Dashboard) doesn't update for an extended period (10+ minutes). The REST API also returns stale statistics.

Expected behavior: Stats should update within 3-5 seconds after any server state change (enable/disable/add/remove).

Actual behavior: Tool counts remain stale until the user manually refreshes or waits an unpredictable amount of time.

Screenshot

Stale tool counts in Web UI

The screenshot shows 976 "Total Tools" even after a server was disabled - this count should decrease.

Root Cause Analysis

The issue is a race condition: the SSE event that triggers the frontend refetch is emitted synchronously, before asynchronous tool discovery has updated StateView.

The Flow

When a server is enabled/disabled:

  1. T0: EnableServer() is called
  2. T1: emitServersChanged() event is sent immediately
  3. T2: Frontend receives SSE event, calls GET /api/v1/servers
  4. T3: REST API returns stale tool count from StateView
  5. T4: Background goroutine starts DiscoverAndIndexTools() (takes several seconds)
  6. T5: StateView is updated with correct tool counts
  7. T6: No further SSE event is emitted once discovery completes, so the frontend is not told to refetch
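
The ordering above can be reproduced in isolation. The following is a minimal sketch, not MCPProxy code: `stateView`, `simulateToggle`, the event name `servers.changed`, and the post-discovery count of 950 are all hypothetical stand-ins. Discovery is forced to finish only after the simulated frontend has fetched, which stands in for the several seconds the real discovery takes.

```go
package main

import (
	"fmt"
	"sync"
)

// stateView is a hypothetical stand-in for the real StateView; it only holds a tool count.
type stateView struct {
	mu    sync.RWMutex
	tools int
}

func (s *stateView) set(n int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tools = n
}

func (s *stateView) get() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.tools
}

// simulateToggle replays T1..T6 and returns the count the simulated frontend
// saw, plus the count held by the view once discovery has finished.
func simulateToggle(before, after int) (seen, final int) {
	view := &stateView{tools: before}
	events := make(chan string, 1)
	fetched := make(chan struct{})
	done := make(chan struct{})

	events <- "servers.changed" // T1: event emitted immediately (hypothetical name)

	go func() { // T4: discovery runs in the background
		defer close(done)
		<-fetched       // models slow discovery: it finishes after the fetch
		view.set(after) // T5: view corrected, but no event follows (T6)
	}()

	<-events          // T2: frontend reacts to the event
	seen = view.get() // T3: reads the stale count
	close(fetched)
	<-done
	return seen, view.get()
}

func main() {
	seen, final := simulateToggle(976, 950)
	fmt.Printf("frontend saw %d tools, view now holds %d\n", seen, final)
}
```

The frontend's value and the view's value diverge, and nothing in the flow prompts a second fetch.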

Code Analysis

internal/runtime/lifecycle.go:867-913 - EnableServer():

func (r *Runtime) EnableServer(serverName string, enabled bool) error {
    // ... storage and config updates ...
    
    // Event emitted IMMEDIATELY
    r.emitServersChanged("enable_toggle", map[string]any{...})
    
    // Tool discovery runs AFTER event in background goroutine
    r.HandleUpstreamServerChange(r.AppContext())  // <-- ASYNC!
    
    return nil
}

internal/runtime/lifecycle.go:1158-1175 - HandleUpstreamServerChange():

func (r *Runtime) HandleUpstreamServerChange(ctx context.Context) {
    go func() {  // <-- BACKGROUND GOROUTINE - doesn't block
        if err := r.DiscoverAndIndexTools(ctx); err != nil {
            r.logger.Error("Failed to update tool index", zap.Error(err))
        }
        r.cleanupOrphanedIndexEntries()
        // NO EVENT EMITTED after discovery completes!
    }()
    
    // Event emitted before goroutine finishes
    r.emitServersChanged("upstream_change", map[string]any{"phase": phase})
}

Affected Components

| Component | File | Issue |
|-----------|------|-------|
| REST API | internal/server/server.go:485 | GetUpstreamStats() reads stale StateView |
| Runtime | internal/runtime/lifecycle.go:867 | Emits event before tool discovery |
| Runtime | internal/runtime/lifecycle.go:1158 | Tool discovery in background goroutine |
| StateView | internal/runtime/stateview/stateview.go | Not updated until discovery completes |
| Frontend | frontend/src/stores/servers.ts | No retry logic for stale data |

Proposed Solution

Option A: Emit SSE event after tool discovery completes (Recommended)

func (r *Runtime) HandleUpstreamServerChange(ctx context.Context) {
    go func() {
        if err := r.DiscoverAndIndexTools(ctx); err != nil {
            r.logger.Error("Failed to update tool index", zap.Error(err))
        }
        r.cleanupOrphanedIndexEntries()
        
        // NEW: Emit event AFTER discovery completes
        r.emitServersChanged("tools_indexed", map[string]any{
            "reason": "tool_discovery_complete",
        })
    }()
}
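
To check that Option A gives the intended ordering, the pattern can be exercised with stand-in types. This is a sketch under assumptions: `fakeRuntime`, its `events` channel (standing in for the SSE bus), and `runOptionA` are hypothetical and do not reflect the real Runtime API. Only the event name `tools_indexed` comes from the proposal above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// fakeRuntime is a hypothetical, stripped-down stand-in for Runtime.
type fakeRuntime struct {
	mu     sync.Mutex
	tools  int
	events chan string // stand-in for the SSE event bus
}

// discoverAndIndexTools stands in for DiscoverAndIndexTools; it only stores a new count.
func (r *fakeRuntime) discoverAndIndexTools(ctx context.Context, n int) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tools = n
	return nil
}

func (r *fakeRuntime) toolCount() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.tools
}

// handleUpstreamServerChange mirrors Option A: the event is sent from inside
// the goroutine, only after discovery has returned.
func (r *fakeRuntime) handleUpstreamServerChange(ctx context.Context, n int) {
	go func() {
		if err := r.discoverAndIndexTools(ctx, n); err != nil {
			fmt.Println("failed to update tool index:", err)
		}
		r.events <- "tools_indexed"
	}()
}

// runOptionA waits for the event, then reads the count as a client would.
func runOptionA(initial, discovered int) (string, int) {
	r := &fakeRuntime{tools: initial, events: make(chan string, 1)}
	r.handleUpstreamServerChange(context.Background(), discovered)
	ev := <-r.events
	return ev, r.toolCount()
}

func main() {
	ev, count := runOptionA(976, 950)
	fmt.Println(ev, count)
}
```

Because the send happens after the count is stored, any reader that fetches in response to the event observes the updated value.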

Option B: Frontend retry with exponential backoff

The frontend could detect stale data and retry:

async function fetchServersWithRetry(maxRetries = 3) {
    const initialToolCount = totalTools.value
    
    for (let i = 0; i < maxRetries; i++) {
        await fetchServers(true)
        if (totalTools.value !== initialToolCount) {
            return // Stats updated
        }
        await sleep(1000 * 2 ** i) // Exponential backoff: 1s, 2s, 4s
    }
}

Option C: StateView includes "pending discovery" flag

Add a discoveryPending flag to ServerStatus so the UI can show a loading indicator until stats are finalized.
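
One possible shape for Option C, as a sketch only: the `serverStatus` struct below is a reduced, hypothetical version of ServerStatus, and the JSON keys (`tool_count`, `discovery_pending`) and helper names are assumptions, not the existing API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverStatus is a hypothetical, reduced version of ServerStatus.
type serverStatus struct {
	Name             string `json:"name"`
	ToolCount        int    `json:"tool_count"`
	DiscoveryPending bool   `json:"discovery_pending"`
}

// beginDiscovery marks the current count as provisional.
func (s *serverStatus) beginDiscovery() {
	s.DiscoveryPending = true
}

// finishDiscovery stores the fresh count and clears the flag.
func (s *serverStatus) finishDiscovery(n int) {
	s.ToolCount = n
	s.DiscoveryPending = false
}

// render returns the JSON a REST handler might send for this status.
func render(s serverStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	s := serverStatus{Name: "example", ToolCount: 26}
	s.beginDiscovery()
	fmt.Println(render(s)) // UI can show a loading indicator while pending
	s.finishDiscovery(0)
	fmt.Println(render(s))
}
```

With this shape the API can still return immediately, and the client can tell a provisional count from a final one.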

Acceptance Criteria

  • Tool counts update within 5 seconds after enabling a server
  • Tool counts update within 5 seconds after disabling a server
  • Dashboard Token Distribution updates after server changes
  • Web UI header (17/19 Servers, 976 Tools) updates after changes
  • REST API /api/v1/status returns current stats
  • REST API /api/v1/servers returns current tool counts per server
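
The timing criteria could be checked with a small polling helper. The sketch below is an assumption about how such a check might look: `waitForCount` and `updatesWithin` are hypothetical names, and the count source is simulated with an atomic counter instead of a real call to the REST API.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitForCount polls get until it returns want or the timeout elapses.
func waitForCount(get func() int, want int, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if get() == want {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// updatesWithin simulates a count that changes from 976 to 950 after delayMs
// and reports whether waitForCount observes the change within timeoutMs.
func updatesWithin(delayMs, timeoutMs int) bool {
	var count atomic.Int64
	count.Store(976)
	go func() {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		count.Store(950) // hypothetical count after a server is disabled
	}()
	get := func() int { return int(count.Load()) }
	return waitForCount(get, 950, time.Duration(timeoutMs)*time.Millisecond, 5*time.Millisecond)
}

func main() {
	// 5000 ms matches the 5-second bound in the criteria above.
	fmt.Println("updated in time:", updatesWithin(20, 5000))
}
```

In a real test, `get` would wrap a request to the servers endpoint and extract the tool count from the response.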

Reproduction Steps

  1. Start MCPProxy with multiple upstream servers configured
  2. Open Web UI, note the "Total Tools" count on Servers page
  3. Disable one of the servers with multiple tools
  4. Observe that "Total Tools" count doesn't change immediately
  5. Wait 10+ minutes or manually refresh browser - count finally updates

Environment

  • MCPProxy version: latest main branch
  • OS: macOS
  • Browser: Chrome/Safari

Labels

bug, web-ui, rest-api, sse, race-condition
