Skip to content

feat(httpproxy): add external CA support with security hardening#8

Open
dedene wants to merge 2 commits intomainfrom
feat/external-ca-support
Open

feat(httpproxy): add external CA support with security hardening#8
dedene wants to merge 2 commits intomainfrom
feat/external-ca-support

Conversation

@dedene
Copy link
Owner

@dedene dedene commented Feb 19, 2026

Summary

Add first-class support for externally managed CA certificates (cert-manager, k8s secrets) with hot-reload capability.

  • Custom cert_file/key_file config options for cert-manager compatibility (tls.crt/tls.key)
  • External mode (external: true): fail-fast if CA missing, relaxed key permissions for k8s secret mounts
  • File watcher via fsnotify for hot-reload on cert rotation (handles k8s ..data symlink pattern)
  • Thread-safe certificate access with RWMutex

Security hardening

  • Path traversal prevention: Validate cert_file/key_file are simple filenames (no ../, /, etc.)
  • Expiry validation: Reject expired or not-yet-valid certificates at startup and on reload
  • Audit trail: Log when permission checks are relaxed in external mode
  • Log injection prevention: Sanitize paths in log messages (strip control chars)

Example config

http_proxy:
  ca:
    path: /etc/claw/ca
    cert_file: tls.crt          # cert-manager naming
    key_file: tls.key
    external: true              # fail if missing, relax perms, enable watcher

Test plan

  • Unit tests for all new functionality (path traversal, expiry, sanitization)
  • Watcher tests for hot-reload behavior
  • All tests pass with -race flag
  • Manual test with cert-manager in k8s (TODO)

Add first-class support for externally managed CA certificates (cert-manager,
k8s secrets) with hot-reload capability via fsnotify.

Features:
- Custom cert_file/key_file config (default: ca.crt/ca.key)
- External mode: fail-fast if CA missing, relaxed key permissions for k8s
- File watcher for hot-reload on cert rotation (handles k8s ..data symlink)
- Thread-safe certificate access with RWMutex

Security fixes:
- Path traversal prevention in cert_file/key_file validation
- Certificate expiry validation (rejects expired/not-yet-valid certs)
- Audit logging when permission checks are relaxed
- Log injection prevention via path sanitization
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for externally managed CA certificates (e.g., from cert-manager or Kubernetes secrets) with hot-reload capability using fsnotify. The implementation includes comprehensive security hardening through path traversal prevention, certificate expiry validation, and log injection sanitization.

Changes:

  • Added external, cert_file, and key_file configuration options to CAConfig for cert-manager compatibility
  • Implemented file watcher with fsnotify to detect and reload certificates on rotation (handles k8s ..data symlink pattern)
  • Added certificate validation (CA flag, expiry checks) and security hardening (path traversal prevention, log sanitization)

Reviewed changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
internal/config/config.go Added CAConfig fields (cert_file, key_file, external) and validation logic for path traversal prevention
internal/config/config_test.go Comprehensive tests for new CA configuration options and filename validation
internal/httpproxy/ca.go Core implementation: external mode, file watcher, hot-reload, certificate validation, and thread-safe access
internal/httpproxy/ca_test.go Extensive test coverage for external mode, watcher functionality, security validation, and edge cases
internal/httpproxy/proxy.go Integrated watcher lifecycle management (start on proxy start, stop on proxy stop)
internal/httpproxy/security_test.go Minor formatting fix (removed trailing space in comment)
internal/credentials/parser.go Code formatting improvement (consistent alignment)
internal/config/interpolate.go Removed trailing newline
go.mod, go.sum Added fsnotify v1.9.0 dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +400 to +414
func (m *CAManager) reloadCA() error {
cert, err := m.loadCA()
if err != nil {
return err
}

if err := m.validateCA(cert); err != nil {
return fmt.Errorf("validation failed: %w", err)
}

m.certMu.Lock()
m.cert = cert
m.certMu.Unlock()

return nil
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reloadCA method updates the in-memory certificate but does not update the goproxy global state (goproxy.GoproxyCa, goproxy.OkConnect, etc.). This means that when the CA certificate is rotated by cert-manager, the proxy will continue using the old certificate for MITM, defeating the purpose of hot-reload.

The goproxy library uses these global variables and TLS configs that are set once during setupMITM in proxy.go. When the certificate is reloaded here, those global configurations remain unchanged. This will cause the proxy to reject new HTTPS connections with certificate errors after rotation.

Consider one of these approaches:

  1. Add a callback mechanism so the Proxy can be notified when the CA is reloaded and can call setupMITM again
  2. Use goproxy's GetCertificate callback in TLS config instead of setting a static certificate
  3. Periodically check and update the goproxy state in the watch loop

Copilot uses AI. Check for mistakes.
return nil, fmt.Errorf("CA key has insecure permissions %04o (want 0600 or stricter)", perm)
}
} else {
log.Printf("[INFO] External CA mode: key permission check relaxed (k8s compat)")
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log message will be printed every time loadCA is called in external mode, including during hot-reload events. This could lead to log spam if the certificate is rotated frequently or if loadCA is called multiple times for other reasons.

Consider logging this message only once, perhaps during initialization (in EnsureCA when first loading, or in StartWatcher), or use a debug-level log here since it's expected behavior that's already communicated at startup.

Copilot uses AI. Check for mistakes.
Comment on lines +475 to +479
// Wait for watcher to pick up change
time.Sleep(200 * time.Millisecond)

// Check if cert was reloaded
cert2 := mgr.Certificate()
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test uses a fixed 200ms sleep to wait for the file watcher to detect changes. This can make the test flaky on slow systems or under load. While file watching typically happens quickly, there's no guarantee of timing.

Consider using a polling approach with a timeout (e.g., retry for up to 2 seconds, checking every 50ms) to make the test more robust:

// Wait for watcher to pick up change (with timeout)
deadline := time.Now().Add(2 * time.Second)
var cert2 *tls.Certificate
for time.Now().Before(deadline) {
    cert2 = mgr.Certificate()
    if cert2 != nil {
        x509Cert2, _ := x509.ParseCertificate(cert2.Certificate[0])
        if origSerial.Cmp(x509Cert2.SerialNumber) != 0 {
            break // Certificate changed
        }
    }
    time.Sleep(50 * time.Millisecond)
}
Suggested change
// Wait for watcher to pick up change
time.Sleep(200 * time.Millisecond)
// Check if cert was reloaded
cert2 := mgr.Certificate()
// Wait for watcher to pick up change (with timeout)
deadline := time.Now().Add(2 * time.Second)
var cert2 *tls.Certificate
for time.Now().Before(deadline) {
cert2 = mgr.Certificate()
if cert2 != nil {
x509Cert2, _ := x509.ParseCertificate(cert2.Certificate[0])
if origSerial.Cmp(x509Cert2.SerialNumber) != 0 {
break // Certificate changed
}
}
time.Sleep(50 * time.Millisecond)
}
// Check if cert was reloaded

Copilot uses AI. Check for mistakes.
m.watcher = watcher
m.stopCh = make(chan struct{})
m.watcherInit = true
m.stopOnce = sync.Once{} // Reset for potential restart
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resetting sync.Once by assignment (m.stopOnce = sync.Once{}) is problematic. sync.Once cannot be safely reused after it has been triggered - it's designed to ensure a function is executed exactly once for the lifetime of the Once instance.

If StartWatcher is called again after StopWatcher, this reset allows StopWatcher to be called again, but this pattern is fragile. If there are any concurrent calls to StopWatcher during the reset, you could have race conditions.

Consider either:

  1. Not supporting watcher restart (document that StartWatcher should only be called once)
  2. Create a new CAManager instance if restart is needed
  3. Use a different synchronization primitive if you need restartable behavior (e.g., atomic bool with manual reset)

Copilot uses AI. Check for mistakes.
m.watcher = nil
}
m.stopCh = nil
m.watcherInit = false
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting watcherInit to false at the end of StopWatcher creates a potential race condition. If another goroutine calls StartWatcher while StopWatcher is running (after stopOnce.Do completes but while still inside the function), watcherInit could be checked as false, allowing StartWatcher to proceed, then immediately set to false by the completing StopWatcher.

Consider protecting watcherInit with a mutex, or restructuring the code so that watcherInit is set inside the stopOnce.Do closure before modifying other state. This ensures the flag is consistent with the actual watcher state.

Copilot uses AI. Check for mistakes.
Comment on lines +302 to +328
func (m *CAManager) StartWatcher() error {
if !m.external {
return nil
}

watcher, err := fsnotify.NewWatcher()
if err != nil {
return fmt.Errorf("create watcher: %w", err)
}

// Watch the directory containing the cert (handles k8s secret updates)
dir := filepath.Dir(m.certPath)
if err := watcher.Add(dir); err != nil {
watcher.Close()
return fmt.Errorf("watch directory %s: %w", sanitizePath(dir), err)
}

m.watcher = watcher
m.stopCh = make(chan struct{})
m.watcherInit = true
m.stopOnce = sync.Once{} // Reset for potential restart

m.watcherWg.Add(1)
go m.watchLoop()

log.Printf("[INFO] CA file watcher started for %s", sanitizePath(dir))
return nil
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If StartWatcher is called multiple times concurrently, or called while a previous watcher is still running, this could lead to resource leaks and race conditions. The watcherInit check is not protected by a mutex, so two goroutines could both see watcherInit as false/true and proceed simultaneously.

The second call would overwrite m.watcher, m.stopCh without closing the previous watcher, causing:

  1. File descriptor leak (previous watcher not closed)
  2. Goroutine leak (previous watchLoop goroutine won't stop cleanly)
  3. Race on watcherWg counter

Consider adding mutex protection around the entire StartWatcher logic, or add an explicit check and return an error if a watcher is already running.

Copilot uses AI. Check for mistakes.
Comment on lines 90 to 96
// CAConfig holds CA certificate configuration for MITM proxy.
type CAConfig struct {
Path string `yaml:"path"`
CertFile string `yaml:"cert_file"` // default "ca.crt"
KeyFile string `yaml:"key_file"` // default "ca.key"
External bool `yaml:"external"` // external management mode (cert-manager, etc.)
ValidityDays int `yaml:"validity_days"`
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new external CA functionality (cert_file, key_file, external fields) is not documented in docs/CONFIG.md. Users need documentation to understand:

  1. What the external mode is for (cert-manager, k8s secrets integration)
  2. How to configure custom filenames
  3. How hot-reload works
  4. The differences in behavior (permission checks relaxed, fail-fast on missing files)

Consider adding a section in CONFIG.md explaining these new options with examples, especially the cert-manager use case mentioned in the PR description.

Suggested change
// CAConfig holds CA certificate configuration for MITM proxy.
type CAConfig struct {
Path string `yaml:"path"`
CertFile string `yaml:"cert_file"` // default "ca.crt"
KeyFile string `yaml:"key_file"` // default "ca.key"
External bool `yaml:"external"` // external management mode (cert-manager, etc.)
ValidityDays int `yaml:"validity_days"`
// CAConfig holds CA certificate configuration for the HTTP MITM proxy.
//
// The proxy supports two modes of operation:
//
// 1. Managed / internal CA (External == false, default):
// - The daemon owns the CA private key and certificate.
// - If the CA files are missing, it will generate a new CA and write
// them to disk under Path/CertFile and Path/KeyFile.
// - Strict file-permission checks are enforced to avoid leaking the
// private key (e.g. mode 0600 on the key file).
// - ValidityDays and Organization are used when creating a new CA.
//
// 2. External CA (External == true):
// - CA material is managed by an external system such as cert-manager
// or a Kubernetes Secret/ConfigMap mounted into the container.
// - The daemon does NOT create, rotate, or modify the CA files; it only
// reads them from disk.
// - CertFile and KeyFile can be customized to match the filenames used
// by the external system (for example, cert-manager's secret keys).
// - On startup, the daemon fails fast if the configured files are
// missing or unreadable, rather than silently generating a new CA.
// - Permission checks on the key file are relaxed to accommodate
// typical Kubernetes Secret mounts.
// - Changes to the CA files on disk are picked up via hot-reload, so
// external rotation (e.g. by cert-manager) does not require a restart.
//
// Path is the directory containing the CA certificate and key. CertFile and
// KeyFile are the filenames within that directory. If CertFile or KeyFile
// are empty in the configuration, they default to "ca.crt" and "ca.key",
// respectively.
type CAConfig struct {
// Path is the directory where the CA certificate and key are stored.
// In a Kubernetes deployment, this is typically the mountPath of a
// Secret or ConfigMap containing the CA material.
Path string `yaml:"path"`
// CertFile is the filename (relative to Path) of the CA certificate.
// Default: "ca.crt".
CertFile string `yaml:"cert_file"`
// KeyFile is the filename (relative to Path) of the CA private key.
// Default: "ca.key".
KeyFile string `yaml:"key_file"`
// External enables external management mode for the CA.
// When true, the daemon will not create or rotate the CA; instead, it
// expects the certificate and key to be written by an external system
// (such as cert-manager) and will hot-reload changes on disk.
External bool `yaml:"external"`
// ValidityDays controls the lifetime of a newly generated CA certificate
// when running in managed / internal CA mode. It is ignored when External
// is true and the CA is provided by an external system.
ValidityDays int `yaml:"validity_days"`
// Organization is the subject organization used when creating a new CA
// in managed / internal mode. It is not modified in external mode.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments