Add MaintenanceSpec to PeerDBClusterSpec for configuring PeerDB's native maintenance mode during upgrades. This includes:
- MaintenanceSpec type with image, backoffLimit, and resources fields
- Two new UpgradePhase constants: StartMaintenance and EndMaintenance
- ConditionMaintenanceMode condition type for tracking maintenance state
- Reason constants: MaintenanceStarting, Active, Ending, Complete, Failed
- Regenerated deepcopy methods and CRD manifests
Amp-Thread-ID: https://ampcode.com/threads/T-019ccbea-b6d3-7583-8ac6-4f8a88c21dbd
Co-authored-by: Amp <amp@ampcode.com>
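To make the shape of the new field concrete, here is a minimal sketch of how an optional spec.maintenance pointer might distinguish "not set" from an empty object. Only the field names image, backoffLimit, and resources come from the commit message; the Go types, JSON tags, the local ResourceRequirements stand-in, and the helper names parseSpec and MaintenanceEnabled are assumptions for illustration, not the operator's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourceRequirements is a local stand-in (assumption) for the Kubernetes
// resource requirements type, so the sketch compiles without dependencies.
type ResourceRequirements struct {
	Limits   map[string]string `json:"limits,omitempty"`
	Requests map[string]string `json:"requests,omitempty"`
}

// MaintenanceSpec carries the three fields named in the commit message.
// The field types and JSON tags are assumptions.
type MaintenanceSpec struct {
	Image        string                `json:"image,omitempty"`
	BackoffLimit *int32                `json:"backoffLimit,omitempty"`
	Resources    *ResourceRequirements `json:"resources,omitempty"`
}

// PeerDBClusterSpec is reduced to the two fields this sketch needs.
type PeerDBClusterSpec struct {
	Version     string           `json:"version"`
	Maintenance *MaintenanceSpec `json:"maintenance,omitempty"`
}

// MaintenanceEnabled (hypothetical helper) reports whether spec.maintenance
// was set at all; an empty object still counts as set.
func MaintenanceEnabled(spec PeerDBClusterSpec) bool {
	return spec.Maintenance != nil
}

// parseSpec (hypothetical helper) decodes a JSON-encoded spec.
func parseSpec(raw string) (PeerDBClusterSpec, error) {
	var spec PeerDBClusterSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	inputs := []string{
		`{"version":"v0.36.8"}`,
		`{"version":"v0.36.8","maintenance":{}}`,
	}
	for _, raw := range inputs {
		spec, err := parseSpec(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> maintenance enabled: %v\n", raw, MaintenanceEnabled(spec))
	}
}
```

Using a pointer for the optional block is a common way to let an empty object opt in to a feature while an absent key leaves existing behavior untouched.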
Add BuildStartMaintenanceJob and BuildEndMaintenanceJob that create Kubernetes Jobs using the ghcr.io/peerdb-io/flow-maintenance image. The Jobs run PeerDB's maintenance entrypoint with 'start' or 'end' subcommands to trigger the StartMaintenance/EndMaintenance Temporal workflows. Jobs inherit catalog connection config via the shared ConfigMap and password secret, following the same pattern as init jobs.
Amp-Thread-ID: https://ampcode.com/threads/T-019ccbea-b6d3-7583-8ac6-4f8a88c21dbd
Co-authored-by: Amp <amp@ampcode.com>
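As a rough illustration of what the two builders have to decide, the sketch below derives the image reference and container arguments for a maintenance Job. The image repository, the stable-{version} tag pattern, and the 'start'/'end' subcommands come from this PR; the MaintenanceJobSketch type, the function names, the Job naming scheme, and the idea that a non-empty spec.maintenance.image overrides the default are all assumptions. The real builders return Kubernetes Job objects and wire in the ConfigMap and secret, which is omitted here.

```go
package main

import "fmt"

// MaintenanceJobSketch (hypothetical) holds only the parts of a Job that
// this sketch reasons about; the real builders produce full Job objects.
type MaintenanceJobSketch struct {
	Name  string
	Image string
	Args  []string
}

const maintenanceImageRepo = "ghcr.io/peerdb-io/flow-maintenance"

// maintenanceImage returns the override when one is given (assumed
// behavior), otherwise the default repo:stable-{version} reference.
func maintenanceImage(version, override string) string {
	if override != "" {
		return override
	}
	return fmt.Sprintf("%s:stable-%s", maintenanceImageRepo, version)
}

// buildMaintenanceJob accepts only the two subcommands named in the PR.
// The "<cluster>-maintenance-<subcommand>" name is an illustrative choice.
func buildMaintenanceJob(cluster, version, override, subcommand string) (MaintenanceJobSketch, error) {
	if subcommand != "start" && subcommand != "end" {
		return MaintenanceJobSketch{}, fmt.Errorf("unknown maintenance subcommand %q", subcommand)
	}
	return MaintenanceJobSketch{
		Name:  fmt.Sprintf("%s-maintenance-%s", cluster, subcommand),
		Image: maintenanceImage(version, override),
		Args:  []string{subcommand},
	}, nil
}

func main() {
	for _, sub := range []string{"start", "end"} {
		job, err := buildMaintenanceJob("demo", "v0.36.8", "", sub)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: image=%s args=%v\n", job.Name, job.Image, job.Args)
	}
}
```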
f9060f2 to b0334dd
Insert StartMaintenance and EndMaintenance phases into the upgrade lifecycle. When spec.maintenance is configured:

Waiting → StartMaintenance → Config → InitJobs → FlowAPI → Server → UI → EndMaintenance → Complete

The new upgradePhaseMaintenanceJob method handles Job creation, completion polling, and failure retry (delete + recreate) for both phases. When spec.maintenance is nil, the upgrade flow is unchanged.

Key behaviors:
- StartMaintenance Job pauses mirrors before any component restarts
- EndMaintenance Job resumes mirrors after all components are upgraded
- Failed maintenance Jobs are auto-deleted for retry with Degraded condition
- MaintenanceMode condition tracks whether mirrors are paused
Amp-Thread-ID: https://ampcode.com/threads/T-019ccbea-b6d3-7583-8ac6-4f8a88c21dbd
Co-authored-by: Amp <amp@ampcode.com>
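The phase ordering described in this commit can be sketched as a small transition function. The phase names and the maintenance-enabled order are taken from the commit message; the order without maintenance is inferred by dropping the two new phases, and the constant names, phaseOrder, and nextPhase are hypothetical and not the controller's actual code.

```go
package main

import "fmt"

// UpgradePhase names one step of the upgrade state machine.
type UpgradePhase string

const (
	PhaseWaiting          UpgradePhase = "Waiting"
	PhaseStartMaintenance UpgradePhase = "StartMaintenance"
	PhaseConfig           UpgradePhase = "Config"
	PhaseInitJobs         UpgradePhase = "InitJobs"
	PhaseFlowAPI          UpgradePhase = "FlowAPI"
	PhaseServer           UpgradePhase = "Server"
	PhaseUI               UpgradePhase = "UI"
	PhaseEndMaintenance   UpgradePhase = "EndMaintenance"
	PhaseComplete         UpgradePhase = "Complete"
)

// phaseOrder returns the full sequence for one upgrade. With maintenance
// enabled, the two extra phases bracket the component upgrades.
func phaseOrder(maintenance bool) []UpgradePhase {
	components := []UpgradePhase{PhaseConfig, PhaseInitJobs, PhaseFlowAPI, PhaseServer, PhaseUI}
	order := []UpgradePhase{PhaseWaiting}
	if maintenance {
		order = append(order, PhaseStartMaintenance)
	}
	order = append(order, components...)
	if maintenance {
		order = append(order, PhaseEndMaintenance)
	}
	return append(order, PhaseComplete)
}

// nextPhase returns the phase that follows current. The terminal phase and
// any phase not present in the active order are returned unchanged.
func nextPhase(current UpgradePhase, maintenance bool) UpgradePhase {
	order := phaseOrder(maintenance)
	for i, phase := range order {
		if phase == current && i+1 < len(order) {
			return order[i+1]
		}
	}
	return current
}

func main() {
	for _, maintenance := range []bool{false, true} {
		fmt.Printf("maintenance=%v: %v\n", maintenance, phaseOrder(maintenance))
	}
}
```

Keeping the order in one place means the two routing changes (out of Waiting and out of UI) follow from a single flag rather than from separate conditionals.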
b0334dd to 0f9906f
Add four test cases covering the maintenance mode upgrade lifecycle:
- Job creation during upgrade when maintenance is configured
- Advancing past StartMaintenance when job completes successfully
- Skipping maintenance phases when spec.maintenance is not set
- Failed maintenance job deletion, retry, and Degraded condition
Amp-Thread-ID: https://ampcode.com/threads/T-019ccbea-b6d3-7583-8ac6-4f8a88c21dbd
Co-authored-by: Amp <amp@ampcode.com>
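The behavior these test cases exercise can be approximated as a pure decision function over the observed Job state. This is only a sketch: JobState, Action, and decideMaintenanceJob are hypothetical names, and the real upgradePhaseMaintenanceJob works against the Kubernetes API rather than an enum. The create, poll, advance, and delete-for-retry outcomes follow the commit descriptions.

```go
package main

import "fmt"

// JobState (hypothetical) is the observed state of a maintenance Job.
type JobState int

const (
	JobAbsent JobState = iota
	JobRunning
	JobSucceeded
	JobFailed
)

// Action (hypothetical) describes what one reconcile pass should do.
type Action struct {
	CreateJob bool // no Job exists yet, so create one
	DeleteJob bool // failed Job is removed so the next pass recreates it
	Advance   bool // move the upgrade to the next phase
	Degraded  bool // surface the failure through the Degraded condition
}

// decideMaintenanceJob maps an observed Job state to an action.
func decideMaintenanceJob(state JobState) Action {
	switch state {
	case JobAbsent:
		return Action{CreateJob: true}
	case JobSucceeded:
		return Action{Advance: true}
	case JobFailed:
		return Action{DeleteJob: true, Degraded: true}
	default:
		// Still running: nothing to do except poll again later.
		return Action{}
	}
}

func main() {
	// A failed first attempt followed by a successful retry.
	observed := []JobState{JobAbsent, JobRunning, JobFailed, JobAbsent, JobSucceeded}
	for step, state := range observed {
		fmt.Printf("pass %d: %+v\n", step+1, decideMaintenanceJob(state))
	}
}
```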
Add an e2e test that verifies the maintenance mode integration in a real cluster:
- Creates a PeerDBCluster with spec.maintenance configured
- Patches the version to trigger an upgrade
- Verifies the start maintenance Job is created with correct command
- Verifies the upgrade status shows StartMaintenance phase
- Verifies ownerReferences point to PeerDBCluster for GC
Amp-Thread-ID: https://ampcode.com/threads/T-019ccbea-b6d3-7583-8ac6-4f8a88c21dbd
Co-authored-by: Amp <amp@ampcode.com>
Update documentation across all relevant files:
- README: add Maintenance Mode Integration to features list
- API reference: add MaintenanceSpec type, MaintenanceMode condition, StartMaintenance/EndMaintenance upgrade phases
- Architecture: add Maintenance Jobs to diagram and reconciliation strategy, add maintenance_jobs.go to project structure
- Safe upgrade runbook: add Maintenance Mode section with YAML examples, update upgrade order and phases table
Amp-Thread-ID: https://ampcode.com/threads/T-019ccbea-b6d3-7583-8ac6-4f8a88c21dbd
Co-authored-by: Amp <amp@ampcode.com>
0f9906f to bbedb60
Description
Integrates PeerDB's native maintenance mode (PeerDB-io/peerdb#2211) into the operator's upgrade state machine. When spec.maintenance is configured on a PeerDBCluster, the operator gracefully pauses all running mirrors before upgrading components and resumes them after.

Upgrade flow with maintenance mode
Waiting → StartMaintenance → Config → InitJobs → FlowAPI → Server → UI → EndMaintenance → Complete
When spec.maintenance is not set, the upgrade flow remains unchanged (fully backwards compatible).

Changes
API (api/v1alpha1/)
- MaintenanceSpec type with image, backoffLimit, and resources fields
- spec.maintenance field on PeerDBClusterSpec
- StartMaintenance and EndMaintenance upgrade phases
- MaintenanceMode condition type and associated reason constants

Resource builders (internal/resources/)
- maintenance_jobs.go: BuildStartMaintenanceJob / BuildEndMaintenanceJob, which create K8s Jobs using ghcr.io/peerdb-io/flow-maintenance:stable-{version} with start/end subcommands

Controller (internal/controller/)
- upgradePhaseMaintenanceJob: generic handler for both maintenance phases that creates the Job, polls for completion, and deletes and retries on failure
- upgradePhaseWaiting routes to StartMaintenance when maintenance is configured
- UpgradePhaseUI routes to EndMaintenance (instead of Complete) when maintenance is configured

Documentation
- MaintenanceSpec, MaintenanceMode condition, new upgrade phases

Testing
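Unit tests (run locally)
go test ./...

Four new test cases verify:
- Job creation during upgrade when maintenance is configured
- Advancing past StartMaintenance when Job completes
- Skipping maintenance phases when spec.maintenance is not set
- Failed maintenance Job deletion, retry, and Degraded condition

E2E tests (requires Kind cluster)
New test creates a cluster with spec.maintenance: {}, patches version, and verifies:
- The start maintenance Job is created with the correct command
- The upgrade status shows the StartMaintenance phase
- ownerReferences point to the PeerDBCluster for GC

Manual verification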
- Create a PeerDBCluster with spec.maintenance: {}
- kubectl patch peerdbcluster <name> --type merge -p '{"spec":{"version":"v0.36.8"}}'
- kubectl get job: maintenance start job created
- kubectl get peerdbcluster <name> -o jsonpath="{.status.upgrade}": phases progress through StartMaintenance → ... → EndMaintenance → Complete
- kubectl get peerdbcluster <name> -o jsonpath="{.status.conditions}": MaintenanceMode condition toggles

Related
- StartMaintenance/EndMaintenance Temporal workflows and the flow-maintenance container image
- ghcr.io/peerdb-io/flow-maintenance: runs via a dedicated Temporal task queue, ensuring compatibility across version upgrades