e2e: add Fleet integration tests #2154

Closed
anmazzotti wants to merge 1 commit into main from add_fleet_integration_tests

@anmazzotti anmazzotti commented Feb 24, 2026

What this PR does / why we need it:

This PR adds validation for the Fleet integration.
This is a 1:1 port of CAAPF _test-import-all: https://github.com/rancher/cluster-api-addon-provider-fleet/blob/release-0.14/justfile#L229

The validation is opt-in because it is not relevant for all users, for example those of the integration-suite-example.

Test run: https://github.com/rancher/turtles/actions/runs/22659509280/job/65676505647

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1529

Special notes for your reviewer:

Checklist:

  • squashed commits into logical changes
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

@anmazzotti anmazzotti self-assigned this Feb 24, 2026
@anmazzotti anmazzotti force-pushed the add_fleet_integration_tests branch from d14bae2 to 7a22e1d Compare February 24, 2026 09:14
@anmazzotti anmazzotti added the kind/enhancement and needs-area labels and removed the needs-area and needs-kind labels Feb 24, 2026
@anmazzotti anmazzotti force-pushed the add_fleet_integration_tests branch 10 times, most recently from 6c7b012 to ac01d3b Compare February 26, 2026 14:27
@anmazzotti anmazzotti force-pushed the add_fleet_integration_tests branch 3 times, most recently from fc2162c to 4adc038 Compare February 27, 2026 08:30

anmazzotti commented Feb 27, 2026

I found out that on GKE the fleet-agent fails to initialize correctly since the metrics port 8080 is already bound (also see the reserved port list).

{"level":"error","ts":"2026-02-26T15:32:50Z","logger":"setup","msg":"problem running manager","error":"failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use","stacktrace":"github.com/rancher/fleet/internal/cmd/agent.start\n\t/home/runner/_work/fleet/fleet/internal/cmd/agent/operator.go:181\ngithub.com/rancher/fleet/internal/cmd/agent.(*FleetAgent).Run.func1\n\t/home/runner/_work/fleet/fleet/internal/cmd/agent/root.go:143"}
{"level":"error","ts":"2026-02-26T15:32:50Z","logger":"setup","msg":"failed to start agent","error":"failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use","stacktrace":"github.com/rancher/fleet/internal/cmd/agent.(*FleetAgent).Run.func1\n\t/home/runner/_work/fleet/fleet/internal/cmd/agent/root.go:144"}
{"level":"error","ts":"2026-02-26T15:32:50Z","logger":"controller-runtime.source.Kind","msg":"failed to get informer from cache","error":"Timeout: failed waiting for *v1alpha1.BundleDeployment Informer to sync","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.4/pkg/internal/source/kind.go:80\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.35.0/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.35.0/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.35.0/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.4/pkg/internal/source/kind.go:68"}
{"level":"error","ts":"2026-02-26T15:33:05Z","logger":"clusterstatus","msg":"failed to report initial cluster status","cluster":"cluster-gke-pq8v6b","interval":900,"error":"client rate limiter Wait returned an error: context canceled","stacktrace":"github.com/rancher/fleet/internal/cmd/agent/clusterstatus.Ticker.func1\n\t/home/runner/_work/fleet/fleet/internal/cmd/agent/clusterstatus/ticker.go:42"}

The metrics port can be configured using fleet-agent environment variables; however, there are two issues:

  1. The FleetAddonConfig is embedded in the rancher-turtles-providers chart (previously it was embedded in the turtles chart).
  2. CAAPF does not allow changing the configuration per Cluster (see "Allow different agent configurations per Cluster", cluster-api-addon-provider-fleet#428).

So I see no other way than changing this setting for all Clusters and for all rancher-turtles-providers chart users.
This is an opinionated choice; however, since we also use the hostNetwork setting, binding to 18080 and 18081 is probably safer in most cases.

The consequence, however, is that the fleet-agent will be rolled out again on already provisioned Clusters in order to bind to the newly set ports, which is surely going to be an unexpected change for current users.
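
For reference, the failure mode in the logs above ("listen tcp :8080: bind: address already in use") can be reproduced in isolation. The following is only a minimal sketch, not part of this PR or of Fleet: `port_is_free` and `demo` are hypothetical helper names, and an ephemeral loopback port stands in for whatever already owns 8080 on a GKE node.

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind host:port right now (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            # EADDRINUSE is the same condition the fleet-agent log reports.
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
        return True


def demo():
    """Hold a port, probe it while held, then probe it again after release."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
        # Port 0 asks the OS for any free port; it plays the role of 8080 here.
        holder.bind(("127.0.0.1", 0))
        holder.listen()
        taken = holder.getsockname()[1]
        while_held = port_is_free(taken)
    after_release = port_is_free(taken)
    return while_held, after_release
```

The same helper could be used to probe the proposed ports, for example `for port in (18080, 18081): print(port, port_is_free(port))`. Note that with hostNetwork the relevant namespace is the node's, so a probe like this is only meaningful when it runs on the node itself (or in a hostNetwork pod), not on a developer machine.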

@anmazzotti anmazzotti force-pushed the add_fleet_integration_tests branch 4 times, most recently from b714005 to 07f7de1 Compare February 27, 2026 16:32
@anmazzotti anmazzotti force-pushed the add_fleet_integration_tests branch 3 times, most recently from 1f4065a to 3de9617 Compare March 4, 2026 11:42
@anmazzotti anmazzotti marked this pull request as ready for review March 4, 2026 11:45
@anmazzotti anmazzotti requested a review from a team as a code owner March 4, 2026 11:45
@anmazzotti anmazzotti moved this to PR to be reviewed in CAPI / Turtles Mar 4, 2026
@anmazzotti anmazzotti force-pushed the add_fleet_integration_tests branch from 3de9617 to 572ea67 Compare March 5, 2026 08:07
Signed-off-by: Andrea Mazzotti <andrea.mazzotti@suse.com>
@anmazzotti anmazzotti force-pushed the add_fleet_integration_tests branch from 572ea67 to be6af13 Compare March 9, 2026 08:55

@salasberryfin salasberryfin left a comment

Thanks @anmazzotti.

With #2176 all Turtles clusters will have a corresponding Fleet cluster. Does this mean that the opt-in can be removed and this check can be applied generally?

Changes look good to me and this is not a blocker.

@anmazzotti
Contributor Author

> Thanks @anmazzotti.
>
> With #2176 all Turtles clusters will have a corresponding Fleet cluster. Does this mean that the opt-in can be removed and this check can be applied generally?
>
> Changes look good to me and this is not a blocker.

No. This mostly tests the CAAPF integration.
#2176 is not going to work the same way: it won't create fleetworkspaces, bundlemappings, clustergroups, and so on.
It's a different feature set.

@salasberryfin salasberryfin left a comment

Thanks @anmazzotti.

In this case I think it would be more accurate if we moved (only) the Fleet cluster check out of the CAAPF-specific validation, as this behavior will no longer be exclusive to CAAPF. This will also help when non-CAAPF tests are added in the coming weeks/months.

What do you think?

@anmazzotti
Contributor Author

I don't know whether this PR is compatible with #2176.
It can be closed if it is no longer needed.

@salasberryfin
Contributor

I think this is a valid check. CAAPF is not going away anytime soon and it's good that we can improve test coverage and reduce flakiness.

However, during the time when CAAPF co-exists with stand-alone Turtles, we'll have to progressively add tests that do not rely on the add-on provider. The simplest approach for this is probably to have a separate management cluster that re-uses the existing import gitops. In these cases the creation of the Fleet cluster is still a valid verification.

My suggestion is that we just restructure the checks to make them future-proof for tests that do not use CAAPF but will co-exist with it.
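
One possible shape for that restructuring, as a rough sketch only: the real e2e suite is not written in Python, and every name here (`ClusterState`, `check_fleet_cluster`, `check_caapf`, `validate`) is hypothetical. The resource kinds are the ones named earlier in this thread and the list is not exhaustive. The idea is that the Fleet cluster check always runs, while the CAAPF-specific checks stay opt-in.

```python
from dataclasses import dataclass, field

# Resource kinds mentioned in this thread as CAAPF-specific (assumed, not exhaustive).
CAAPF_EXPECTED = {"fleetworkspaces", "bundlemappings", "clustergroups"}


@dataclass
class ClusterState:
    """Hypothetical snapshot of what a test observed for one imported cluster."""
    name: str
    fleet_cluster_exists: bool = False
    caapf_resources: set = field(default_factory=set)


def check_fleet_cluster(state):
    """Generic check: applies with or without the add-on provider."""
    if state.fleet_cluster_exists:
        return []
    return [f"{state.name}: Fleet cluster missing"]


def check_caapf(state):
    """CAAPF-only check: report each expected resource kind that was not observed."""
    missing = sorted(CAAPF_EXPECTED - state.caapf_resources)
    return [f"{state.name}: missing {kind}" for kind in missing]


def validate(state, caapf_enabled):
    """Run the generic check always and the CAAPF checks only when opted in."""
    failures = check_fleet_cluster(state)
    if caapf_enabled:
        failures += check_caapf(state)
    return failures
```

With this split, a test that does not use CAAPF would call `validate(state, caapf_enabled=False)` and still verify that the Fleet cluster exists.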

@anmazzotti anmazzotti closed this Mar 10, 2026
@github-project-automation github-project-automation bot moved this from PR to be reviewed to Done in CAPI / Turtles Mar 10, 2026
@anmazzotti anmazzotti deleted the add_fleet_integration_tests branch March 13, 2026 08:55

Labels

area/ci, kind/enhancement

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[e2e] Add Fleet validation checks

2 participants