Move ControllerModifyVolume to GA#588
Move ControllerModifyVolume to GA#588bswartz merged 1 commit intocontainer-storage-interface:masterfrom
Conversation
a6080db to
4f9da2b
Compare
|
Thought the build failure is related to protocolbuffers/protobuf#16163, but manually updated it, even make succeeded locally the presubmit did not pass. |
|
/retest |
4f9da2b to
66d1b6b
Compare
66d1b6b to
b64b6a6
Compare
|
+1 |
|
lgtm |
| } | ||
| message ControllerModifyVolumeRequest { | ||
| option (alpha_message) = true; | ||
|
|
There was a problem hiding this comment.
Before we promote this to GA, can we change mutable_parameters to OPTIONAL? And add words like:
When not specified, SPs MAY cancel previously requested in-progress modifications.
SPs SHOULD return success only when the volume is stable and no modifications on going.
This extends the use-case and can be a breaking change for SP. But the semantic should be clear and straightforward.
This can be useful for Kubernetes when reverting volumeAttributeClassName to empty, to ensure we've reached a stable state.
And if we add topology support in the future, passing empty mutable_parameters can be useful for CO to fetch the current topology of the volume without actually modify it, also useful in the reverting volumeAttributeClassName to empty case.
There was a problem hiding this comment.
Why do we want to do that? We made it explicitly as something we need for Modify. And if you need a cancelling operation, we should have a CancelOperation API instead?
There was a problem hiding this comment.
And if you need a cancelling operation, we should have a CancelOperation API instead?
The key point is not canceling, but waiting for a stable state, to avoid left a mess behind. For use-case like kubernetes/kubernetes#132106 . In the PR description:
rollback from an infeasible pvc.spec.VacName to no VAC
However, there is no concept like infeasible error in the CSI spec. I think the only way we can sure that the volume returns to a stable state is retry the RPC until it returns success.
There was a problem hiding this comment.
And if we want to support topology in the future (kubernetes/enhancements#5381), the current proposal is allowing setting new topology before modify success. So what if the topology has change and we want to rollback? We will need a way to fetch the current topology of the volume without actually modify it.
Of course we can validate the parameters before change topology. But there are still possibilities that the operation may fail half-way.
- We may change multiple aspect of the volume in the same RPC call, e.g. disk tags, disk type, performance level. Some of them may success, but others may still fail.
- The validation rules embedded in the CSI can be outdated.
- Modification can be slow. There maybe other operator modifying the same volume concurrently, invalidating the previously valid request.
Or should we just say: SPs should only return OK or Abort if any part of the modification succeeded? But I'm afraid that SP may also lost track of previously failed modification tasks. SP may also not able to distinct the partial or complete failure.
There was a problem hiding this comment.
However, there is no concept like infeasible error in the CSI spec. I think the only way we can sure that the volume returns to a stable state is retry the RPC until it returns success.
For most errors we do not allow going back to older state. We have pretty well defined semantic of what we consider infeasible in k8s+csi - https://github.com/kubernetes-csi/external-resizer/blob/master/pkg/util/util.go#L274 .
We do not have an API to cancel any of in-progress CSI operations and that has worked well enough.
There was a problem hiding this comment.
We're not supporting topology changes with this API, and we've never supported rollbacks in CSI.
Error recovery is to retry until it succeeds or the user requests a different thing (i.e. a subsequent VAC modification with new parameters).
Storage vendors that want to mutate allowed topology should prototype that feature out-of-tree, and work together with other storage vendors to standardize some common use cases, then propose a CSI spec change.
In today's SIG Storage meeting, there were concerns that the error message descriptions aren't clear enough for the CO to discern whether the SP had made partial progress or no progress (i.e. an "infeasible" error sitaution)
|
Please see #597 |
What type of PR is this?
Feature
What this PR does / why we need it:
This PR moves ControllerModifyVolume to GA.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce an API-breaking change?: