OpenShift release blocker bug definition #1926
openshift-merge-bot[bot] merged 1 commit into openshift:master
/assign @dgoodwin
dgoodwin left a comment
Two small formatting issues; otherwise this looks great, and thank you for porting this, Mike.
* Most bugs that result in Data Loss, Service Unavailability, or Data Corruption are blockers.
* Bugs that cause failed installs and upgrades may be release blockers based on the scope of the failure. If the failure is limited to a specific form-factor/platform, consider having a conversation with the relevant stakeholders. More on this below.
* Bugs which cause the perception of a failed upgrade may be release blockers.
* Most bugs which are a regression are blockers, such as regressions discovered by Layered Product Testing or regressions in functionality not otherwise detected by Component Readiness.
I pushed back on this kind of thing internally while folks were discussing this space, but I'm repeating my arguments here. I have no opinions on what folks think should be counted as a release blocker for an initial z-stream GA (e.g. 4.20.0). I do have opinions on what should be counted as a release blocker for a later patch release in an existing z-stream (e.g. 4.20.1). The current enhancement text includes:
> The Release Blocker field in OCPBUGS has a different meaning for z-stream releases.
but subsequent sections, like the one I'm commenting on, don't seem to make that distinction.
Personally, unless a bug leads to every single type of cluster having catastrophic update-into-that-release failures, I don't think it should be a z-stream blocker. If there is even one type of cluster where the cluster can update into the exposed release and not be destroyed, then we should release and use conditional update risk declarations to inform the exposed clusters' admins of the risk. The exception would be using the property for extremely short-term wiggles in deployment flow (e.g. if someone knows a fix for a hot bug is merged and just about to arrive in nightlies, sure, set the bug as a release blocker to let ART know to wait for that nightly). My reasoning is that significant delays for the next z-stream patch release impact every customer running older patches in that z-stream, and while the bug you're considering might be bad for some of them, it seems unlikely that anyone has a complete enough picture to say that it's worth leaving all those other bugs unfixed while folks work on getting a fix together for the bug you're considering. I'd absolutely encourage folks looking at a serious bug to talk to whoever they need to talk to in order to get sufficient prioritization for the fix work. But I don't think we want them slowing down admin access to all the other fixes other folks have delivered, which are just waiting to ship.
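The rule proposed in this comment can be sketched as a small decision function. This is only an illustration of the argument, not part of any OpenShift or ART tooling: the `ExposureReport` type, the `z_stream_action` function, the returned strings, and the platform names in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExposureReport:
    """Hypothetical summary of how a bug affects updates into a z-stream release."""

    # Maps a cluster type to True when updating into the release is
    # catastrophic for that type of cluster.
    catastrophic_by_cluster_type: Dict[str, bool]
    # True when a fix has merged and is only waiting to arrive in a nightly.
    fix_merged_awaiting_nightly: bool = False


def z_stream_action(report: ExposureReport) -> str:
    """Return a suggested action under the rule sketched in the comment above."""
    # Short-term exception: hold the release briefly for a fix that is
    # already merged and about to land in a nightly.
    if report.fix_merged_awaiting_nightly:
        return "block-briefly"

    outcomes = list(report.catastrophic_by_cluster_type.values())
    # Block only when every known cluster type fails catastrophically.
    if outcomes and all(outcomes):
        return "block"

    # Otherwise ship, and declare a conditional update risk for exposed clusters.
    return "ship-with-conditional-risk"


# Usage with made-up cluster types: one type survives, so the release ships.
print(z_stream_action(ExposureReport({"aws": True, "metal": False})))
```

The sketch treats an empty report as "ship", which is an assumption of this example rather than something stated in the discussion.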
Are we agreed that discovering the upcoming .z is breaking a feature (which a subset of customers are using, perhaps not every single one), compared to 4.y.0, would be considered a .z release blocker? In this case I would think we cannot break someone to deliver fixes to someone else who is already broken.
Otherwise your point makes sense to me: .z blockers should be used exceedingly rarely and never for minor, already-in-the-wild issues. I wouldn't want to see us ever saying this build will break some customers but it's got to go out to fix some others.
I'm going to LGTM, but @wking, if you wouldn't mind submitting a follow-up with the modifications you'd like, we can continue the discussion there.
I think the difference here between .0 and .z is that in .0 we can control the timeline; if we defer a feature release, oh well. Once it's GA, however, and we fix some CVE or some critical bug, there are expectations that a customer receives that fix in a timely manner, so we don't have as much freedom to delay delivering that fix due to a regression elsewhere that affects only a subset of use cases. That's why we err on the side of shipping but informing customers about the regression so they can decide on their own.
And yeah, I agree with using z-stream blocker bugs to assert that some fix that made it in before the cutoff should be considered for potentially delaying the promotion by a reasonable period of time. I say this because it's not uncommon for there to be 12-24 hours between the time something merges and when it's in a nightly, let alone accepted. So when we say the cutoff is on Wednesday and an engineer lands the fix on Tuesday, I'm going to tell ART/ERT/QE that we have to wait for the pipeline to catch up.
Currently the most recent 4.20 nightly is 17 hours old and today is a cutoff. If we don't see a fresh build by EoD I may just pick something that's merged today and mark that bug as a z-stream blocker. TBH this lever hasn't actually been particularly successful in getting ART to delay promotion, but it seems like the right lever to be pulling in these scenarios.
Lots of words, but I'm fine merging as is and coming back to clarify more about this point. I don't think it's common for people to think that a z-stream bug can be made a blocker.
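The timing concern in this comment (a fix that merges before the cutoff while the newest nightly predates it) can be sketched as a simple timestamp comparison. This is a rough illustration under stated assumptions: the function names are hypothetical, the timestamps are made up, and treating "built after the merge" as "contains the fix" is a simplification, since a real check would inspect the payload contents.

```python
from datetime import datetime, timedelta, timezone


def nightly_predates_fix(fix_merged_at: datetime, nightly_built_at: datetime) -> bool:
    """True when the nightly was built before the fix merged, so it cannot contain it."""
    return nightly_built_at < fix_merged_at


def nightly_age(nightly_built_at: datetime, now: datetime) -> timedelta:
    """How old the newest nightly is at the given moment."""
    return now - nightly_built_at


# Made-up timestamps: the fix merged 5 hours ago, the newest nightly is 17 hours old.
now = datetime(2025, 1, 15, 17, 0, tzinfo=timezone.utc)
fix_merged_at = now - timedelta(hours=5)
nightly_built_at = now - timedelta(hours=17)

if nightly_predates_fix(fix_merged_at, nightly_built_at):
    print("Pipeline has not caught up; consider asking to delay promotion.")
```

In this scenario the comparison is what would justify marking the merged bug as a z-stream blocker for a short period.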
I've opened #1928 floating an attempt to address this.
b80d4a3 to cbe2e13
@mffiedler: all tests passed!

/lgtm

Need to remove WIP and may need to find an approver.

Going to request some additional reviews from the program team, then will look for an approver.

Sounds good, I reached out to Scott/Mrunal for approval if that helps.

/approve

[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: sdodson

This PR provides an updated definition of what a Release Blocker bug is in OpenShift. It provides guidance on release blocker categories, on when to set the blocker flag in JIRA, and an FAQ to address some common issues.