// Worker encountered upstream API rate limiting. Client may resubmit request after some time.
class ServiceOverLoadError extends ClientError {
We have a similar error in api-process already (look for TooManyRequestsError). Could we merge those two classes into one instead, and refactor a bit where possible?
I based the ServiceOverLoad reason name on the design documents attached to ASSETS-8997. There are places where the error is listed as "TooManyRequests/ServiceOverLoad", and it wasn't clear whether that reflected ambiguity over which name to use or an intention to have both. I personally see value in supporting both errors. TooManyRequests is more readily associated with an HTTP 429 response originating from the Asset Compute Service itself, which can provide a Retry-After directive for the client. ServiceOverLoad would represent a more general error type that Asset Compute can throw asynchronously when it encounters throttling from upstream/3rd-party services (such as when a worker receives a 429 Too Many Requests HTTP response).
If the AEM client receives either error, the proper behavior is to retry the original request after some time has passed. With TooManyRequests, the client may be given an explicit Retry-After, whereas with ServiceOverLoad it's basically Retry-After: 🤷
I kind of had a hybrid approach in mind where we could support both of these error types in AEM for rendition_failed events just in case, and define both types in asset-compute-commons, along with making the semantic distinction more clearly defined along the lines I described above. Would that work?
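To make the proposed distinction concrete, here is a minimal sketch of how a client might pick a resubmission delay for each reason. The event shape (`reason`, `retryAfterSeconds`), the function name, and the 60-second fallback are assumptions for illustration only; none of them come from the design documents or from AEM.

```javascript
// Only the reason names TooManyRequests and ServiceOverLoad come from the
// discussion; everything else here is hypothetical.
const REASON_TOO_MANY_REQUESTS = "TooManyRequests";
const REASON_SERVICE_OVERLOAD = "ServiceOverLoad";

// Assumed fallback delay for when no Retry-After value is available.
const DEFAULT_RETRY_DELAY_SECONDS = 60;

/**
 * Decide whether and when to resubmit after a rendition_failed event.
 * `event` is a hypothetical shape: { reason: string, retryAfterSeconds?: number }.
 * Returns a delay in seconds, or null if the failure is not retryable.
 */
function retryDelaySeconds(event) {
    if (event.reason === REASON_TOO_MANY_REQUESTS) {
        // The service itself throttled the request and may name an explicit delay.
        const explicit = event.retryAfterSeconds;
        return Number.isFinite(explicit) && explicit >= 0
            ? explicit
            : DEFAULT_RETRY_DELAY_SECONDS;
    }
    if (event.reason === REASON_SERVICE_OVERLOAD) {
        // Upstream throttling: no Retry-After is known, so the client chooses.
        return DEFAULT_RETRY_DELAY_SECONDS;
    }
    return null;
}
```

Both reasons lead to a retry; the only difference in this sketch is whether the delay can come from the service or has to be chosen by the client.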
It would work, but I'm not sure whether having two different error names initially was intentional. @pheenomenon can probably clarify whether the two different errors were intended or are just "synonyms" (talking about the current design, not what we'll have eventually).
Use of "TooManyRequests/ServiceOverLoad" was not meant to imply that the two are the same. It was used only to express the idea.
I have seen that our downstream services can get overloaded for a variety of reasons and return 500 instead of 429. So I like the idea of keeping it flexible as ServiceOverload instead of TooManyRequestsError.
As to whether TooManyRequestsError (which we use in api-process for Nui throttling) should be converged into ServiceOverload: we can take that route if we want, but it won't have an API dependency with AEM and won't bring a huge advantage. So the hybrid approach sounds good to me too.
In that case (which is also what confused me): Although 500 is generic, 503 should be ServiceOverload then (503 usually means server is busy - but our services don't use it yet as far as I know). Otherwise it could be confusing for developers using our APIs: Why do they get an Overload error when there is a 500 (which could be anything, since it's generic)?
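One way to read this suggestion, sketched under assumptions: a worker maps the upstream HTTP status to a failure reason, treating only 429 and 503 as overload and leaving 500 generic. The function name and the "GenericError" reason string are hypothetical, and including 429 here follows the earlier comment about workers receiving 429 from upstream rather than anything decided in this thread.

```javascript
/**
 * Map an upstream HTTP status to a failure reason (hypothetical helper).
 * Only the reason name ServiceOverLoad comes from this PR.
 */
function reasonForUpstreamStatus(status) {
    switch (status) {
        case 429: // upstream throttled the worker
        case 503: // upstream reports that it is busy
            return "ServiceOverLoad";
        default:
            // 500 and other statuses stay generic, since they could mean anything
            return "GenericError";
    }
}
```

With this mapping, a developer who sees an overload error can rely on the upstream service having explicitly signalled throttling or unavailability.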
(Could we maybe still move the TooManyRequests exception here too, while at it, @adamcin? If it doesn't throw you off-track?)
tmathern left a comment
Approving, with the note that ServiceOverload should be reserved for HTTP code 503, and not generic.
jdelbick left a comment
LGTM, please additionally update the readme with the new error type and description:
https://github.com/adobe/asset-compute-commons#custom-errors
jira: https://jira.corp.adobe.com/browse/ASSETS-8997
downstream dependent @adobe/asset-compute-sdk PR#182
This change adds a new rendition_failed reason (ServiceOverLoad) for use by asset compute workers that encounter upstream API rate limiting and need to indicate to downstream clients that a resubmission of the original asset compute request is necessary after some time has passed.
Also defined is a ServiceOverLoadErrorType, which extends ClientError rather than GenericError, because it is defined in the spirit of HTTP 4xx (429, specifically).
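As a rough illustration of that hierarchy choice, the sketch below uses stand-in base classes. The real ClientError and GenericError live in asset-compute-commons and their constructors may well differ; the `(message, reason)` signature and the `reason` property here are assumptions.

```javascript
// Stand-in base classes, defined only so the sketch is self-contained.
class GenericError extends Error {
    constructor(message, reason = "GenericError") {
        super(message);
        this.name = this.constructor.name;
        this.reason = reason;
    }
}

class ClientError extends Error {
    constructor(message, reason = "ClientError") {
        super(message);
        this.name = this.constructor.name;
        this.reason = reason;
    }
}

// Worker encountered upstream API rate limiting. Client may resubmit request after some time.
class ServiceOverLoadError extends ClientError {
    constructor(message) {
        super(message, "ServiceOverLoad");
    }
}

const err = new ServiceOverLoadError("upstream service returned 429");
console.log(err instanceof ClientError);  // true
console.log(err instanceof GenericError); // false
console.log(err.reason);                  // "ServiceOverLoad"
```

Because the class sits under ClientError, code that already branches on client-side errors would pick it up without a new check.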