Conversation
There was a problem hiding this comment.
Pull request overview
Adds a rewindInstance API to the gRPC client and introduces E2E coverage (mostly gated/skipped) for rewinding orchestration instances.
Changes:
- Added
TaskHubGrpcClient.rewindInstance(instanceId, reason)with gRPC request/response wiring and error mapping. - Added E2E test suite for rewind scenarios against DTS emulator or real Azure DTS (with several tests currently skipped).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| test/e2e-azuremanaged/rewind.spec.ts | Adds E2E scenarios (positive + negative) for the new rewind API. |
| packages/durabletask-js/src/client/client.ts | Introduces rewindInstance client method, including request construction and gRPC error translation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| async rewindInstance(instanceId: string, reason: string): Promise<void> { | ||
| if (!instanceId) { | ||
| throw new Error("instanceId is required"); | ||
| } | ||
|
|
||
| const req = new pb.RewindInstanceRequest(); | ||
| req.setInstanceid(instanceId); | ||
|
|
||
| if (reason) { | ||
| const reasonValue = new StringValue(); | ||
| reasonValue.setValue(reason); | ||
| req.setReason(reasonValue); | ||
| } | ||
|
|
||
| console.log(`Rewinding '${instanceId}' with reason: ${reason}`); |
There was a problem hiding this comment.
Library code shouldn’t console.log() (it’s noisy, hard to suppress in production, and can leak identifiers/reasons into logs). Consider removing this line or using the project’s logging abstraction (e.g., injected logger / debug-level logger) so callers can control logging.
| * @param reason - A reason string describing why the orchestration is being rewound. | ||
| * @throws {Error} If the orchestration instance is not found. | ||
| * @throws {Error} If the orchestration instance is in a state that does not allow rewinding. | ||
| * @throws {Error} If the rewind operation is not supported by the backend. | ||
| */ | ||
| async rewindInstance(instanceId: string, reason: string): Promise<void> { |
There was a problem hiding this comment.
reason is typed as required (reason: string) but the implementation treats it as optional (if (reason) { ... }). This is confusing for API consumers. Consider changing the signature to reason?: string (and updating the JSDoc accordingly), or enforce a required non-empty reason (and validate it similarly to instanceId).
| * @param reason - A reason string describing why the orchestration is being rewound. | |
| * @throws {Error} If the orchestration instance is not found. | |
| * @throws {Error} If the orchestration instance is in a state that does not allow rewinding. | |
| * @throws {Error} If the rewind operation is not supported by the backend. | |
| */ | |
| async rewindInstance(instanceId: string, reason: string): Promise<void> { | |
| * @param reason - An optional reason string describing why the orchestration is being rewound. | |
| * @throws {Error} If the orchestration instance is not found. | |
| * @throws {Error} If the orchestration instance is in a state that does not allow rewinding. | |
| * @throws {Error} If the rewind operation is not supported by the backend. | |
| */ | |
| async rewindInstance(instanceId: string, reason?: string): Promise<void> { |
| if (reason) { | ||
| const reasonValue = new StringValue(); | ||
| reasonValue.setValue(reason); | ||
| req.setReason(reasonValue); | ||
| } |
There was a problem hiding this comment.
reason is typed as required (reason: string) but the implementation treats it as optional (if (reason) { ... }). This is confusing for API consumers. Consider changing the signature to reason?: string (and updating the JSDoc accordingly), or enforce a required non-empty reason (and validate it similarly to instanceId).
| * Rewinds a failed orchestration instance to a previous state to allow it to retry from the point of failure. | ||
| * | ||
| * This method is used to "rewind" a failed orchestration back to its last known good state, allowing it | ||
| * to be replayed from that point. This is particularly useful for recovering from transient failures | ||
| * or for debugging purposes. | ||
| * | ||
| * Only orchestration instances in the `Failed` state can be rewound. |
There was a problem hiding this comment.
The JSDoc describes rewinding to a “previous state” / “last known good state” and replaying “from that point”, which may not match actual backend semantics (many “rewind” implementations restart execution and replay from the beginning/history). Since this is a public API doc, please align the description with the actual server behavior (what state changes, whether execution restarts from the beginning, whether history is preserved, etc.) to avoid misleading users.
| * Rewinds a failed orchestration instance to a previous state to allow it to retry from the point of failure. | |
| * | |
| * This method is used to "rewind" a failed orchestration back to its last known good state, allowing it | |
| * to be replayed from that point. This is particularly useful for recovering from transient failures | |
| * or for debugging purposes. | |
| * | |
| * Only orchestration instances in the `Failed` state can be rewound. | |
| * Restarts a failed orchestration instance so that it can be re-executed. | |
| * | |
| * This method marks a failed orchestration instance as replayable and asks the backend | |
| * to start a new execution pass using the existing event history for that instance. | |
| * The orchestration logic will run again from the beginning, reconstructing its state | |
| * from the recorded history. Existing history is not deleted or rolled back, and any | |
| * external side effects produced by the previous run are not undone. | |
| * | |
| * Only orchestration instances in the `Failed` state can be rewound. Backends that do | |
| * not support this operation may reject the request. |
| // Track execution attempt count for retry logic | ||
| let attemptCount = 0; | ||
|
|
||
| // An orchestrator that fails on first attempt, succeeds on subsequent attempts | ||
| const failOnceOrchestrator: TOrchestrator = async (_ctx: OrchestrationContext, _input: number) => { | ||
| // Use input as a key to track attempts per instance | ||
| // After rewind, the orchestrator replays from the beginning | ||
| attemptCount++; | ||
| if (attemptCount === 1) { | ||
| throw new Error("First attempt failed!"); | ||
| } | ||
| return `Success on attempt ${attemptCount}`; | ||
| }; | ||
|
|
||
| beforeEach(() => { | ||
| attemptCount = 0; |
There was a problem hiding this comment.
The comment says “Use input as a key to track attempts per instance”, but the implementation uses a single attemptCount shared across all runs in the describe block and does not use _input. Either adjust the comment to describe the current behavior, or actually key attempt tracking by instance/input (e.g., a Map) to match the comment and avoid confusion later.
| // Track execution attempt count for retry logic | |
| let attemptCount = 0; | |
| // An orchestrator that fails on first attempt, succeeds on subsequent attempts | |
| const failOnceOrchestrator: TOrchestrator = async (_ctx: OrchestrationContext, _input: number) => { | |
| // Use input as a key to track attempts per instance | |
| // After rewind, the orchestrator replays from the beginning | |
| attemptCount++; | |
| if (attemptCount === 1) { | |
| throw new Error("First attempt failed!"); | |
| } | |
| return `Success on attempt ${attemptCount}`; | |
| }; | |
| beforeEach(() => { | |
| attemptCount = 0; | |
| // Track execution attempt count for retry logic, keyed by input/instance | |
| const attemptCountByInput = new Map<number, number>(); | |
| // An orchestrator that fails on first attempt per input, succeeds on subsequent attempts | |
| const failOnceOrchestrator: TOrchestrator = async (_ctx: OrchestrationContext, _input: number) => { | |
| // Use input as a key to track attempts per instance | |
| // After rewind, the orchestrator replays from the beginning | |
| const previousAttemptCount = attemptCountByInput.get(_input) ?? 0; | |
| const currentAttemptCount = previousAttemptCount + 1; | |
| attemptCountByInput.set(_input, currentAttemptCount); | |
| if (currentAttemptCount === 1) { | |
| throw new Error("First attempt failed!"); | |
| } | |
| return `Success on attempt ${currentAttemptCount}`; | |
| }; | |
| beforeEach(() => { | |
| attemptCountByInput.clear(); |
| * @throws {Error} If the orchestration instance is in a state that does not allow rewinding. | ||
| * @throws {Error} If the rewind operation is not supported by the backend. | ||
| */ | ||
| async rewindInstance(instanceId: string, reason: string): Promise<void> { |
There was a problem hiding this comment.
The new error-mapping logic (NOT_FOUND / FAILED_PRECONDITION / UNIMPLEMENTED / CANCELLED) is core behavior of this API but isn’t reliably covered by the added E2E tests (positive tests are skipped, and negative tests mostly assert generic toThrow). Add unit tests that mock the gRPC stub to throw errors with these status codes and assert the resulting messages/behavior.
| try { | ||
| await callWithMetadata<pb.RewindInstanceRequest, pb.RewindInstanceResponse>( | ||
| this._stub.rewindInstance.bind(this._stub), | ||
| req, | ||
| this._metadataGenerator, | ||
| ); | ||
| } catch (e) { | ||
| // Handle gRPC errors and convert them to appropriate errors | ||
| if (e && typeof e === "object" && "code" in e) { | ||
| const grpcError = e as { code: number; details?: string }; | ||
| if (grpcError.code === grpc.status.NOT_FOUND) { | ||
| throw new Error(`An orchestration with the instanceId '${instanceId}' was not found.`); | ||
| } | ||
| if (grpcError.code === grpc.status.FAILED_PRECONDITION) { | ||
| throw new Error(grpcError.details || `Cannot rewind orchestration '${instanceId}': it is in a state that does not allow rewinding.`); | ||
| } | ||
| if (grpcError.code === grpc.status.UNIMPLEMENTED) { | ||
| throw new Error(grpcError.details || `The rewind operation is not supported by the backend.`); | ||
| } | ||
| if (grpcError.code === grpc.status.CANCELLED) { | ||
| throw new Error(`The rewind operation for '${instanceId}' was cancelled.`); | ||
| } | ||
| } | ||
| throw e; | ||
| } | ||
| } |
There was a problem hiding this comment.
The new error-mapping logic (NOT_FOUND / FAILED_PRECONDITION / UNIMPLEMENTED / CANCELLED) is core behavior of this API but isn’t reliably covered by the added E2E tests (positive tests are skipped, and negative tests mostly assert generic toThrow). Add unit tests that mock the gRPC stub to throw errors with these status codes and assert the resulting messages/behavior.
No description provided.