When you hit the “Place Order” button, that event triggers a series of steps, including inventory reservation, payment processing, and shipping initialization. Now, suppose your card is charged by the payment service (Stripe π€), but the API call to the third-party shipping service failed.
Modern systems don’t live inside a single database anymore. You can’t just rollback everything like a normal database transaction. In this era of distributed services, Saga patterns solve the problem of distributed rollback.
What is the Saga Pattern?
Saga is a sequence of steps carried out in a workflow. For each successful step, there exists a compensating step. As the saga progresses, these steps are stored in a list. Down the lane, if any step fails, all compensating steps are executed in reverse order.

Saga Pattern with AWS Durable Lambda Functions
Because of built-in checkpoints and replay mechanisms, AWS Lambda functions are perfect for implementing the saga pattern.
Each step of the saga can be wrapped in a durable step, allowing independent retry strategies. After each successful step, durable function checkpoints state; hence, on retry, already completed steps are not executed again.
To implement forward steps and compensation, all that we need is a try-catch block. The idea is simple: keep moving forward in the try block. If any step fails, compensating steps run in the catch block.
Prefer GitHub instead? https://gist.github.com/TrickSumo/61744e540d0e5953c3dd5fdc6d8a2338
import {
withDurableExecution,
DurableContext,
createRetryStrategy,
JitterStrategy,
} from '@aws/durable-execution-sdk-js';
import { randomUUID } from 'crypto';
//Types
interface OrderEvent {
orderId: string;
customerId: string;
items: Array<{ sku: string; qty: number; price: number }>;
paymentMethod: { type: string; token: string };
shippingAddress: { street: string; city: string; zip: string };
}
// Custom Error Classes
// Used by retryableErrorTypes - SDK checks instanceof, so real classes are needed
class NetworkError extends Error {
name = 'NetworkError';
}
class PaymentDeclinedError extends Error {
name = 'PaymentDeclinedError';
}
// Retry Strategies
/**
* Exponential backoff with jitter - for external API calls.
* Attempts: 1s β 2s β 4s β 8s β 16s (random jitter added to avoid thundering herd)
* Only retries NetworkError - other errors won't be retried.
*/
const apiRetryStrategy = createRetryStrategy({
maxAttempts: 5,
initialDelay: { seconds: 1 },
maxDelay: { seconds: 30 },
backoffRate: 2.0,
jitter: JitterStrategy.FULL,
retryableErrorTypes: [NetworkError], // β class constructor, not string
});
/**
* Custom retry for payment - retries network errors but NOT declined cards.
* Uses a function instead of createRetryStrategy for fine-grained control.
*/
const paymentRetryStrategy = (error: Error, attemptCount: number) => {
// retryableErrorTypes equivalent - never retry a declined card
if (error instanceof PaymentDeclinedError) {
return { shouldRetry: false };
}
// maxAttempts: 5 equivalent
if (attemptCount >= 5) {
return { shouldRetry: false };
}
// initialDelay(1s) + backoffRate(2.0) + maxDelay(30s) + jitter(FULL) equivalent
const baseDelay = 1 * Math.pow(2.0, attemptCount - 1); // exponential: 1s β 2s β 4s β 8s β 16s
const capped = Math.min(baseDelay, 30); // maxDelay: never exceed 30s
const jittered = Math.random() * capped; // FULL jitter: random between 0 and capped
const seconds = Math.max(1, Math.round(jittered)); // minimum 1s, rounded to whole second
return { shouldRetry: true, delay: { seconds } };
};
// Handler
export const handler = withDurableExecution(async (event: OrderEvent, context: DurableContext) => {
context.logger.info('Order processing started', { orderId: event.orderId });
// Saga compensations array tracks what to undo if something fails later
const compensations: Array<{ name: string; fn: () => Promise<void> }> = [];
try {
// ββ Step 1: Validate ββββββββββββββββββββββββββββββββββββββββββββββββββββ
// Throws a plain Error for invalid input, "shouldRetry: false" strategy means it fails immediately without retry
await context.step('validate-order', async () => {
context.logger.info('Validating order');
if (!event.orderId || !event.customerId) {
throw new Error('Order missing required fields');
}
if (!event.items || event.items.length === 0) {
throw new Error('Order has no items');
}
const total = event.items.reduce((sum, i) => sum + i.price * i.qty, 0);
if (total <= 0) {
throw new Error('Order total must be greater than zero');
}
return { valid: true, total };
},
{ retryStrategy: () => ({ shouldRetry: false }) }
);
// ββ Step 2: Reserve inventory βββββββββββββββββββββββββββββββββββββββββββ
// Retries with exponential backoff, inventory service may be temporarily busy
const reservation = await context.step(
'reserve-inventory',
async () => {
context.logger.info('Reserving inventory');
return await callInventoryService(event.orderId, event.items);
},
{ retryStrategy: apiRetryStrategy }
);
// Register compensation, if something fails later, cancel this reservation
compensations.push({
name: 'cancel-reservation',
fn: async () => { await callInventoryService(event.orderId, event.items, 'cancel'); },
});
context.logger.info('Inventory reserved', { reservationId: reservation.id });
// ββ Step 3: Generate idempotency key βββββββββββββββββββββββββββββββββββ
// Key is generated ONCE inside a step then checkpointed and same value returned on every replay.
// This is the recommended pattern for payment APIs that support idempotency keys.
// WARNING: Never generate outside a step because it changes on replay, defeating deduplication.
const idempotencyKey = await context.step('payment-idempotency-key', async () =>
randomUUID()
);
// ββ Step 4: Charge payment ββββββββββββββββββββββββββββββββββββββββββββββ
// AtLeastOnce (default) + idempotency key = safe to retry.
// Even if Lambda crashes after charge but before checkpoint, the retry sends
// the same idempotency key - payment provider deduplicates and returns original result.
// No double charge risk.
const payment = await context.step(
'charge-payment',
async () => {
context.logger.info('Charging payment', { idempotencyKey });
return await callPaymentService(event.paymentMethod, getTotalAmount(event), idempotencyKey);
},
{ retryStrategy: paymentRetryStrategy } // retries NetworkError safely
);
// Register compensation - if shipping fails, refund the payment
compensations.push({
name: 'refund-payment',
fn: async () => {
await callPaymentService(event.paymentMethod, getTotalAmount(event), idempotencyKey, 'refund', payment.id);
},
});
context.logger.info('Payment charged', { paymentId: payment.id });
// ββ Step 4: Create shipment βββββββββββββββββββββββββββββββββββββββββββββ
// Retries with exponential backoff, shipping service may be temporarily down
const shipment = await context.step(
'create-shipment',
async () => {
context.logger.info('Creating shipment');
return await callShippingService(event.orderId, event.shippingAddress, event.items);
},
{ retryStrategy: apiRetryStrategy }
);
context.logger.info('Shipment created', { trackingId: shipment.trackingId });
// ββ Step 5: Send confirmation βββββββββββββββββββββββββββββββββββββββββββ
// Simple fixed-delay retry: email service is reliable, 3 attempts is enough
await context.step(
'send-confirmation',
async () => {
context.logger.info('Sending confirmation email');
await callNotificationService(event.customerId, event.orderId, shipment.trackingId);
},
{
retryStrategy: createRetryStrategy({
maxAttempts: 3,
initialDelay: { seconds: 2 },
backoffRate: 1, // fixed delay, no backoff
}),
}
);
context.logger.info('Order processing complete', { orderId: event.orderId });
return {
success: true,
orderId: event.orderId,
reservationId: reservation.id,
paymentId: payment.id,
trackingId: shipment.trackingId,
};
} catch (error) {
// ββ Saga: undo completed steps in reverse order βββββββββββββββββββββββββ
// Example: payment charged but shipping failed β refund payment, cancel reservation
context.logger.error('Order failed, running compensations', {
orderId: event.orderId,
error: (error as Error).message,
});
for (const comp of compensations.reverse()) {
try {
// Each compensation is its own durable step, checkpointed and retried
await context.step(comp.name, async () => comp.fn());
context.logger.info(`Compensation done: ${comp.name}`);
} catch (compError) {
// Log but continue, try all compensations even if one fails
context.logger.error(`Compensation failed: ${comp.name}`, compError);
}
}
throw error; // re-throw so execution is marked FAILED in console
}
});
// βββ Simulated Service Calls ββββββββββββββββββββββββββββββββββββββββββββββββββ
// In real code these would call actual APIs. Here we simulate occasional failures
// to demonstrate retry behavior.
async function callInventoryService(
orderId: string,
items: OrderEvent['items'],
action = 'reserve'
): Promise<{ id: string }> {
// 20% chance of network failure - demonstrates retry kicking in
if (Math.random() < 0.2) throw new NetworkError('Inventory service timeout');
return { id: `RES-${orderId}-${Date.now()}` };
}
async function callPaymentService(
method: OrderEvent['paymentMethod'],
amount: number,
idempotencyKey: string,
action = 'charge',
paymentId?: string
): Promise<{ id: string }> {
if (Math.random() < 0.1) throw new NetworkError('Payment gateway timeout');
// Token "DECLINED" simulates a declined card, triggers PaymentDeclinedError, no retry
if (method.token === 'DECLINED') throw new PaymentDeclinedError('Card declined');
// In real code: pass idempotencyKey to payment provider (Stripe, etc.)
// Provider returns same result if key was already used, no double charge
return { id: `PAY-${idempotencyKey}` };
}
async function callShippingService(
orderId: string,
address: OrderEvent['shippingAddress'],
items: OrderEvent['items']
): Promise<{ trackingId: string }> {
if (Math.random() < 0.1) throw new NetworkError('Shipping service unavailable');
return { trackingId: `TRACK-${orderId}-${Date.now()}` };
}
async function callNotificationService(
customerId: string,
orderId: string,
trackingId: string
): Promise<void> {
console.log(`Confirmation sent to ${customerId} for order ${orderId}, tracking: ${trackingId}`);
}
function getTotalAmount(event: OrderEvent): number {
return event.items.reduce((sum, i) => sum + i.price * i.qty, 0);
}
Let’s run the code!
Copy and paste the above code into the durable Lambda code editor. Then use the commands below to invoke the lambda with different payloads
Success scenario:-
aws lambda invoke \
--function-name durable-parallel-test:prod \
--payload '{"orderId":"ORD-001","customerId":"CUST-123","items":[{"sku":"ITEM-1","qty":2,"price":25}],"paymentMethod":{"type":"card","token":"tok_valid"},"shippingAddress":{"street":"123 Main St","city":"Seattle","zip":"98101"}}' \
--cli-binary-format raw-in-base64-out \
response.json && cat response.json
Fail scenario:-
aws lambda invoke \
--function-name durable-parallel-test:prod \
--payload '{"orderId":"ORD-002","customerId":"CUST-123","items":[{"sku":"ITEM-1","qty":1,"price":50}],"paymentMethod":{"type":"card","token":"DECLINED"},"shippingAddress":{"street":"123 Main St","city":"Seattle","zip":"98101"}}' \
--cli-binary-format raw-in-base64-out \
response.json && cat response.json
Invalid input:
aws lambda invoke \
--function-name durable-parallel-test:prod \
--payload '{"orderId":"","customerId":"","items":[],"paymentMethod":{"type":"card","token":"tok"},"shippingAddress":{"street":"","city":"","zip":""}}' \
--cli-binary-format raw-in-base64-out \
response.json && cat response.json
Conclusion
Saga patterns make rollback to a consistent state very simple in distributed transactions. And with lambda durable functions’ checkpointing and retry capability, it is easy-peasy to implement. Thanks for readingπ