Implementing Saga Pattern With Lambda Durable Function

When you hit the “Place Order” button, that event triggers a series of steps, including inventory reservation, payment processing, and shipping initialization. Now, suppose your card is charged by the payment service (Stripe πŸ€”), but the API call to the third-party shipping service failed.

Modern systems don’t live inside a single database anymore. You can’t just rollback everything like a normal database transaction. In this era of distributed services, Saga patterns solve the problem of distributed rollback.

What is the Saga Pattern?

Saga is a sequence of steps carried out in a workflow. For each successful step, there exists a compensating step. As the saga progresses, these steps are stored in a list. Down the lane, if any step fails, all compensating steps are executed in reverse order.

Saga Pattern Demo

Saga Pattern with AWS Durable Lambda Functions

Because of built-in checkpoints and replay mechanisms, AWS Lambda functions are perfect for implementing the saga pattern.

Each step of the saga can be wrapped in a durable step, allowing independent retry strategies. After each successful step, durable function checkpoints state; hence, on retry, already completed steps are not executed again.

To implement forward steps and compensation, all that we need is a try-catch block. The idea is simple: keep moving forward in the try block. If any step fails, compensating steps run in the catch block.

Prefer GitHub instead? https://gist.github.com/TrickSumo/61744e540d0e5953c3dd5fdc6d8a2338

import {
  withDurableExecution,
  DurableContext,
  createRetryStrategy,
  JitterStrategy,
} from '@aws/durable-execution-sdk-js';
import { randomUUID } from 'crypto';

//Types

interface OrderEvent {
  orderId: string;
  customerId: string;
  items: Array<{ sku: string; qty: number; price: number }>;
  paymentMethod: { type: string; token: string };
  shippingAddress: { street: string; city: string; zip: string };
}

// Custom Error Classes
// Used by retryableErrorTypes - SDK checks instanceof, so real classes are needed

class NetworkError extends Error {
  name = 'NetworkError';
}

class PaymentDeclinedError extends Error {
  name = 'PaymentDeclinedError';
}

// Retry Strategies

/**
 * Exponential backoff with jitter - for external API calls.
 * Attempts: 1s β†’ 2s β†’ 4s β†’ 8s β†’ 16s (random jitter added to avoid thundering herd)
 * Only retries NetworkError - other errors won't be retried.
 */
const apiRetryStrategy = createRetryStrategy({
  maxAttempts: 5,
  initialDelay: { seconds: 1 },
  maxDelay: { seconds: 30 },
  backoffRate: 2.0,
  jitter: JitterStrategy.FULL,
  retryableErrorTypes: [NetworkError], // ← class constructor, not string
});

/**
 * Custom retry for payment - retries network errors but NOT declined cards.
 * Uses a function instead of createRetryStrategy for fine-grained control.
 */
const paymentRetryStrategy = (error: Error, attemptCount: number) => {
  // retryableErrorTypes equivalent - never retry a declined card
  if (error instanceof PaymentDeclinedError) {
    return { shouldRetry: false };
  }

  // maxAttempts: 5 equivalent
  if (attemptCount >= 5) {
    return { shouldRetry: false };
  }

  // initialDelay(1s) + backoffRate(2.0) + maxDelay(30s) + jitter(FULL) equivalent
  const baseDelay = 1 * Math.pow(2.0, attemptCount - 1); // exponential: 1s β†’ 2s β†’ 4s β†’ 8s β†’ 16s
  const capped = Math.min(baseDelay, 30);                 // maxDelay: never exceed 30s
  const jittered = Math.random() * capped;                // FULL jitter: random between 0 and capped
  const seconds = Math.max(1, Math.round(jittered));      // minimum 1s, rounded to whole second

  return { shouldRetry: true, delay: { seconds } };
};

// Handler

export const handler = withDurableExecution(async (event: OrderEvent, context: DurableContext) => {
  context.logger.info('Order processing started', { orderId: event.orderId });

  // Saga compensations array tracks what to undo if something fails later
  const compensations: Array<{ name: string; fn: () => Promise<void> }> = [];

  try {

    // ── Step 1: Validate ────────────────────────────────────────────────────
    // Throws a plain Error for invalid input, "shouldRetry: false" strategy means it fails immediately without retry
    await context.step('validate-order', async () => {
      context.logger.info('Validating order');

      if (!event.orderId || !event.customerId) {
        throw new Error('Order missing required fields');
      }
      if (!event.items || event.items.length === 0) {
        throw new Error('Order has no items');
      }

      const total = event.items.reduce((sum, i) => sum + i.price * i.qty, 0);
      if (total <= 0) {
        throw new Error('Order total must be greater than zero');
      }

      return { valid: true, total };
    },
      { retryStrategy: () => ({ shouldRetry: false }) }
    );

    // ── Step 2: Reserve inventory ───────────────────────────────────────────
    // Retries with exponential backoff, inventory service may be temporarily busy
    const reservation = await context.step(
      'reserve-inventory',
      async () => {
        context.logger.info('Reserving inventory');
        return await callInventoryService(event.orderId, event.items);
      },
      { retryStrategy: apiRetryStrategy }
    );

    // Register compensation, if something fails later, cancel this reservation
    compensations.push({
      name: 'cancel-reservation',
      fn: async () => { await callInventoryService(event.orderId, event.items, 'cancel'); },
    });

    context.logger.info('Inventory reserved', { reservationId: reservation.id });

    // ── Step 3: Generate idempotency key ───────────────────────────────────
    // Key is generated ONCE inside a step then checkpointed and same value returned on every replay.
    // This is the recommended pattern for payment APIs that support idempotency keys.
    // WARNING: Never generate outside a step because it changes on replay, defeating deduplication.
    const idempotencyKey = await context.step('payment-idempotency-key', async () =>
      randomUUID()
    );

    // ── Step 4: Charge payment ──────────────────────────────────────────────
    // AtLeastOnce (default) + idempotency key = safe to retry.
    // Even if Lambda crashes after charge but before checkpoint, the retry sends
    // the same idempotency key - payment provider deduplicates and returns original result.
    // No double charge risk.
    const payment = await context.step(
      'charge-payment',
      async () => {
        context.logger.info('Charging payment', { idempotencyKey });
        return await callPaymentService(event.paymentMethod, getTotalAmount(event), idempotencyKey);
      },
      { retryStrategy: paymentRetryStrategy } // retries NetworkError safely
    );

    // Register compensation - if shipping fails, refund the payment
    compensations.push({
      name: 'refund-payment',
      fn: async () => {
        await callPaymentService(event.paymentMethod, getTotalAmount(event), idempotencyKey, 'refund', payment.id);
      },
    });

    context.logger.info('Payment charged', { paymentId: payment.id });

    // ── Step 4: Create shipment ─────────────────────────────────────────────
    // Retries with exponential backoff, shipping service may be temporarily down
    const shipment = await context.step(
      'create-shipment',
      async () => {
        context.logger.info('Creating shipment');
        return await callShippingService(event.orderId, event.shippingAddress, event.items);
      },
      { retryStrategy: apiRetryStrategy }
    );

    context.logger.info('Shipment created', { trackingId: shipment.trackingId });

    // ── Step 5: Send confirmation ───────────────────────────────────────────
    // Simple fixed-delay retry: email service is reliable, 3 attempts is enough
    await context.step(
      'send-confirmation',
      async () => {
        context.logger.info('Sending confirmation email');
        await callNotificationService(event.customerId, event.orderId, shipment.trackingId);
      },
      {
        retryStrategy: createRetryStrategy({
          maxAttempts: 3,
          initialDelay: { seconds: 2 },
          backoffRate: 1, // fixed delay, no backoff
        }),
      }
    );

    context.logger.info('Order processing complete', { orderId: event.orderId });

    return {
      success: true,
      orderId: event.orderId,
      reservationId: reservation.id,
      paymentId: payment.id,
      trackingId: shipment.trackingId,
    };

  } catch (error) {
    // ── Saga: undo completed steps in reverse order ─────────────────────────
    // Example: payment charged but shipping failed β†’ refund payment, cancel reservation
    context.logger.error('Order failed, running compensations', {
      orderId: event.orderId,
      error: (error as Error).message,
    });

    for (const comp of compensations.reverse()) {
      try {
        // Each compensation is its own durable step, checkpointed and retried
        await context.step(comp.name, async () => comp.fn());
        context.logger.info(`Compensation done: ${comp.name}`);
      } catch (compError) {
        // Log but continue, try all compensations even if one fails
        context.logger.error(`Compensation failed: ${comp.name}`, compError);
      }
    }

    throw error; // re-throw so execution is marked FAILED in console
  }
});

// ─── Simulated Service Calls ──────────────────────────────────────────────────
// In real code these would call actual APIs. Here we simulate occasional failures
// to demonstrate retry behavior.

async function callInventoryService(
  orderId: string,
  items: OrderEvent['items'],
  action = 'reserve'
): Promise<{ id: string }> {
  // 20% chance of network failure - demonstrates retry kicking in
  if (Math.random() < 0.2) throw new NetworkError('Inventory service timeout');
  return { id: `RES-${orderId}-${Date.now()}` };
}

async function callPaymentService(
  method: OrderEvent['paymentMethod'],
  amount: number,
  idempotencyKey: string,
  action = 'charge',
  paymentId?: string
): Promise<{ id: string }> {
  if (Math.random() < 0.1) throw new NetworkError('Payment gateway timeout');
  // Token "DECLINED" simulates a declined card, triggers PaymentDeclinedError, no retry
  if (method.token === 'DECLINED') throw new PaymentDeclinedError('Card declined');
  // In real code: pass idempotencyKey to payment provider (Stripe, etc.)
  // Provider returns same result if key was already used, no double charge
  return { id: `PAY-${idempotencyKey}` };
}

async function callShippingService(
  orderId: string,
  address: OrderEvent['shippingAddress'],
  items: OrderEvent['items']
): Promise<{ trackingId: string }> {
  if (Math.random() < 0.1) throw new NetworkError('Shipping service unavailable');
  return { trackingId: `TRACK-${orderId}-${Date.now()}` };
}

async function callNotificationService(
  customerId: string,
  orderId: string,
  trackingId: string
): Promise<void> {
  console.log(`Confirmation sent to ${customerId} for order ${orderId}, tracking: ${trackingId}`);
}

function getTotalAmount(event: OrderEvent): number {
  return event.items.reduce((sum, i) => sum + i.price * i.qty, 0);
}

Let’s run the code!

Copy and paste the above code into the durable Lambda code editor. Then use the commands below to invoke the lambda with different payloads

Success scenario:-

aws lambda invoke \
  --function-name durable-parallel-test:prod \
  --payload '{"orderId":"ORD-001","customerId":"CUST-123","items":[{"sku":"ITEM-1","qty":2,"price":25}],"paymentMethod":{"type":"card","token":"tok_valid"},"shippingAddress":{"street":"123 Main St","city":"Seattle","zip":"98101"}}' \
  --cli-binary-format raw-in-base64-out \
  response.json && cat response.json

    Fail scenario:-

    aws lambda invoke \
      --function-name durable-parallel-test:prod \
      --payload '{"orderId":"ORD-002","customerId":"CUST-123","items":[{"sku":"ITEM-1","qty":1,"price":50}],"paymentMethod":{"type":"card","token":"DECLINED"},"shippingAddress":{"street":"123 Main St","city":"Seattle","zip":"98101"}}' \
      --cli-binary-format raw-in-base64-out \
      response.json && cat response.json

    Invalid input:

    aws lambda invoke \
      --function-name durable-parallel-test:prod \
      --payload '{"orderId":"","customerId":"","items":[],"paymentMethod":{"type":"card","token":"tok"},"shippingAddress":{"street":"","city":"","zip":""}}' \
      --cli-binary-format raw-in-base64-out \
      response.json && cat response.json

    Conclusion

    Saga patterns make rollback to a consistent state very simple in distributed transactions. And with lambda durable functions’ checkpointing and retry capability, it is easy-peasy to implement. Thanks for reading😊

    Leave a Comment

    This site uses Akismet to reduce spam. Learn how your comment data is processed.