How Webhooks Work (and Why They Fail)

A technical guide to webhooks: the push vs pull model, HMAC-SHA256 signature verification, idempotency, the 5xx retry problem, delivery ordering guarantees, and how to build a reliable webhook handler that doesn't process events twice.

By Hunchbite · March 30, 2026 · 13 min read

Tags: webhooks, API, event-driven

Webhooks are conceptually simple — a server POSTs to your URL when something happens — but getting them right in production requires handling a set of failure modes that aren't obvious until they bite you. This guide covers how webhooks work, why they fail, and how to build a handler that's reliable under the actual conditions of production systems.

Push vs pull: why webhooks exist

The alternative to webhooks is polling: your application periodically asks "did anything change?" The problems with polling are well understood — you're either checking too frequently (wasting resources, hitting rate limits) or not frequently enough (delayed reactions to events).

Webhooks invert this. Instead of you asking "did anything change?", the external service notifies you when something changes. Push instead of pull.

The basic flow:

1. You register a webhook endpoint: POST https://api.stripe.com/v1/webhook_endpoints
   { url: "https://yourapp.com/webhooks/stripe", enabled_events: ["payment_intent.succeeded"] }
 
2. User pays on your app → Stripe processes the payment
 
3. Stripe sends:
   POST https://yourapp.com/webhooks/stripe
   {
     "id": "evt_1PxK2rLkdIwHu7ixoZVBFpXs",
     "type": "payment_intent.succeeded",
     "data": { "object": { ... payment intent ... } }
   }
 
4. Your handler processes the event, returns HTTP 200
 
5. Stripe marks the event as delivered

A key characteristic: the sender doesn't care what you do with the webhook. Stripe doesn't know if you sent the user a confirmation email, updated your database, or threw the event away. It only knows whether your server responded with a 2xx status code within its timeout window. Everything else is your problem.

This decoupling is both the power and the source of most webhook bugs.

Webhook signatures and HMAC-SHA256 verification

If you expose a webhook endpoint at https://yourapp.com/webhooks/stripe, anyone can POST to it. Without verification, an attacker could forge payment events and trigger your fulfillment logic for free. This is not a theoretical risk. Unverified or forgeable webhook endpoints are a recurring finding in security assessments — if you process payment or account events this way, it's worth knowing when to run a penetration test on your integrations.

Webhook providers solve this with HMAC signatures. Stripe, GitHub, and most others sign each request using HMAC-SHA256 with a secret key that only you and the provider know.

How HMAC-SHA256 signing works:

signature = HMAC-SHA256(message, secret)

Stripe's implementation includes a timestamp to prevent replay attacks:

signed_payload = timestamp + "." + raw_request_body
signature = HMAC-SHA256(signed_payload, endpoint_signing_secret)

The signature goes in the Stripe-Signature header:

Stripe-Signature: t=1714500000,v1=abc123...,v1=def456...

(Multiple v1= values appear when Stripe rotates secrets — both old and new signatures are included during the transition period.)
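
To make the scheme concrete, here is a minimal sketch of the sending side: it builds a header in the same t=...,v1=... shape, which can be handy for generating signed test requests against your own handler. The helper name signPayload and the secret value are made up for illustration and are not part of any SDK. It is written as CommonJS so it runs as a standalone script.

```javascript
const crypto = require('node:crypto');

// Build a Stripe-style signature header for a raw request body.
// `signPayload` is a hypothetical helper name, not a Stripe API.
function signPayload(rawBody, secret, timestamp = Math.floor(Date.now() / 1000)) {
  // The signed string is "<timestamp>.<raw body>", as described above
  const signedPayload = `${timestamp}.${rawBody}`;
  const signature = crypto
    .createHmac('sha256', secret)
    .update(signedPayload, 'utf8')
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}

// Example: sign a test payload with a made-up secret and a fixed timestamp
const header = signPayload('{"id":"evt_test"}', 'whsec_example_secret', 1714500000);
console.log(header); // t=1714500000,v1=<64 hex characters>
```

Because the timestamp is part of the signed string, the same body signed at a different time produces a different signature, which is what makes the freshness check meaningful.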

Verification in Node.js (a minimal sketch for understanding):

import crypto from 'node:crypto';

function verifyStripeWebhook(rawBody, signatureHeader, secret) {
  // Parse the header: "t=<unix seconds>,v1=<hex>,v1=<hex>"
  const parts = signatureHeader.split(',').map(p => p.trim());
  const timestamp = parts.find(p => p.startsWith('t='))?.slice(2);
  const signatures = parts
    .filter(p => p.startsWith('v1='))
    .map(p => p.slice(3));
  if (!timestamp || signatures.length === 0) {
    throw new Error('Malformed signature header');
  }

  // Reconstruct the signed payload (rawBody may be a string or a Buffer)
  const signedPayload = `${timestamp}.${rawBody}`;

  // Compute expected signature as raw bytes
  const expected = crypto
    .createHmac('sha256', secret)
    .update(signedPayload, 'utf8')
    .digest();

  // Constant-time comparison (prevents timing attacks).
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  const isValid = signatures.some(sig => {
    const candidate = Buffer.from(sig, 'hex');
    return candidate.length === expected.length &&
      crypto.timingSafeEqual(candidate, expected);
  });

  if (!isValid) {
    throw new Error('Invalid webhook signature');
  }

  // Check timestamp freshness (prevent replay attacks)
  const tolerance = 300; // 5 minutes
  const skew = Math.abs(Date.now() / 1000 - Number.parseInt(timestamp, 10));
  if (!(skew <= tolerance)) { // also rejects a non-numeric timestamp (NaN)
    throw new Error('Webhook timestamp outside tolerance');
  }

  return JSON.parse(rawBody.toString('utf8'));
}

Critical implementation detail: you must verify against the raw request body bytes, not the parsed JSON. Your HTTP framework may parse the body before your handler sees it, reformatting whitespace or reordering keys. If you compute HMAC against JSON.stringify(req.body), you're computing it against a potentially different string than what was signed. In Express, use express.raw({ type: 'application/json' }) as middleware for your webhook route instead of express.json().
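
A small sketch shows why the round trip through a JSON parser breaks verification. The secret and payload are made-up values, and the timestamp prefix is left out to keep the example short.

```javascript
const crypto = require('node:crypto');

const secret = 'whsec_example_secret'; // made-up value for illustration
const hmac = (text) =>
  crypto.createHmac('sha256', secret).update(text, 'utf8').digest('hex');

// The sender signed these exact bytes, including the extra spaces
const rawBody = '{"id": "evt_test",  "amount": 100}';
const signedByProvider = hmac(rawBody);

// What you get back after a parse/serialize round trip
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(reserialized === rawBody);                // false
console.log(hmac(reserialized) === signedByProvider); // false
console.log(hmac(rawBody) === signedByProvider);      // true
```

The parsed object is identical in both cases; only the bytes differ, and the bytes are what the signature covers.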

Stripe's SDK wraps all of this:

const event = stripe.webhooks.constructEvent(rawBody, signatureHeader, endpointSecret);

Use the SDK if available. The manual implementation above is for understanding, not for production.

Idempotency: handling duplicate deliveries

Webhook delivery is at-least-once. Stripe's documentation says explicitly: "Stripe attempts to deliver your webhooks for up to three days with an exponential back off." During those retries, the same event can arrive multiple times.

If your handler is not idempotent — if running it twice on the same event causes double-sending an email, double-charging a user, or creating a duplicate record — you have a production reliability problem waiting to surface.

The standard idempotency pattern:

async function handleWebhookEvent(event) {
  const eventId = event.id; // e.g., "evt_1PxK2rLkdIwHu7ixoZVBFpXs"
 
  // Check if already processed
  const existing = await db.webhookEvents.findUnique({
    where: { eventId }
  });
  if (existing?.processedAt) {
    console.log(`Skipping duplicate event ${eventId}`);
    return; // Already processed — return successfully
  }
 
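  // Record receipt. Note: this check-then-upsert is not atomic, so on its own
  // it does not stop two concurrent deliveries from both reaching processEvent.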
  await db.webhookEvents.upsert({
    where: { eventId },
    create: { eventId, receivedAt: new Date() },
    update: { receivedAt: new Date() }
  });
 
  // Process the event
  await processEvent(event);
 
  // Mark as complete
  await db.webhookEvents.update({
    where: { eventId },
    data: { processedAt: new Date() }
  });
}

Store the processed event IDs in a database table. Check before processing. Mark as complete after. The check-and-mark should ideally be atomic (a database transaction with unique constraint on event_id) to handle concurrent delivery of the same event. Because this lookup runs on every single delivery, the event_id column needs to be indexed — the unique constraint gives you one automatically, and our guide on how database indexes work explains why that matters as event volume grows.
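
One way to get the atomic check-and-mark is to insert first and treat a conflict as "already seen". The sketch below uses an in-memory Map as a stand-in for a table with a unique constraint on event_id; EventStore, claim, release, and handleOnce are hypothetical names for illustration, and in a real system claim would be an INSERT that fails on conflict.

```javascript
// In-memory stand-in for a table with a unique constraint on event_id
class EventStore {
  constructor() {
    this.rows = new Map();
  }

  // Returns true only for the first caller to claim a given event ID
  claim(eventId) {
    if (this.rows.has(eventId)) return false;
    this.rows.set(eventId, { receivedAt: new Date(), processedAt: null });
    return true;
  }

  // Undo a claim so that a later retry can pick the event up again
  release(eventId) {
    this.rows.delete(eventId);
  }

  markProcessed(eventId) {
    this.rows.get(eventId).processedAt = new Date();
  }
}

async function handleOnce(store, event, processEvent) {
  if (!store.claim(event.id)) return 'duplicate';
  try {
    await processEvent(event);
  } catch (err) {
    store.release(event.id); // processing failed: allow a retry
    throw err;
  }
  store.markProcessed(event.id);
  return 'processed';
}

const store = new EventStore();
console.log(store.claim('evt_test')); // true
console.log(store.claim('evt_test')); // false
```

Note the trade-off: a duplicate that arrives while the first attempt is still running is acknowledged immediately, even though that first attempt might still fail.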

The 5xx response problem

Here's the failure mode that trips up most implementations:

Stripe sends event → Your handler starts processing
Your handler calls Mailgun to send email → Mailgun returns 503
Your handler returns 500 to Stripe → Stripe retries
Stripe sends same event again → Your handler sends email successfully, but a later step fails (or the response times out)
Your handler returns 500 again → Stripe retries
Stripe sends same event AGAIN → Your handler sends the email a second time
→ User receives two emails

The problem: you're doing work before returning the 200. Any error in that work causes a non-2xx response, which triggers a retry, which may process the event again.

The correct pattern:

1. Receive webhook
2. Verify signature
3. Store the raw event in your database (this is fast and reliable)
4. Return 200 immediately
5. A background worker picks up the event and processes it asynchronously

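// Webhook handler: fast path
import express from 'express';
import Stripe from 'stripe';

const app = express();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);
// `db` is assumed to be the same Prisma-style client used in the earlier examples.

// express.raw() leaves req.body as a Buffer holding the exact bytes that were signed
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  let event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body,
      req.headers['stripe-signature'],
      process.env.STRIPE_WEBHOOK_SECRET
    );
  } catch (err) {
    return res.status(400).send(`Webhook verification failed: ${err.message}`);
  }

  try {
    // Store for async processing: idempotent upsert
    await db.incomingWebhookEvents.upsert({
      where: { eventId: event.id },
      create: {
        eventId: event.id,
        eventType: event.type,
        payload: event,
        receivedAt: new Date(),
      },
      update: {} // Already exists, so do nothing
    });
  } catch (err) {
    // The event could not be stored: return 5xx so the provider retries
    return res.status(500).send('Failed to store event');
  }

  res.json({ received: true }); // Return 200 fast
});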
 
// Background worker — processes events asynchronously
async function processWebhookQueue() {
  const unprocessed = await db.incomingWebhookEvents.findMany({
    where: { processedAt: null },
    orderBy: { receivedAt: 'asc' }
  });
 
  for (const record of unprocessed) {
    try {
      await processEvent(record.payload);
      await db.incomingWebhookEvents.update({
        where: { id: record.id },
        data: { processedAt: new Date() }
      });
    } catch (err) {
      await db.incomingWebhookEvents.update({
        where: { id: record.id },
        data: { lastError: err.message, errorCount: { increment: 1 } }
      });
    }
  }
}

This pattern makes your webhook handler nearly indestructible. The only thing that can cause a non-2xx response is a database failure when storing the raw event, and in that case a retry from the provider is exactly what you want.

Delivery order is not guaranteed

Webhook providers do not guarantee events arrive in the order they occurred. Network routing, retry timing, and infrastructure quirks mean you might receive payment_intent.payment_failed before payment_intent.created. Or customer.subscription.updated (new plan) before the initial customer.subscription.created.

Your handler must be designed to handle out-of-order events. Strategies:

Timestamp-based ordering: Use the event's created timestamp to determine ordering when conflicts arise, not arrival order.
Idempotent state transitions: Make your state transitions idempotent — processing "subscription cancelled" twice should be safe, and processing "subscription created" after "subscription cancelled" should be handled gracefully.
Fetch from API on ambiguity: For critical state, don't trust the webhook payload alone. Fetch the current state from the API (e.g., call stripe.subscriptions.retrieve(id)) to confirm before acting. The webhook is a notification; the API is the source of truth.
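
The timestamp-based strategy can be sketched as a pure function that ignores any event older than the last one applied. The state shape and the applyEvent name are assumptions for illustration; created is the event's Unix timestamp, and the status values are only examples.

```javascript
// Apply an event to local state only if it is not older than the last
// event already applied. `applyEvent` is a hypothetical helper.
function applyEvent(state, event) {
  if (state.lastEventAt !== null && event.created < state.lastEventAt) {
    return state; // stale event that arrived late: ignore it
  }
  return {
    status: event.data.object.status,
    lastEventAt: event.created,
  };
}

let state = { status: null, lastEventAt: null };
const updatedEvent = { created: 200, data: { object: { status: 'active' } } };
const createdEvent = { created: 100, data: { object: { status: 'incomplete' } } };

state = applyEvent(state, updatedEvent); // arrives first
state = applyEvent(state, createdEvent); // arrives late and is ignored
console.log(state.status); // active
```

Two events with the same timestamp are applied in arrival order here, so ties stay ambiguous. That is the case where fetching the current state from the API is the safer choice.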

Retry policies across providers

Understanding how your providers retry helps you design your error handling:

Provider	Initial timeout	Retry schedule	Max attempts
Stripe	30 seconds	Exponential backoff over 3 days	~17
GitHub	10 seconds	No automatic retries (failed deliveries must be redelivered manually or via the API)	1
Svix	5 seconds	Exponential: 5s, 1m, 5m, 30m, 2h, 5h, 10h, 10h...	11
SendGrid	3 seconds	Not documented explicitly	Varies

If your endpoint consistently returns 5xx errors, Stripe will eventually disable the endpoint and notify you. You can manually resend events from the Stripe Dashboard — useful for recovering from outages.

Dead letter queues and webhook logs

For production systems, you need visibility into what's been received and what's failed.

What to log per event:

Event ID and type
Received timestamp
Verification result (pass/fail)
Processing status (pending, processing, succeeded, failed)
Error message if failed
Retry count
Raw payload (for re-processing)

This gives you the ability to manually re-process failed events, audit what happened during an incident, and detect patterns (specific event types failing, a provider timing out consistently).

For teams using a message queue (BullMQ, SQS) as the async processing layer, the queue's built-in dead letter mechanism handles failed jobs after max retries — they land in a DLQ for manual inspection.
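
If you run your own worker instead of a queue library, the same idea can be sketched as a small decision function: retry with exponential backoff until a limit is reached, then move the event aside for manual inspection. The limits and the nextStep name are assumptions for illustration, not values taken from any provider or library.

```javascript
const MAX_ATTEMPTS = 5;              // assumed limit before dead-lettering
const BASE_DELAY_MS = 5000;          // assumed delay before the first retry
const MAX_DELAY_MS = 60 * 60 * 1000; // cap the backoff at one hour

// Decide what to do with an event that has failed `errorCount` times
function nextStep(errorCount) {
  if (errorCount >= MAX_ATTEMPTS) {
    return { action: 'dead-letter' };
  }
  // Double the delay after each failure, up to the cap
  const exponent = Math.max(errorCount - 1, 0);
  const delayMs = Math.min(BASE_DELAY_MS * 2 ** exponent, MAX_DELAY_MS);
  return { action: 'retry', delayMs };
}

console.log(nextStep(1)); // { action: 'retry', delayMs: 5000 }
console.log(nextStep(3)); // { action: 'retry', delayMs: 20000 }
console.log(nextStep(5)); // { action: 'dead-letter' }
```

A worker would call this after incrementing the error count and either schedule the next attempt or flag the record so it stops being picked up.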

Testing webhooks locally

The challenge with local development: Stripe can't POST to localhost:3000. Two standard solutions:

ngrok: Creates a public tunnel to your local server.

ngrok http 3000
# Outputs: https://abc123.ngrok-free.app → localhost:3000

Register https://abc123.ngrok-free.app/webhooks/stripe as your Stripe webhook URL for development. ngrok's dashboard at localhost:4040 shows every request and response, lets you inspect the raw bodies, and replay requests — invaluable for debugging.

Stripe CLI: The official approach for Stripe specifically.

stripe listen --forward-to localhost:3000/webhooks/stripe

The Stripe CLI creates a secure connection to Stripe's servers and forwards webhook events to your local handler. It also prints the webhook signing secret for your local session. Trigger test events:

stripe trigger payment_intent.succeeded

smee.io: A free webhook proxy that, like ngrok, works with any provider, not just Stripe. Less feature-rich but no installation required.

Building a reliable webhook handler: the checklist

Verify the signature before doing anything else. Reject requests that don't verify.
Use the raw body for signature verification, not parsed JSON.
Store the raw event in your database immediately, before processing.
Return 200 before processing. Enqueue for async processing.
Check for duplicates before processing. Use the event ID as an idempotency key.
Don't trust delivery order. Design state transitions to be order-independent.
Log everything: event ID, type, received time, processing outcome, errors.
Monitor your webhook endpoint — if it's returning 5xx errors, your event processing pipeline is broken.
Handle the case where the webhook payload may be stale — for critical decisions, confirm against the provider's API.


FAQ

Why is Stripe sending the same webhook event multiple times?

Stripe's webhook delivery is at-least-once: if your endpoint doesn't return a 2xx response within the timeout window (typically 30 seconds), Stripe retries. If your handler returns a 5xx error, Stripe retries. If Stripe's infrastructure has a delivery hiccup, Stripe retries. The result is that any given event might arrive 2, 3, or more times. This is by design: it's better to process an event multiple times than to miss it entirely. Your handler needs to handle this by storing processed event IDs and checking before processing: if you've already processed event `evt_abc123`, skip it. This is called idempotency.

What should my webhook handler return?

Return HTTP 200 (or any 2xx) as quickly as possible, ideally within 5-10 seconds, well before the provider's timeout. Do not block the response on your business logic. The correct pattern is: verify the signature, validate the payload, enqueue the event for async processing, return 200. If you process synchronously (send email, update database, call third-party APIs) before returning 200, any failure in that processing chain causes a 5xx response, which triggers a retry, which may process the event a second time. The 200 response means 'I received this event and will process it', not 'I have finished processing this event.'

How do I verify that a webhook actually came from Stripe?

Stripe signs every webhook request with an HMAC-SHA256 signature using your endpoint's signing secret. The signature is in the `Stripe-Signature` header. To verify: extract the timestamp and signature from the header, reconstruct the signed payload string (timestamp + '.' + raw request body), compute HMAC-SHA256 of that string using your signing secret, and compare against the received signature. Use a constant-time comparison to prevent timing attacks. Never use the already-parsed JSON body for this; you must use the raw bytes exactly as received. Stripe's official SDK handles this with `stripe.webhooks.constructEvent()`, which also checks that the timestamp isn't too old (replay attack prevention).
