A technical guide to message queues: the producer-consumer pattern, message acknowledgement, at-least-once vs exactly-once delivery, dead letter queues, when queues solve real problems vs when they add unnecessary complexity, and how to choose between Redis, SQS, and Kafka.
Every application eventually encounters a class of work that shouldn't happen synchronously in an HTTP request: sending emails, generating reports, calling slow external APIs, processing uploaded files, triggering notifications to thousands of users. A message queue is the standard tool for handling this work reliably. Understanding how queues work — the acknowledgement model, delivery guarantees, failure modes — is what separates a reliable async system from one that silently drops jobs or processes them twice.
Without a queue, the naive approach to async work is: do it in the HTTP handler, or fire-and-forget (spawn a Promise and hope it completes). Both are problematic.
Synchronous in the HTTP handler: The user waits for the slow work to finish. A 3-second image resize or a 500ms Mailgun API call adds directly to your response time. More critically, if the slow work fails (external API is down), the whole request fails. Retry logic in HTTP handlers is messy.
Fire-and-forget: Spawn a Promise or background task that isn't tracked anywhere. The work might complete; it might fail silently; the process might restart mid-job and the work is lost with no record it was ever started. This is fine for truly non-critical work with no reliability requirements, but a disaster for anything that matters.
A queue decouples the work from the request. The HTTP handler enqueues a job (fast — typically a Redis write, taking <1ms) and returns. A separate worker process picks up the job and does the slow work. If the worker fails, the job stays in the queue and another worker picks it up. The user's request was not affected by the worker's failure.
This is the producer-consumer pattern. Producers create jobs and add them to the queue. Consumers (workers) take jobs from the queue and execute them.
Acknowledgement is the mechanism that ensures jobs aren't lost when workers crash. It's the most important thing to understand about queue reliability.
Without acknowledgement (wrong):
Queue: [job_A, job_B, job_C]
Worker picks up job_A → queue removes job_A
Worker crashes while processing job_A
job_A is gone forever
With acknowledgement (correct):
Queue: [job_A, job_B, job_C]
Worker picks up job_A → queue moves job_A to "processing" state (still in queue)
Worker processes job_A successfully → worker sends ACK → queue deletes job_A
Queue: [job_B, job_C]
-- OR --
Worker picks up job_A → queue moves job_A to "processing" state
Worker crashes → lock expires after timeout
Queue: [job_A, job_B, job_C] ← job_A is back
Another worker picks up job_A and processes it
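The state machine above can be sketched in a few lines of plain JavaScript. This is a toy in-memory model purely for illustration — real queues keep this state in Redis or managed storage so it survives process crashes, and all names here (`AckQueue`, `reserve`, `recoverStalled`) are hypothetical:

```javascript
// Toy queue with an acknowledgement model: jobs move waiting -> processing,
// and are only deleted on ack. Unacked jobs whose lock expires are requeued.
class AckQueue {
  constructor(lockTimeoutMs) {
    this.lockTimeoutMs = lockTimeoutMs;
    this.waiting = [];           // jobs ready to be picked up
    this.processing = new Map(); // jobId -> { job, lockedAt }
  }

  enqueue(job) {
    this.waiting.push(job);
  }

  // Worker picks up a job: it moves to "processing" but is NOT deleted
  reserve(now) {
    const job = this.waiting.shift();
    if (!job) return null;
    this.processing.set(job.id, { job, lockedAt: now });
    return job;
  }

  // Successful processing: only now is the job removed for good
  ack(jobId) {
    this.processing.delete(jobId);
  }

  // A crashed worker never acks; expired locks send jobs back to waiting
  recoverStalled(now) {
    for (const [id, entry] of this.processing) {
      if (now - entry.lockedAt >= this.lockTimeoutMs) {
        this.processing.delete(id);
        this.waiting.unshift(entry.job); // back to the front of the queue
      }
    }
  }
}
```

The crucial property: there is no moment at which the job exists only in a worker's memory. It is either in `waiting` or in `processing`, and a crash at any point leaves it recoverable.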
In BullMQ (Node.js/Redis), this is handled automatically. Jobs are moved to an "active" set when picked up. They're only removed from active and moved to "completed" when job.moveToCompleted() is called (implicitly, when your job processor function resolves). If the worker crashes or the lock expires, BullMQ's stalled job recovery moves the job back to "waiting."
In AWS SQS, the equivalent is the visibility timeout — when a consumer receives a message, it becomes invisible to other consumers for the timeout duration. The consumer must explicitly delete the message after processing. If it doesn't delete within the timeout, the message reappears.
FIFO (first in, first out): Jobs are processed in the order they were enqueued. Standard for most use cases where ordering matters or where fairness is the goal. SQS Standard queues are not strictly FIFO (they have at-least-once delivery with possible ordering variation); SQS FIFO queues guarantee ordered delivery at the cost of lower throughput.
Priority queues: Jobs have a numeric priority; higher-priority jobs are processed before lower-priority ones regardless of enqueue time. BullMQ supports priority natively. Useful when mixing job types with different urgency — a user-triggered action (priority: high) and a nightly batch report (priority: low) can share a worker pool, with the high-priority work always getting processed first.
Delayed jobs: A variant where jobs are enqueued with a delay — "process this in 30 minutes." Common for: retrying after an expected delay, sending a follow-up email 24 hours after signup, scheduling reminders. BullMQ supports { delay: 1000 * 60 * 30 } (milliseconds). Redis sorted sets handle the scheduled state — delayed jobs sit in a sorted set ordered by their run-at timestamp, and a scheduler process promotes them to the waiting queue when their time comes.
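The sorted-set mechanic can be modeled in miniature (a toy sketch — BullMQ actually stores delayed jobs in a Redis ZSET scored by timestamp; the class and method names here are hypothetical):

```javascript
// Toy delayed-job scheduler: delayed jobs sit in a list sorted by runAt,
// and a periodic promotion step moves all due jobs to the waiting queue.
class DelayedScheduler {
  constructor() {
    this.delayed = []; // kept sorted ascending by runAt (the "sorted set")
    this.waiting = [];
  }

  addDelayed(job, runAt) {
    this.delayed.push({ job, runAt });
    this.delayed.sort((a, b) => a.runAt - b.runAt);
  }

  // Called periodically by a scheduler process: promote everything due
  promoteDue(now) {
    while (this.delayed.length > 0 && this.delayed[0].runAt <= now) {
      this.waiting.push(this.delayed.shift().job);
    }
  }
}
```

Because the set is ordered by run-at time, the scheduler only ever has to look at the head of the set to know whether anything is due.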
What happens to a job that keeps failing? Without a dead letter queue, it retries indefinitely — potentially blocking other jobs or consuming worker capacity. With a DLQ, jobs that exceed their maximum retry count are moved to a separate queue (the dead letter queue) where they stop being retried and can be inspected, debugged, or manually re-processed.
In BullMQ:
// Assumes: import { Queue, Worker } from 'bullmq'; and a shared Redis `connection` object
const queue = new Queue('email', { connection });

const worker = new Worker('email', async (job) => {
  await sendEmail(job.data);
}, {
  connection,
  settings: {
    backoffStrategy: (attemptsMade) => {
      return Math.min(Math.pow(2, attemptsMade) * 1000, 30000); // exponential backoff, capped at 30s
    }
  }
});
// Jobs that fail after maxAttempts go to the failed state (BullMQ's DLQ equivalent)
// Add this when adding to queue:
queue.add('send-welcome', { userId, email }, {
  attempts: 5,
  backoff: { type: 'exponential', delay: 2000 }
});

In SQS, you configure a DLQ as a separate SQS queue and set maxReceiveCount on the source queue. After a message is received (and not deleted) maxReceiveCount times, SQS automatically moves it to the DLQ.
A DLQ is not optional for production queue systems. Jobs fail for reasons you can't predict in advance — downstream APIs returning unexpected responses, data that violates assumptions your processor makes, infrastructure events. Without a DLQ, failed jobs either retry forever or disappear silently. Neither is acceptable.
Every message queue has a delivery guarantee, and none of them is exactly-once by default.
At-least-once delivery: The queue guarantees every message will be delivered to a consumer at least once. In practice, most messages are delivered exactly once. But in failure scenarios (worker crashes after processing but before ACK, duplicate delivery due to network issues), a message may be delivered multiple times. Your consumers must be idempotent.
At-most-once delivery: Messages may be lost (if the consumer fails before processing) but are never delivered twice. Used when duplicate delivery is more harmful than occasional loss — uncommon for application workloads.
Exactly-once delivery: Every message is delivered exactly once, no more, no less. The dirty secret: true exactly-once delivery is theoretically impossible in distributed systems without coordination that destroys throughput. What providers call "exactly-once" is actually "effectively-once" — they use deduplication mechanisms that reduce duplicates to near-zero under normal conditions, with caveats.
SQS FIFO queues offer exactly-once processing within a deduplication window (5 minutes) using a client-provided deduplication ID. Within that window, messages with the same deduplication ID are deduplicated. Outside the window, if the same message is sent again with the same ID, it's treated as a new message.
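The window semantics are easy to model directly. This is a toy model of the behavior, not the SQS implementation; the 5-minute default matches SQS FIFO's documented deduplication interval, and the class name is hypothetical:

```javascript
// Toy model of SQS FIFO deduplication: a message whose dedup ID was seen
// within the window is dropped; outside the window it is accepted again.
const DEDUP_WINDOW_MS = 5 * 60 * 1000; // SQS FIFO's documented 5-minute window

class DedupWindow {
  constructor(windowMs = DEDUP_WINDOW_MS) {
    this.windowMs = windowMs;
    this.seen = new Map(); // dedupId -> timestamp of last accepted message
  }

  // Returns true if the message is accepted, false if deduplicated
  accept(dedupId, now) {
    const last = this.seen.get(dedupId);
    if (last !== undefined && now - last < this.windowMs) {
      return false; // duplicate within the window
    }
    this.seen.set(dedupId, now);
    return true;
  }
}
```

Note the failure mode this implies: a duplicate sent six minutes after the original sails straight through, which is exactly why the window alone can't replace idempotent processors.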
The practical conclusion: design your job processors to be idempotent (running them twice produces the same outcome as running once). Don't rely on the queue to prevent duplicates.
// Idempotent job processor
async function processPaymentJob(job) {
  const { paymentIntentId } = job.data;

  // Check if already processed
  const existing = await db.payments.findUnique({
    where: { stripePaymentIntentId: paymentIntentId }
  });
  if (existing) {
    return { alreadyProcessed: true };
  }

  // Process and record atomically — a unique constraint on
  // stripePaymentIntentId backstops this check under concurrent delivery
  await db.$transaction([
    db.payments.create({ data: { stripePaymentIntentId: paymentIntentId /* , ...other fields */ } }),
    db.orders.update({ where: { /* ... */ }, data: { status: 'paid' } })
  ]);
}

| Tool | Runtime | Backing store | Key strengths |
|---|---|---|---|
| BullMQ | Node.js | Redis | Full-featured, dashboard (Bull Board), scheduling, priority, delayed jobs |
| Sidekiq | Ruby | Redis | Mature, fast, excellent monitoring, Rails integration |
| Celery | Python | Redis or RabbitMQ | Mature, flexible, broad ecosystem |
| AWS SQS | Any | Managed | No ops, scales infinitely, integrates with Lambda |
| AWS SQS + Lambda | Any | Managed | Serverless workers, auto-scaling out of the box |
| RabbitMQ | Any | Self-hosted | Complex routing rules, multiple exchange types |
| Temporal | Any | PostgreSQL/Cassandra | Durable workflows, not just simple jobs |
For most Node.js applications, BullMQ (the successor to Bull) is the practical default. Redis is already in most stacks for caching; BullMQ uses Redis efficiently and has a solid dashboard. For applications already on AWS that need to scale workers independently, SQS + Lambda or SQS + ECS workers eliminates queue infrastructure entirely.
Queues add complexity: a Redis instance (or SQS), worker processes to deploy and monitor, job state to inspect when things go wrong. Don't add this infrastructure until you have a concrete reason.
These are the concrete reasons that do justify a queue:
Email and notification sending. Email providers (Mailgun, Sendgrid, Resend) have API latency and rate limits. Sending email synchronously in a request adds 200-500ms and introduces a failure dependency. Enqueue the send; the user's request completes instantly.
Heavy computation. PDF generation, video transcription, image processing, ML inference. These operations take seconds to minutes. They belong in a worker, not a request handler.
External API calls with rate limits. If you're calling an API that has a 100 requests/minute limit, you need rate-limited queue processing, not unbounded concurrent HTTP handlers.
Fan-out notifications. "User posted a comment → notify 500 followers." This is 500 database writes or push notification calls. Don't do it synchronously in the request that triggers it.
Payment processing side effects. After a payment succeeds, you might need to: provision access, send a receipt, update CRM, notify the finance team, generate an invoice. These can be separate jobs triggered by the payment event, each retrying independently if they fail.
Anything that must be retried on failure. If the consequence of failure is that the work doesn't happen (user doesn't get their password reset email, order isn't fulfilled), you need retry logic. Queues provide this natively.
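The rate-limit case deserves a closer look. In BullMQ, the production-grade answer is the Worker `limiter` option (assumed shape: `new Worker(name, fn, { connection, limiter: { max: 100, duration: 60000 } })` — check your version's docs). The underlying idea can be sketched as a fixed-window limiter; this toy class is for illustration only:

```javascript
// Toy fixed-window rate limiter for the "100 requests/minute API" case:
// at most `max` acquisitions per `windowMs`, then callers must wait.
class FixedWindowLimiter {
  constructor(max, windowMs) {
    this.max = max;
    this.windowMs = windowMs;
    this.windowStart = 0;
    this.count = 0;
  }

  // Returns true if a job may run now, false if it must wait
  tryAcquire(now) {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now; // window elapsed: start a fresh one
      this.count = 0;
    }
    if (this.count < this.max) {
      this.count++;
      return true;
    }
    return false;
  }
}
```

The point of putting this in the queue layer rather than the HTTP layer: jobs that can't run yet simply wait in the queue instead of failing, and the limit is enforced across all workers rather than per-process.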
These terms get conflated. They're related but solve different problems.
Message queue (BullMQ, SQS): Work distribution. A job is picked up by one worker, processed once (or retried until successful), then removed. The queue is consumed — each job disappears after processing. Good for: background jobs, task distribution.
Pub/sub (Redis pub/sub, Google Cloud Pub/Sub, SNS): Fan-out notifications. A message published to a topic is delivered to all subscribers. Multiple consumers each receive a copy. The message is not "consumed" — it's broadcast. Good for: real-time events where multiple independent components need to react.
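The broadcast semantics are simple enough to show in miniature (a toy sketch with hypothetical names — real pub/sub systems add delivery guarantees, persistence, and network transport on top of this core idea):

```javascript
// Pub/sub in miniature: a published message is delivered to EVERY
// subscriber, unlike a queue where exactly one worker takes each job.
class Topic {
  constructor() {
    this.subscribers = [];
  }

  subscribe(handler) {
    this.subscribers.push(handler);
  }

  publish(message) {
    // each subscriber receives its own copy of the message
    for (const handler of this.subscribers) handler(message);
  }
}
```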
Event streaming (Kafka, Kinesis): An append-only log of events. Consumers read the log at their own pace and maintain their own position (offset). New consumers can replay the entire history. Events are retained for a configured period (days, weeks, forever). Good for: audit logs, event sourcing, analytics pipelines, systems that need to replay history.
The key distinction between a queue and Kafka: in a queue, a job is "owned" by one consumer and disappears after processing. In Kafka, every consumer group reads the full stream independently — adding a new service that needs order events means it reads from offset 0 (or from "now") without affecting other consumers or the message producers.
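That offset model can be sketched in a few lines (a toy model of Kafka's consumer-group semantics, not its implementation — partitions, rebalancing, and retention are all omitted, and the names are hypothetical):

```javascript
// Toy event log: each consumer group tracks its own read offset, so a new
// group can replay from offset 0 without affecting producers or other groups.
class EventLog {
  constructor() {
    this.events = [];         // append-only log
    this.offsets = new Map(); // consumerGroup -> next offset to read
  }

  append(event) {
    this.events.push(event);
  }

  // Reads everything from the group's current offset and advances it
  poll(group) {
    const from = this.offsets.get(group) ?? 0;
    const batch = this.events.slice(from);
    this.offsets.set(group, this.events.length);
    return batch;
  }
}
```

Contrast this with the `ack`-and-delete queue model: here nothing is ever removed by consumption, which is precisely what makes replay and late-joining consumers possible.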
A queue's depth (number of jobs waiting) is a key operational metric. A queue that's growing without bound means workers aren't keeping up — you need more workers, faster workers, or fewer jobs being enqueued.
BullMQ exposes queue metrics:
const counts = await queue.getJobCounts('waiting', 'active', 'failed', 'delayed');
// { waiting: 143, active: 8, failed: 2, delayed: 0 }

Alert when these counts deviate from your baseline: a steadily growing waiting count means workers aren't keeping up, and a climbing failed count means jobs are landing in the failed state and need inspection.
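A minimal health check over those counts might look like this. The threshold values are illustrative placeholders, not recommendations — pick numbers from your own traffic baseline:

```javascript
// Classify queue health from BullMQ-style job counts.
// maxWaiting/maxFailed are hypothetical defaults — tune to your baseline.
function queueHealth(counts, { maxWaiting = 1000, maxFailed = 10 } = {}) {
  const alerts = [];
  if (counts.waiting > maxWaiting) {
    alerts.push(`backlog: ${counts.waiting} jobs waiting (workers not keeping up)`);
  }
  if (counts.failed > maxFailed) {
    alerts.push(`failures: ${counts.failed} jobs in the failed state (inspect the DLQ)`);
  }
  return { healthy: alerts.length === 0, alerts };
}
```

Run this on a timer against `queue.getJobCounts(...)` and feed the result to whatever alerting you already have; the value is in watching the trend, not any single reading.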
For autoscaling: if you're running workers on Kubernetes, you can use KEDA (Kubernetes Event-Driven Autoscaling) to scale worker deployments based on queue depth. On AWS with SQS + ECS, Application Auto Scaling supports scaling ECS services based on SQS queue depth via CloudWatch metrics.
Queues, background jobs, async processing patterns — getting these right early prevents the reliability problems that emerge at scale. Hunchbite helps technical leads and engineering teams design backend architecture that's robust under real production conditions.
If this guide resonated with your situation, let's talk. We offer a free 30-minute discovery call — no pitch, just honest advice on your specific project.