Bartleby and the Dead Letter Queue
From Melville to modern cloud architecture: Why every system needs a place for the messages that 'prefer not to' be processed.
Before the era of instant delivery, a letter was a leap of faith. When the ink smudged or a recipient vanished, the message didn’t simply disappear. It was sent to the Dead Letter Office.
This was a centralized purgatory, the “end of the line” for every message that lost its way. Inside, clerks performed a morbid form of triage. Legally authorized to break wax seals, they opened the envelopes of strangers to decide if a packet could be corrected, returned, or destroyed.
It was soul-crushing work. In Herman Melville’s Bartleby, the Scrivener, the narrator traces the titular character’s descent into catatonia back to a stint in a Dead Letter Office, spent “assorting [dead letters] for the flames.” To be a clerk was to be the human buffer between a functioning society and the chaos of lost communication, the custodian of intentions that should have worked, but didn’t.
We like to think our modern, high-speed systems have evolved past the need for dusty rooms and weary clerks. But data, like letters, is still a leap of faith. When a schema changes or a “poison pill” enters the stream, we have to resurrect that 19th-century necessity.
We call it the Dead Letter Queue (DLQ).
The Anatomy of a Failure: Why Messages “Die”
In a perfect world, every message is consumed, processed, and acknowledged. But in distributed systems, the “happy path” is only one of many. When a consumer fails to process a message, we face a choice: retry forever (blocking the entire queue) or move it to the Dead Letter Office.
There are three primary reasons a message ends up in the “Misfit” pile:
1. The Poison Pill (Hard Failure)
This is the most common reason for a DLQ. The message arrives, but the recipient cannot process it. The message is a Poison Pill. It’s malformed or contains data that crashes your code every time it’s read. No amount of retrying will fix it.
After a set number of failed attempts (the Max Delivery Count), the system realizes this specific letter is causing a bottleneck and moves it to the DLQ to clear the way for the rest of the traffic.
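That mechanism can be sketched in a few lines. This is a toy broker loop, not any particular product’s API: `MAX_DELIVERY_COUNT` stands in for settings like SQS’s `maxReceiveCount` or Azure Service Bus’s `MaxDeliveryCount`, and `process` is whatever your consumer does with a message.

```python
from collections import deque

MAX_DELIVERY_COUNT = 5  # assumption: tune per workload


def run_broker(messages, process):
    """Toy broker loop: redeliver failures, dead-letter poison pills."""
    queue = deque((msg, 0) for msg in messages)  # (payload, delivery_count)
    dlq = []
    while queue:
        msg, attempts = queue.popleft()
        try:
            process(msg)  # the consumer handles the message
        except Exception:
            attempts += 1
            if attempts >= MAX_DELIVERY_COUNT:
                dlq.append(msg)  # this letter is unreadable: out of the way
            else:
                queue.append((msg, attempts))  # put it back for redelivery
    return dlq
```

A healthy message passes through once; the poison pill cycles until the broker gives up on it, and the rest of the traffic is never blocked.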
2. The Broken Bridge (Transient Failure)
The message is perfect, but the environment is broken. Perhaps your database is down or an external API is rate-limiting your requests.
If you keep retrying immediately, you waste resources and block healthy messages from being processed. So you wait a bit and try again; if it still fails, you wait a little longer before the next attempt (exponential backoff); eventually you resign and move the message to the DLQ.
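That retry-then-resign policy looks something like this in code. It is a generic sketch, not a specific library’s API: real systems usually add jitter and cap the maximum delay, and the function name and parameters here are illustrative.

```python
import time


def process_with_backoff(message, handler, dlq, max_retries=4, base_delay=0.5):
    """Retry a transient failure with exponential backoff, then dead-letter."""
    for attempt in range(max_retries):
        try:
            return handler(message)
        except Exception:
            if attempt == max_retries - 1:
                dlq.append(message)  # resign: off to the DLQ
                return None
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```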
3. Logical Expiry (The Vanished Addressee)
Sometimes a message has a “Time to Live” (TTL). If a message sits in a queue too long without being processed, it becomes irrelevant, like a newspaper delivered a week late.
Instead of letting these messages vanish silently (data loss), a well-configured system moves them to the DLQ. This serves as an alarm: if your DLQ is filling with expired messages, your consumers are too slow to keep up with the mail.
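Here is a minimal sketch of that expiry sweep. Mature brokers do this server-side (Azure Service Bus, for instance, can dead-letter messages when their TTL expires); the field names below are invented for illustration.

```python
import time


def expire_to_dlq(queue, dlq, ttl_seconds=3600, now=None):
    """Move messages older than their TTL to the DLQ instead of dropping them."""
    now = now if now is not None else time.time()
    still_fresh = []
    for msg in queue:
        age = now - msg["enqueued_at"]
        if age > ttl_seconds:
            dlq.append(msg)  # expired: keep it visible, don't lose it
        else:
            still_fresh.append(msg)
    return still_fresh
```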
The Triage: What do we do with the data?
Just like the clerks of the 1850s, once data hits the DLQ, an engineer must perform a triage. You generally have three paths:
Re-drive (The Correction): You fix the bug in the consumer, then “replay” the messages from the DLQ back into the main stream.
Discard (The Furnace): You realize the data is truly garbage or a duplicate, and you delete it to save storage.
Manual Intervention (The Detective Work): You inspect the payload to understand a new edge case in your business logic that you hadn’t accounted for.
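The first path, re-driving, can be as simple as filtering the DLQ and appending survivors back onto the main queue. This is an illustrative sketch, not a product feature (though some managed queues now ship a built-in redrive button); the hypothetical `predicate` lets you replay only the messages your bug fix actually addresses.

```python
def redrive(dlq, main_queue, predicate=lambda msg: True):
    """Replay fixed messages from the DLQ back into the main queue."""
    remaining = []
    moved = 0
    for msg in dlq:
        if predicate(msg):
            main_queue.append(msg)  # back into the normal flow
            moved += 1
        else:
            remaining.append(msg)   # still needs human eyes
    dlq[:] = remaining              # keep only the unreplayed messages
    return moved
```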
Architectures of Failure: Queues vs. Streams
To understand how to handle “dead” data, we first have to understand the two different ways we move information in the cloud.
1. The Queue: The Individual Mailbox
Examples: AWS SQS, Azure Service Bus
Think of this as a stack of letters. A post office worker takes the top letter, tries to deliver it (to a Consumer), and if successful, the letter is gone.
Handling Failure: If the postman tries to deliver the letter and fails (or crashes mid-delivery), they simply “put the letter back” and walk away.
The System’s Role: The post office (the Queue system) keeps track of how many times that letter has been returned. After the 5th or 10th attempt, the post office manager says, “Clearly, this letter is unreadable,” and the system, not the Consumer, automatically moves it to the Dead Letter Office.
The Result: The postman never has to worry about that specific letter again. They just move on to the next one in the stack. This is surgical and handled entirely by the infrastructure.
2. The Stream: The Industrial Conveyor Belt
Examples: Apache Kafka, AWS Kinesis
This is a long conveyor belt where messages are bolted down in a fixed order. They aren’t removed after being read; they stay on the belt until they reach the end and fall into a shredder (Retention Time).
The Private Courier: Anyone wanting these letters sends a private courier (the Consumer) to the belt. The courier photocopies a letter, delivers the copy to their employer, and marks it as “read.” The original stays on the belt so other couriers can read it, too.
Handling Failure: If the courier hits a letter that is gibberish (a Poison Pill), they are stuck. There is no “Manager” to pluck the bad letter off the belt for them. If the courier walks away and comes back, the same bad letter is still right there.
The Manual Work: To keep the line moving, the courier must take a photo of the bad letter, toss it into a different bin (the Dead Letter Topic), and then manually click “Done” to move their eyes to the next message.
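In code, the courier’s routine looks like this. It is a simulation with plain lists, not the Kafka client API, but the shape is the same: catch the failure, send the bad record to a dead-letter topic, and commit the offset regardless so the belt keeps moving.

```python
def consume_stream(partition, handler):
    """Consumer-side dead-lettering: the stream never removes a bad message for you."""
    dead_letter_topic = []
    offset = 0
    while offset < len(partition):
        record = partition[offset]
        try:
            handler(record)                   # deliver the photocopy
        except Exception:
            dead_letter_topic.append(record)  # toss it into the side bin
        offset += 1                           # "click Done": commit and move on
    return dead_letter_topic, offset
```

Note that the offset advances on success and failure alike; if the consumer skipped that commit, it would reprocess the same poison pill forever.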
A Quick Recap
Of course, this is a simplification, metaphors only take us so far, and there are layers of technical detail we haven’t dug into here. But the core architectural truth remains:
A Queue is built for one-to-one delivery. It demands a confirmation (an “ACK”) from the consumer before it destroys a message. If that confirmation fails repeatedly, the system (the post office) intervenes and moves the message to the DLQ.
A Stream is built for one-to-many broadcasting. It doesn’t care if a message has been read, understood, or ignored. It simply keeps the data available for any courier to pick up. Because the stream itself is indifferent, the responsibility of identifying a failure and sending it to a DLQ falls entirely on the Consumer.
In the end, we use the Dead Letter Office not because we expect to fail, but because we are realistic enough to know that in any complex system, some letters, and some data, will always lose their way.
Besides writing posts on this select * from Substack, I also work as a data consultant through my own company. It started as an excuse to work with people I like and take on more interesting projects. If you have a data challenge, if your delivery speed is too slow, or if you think you can do more with your data, feel free to contact us.