Wrestling with Postel’s Law
Someone took the time to put together a history of Postel's Law, which this article relies on heavily. If you're short on time, you can skip to the section titled "Rebellion."
I was in university when I first encountered Postel's Law, presented as "The Robustness Principle: Be liberal in what you accept and conservative in what you send." According to my professor at the time, this was one of the guiding principles in the conception and invention of TCP (RFC 793).
I understood this to mean that the internet was a wild and crazy place, like the Final Frontier, with Kirk at the helm. In order to preserve any kind of positive user experience, systems should be resilient in the face of bugs and misbehaving clients. To a practical young go-getter like me, this was almost self-evident: everyone else's software was resilient, so what would happen to mine if some clients misbehaved (after all, IE6 was still a thing at that time)?
If my website were the only one that wasn't resilient to bad clients, users would assume my site was broken.
If a client was passing an incorrect type, it should be cast to the correct type.
If my API received a null argument, I should set a sensible default and continue on my way.
Whitespace where it wasn't expected? Trim it.
Entity not found? Log a warning.
Hit an error? Back off and retry.
My function returns an array and your parameters are invalid. Here, have an empty array…
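In code, those habits look something like the Python sketch below. The function `find_users` and its in-memory `USERS` table are hypothetical, made up purely to illustrate the pattern: every kind of bad input is absorbed rather than reported.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical in-memory data, for illustration only.
USERS = {1: "ada", 2: "grace"}


def find_users(user_ids=None):
    """A deliberately 'liberal' lookup that never rejects its input."""
    if user_ids is None:
        # Null argument? Substitute a "sensible" default and carry on.
        user_ids = []
    if isinstance(user_ids, (str, int)):
        # Wrong type? Wrap the scalar in a list as if it had been sent correctly.
        user_ids = [user_ids]
    results = []
    for raw_id in user_ids:
        try:
            # Unexpected whitespace? Trim it, then cast to the expected type.
            user_id = int(str(raw_id).strip())
        except ValueError:
            # Uncastable value? Skip it without telling anyone.
            continue
        if user_id not in USERS:
            # Entity not found? Log a warning and keep going.
            logger.warning("user %s not found", user_id)
            continue
        results.append(USERS[user_id])
    # Every flavor of bad input collapses into the same empty list.
    return results
```

Every call "succeeds," which is precisely the problem: `find_users("abc")` and `find_users(None)` both return `[]`, so the caller cannot distinguish a malformed request from a legitimately empty result.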
This principle has been generalized and employed across all layers, not just transport layer protocols. With consensus from academia and industry, Postel's Law was universally accepted. Best of all, I felt smart quoting it.
This approach worked wonders for me. I was doing great on school projects, doing devops at work, writing web apps, and consulting. I was a one-man show, know-it-all, full-stack developer who worked alone and didn't have time to unit test. A complete n00b.
Feeling bored, overworked, and underpaid, I went looking for career excitement and found it at Workiva. True to its startup roots, Workiva had epic code jams, massive features materializing overnight, an ambience of momentum, exotic technology choices, teams spinning up out of nowhere, and an explosion in lines of code. Working in a team environment with great peers and lots to learn in a seemingly endless code base, I finally felt like I was home.
One of the first things I learned was the value of unit testing. I started doing test-driven development: setting up bad state, or exercising stubs for to-do features, to iteratively build and fix. Suddenly, unit testing was saving me time. I discovered that one of the greatest parts of unit testing was the instant feedback. As I made code changes and ran my tests, failures surfaced immediately: AssertionError: Expected empty list! That feedback is powerful in driving design, and, with the right assertions, it brings exceptions and broken state to the forefront of the development process.
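A minimal sketch of that feedback loop, using Python's standard `unittest` module. The `parse_tags` function is a hypothetical stand-in for the code under test; the optional message argument to `assertEqual` is what appears in the `AssertionError` when the expectation is violated.

```python
import unittest


def parse_tags(raw):
    """Hypothetical code under test: split a comma-separated string into tags."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


class ParseTagsTest(unittest.TestCase):
    def test_empty_input_returns_empty_list(self):
        # If this expectation breaks, the message is shown in the AssertionError.
        self.assertEqual(parse_tags(""), [], "Expected empty list!")

    def test_surrounding_whitespace_is_trimmed(self):
        self.assertEqual(parse_tags(" a, b ,c "), ["a", "b", "c"])
```

Running the module under a test runner such as `python -m unittest` reports each broken expectation, with its message, the moment a code change violates it.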
There have been cases, however, when unit tests and integration tests did not catch bugs that I would have expected them to catch. The code was well-written and well-documented, and of course as robust as any code I'd ever written.
Using a NoSQL database, we were free to make changes on the fly, without much concern for migration. As with most technology choices, there is always a trade-off. With our newfound NoSQL freedom came a terrible price: the feature I was commissioned to work on would crawl and analyze various data structures to detect and resolve errors or inconsistencies within the system.
Initially, a massive amount of time was spent cataloguing various states of inconsistency, and coming up with ways of automatically deriving and resolving those states. For example, if an object in our schema-less database was known to be in violation of our application layer schema, we could trace the history or context of the object to derive its proper state, and bring it back in line with our expectations.
As new bugs cropped up and were resolved, new states of inconsistency would emerge and be classified, defined, and resolved as part of the framework. As you can imagine, this was time-intensive and tedious, to put it mildly. Our process was becoming slower and less efficient with each new bad state we defined and resolved. We soldiered on, solving issues as they arose with a fairly high degree of success, lazily migrating where possible and working toward a more robust infrastructure. However, we could not always find time to determine the root cause(s).
We knew that somewhere along the line, features within the application were acting up and propagating bad state, but it was difficult to track down exactly what, where, and why. Our system was too large and complex for a single team to tackle root cause analysis for all of the issues our feature was detecting. For the most part, our systems did not show any outward signs of inconsistency or error.
One particularly ugly bug combined a cast that failed and silently set its result to null with a null check that then skipped a code block. That code would have blown up in testing had the null not been explicitly ignored as a no-op. As we grew more experienced with the kinds of issues that caused these inconsistent states, and with the seemingly endless problem of a growing self-healing suite, we began to question our design.
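A framework like this might be structured as an ever-growing registry of detector/resolver pairs. The sketch below is hypothetical: the `Repair` and `heal` names, the document fields, and both sample repairs are invented for illustration and are not the actual system described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical shape of a stored object in a schema-less database.
Document = Dict[str, Any]


@dataclass
class Repair:
    """One catalogued state of inconsistency and how to resolve it."""
    name: str
    detect: Callable[[Document], bool]
    resolve: Callable[[Document], Document]


REPAIRS: List[Repair] = [
    Repair(
        name="missing_title",
        detect=lambda d: "title" not in d and bool(d.get("history")),
        # Derive the proper state from the object's recorded history.
        resolve=lambda d: {"title": d["history"][-1]["title"]},
    ),
    Repair(
        name="tags_stored_as_string",
        detect=lambda d: isinstance(d.get("tags"), str),
        resolve=lambda d: {"tags": [d["tags"]]},
    ),
]


def heal(doc: Document, repairs: List[Repair]) -> List[str]:
    """Apply every matching repair in place; return the names of those applied."""
    applied = []
    for repair in repairs:
        if repair.detect(doc):
            doc.update(repair.resolve(doc))
            applied.append(repair.name)
    return applied
```

Each newly discovered bad state means another entry in `REPAIRS`, while the code that produced the bad state stays untouched; the list only ever grows.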
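Here is a hypothetical reconstruction of that pattern in Python, where `None` plays the role of null. The inventory functions are invented for illustration: the lenient version swallows a bad quantity, while the strict version lets the conversion error propagate.

```python
def parse_quantity(raw):
    """Lenient cast: a failed conversion quietly becomes None."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


def apply_adjustment(inventory, sku, raw_quantity):
    """Lenient caller: a None quantity silently skips the update."""
    quantity = parse_quantity(raw_quantity)
    if quantity is None:
        return inventory  # no-op: the bad input vanishes without a trace
    inventory[sku] = inventory.get(sku, 0) + quantity
    return inventory


def apply_adjustment_strict(inventory, sku, raw_quantity):
    """Strict version: int() raises, so the bug surfaces in the first test run."""
    quantity = int(raw_quantity)  # ValueError/TypeError propagate to the caller
    inventory[sku] = inventory.get(sku, 0) + quantity
    return inventory
```

With the lenient version, `apply_adjustment({"widget": 5}, "widget", "3 units")` returns the inventory unchanged and nobody finds out; the strict version raises `ValueError` on the same input, which a test would catch immediately.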
Is there such a thing as being too robust?
It was becoming clear that some code was written without much thought to downstream consequences or upstream consumers. As code was copied and pasted around (as is inevitable in a large codebase), certain patterns emerged that were intended to stabilize client code. "Failing gracefully" was becoming "passing the buck."
I remember asking on a somewhat unrelated code review:
Why do we null check? How could this possibly be null? What happens to the calling code when you just return here?
Secretly, I rebelled against Postel's Law. I could see it eroding quality. I began defining schemas and raising assertion errors in production code. I ran out of patience for callers of core pieces of our infrastructure sending in the wrong types.
This API takes a list of strings, not a string. If your list has a single string in it, that's fine; send me a list with a single string. (In Python, strings are iterable, so a bare string passed where a list is expected would quietly be treated as a list of single characters.) No, I will not cast your integer or Boolean to a string and jam it into a list, or perform any other crazy backwards shenanigans to account for your loosey-goosey consumer code.
When I decided to write off Postel's Law, I didn't speak up about it. When others mentioned it, I nodded my head and winced, but stayed silent. Who am I to discount Jon Postel, whose contributions to the field are immense?
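A strict version of such an API might look like the sketch below. `index_keywords` is a hypothetical function; the explicit `isinstance` checks exist precisely because a bare string would otherwise flow through a loop as a sequence of characters.

```python
from typing import List


def index_keywords(keywords: List[str]) -> List[str]:
    """Hypothetical strict API: accepts a list of str and nothing else.

    Raises:
        TypeError: keywords is not a list, or contains a non-str item.
    """
    # A bare str is iterable, so without this check "abc" would be
    # silently processed as the three keywords "a", "b" and "c".
    if not isinstance(keywords, list):
        raise TypeError(
            f"keywords must be a list of str, got {type(keywords).__name__}"
        )
    for position, keyword in enumerate(keywords):
        # No casting of ints or bools: a wrong type is the caller's bug.
        if not isinstance(keyword, str):
            raise TypeError(
                f"keywords[{position}] must be str, got {type(keyword).__name__}"
            )
    # Only once the input is known to be valid do we do the real work.
    return sorted(set(keywords))
```

The design choice is to reject rather than repair: `index_keywords("abc")` raises `TypeError` instead of quietly indexing `["a", "b", "c"]`.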
I found support for my sacrilegious views recently in a draft submitted by Martin Thomson of Mozilla: "The Harmful Consequences of Postel's Maxim." Thomson's sentiments feel very familiar as he describes the Protocol Decay Hypothesis. In Thomson's opinion, the design, specification and implementation of any protocol should follow a new maxim, "Protocol designs and implementations should be maximally strict."
It is interesting to note here that Thomson does not limit stringency to implementation alone, but demands the same rigidity of the design specification. This is very important, as many defenders of Postel's Law fall back on ambiguity in the specification as an excuse for deviating implementations. Ambiguity in a specification, to me at least, is a bug in the design layer. Undefined, unspecified or implementation-defined behavior? We have a name for that: Buffer Overflow. Thomson goes on to recommend a Fail Fast and Hard approach.
The Fail Fast and Hard approach makes a ton of sense if you think of the API you're offering as a test bed, and the code consumers write as tests built on top of it. Exceptions raised in your API are your consumer's assertion errors. Earlier we mentioned that the power of unit testing comes from having immediate feedback related to expectations. When you are loose with what you accept, you are effectively silencing a consumer's failing unit test. Assume that your consumer understands your API and that they intend to adhere to it. Assume that incorrectly typed values or undefined parameters imply a bug further up in the consumer stack and that they want to know about it.
"This is too dangerous. If my system fails here, we are going to corrupt user data!"
—Anonymous Proponent of Postel's Law, MacLeod Broad's Imagination
Wrong. The API's creator should document and clearly communicate the exceptions it raises and the error conditions that trigger them. A consumer should know what it is expected to catch and why, and should handle those specified exceptions with try/except accordingly.
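One way to make those error conditions part of the contract in Python is to define specific exception types, list them in the docstring, and have consumers catch only the ones they can meaningfully handle. The names below (`fetch_document`, `show_document`, `DocumentNotFoundError`, `InvalidDocumentIdError`, and the in-memory store) are hypothetical.

```python
class DocumentNotFoundError(LookupError):
    """Raised by fetch_document when no document has the given ID."""


class InvalidDocumentIdError(ValueError):
    """Raised by fetch_document when the ID is not a non-empty str."""


# Hypothetical in-memory store standing in for a real backend.
_STORE = {"doc-1": "Quarterly report"}


def fetch_document(doc_id: str) -> str:
    """Return the document body for doc_id.

    Raises:
        InvalidDocumentIdError: doc_id is not a non-empty str.
        DocumentNotFoundError: no document exists with that ID.
    """
    if not isinstance(doc_id, str) or not doc_id:
        raise InvalidDocumentIdError(f"invalid document id: {doc_id!r}")
    try:
        return _STORE[doc_id]
    except KeyError:
        raise DocumentNotFoundError(doc_id) from None


def show_document(doc_id: str) -> str:
    """Consumer code: handles only the documented failure it can act on.

    InvalidDocumentIdError is deliberately not caught: a bad ID means a bug
    further up the consumer's stack, and it should fail loudly.
    """
    try:
        return fetch_document(doc_id)
    except DocumentNotFoundError:
        # A missing document is a legitimate outcome the UI can present.
        return "Document not found"
```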
In retrospect, I began wondering if maybe I wasn't rebelling at all, but had totally misunderstood Postel's Law. Certainly I was overusing it for scenarios that didn't warrant it. I found a forum post that claims we all got it wrong:
This statement is based upon a terrible misunderstand[ing] of Postel's robustness principle. I knew Jon Postel. He was quite unhappy with how his robustness principle was abused to cover up non-compliant behavior, and to criticize compliant software. Jon's principle could perhaps be more accurately stated as "in general, only a subset of a protocol is actually used in real life. So, you should be conservative and only generate that subset. However, you should also be liberal and accept everything that the protocol permits, even if it appears that nobody will ever use it."
However, the original statement of Postel's Law in RFC 760, "DOD STANDARD INTERNET PROTOCOL" (1980), disagrees with the above:
The implementation of a protocol must be robust. Each implementation must expect to interoperate with others created by different individuals. While the goal of this specification is to be explicit about the protocol there is the possibility of differing interpretations. In general, an implementation should be conservative in its sending behavior, and liberal in its receiving behavior. That is, it should be careful to send well-formed datagrams, but should accept any datagram that it can interpret (e.g., not object to technical errors where the meaning is still clear).
Regardless of how well anyone knew Jon, RFC 760 asserts that even datagrams that are not well-formed should be accepted, as long as they can be interpreted unambiguously. The problem is that the receiving end decides how any ambiguity is interpreted. The sender's intent is only clear if the sender properly implements the specification. If receiver and sender happen to agree on the interpretation of an ambiguous payload, that is not much more than a happy accident. If they formally agree on the payload, well, it's not ambiguous at all and should be part of the specification.
While researching this article, I came across a comment reply noting that Postel's Law parallels a suggestion by the philosopher Rudolf Carnap:
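A field-level analogue (not actual IP datagram handling) shows why receiver-side interpretation is a problem. Assume a hypothetical specification that allows an `enabled` flag to be exactly `true` or `false`. Given the malformed value `"False "`, two liberal receivers each "interpret" it, and they disagree, while a strict receiver refuses to guess.

```python
def receiver_a_enabled(value: str) -> bool:
    """Liberal receiver A: any non-empty value counts as 'on'."""
    return bool(value.strip())


def receiver_b_enabled(value: str) -> bool:
    """Liberal receiver B: recognizes a few spellings of 'on', else 'off'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def strict_enabled(value: str) -> bool:
    """Strict receiver: the (hypothetical) spec allows exactly 'true' or 'false'."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"enabled must be 'true' or 'false', got {value!r}")
```

Receiver A's reading follows directly from Python's truthiness rules: any non-empty string, including `"False"`, is truthy. Same payload, two "reasonable" interpretations, opposite results.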
"Let us be cautious in making assertions and critical in examining them, but tolerant in permitting linguistic forms."
I believe Carnap's warning to be the precursor to Postel's Law. Carnap was not describing machine-to-machine protocols exactly, but rather the syntax of philosophical expression. Although he was a stickler for syntax, as the inventor of "logical syntax," he cautioned users of his syntax to keep an open mind—that is, an argument should only be dismissed on its merits after thorough analysis, and not dismissed based simply on the technicalities (syntactical correctness) of its expression.
Oddly enough, I agree wholeheartedly with Carnap while disagreeing with Postel, even though they're essentially saying the same thing. The difference really comes down to intent and context:
- Postel's Law as a design principle was intended to enable rapid adoption of TCP, placing more value on ease-of-use and stability than correctness.
- Thomson's Law is intended to reduce ambiguity and errors by hardening the specification and, in turn, the implementations built from it, placing more value on interoperability, correctness and (arguably) maintainability than on ease-of-use (ease-of-misuse?).
- Carnap's Warning is intended to encourage tolerance and exploration of ideas, to prove out their worth prior to dismissal, placing more value on content than syntactical correctness.
After considering all three approaches, I can see value for each with the proper intent and context. All three of them can guide software design in certain limited ways:
When initially working on a specification, during the brainstorming phase, we may come up with some loosely defined draft specification. At this point, we should heed Carnap's Warning and be open-minded that other people may have differing opinions on what we're designing, how it should be implemented, and so on. Feedback should be taken seriously, regardless of how well or poorly it is expressed. This loosely defined specification may serve us well enough to get a prototype off the ground.
As we define APIs for our prototype, we may follow Postel's Law. However, this only makes sense if our prototype service will have consumers and we have good logging. This may be a hard sell, as we will be letting our consumers know that this is strictly a prototype and may be killed off at any time. Logging is important here because our loosely defined interface, which handles just about everything, is tracking consumer usage patterns for us. By doing this, we can identify how consumers are misusing our API, what they like and don't like, what they expect, and what we're missing. Analyzing consumer patterns at this stage informs our specification, which we can then iterate on to nail down its final form.
Finally, when our problem space and solution have been sufficiently explored, we're ready to follow Thomson's Law: tighten up our specification and weed out ambiguity. We could apply the same tightening to our prototype and ship it as a reference implementation. Wouldn't that be nice? All implementations built from the spec must be maximally strict. This is just a natural extension of Carnap's intent: after initially tolerating linguistic forms, Carnap would insist on transforming the idea into his logical syntax, a strict, unambiguous notation for describing an argument.
Postel's Law is not law, and it should only be considered if you're prototyping, if your system needs to interoperate with systems whose maintenance cycle is over, or if you don't respect your consumers. These are cases where the cost of fixing the other systems isn't worth it, or where hardening your implementation would kill your user base. I should probably say something about Python 3, but I won't.
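A minimal sketch of such a prototype endpoint, assuming Python's standard `logging` module and `collections.Counter`. `create_tags` and the pattern labels are hypothetical; the function tolerates almost any input, but every deviation from the intended interface (a list of strings) is logged and tallied.

```python
import logging
from collections import Counter

logger = logging.getLogger("prototype.api")

# Tally of how consumers deviate from the intended interface.
usage_patterns = Counter()


def create_tags(tags):
    """Prototype endpoint: accepts almost anything, but records every deviation."""
    if tags is None:
        usage_patterns["tags=None"] += 1
        logger.info("create_tags called with None; treating as []")
        tags = []
    elif isinstance(tags, str):
        usage_patterns["tags=str"] += 1
        logger.info("create_tags called with bare string %r; wrapping it", tags)
        tags = [tags]
    elif not isinstance(tags, list):
        kind = type(tags).__name__
        usage_patterns[f"tags={kind}"] += 1
        logger.info("create_tags called with %s; coercing to list", kind)
        tags = list(tags) if hasattr(tags, "__iter__") else [tags]
    cleaned = []
    for tag in tags:
        if not isinstance(tag, str):
            # Record non-str items too, then coerce them for now.
            usage_patterns[f"tag item={type(tag).__name__}"] += 1
            tag = str(tag)
        cleaned.append(tag)
    return cleaned
```

Once the prototype has seen real traffic, `usage_patterns.most_common()` gives a ranked list of how consumers actually call the API, which is exactly the input needed to tighten the specification.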
Now I will coin a new term: Postel's Cycle.
1. Create brand new specification to replace existing broken specification.
2. Follow Postel's Law in its implementation.
3. Live with nightmarish maintenance burden.
4. Repeat steps 1–4 indefinitely.
I don't see this cycle as an inevitability. Certainly protocols, APIs, software, hairstyles, etc. will continue to evolve and be replaced. However, I believe the lifetime and quality of software can be drastically improved through discipline. There are too many of us who invoke Postel's Law as our default design mode. To break the cycle, we need to stop our well-intentioned chanting of "Be liberal in what you accept and conservative in what you send."