As a way of not thinking about the election, I wanted to write about something trivial. The most trivial writing I can think of is a long screed responding to an opinion I disagree with on a topic that ultimately does not matter. Hence, Robert “Uncle Bob” Martin’s 2017 essay (this blog remains timely as ever) “The Dark Path.” In this essay, Martin advances a very silly argument against strong type systems in programming languages.
If that introductory sentence caused a nose bleed or a sudden desire to be looking at literally any other web page, I’ll endeavor to explain. A computer, at the end of the day, is an electrical machine that can be rewired on the fly. Nowadays, computers are so cheap and so easy to configure that people stick computers in truly bizarre places, like on the front of toasters.
Back in the day, setting up a computer was serious business. In the earliest days, you had to rewire your computer by hand by carefully plugging things into sockets. Later advances meant you could rejigger your computer by feeding in punchcards with holes carefully cut out. Internally, these punchcards encoded a list of instructions for the computer to carry out. Even later, computers got keyboard inputs and printouts and eventually even screens so that you could type the instructions directly into the computer.
This was amazing. It also really sucked.
The format in which you had to type in the instructions was obscure, made for the machine’s convenience rather than yours.
Each instruction only did something precise and small, so you needed a heck of a lot of them to do anything useful.
Different computers would have different instructions. So if you wanted to do the same thing on two different computers, you’d have to take all of your work and modify it for the new computer.
Eventually, people got sick of this and told the computers to write their own damn instructions.1 In particular, they wrote a set of instructions that would take input in a certain easier-to-write format and output a corresponding set of instructions. This is called a compiler: the inputs it takes are programs and their format is a programming language.
Despite the names and despite some high schools in America using them to junk Spanish class, programming languages don’t have that much to do with the language you or (maybe) I speak. Instead, they’re artificial and symbolic languages. They follow a set of rules for how they’re constructed that is much stricter and more precise than the syntax and grammar of natural language. (Except for Perl.) And what gets written in a programming language does not have meaning in the same way natural language does: programs aren’t about anything. An analogy might be with machine schematics: a program is a picture of an abstract machine that the compiler can actually fabricate.
If you want a flavor of what a compiler does, here’s a link to a simple C program (C is an early and still used programming language) with the corresponding instructions. Not only is using a programming language to tell computers what to do way less tedious than the alternative, but it unlocks a whole bunch of benefits. All three of the problems above are immediately addressed.
Programming languages are meant to be human readable, helping you understand other people’s programs and your own.
Not only do programming languages save you typing so many instructions, but they can (and now universally do) help you reuse functionality across programs, making it much easier to build off past work and the work of others. Many of the programs that get written are “libraries”: programs that don’t do anything themselves, but which define functionality for other programs to use.
The work of porting programs across computers is much easier. You just have to port the compiler and then everything written in that language comes along for the ride.
Beyond these advances, programming languages offer new avenues for additional convenience.
The programming language can solve entire categories of problems for you. A major example here is memory management. Take as an example a music player app. To play a track from an album, the player needs to load the data for that track. This will look like: allocating space in memory to put that data, copying it from disk, and then playing the audio.
Once you go to the next track, the player will do this again. Now, at this point it better release the memory for the previous track. If it doesn’t, the program will “leak” memory, building up more and more garbage memory it’s not using until the computer runs out of memory and it crashes.2
Most modern programming languages will come bundled with a garbage collector. This is in effect a program inside of your program that watches how your program is using memory and, on your behalf, cleans up memory that isn’t being used. Makes it much harder to leak memory (although you can still do it, if you really want to).
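To make that concrete, here is a minimal sketch in Swift (one of the two languages this post will eventually get around to). TrackBuffer and the track names are invented, and Swift technically reclaims memory with automatic reference counting rather than a tracing garbage collector, but the upshot for the programmer is the same: you never write the “release the old track” step yourself.

```swift
// Hypothetical music-player sketch; TrackBuffer is made up for illustration.
final class TrackBuffer {
    let name: String
    let samples: [UInt8]

    init(name: String, byteCount: Int) {
        self.name = name
        // Stand-in for the data that would be copied in from disk.
        self.samples = [UInt8](repeating: 0, count: byteCount)
        print("loaded \(name)")
    }

    deinit {
        // Runs automatically once nothing references this buffer any more.
        print("released \(name)")
    }
}

var current: TrackBuffer? = TrackBuffer(name: "Track 1", byteCount: 4_096)
// Moving to the next track drops the only reference to Track 1, so its
// memory is reclaimed for us instead of quietly leaking.
current = TrackBuffer(name: "Track 2", byteCount: 4_096)
```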
Instead of doing things for you, the language can yell at you when you do things wrong. This might sound annoying, and it is. But in the end it’s much nicer when the compiler tells you you’re stupid than when it doesn’t and you have to wait for reality to do that. (Think about all the forms you’ve ever messed up by putting the wrong information in the wrong box; much better to have an accountant check over your taxes than to turn them in and get audited by the IRS.)
With this second point, we’re almost ready to get to the promised ragging on Uncle Bob. First, though, we need to get a bit more clear about this second advantage.
To the computer, what matters is just the instructions that the compiler spits out; when the program executes, all it’s doing at the end of the day is manipulating 1s and 0s according to a set of rules. For the programmer, though, those 1s and 0s have specific meanings. Some of them encode the name of the product the customer is looking at. Others hold references to the items already in a customer’s cart.
As a program grows in size and complexity, it has to keep track of more data when running, and more different kinds of data. The programming language can help the programmer keep track of this complexity by providing facilities to name data in memory and to make type assertions about that data, stating that it follows a certain format. The compiler can check those type assertions for consistency and reject the program if it tries to do something that doesn’t make sense, like calling the routine check_out_cart on data that is not a user’s cart.
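A hedged little sketch of what that looks like in Swift; Cart, Product, and checkOutCart are names I am making up here, not anything from a real library:

```swift
// Hypothetical shopping-cart types for illustration.
struct Product { let name: String; let priceInCents: Int }
struct Cart { let items: [Product] }

func checkOutCart(_ cart: Cart) -> Int {
    // Total up what the customer owes.
    cart.items.reduce(0) { $0 + $1.priceInCents }
}

let socks = Product(name: "Wool socks", priceInCents: 1_299)
let myCart = Cart(items: [socks])
let total = checkOutCart(myCart)   // fine: the argument really is a Cart
print("total: \(total) cents")

// checkOutCart(socks)             // rejected by the compiler: a Product is not a Cart
```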
The rules that the compiler follows in checking these type assertions constitute the programming language’s type system. With a type system, the programmer has to put in constant effort throughout the program, proving to the compiler that they’re following its rules. The tradeoff is that the compiler can then verify that there are some mistakes the program does not make. This is not at all the same as verifying that the program is mistake-free, but it certainly helps. And type checking is prone to introducing unnecessary friction: rejecting programs that would have worked fine, but happened to transgress a rule.
Not all type systems are created equal. More advanced type systems improve in one of two directions: either increasing the scope of the type system to check more behaviors of the program or increasing the expressiveness of the type system to reduce that friction, making it easier to verify programs. The tradeoff here is generally that there is more upfront investment on the programmer’s side in learning and using the more advanced systems.
With that, we’re close to actually getting to the point of this post. Ultimately, this is a debate over whether the more advanced type systems are worth the extra cost and annoyance of using them. Yes, says I; no, says Uncle Bob. To fairly represent the negative position here, we need to cover two more aspects of modern programming languages: exception handling and testing.
Recall one of the advantages of programming languages I had mentioned: the programming language can implement basic features for any program written in the language to use. Memory management is one of the big ones. Exception handling is another.
All over the place, programs try to do things that may fail. They try and read a file with some configuration on it, turns out that file was accidentally deleted. They send a request to another computer, turns out that computer was turned off. They try and charge a customer for the products they’re buying, turns out the customer is flat broke.
These steps that can fail are generally part of larger jobs. And when a step fails, the job has to be canceled, the remaining steps skipped (you don’t want to ship items the customer can’t pay for), and recovery steps run. What those failures are, what causes them, what recovery is needed, these will vary on a case-by-case basis, but the basic pattern — try to do X, if it fails skip to recovery R, otherwise proceed to do Y — is ubiquitous across programming.
To make this kind of logic easy to write, modern languages generally provide exception handling: the ability to raise and to handle exceptions. To break that down: an exception is a bit of data that describes what failed. When an exception is raised, the program will make a note of that exception and start skipping everything that follows the point where it was raised until it finds a corresponding part of the program that handles that exception, and then run that part of the program instead.
This functionality is extremely useful. Having separate statements to raise and handle exceptions helps separate deciding when a failure has occurred from deciding how to recover from it. The programmer might realize that a customer might not have enough money to cover a transaction, put in a check for that, and raise an exception when the transaction is attempted, without having to decide then and there when and how they’re going to break that delicate news to the customer. Further, storing information about the exception is helpful if not for recovering from failures then for reporting and triaging them: generally the compiler will automatically attach to the exception information about where in the program it was raised.
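Swift spells “raise” as throw and “handle” as catch, but the shape is exactly the raise-and-handle pattern above. A minimal sketch, with PaymentError and chargeCustomer invented for the occasion:

```swift
// Hypothetical payment failure, to show raising in one place and handling in another.
enum PaymentError: Error {
    case insufficientFunds(shortfallInCents: Int)
}

func chargeCustomer(balanceInCents: Int, totalInCents: Int) throws {
    guard balanceInCents >= totalInCents else {
        // Decide *that* the failure happened here...
        throw PaymentError.insufficientFunds(shortfallInCents: totalInCents - balanceInCents)
    }
    print("charged \(totalInCents) cents")
}

do {
    try chargeCustomer(balanceInCents: 500, totalInCents: 1_299)
    print("order placed") // skipped when the charge throws
} catch PaymentError.insufficientFunds(let shortfall) {
    // ...and decide *how* to break the delicate news somewhere else entirely.
    print("card declined; short by \(shortfall) cents")
} catch {
    print("payment failed for some other reason: \(error)")
}
```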
This kind of exception handling is generally orthogonal to a programming language’s type checking. Programming languages can and do have both. However, this kind of exception handling can make it significantly more feasible for a programming language to do without types. Instead of having a type checker prevent the programmer from compiling code that would do something invalid, you can let them write that bad code and have the language’s runtime detect when something invalid is being attempted and raise a corresponding exception. The tradeoff here is that you free yourself from the annoyance of a type checker, but some of the mistakes that would have been caught before the program ever runs instead become exceptions raised in the middle of a run.
Programmers and users generally don’t love it when programs fail, even when they fail with lovely exception messages saying precisely what went wrong. To limit how frequently this happens, programmers use the classic strategy: try it out and see if it works. The simplest version of this strategy is manual testing: run the program on your computer, try some things out, see what happens. This has its place, but when programmers talk about testing software, they’re talking about automated tests: writing programs that run another program (or part of a program) and validate it works as expected.
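For a flavor, here is roughly what an automated test looks like in Swift with XCTest, the test framework that ships with Apple’s toolchain; CartTests is invented and leans on the hypothetical checkOutCart sketch from earlier:

```swift
import XCTest

// A test is just a program that runs part of another program and checks the result.
final class CartTests: XCTestCase {
    func testCheckoutTotalsTheCart() {
        let cart = Cart(items: [
            Product(name: "Wool socks", priceInCents: 1_299),
            Product(name: "Shoelaces", priceInCents: 401),
        ])
        XCTAssertEqual(checkOutCart(cart), 1_700)
    }
}
```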
Decent programmers recognize the necessity of writing tests. Talk to them about testing and you’ll find disagreements about how best to go about writing tests and how to decide when enough is enough before sending the program over to real users, but that there will be tests is a given. What is at issue in “The Dark Path” is the sufficiency of tests to replace features of the type system.
And I promise, I promise, we’re about to actually start talking about this article. I just need one last bit of set-up. So far we have talked about programming languages in abstract terms. We need some coverage of actual programming languages and their usage. Let’s sketch a few very broad categories.
The default choices. Languages that are a company’s first choice for a project, either because they are extremely well-supported (mature runtimes and battle-tested libraries, a steady stream of security updates; e.g. Java and C#) or because they are the native choice for a certain platform (JavaScript for websites, Objective-C for iOS).
The contenders. Languages that see real industry use, but more as a fringe choice. Often these are positioned as clean upgrades of the default choices. The two languages Robert Martin discusses fall in this category. Kotlin is a drop-in replacement for Java: you can use the same great libraries and runtime, and the language is similar to Java, just nicer. Similarly, Swift is a replacement for Objective-C, letting you write programs for iOS at a somewhat smaller cost to your sanity.
Niche/toy/academic/hobby languages. Programming languages have an allure for programmers. They’re fun to write and think about, and given your job as a programmer is mostly wrestling with programming languages, you tend to develop a strong set of opinions on what would make a good programming language. Many programming languages get written that are nothing more than a compiler thrown together over a weekend and, if you’re lucky, a page of documentation.
Programming languages progress in accordance with the dialectics of frustrations. The default choices are not especially lovely. Since the crown moves but slowly, these top languages are on the older side, with decades of accumulated bad decisions. In part, their place at the top has a recursive, arbitrary quality: they get used because they’re well supported and they’re well supported because they get used.
Programming language nerds turn their private frustrations into designs and toy implementations of dream languages: abstract labyrinths of parentheses and type systems cribbed from the formal logic a German mathematician whispered into their ear on his deathbed. Occasionally, the collective frustration of a group of engineers leads to a contender being minted. Facebook will realize they are burning three trucksful of dollar bills fortnightly to PHP being kind of shitty and scratch together a replacement language. A fintech startup might skim a niche functional language from the top of the refuse pile and use it to attract MIT boffins into writing statistical models of grain futures options 13 hours a day. Occasionally, the king will stir and notice the riffraff posing around the keep walls and instate a streaming interface to quiet them a moment.
The monarchy has its Loyalists, of course. Such is Robert Martin. As he writes in “The Churn”:
New languages aren’t better; they are just shiny. And the search for the golden fleece of a new language, or a new framework, or a new paradigm, or a new process has reached the point of being unprofessional.
The king’s a funny old man, but his rule is not so bad, is it? Surely you saw the streaming interface he just put up, really brings a touch of color to the oubliettes. And the rabble, with their applicative functors and algebraic steppers, surely they’d just dirty up the place. I am not unsympathetic here, more Whig than Jacobin, but we must consider the possibility that the crowd is crowing about something.
So let’s review Martin’s claims. He has “dabbled in two new languages. Swift and Kotlin.” Having so dabbled, he objects to their type systems: “not the fact that Swift and Kotlin are statically typed” but the “depth of that static typing.” By the grace of God, the king has all the strictness one will ever need. Anything more is perversely puritanical. Just so I don’t have to waffle even further about programming arcana, let’s take Martin’s first example: Swift’s exception handling.3
If you have a programming language with a type system and exception handling, you’ll have to decide whether and how to represent exceptions in the type system. One of the easy mistakes to make with the raise-handle exception handling I described is missing that a part of the program can raise an exception and so failing to handle it. This mistake is especially easy to make when using other people’s code (or code you wrote more than a few hours ago) that may not properly document the exceptions it can raise. (And even then, it’s very easy to gloss over exceptions.)
Swift steps in here with a couple of rules in the type system: every function that can raise an exception must declare that it can, and every line that calls such a function must handle the exception it might raise. If you don’t know how to handle it, you can reraise it and let something higher up handle it. All this raising and handling might get tedious, so Swift also has shorthand so that it can be done in a few key presses.
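A hedged sketch of how those rules look in code; loadConfig, startServer, and the config path are invented for illustration:

```swift
import Foundation

enum ConfigError: Error { case missingFile }

// Rule one: a function that can raise an exception must say so in its signature.
func loadConfig(at path: String) throws -> String {
    guard FileManager.default.fileExists(atPath: path) else {
        throw ConfigError.missingFile
    }
    return try String(contentsOfFile: path, encoding: .utf8)
}

// Rule two: every call site must acknowledge the possibility of failure with "try".
// Marking startServer itself as throws is the "reraise and let something higher up
// handle it" option.
func startServer() throws {
    let config = try loadConfig(at: "/etc/app.conf")
    print("starting with \(config.count) characters of config")
}

// The shorthands: try? swallows the failure into a nil, try! bets the program's
// life that the failure will not happen.
let maybeConfig = try? loadConfig(at: "/etc/app.conf")
print(maybeConfig ?? "no config found")
```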
I admit to not having worked in Swift (I have luckily avoided the Apple ecosystem), so I cannot say how this works out in practice. But at a glance it seems like an okay idea. Not my favorite way to handle exceptions (Rust/Haskell union types, if you’re curious), but better than nothing. On the face of it, I agree with Robert Martin’s imagined interlocutor: this is a good thing (or at least a better thing than Martin’s preferred alternative: namely, nothing).
Now, perhaps you think this is a good thing. Perhaps you think that there have been a lot of bugs in systems that have resulted from un-corralled exceptions. Perhaps you think that exceptions that aren’t escorted, step by step, up the calling stack are risky and error prone. And, of course, you would be right about that. Undeclared and unmanaged exceptions are very risky.
Martin has a couple of complaints here: (a) all of this catching and raising you have to do to satisfy the compiler is annoying to type, even with the shorthand, and (b) it makes turning a function that cannot raise an exception into one that can a breaking change: you have to go and update all of the callers to handle the new exception. The alternative Martin proposes is to get good at writing tests and simply add tests around exceptions being raised and handled properly.
Two quick responses to the complaints and then a long ramble about the alternative. In terms of annoyingness to type: invest in a good mechanical keyboard and learn to type faster. And in the case of adding exceptions being a breaking change, this seems all upside to me: adding an exception is a change very likely to break users of a function; it’s nice that the compiler forces you to react to such changes.
Now for the ramble. My father has always been a slapdash handyman. Axing branches from atop a ladder currently resting on said branches. Chucking cheap plywood through a hand-me-down bandsaw barehanded and ungoggled. We were lucky if he wore so much as a shirt when facing oblivion in the form of a $25 Home Depot box of power tools. After a few scrapes and one hospitalization, he’s calmed down a skosh, become willing to offload the life-threatening parts to the professionals.
Was the problem with his work the tools or the technique? One would have to say both. The technique was horrific, but something, anything, by way of protective equipment would have taken the harrowing edge off the work. So when Uncle Bob asks
The question is: Whose job is it to manage that risk? Is it the language’s job? Or is it the programmer’s job?
I can only reply: Yes. Both.
It is a very strange dilemma Robert Martin sets up, a slippery bit of rhetoric preparing the ground for his personal hobbyhorse: You aren’t testing enough. If you were, bucko, Tim Apple wouldn’t have to descend from his carbon-fiber penthouse and wipe the drool from your bib with a lace napkin made of checked exceptions. And though they be covered in pink felt, these handcuffs are no joking matter.
Why are these languages adopting all these features? Because programmers are not testing their code. And because programmers are not testing their code, we now have languages that force us to put the word open in front of every class we want to derive from. We now have languages that force us to adorn every function, all the way up the calling tree, with try!. We now have languages that are so constraining, and so over-specified, that you have to design the whole system up front before you can code any of it.
Now, Bob, bubbale, I hate the children more than anyone. But I’m not sure there’s a volume of pulverized Adderall Xanther can snort through their rainbow LED nose ring to power enough test coverage to satisfy you. The problem seems less in our discipline than in the world: every year there’s more of it.
Like England, software rises a foot a year on an accumulation of cadavers. Every morning, Mark Zuckerberg personally lumbers a sequoia and tosses it into a wood furnace that powers the compilation of the ten thousand lines of additional user tracking the poltergeists at Meta added overnight. And software expands not just upwards and outwards but inwards. Even now, Dave from one department of user analytics is preparing for a meeting with Tina from another department of user analytics (same org, different cost centers; it gets tense) to orchestrate an integration of the ML system that processes user interactions with the ML system that performs genetics and heritage inference so that they can decide when Pat clicks on the slightly risqué advertisement of children’s dresses, whether that was a misclick born of early-onset cerebral palsy (and so they should sell his name, email and SSN to Ipsen Biopharmaceuticals) or the signal of the kind of prurient interest that To Catch a Predator and the FBI would be interested in. We can forgive the youngins for reaching for programming languages created in their own lifetimes.
What is perhaps strangest about Robert Martin’s prescription is that it is so much worse than the disease. The primary consideration Martin avers in favor of the discipline of tests over types is the punishment the type checker inflicts when you have to change your mind. Yet how much harsher the hand of a battery of tests!
Add a checked exception in Swift and in one instant VSCode explodes in a bigarrure of red lines and compilation failures. Spend ten minutes of tedium sprinkling trys across the codebase and it’s back to business. The compiler is an obnoxious but fair enforcer, satisfied when you return the forms in triplicate. Without such compiler enforcement, the immediate friction of the change is gone, but the work is much greater.
A danger has been silently introduced: every usage that the compiler would have flagged is one we still have to find, look at, and decide how to handle under this new error. By introducing a new exception to one function, we’ve quietly changed the logic not only of that function, but of every function that calls it and every function that calls those and so on. If we are to follow Uncle Bob’s test discipline, that’s an overnight shift of tracking each and every one of those down and supplying a corresponding test.4
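To sketch the ripple with invented names: the moment chargeCard learns to fail, every caller up the chain has to either declare throws itself or actually handle the error, and the compiler points at every spot where that decision is owed.

```swift
// Hypothetical call chain showing how a new exception propagates upward.
enum CardError: Error { case declined }

func chargeCard() throws {        // newly able to fail
    throw CardError.declined
}

func placeOrder() throws {        // forced to change: re-raises upward...
    try chargeCard()
}

func checkout() {                 // ...until someone actually handles it.
    do {
        try placeOrder()
    } catch {
        print("order failed: \(error)")
    }
}

checkout()
```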
Isn’t all this bloviating excessive? Yes, it’s explicitly a lark. But there is a deeper point I want to make here that has nothing to do with the relative roles of types and testing in software quality. Leave that for the birds.
Rather, the takeaway is this: this is a bad field for gurus. As in most domains, expertise is mostly a matter of practical wisdom rather than factual knowledge, i.e. an inarticulable intuition. Advice, even good advice, is of very limited value: if you were wise enough to correctly apply a piece of advice, you’d be wise enough not to need it. At most, advice can function as a directional guide. If a smart person says to do X, then maybe try doing more X than you were previously doing. You’ll still fuck it up, but you might develop a better sense of how much X is the right amount.
This is a general problem with advice. In software, good advice is, I think, harder to come by than in most other fields. This is because of (a) the relative youth of the field and (b) the scope and speed of change in the field. It’s entirely possible to find two people in different subfields of programming for whom “good software engineering” looks completely different. It is very likely that for a given person “good software engineering” looks quite different at the end of their career than it did at the start. Not to say there are no facts on the ground, but the facts are context dependent. Broad advice is more a product of arrogance than wisdom.
Well, actually, the idea of getting a computer to output computer instructions predates computers actually existing.
In ye olden times, a memory leak like this would crash the whole computer. Now we have operating systems on the computer in charge of things like memory that will hopefully step in and kill the offending program.
Strangely, Martin lists Java as a preferable alternative to Swift and Kotlin; but Java’s exception handling is even more restrictive than either of theirs.
To grant Robert Martin a point here, which he insists upon in his follow-up, the type checking does not necessarily eliminate the need for testing. At the least, though, it makes obvious to the programmer (and to a coverage checker) that the logic of the program as a whole has changed in such a way as to need additional tests. Otherwise this check is done entirely by hand, and it is very easy to miss spots. Martin is certainly wrong in his overall claim that types never replace tests. If you switch from nullable to non-nullable types, you no longer have to write “what happens if this value is null” tests; such tests won’t even compile.
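A tiny illustration of that last point in Swift: with a non-optional parameter, “what if this is null” is not a test you get to skip writing, it is a program you cannot write.

```swift
// greet is a made-up example; its parameter cannot be nil by construction.
func greet(_ name: String) -> String { "hello, \(name)" }

print(greet("Ada"))   // fine
// print(greet(nil))  // rejected by the compiler: nil is not a String
```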