Hollerith Machine for the 1890 census. No one had ever seen anything like it. Wrote one awestruck observer, “The apparatus works as unerringly as the mills of the Gods, but beats them hollow as to speed.” Another, however, reasoned that the invention was of limited use: “As no one will ever use it but governments, the inventor will not likely get very rich.” This prediction, which Hollerith clipped and saved, would not prove entirely correct. Hollerith’s firm merged with several others in 1911 to become the Computing-Tabulating-Recording Company. A few years later it was renamed—to International Business Machines, or IBM.
Sorting continued to spur the development of the computer through the next century. The first code ever written for a “stored program” computer was a program for efficient sorting. In fact, it was the computer’s ability to outsort IBM’s dedicated card-sorting machines that convinced the US government their enormous financial investment in a general-purpose machine was justified. By the 1960s, one study estimated that more than a quarter of the computing resources of the world were being spent on sorting. And no wonder—sorting is essential to working with almost any kind of information. Whether it’s finding the largest or the smallest, the most common or the rarest, tallying, indexing, flagging duplicates, or just plain looking for the thing you want, they all generally begin under the hood with a sort.
But sorting is more pervasive, even, than this. After all, one of the main reasons things get sorted is to be shown in useful form to human eyes, which means that sorting is also key to the human experience of information. Sorted lists are so ubiquitous that—like the fish who asks, “What is water?”—we must consciously work to perceive them at all. And then we perceive them everywhere.
Our email inbox typically displays the top fifty messages of potentially thousands, sorted by time of receipt. When we look for restaurants on Yelp we’re shown the top dozen or so of hundreds, sorted by proximity or by rating. A blog shows a cropped list of articles, sorted by date. The Facebook news feed, Twitter stream, and Reddit homepage all present themselves as lists, sorted by some proprietary measure. We refer to things like Google and Bing as “search engines,” but that is something of a misnomer: they’re really sort engines. What makes Google so dominant as a means of accessing the world’s information is less that it finds our text within hundreds of millions of webpages—its 1990s competitors could generally do that part well enough—but that it sorts those webpages so well, and only shows us the most relevant ten.
The truncated top of an immense, sorted list is in many ways the universal user interface.
Computer science gives us a way to understand what’s going on behind the scenes in all of these cases, which in turn can offer us some insight for those times when we are the one stuck making order—with our bills, our papers, our books, our socks, probably more times each day than we realize. By quantifying the vice (and the virtue) of mess, it also shows us the cases where we actually shouldn’t make order at all.
What’s more, when we begin looking, we see that sorting isn’t just something we do with information. It’s something we do with people. Perhaps the place where the computer science of establishing rank is most unexpectedly useful is on the sporting field and in the boxing ring—which is why knowing a little about sorting might help explain how human beings are able to live together while only occasionally coming to blows. That is to say, sorting offers some surprising clues about the nature of society—that other, larger, and more important kind of order that we make.
The Agony of Sorting
“To lower costs per unit of output, people usually increase the size of their operations,” wrote J. C. Hosken in 1955, in the first scientific article published on sorting. This is the economy of scale familiar to any business student. But with sorting, size is a recipe for disaster: perversely, as a sort grows larger, “the unit cost of sorting, instead of falling, rises.” Sorting involves steep diseconomies of scale, violating our normal intuitions about the virtues of doing things in bulk. Cooking for two is typically no harder than cooking for one, and it’s certainly easier than cooking for one person twice. But sorting, say, a shelf of a hundred books will take you longer than sorting two bookshelves of fifty apiece: you have twice as many things to organize, and there are twice as many places each of them could go. The more you take on, the worse it gets.
This is the first and most fundamental insight of sorting theory. Scale hurts.
From this we might infer that minimizing our pain and suffering when it comes to sorting is all about minimizing the number of things we have to sort. It’s true: one of the best preventives against the computational difficulty of sock sorting is just doing your laundry more often. Doing laundry three times as frequently, say, could reduce your sorting overhead by a factor of nine. Indeed, if Hillis’s roommate stuck with his peculiar procedure but went thirteen days between washes instead of fourteen, that alone would save him twenty-eight pulls from the hamper. (And going just a single day longer between washes would cost him thirty pulls more.)
Even at such a modest, fortnightly scope we can see the scale of sorting beginning to grow untenable. Computers, though, must routinely sort millions of items in a single go. For that, as the line from Jaws puts it, we’re going to need a bigger boat—and a better algorithm.
But to answer the question of just how we ought to be sorting, and which methods come out on top, we need to figure out something else first: how we’re going to keep score.
Big-O: A Yardstick for the Worst Case
The Guinness Book of World Records attributes the record for sorting a deck of cards to the Czech magician Zdeněk Bradáč. On May 15, 2008, Bradáč sorted a 52-card deck in just 36.16 seconds.* How did he do it? What sorting technique delivered him the title? Though the answer would shed interesting light on sorting theory, Bradáč declined to comment.
While we have nothing but respect for Bradáč’s skill and dexterity, we are 100% certain of the following: we can personally break his record. In fact, we are 100% certain that we can attain an unbreakable record. All we need are about 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000 attempts at the title. This number, a bit over 80 unvigintillion, is 52 factorial, or “52!” in mathematical notation—the number of ways that a deck of 52 cards can possibly be ordered. By taking roughly that many attempts, sooner or later we are bound to start with a shuffled deck that is in fact completely sorted by chance. At that point we can proudly enter Christian-Griffiths into The Guinness Book alongside a not-too-shabby sort time of 0m00s.
To be fair, we’d almost certainly be trying until the heat death of the universe before we got our perfect record attempt. Nonetheless, this highlights the biggest fundamental difference between record keepers and computer scientists. The fine folks at Guinness care only about best-case performance (and beer). They’re hardly blameworthy, of course: all records in sports reflect the single best performance. Computer science, however, almost never cares about the best case. Instead, computer scientists might want to know the average sort time of someone like Bradáč: get him to sort all of the 80 unvigintillion deck orders, or a reasonably sized sample, and score him on his average speed across all attempts. (You can see why they don’t let computer scientists run these things.)
Moreover, a computer scientist would want to know the worst sort time. Worst-case analysis lets us make hard guarantees: that a critical process will finish in time, that deadlines won’t be blown. So for the rest of this chapter—and the rest of this book, actually—we will be discussing only algorithms’ worst-case performance, unless noted otherwise.
Computer science has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation. Big-O notation has a particular quirk, which is that it’s inexact by design. That is, rather than expressing an algorithm’s performance in minutes and seconds, Big-O notation provides a way to talk about the kind of relationship that holds between the size of the problem and the program’s running time. Because Big-O notation deliberately sheds fine details, what emerges is a schema for dividing problems into different broad classes.
Imagine you’re hosting a dinner party with n guests.