of problems there is: called “Big-O of one,” written O(1), it is also known as “constant time.” Notably, Big-O notation doesn’t care a whit how long the cleaning actually takes—just that it’s always the same, totally invariant of the guest list. You’ve got the same work to do if you have one guest as if you have ten, a hundred, or any other n.
Now, the time required to pass the roast around the table will be “Big-O of n,” written O(n), also known as “linear time”—with twice the guests, you’ll wait twice as long for the dish to come around. And again, Big-O notation couldn’t care less about the number of courses that get served, or whether they go around for second helpings. In each case, the time still depends linearly on the guest list size—if you drew a graph of the number of guests vs. the time taken, it would be a straight line. What’s more, the existence of any linear-time factors will, in Big-O notation, swamp all constant-time factors. That is to say, passing the roast once around the table, or remodeling your dining room for three months and then passing the roast once around the table, are both, to a computer scientist, effectively equivalent. If that seems crazy, remember that computers deal with values of n that could easily be in the thousands, millions, or billions. In other words, computer scientists are thinking about very, very big parties. With a guest list in the millions, passing the roast once around would indeed make the home remodel seem dwarfed to the point of insignificance.
What if, as the guests arrived, each one hugged the others in greeting? Your first guest hugs you; your second guest has two hugs to give; your third guest, three. How many hugs will there be in total? This turns out to be “Big-O of n-squared,” written O(n2) and also known as “quadratic time.” Here again, we only care about the basic contours of the relationship between n and time. There’s no O(2n2) for two hugs apiece, or O(n2 + n) for hugs plus passing the food around, or O(n2 + 1) for hugs plus home cleaning. It’s all quadratic time, so O(n2) covers everything.
Constant time, written O(1); linear time, written O(n); and quadratic time, written O(n2).
It gets worse from there. There’s “exponential time,” O(2n), where each additional guest doubles your work. Even worse is “factorial time,” O(n!), a class of problems so truly hellish that computer scientists only talk about it when they’re joking—as we were in imagining shuffling a deck until it’s sorted—or when they really, really wish they were.
The Squares: Bubble Sort and Insertion Sort
When then senator Obama visited Google in 2007, CEO Eric Schmidt jokingly began the Q&A like a job interview, asking him, “What’s the best way to sort a million thirty-two-bit integers?” Without missing a beat, Obama cracked a wry smile and replied, “I think the Bubble Sort would be the wrong way to go.” The crowd of Google engineers erupted in cheers. “He had me at Bubble Sort,” one later recalled.
Obama was right to eschew Bubble Sort, an algorithm which has become something of a punching bag for computer science students: it’s simple, it’s intuitive, and it’s extremely inefficient.
Imagine you want to alphabetize your unsorted collection of books. A natural approach would be just to scan across the shelf looking for out-of-order pairs—Wallace followed by Pynchon, for instance—and flipping them around. Put Pynchon ahead of Wallace, then continue your scan, looping around to the beginning of the shelf each time you reach the end. When you make a complete pass without finding any out-of-order pairs on the entire shelf, then you know the job is done.
This is Bubble Sort, and it lands us in quadratic time. There are n books out of order, and each scan through the shelf can move each one at most one position. (We spot a tiny problem, make a tiny fix.) So in the worst case, where the shelf is perfectly backward, at least one book will need to be moved n positions. Thus a maximum of n passes through n books, which gives us O(n2) in the worst case.* It’s not terrible—for one thing, it’s worlds better than our O(n!) shuffle-till-it’s-sorted idea from earlier (in case you needed computer science to confirm that). But all the same, that squared term can get daunting quickly. For instance, it means that sorting five shelves of books will take not five times as long as sorting a single shelf, but twenty-five times as long.
You might take a different tack—pulling all the books off the shelf and putting them back in place one by one. You’d put the first book in the middle of the shelf, then take the second and compare it to the first, inserting it either to the right or to the left. Picking up the third book, you’d run through the books on the shelf from left to right until you found the right spot to tuck it in. Repeating this process, gradually all of the books would end up sorted on the shelf and you’d be done.
Computer scientists call this, appropriately enough, Insertion Sort. The good news is that it’s arguably even more intuitive than Bubble Sort and doesn’t have quite the bad reputation. The bad news is that it’s not actually that much faster. You still have to do one insertion for each book. And each insertion still involves moving past about half the books on the shelf, on average, to find the correct place. Although in practice Insertion Sort does run a bit faster than Bubble Sort, again we land squarely, if you will, in quadratic time. Sorting anything more than a single bookshelf is still an unwieldy prospect.
Breaking the Quadratic Barrier: Divide and Conquer
At this point, having seen two entirely sensible approaches fall into unsustainable quadratic time, it’s natural to wonder whether faster sorting is even possible.
The question sounds like it’s about productivity. But talk to a computer scientist and it turns out to be closer to metaphysics—akin to thinking about the speed of light, time travel, superconductors, or thermodynamic entropy. What are the universe’s fundamental rules and limits? What is possible? What is allowed? In this way computer scientists are glimpsing God’s blueprints every bit as much as the particle physicists and cosmologists. What is the minimum effort requred to make order?
Could we find a constant-time sort, O(1), one that (like cleaning the house before the bevy of guests arrive) can sort a list of any size in the same amount of time? Well, even just confirming that a shelf of n books is sorted cannot be done in constant time, since it requires checking all n of them. So actually sorting the books in constant time seems out of the question.
What about a linear-time sort, O(n), as efficient as passing a dish around a table, where doubling the number of items to sort merely doubles the work? Thinking about the examples above, it’s tough to imagine how that might work either. The n2 in each case comes from the fact that you need to move n books, and the work required in each move scales with n as well. How would we get from n moves of size n down to just n by itself? In Bubble Sort, our O(n2) running time came from handling each of the n books and moving them as many as n places each. In Insertion Sort, quadratic running time came from handling each of the n books and comparing them to as many as n others before inserting them. A linear-time sort means handling each book for constant time regardless of how many others it needs to find its place among. Doesn’t seem likely.
So we know that we can do at least as well as quadratic time, but probably not as well as linear time. Perhaps our limit lies somewhere between linear time and quadratic time. Are there any algorithms between linear and quadratic, between n and n × n?
There are—and they were hiding in plain sight.
As we mentioned earlier, information processing began in the US censuses of the nineteenth century, with the development, by Herman Hollerith and later by IBM, of physical punch-card sorting devices. In 1936, IBM began producing a line of machines called “collators” that could merge two separately ordered stacks of cards into one. As long as the two stacks were themselves sorted, the procedure of merging them into a single sorted stack was incredibly straightforward and took linear time: simply compare the two top cards to each other, move the smaller of them to the new stack you’re creating, and repeat until finished.
The program that John von Neumann wrote in 1945 to demonstrate the power of the stored-program computer took the idea of collating to its beautiful