Madness is not a complete Mergesort—it doesn’t produce a full ordering of all 64 teams. To truly rank the teams, we’d need an extra set of games to determine second place, another for third, and so on—taking a linearithmic number of games in sum. But March Madness doesn’t do that. Instead, just like the lawn tennis tournament that Dodgson complained about, it uses a Single Elimination format where the eliminated teams are left unsorted. The advantage is that it runs in linear time: since every game eliminates exactly one team, in order to have one team left standing you need just n − 1 games—a linear number. The disadvantage is that, well, you never really figure out the standings aside from first place.
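The game count can be checked with a minimal sketch of a Single Elimination bracket. The function name `single_elimination` and the `beats` callback are hypothetical names chosen for illustration, and the sketch assumes the stronger team always wins.

```python
def single_elimination(teams, beats):
    """Run a bracket and return (champion, games_played, rounds_played)."""
    field = list(teams)
    games = rounds = 0
    while len(field) > 1:
        next_round = []
        # Pair off adjacent teams; each game eliminates exactly one of them.
        for i in range(0, len(field) - 1, 2):
            a, b = field[i], field[i + 1]
            next_round.append(a if beats(a, b) else b)
            games += 1
        # With an odd number of teams, the last one advances on a bye.
        if len(field) % 2 == 1:
            next_round.append(field[-1])
        field = next_round
        rounds += 1
    return field[0], games, rounds


# Teams are numbered 0..63 and the higher number is assumed to always win.
champion, games, rounds = single_elimination(range(64), lambda a, b: a > b)
print(champion, games, rounds)  # 63 63 6
```

Sixty-four teams need 63 games spread over 6 rounds, and the 63 losers come out in no particular order.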
Ironically, in Single Elimination no tournament structure is actually necessary at all. Any 63 games will yield a single undefeated champion. For instance, you could simply have a single “king of the hill” team take on challengers one by one until it is dethroned, at which point whoever defeated it takes over its spot and continues. This format would have the drawback of needing 63 separate rounds, however, as games couldn’t happen in parallel; also, one team could potentially have to play as many as 63 games in a row, which might not be ideal from a fatigue standpoint.
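A rough sketch of the "king of the hill" format shows both properties at once. The name `king_of_the_hill` and the `beats` callback are illustrative assumptions, and the stronger team is again assumed to always win.

```python
from collections import Counter


def king_of_the_hill(teams, beats):
    """Return (champion, total_games, most_games_played_by_one_team)."""
    teams = list(teams)
    king = teams[0]
    played = Counter()
    for challenger in teams[1:]:
        # One game at a time: nothing here can run in parallel.
        played[king] += 1
        played[challenger] += 1
        if beats(challenger, king):
            king = challenger
    total_games = sum(played.values()) // 2  # each game was counted twice
    return king, total_games, max(played.values(), default=0)


# Strongest team (63) starts on the hill and must fend off everyone.
strongest_first = king_of_the_hill(range(63, -1, -1), lambda a, b: a > b)
# Weakest team (0) starts on the hill, so the king changes every game.
weakest_first = king_of_the_hill(range(64), lambda a, b: a > b)
print(strongest_first)  # (63, 63, 63)
print(weakest_first)    # (63, 63, 2)
```

Either way there are 63 games and the same champion. Only the order of the challengers decides whether one team has to play all 63 of them.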
Though born well over a century after Dodgson, perhaps no one carries forward his mathematical take on sporting into the twenty-first century as strongly as Michael Trick. We met Trick back in our discussion of optimal stopping, but in the decades since his hapless application of the 37% Rule to his love life he’s become not only a husband and a professor of operations research—he’s now also one of the principal schedulers for Major League Baseball and for NCAA conferences like the Big Ten and the ACC, using computer science to decide the year’s matchups.
As Trick points out, sports leagues aren’t concerned with determining the rankings as quickly and expeditiously as possible. Instead, sports calendars are explicitly designed to maintain tension throughout the season, something that has rarely been a concern of sorting theory.
For instance in Major League Baseball, you often have races to see who is going to win the division. Now, if we ignored the divisional setup, some of those races might get resolved fairly early in the season. But instead what we do is we make certain in the last five weeks, everybody plays everybody else within their division. The purpose of that is it doesn’t matter who’s in a divisional race: they’re going to have to play their next closest opponent at least six games in the final five weeks of the season. That allows for more interest in the schedule or interest in the season because in this case, uncertainty is delayed in its resolution.
What’s more, sports are not, of course, always designed strictly to minimize the number of games. Without remembering this, some aspects of sports scheduling would otherwise seem mysterious to a computer scientist. As Trick says of baseball’s regular season of 2,430 games, “We know that n log n is the right number of comparisons to do a full sort. That can get you everybody. Why do they do n² in order to just get, in some sense, the top, if that’s all they care about?” In other words, why do a full O(n²) Round-Robin and then some, if we know we can do a full sort in linearithmic time, and can crown an undefeated Single Elimination champion in less than n games? Well, minimizing the number of games isn’t actually in the league’s interest. In computer science unnecessary comparisons are always bad, a waste of time and effort. But in sports that’s far from the case. In many respects, after all, the games themselves are the point.
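To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The figures of 30 teams and 162 games per team are assumptions not stated in the text, chosen because they reproduce the 2,430-game season mentioned above. The n log n figure is only an order-of-magnitude guide, not an exact game count.

```python
import math

teams = 30            # assumption: number of clubs in the league
games_per_team = 162  # assumption: length of each club's regular season

single_elimination = teams - 1               # every game eliminates one team
full_sort = teams * math.log2(teams)         # order of n log n comparisons
round_robin = teams * (teams - 1) // 2       # every pair meets exactly once
actual_season = teams * games_per_team // 2  # each game involves two teams

print(single_elimination, round(full_sort), round_robin, actual_season)
# 29 147 435 2430
```

Under these assumptions the real schedule is more than five times the size of a single Round-Robin, which is the "and then some."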
Griping Rights: Noise and Robustness
Another, perhaps even more important way of training an algorithmic lens on sports is to ask not what confidence we should have in the silver medal, but what confidence we should have in the gold.
As Michael Trick explains, in some sports, “for instance baseball, a team is going to lose 30% of their games and a team is going to win 30% of their games practically no matter who they are.” This has disturbing implications for the Single Elimination format. If NCAA basketball games, say, are won by the stronger team 70% of the time, and winning the tournament involves prevailing in 6 straight games, then the best team has only a 0.70 to the 6th power—less than 12%—chance of winning the tournament! Put another way, the tournament would crown the league’s truly best team just once a decade.
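The arithmetic is short enough to check directly. This sketch assumes, as the paragraph does, that every game is an independent 70% proposition for the best team; the function name `title_probability` is just an illustrative label.

```python
def title_probability(p_game, rounds):
    """Chance of winning `rounds` games in a row, assuming independent games."""
    return p_game ** rounds


p_title = title_probability(0.70, 6)
print(f"{p_title:.3f}")      # 0.118
print(f"{1 / p_title:.1f}")  # 8.5 tournaments per title, on average
```

An expected wait of about eight and a half tournaments is where "roughly once a decade" comes from.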
It may be that in some sports, having even 70% confidence in a game’s outcome might be putting too much stock in the final score. UCSD physicist Tom Murphy applied numerical modeling techniques to soccer and concluded that soccer’s low scores make game outcomes much closer to random than most fans would prefer to imagine. “A 3:2 score gives the winning team only a 5-in-8 chance of actually being a better team … Personally, I don’t find this to be very impressive. Even a 6:1 blowout leaves a 7% chance that it was a statistical fluke.”
Computer scientists call this phenomenon noise. All of the sorting algorithms that we’ve considered thus far assume perfect, flawless, foolproof comparisons, ones that never mess up and mistakenly judge the lesser of two quantities to be the greater. Once you allow for a “noisy comparator,” some of computer science’s most hallowed algorithms go out the window—and some of its most maligned have their day of redemption.
Dave Ackley, professor of computer science at the University of New Mexico, works at the intersection of computer science and “artificial life”—he believes computers can stand to learn a few things from biology. For starters, organisms live in a world where few processes have anywhere near the level of reliability that computers depend on, so they are built from the ground up for what researchers call robustness. It’s time, argues Ackley, that we started recognizing the virtues of robustness in algorithms too.
Thus, while the authoritative programming tome Sorting and Searching boldly declares that “bubble sort has no apparent redeeming features,” the research of Ackley and his collaborators suggests that there may be a place for algorithms like Bubble Sort after all. Its very inefficiency—moving items only one position at a time—makes it fairly robust against noise, far more robust than faster algorithms like Mergesort, in which each comparison potentially moves an item a long way. Mergesort’s very efficiency makes it brittle. An early error in a Mergesort is like a fluke loss in the first round of a Single Elimination tournament, which can not only dash a favored team’s championship hopes but also permanently relegate them to the bottom half of the results.* In a Ladder tournament, on the other hand, as in a Bubble Sort, a fluke loss would only set a player back a single place in the standings.
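A small experiment can make the contrast concrete. This is only an illustration on one hand-picked input, not a reproduction of Ackley's research: the hypothetical helper `make_comparator` builds a comparison that gives the wrong answer on exactly one call, and the two sorts are textbook versions written for this sketch.

```python
def make_comparator(lie_on_call):
    """Return less(a, b) that answers wrongly on exactly one numbered call."""
    calls = 0

    def less(a, b):
        nonlocal calls
        calls += 1
        truth = a < b
        return (not truth) if calls == lie_on_call else truth

    return less


def bubble_sort(items, less):
    items = list(items)
    n = len(items)
    for _ in range(n - 1):  # fixed number of full passes, no early exit
        for i in range(n - 1):
            if less(items[i + 1], items[i]):
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


def merge_sort(items, less):
    items = list(items)
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], less)
    right = merge_sort(items[mid:], less)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if less(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    return merged + left[i:] + right[j:]


def worst_displacement(result):
    """Largest distance between an item's value and its final position."""
    return max(abs(pos - value) for pos, value in enumerate(result))


data = [0, 7, 1, 2, 3, 4, 5, 6]  # weakest and strongest meet in game one

merge_fluke = merge_sort(data, make_comparator(1))
bubble_early_fluke = bubble_sort(data, make_comparator(1))
bubble_late_fluke = bubble_sort(data, make_comparator(49))  # the final call

print(merge_fluke, worst_displacement(merge_fluke))
# [1, 2, 3, 4, 5, 6, 7, 0] 7
print(bubble_early_fluke, worst_displacement(bubble_early_fluke))
# [0, 1, 2, 3, 4, 5, 6, 7] 0
print(bubble_late_fluke, worst_displacement(bubble_late_fluke))
# [0, 1, 2, 3, 4, 5, 7, 6] 1
```

In this example one wrong call in the opening merge strands an item seven places from where it belongs, because every later merge trusts that its inputs are already in order. Bubble Sort repairs the same early mistake on its next pass, and a mistake on its very last comparison costs a single place.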
But in fact it isn’t Bubble Sort that emerges as the single best algorithm in the face of a noisy comparator. The winner of that particular honor is an algorithm called Comparison Counting Sort. In this algorithm, each item is compared to all the others, generating a tally of how many items it is bigger than. This number can then be used directly as the item’s rank. Since it compares all pairs, Comparison Counting Sort is a quadratic-time algorithm, like Bubble Sort. Thus it’s not a popular choice in traditional computer science applications, but it’s exceptionally fault-tolerant.
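The tallying procedure described above can be sketched in a few lines. This is a minimal version under stated assumptions: the function name and the `less` callback are illustrative, each pair is compared once with the result credited to the winner, and the final ordering step simply sorts by tally so that tied tallies (which a noisy comparator can produce) do not break the output.

```python
def comparison_counting_sort(items, less):
    """Rank each item by how many head-to-head comparisons it wins."""
    items = list(items)
    n = len(items)
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # One comparison per pair; the larger item gets the win.
            if less(items[j], items[i]):
                wins[i] += 1
            else:
                wins[j] += 1
    # With a reliable comparator the tallies are exactly the ranks 0..n-1.
    # Sorting by tally also copes with ties caused by unreliable comparisons.
    order = sorted(range(n), key=lambda k: wins[k])
    return [items[k] for k in order], wins


ranked, wins = comparison_counting_sort([3, 1, 4, 1, 5, 9, 2, 6],
                                        lambda a, b: a < b)
print(ranked)  # [1, 1, 2, 3, 4, 5, 6, 9]
print(wins)    # [3, 0, 4, 1, 5, 7, 2, 6]
```

For n items the double loop makes n(n − 1)/2 comparisons, which is the quadratic cost mentioned above. The robustness comes from the fact that any single wrong comparison shifts only two tallies, each by one.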
This algorithm’s workings should sound familiar. Comparison Counting Sort operates exactly like a Round-Robin tournament. In other words, it strongly resembles a sports team’s regular season—playing every other team in the division and building up a win-loss record by which they are ranked.
That Comparison Counting Sort is the single most robust sorting algorithm known, quadratic or better, should offer something very specific to sports fans: if your team doesn’t make the playoffs, don’t whine. The Mergesort postseason is chancy, but the Comparison Counting regular season is not; championship rings aren’t robust, but divisional standings are literally as robust as it gets. Put differently, if your team is eliminated early in the postseason, it’s tough luck. But if your team fails to get to the postseason, it’s tough truth. You may get sports-bar sympathy from your fellow disappointed fans, but you won’t get any from a computer scientist.
Blood Sort: Pecking Orders and Dominance Hierarchies
In all the examples we’ve considered so far, the sorting process in every case has been imposed from the top down: a librarian shelving books, the NCAA telling teams whom to play and when. But what if head-to-head comparisons happened only voluntarily? What does sorting look like when it emerges