# The Oracle of Brackets

Wednesday, March 25, 2009

It’s March and you know what that means: Time for a little madness at the big dance! OK, so I don’t know anything about basketball, but a minor point like that isn’t going to stop me from entering a bracket in the office pool.

Just so we’re all on the same page, the NCAA College Basketball Tournament is a single-elimination tournament with 64 teams; hence, a total of 63 games are played in 6 rounds. A bracket consists of projected winners in each of these 63 games, and a bracket can earn up to 32 points in each round, depending on how many games you call correctly. In particular, each first round game is worth 1 point, second round games are worth 2 points, third round games worth 4, and so on, with the final championship game worth a whopping 32 points.
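For the programmers in the pool, here is a minimal sketch of that scoring rule. It is only an illustration, not the pool's actual software: the function name `bracket_score` and the idea of describing a bracket by its number of correct picks per round are assumptions made for the example.

```python
# Points per correct pick double each round: 1, 2, 4, 8, 16, 32.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]          # 63 games in total
POINTS_PER_GAME = [2 ** r for r in range(6)]    # [1, 2, 4, 8, 16, 32]


def bracket_score(correct_per_round):
    """Return the score for a bracket, given correct picks in each round.

    `correct_per_round` is a hypothetical input format: a list of up to six
    counts, one per round played so far.
    """
    score = 0
    for correct, games, points in zip(
        correct_per_round, GAMES_PER_ROUND, POINTS_PER_GAME
    ):
        if not 0 <= correct <= games:
            raise ValueError("correct picks must be between 0 and the games in the round")
        score += correct * points
    return score


# Every round is worth at most 32 points, so a perfect bracket scores 6 * 32.
print(bracket_score(GAMES_PER_ROUND))  # 192
```

Since each round has half as many games as the one before and each game is worth twice as much, every round carries the same 32-point weight.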

Now, how to make the picks? In situations like this I feel it’s best to piggyback off the hard work of others. The time-honored (albeit somewhat boring) strategy for the almost-but-not-quite sports aficionado is to go with the favorite team, as determined by their tournament seeding. One problem with this strategy is that teams are ranked 1-16 in each of four different regions, leaving you with only a partial ordering. So you can use the seeding to make your picks for the first four rounds, but you’re on your own for the last two.

Instead of going with the tournament seeding, which is established by a select group of just ten people, fellow yahoo Cong Yu and I thought we’d try a more democratic approach. Lots of people (millions?) enter their brackets on Yahoo! Sports for a chance to win up to a million bucks. Looking at these user-submitted brackets, for each of the 63 tournament games we picked the team that was selected by the most people to win that game. You can check out our complete bracket here. FYI, we have UNC going all the way.
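The game-by-game majority pick is easy to sketch. The snippet below is a toy version under stated assumptions: the slot labels (`semi_1`, `final`), the dict-per-entrant layout, and the function name `crowd_bracket` are all made up for illustration and say nothing about how the Yahoo! Sports data is actually stored.

```python
from collections import Counter


def crowd_bracket(brackets):
    """For each game slot, return the team picked by the most entrants.

    `brackets` is a list of dicts mapping a game slot label to the predicted
    winner of that game. Slot labels and the data layout are hypothetical.
    """
    slots = brackets[0].keys()
    return {
        slot: Counter(b[slot] for b in brackets).most_common(1)[0][0]
        for slot in slots
    }


# Three toy entries covering one semi-final and the final.
entries = [
    {"semi_1": "UNC", "final": "UNC"},
    {"semi_1": "UNC", "final": "UNC"},
    {"semi_1": "Pitt", "final": "Louisville"},
]
print(crowd_bracket(entries))  # {'semi_1': 'UNC', 'final': 'UNC'}
```

Note that each slot is decided on its own, without looking at the picks for any other slot. In this sketch, ties go to whichever team was encountered first.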

So how are we doing? After two rounds, we picked 38 of 48 games correctly, for a total of 52 points. This puts us in 6th place out of 62 entries in the office pool. I can live with that.
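Those two totals pin down how the correct picks split between rounds. A quick check, assuming the scoring rule described earlier (1 point per first-round game, 2 points per second-round game):

```python
# Find (first-round, second-round) correct picks consistent with the totals.
solutions = [
    (r1, r2)
    for r1 in range(33)        # 0..32 correct out of 32 first-round games
    for r2 in range(17)        # 0..16 correct out of 16 second-round games
    if r1 + r2 == 38 and r1 * 1 + r2 * 2 == 52
]
print(solutions)  # [(24, 14)]
```

So the crowd went 24 for 32 in the first round and 14 for 16 in the second.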

A Connection to Arrow’s Theorem. The “wisdom of crowds” strategy we used isn’t guaranteed to generate a consistent bracket. For example, it’s possible the crowd tells you Louisville and North Carolina are going to the finals, but that Connecticut is going to win the championship. Huh? Yeah, that would be weird, but it could happen. Suppose everyone thinks Louisville, UConn, Pitt and UNC will make it to the Final Four (then Louisville would play UConn, and Pitt would play UNC in the semis) and that:

• A little more than one-quarter of people think Louisville will beat UConn, UNC will beat Pitt, and UNC will win the championship.
• A little more than one-quarter think Louisville will beat UConn, UNC will beat Pitt, and Louisville will win the championship.
• A little less than one-half think UConn will beat Louisville, Pitt will beat UNC, and UConn will win the championship.

Then more than half the people think Louisville and UNC will win their respective semi-final games, but UConn would still be the team favored to win the championship by the most people. Intuitively, this could happen if about half the people think Louisville and UNC are comparably matched teams, but that they are much better than the other two; and the remaining half think UConn is a powerhouse that will crush any opponent.
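To see the inconsistency concretely, here is a small sketch that tallies the three groups above game by game. The exact shares 26/26/48 are assumptions standing in for "a little more than one-quarter" and "a little less than one-half"; any numbers with the same shape give the same result.

```python
from collections import Counter

# (share of entrants in percent, semi-final 1 pick, semi-final 2 pick, champion)
# The shares are illustrative, not real data.
groups = [
    (26, "Louisville", "UNC", "UNC"),
    (26, "Louisville", "UNC", "Louisville"),
    (48, "UConn", "Pitt", "UConn"),
]

semi_1, semi_2, champion = Counter(), Counter(), Counter()
for share, s1, s2, champ in groups:
    semi_1[s1] += share
    semi_2[s2] += share
    champion[champ] += share

# Pick the most popular team for each game independently.
finalists = {semi_1.most_common(1)[0][0], semi_2.most_common(1)[0][0]}
winner = champion.most_common(1)[0][0]

print(sorted(finalists))    # ['Louisville', 'UNC']
print(winner)               # UConn
print(winner in finalists)  # False: the crowd's champion never reaches the final
```

Each semi-final is decided 52 to 48, while the championship tally is 48 for UConn against 26 apiece for Louisville and UNC.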

This kind of thing can happen in elections as well. Suppose there are three candidates, A, B, and C, with A and B pretty similar to one another but quite different from C. For example, in a primary, A and B might be liberal Democrats and C a conservative Democrat. Now imagine that a little less than two-thirds of people prefer both of the liberal candidates over the conservative one, and that the remaining people, slightly more than one-third, prefer the conservative candidate. Then it would seem that one of the liberals should be elected: the presence of a second, similar candidate shouldn’t change the outcome, a principle called the independence of irrelevant alternatives. But if the liberals end up dividing their base, the conservative could sneak in the win. That is, if about half of the liberal voters support candidate A and the other half support candidate B, then A and B will each end up with a little less than one-third of the vote, putting them both behind the conservative.
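The vote-splitting scenario can be checked with a few lines of Python. This is a sketch under assumptions: the shares 32/32/36 are illustrative, and the conservative voters' ordering of A versus B is invented, since the scenario doesn't specify it.

```python
from collections import Counter

# Each voter type: (share in percent, ranking from most to least preferred).
# Liberals A and B split just under two-thirds of the electorate;
# conservative C has just over one-third. All numbers are illustrative.
electorate = [
    (32, ["A", "B", "C"]),
    (32, ["B", "A", "C"]),
    (36, ["C", "A", "B"]),  # the A-over-B order here is an arbitrary assumption
]

# Plurality voting: count first choices only.
first_choices = Counter()
for share, ranking in electorate:
    first_choices[ranking[0]] += share
plurality_winner = first_choices.most_common(1)[0][0]


def prefers(x, y):
    """Percentage of voters who rank x above y."""
    return sum(s for s, r in electorate if r.index(x) < r.index(y))


print(plurality_winner)   # C
print(prefers("A", "C"))  # 64
print(prefers("B", "C"))  # 64
```

C wins on first choices with 36 percent, even though 64 percent of voters rank each liberal above C.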

In general, it is hard to aggregate rankings. And Arrow’s Theorem shows that no matter what aggregation scheme you use, you can’t eliminate all of these unsettling cases.

Illustration by Kelly Savage


• Cong Yu

For the ‘favorite team’ approach, the semi-finals and final can be determined by the AP ranking, which has Louisville beating UConn, UNC beating Pitt, and Louisville winning the championship.

• Denzel Li

Most models have tried to approach this using teams’ regular season and conference tournament records, with techniques like linear regression or PageRank. Some of them predict OK results, while others do not.

I had never thought of the “Crowd strategy”, which is a fun read. The bracket looks good too. But it is always hard to predict Cinderella teams, like Arizona getting into the Sweet 16. Hopefully, no more Cinderellas or upsets in the Midwest region. (You know whose fan I am.)
It will be interesting to see a model that adapts to changes and to the most relevant information: new results, injuries, team chemistry, or one integrated with information from multiple perspectives.

Game time, have Fun!
Denzel Li @ Louisville

• Pingback: The Perfect Bracket | Messy Matters