Sunday, June 21, 2009

Chip distributions

For a long time I have wondered what it would look like to graph the chip stacks of a poker tournament at some arbitrary point. (Yes, my brain works in strange ways. Seriously, you don't want to know.) As long as it's not too near the beginning or end, you would clearly get some sort of bell curve, with a big bulge of players in the middle, and smaller numbers of big and short stacks out on the tails.

But after observing and participating in lots of tournaments, I became pretty sure that it would not form a "normal" distribution, with symmetric tails. Instead, it seemed obvious to me that the short-stack tail on the left would be a lot shorter and steeper, the big-stack tail on the right longer and flatter.


If you've played tournaments, you might be able to intuit why this is so. Your stack can only fall so low before you're forced to make a move and pick a hand on which to gamble all your chips, at which point you either double up or you're out. But there is no cap on how big your stack can be (except at the theoretical limit of having all the chips when you win the tournament). In short, there's a lot more potential variability in how far above average a stack can be than in how far below average it can be.

Put another way, the "average" chip stack, defined as the mathematical mean, would be above the median. (The median is the stack that is smack in the middle if you lined up every player's chip count in order--the guy with the same number of opponents with more chips and with fewer chips.) In a normal distribution, by contrast, the mean and median fall at the same point.
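To make the mean-versus-median point concrete, here is a minimal Python sketch. The chip counts are made up for illustration (they are not from any real event); they are simply skewed the way I expect a tournament field to be, with one big stack stretching out the right-hand tail.

```python
from statistics import mean, median

# Hypothetical chip counts, invented for illustration only.
# Most stacks cluster low; one big stack forms the long right tail.
stacks = [8_000, 12_000, 15_000, 18_000, 20_000,
          22_000, 25_000, 30_000, 45_000, 105_000]

avg = mean(stacks)    # what the tournament screen calls "average chip stack"
mid = median(stacks)  # half the field has more than this, half has less

print(f"mean   = {avg:,.0f}")
print(f"median = {mid:,.0f}")
```

With these invented numbers the mean comes out to 30,000 and the median to 21,000. The single 105,000 stack drags the mean upward without moving the median at all.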

The other day, while doing one of my World Series of Poker nightly wrapup reports for PokerNews, it occurred to me that the chip counts reported at the beginning of Day 2 of most events might serve as an adequate database for testing my theory.

I picked three events: a smaller (in terms of number of players) no-limit hold'em event, one of the big-field NLHE tournaments, and a limit event:

A) Event #2, the $40,000 NLHE special anniversary event, attendance at which was obviously restricted by the huge buy-in. 201 people entered, and 87 were left to play the second day. Standings are taken from here.

B) Event #34, a $1500 NLHE "donkament," with 2095 players starting and 240 left at the start of Day 2. Standings reported here.

C) As a sample of a limit event (to see if it might be any different; I suspected that the same phenomenon would be present, though relatively more compressed in range), I went with Event #3, the $1500 Omaha/8, which started with 918 and was down to 197 for Day 2. Standings can be found here.

For each one, I pasted the list of chip stacks into Excel and sorted them into order. I then used Excel's histogram tool: I defined intervals, or "bins," had the software tabulate the number of players with chip stacks in each bin, and then had it pop out a graph of the results.
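For anyone who would rather not use a spreadsheet, the same tabulation can be sketched in a few lines of Python. This is not what I actually ran, just a rough equivalent. The function names and the chip counts are hypothetical, and I am assuming each bin is labeled by its upper edge and includes that edge, so a bin labeled 740,000 with 20,000-chip bins holds stacks above 720,000 and up to 740,000.

```python
from collections import Counter

def bin_label(stack, width):
    """Upper edge of the bin holding `stack` (lower < stack <= upper)."""
    # Ceiling division with integers, to avoid floating-point surprises.
    return -(-stack // width) * width

def histogram(stacks, width):
    """Map each occupied bin's upper edge to the number of stacks in it."""
    counts = Counter(bin_label(s, width) for s in stacks)
    return dict(sorted(counts.items()))

# Hypothetical chip counts, invented for illustration only.
stacks = [95_000, 118_000, 120_000, 131_000, 229_000, 725_000]

# Crude text histogram: one '#' per player in the bin.
for upper, n in histogram(stacks, 20_000).items():
    print(f"{upper:>9,}  {'#' * n}")
```

Note that empty bins are simply omitted here, whereas a spreadsheet chart would show them as gaps along the x-axis.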

Here's Event #2:



So the chip stack sizes are along the x-axis, and the number of players with stacks near that size is on the y-axis. Here, the bins are defined in 20,000-chip increments. For example, the second blip from the right indicates that there was one player with a stack in the "740,000" bin, which includes anything between 720,000 and 740,000.

The number of entrants is small, so the effect is kind of messy and hard to see. Still, I think you can convince yourself that the right-hand "tail" of the curve is more stretched out than that on the left.

Mathematically, the mean--what would be shown on the tournament information screen as "average chip stack"--was about 272,000 at this point, which I have marked above with a red line. The median, however, was 229,000, which I have marked with a black line. So if you had an "average" chip stack, you would actually be in 36th place out of 87, ahead of most of your competition.
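Finding where an "average" stack would place is just a matter of counting how many players have more chips than the mean. Here is a minimal sketch of that count; the function name and the chip counts are hypothetical, not the actual Event #2 standings.

```python
from statistics import mean

def rank_of_average(stacks):
    """Place (1 = chip leader) that a stack equal to the mean would hold."""
    avg = mean(stacks)
    bigger = sum(1 for s in stacks if s > avg)  # players ahead of the mean
    return bigger + 1

# Hypothetical chip counts, invented for illustration only.
stacks = [8_000, 12_000, 15_000, 18_000, 20_000,
          22_000, 25_000, 30_000, 45_000, 105_000]

place = rank_of_average(stacks)
print(f"An average stack would be in place {place} of {len(stacks)}")
```

In this made-up field of ten, only two stacks are bigger than the mean of 30,000, so an average stack sits in 3rd place, well ahead of the middle of the pack.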

The effect is a lot easier to see if we instead look at the big-field event, #34:



I defined bins in 1000-chip increments to get a decent spread. The mean (red) was 39,400, the median (black) 33,500. If you had an "average" chip stack, you would be in 95th place out of the 240 left.

Finally, here's Event #3:



As I predicted to myself, the effect is still there in a limit event, but somewhat compressed left-to-right, probably because it's significantly harder to accumulate monster stacks. The mean (20,900, red) and median (19,300, black) are relatively closer together. The average chip stack would be in 95th place out of 197.

I'm not prepared to expound fully on the practical implications of this observation, but the main one is this: If you have an "average" chip stack, you're ahead of most of the field.

I think it would be really interesting to have one of the online sites use their continuously acquired data in a large-field event to generate a movie of how the distribution of chip stacks changes over the course of the tournament. Obviously at the start it would be a single, very tall vertical line, with all equal stacks, and at the end it would be a single point, with one person owning all the chips. But in between, you'd see the kind of curve suggested above develop and morph, with the height of the peak of the curve decreasing as players were knocked out. I can sort of envision how it would have to go, but I'd love to actually see it play out dynamically.

1 comment:

THETA Poker said...

Very cool analysis! I actually went through the process mentally a few months ago when I improved the estimated user rank in THETA Poker (an iPhone/iPod touch game: see http://www.thetapoker.com). I came to the same basic conclusion that you did, but you used real data, so I really appreciate your work!

Graphing this over time would indeed be very interesting. Obviously, everyone starts out tied for first place, but after just one hand life gets very interesting. If you weren't in the blinds and folded your first hand, where would you rank? If it was a quick hand, probably still close to #1. But if it was a slow hand, it should be somewhere around the 90th percentile (i.e., only the 10% of the players who won the first hand are now ahead of you). I didn't pursue the idea much further, as my users may never even see this information because they have to click to view it.

The related data that I faked was how quickly players bust out. The charts would necessarily show this as well (I take M into account but not bubbles for the same reason as above).

Thanks for your excellent blog! I enjoy reading you and Cardgrrl (who pointed me your way) every day!