While I was out playing poker this evening, B.J. Nemeth mentioned on Twitter that only 13 out of 242 women cashed in this year's Main Event--fewer than would be predicted based on their numbers. That represented only 5.4% of the female entrants, whereas 10.1% (693/6865) of the whole field and 10.3% (680/6623) of the men made the money.
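For readers who want to check the arithmetic, the men's figures follow from subtracting the women from the whole field. Here is a minimal Python sketch. The variable names are my own and are not from any published dataset.

```python
# Figures quoted above: women and the whole field. The men's figures are derived.
women_cashed, women_entered = 13, 242
field_cashed, field_entered = 693, 6865
men_cashed = field_cashed - women_cashed        # 680
men_entered = field_entered - women_entered     # 6623

for label, cashed, entered in [
    ("women", women_cashed, women_entered),
    ("field", field_cashed, field_entered),
    ("men", men_cashed, men_entered),
]:
    # .1% formats a fraction as a percentage with one decimal place
    print(f"{label}: {cashed}/{entered} = {cashed / entered:.1%}")
```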

His characterization of women "underperforming" in terms of results (he was not, he clarified, trying to imply anything about skill) touched off some discussion about whether the numbers were statistically significant, given the small sample size.

It wasn't something I wanted to try testing while in the midst of shaking down tourists at Bally's. But now that I'm home, I can give it a shot.

Using the simplest statistical test available for a dichotomous variable--the binomial probability--we can easily determine whether a given rate of cashing is within the expected range of random statistical variance, if we take the entire field's rate of cashing as the expected baseline. Put another way, our null hypothesis is that the women's lower rate of cashing compared to the men's is due only to randomness. Does the math cause us to accept or reject that hypothesis?

Reject.

The expected number of women cashing would be about 24. The probability of getting a result at least as skewed as was actually seen this year (i.e., 13 or fewer women cashing out of 242 who entered) is only 0.0063. In other words, suppose we knew that the women played exactly as well as the men, so that any difference in the two groups' rates of cashing was due to chance alone. If we played this same tournament a thousand times, we would expect to see a result deviating this far from the prediction only about 6 times. The other 994 times we would expect to see more women than that make the money. That is a pretty striking result.

If you want to see the numbers calculated for yourself, follow the link above to the probability calculator and enter n=242, k=13, p=0.101.*
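If you would rather not depend on an online calculator, the same one-sided binomial tail can be computed directly. This is a minimal Python sketch. The helper name `binom_cdf` is my own. The calculation assumes, as the post does, that each woman independently cashes with the field-wide probability 0.101.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p).

    Terms are computed in log space with math.lgamma so that large n
    (e.g. thousands of entrants) cannot overflow a float.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


n, k, p = 242, 13, 0.101   # women entered, women cashed, field-wide cash rate
expected = n * p           # mean of the binomial, about 24.4
tail = binom_cdf(k, n, p)  # chance of 13 or fewer cashes if women cash at rate p

print(f"expected cashes: {expected:.1f}")
print(f"P(X <= {k}) = {tail:.4f}")
```

This is a one-sided test. It asks only how often women would cash this rarely or more rarely by chance, which is the question being asked here.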

The conclusion is that it is overwhelmingly likely that there is some force at work other than randomness to explain women's lower rate of cashing. The most obvious explanation sure seems to me to also be the most likely: On average, the women played less well than the men, to a measurable degree, if we take survival to the money as a surrogate measure of quality of play.

Of course one can play perfectly and still get knocked out early, or play terribly and win the whole thing. But we lack any ability to measure poker skill or quality of play directly, and we're stuck measuring what can be measured. We have to hope and assume that there is sufficient correlation between quality of play and survival success that the latter can shed at least some meaningful light on the former.

Please note that saying that "the women played less well than the men" is not the same thing as saying that "women are worse poker players than men." It is an observation limited to this particular group of players in this tournament. Still, I find it difficult to conjure up a plausible explanation for the large difference in cash rates other than a significant difference between the two sexes in average quality of play.

B.J. did not claim to know the outcome of any statistical test, and even seemed to deny caring much what the statistics might say. His position was that the difference in cashing rates was sufficiently large to be interesting and noteworthy regardless of how a statistician would analyze it. I'm adding to that subjective opinion the objective conclusion that the difference in cashing rates is large enough to be highly unlikely to be the result of random variance alone. Make of that what you will, but I'm forced to agree with B.J. that it is, at the very least, interesting and noteworthy.

*I suppose it would be technically better to use the men's rate of cashing as the baseline, rather than the entire field, but I was trying to be generous. If we use the men's rate instead, we enter p=0.103 instead of 0.101. That yields a slightly lower probability that the difference is due to chance alone: about 5/1000 instead of 6/1000.
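The two baselines can be compared side by side. This is a minimal Python sketch that uses my own log-space helper and assumes independent cashing at the stated rate.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    return sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


n, k = 242, 13
baselines = {"whole field": 0.101, "men only": 0.103}
tails = {label: binom_cdf(k, n, p) for label, p in baselines.items()}

for label, tail in tails.items():
    print(f"{label:>11} (p={baselines[label]}): P(X <= {k}) = {tail:.4f}")
```

The men-only baseline gives the smaller tail probability because it raises the expected number of female cashes, which pushes 13 further into the tail.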

## 15 comments:

Thanks for an impressive first start at evaluating the statistical significance of whether underperformance is occurring. The next step is for someone with far greater statistical analysis ability than me to determine whether the relatively small sample size of women in this field skews the results. I think something like 2-3% of the field were women. At that small level, variance can have a magnified effect that is reduced or eliminated in larger relative samples (say 20-30%). As an example, suppose a herd of 1,000 cattle has 20 red cows and the rest black. All 20 red cows dying off by chance alone is more probable than it would be if the herd had 200 red cows and the rest black. The larger the sample size for the group, the more confident we can be that variance is not the factor at play in the observed results. This is why drugs that look promising in small-scale trials may disappoint in large-scale trials.

(Actuaries and economists could do this analysis in their sleep with one arm tied behind their back, but my stats analysis skills are beyond rusty. Use it or lose it!)

The binomial calculation is already answering that question, pretty resoundingly, and the answer is that the women's field is plenty large enough to have confidence that the difference in rate of cashing is not due to chance.

It is certainly true that the smaller the sample the more likely it is that a difference can be due to chance. But the field of women entrants would have to get down to about 100 before the probability of such a relatively low cashing rate gets into the zone in which one traditionally concludes that we can't tell if the results are due to chance or some other factor (5% or above).

For example, suppose women had cashed at the same rate but there had been 10 times as many of them playing (2420, with 130 cashes among them). In that case we'd calculate the probability of such a rate occurring by chance alone as less than 0.000001, and the results would be basically unarguable.

Conversely, if it were 1/10 as many (24, with 1 cashing), the probability of that happening by chance alone would be around 29%, and we'd have to conclude that the sample size is too small to reject the null hypothesis (i.e., such a difference could easily be just a fluke).
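The effect of sample size described in the last few paragraphs (same cash rate, different numbers of women) can be reproduced with the same binomial tail. This is a minimal Python sketch. It assumes the field-wide rate of 0.101 as the baseline throughout. The n=100 row is added here only to illustrate the "about 100" threshold and is not a figure from the post.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    return sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


scenarios = [
    (24, 1),      # one-tenth as many women, 1 cashing
    (100, 5),     # about 100 women, 5 cashing (nearest whole number to 5.4%)
    (242, 13),    # this year's actual figures
    (2420, 130),  # ten times as many women, same cash rate
]
p = 0.101
tails = {(n, k): binom_cdf(k, n, p) for n, k in scenarios}

for (n, k), tail in tails.items():
    print(f"n={n:>5}, cashed={k:>3}: P(X <= k) = {tail:.2g}")
```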

When the probability works out to 0.6%, it's not IMPOSSIBLE that the degree of difference in cashing rates observed was due to chance; it's just very, very unlikely. The sample size is entirely adequate for safely concluding that there is something far more than variance at work here.

The 6 in 1,000 is not, as you say, impossible to attribute to random chance; it could occur. I'd like to see some stats from previous years analyzed this way.

I think your hypothesis that the only other explanation is worse play is flawed. There could be numerous other reasons for the lack of cashes. The best one I can think of is the way the men in the field view the women players. They may call down light more often against women than against other men because they feel that the women are easier pickings. This could help the women or hurt them, but it would undoubtedly introduce more variance.

Wow, lots of big words that make my head hurt.

Are the resources (and your time) available to gather the same information from the last couple main events to see if this is unique to this year only?

What if 5,000 players from the U.S. entered? It would be fair to guess that there is a whole range of (good, medium, and bad) players in that group. Now what if Outer Tazmania sent five players? Wouldn't it be fair to guess that only their better players would bother to come play?

I think it's the same with the women entrants. There are plenty of bad men players who will pay the $10,000 entry fee, but fewer women (proportionally) who would do so. In other words, (I'm hypothesizing) the women tend to send their better players.

What does this mean? I'm wondering if the state of women's poker might be more grim than what your data suggests.

Your thoughts?

In junior motor racing it is very difficult for the minority female drivers, because the boys will often crash into them instead of letting a girl beat them. Consequently the girls underperform whilst the lads take turns ruining their own races. Women in a poker tournament may well find that men play them differently. Isn't it the case at a poker tournament that if a player is perceived as weak (even for the wrong reasons), other players start picking on her? Even non-sexist players will see an advantage in picking on the player being pushed around by the table. In a shorter tournament, it might be easier for a woman to take advantage of over-aggressive play against her with a selective tight-aggressive strategy. In a very long tournament, however, she will either have to back down too often and bleed chips, or make do-or-die decisions too often to keep getting away with them in the long run.

Given: I'm a girl; math is hard. So help me to understand this, please.

Let's say that there were 240 entrants whose last names began with "V." Only a dozen of them make the money.

Or 230 entrants from Portugal, of whom eleven survive the bubble.

Or 250 redheads start the Main Event and only thirteen cash.

Am I right that each of these results is approximately as statistically disproportionate as the one pertaining to women?
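NT's question can be checked with the same one-sided binomial tail. This Python sketch treats the three subgroups as though they were real, although they are hypothetical examples. It uses the field's 0.101 cash rate as the baseline and a log-space helper of my own.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    return sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


hypotheticals = {
    'last names starting with "V"': (240, 12),
    "entrants from Portugal": (230, 11),
    "redheads": (250, 13),
    "women (actual)": (242, 13),
}
p = 0.101
tails = {label: binom_cdf(k, n, p) for label, (n, k) in hypotheticals.items()}

for label, tail in tails.items():
    n, k = hypotheticals[label]
    print(f"{label}: {k}/{n} = {k / n:.1%}, P(X <= {k}) = {tail:.4f}")
```

Each hypothetical should come out in roughly the same range as the women's figure, which is what the question is getting at.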

My point is that there are thousands of ways to parse the Main Event field into minorities. Of those thousands of ways of doing it, some are going to be statistically anomalous based on chance alone.

We see the world through a lens of male & female for no doubt overdetermined evolutionary and cultural reasons. Whether it has anything to do with this specific statistical result is another question.

I don't suppose anyone cares to determine whether any particular ethnic or racial group "underperformed?"

*crickets*

Yeah. I thought so.

[By the way, it wouldn't surprise me if it were actually the case currently that, on average, women poker players aren't as good as the men. Our society does not value or reward in women the qualities that make for a good poker player, as it does in men. Poker is often still a hostile work environment for women who are learning to play. There are still plenty of barriers that keep women from achieving their full measure of potential in the game. I would be surprised if that didn't change in the long run. But we all know that the long run can be very long indeed.]

I am extremely pleased to see this article. I agree with the previous comment that I'd love to see this done for previous years as well. Still, I appreciate that math doesn't lie and that, in this tournament in particular, the women played less well statistically. A result this far from the norm can still occur about 6 times out of 1,000, so I'd love to see a follow-up article that takes a sample of 10 or so previous years and analyzes their results. Such results could very well be further proof that this is more than a game of chance.

According to Nolan Dalla, this is the first year that the WSOP compiled data regarding the number of women entrants.

Based on anecdotal accounts (relying in large part on BJ Nemeth's and Nolan Dalla's observations), women generally represent about 3% to 3.5% of the field in the large buy-in NL events (WPTs, WSOP Main Event, etc.)

The only possible flaw in the analysis is that a binomial distribution requires "trials" in which the (long-term) probability of success is the same for each. Clearly that is not the case here, and I would think those probabilities could very easily vary by orders of magnitude.
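A binomial model gives every woman the same cash probability. One way to explore how much that assumption matters is to give each player her own probability and compute the exact distribution of the total number of cashes, which is known as a Poisson binomial distribution. This is a minimal Python sketch. The two-tier split of probabilities is invented purely for illustration, and the sketch still assumes that players cash independently of one another.

```python
def poisson_binomial_pmf(probs: list[float]) -> list[float]:
    """Exact distribution of the number of cashes among independent players
    whose individual cash probabilities are given in `probs`."""
    pmf = [1.0]  # before any players are counted: zero cashes with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1 - p)  # this player misses the money
            nxt[j + 1] += mass * p    # this player cashes
        pmf = nxt
    return pmf


# Invented per-player probabilities (placeholders, not data): half the
# women cash 5% of the time and half 15.2%, so the average is still 10.1%.
probs = [0.05] * 121 + [0.152] * 121

pmf = poisson_binomial_pmf(probs)
tail = sum(pmf[:14])  # P(13 or fewer cashes)
p_bar = sum(probs) / len(probs)

# For independent players, spreading the p_i around a fixed average never
# increases the variance: sum p_i(1 - p_i) <= n * p_bar * (1 - p_bar).
var_mixed = sum(p * (1 - p) for p in probs)
var_binomial = len(probs) * p_bar * (1 - p_bar)

print(f"average p = {p_bar:.3f}, P(X <= 13) = {tail:.4f}")
print(f"variance: mixed {var_mixed:.2f} vs. binomial {var_binomial:.2f}")
```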

That being said, my experience tells me that n=242 is more than sufficiently large to allow rejection of the null hypothesis where p-hat (0.054) is only slightly larger than half of p (0.101).
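This rule of thumb can be made concrete with the usual normal-approximation z-test for a single proportion. The following Python sketch is a rough cross-check only. Because it approximates the binomial with a normal curve, its p-value will not exactly match the exact binomial tail quoted in the post.

```python
import math

n, cashed, p0 = 242, 13, 0.101     # women entered, women cashed, baseline rate
p_hat = cashed / n                 # observed cash rate, about 0.054
se = math.sqrt(p0 * (1 - p0) / n)  # standard error of p_hat under the null
z = (p_hat - p0) / se
p_one_sided = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, one-sided p = {p_one_sided:.4f}")
```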

NT:

1. I think you know me well enough to believe me when I say that I would be every bit as willing to run the numbers on a racial minority as on women, if they were available. Furthermore, if the analysis fell out the same way, I'd draw the same conclusion. I've never been too afraid of being politically incorrect, if that's where the facts lead.

2. Multiple analyses can be a real problem in statistical analysis. For example, many years ago I saw a study about the effects of religious groups praying for ICU patients. The problem was that the researchers used a whole raft of outcome measurements (death, days in ICU, number of infections, amount of pain medication used, need for second operation, etc.; I'm just making up these examples, because I don't remember the details, but they're illustrative)--20 or more, as I recall. The more outcome variables you look at after a single intervention, the more likely it is that at least one of them will look significant by chance alone. There are mathematical adjustments one is supposed to make to compensate for this effect. This study didn't make them, and thus inappropriately concluded that prayer had some beneficial effect.

I don't think it's a problem here, for a few reasons. First, I'm only doing one analysis. The fact that others could theoretically be done doesn't change that. Second, the result is so far from being just borderline statistically significant that it gives us less reason to worry that it's anomalous. Third, there is, as you point out, a plausible explanation that fits the observed result. If we chopped the data a thousand different ways and found that those with last names starting with V were underrepresented in cashes to the degree that we see here for women, it would be much harder to concoct a plausible explanatory framework.
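The multiple-outcomes problem in the ICU prayer example can be put in numbers. This Python sketch assumes 20 independent outcome measures, each tested at the conventional 0.05 level. Both the count and the independence assumption are illustrative. Bonferroni is shown only because it is one common adjustment, not because it is necessarily the one that study should have used.

```python
alpha = 0.05  # conventional per-test significance threshold
m = 20        # number of outcome measures (illustrative)

# Chance that at least one of m independent null results looks "significant".
family_wise = 1 - (1 - alpha) ** m

# Bonferroni adjustment: require each individual test to clear alpha / m.
bonferroni_threshold = alpha / m

print(f"P(at least one false positive among {m}) = {family_wise:.2f}")
print(f"Bonferroni per-test threshold = {bonferroni_threshold:.4f}")
```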

I in no way intended to point the finger at you with the racial/ethnic *crickets* remark. I do indeed know you well enough to know that you would not hesitate to do such an analysis.

It was just pointing out that you hear widespread discussion of women's poker ability in generic terms in ways you NEVER would hear people talk about any other kind of sub-group, and I think that discrepancy is sociologically telling (and in itself is part of the problem that women poker-players face).

NT is right - everyone who wasn't a male underperformed at the WSOP.

If you're seen playing poker at 3 am, nobody raises an eyebrow. If a good-looking woman does the same, she'll be hit on, even if she's not so good-looking.

I once sat down at Binion's and won a huge pot on my second hand. The nine men - mostly regulars I assume - started in with how the woman had gotten lucky and had to be taught a lesson. I lasted one more orbit - very unpleasant vibes - and decided the best thing would be to leave with their money. Doesn't happen a lot, but it happens.

Online I played as a man - easier, less ganging up, fewer misogynistic sneers.

Now if you got a good drag queen to do a convincing makeup job and played poker as a woman for one day...

Although sometimes it can be an advantage, especially against men who think women are stupid.

I wonder how much of this is related to players buying seats for their wives or girlfriends. I note that 2 of the final 3 women standing are girlfriends of pros (one of them is a pro herself; the other is not). Without running the numbers, I wouldn't be surprised if a couple dozen "dead money" entries were enough to skew the women's cash percentage down, given the small percentage of women in the WSOP.
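Here is one way to actually run those numbers. This is a rough Python sketch under a deliberately simple assumption: each hypothetical "dead money" entrant has zero chance to cash, and every other woman cashes independently at the field rate of 0.101. The dead-money counts, including 24 for "a couple dozen", are illustrative guesses, not data.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    return sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


def tail_with_dead_money(dead: int, entered: int = 242, cashed: int = 13,
                         p: float = 0.101) -> tuple[float, float]:
    """Hypothetical model: `dead` entrants never cash; the rest cash at p.

    Returns (expected cashes, P(cashed or fewer cashes)) for the live entrants.
    """
    live = entered - dead
    return live * p, binom_cdf(cashed, live, p)


for dead in (0, 12, 24, 48):
    expected, tail = tail_with_dead_money(dead)
    print(f"dead money={dead:>2}: expected cashes {expected:.1f}, "
          f"P(X <= 13) = {tail:.4f}")
```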

I think one possible contributing factor is that, for a significant period of the tournament, the odds are better than 20:1 that the chips from a bustout go to a male (based strictly on the number of entrants, not skill). That seems to me like it would have a statistically significant influence on which gender has more cashes.

-John B
