the puckish prognosticator of punxsutawney

Groundhog Day was not long ago, and although I wasn’t paying attention, a friend on Amherst’s online community was; and she leveled the quite reasonable criticism, “You know his track record is 34%? That’s LESS THAN RANDOM, people.” And the only way a blog post gets written about this is to ask the question, “But is it?”

Of course, a fair coin will give you 50% on any binary decision (here, “early spring” versus “long winter,” or ES vs. LW), so in a sense this is true. But, of course, to the extent that it is true, you’d be well served to guess the opposite of what Phil guesses: if he gets it right only 34% of the time, then you’ll get it right 66% of the time. And, indeed, Phil’s record of 39% accuracy (not 34%) over 115 years is nonrandom by binomial test (p=0.025).
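For anyone who wants to check the binomial claim, here is a minimal sketch using only the Python standard library. It assumes that "39% over 115 years" means 45 correct calls, which is my rounding and not a figure from the record itself; the function name is likewise just an illustration.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: add up the probability of every
    outcome that is no more likely than the one observed."""
    def pmf(k: int) -> float:
        return comb(trials, k) * p**k * (1 - p) ** (trials - k)

    observed = pmf(successes)
    # Small tolerance so ties with the observed probability are counted.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))

# Assumption: 39% of 115 predictions is taken to be 45 correct calls.
correct, years = 45, 115
p_value = binomial_two_sided_p(correct, years)
print(f"accuracy = {correct / years:.3f}, two-sided p = {p_value:.3f}")

# Betting against a forecaster who is right 34% of the time.
contrarian = 1 - 0.34
print(f"contrarian accuracy = {contrarian:.2f}")
```

A result below 0.05 is the conventional threshold for calling the record "nonrandom," which is all the test says; it does not say Phil knows anything.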

Does that mean there’s something about Phil that somehow knows how the weather will be, even though he guesses wrong? (To be fair, we may be the ones guessing wrong; we may just be mistaken about how Phil’s attitude toward his shadow relates to the coming weather. It’s not like he says “Spring’s a-comin’!” in his squeaky little groundhoggy voice.) Not necessarily. Phil might be more like a weighted die than a fair coin; he might have a bias to guess one outcome or another. And in fact, he does, an enormous one: Out of 115 predictions, he’s only guessed an early spring 15 times, a mite’s hair over 13%. Of course, if early spring and long winter are equally probable and Phil’s guesses carry no information, biased Phil should still have a 50% record; of the 50% early springs, he’ll get only 13% right, or 6.5% of the total, but of the 50% long winters, he’ll get 87% right, or 43.5% of the total, so his right answers add up to 50%. Only if nature has a bias opposite Phil’s (that is, a bias to actually emit an early spring) could a bias on Phil’s part cause him to do worse than 50% “by chance.”
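The arithmetic in that paragraph generalizes to any pair of biases, as long as the guesser's calls are independent of what actually happens. A small sketch, where the function name is mine and the 0.73 early-spring rate is purely a hypothetical value chosen for illustration:

```python
def expected_accuracy(p_nature_es: float, p_guess_es: float) -> float:
    """Accuracy of a guesser whose calls are independent of the outcome.

    p_nature_es: probability that nature delivers an early spring.
    p_guess_es:  probability that the guesser predicts an early spring.
    """
    hit_es = p_nature_es * p_guess_es              # early spring, guessed early spring
    hit_lw = (1 - p_nature_es) * (1 - p_guess_es)  # long winter, guessed long winter
    return hit_es + hit_lw

phil_es_rate = 15 / 115  # Phil's share of early-spring predictions

# Fair nature: the guesser's bias washes out and accuracy is 50%.
fair = expected_accuracy(0.5, phil_es_rate)

# Hypothetical nature that favors early spring (0.73 is an assumed value):
# now a long-winter bias drags accuracy below 50% with no insight involved.
skewed = expected_accuracy(0.73, phil_es_rate)

print(f"fair nature:   {fair:.3f}")
print(f"skewed nature: {skewed:.3f}")
```

So a sub-50% record needs two opposing biases, one in the guesser and one in the weather.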

So is that it? Is Phil just a biased coin? Well, consider this: If Phil is biased toward predicting long winters, but every time he predicts an early spring he gets it right, wouldn’t you view that as evidence of some kind of information in his signal? To concretize it, take a 100-year hypothetical —

LW + PHIL LW: 27
LW + PHIL ES: 0
ES + PHIL LW: 60
ES + PHIL ES: 13

So nature is biased toward early spring (73%), and Phil is biased toward long winter (87%), and his total success rate is a sad 40%, but when he does predict early spring, he’s uncannily accurate. Remember, when you get Phil’s guess, you don’t know what’s going to happen. So if he guesses long winter, you’re probably better off doing the opposite, since he’s only right about that 31% of the time (27/(27+60)). But if he guesses early spring, he’s always right, so you’d better guess what he guesses. This more detailed look at the “data” affords a better picture of what’s going on; there are cases in which Phil is very accurate and cases in which he’s worse than chance. (And, in case you care, a Fisher’s exact test confirms that he’s distributing his guesses differently when it’s winter vs. when it’s spring, p=0.018.)
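Here is a rough sketch of those numbers in standard-library Python. It assumes the one cell the table leaves out (a long winter in which Phil predicted early spring) is 0, which is what the 100-year total implies. The Fisher routine is a hand-rolled two-sided version written for this post, not a call into any statistics package.

```python
from math import comb

# Counts from the 100-year hypothetical. Assumption: the unlisted cell
# (long winter, Phil predicts early spring) is 0.
lw_phil_lw, lw_phil_es = 27, 0
es_phil_lw, es_phil_es = 60, 13

total = lw_phil_lw + lw_phil_es + es_phil_lw + es_phil_es
overall = (lw_phil_lw + es_phil_es) / total
acc_when_lw = lw_phil_lw / (lw_phil_lw + es_phil_lw)  # right when he says long winter
acc_when_es = es_phil_es / (lw_phil_es + es_phil_es)  # right when he says early spring

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probability of every table with the same
    margins that is no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))

p_value = fisher_exact_two_sided(lw_phil_lw, lw_phil_es, es_phil_lw, es_phil_es)

print(f"overall accuracy:           {overall:.2f}")
print(f"accuracy when he says LW:   {acc_when_lw:.2f}")
print(f"accuracy when he says ES:   {acc_when_es:.2f}")
print(f"Fisher's exact (two-sided): {p_value:.3f}")
```

Splitting accuracy by prediction is the whole point: the single 40% figure hides one case where Phil is perfect and one where he is worse than a coin.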

However, this analysis doesn’t really save Phil as a prognosticator, only as a subject of some meteorological or perhaps neuroscientific interest — because, according to the analysis, you should guess early spring regardless of what Phil guesses. The bias trumps all.

Just as an appendix, here’s another fake dataset showing a pattern of guesses that has similar characteristics to the one above, but gives clear evidence that he’s just guessing at random:

LW + PHIL LW: 24
LW + PHIL ES: 3
ES + PHIL LW: 63
ES + PHIL ES: 10

Phil is still biased toward guessing long winter (87%), and nature is still biased toward early spring (73%), and Phil’s success rate is now 34%, but you can see that he distributes his guesses the same way regardless of the actual outcome, as quantified by a nonsignificant Fisher’s test (p=1). In this case, again, always guessing early spring earns you 73% successes, and betting against Phil only earns you 66%, so again the bias trumps all. (However, it wouldn’t take much tweaking to generate a dataset that was qualitatively similar, but in which betting against Phil was a better policy than betting on early spring.)
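To put the three betting policies side by side for both fake datasets, here is a small sketch. It assumes the cell for "long winter, Phil predicts early spring" is 0 in the first table and 3 in the second, which is what the stated 73% and 87% margins imply over 100 years. The function and policy names are mine.

```python
def policy_accuracies(lw_phil_lw: int, lw_phil_es: int,
                      es_phil_lw: int, es_phil_es: int) -> dict[str, float]:
    """Success rate of three simple policies on one 2x2 table of counts."""
    total = lw_phil_lw + lw_phil_es + es_phil_lw + es_phil_es
    return {
        "follow Phil": (lw_phil_lw + es_phil_es) / total,
        "bet against Phil": (lw_phil_es + es_phil_lw) / total,
        "always early spring": (es_phil_lw + es_phil_es) / total,
    }

# Assumed cell order: (LW, Phil LW), (LW, Phil ES), (ES, Phil LW), (ES, Phil ES).
informative = policy_accuracies(27, 0, 60, 13)  # first hypothetical
random_like = policy_accuracies(24, 3, 63, 10)  # second hypothetical

for name, table in (("first dataset", informative), ("second dataset", random_like)):
    print(name)
    for policy, rate in table.items():
        print(f"  {policy:<20} {rate:.2f}")
```

In both tables "always early spring" comes out on top, which is the sense in which the bias trumps all.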

