22.10.05

The Jeter Hypothesis

Derek Jeter, Yankee icon extraordinaire, once (I have not been able to find the original source) made a comment about how the wild card team is more likely to make it through the post-season since they're "hot" going into the playoffs. I think he made that comment after the Yankees were eliminated by some wild card team. Sour grapes or perceptive observation by an elite practitioner? Because I have nothing better to do with my time (okay, not really), let's put it to the test. To the statistical test.

A little background: since 1995, Major League Baseball's post-season has included not just the teams that amassed the best record in their divisions, but also a "wild card" team in each league, defined as the team with the best regular-season record that did not win its division. This means that the post-season includes eight teams: six division champions (three American League, three National League) and two wild cards. We're now in the eleventh post-season under these rules; since it's still ongoing I have not included it in the analysis that follows, but even so we have ten post-seasons of data to use in constructing a test of the Jeter Hypothesis.

To give away the punchline: there's no support for the notion that the wild card team is significantly more likely either to make it to the World Series (the end of the post-season) or to win it. Of course, this hasn't stopped popular commentators (including the folks at Fox Sports, purveyors of baseball misinformation since 1996, and Joe Morgan, Hall of Fame player and ESPN commentator who is overly enamored of "small ball" and "post-season experience" and "clutch players" and other such statistically unsound silliness) from deploying the claim, perhaps reading too much into the fact that the 2002, 2003, and 2004 World Series were won by wild-card teams. In fact, if my analysis is correct, they definitely are reading too much into this.

How did I arrive at this conclusion?

The basic idea was to look at the ten full post-seasons to see whether any systematic relationships emerged. Step one was to simply collect the numbers and make two tables: one showing how many wild card teams made it to the World Series, and another showing how many wild card teams won the World Series.

                       Wild Card   Division Champion
In World Series            6              14
Not in World Series       14              46


                          Wild Card   Division Champion
Won World Series              4               6
Didn't Win World Series      16              54


On their own, these tables don't tell us much. In order to see whether or not there's any significant relationship involved, I ran a chi-square test on the matrices. [You can actually run this test using a web-based interface like this one, but being something of a baseball stats geek I coded my own Excel spreadsheet to do it for me.] The chi-square test measures how far the observed values in a matrix differ from the values you'd expect if there were no systematic relationship between the categories; the expected values come from the share of the total that falls in each row and each column. So the fact that there have been 20 wild card teams, as compared to 60 division champion teams, in the post-season over the past ten years means that if there were no systematic relationship we'd expect wild card teams to account for about one-fourth of the World Series participants, and about one-fourth of the World Series winners. Wild card teams have actually done slightly better than this: 6 of the 20 participants and 4 of the 10 winners.
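To make the "what you'd expect" part concrete, here is a minimal Python sketch of how the expected values fall out of the row and column totals. This is not my spreadsheet, just an illustration; the function name `expected_counts` and the variable names are made up for the example.

```python
def expected_counts(table):
    """Expected cell counts if rows and columns were unrelated.

    Each expected value is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: in / not in the World Series. Columns: wild card / division champion.
in_world_series = [[6, 14], [14, 46]]
# Rows: won / didn't win the World Series. Columns: same as above.
won_world_series = [[4, 6], [16, 54]]

print(expected_counts(in_world_series))   # [[5.0, 15.0], [15.0, 45.0]]
print(expected_counts(won_world_series))  # [[2.5, 7.5], [17.5, 52.5]]
```

So with no relationship at all we'd expect 5 wild card teams in the World Series (we observed 6) and 2.5 wild card champions (we observed 4).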

But significantly more often? The chi-square value for the first matrix is 0.356; it would have to be at least 3.841 (the critical value at the 95% level for a two-by-two table, which has one degree of freedom) in order to be significant. The second matrix has a value of 1.371; again, it'd have to reach 3.841 for us to be 95% confident that the pattern isn't the result of sheer random chance.
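If you'd rather check these numbers than take my word for it, the following Python sketch computes the plain Pearson chi-square statistic (no continuity correction) for both tables. The names `chi_square` and `CRITICAL_VALUE` are my own for this example; treat it as a rough stand-in for the spreadsheet rather than a copy of it.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    Sums (observed - expected)^2 / expected over every cell, where
    expected = row total * column total / grand total.
    No continuity (Yates) correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value for one degree of freedom at the 0.05 significance level.
CRITICAL_VALUE = 3.841

# Columns: wild card / division champion.
in_world_series = [[6, 14], [14, 46]]    # rows: in / not in the World Series
won_world_series = [[4, 6], [16, 54]]    # rows: won / didn't win

for name, table in [("in World Series", in_world_series),
                    ("won World Series", won_world_series)]:
    value = chi_square(table)
    print(f"{name}: chi-square = {value:.3f}, "
          f"significant = {value >= CRITICAL_VALUE}")
```

Both values land well short of 3.841, which is the whole point.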

Just for the sake of completeness, I also ran a series of tests to see whether the wild card team was more likely to make it to the second round of the playoffs -- perhaps the "hotness" of the wild card team only lasts through the short first-round series. No dice there either: chi-square was 1.067.

What does this all mean? Ultimately, I think it means that there haven't been enough post-seasons under the contemporary arrangement for us to really know whether the wild card makes that much of a difference. At the moment, there doesn't appear to be any significant relationship, but if the Astros win the World Series this year (which they won't -- White Sox in seven, I say), the sample size is small enough that it might make a significant difference. In a week or so I'll update the numbers and we'll see then. Sorry, Derek.
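For the impatient, here is a rough Python sketch of how that update could be run as a what-if. It assumes the eleventh post-season adds two wild cards and six division champions to the "won the World Series" table, and that a wild card team takes the title; that outcome is purely hypothetical, and the names `chi_square` and `what_if` are made up for the example.

```python
def chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return sum(
        (observed - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )


# Critical value for one degree of freedom at the 0.05 significance level.
CRITICAL_VALUE = 3.841

# Ten completed post-seasons.
# Rows: won / didn't win. Columns: wild card / division champion.
won_world_series = [[4, 6], [16, 54]]

# HYPOTHETICAL eleventh season: one wild card wins it all; the other
# wild card and all six division champions do not.
what_if = [
    [won_world_series[0][0] + 1, won_world_series[0][1] + 0],
    [won_world_series[1][0] + 1, won_world_series[1][1] + 6],
]

value = chi_square(what_if)
print(f"what-if chi-square = {value:.3f}")
print("significant at 95%:", value >= CRITICAL_VALUE)
```

Swap the `+ 1` and `+ 0` in the first row (and adjust the second row to `+ 2` and `+ 5`) to model a division champion winning instead.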

More interesting, I think, is the fact that there's almost a significant relationship between having the best regular-season record and appearing in the World Series (chi-square: 3.2), even though the wild card system introduces more noise by making it possible for a team that finished well behind in its division to win the championship. Maybe the best will out after all…but I'm not quite prepared to make that claim without further analysis.

