4.6.05

(Baseball) Statistics Never Lie

[cross-posted at The Duck of Minerva]

One of the things that I find the most fascinating about the game of baseball is the fact that statistical data about the performance of players and teams is actually meaningful. This is so largely because the kinds of things being measured -- whether a team wins or loses, how many balls and strikes a pitcher throws, what percentage of the time a player gets on base as opposed to making an out -- involve a repetition of the same basic actions a sufficient number of times that random fluctuations cancel out. Players do the same basic things enough times that over the course of a season their ability to do those things (like get the bat on the ball, throw a strike, and so forth) will be reflected in their numbers.

For example, every "plate appearance" that a batter has over the course of a season is roughly similar to every other plate appearance in its basic contours, and over the course of a 162-game season the average player can expect to come to the plate about 500 times -- a sufficiently "large n" that saying that a batter has an on-base percentage of .446 and a slugging percentage of .536 is a meaningful statement.1 Contrast this with "football statistics," which are based on a regular season of only 16 games; players rarely get enough chances to do things like catch passes and rush for yardage to make meaningful quantitative comparisons possible. This doesn't stop people from making those comparisons, and playing "fantasy football" based on them, but I'll keep my statistics operating in realms where they make some sense, thank you very much.
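To put a rough number on the "large n" point, here is a minimal sketch that treats each plate appearance as an independent trial with a fixed chance of reaching base. That binomial model is a simplifying assumption rather than anything the numbers above depend on, the function name is made up for illustration, and the 50-trial case is an arbitrary small sample for contrast, not a real football statistic.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

# .446 is the on-base percentage quoted above; 500 is the rough number of
# plate appearances in a season. 50 is an arbitrary small sample (assumption).
for n in (50, 500):
    se = standard_error(0.446, n)
    print(f"n = {n:3d}: .446, give or take about {se:.3f}")
```

Under that assumption, 500 trials leave noise of roughly 22 points of on-base percentage, while 50 trials leave roughly 70 points, which is enough to blur the difference between a good hitter and an ordinary one.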

The fact that baseball statistics are meaningful allows observers of the game to conduct very precise analyses of how well their teams and players are doing. Just for kicks, this morning I plugged some numbers into a spreadsheet to make a rudimentary calculation not about which teams were doing the best in terms of wins and losses -- that information is readily available in any major newspaper, and all over the web (for instance, here) -- but about which teams were performing most efficiently. I took information about the 2005 payrolls of all 30 major league teams from this site, and had Excel calculate approximately how much money each team was paying for each of the wins it had thus far achieved this season.2


The results are interesting, although I won't bore you with all of the details. The important results are these:

  • the Yankees have the most expensive wins, at $2,571,689.10 per win; the Devil Rays have the cheapest, at $498,447.13. This is not a major surprise, since the Yankees' overall payroll is about seven times as large as the Devil Rays' payroll.
  • what is surprising is that the Yankees are paying almost twice as much for a win as the next team in the list, the Boston Red Sox. And the Red Sox are doing better than the Yankees in the overall win-loss standings.
  • of the teams whose winning percentage is .500 or greater, the three teams paying the least for their wins thus far this season are the Toronto Blue Jays ($535,243.19), the Washington Nationals ($568,748.94), and the Minnesota Twins ($574,432.48). The payrolls for all three teams are in the bottom third of all major league teams.

What this tells me is not just that the Yankees are playing crappy baseball this year -- that much I knew already -- but precisely how crappy their play has been. It also underscores just how impressively the Washington Nationals have been playing; they're making a little bit of salary go a very long way.

The analysis also demonstrates, pretty concretely, that simply spending money on a baseball team doesn't guarantee you success. You also have to use that salary efficiently, and get sufficient bang for your buck. The Yankees are spending about $208 million and thus far have a 27-27 win-loss record; the Nationals' total payroll is about $48.5 million, and their record is 29-26. Put another way, each win so far has cost the Nationals 1.171% of their total payroll, while each Yankees win has cost 1.235% of theirs; the difference may not look like much, but over a 162-game season, minor variations between teams and players translate into major differences.
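For anyone who wants to rerun the arithmetic without a spreadsheet, here is a rough sketch of the price-per-win calculation. It assumes payroll is prorated by games played out of 162 rather than by the flat one-third described in the footnote (the two are identical for the Yankees' 54 games); that proration rule and the function name are assumptions, not a record of the actual spreadsheet. The payrolls are the rounded figures from the paragraph above, so the dollar amounts come out slightly different from the ones in the list.

```python
GAMES_IN_SEASON = 162

def price_per_win(payroll: float, wins: int, games_played: int) -> float:
    """Payroll prorated to the games played so far, divided by wins so far."""
    if wins <= 0:
        raise ValueError("price per win is undefined for a team with no wins")
    prorated = payroll * games_played / GAMES_IN_SEASON  # assumed proration rule
    return prorated / wins

# Rounded payrolls and win-loss records quoted in the paragraph above.
teams = {
    "Yankees":   {"payroll": 208_000_000, "wins": 27, "losses": 27},
    "Nationals": {"payroll": 48_500_000,  "wins": 29, "losses": 26},
}

for name, t in teams.items():
    games = t["wins"] + t["losses"]
    cost = price_per_win(t["payroll"], t["wins"], games)
    share = 100 * cost / t["payroll"]  # percent of full-season payroll per win
    print(f"{name}: ${cost:,.0f} per win, {share:.3f}% of payroll")
```

With these inputs the percentages round to 1.235% for the Yankees and 1.171% for the Nationals. Note that the payroll cancels out of the percentage, which depends only on games played and wins.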

The fascinating thing is that in baseball we can determine precisely how much difference these things make. I'd be very resistant to running numbers like this in most other situations, but in baseball, bring on the quantitative analysis!

1 On-base percentage measures how often a batter reaches base safely, whether by getting a hit, drawing a walk, or being hit by a pitch; slugging percentage measures the total bases a player earns on his hits, divided by his at-bats; details on basic baseball stats can be found here. The specific numbers that I used for this example are Nick Johnson's stats for the present season. The figure of about 500 plate appearances is roughly the minimum required to qualify for a batting title under the present 162-game regular-season schedule (3.1 per scheduled game, or 502).

2 We're approximately 1/3 of the way through the season at the moment, so if we divide each team's payroll by 3 and then divide that number by the number of wins, we get the effective "price per win" that each team is paying. Note that this does not include the salaries for managers, coaches, etc., but is limited to the combined salaries of all the players on the team's payroll.