Archive for the ‘Statistics’ category

The Drop Off to Second Best


This morning Buster Olney of ESPN tweeted, “Because of the difference between Rivera and others at his position, for me, he should be part of NYY’ Rushmore, with Ruth, Gehrig, DiMaggio.” I find this an interesting claim in a lot of ways. First, note the four players: Ruth, Gehrig, DiMaggio, Rivera. I would without a second thought shove Mantle ahead of both DiMaggio and Rivera. Second, I take his claim to be that the dropoff from Rivera to others refers to all other relievers, not just all other Yankee relievers. That follows an earlier tweet which said, “The difference between Rivera and any other player at his position in history is the greatest of any position.” That is a more interesting question. To get a quick and dirty look at the drop from the best to the second-best at various positions, I’m going to do a bit of fiddling with WAR, as measured on Baseball Reference. I will also summarily exclude 19th-century players. (This means I excluded both Cy Young and George Davis.) The method is simple: Take the WAR of player 2, divide it by the WAR of player 1, and multiply by 100. This gives the second player’s production as a percentage of the first player’s. So, is the dropoff from Rivera the biggest? Let’s turn to the stats.


Player 1      WAR 1    Player 2     WAR 2    Percentage    Total Drop
Gehrig        118.4    Foxx          95.2         80.41          23.2
Hornsby       127.8    Collins      126.7         99.14           1.1
Wagner        134.5    Ripken        89.9         66.84          44.6
Schmidt       108.3    Rodriguez    105.0         96.95           3.3
Ruth          190.0    Aaron        141.6         74.53          48.4
Cobb          159.5    Mays         154.7         96.99           4.8
Bonds         171.8    Musial       127.8         74.39          44.0
Bench          71.3    Fisk          67.3         94.39           4.0
W. Johnson    139.8    Clemens      128.8         92.13          11.0
Rivera         55.8    Gossage       39.5         70.79          16.3
Eckersley      58.3    Rivera        55.8         95.71           2.5
Rivera         55.8    Hoffman       30.4         54.48          25.4
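
For anyone who wants to check the arithmetic, here is a minimal sketch of the method in Python. The function name drop_off is just a label chosen for this example, and the WAR figures are the ones listed in the table.

```python
def drop_off(war_best, war_second):
    """Return (second-best WAR as a percentage of the best, raw WAR drop)."""
    percentage = round(war_second / war_best * 100, 2)
    total_drop = round(war_best - war_second, 1)
    return percentage, total_drop


# WAR values copied from the table; only three rows are shown here.
pairs = {
    "Wagner vs. Ripken": (134.5, 89.9),
    "Ruth vs. Aaron": (190.0, 141.6),
    "Rivera vs. Hoffman": (55.8, 30.4),
}

for label, (best, second) in pairs.items():
    pct, drop = drop_off(best, second)
    print(f"{label}: {pct}% of the leader, a drop of {drop} WAR")
```

A lower percentage means a bigger gap between the best and the second best, which is why the Hoffman row stands out.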

First, these are full career WAR stats, so Ruth has a serious bump from being a pitcher, and Walter Johnson gets a nice little bump from his hitting. Second, I calculated relievers three different ways. First, I ran Rivera against Gossage, the two highest-WAR pitchers who accumulated almost all their WAR in relief. Next I did Rivera against Eckersley, because Eck had the highest WAR of any pitcher who is in the Hall of Fame as a reliever. However, his WAR is so high because he gets a giant boost from all of his years as a starter. Finally I compared Rivera to the next highest modern closer, that is, the highest WAR from a reliever since the advent of the modern closer circa 1980. That would be Trevor Hoffman. So where does this get us?

First, the drop at shortstop is gigantic. Even adding George Davis back in doesn’t help much. That is the lowest percentage among position players. Next, the drop from Ruth to Aaron is impressive. It is the largest raw WAR drop, and the third lowest percentage. Quite a drop considering this is Hank Aaron we are talking about. Finally, relievers are tricky. First, if Eckersley is included, Rivera isn’t the best ever. Next, if you include higher-inning relievers from the 1970s, the percentage is not the lowest, but it is the second lowest. Finally, if you limit Rivera to his most comparable group, other closers, you see Buster Olney’s point in big numbers. Rivera is nearly twice the pitcher of any other closer, when measured by WAR. I find that fact astonishing.


Book of the Month: Beyond Batting Average


For the first time, I’d like to talk about a book I know solely because I follow its author online. Lee Panas blogs at Tiger Tales, a blog devoted, as the title implies, to the happenings of the Detroit Tigers. It is definitely one of the best single-team blogs around, as it includes good discussion of the current team with interesting work on the team historically. For our purposes, though, Panas has interests beyond the narrow confines of the Detroit Tigers. He has recently published a sabermetric primer called Beyond Batting Average: Statistics for the 21st Century. I picked up the electronic version recently to review here, so let me give the book its due.

The book does three things: It gives a history of the development of baseball statistics, lays out newer statistics in detail, and gives recommendations for the best statistics to use in player evaluation. Let us consider each in turn. The history of statistics that Panas gives is necessarily brief. It is not a primary focus of the book, and for that reason it is mostly confined to Chapter 1. However, each ensuing chapter gives some information on the history of stats dealing with the chapter’s topic. Given my own historical bent, I would have liked to see more of this. Given the book’s primary focus as a primer on newer statistics, it is unsurprising this area was not explored in more depth.

Next, Panas lays out newer statistics in detail. It is at this that the book truly excels. Want to know what OPS, WAR, UZR, qERA, FIP, or other statistics are? Panas gives some of the most succinct and clear explanations that I have seen. He does not get bogged down in the exact mathematical derivations of each statistic, and instead he focuses on the originator, formula, and clarifying examples. Each example does a good job of illustrating the stat’s strengths and weaknesses. As someone already interested in statistics and particularly in using statistics to compare players, I found this portion of the book invaluable.

Finally, Panas gives recommendations for the best statistics to evaluate players. He emphasizes several things in these stats: repeatability, comprehensiveness, and sources. Repeatability refers to statistics that correlate highly from year to year. Why is on-base average better than batting average? For one reason, it is more predictive of a player’s performance next season. Comprehensiveness describes statistics that evaluate each area of a player’s full performance. Stats like WAR focus on players’ hitting and fielding contributions, instead of focusing narrowly on just one element of play. Sources refers to where statistics derive their data. Panas considers fielding stats based on play-by-play data better than rivals with less specific sources. He gives more weight to stats that adjust for park, era, etc. than stats that don’t. I think Panas’ three grounds for recommendations make a great deal of sense. They have the advantage of forcing the analyst to look at what biases are built into each statistic and therefore add some humility to our evaluations.

Overall, Panas wrote a short and accessible introduction to sabermetric statistics. If you want to learn more about the most advanced statistics on the market right now, it is tough to find a better source.

Introducing Statistics: Passer Rating


Passer rating is probably the most common stat to toss around when evaluating quarterbacks. When announcers are covering an NFL game, they regularly update viewers on a QB’s in-game rating. But the number itself has no intrinsic meaning. Touchdowns tell you how many of a quarterback’s passes have resulted in a touchdown. Interceptions tell you how many of a quarterback’s passes have been caught by the opposition. But what about passer rating? What does it measure? First, we need some background to explain why anyone should care. Next, let’s focus on the equation itself. Finally, we will wrap up by highlighting passer rating’s potential failures.

Quarterback touchdowns are a little like RBIs in baseball. They highlight the most important part of a sequence while ignoring the rest. In the end, the most important base to advance in baseball is from third to home. Similarly, the most important pass is the one that goes across the goal line. But it matters how you get there. If Adrian Peterson takes the ball at his own 3 and breaks a 90-yard run, he does not get credit for a touchdown. If on the next play, Brett Favre completes a 7-yard touchdown pass, he receives as much credit as Ryan Fitzpatrick received for his 98-yard touchdown pass last week. Favre receives disproportionate credit in that example. To correct for asymmetries like this, Don Smith created passer rating in 1971.

In college football the formula for passer rating is:

{(8.4 \times YDS) + (330 \times TD) + (100 \times COMP) - (200 \times INT) \over ATT}

In college this number is effectively unbounded. A great game pushes it up and up, while a horrible game keeps dragging it down. The NFL modifies the formula to give an upper bound of 158.3, lower than the 186.0 that Colt Brennan posted for the entire 2006 season at Hawaii. This makes the formula substantially more complicated:

[25 + 10 * (Completion Percentage) + 40 * (Touchdown Percentage)
– 50 * (Interception Percentage) + 50 * (Yards/Attempt)] /12

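The formula as written has no ceiling. In practice, the NFL limits each of the four components (completions, touchdowns, interceptions, and yards) to a fixed range, and those limits are what cap a quarterback’s performance at 158.3. Thus NFL quarterbacks can have perfect games, though no one has ever approached that over multiple games.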
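
To make the two formulas concrete, here is a rough Python sketch. The college function follows the formula given earlier. The NFL function uses the component form in which the rating is usually stated, with each of the four pieces limited to the range 0 to 2.375; that component form is not spelled out in this post, so treat it as my restatement. The function names and the stat lines are made up for illustration.

```python
def college_passer_rating(comp, att, yds, td, ints):
    """College formula: unbounded in both directions."""
    return (8.4 * yds + 330 * td + 100 * comp - 200 * ints) / att


def nfl_passer_rating(comp, att, yds, td, ints):
    """NFL rating built from four per-attempt components, each capped."""
    def cap(x):
        # Each component is limited to the range 0 to 2.375.
        return max(0.0, min(x, 2.375))

    a = cap((comp / att - 0.3) * 5)     # completion percentage
    b = cap((yds / att - 3) * 0.25)     # yards per attempt
    c = cap(td / att * 20)              # touchdown percentage
    d = cap(2.375 - ints / att * 25)    # interception percentage
    return (a + b + c + d) / 6 * 100


def nfl_uncapped(comp, att, yds, td, ints):
    """The one-line NFL formula quoted in the post, percentages on a 0-100 scale."""
    comp_pct = 100 * comp / att
    td_pct = 100 * td / att
    int_pct = 100 * ints / att
    return (25 + 10 * comp_pct + 40 * td_pct
            - 50 * int_pct + 50 * yds / att) / 12


# Hypothetical stat lines, not real games.
print(round(nfl_passer_rating(20, 30, 240, 2, 1), 1))   # an ordinary good game
print(round(nfl_passer_rating(10, 10, 125, 2, 0), 1))   # every component maxed out
print(round(college_passer_rating(10, 10, 125, 2, 0), 1))
```

For the ordinary game, the capped and one-line versions agree, since no component reaches a limit. For the second stat line every component hits its ceiling, so the NFL rating stops at 158.3 while the college number keeps climbing.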

So what? First, passer rating tells you little when comparing across time. This is a problem in the NFL generally, since the sport has regularly expanded the schedule and made the massive change from one-platoon football to unlimited substitutions. Moreover, rules about defensive pass coverage have also changed dramatically, leading to massive changes in the completion percentages that are at the heart of passer rating. Second, passer rating is dependent, to an extent, on factors outside a quarterback’s control. The formula hopes that drops by receivers (a QB’s bad luck) will even out with freakish runs after catches (a QB’s good luck). That may or may not be true. Passer rating is a stat that has its uses. A high rating, over 100, tells you someone had a good season. A low rating, under 80, tells you someone had a poor season. But the exact number is meaningless, and you cannot put too much stock in single-game variations or slight gaps between one QB and another.

Introducing Statistics: Replacement Level


Statistics do not exist in a vacuum. They have meaning only in comparison to something else. While someone scoring 100 runs has some independent meaning, i.e. the player crossed home plate 100 times, that tells you nothing about whether 100 runs is good or bad. To determine that, you must have a point of comparison. This comparison is to what is called the reference group. In baseball, a number of different reference groups are used, but for today I want to focus on the replacement level reference group.

The concept of replacement level dates back to the introduction by Keith Woolner of the stat VORP (value over replacement player). Replacement level is a measure of marginal utility. How much better is a player than the lowest level of players that will be paid by a major league team? That, in essence, is how much more valuable a player is than a replacement level player. To explain more, think back to the bell curve illustration from the post on Rivera and Lidge. If Albert Pujols is the best player in baseball, he is the far right end of the curve, all by himself. As you move back to the left, you find steadily more players at each equivalent amount of talent. At some point you run out of major league roster spots, but there are still large numbers of players of essentially equal ability fighting for those last spots. That pool of players, large numbers of whom play AAA baseball, are the replacement level players.

What does replacement level mean, in statistical terms? Woolner defined replacement level as 70 points of OPS less than the league average by position. That is, a replacement level shortstop is 70 points worse than the league average OPS of other shortstops. This controls for the fact that it is harder to play shortstop than first base. For pitchers, Woolner used 1.00 greater than the league average RA (Run Average, i.e. earned and unearned runs allowed per nine innings). He also controlled for park factors, to account for the better offensive numbers of a replacement level third baseman in Coors Field than one in Petco Park.
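
As a rough illustration of those two definitions, here is a short Python sketch. The league averages are invented numbers, the function names are mine, and the park adjustment is left out entirely, so this is only the skeleton of the idea rather than a working VORP calculation.

```python
OPS_GAP = 0.070   # 70 points of OPS below the positional average
RA_GAP = 1.00     # 1.00 above the league average run average


def replacement_ops(league_ops_at_position):
    """Replacement-level OPS for hitters at one position."""
    return league_ops_at_position - OPS_GAP


def replacement_ra(league_ra):
    """Replacement-level run average for pitchers."""
    return league_ra + RA_GAP


# Hypothetical league averages, chosen only to show the positional gap.
league_ops = {"SS": 0.710, "1B": 0.820}

for position, ops in league_ops.items():
    print(f"{position}: replacement-level OPS of {replacement_ops(ops):.3f}")

print(f"Pitchers: replacement-level RA of {replacement_ra(4.50):.2f}")
```

Because the baseline moves with the position, the same batting line can be well above replacement at shortstop and barely above it at first base.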

So what? Who cares about the notion of replacement level? Think about how much a player is worth. A replacement level player, by definition, should be making the league minimum. Such players are not really a scarce resource, in baseball terms, as their supply greatly exceeds their demand. For that reason, they should only make $400,000, the league minimum. At each step above that, supply becomes scarcer, and compensation should increase accordingly. Further, replacement level explains how teams can increase productivity at the margins. The weakest players on a roster, as a rule, can be replaced by replacement level players, saving money and possibly increasing production. This is the central point of my criticism of the Griffey contract. At this stage of his career, Griffey is only a replacement level performer, yet he is being paid as someone much better. That is a waste of money and production.

The concept has morphed since Woolner first wrote about replacement level. Tom Tango created a scale of strict positional adjustments, in an attempt to quantify how much harder it is to find adequate shortstop play than it is to find adequate corner outfielders. Sean Smith added further support to this system by comparing the adjustments explicitly to expected contributions from the best players not on major league rosters. Where, then, does this get us in the end? Replacement level is key to calculating a variety of stats, most prominently Wins Above Replacement, which exists in numerous variations all across the internet. Replacement level, though interesting in itself, is most useful as a stepping stone to further considerations of production, compensation, and ability.

Introducing Statistics: Park Effects


Baseball games do not occur in a vacuum; instead they occur in particular stadiums. Park effects are the notion that stadiums have differential impact on individual baseball games. These effects can in turn skew a host of stats and make certain players look much better or worse than they truly are, simply because of their home stadium. Each stadium is assigned a park factor in order to account for these effects. Let me delve briefly into how park factors are derived, then we can conclude with a few applications.

Park factors adjust baseball events by how much they are dependent upon a given stadium. Thus, there is a park factor for home runs at Yankee Stadium which differs from the park factor for doubles at Yankee Stadium. There are a variety of methods of producing park factors. The simplest method is the one used by ESPN, listed in the chart linked to above. To calculate the park factor for home runs:

((homeHR + homeHRA)/(homeG)) / ((roadHR + roadHRA)/(roadG))

You add the number of home runs a team hits at home to the number of home runs it gives up at home, dividing by the total number of home games. That rate is divided by its road equivalent. This gets you a ratio in which a score of 1 is neutral, i.e. your park neither increases nor decreases home run production. Numbers higher than 1 increase the number of home runs, while numbers lower than 1 decrease them. The same method can of course be applied to any other event, like hits, runs, doubles, etc., simply by switching home runs out for the other stat. Other methods of calculating exist, most commonly changing this basic model by using several years to determine a park factor, such as this historical example that I will use below.
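
The single-season calculation is easy to sketch in Python. The function below follows the formula given earlier; the function and parameter names are my own, and the team totals are invented purely to show the mechanics.

```python
def park_factor(home_for, home_against, home_games,
                road_for, road_against, road_games):
    """Ratio of the per-game event rate at home to the per-game rate on the road."""
    home_rate = (home_for + home_against) / home_games
    road_rate = (road_for + road_against) / road_games
    return home_rate / road_rate


# Hypothetical team: 110 HR hit and 100 allowed in 81 home games,
# 80 HR hit and 90 allowed in 81 road games.
pf = park_factor(110, 100, 81, 80, 90, 81)
print(round(pf, 3))
```

Any counting stat can be passed in place of home runs. A multi-year version would simply sum the home and road totals across seasons before taking the ratio.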

Why do park effects matter? For starters, consider the 1995 NL MVP race. In 1995, Barry Larkin of the Reds edged Dante Bichette of the Rockies for the MVP. In the aggregate, Larkin hit .319/.394/.492, while Bichette hit .340/.364/.620. Given that, how did Larkin win? He had an advantage in OBP and nothing else. Bichette even led in wOBA, .413 to .405, and both teams made the playoffs. But consider Bichette’s home/road splits. At home, he hit .377/.397/.755, but on the road he hit .300/.329/.473. 31 of Bichette’s 40 home runs were hit at home. Larkin in contrast hit .328/.424/.498 at home and .309/.360/.486 on the road. Larkin was the same player everywhere in 1995, while Bichette’s great overall numbers were a product of playing his home games in Coors Field. In 1995, Coors had a park factor of 1.23, while Cincinnati had a park factor of exactly 1. Bichette’s stats are inflated by a factor of 1.23, while with Larkin what you see is what you get. Because of this, Larkin received the edge in MVP voting, and Bichette settled for second place.

Introducing Statistics – Recap and Suggestions


Today, I would like to gather up all of the introducing statistics posts so far, to give people an easy way to find fairly simple introductions to a variety of very complex statistics. From oldest to newest:

For a sample of how these stats can be used to evaluate players, consider the second part of my World Series preview.

What other statistics should I introduce? Please leave suggestions in the comments, or send them via e-mail. As the World Series nears its conclusion, this blog will likely do more coverage of football and other sports than it has up to this point. Feel free, then, to suggest stats from sports other than baseball.

Introducing Statistics: Ultimate Zone Rating


Fielding statistics in general have been the black hole of sabermetrics. Branch Rickey, recognizing the importance of fielding, still said “[t]here is nothing on earth anybody can do with fielding.” Fielding statistics stayed remarkably static from the earliest days of baseball, when fielding percentage was invented. Fielding percentage simply takes the chances a fielder handles without an error (putouts plus assists) and divides by total chances. The fewer errors, the higher the percentage. Unfortunately for the stat, players in general make fewer errors each year due to advances in baseball gloves and the quality of the field of play. This has led to a search for stats that better quantify fielding. I am going to focus today on one such attempt, ultimate zone rating (UZR).

UZR was developed originally by Mitchel Lichtman. It attempted to advance beyond range factor, a fielding stat developed by Bill James calculated as (putouts + assists)/games played. UZR attempted to correct for several problems with range factor, first by using innings as a measure instead of games and second by controlling for the effects of pitching/luck. UZR attempts to measure how well a fielder can turn a batted ball into an out. In particular it measures this as the number of runs saved.

UZR divides the field up into a variety of zones, gives individual fielders responsibility for those zones, and calculates how often a ball hit into each zone is turned into an out. By comparing the number of outs an average fielder would be expected to make with the number of outs that a fielder actually converts, you get the number of runs that a fielder saves above average. From this starting point, FanGraphs adds double play runs, outfield arm runs, and error runs, three categories left out of the original calculation. It also reports a rate stat, UZR/150, which is the number of runs saved per 150 defensive games. This is intended to correct for part timers and the fact that very few players play 162 games in a year.
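
The zone logic can be shown with a toy example in Python. This is a heavily simplified sketch of the idea and not the actual UZR calculation: the zones, the league out rates, and the flat run value of 0.8 runs per play are all assumptions made up for the illustration, and none of the adjustments for arm, double plays, or errors are included.

```python
from dataclasses import dataclass


@dataclass
class ZoneRecord:
    balls_in_zone: int       # batted balls hit into the zone while the fielder played
    outs_made: int           # how many of those the fielder turned into outs
    league_out_rate: float   # share of such balls an average fielder converts


# Assumed flat run value of turning a hit into an out. A real system
# would vary this by zone and by type of batted ball.
RUNS_PER_PLAY = 0.8


def range_runs(zones):
    """Runs saved above average from plays made across all zones."""
    plays_above_average = sum(
        z.outs_made - z.balls_in_zone * z.league_out_rate for z in zones
    )
    return plays_above_average * RUNS_PER_PLAY


def per_150(runs, defensive_games):
    """Scale a runs-saved total to a rate per 150 defensive games."""
    return runs / defensive_games * 150


# Hypothetical fielder: one easy zone and one difficult zone.
zones = [ZoneRecord(100, 85, 0.80), ZoneRecord(60, 20, 0.25)]
runs = range_runs(zones)
print(round(runs, 1), round(per_150(runs, 120), 1))
```

The fielder here makes five more plays than average in each zone, and the rate version scales that total up because he played only 120 games.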

UZR has one large minus: It is almost completely inaccessible to any but the most devoted sabermetric fan. Casual fans can understand it and look it up on sites that list it, but they don’t have access to the data necessary to calculate it easily. This is part of the explanation for the persistence of fielding percentage; it is easy math. It is also ahistorical; the data does not exist far enough back for meaningful comparisons to be made about players from different decades, let alone eras. However, it has the advantage of incorporating range into fielding statistics in a way that fielding percentage never did. It also allows for a more complex understanding of range than range factor. UZR, then, is a useful current stat, but it is a serious challenge to comprehend or calculate for the casual fan.