WAR problems and the Mets Crazy Horse.

If you go onto (the oddly green) Fangraphs.com and wander (with your mouse) into the value section, you can see that among the Mets’ everyday players, David Wright has been the most valuable in the first half of 2010. Wright has accumulated 4.1 Wins Above Replacement (WAR), the second most in the National League, with Joey Votto leading Wright by just 0.1 WAR. Angel Pagan is second on the Mets with 3.3 WAR. Makes sense, Wright then Pagan.

If you then wander over onto Baseball-Reference.com, you’ll see something interesting. According to Baseball-Reference’s version of WAR, Angel Pagan has been the most valuable Mets player, worth 4.0 WAR. David Wright is now second, with 3.9 WAR. Pagan’s 4.0 WAR makes him the second most valuable position player in the NL. He trails Adrian Gonzalez by 0.2 WAR on the leaderboard.

So, on one website, David Wright is tops on the Mets team, and the second best in the NL. On the next website, Angel Pagan is the best on the Mets, and second in the league. On one, Pagan is 3.3 WAR; on the other, 4.0 WAR. Joe Posnanski recently complained about this WAR discrepancy. Seeing that there are now two easily and freely available versions of WAR on the Internet, I think it might be worth it to look into the differences between the two versions of WAR, and WAR itself, by talking to myself, of course.

Okay, so what’s WAR?

WAR stands for Wins Above Replacement. It’s a statistic that’s supposed to measure how valuable a player has been to his team, and then express that value as a number of wins.

What is “Above Replacement”?

Replacement means “replacement level player.” Replacement level players are the sort of dudes every team has stocked away in AAA. All those guys on the 2009 Mets that were pulled out of nowhere after everyone good was injured? Like Emil Brown? Ramon Martinez? That’s basically replacement level. WAR is supposed to measure how much better a player is than someone barely serviceable.

But it has decimal points. “3.9 WAR.” I don’t like that.

I really don’t like it either. WAR would probably catch on more if it was expressed in whole numbers. Like, what the heck is 0.5 of a win? Baseball fans seem to be okay with whole numbers (RBI, saves) or percentages (batting average, slugging), but mixed numbers . . . not so much. Sabermetricians are generally pretty bad at coming up with accessible acronyms and numbers. Bill James is usually the exception, and he had the right idea by making his Win Shares whole numbers and multiplying them by three so the differences were larger.



So, why does Angel Pagan have two different WAR numbers?

Well, because there’s no single agreed upon way to calculate WAR. It’s not like batting average, “hits divided by at bats.” Each site uses a different method for WAR — this doesn’t help it catch on more widely, either, because now there’s a steep learning curve.

Sigh. Okay. What are the differences between Fangraphs and Baseball-Reference WAR?

Well, first we’ll need to break down the pieces that add up to make WAR. We’re going to ignore WAR for pitchers, which is totally different, in this post; I’ll deal with them later in the week.

Think of it this way — every position player is responsible for two things:

A. Creating runs on offense

B. Preventing runs on defense

And that right there is really the heart of WAR. It measures a player’s offensive contributions and their defensive contributions in terms of runs, and then converts those runs into wins.

The problem is that Fangraphs and B-R evaluate both offense and defense differently, and each comes up with a different number of runs for the same player.

So I’m supposed to buy into a stat no one can agree how to calculate?

. . . .yes?

I’ll play along for now. Well, to start, how is a player’s offense measured differently on each site?

Well, it’s not that different. Each version of WAR bases a player’s offensive contributions on basically the same few things: his number of singles, doubles, triples, home runs, stolen bases, caught stealing, reached on errors, walks, and hit by pitches. Each batting event is assigned a value, and then those values are adjusted for the park, year, and league the player is in, so that batters in 1968 Dodger Stadium can be compared to batters in 1997 Coors Field. The adjustments are slightly different on each site, but the ideas behind doing so are the same.

Baseball-Reference also adds in some baserunning events Fangraphs does not, such as advancing on passed balls and going first to third on a single. B-R also gives credit to hitters for avoiding hitting into double plays. Neither of these is an enormous difference, but they are differences.

Fangraphs combines all their offense into one number, called “weighted runs above average” (wRAA) . . .

Wait, why is the “w” in “wRAA” lowercase? Why do they always do that?

Um . . . I’m really not sure. I don’t see a problem with WRAA, other than it sounds like a crow’s noise if you read it aloud like a real word. The lowercase certainly doesn’t make it more appealing, and it’s going to look awkward when I don’t capitalize that “w” at the beginning of this next sentence . . .

wRAA is supposed to measure how many runs a player created on offense, above or below what an “average” player would create.

Baseball-Reference, for their part, breaks offense down into four numbers:

– Double play runs. (Not hitting into double plays.)

– Reached on error runs. (What it sounds like.)

– Baserunning runs. (Stolen bases, caught stealing, advancing on hits, wild pitches, or passed balls.)

– Batting runs. (Everything else: singles, home runs, etc.)

Each one is also compared to the average. B-R likes whole numbers, so sometimes you’ll see “12 + 1 + 1 + 0 = 15” for the four categories, but that’s only because of the decimal places hidden by the rounding.
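If the rounding thing sounds fishy, here is a tiny sketch of how it can happen. The decimal values below are made up purely for illustration (they are not anyone's real numbers), and the variable names are mine.

```python
# Hypothetical unrounded run values for B-R's four offensive columns.
# These decimals are invented only to illustrate the rounding effect.
components = {"Rbat": 12.4, "Rbaser": 1.4, "Rroe": 1.4, "Rdp": 0.1}

# What gets displayed: each column rounded to a whole number on its own.
displayed = {name: round(value) for name, value in components.items()}

# The total is rounded once, from the unrounded sum.
displayed_total = round(sum(components.values()))

print(displayed)                     # the four whole-number columns
print(sum(displayed.values()))       # what you get adding the columns yourself
print(displayed_total)               # what the total column would show
```

Each column loses a few tenths when it is rounded, and those lost tenths can add up to a whole run in the total.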

Here is what Angel Pagan’s actual 2010 looks like so far, only the table is cut off because I can’t figure out how to make it not do that:

2010 28 NYM NL 80 330 94 17 6 6 19 5 28 .315 .372 .473 5 0
5 Seasons 347 1159 305 62 22 21 45 15 89 .290 .343 .451 11 0
Provided by Baseball-Reference.com: View Original Table
Generated 7/13/2010.

And here’s what it looks like in B-R’s four offensive columns:

Year  Age  Tm   Lg  PA   Rbat  Rbaser  Rroe  Rdp
2010  28   NYM  NL  320  11    1       0     0
Provided by Baseball-Reference.com: View Original Table
Generated 7/13/2010.

B-R says Pagan created about 12 runs above average in those four categories. Fangraphs credits Pagan with creating 13.8 runs above average.

That’s close. So the only difference in offense between the sites is in GIDP and baserunning?

Well, not exactly. They both use a slightly different formula and adjust it differently, but it’s generally going to spit up a similar number.

Also, at the end of the season, B-R’s offensive number is adjusted so that the number of runs a team is credited with creating matches up with the ACTUAL number of runs the team scored; Fangraphs doesn’t do the same thing. B-R’s version of WAR is rooted a bit more in what actually happened on the field.
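I don't know the exact procedure B-R uses for this, so treat the following as a guess at the general idea and not as their method. The simplest version I can imagine is proportional scaling: stretch or shrink every player's estimate by the same factor until the team total matches the real run total. The players and numbers are hypothetical.

```python
# Hypothetical estimated runs created for three players on one team.
estimated = {"Player A": 50.0, "Player B": 30.0, "Player C": 20.0}
actual_team_runs = 110.0  # hypothetical number of runs the team really scored

# Proportional scaling (an assumed method, for illustration only):
# multiply every estimate by the same factor so the totals line up.
factor = actual_team_runs / sum(estimated.values())
adjusted = {name: runs * factor for name, runs in estimated.items()}

for name, runs in adjusted.items():
    print(name, round(runs, 1))
print("team total:", round(sum(adjusted.values()), 1))
```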

Baseball-Reference also doesn’t bother to figure out reached on error runs and double play runs until after the season ends. I’m not sure why. You’ll see that right now, everyone, Pagan included, is 0 in both in 2010.

Fair enough. So . . . defense?

Ah, defense. The big problems show up in defense. Fangraphs uses a system called Ultimate Zone Rating (UZR) for their WAR. Baseball-Reference uses something called Total Zone. Fielding is where you’re going to find the really big differences between players. Fangraphs’ UZR gives Angel Pagan’s defense 6.4 fielding runs above an average player; Baseball-Reference’s Total Zone gives him 16 runs above average. Jose Reyes has either been -8 (Total Zone) or -0.4 (UZR). These numbers don’t always line up, and this causes most of the discrepancies between WAR numbers.

Okay. Why the enormous gap in fielding?

Mostly because it’s really, really difficult to figure out defense on an individual level. If Jose Reyes is at the plate and strikes out, we know it’s not David Wright’s fault; the strikeout is all on Reyes. Offense is easy to assign. On the other hand, if a ground ball scoots through between Wright and Reyes in the field, it’s a bit trickier to say whose fault that is — or if it’s the pitcher’s fault, or if it’s anyone’s fault. Assigning individual defensive credit is tough, and we’re not particularly good at it. This is why different systems can spit out vastly different numbers.

As for the differences between Fangraphs’ UZR and B-R’s Total Zone: basically, if you kept really, really good scorecards, you could figure out Total Zone on your own with just that; if you recorded every game on your DVR, you could figure out UZR with that. Not really, but almost. Both take into account a player’s range, and his arm if he is an outfielder, and his ability to turn double plays if he is an infielder. They just do it in different ways.

Catcher defense is also evaluated differently by the two sites. B-R uses passed balls, wild pitches, caught stealing and the number of stolen bases allowed to evaluate a catcher; Fangraphs just uses CS and SB numbers.

Which one is better? UZR or Total Zone?

Definitely UZR, but it’s sort of like the difference between playing William Tell with someone who has 20/800 vision and playing with someone who has 20/200 vision. Neither system is perfect, or even close to that, but it’s better than letting the blind guy try to shoot the apple off your head, right? They’re better than the nothing we used to use.

On the other hand, Fangraphs goes with decimal points again for UZR. It’s not a huge deal, but I think it’s easier for everyone to understand Pagan saving 16 runs as opposed to 6.4 runs. No one ever wins a baseball game 6.8 to 4.2.

The big advantage of Total Zone is that it allows us to evaluate the defense of players throughout all of baseball history through Retrosheet, something UZR can’t do.

At the very least, both systems rate Pagan as a great defensive center fielder. The disagreement is about how great.

And that’s everything in WAR?

Almost. There are two other adjustments that need to be made.

The first adjustment is for the position of the player. In other words, defense-first positions like shortstop and catcher get bonus runs, and slugging positions like first base and designated hitter lose runs. Positions that stress defense are generally played by lighter hitters; this adjusts for that, so we can compare players across positions. Someone who can hit 25 home runs as a first baseman is easier to find than someone who can hit 25 home runs as a shortstop; WAR tries to adjust for that fact.

Pagan gets one run from Baseball-Reference for being a center fielder. Fangraphs, still going strong with the decimals, gives him 1.1 runs.

And the second adjustment?

The second adjustment is for “replacement runs.” Because it’s Wins Above Replacement, and the offense and defense are just compared to AVERAGE and not REPLACEMENT, we need to adjust for that — replacement players are worse than average players.

Basically, replacement runs mean that if someone plays 150 games, he starts with about 20 or so runs to his credit just for running himself out on the field; more games gets you more runs, fewer games gets you fewer runs. Those are “replacement runs,” an estimate of the difference between replacement and average level. It’s a weird concept, and probably where most people tune out.
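The "20 runs per 150 games" rule of thumb is easy to prorate. This is only a back-of-the-envelope sketch: each site prorates playing time in its own way, so scaling by games played is my assumption, and the function name is made up.

```python
# Rule of thumb from above: roughly 20 replacement runs per 150 games.
RUNS_PER_150_GAMES = 20.0  # approximate; each site uses its own figure


def replacement_runs(games_played):
    """Prorate the replacement-level credit by games played (assumed method)."""
    return RUNS_PER_150_GAMES * games_played / 150


# Pagan had played 80 games at the time of writing.
print(round(replacement_runs(80), 1))
```

For 80 games that works out to a bit under 11 runs, which is in the same neighborhood as the replacement runs each site actually gives Pagan.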

Both sites figure these out differently as well — Baseball-Reference uses a certain number of replacement runs depending on which league. I don’t believe Fangraphs does the same thing, and just uses a blanket replacement level for both leagues.

Anyway, as for El Caballo Loco, Pagan gets 11 replacement runs from Fangraphs, and 9 from Baseball-Reference — about 10 runs for a half season.

But now that’s everything in WAR, correct?

Just about.

*Awkward high five*

No, wait. How do we get from runs to wins?

Oh, right. As a general rule, 10 runs equals one win, but that changes from season to season. In years when fewer runs are scored, it takes fewer runs for a win, and vice versa. If every game finishes 5-3, one run is 12.5% of the total scoring. If every game is 13-7, one run is 5% of the scoring. A run is a run is a run, but all runs aren’t equally valuable: a run in a lower scoring league is more valuable because it’s more of the total scoring.
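The percentages above are just one run divided by the total runs in the game. A quick sketch, with a function name I made up:

```python
def share_of_scoring(winner_runs, loser_runs):
    """Fraction of a game's total scoring that a single run represents."""
    return 1 / (winner_runs + loser_runs)


print(f"{share_of_scoring(5, 3):.1%}")   # every game ends 5-3
print(f"{share_of_scoring(13, 7):.1%}")  # every game ends 13-7
```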

Anyway, Baseball-Reference says Pagan is 39 runs above replacement this season, and turns that into 4.0 Wins. Fangraphs says 32.3 runs, and turns that into 3.3 wins. So ten runs roughly equals a win this season.
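You can work backward from those numbers to see what conversion each site is effectively using. Since the WAR figures are already rounded to one decimal, the results are approximate; the function name is mine.

```python
def implied_runs_per_win(runs_above_replacement, war):
    """Back out the runs-to-wins conversion from a runs total and a WAR total."""
    return runs_above_replacement / war


# Runs above replacement and WAR for Pagan, as quoted above.
print(round(implied_runs_per_win(39.0, 4.0), 2))   # Baseball-Reference
print(round(implied_runs_per_win(32.3, 3.3), 2))   # Fangraphs
```

Both come out a little under 10 runs per win.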

And that’s it?

That’s it.

*Even more awkward high five*

So . . . what’s all this good for?

It’s probably the best way we have to eyeball who’s having a good season, because it accounts for most things. It also lets us compare players across different eras more easily than something like batting average and home runs would. The league average batting average and the number of home runs hit change year to year, sometimes going through huge dips and rises. The goal of the game, on the other hand, remains unchanged: to win.

Okay. Are there problems with WAR?

Oh yes. Many.

It’s a counting stat, so it has some of the same problems as runs scored and RBI. That means if there are two equal players, the one who plays more will have a higher WAR.

For example, good players who play for bad teams (bad teams that don’t score many runs) will get fewer plate appearances over the course of a season and less WAR because of that. Last season, Albert Pujols played 160 games for the NL Central Champion Cardinals; he had 700 plate appearances. Adrian Gonzalez played 160 games for the fourth place San Diego Padres; he had 681 plate appearances. Similarly, American League players will generally get more plate appearances than National League players, solely due to the pitchers using up outs in the NL. Most of the WAR leaders this season are in the AL.
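To put a rough size on the playing-time effect, imagine two hitters who are exactly as good per plate appearance. The rate below is invented to keep the arithmetic easy; only the two plate appearance totals come from the example above.

```python
# Hypothetical: two hitters who are equally good per plate appearance.
WAR_PER_PA = 0.01  # invented rate, not a real player's


def season_war(plate_appearances):
    """A counting stat grows with opportunities, even at a fixed rate."""
    return WAR_PER_PA * plate_appearances


more_chances = season_war(700)   # 700 plate appearances
fewer_chances = season_war(681)  # 681 plate appearances
print(round(more_chances, 2), round(fewer_chances, 2))
print(round(more_chances - fewer_chances, 2))
```

Same player, same skill, and the one with more trips to the plate still finishes ahead.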

It also doesn’t take into account the timeliness of hitting, so a home run in the first inning is worth just as much as a home run in the ninth inning. How big of an issue this is depends on your thoughts on clutch hitting, which is another story.

For some reason, no one ever brings up clutch fielding, but it doesn’t measure that either. Also, as we saw above, defense in general is a mess. Defense is easily the biggest problem. If you see someone at -30 runs or +30 runs, it might be a fluke throwing off their value. Things smooth out over a career, but season to season is a minefield.

It’s also not going to measure immeasurable things, such as leadership.

So it’s got problems — why should anyone use it? Marvin Gaye says WAR is not the answer.

It’s the best system we have. It’s not perfect, but it’s the best we have, at least so far. It might be a Model T, but it sure beats walking everywhere. Jeff Francoeur would certainly agree that walking is overrated . . .


Sorry, lame joke.

Oh, no, I wasn’t booing that awful joke. It was just a reflex from hearing Francoeur’s name.

*Awesome, non-awkward high five*

So, show me Pagan’s WAR again, only this time broken down.

Offense: 13.8; Defense: 6.4; Replacement: 11; Position: 1.1 = 3.3 WAR (Fangraphs)

Offense: 12; Defense: 16; Replacement: 9; Position: 1 = 4.0 WAR (Baseball-Reference)
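Lining up the two breakdowns above in a quick script shows where the gap comes from. The run values are the ones just listed; the dictionary and variable names are mine.

```python
# Component run values for Angel Pagan, first half of 2010, as listed above.
fangraphs = {"offense": 13.8, "defense": 6.4, "replacement": 11.0, "position": 1.1}
bref = {"offense": 12.0, "defense": 16.0, "replacement": 9.0, "position": 1.0}

# Total runs above replacement on Fangraphs.
print(round(sum(fangraphs.values()), 1))

# Component-by-component gap (Baseball-Reference minus Fangraphs), in runs.
gap = {name: round(bref[name] - fangraphs[name], 1) for name in fangraphs}
print(gap)

# Which component is responsible for the largest disagreement?
biggest = max(gap, key=lambda name: abs(gap[name]))
print(biggest)
```

Defense is the only component where Baseball-Reference is more generous, and it is more generous by nearly ten runs. The other three components lean slightly toward Fangraphs.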

Pagan’s been good?

Pagan has been awesome.

Where can I read more about this?

Here is Baseball-Reference’s explanation for WAR. Here is Fangraphs.

And pitchers?

I’ll do that one later in the week. It’s even more complicated, believe it or not, but there’s no baseball on, so what else am I going to do?

Image via slgckgc’s Flickr.



19 responses to “WAR problems and the Mets Crazy Horse.”

  1. Anonymous

    >Were it not for your witty and entertaining writing style, I would never have made it through this post. As it was, I skimmed several sections where you were apparently not making jokes. I understand that some very smart baseball people use these new forms of measurement to make important decisions, so apparently they have merit, but I don't care. Looking at the game this way and thinking about the game this way is incredibly tooly. I'm glad there are apparently aspects of the game that cannot at present be calculated by sabermetrics, and I hope they never figure out how to do it. Baseball is constructed in a way that allows numbers-oriented people to go nuts, but the game is about so much more than numbers. Anyone who has been watching the Mets this year and doesn't appreciate how good Angel Pagan has become until they see some acronyms and numbers that tell them he's been good, is missing something. I don't want to think about him in terms of numbers. I just want to watch him run like a Crazy Horse in the outfield and make an over the shoulder catch of ball in the gap that had triple written all over it.

  2. Anonymous

    >p.s. Thanks for trying to explain it, anyway.

  3. Patrick Flood

    >Thanks, I think . . .

    It's not that looking at the numbers is the only way to enjoy baseball, or it's a way anyone has to, but it is one of the ways I happen to like looking at baseball. Some people like the history, some people like the beauty, some people like the numbers. I like all of those. There are hundreds of ways to like baseball. This is just one of them, and it's not better or worse than any other. The point of this was more that this particular number, the WAR, is the best way to measure players at this time. I was just hoping to make that aspect of the game more accessible for people, because too often the group coming up with these things makes it too tough.

    But I sincerely support your right as a fan of the game to not care about this stuff. Just, if you or anyone else were interested, I figured it was worth a shot.

    On the other hand, it took me thousands of words to explain WAR out, so maybe it's still not accessible. I don't know.

  4. pedros rooster

    >Great, great post. The dialogue style and the humor made an esoteric and clinical subject matter very accessible and enjoyable. Looking forward to reading more of your work.

  5. richard

    >thank you for expanding on my knowlege.

    i've been singing Angel's praises for three years now. he had been a model of consistency before and now has upped his game with the confidence that comes from regular play. it had to happen. it boggled my mind to see met fans call him injury prone when his biggest sin was to put his body on the line.

    watch him expand on these numbers as the season progresses. he's the real deal and we need him because Senor Beltran will be one iffy addition with a chronic bone on bone ailment…

  6. rdmanapple

    >sorry for typo. i meant, 'knowledge'

  7. acerimusdux

    >The biggest problem with WAR is the defensive stats. Defensive stats simply have very little value in small samples, and can be off quite a bit even in single season samples. But they do pretty well if you look at career samples. So why not just calculate it using the career rates? Sure, you are then no longer measuring only what the player actually did this one season, but you are fooling yourself if you think something like UZR is a valid measure of a players actual performance for a season, or worse half a season, anyway. If you treat the defensive adjustment for what it is, an estimate, it makes perfect sense to use career data there and then combine that with the current offensive performance. Then you wouldn't get absurd results like Angel Pagan being as good as David Wright or Adrian Gonzalez.

  8. Patrick Flood

    >@ Acerimusdux

    So are you advocating something closer to VORP, only with a career UZR adjustment instead of just a blanketed positional one? If so, I like that idea.

  9. Saul

    >Baseball's such a broad game in every aspect that deviations from "traditional" stats is inevitable… It's really about one thing as a fan: winning. Who goes to a baseball game to see their team lose voluntarily? So if SABR is going to help teams win, of course people are going to embrace it. Plus for math and stat guys, they find calculating sabermetrics like doing a jigsaw puzzle (fun…!) Sabermetrics also accounts for a lot of defense that is unaccounted for in traditional stats (I know you didn't pick Endy Chavez on your fantasy team because he made the catch).

    So anyway, what am I really trying to say? This was a great post, take it or leave it.

  10. dave crockett

    >Great post Patrick. The thing I always say to people is that this isn't about "stats." It's about stories. What I appreciate about your post is that you take the time to lay out the intuition behind WAR. For those of us who haven't read Bill James, THAT'S real the contribution–the real innovation. It's less about the sausage-making for a particular stat. Establishing a "zero point" for making player comparisons is a terribly important thing. But, that's not terribly obvious to people who have been comparing players without a zero point for their entire lives. So, kudos to you for telling a story about how it's important.

    I think sabremetrics types generally could do a much better job of building a story around WHY something like WARP (and its components) is such an enormous conceptual advance over what we grew up with. Any reasonably well-trained eye can identify a great player. We don't need fancy stats with lots of adjustments to do that. But, as you well know, making subtle and precise comparisons between players was enormously difficult before some of the statistical advances. We often overweighed player A's one useful skill but undervalued player B's. Worse yet, we were often stuck using little more than worn out cliches to evaluate player performance. As brilliant a baseball mind as Bobby Cox's was once throwing away seasons worth of ABs on Jeff Francoeur because he's "a good RBI man."

  11. acerimusdux

    >@Patrick

    Yeah, the problem with VORP is that it doesn't take into account at all how good or bad a guy's defense is. If he plays SS he gets credit for SS. WAR includes defense as well as a positional adjustment:

    Off + Def + positional adj + replacement

    I'm saying use career stats for the defense part. But the argument then comes down to are you trying to measure what a guy actually did this season, or are you trying to estimate his expected value in the future? If you are really trying to measure player value, rather than performance, you should probably use career stats, or multiple years data for both offense and defense. But then you are turning it into a projection system like CHONE or ZiPS. If you are really trying to predict future value, then you also add aging adjustments, which CHONE and ZiPS also have. I think everyone understands that projections of the future are estimates.

    But I see nothing wrong with combining a players actual offensive performance for a season with an estimate of his defensive value. For Pagan right now, for the half season, that might look something like:

    Off + Def + Pos + Rep
    11.9 + 5.5 + 1.25 + 10 = 28.7 runs

    This is estimating Pagan's defense as a +11 run CF for a season (so 5.5 for half), and using an average of the fangraphs and B-Ref measures for his offense.

  12. Michael Sullivan

    >What's with the hate on for decimals? Are you really willing to assume that decimals and fractions to represent shares or expectations of things that are normally discrete is too hard for most people to understand?

    I guess we should throw out batting average (well, that wouldn't be so bad), on base percentage, slugging and winning percentage while we are at it? What's needed is some good explanations of why decimal wins or batting/fielding runs makes sense, rather than simply hating on fractions.

    We use decimals because players rarely score or prevent runs all by themselves. The only common play where the run can be completely charged to one player on defense and credited to one player on offense is the one run homer. That's the batter scoring one run, and the pitcher giving up one run. The point of sabermetrics is to better describe the actual contribution of each player, and if you are to do that accurately, then you have to break up the contributions to the run when it's not a solo homer. Batter A leadoff walked. Batter B singles advancing A to third. Batter C hits a single and A comes home. Traditional scoring credits A with a run, and C with an RBI.

    You can get a rough idea of contribution by looking at R+RBI/2, but what does batter B's look like. A big 0. But C's single is very unlikely to score batter A without B's. If anything B may deserve *more* credit for advancing A; we know B hit a strong single because A took two bases and thus got into position where he could score on many ground balls or flies, while all we know about C's PA is that it was a single that scored A from third (which any single will do).

    Everybody is comfortable with batting average, which is a decimal, even though it describes expectations of events that are discrete (you can't get .303 of a hit, either you hit or you don't). Why is batting average ok and understandable to fans, but WAR or fielding runs isn't?

  13. Patrick Flood

    >@ Michael Sullivan

    I think my point was more along the lines of "all traditional baseball statistics are in either whole numbers or percentages."

    The idea being that no traditional stats puts a decimal point in the middle of a number, unless you have an unusually high slugging percentage or something. DRS puts its fielding metrics into whole numbers, and Bill James did the same thing with Win Shares. I think it makes the advanced stats easier to grab onto for people who aren't as interested in the behind the scenes stuff. I, for one, am interested in that sort of stuff, but I understand that other people might not be.

    I'm not saying people can't understand it with the decimals. But maybe the casual fan would be more accepting if it was presented in a more traditional baseball stat format. I'm simply in favor of presenting everything as a rounded, whole number. I like how baseball-reference does it for the most part. Otherwise, I have nothing against decimals.

  14. Anonymous

    >1 Thing I have thought about WAR is that a "replacement level" defender is actually probably better than an average MLB fielder because players make it to the bigs with their bats, but it seems like WAR treats a replacement level player as being well below MLB average at everything. The same thing may also be true of baserunning.

  15. Michael E Sullivan

    >Lots of real life replacement players turn out to be league average or better. Every star got their first callup at some point, and many of them posted high WAR seasons in their rookie year.

    Replacement level is an average. What does the *average* late season call-up, or minimum salary waivers pickup produce? For every prospect that has great numbers, there are a bunch that go back to the minors before the end of the year because they were playing *below* replacement level.

    At least the B-R probably does do what you say to an extent. They calculate the overall value of replacement players relative to league averages and use that as one input to their WAR equation. Individual players fielding, baserunning, gidp, and batting runs are not compared to some replacement level, they are compared to the league average. If UZR or TZ says somebody is worth 16 fielding runs, that means they saved 16 runs more than the average player at their position that year did in an equivalent number of games/outs/opportunities. After totaling all the categories up, the overall difference between replacement level and league average is added in, to get to WAR. If you ignored this factor, you'd have WAA, or wins above average.

    WAA might be a better metric for the hall in some ways, as average play counts for zero, and below average play hurts you. The advantage of WAR as a career stat is that it doesn't penalize you for sticking around past your prime unless you really don't belong in the majors and are easy to replace.

  16. Anonymous

    >I am a non-Mets fan who stumbled into here through B-R's link on their blog, and I would like to say I am extremely impressed by the thorough breakdown you gave between the differences in WAR on B-R and Fangraphs. I also am extremely impressed by the high level of discourse in the comments section.

  17. Patrick Flood

    >@ Anonymous 1:27

    Thanks. Seeing this on B-R was cool, mostly because I spend so much time messing around on that site everyday.

    And I agree, most of the comments here tend to be well thought out, particularly in this thread.

  18. Pingback: WAR problems: Part Two | PatrickFloodBlog.com

  19. Pingback: A Simple Kind of Fan 1.18.11 Edition | PocketDoppler.com
