Friday, April 22, 2011

The Difference Between a Statistic and a Model


Something that has long annoyed me about baseball statistics is the misuse of terms. For instance, we talk about Batting Average and Slugging Percentage. Yet batting average is really a percentage (the share of at bats, meaning plate appearances that end in an out, error, or hit, that result in a hit), while slugging percentage is really an average (the average number of total bases per at bat, with a time reached on error counting as zero bases). But that's just being nitpicky. Who really cares if the names are technically wrong? It wouldn't change the way anybody looks at either statistic.
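To make the distinction concrete, here is a minimal sketch in Python (the counting categories are simplified, and the season line is made up):

```python
# Toy example: batting average is a proportion, slugging "percentage" is an average.
hits = {"1B": 100, "2B": 30, "3B": 5, "HR": 25}  # hypothetical season line
at_bats = 550  # plate appearances ending in an out, error, or hit

total_hits = sum(hits.values())
total_bases = 1 * hits["1B"] + 2 * hits["2B"] + 3 * hits["3B"] + 4 * hits["HR"]

batting_average = total_hits / at_bats  # a proportion: share of at bats that are hits
slugging = total_bases / at_bats        # an average: total bases per at bat

print(f"BA  = {batting_average:.3f}")   # 0.291
print(f"SLG = {slugging:.3f}")          # 0.500
```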

However, there is a more nefarious misunderstanding of terminology, and it tends to infect precisely those who claim to have the best understanding of statistics: the difference between a statistic and a model.

A statistic is simply a single measure of some attribute. Batting average is a statistic. It takes the total sample (all outcomes of all plate appearances), isolates a subgroup (plate appearances resulting in a hit, error, or out) and then puts a further subgroup (hits) in the numerator. For the most part, statistics don't make value judgments. You can say that the focus on a given statistic makes a value judgment; choosing to pay attention to batting average implies that getting hits is more important than taking walks. But the statistic itself says, "hey, here's some information, use it as you will."

A model, on the other hand, makes value judgments. Models take phenomena and try to make predictions. For instance, OPS is a very simplistic model. Because slugging percentage and on base percentage measure entirely different things, it doesn't make statistical sense to crudely add the two. The value comes from a belief that when you add them you get a rough model of a hitter's worth that you couldn't get from either component statistic in isolation. Yet OPS is often referred to as a statistic, as if it were just giving raw information. It isn't. An OPS of .800 can happen in all sorts of ways: a .300 OBP and a .500 slugging percentage, a .400 and a .400, and so on. The model makes a value judgment in assigning equal importance to OBP and slugging percentage. When you treat OPS as a statistic, you are forced to accept that OBP and slugging percentage have equal worth. With batting average, by contrast, you can value the statistic as little or as much as you like; you aren't forced to weight anything to any particular degree.
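A small sketch makes the point (the numbers are made up): the model OPS = OBP + SLG collapses very different hitters into a single value.

```python
# OPS = OBP + SLG implicitly weights the two components equally.
hitters = {
    "patient slap hitter":   {"obp": 0.400, "slg": 0.400},
    "free-swinging slugger": {"obp": 0.300, "slg": 0.500},
}

for name, line in hitters.items():
    ops = line["obp"] + line["slg"]  # the model's single value judgment
    print(f"{name}: OPS = {ops:.3f}")

# Both print OPS = 0.800: the model has decided these two hitters are equal.
```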

And therein lies the problem: by calling OPS a statistic, people are misled into accepting that value judgment. Calling it a statistic lets you believe, "hey, I'm not saying anything myself here, that's just what the numbers say." If you understand OPS as a model, however, you can think, "hey, models can be inaccurate."

Most of the new "statistics" that the sabermetric crowd has been pushing lately are models, not statistics. There is nothing wrong with building a model, but you must call it a model. Calling it a statistic, while possibly simpler, is inherently misleading. WAR is a model for predicting how many wins a player adds above a replacement-level player. It's not only a model, it's a model built out of other models (for instance, a model of what a replacement player is worth, a model of the relative merits of a stolen base versus getting caught stealing, and so on). Yet sabermetricians tend to say "these are the stats" as opposed to "this is the prediction the model gives." Additionally, when you know you are working with a model, it's easier to gauge how reliable it is. Models are generally more reliable near the most common values of their inputs and less reliable in the tails. WAR is probably MUCH more accurate for players with relatively average component statistics, and probably pretty inaccurate for a guy like Albert Pujols, who is usually in the top 5% of almost every offensive category.
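As a caricature of that structure (every number and component below is a stand-in, not any published implementation), the shape of the calculation is a model stacked on models:

```python
# A deliberately simplified, hypothetical WAR-shaped calculation.
# Each input and constant is itself the output of a model with its own assumptions.
def toy_war(batting_runs, baserunning_runs, fielding_runs):
    positional_adjustment = 2.5  # a model of how demanding the position is
    replacement_runs = 20.0      # a model of what a replacement player is worth
    runs_per_win = 10.0          # a model of how runs convert into wins
    runs_above_replacement = (batting_runs + baserunning_runs + fielding_runs
                              + positional_adjustment + replacement_runs)
    return runs_above_replacement / runs_per_win

print(toy_war(batting_runs=30.0, baserunning_runs=2.0, fielding_runs=-5.0))  # 4.95 "wins"
```

Each of those inputs (batting runs, fielding runs) is itself modeled from raw events, so an error anywhere in the stack propagates straight into the final number.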

Furthermore, calling a model a statistic shields you from talking about the assumptions the model makes. In calling OPS a statistic, nobody really discusses the assumption that on base percentage and slugging percentage are of equivalent worth. WAR assumes that a given player is just as valuable on one team as he is on another, and that's definitely not realistic. A player who hits a lot of home runs is much more valuable on a team with a high OBP and low slugging percentage than on a team with a high slugging percentage but low OBP. WAR also assumes that a given player will perform to the exact same level in all playing environments (flyball hitters, for instance, are probably much more valuable in the AL East, where the ballparks are very hitter friendly). These effects are often minor, but they can be important at the margins, which is where WAR is used most. Nobody needs WAR to tell you that Albert Pujols is more valuable than Alex Gonzalez; we use it most where the numbers are close, which is exactly where the numbers are most likely to fall prey to these otherwise small issues.
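As a back-of-the-envelope illustration of the team-context point (the run values here are invented purely for the example):

```python
# Hypothetical: the expected runs from a home run depend on teammates' OBP,
# which a player-level accounting like WAR ignores.
def expected_hr_runs(team_obp, baserunner_opportunities=2.0):
    expected_runners_on = team_obp * baserunner_opportunities  # crude stand-in
    return 1.0 + expected_runners_on  # the batter scores, plus anyone on base

print(expected_hr_runs(team_obp=0.360))  # 1.72 runs on a high-OBP team
print(expected_hr_runs(team_obp=0.300))  # 1.60 runs on a low-OBP team
```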

Now, this would all be good and fine if sabermetricians called these models "stats" merely for convenience's sake. However, it seems that very few of them understand the difference clearly.

In sabermetric circles, it lately seems as if the more complex you can make a model, the better. Sabermetricians don't really seem to understand that the more complex you make a model, the more things you have to finely calibrate, and the more data you need to make it accurate. A simple model may make questionable assumptions at times, but it's relatively easy to calibrate. A complicated model needs tons of data and is only really good near the most common values.
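The general point isn't specific to baseball. In the sketch below (pure illustration, no baseball data), a flexible model fits a handful of points better than a simple one, but behaves far worse away from the center of the data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 8)                    # a small sample
y = 2 * x + rng.normal(0, 0.2, size=x.size)  # the truth is simple: y = 2x plus noise

simple = np.polyfit(x, y, deg=1)    # two parameters to calibrate
complex_ = np.polyfit(x, y, deg=7)  # eight parameters fit to eight points

x_new = 1.5                         # away from the most common values
print(np.polyval(simple, x_new))    # close to the true value of 3
print(np.polyval(complex_, x_new))  # typically far off: the extra parameters chased the noise
```

The degree-7 fit passes through every observed point, which looks like accuracy, but it has merely memorized the noise.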

Emanuel Derman, perhaps the world's most famous living financial modeler, once said: "People often forget the point of models. You can't make a perfect model, because the only perfect model is reality itself, which is too complex, which is why you wanted to make a model in the first place. You have to find the right balance of simplicity, easy-to-understand assumptions and relevant assumptions. A model with tons of difficult-to-understand, complex assumptions that may be very relevant is really no better than a simple model with easy-to-understand assumptions that may not be as relevant at all times. Because the more complex your model gets, the less you are able to understand when it is going to be accurate and when it is going to be inaccurate." His point was that incredibly complicated models often give a false sense of accuracy, because with so many assumptions it is difficult to see where any one of them might be faulty. That's basically the philosophical mistake that Long-Term Capital Management made.

I often feel like that's the mistake a lot of sabermetricians make. They're so obsessed with coming up with the one model (which they call a statistic) to rule them all, the one that gives the ultimate measure of a player's value, that they forget they're building models. They forget to keep track of where they made assumptions and to talk about the cases in which their model may be highly inaccurate. The point of a model is to make relationships easier to understand, not more obscure. Yet that is often what sabermetricians do when they create new versions of WAR: they obscure the relationships they're trying to express in order to create the one model to rule them all, the one model that spits out a perfect measure of value.

So what can be done? Well, as my econometrics professor often said, "if in doubt, return to the simplest statistics possible." WAR is built out of a lot of simple statistics that don't make value judgments. Return to those and debate their relative importance. And if still in doubt, return to the simplest statistics of all: things like walks/PA, singles/PA, doubles/PA, HR/PA, and so on. Then talk about how those should be weighted relative to one another, how accurate or inaccurate the various models may be, and when, in exceptional cases, you probably shouldn't rely on the models at all.
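In that spirit, here is what the simplest possible rates look like (the counts are made up):

```python
# Plain rate statistics: no value judgments, just event counts per plate appearance.
pa = 600
events = {"BB": 70, "1B": 95, "2B": 28, "3B": 3, "HR": 24}

for event, count in events.items():
    print(f"{event}/PA = {count / pa:.3f}")

# How to weigh BB/PA against HR/PA is a modeling question,
# and it is worth keeping that question separate from the raw data.
```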

The attraction of WAR is that it does this for you: it makes implicit assumptions about these various weights. Yet this is really its downfall as well. People trust those assumptions when they may be far from trustworthy. That's what happens when you confuse a model with a statistic.

**Yes, I am aware that there are different versions of WAR. They all do the same thing I am talking about here, and if anything that further proves my point that WAR is a model and not a statistic.**
