Over the past month I have been watching a lot of baseball. For the record, I always watch baseball in October, and I usually watch the teams that I follow go down, one by one. This year was extra exciting because my hometown Toronto Blue Jays were playing, which meant that I got to root for a team I actually cared about instead of simply rooting for whichever team wasn’t the Yankees, which is what I have done for the last 20 years or so.
I like baseball for a bunch of reasons, but as a self-described data nerd, one of the key reasons is that baseball was the first sport to really embrace the use of data and statistics. I’m not talking about Bill James, but rather about the fact that we have batting averages for Babe Ruth and Ty Cobb and pitching stats for Cy Young and Christy Mathewson. That creates a continuity of information through the last 100 years, dating back to a time when those things were calculated by hand.
As I watched games, I was both impressed and unsatisfied by the quantity and quality of data that was provided during the broadcast. The problem with providing people with information is that they want even more information…
First, the things I liked. I am always impressed by how each network that shows games is able to provide a compact but complex graphic in one corner of my screen that tells me the inning, score, outs, base runners, current count to the batter, number of pitches thrown, speed of the pitch and even the pitcher’s and batter’s names. I love when I get multiple views of the strike zone (front, side and top) to see where the pitch really was. I’m a couch home plate umpire, calling strikes as I see them, so it is always nice to see that I was right in my call. I like the stats that appear when the batter comes up so I can see their hits, RBIs, homers and strikeouts. There can also be fancy decompositions of their stats, such as results in the latter part of the season or with runners in scoring position, that give me some sense of how well they are currently playing or how they play under pressure.
So what could I possibly be missing? I’d like to know what part of the field each batter hits to, with what frequency, and how many bases they got. I’d like to know what kinds of pitches have been thrown (two- or four-seam fastballs, curves, sliders, and so on): how many of each, the actual vs. intended location, and how many have been strikes and balls by pitch type. But even with all that, there are still the undefined, unmeasurable things that make playoff ball so exciting: the young pitcher throwing a great game despite never being in the playoffs before, the average hitter having an all-star series, or the experienced veteran hitting a ball so purely that as soon as he hits it, everyone watching knows, absolutely knows, that it’s a home run.
What stats could ever predict that? That is why, every October, this data nerd continues to watch as many games as possible.