Is Data Ruining Sports?

Who would you rather have: Tris Speaker or Ty Cobb?

Jason Whitlock says that this question can no longer be discussed; it can only be answered, thanks to the popularity of the book-turned-movie Moneyball and of sabermetrics, the advanced statistics that baseball fans and writers now use as a lens through which to understand and contextualize the game. (Cobb had a better career OPS+, 168 to 157, so I guess he was better.)

someecards.com - Let's see a movie about a baseball genius who leads his team to winning one playoff series in 14 years

Whitlock argues that data is sapping the fun out of sports. Little Timmy can’t enjoy the game of basketball anymore because nothing is left open to interpretation; there is a “right answer” to every question. Kobe versus MJ. Wilt versus Russell. Jason Whitlock believes it’s not even worth discussing anymore; some pencil-necked geek will inevitably come up with an empirically correct answer.

The problem with Whitlock’s argument is that it absolutely cannot be proven without resorting to data. On what basis does he believe that sports are being ruined for fans? What led him to this conclusion, other than his own personal distaste for advanced statistical measures? Here is some data to suggest that Jason is wrong.

If fans can’t enjoy sports anymore, because of data, how come ESPN keeps seeing excellent ratings for football, baseball, and basketball? When the Yankees played the Red Sox on August 7, it was the most viewed baseball broadcast on ESPN since 2007. The Patriots-Dolphins game on Monday Night Football last week “delivered a 10.3 overnight rating, the second highest opening-game rating since ESPN started airing MNF in 2006.” If data has ruined sports, why are people tuning in to the games instead of just watching the players’ statistics change in real time?

If fans can’t enjoy sports anymore, because of data, how come attendance in the NBA has not slipped? It has stayed essentially level—around 21 million, near the cumulative total max capacity of all NBA arenas for 41 home games per team—since at least 2004, which is as far back as my NBA attendance data goes. Mr. Thompson and I did some rudimentary analysis of trends in NBA data. Fans aren’t staying home, and they’re not why the NBA is locked out. They’re having fun and enjoying the beauty and the drama of sports. (Data cannot tell you with certainty whether Kobe is going to hit that fadeaway at the buzzer to beat the Spurs; it can only tell you what the odds are.)
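For anyone who wants to sanity-check the "near capacity" claim, here is a minimal sketch in Python. It assumes a 30-team league (my assumption, not a figure from the attendance data) and uses the round 21 million total quoted above; the function name is made up for illustration.

```python
# Rough per-game attendance implied by a season total.
# Assumptions: 30 teams, 41 home games each; 21 million is the
# approximate season total quoted in the text, not an exact figure.
NBA_TEAMS = 30
HOME_GAMES_PER_TEAM = 41


def average_attendance_per_game(season_total, teams=NBA_TEAMS,
                                home_games=HOME_GAMES_PER_TEAM):
    """Divide total attendance by the number of home dates in a season."""
    return season_total / (teams * home_games)


per_game = average_attendance_per_game(21_000_000)
print(f"{per_game:,.0f} fans per game")
```

Under those assumptions the average works out to roughly 17,000 fans per game, which is in the neighborhood of what a typical arena holds.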

If fans can’t enjoy sports anymore, because of data, why is Major League Baseball reporting revenue increases year over year? As MLB reported after last season, the past seven seasons (2004-2010 inclusive) “are the seven best attended seasons in MLB history.” This coincides with the Moneyball era nicely, as the book was published in 2003. MLB revenue in 2010 approached $7 billion for the first time, putting it at around a 6% increase over 2009.
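The revenue figures can be worked backward in the same spirit. This is only a sketch: it treats "approached $7 billion" as exactly 7.0 and "around 6%" as exactly 0.06, both of which are my rounding, and the function name is hypothetical.

```python
# Back out the prior-year figure implied by a current value and a growth rate.
# Assumptions: 7.0 (billions of dollars) and 0.06 are rounded stand-ins for
# "approached $7 billion" and "around a 6% increase".
def implied_prior_revenue(current, growth_rate):
    """Return the previous-year value given this year's value and growth."""
    return current / (1 + growth_rate)


revenue_2009 = implied_prior_revenue(7.0, 0.06)
print(f"Implied 2009 revenue: about ${revenue_2009:.1f} billion")
```

With those rounded inputs, the implied 2009 figure is about $6.6 billion.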

See, in order to prove that Moneyball and Sabermetrics have ruined sports, you’d need to show the world that they are having some sort of quantifiable negative effect. Jason absolutely cannot do that. His argument boils down to fear of needing to defend a position with more intelligence than “well, I just like Kobe Bryant better than Michael Jordan.” The reason people hate data, in sports just as in business, is that it raises the level of conversation and forces them to think more critically about the world.

Jason says we like data because we lack the ability to understand sports viscerally or strategically. I’m not sure what he means. (Oh no, I’m so buried in my data that I can’t tell what defense the Patriots are playing! Is it the 4-3 or the 3-4? Is that called a “blitz?” I can’t tell because, you know, I’m too nerdy to understand football.) This argument is ridiculous. It’s the same thing we hear in analytics from certain dyed-in-the-wool creatives who feel that data is an insufficient way to understand their “art.”

He says, “I saw Player X, and I know he was good, so therefore he’s good.” I’m afraid that’s unrealistic, Jason. See, you have biases. There are things you prefer in players, but that others might not. Errors or flaws you might not see, but that others do. You might see Brett Favre’s greatest game but miss his 20 game-ending interceptions because you were out getting coffee. This is even more likely to be true if your teams or your favorite players (or, if you’re in marketing or UXD, your favorite content/layout/design) are involved. You need data in order to look at the world on an even plane. You think that’s where the fun ends. I’d say that’s where the fun begins.

Let’s go back to Speaker and Cobb. Their advanced statistics are remarkably similar. You could legitimately make a case for either one. Sure, Jason; I suppose a nerd could come to you and say that Cobb had a higher OPS+, and that therefore there is no argument to be made for Speaker. Baseball fans don’t think that way. Speaker won three World Series, two with the Boston Red Sox and one with the Cleveland Indians; Cobb never won a World Series. Cobb was a terrible leader, perhaps the worst in sports history. His teammates utterly despised him. Yet, baseball fans are far more likely to know Ty Cobb. He was one of the first five inductees in the baseball Hall of Fame, and one could legitimately argue that he was the greatest natural hitter of all time. He is one of two players to accumulate more than 4,000 hits over his career. Despite all of this, there is a strong argument to be made that Speaker, even with his lower OPS+, would be a better player to build around. There is plenty about sports that cannot be quantified, Jason. Data just makes us think a little bit more about the nuances of the games we love.

Here’s another, more current example: Justin Verlander of the Detroit Tigers. A whole bunch of people believe that Verlander, by far the best statistical starting pitcher in baseball in 2011, should be the American League Most Valuable Player. Verlander has a Wins Above Replacement (WAR) of 8.5, which means that if you were to imagine that Verlander were replaced by a replacement-level starting pitcher (the kind of readily available fill-in a team could call up from the minors), the Tigers would have won about 8.5 fewer games. That is a massive number of wins to attribute to a single player. It’s the best in the league. If you define “value” in baseball as “wins,” you can definitely see how Verlander might be the MVP. But there are plenty of intelligent, knowledgeable Sabermetricians (myself included) who would accept the argument that the MVP should be someone who is on the field every day, playing in nearly every game (whereas starting pitchers only see action every fifth game). It’s a topic of conversation and debate on sports radio regularly. Fans love discussing it, even fans like me who know how statistically dominant Verlander has been. Where would the Yankees be without Curtis Granderson this season? Or the Red Sox without Jacoby Ellsbury at the top of their lineup? You can make legitimate cases for any of these players, each of whom (surprise!) excels in various statistical categories. Could it be that there is more to the MVP race than pure statistics? But Jason, I thought you said that there were no discussions allowed anymore!
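To make the WAR arithmetic concrete, here is a minimal sketch in Python. WAR is conventionally measured against a replacement-level player, so subtracting it from a team's win total gives a rough estimate of where the team would be with a fill-in. The 90-win total below is a hypothetical number chosen for illustration, not the Tigers' actual record, and the function name is my own.

```python
# Estimate a team's win total if a player were swapped for a
# replacement-level fill-in. This is the simple reading of WAR and
# ignores everything the statistic cannot capture.
def wins_without_player(team_wins, player_war):
    """Subtract a player's WAR from the team's win total."""
    return team_wins - player_war


HYPOTHETICAL_TEAM_WINS = 90  # illustrative only, not a real record
VERLANDER_WAR = 8.5          # the figure quoted in the text

estimate = wins_without_player(HYPOTHETICAL_TEAM_WINS, VERLANDER_WAR)
print(f"Estimated wins without the player: {estimate}")
```

In a close division race, a swing of eight or nine wins can be the difference between the playoffs and an early vacation, which is why the number gets so much attention in the MVP debate.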

I think Jason Whitlock is scared. He is scared that Hall-of-Fame voting in professional sports will someday be reduced to plugging numbers into a computer and seeing who the best players were. (This would guarantee someone like Todd Helton a spot in Cooperstown.) I don’t think anyone, even the great Bill James, would advocate such a hard-line stance. Eric Peterson made this point earlier this month, and I think it was prescient of him to make the distinction, since we’re going to be hearing these anti-data arguments more often as data usage grows, in both sports and business: we like to be data-informed, not data-driven. It’s important for me to know that Ryan Howard’s numbers don’t justify his massive contract, but that doesn’t mean I wouldn’t want him clubbing home runs for my team. (Perhaps that’s the difference between me and the seemingly data-driven Billy Beane, who still hasn’t won the last game of the season.) When I have data to help me understand what I’m seeing, I can put things into context. I can “relationalize” teams, players, and individual plays in new and exciting ways. Yes, Jason, data helps me and many others enjoy sports more than if it were completely up to our eyes.

I think Jason is also scared that he can’t articulate why he loves John Elway other than “I like him.” I’m not sure why you’re scared, Jason. It seems easy to defend the fact that, even though Peyton Manning has eight more points of career completion percentage, and Tom Brady has a better postseason record, and Dan Marino has more yards, your boy Elway was a winner. He was a better leader than most of those quarterbacks, numbers be damned. Leadership matters, Jason, and it’s not quantifiable, so you’ve got your argument. You’ve got your discussion. And that doesn’t even touch on the physical aspects of Elway’s game that made him special (such as his arm). I could counter by talking about Tom Brady’s decision making, which also isn’t a statistic. (It isn’t just completing the pass that matters; it’s completing the best pass to the best possible target. This will never show on the stat sheet.) At that level (the Montana/Young/Marino/Manning/Brady/Elway level) you’re splitting hairs anyway. We nerds can say, “objectively, so-and-so is the best of all time.” You’re welcome to make a point that isn’t accounted for in the numbers. I don’t see how that should impact your enjoyment of the game or of discussions about the game, other than to make you think.

Maybe you don’t want to be forced to think. If that’s the case. . . tell me, who is ruining the vibrant discussion of sports, you or me?