Big Game, Big Data: How Football is Being Transformed by Big Data

The real Moneyball team, argues Allen Barra in The Atlantic, is the 2013 Oakland A’s, not the 2002 team depicted in the film. Given that football is more popular than baseball, is there a football equivalent of the data-driven strategy the film made famous?

If there is one, it’s probably a team in fantasy football.

Traditional football data is abundant. One football statistics site will tell you that the team picked by oddsmakers wins 73% of the time, and that the team that scores first has won 75% of the time. (You will also learn the popular factoid that a 30-second Super Bowl ad costs $3.5M on average.) These conclusions are probably not drawn from Big Data.
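Figures like these are simple frequency counts over historical game records, which is part of why they hardly need Big Data. A minimal sketch, assuming hypothetical per-game boolean fields `favorite_won` and `first_scorer_won` and a made-up four-game sample:

```python
from dataclasses import dataclass

@dataclass
class Game:
    favorite_won: bool      # hypothetical field: did the oddsmakers' pick win?
    first_scorer_won: bool  # hypothetical field: did the team that scored first win?

def win_rate(games, attr):
    """Fraction of games in which the boolean field `attr` is True."""
    if not games:
        raise ValueError("no games")
    return sum(getattr(g, attr) for g in games) / len(games)

# Made-up sample of four games, for illustration only.
games = [
    Game(True, True),
    Game(True, False),
    Game(False, True),
    Game(True, True),
]
print(win_rate(games, "favorite_won"))      # 0.75
print(win_rate(games, "first_scorer_won"))  # 0.75
```

Run over a full archive of games, the same count yields base rates like the 73% and 75% quoted above.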

It was in the SAP Business Trends ezine that we learned that football hero Marshall Faulk (NFL MVP in 2000, Offensive Player of the Year in 2001, Hall of Fame 2011) is a fan of the fantasy football Player Comparison Tool. The Player Comparison Tool is a joint SAP/Intel venture launched in August 2013. Hosted exclusively at a single site (U.S. Alexa rank: 77), the tool is described as generating its analysis from:

  • Past performance
  • Strength of schedule
  • Consistency
  • Upside
  • Intangibles

According to its sponsors, the Player Comparison Tool runs on a Big Data software suite built on Intel servers and SAP HANA, with visualizations in SAP Lumira. Other fantasy football “infrastructure” includes Hadoop. Sounds like Big Data.
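The tool’s actual scoring method is not described here, but one simple way to turn the five listed factors into a head-to-head comparison is a weighted sum. A minimal sketch, assuming each factor is already normalized to the range 0 to 1; the weights and player values are hypothetical:

```python
# Hypothetical factor weights (they sum to 1.0); not the tool's real method.
WEIGHTS = {
    "past_performance": 0.35,
    "strength_of_schedule": 0.20,
    "consistency": 0.20,
    "upside": 0.15,
    "intangibles": 0.10,
}

def composite_score(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores, each assumed normalized to 0..1."""
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

def compare(name_a, factors_a, name_b, factors_b):
    """Return the name of the player with the higher composite score."""
    a, b = composite_score(factors_a), composite_score(factors_b)
    return name_a if a >= b else name_b

# Made-up factor values for two players.
player_a = {"past_performance": 0.9, "strength_of_schedule": 0.5,
            "consistency": 0.8, "upside": 0.6, "intangibles": 0.5}
player_b = {"past_performance": 0.7, "strength_of_schedule": 0.8,
            "consistency": 0.6, "upside": 0.9, "intangibles": 0.7}
print(compare("Player A", player_a, "Player B", player_b))
```

Here Player B edges out Player A (about 0.730 vs. 0.715) despite a weaker track record, because schedule and upside carry real weight; changing the weights changes the verdict, which is where the modeling judgment lies.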

Supersize My Data

Still, some researchers believe current analytics are comparatively shallow. Professor Aaron Clauset of the University of Colorado Boulder (@aaronclauset) and collaborator Sears Merritt (@searsmerritt) looked at 1.25 million scoring events across 40,000 games. Some data scientists might not count that as Big Data, but the pair’s insights may interest sports bettors accustomed to spreadsheet-style DIY analytics.

As Clauset told Slate’s Joel Warner,

I’ve never really been a sports person. . . I always took a dim view of data analysis of sports statistics. A lot of it tends to focus on numbers about the players or about the teams with uncertain relevance to game outcomes or game dynamics.

Clauset and Merritt’s mathematical model predicted outcomes for college and pro football, NHL and NBA games. Its predictions were better than one betting site’s pregame odds and comparable to those of Bovada, a live-betting website.
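The source does not say which metric Clauset and Merritt used, but a common way to judge whether one set of probability forecasts is “better” than another, such as odds-implied probabilities, is a proper scoring rule like the Brier score. A minimal sketch with made-up forecasts for four games:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; always forecasting 0.5 scores exactly 0.25."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical home-win probabilities and actual results (1 = home win).
outcomes = [1, 0, 1, 1]
model = [0.7, 0.4, 0.6, 0.8]
odds_implied = [0.6, 0.5, 0.5, 0.7]

print(brier_score(model, outcomes), brier_score(odds_implied, outcomes))
```

On this toy sample the model scores about 0.11 and the odds about 0.19, so the model’s forecasts would be judged sharper; a real comparison needs thousands of games.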

So what is to be made of traditional sports commentaries? Would Peyton Manning’s greater experience in reading defenses lead to his outperforming Russell Wilson? What is the effect of a single defensive player’s injury? Clauset believes so many factors contribute to an outcome that sports events resemble coin flips: they behave as though they were random processes.
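The coin-flip view can be illustrated with a toy simulation. This is not Clauset and Merritt’s published model; it simply assumes each of a fixed number of scoring events independently goes to team A with probability `p_a`, then estimates how often A wins by Monte Carlo:

```python
import random

def simulate_game(p_a: float, n_events: int, rng: random.Random) -> int:
    """Return +1 if team A wins, -1 if team B wins, 0 for a tie.
    Toy assumption: each scoring event independently goes to A with
    probability p_a, and every event is worth the same."""
    lead = 0
    for _ in range(n_events):
        lead += 1 if rng.random() < p_a else -1
    return (lead > 0) - (lead < 0)

def win_probability(p_a: float, n_events: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the probability that team A wins."""
    rng = random.Random(seed)
    wins = sum(simulate_game(p_a, n_events, rng) == 1 for _ in range(trials))
    return wins / trials

print(win_probability(0.50, 9))
print(win_probability(0.55, 9))
```

With nine scoring events, a team that is favored on every single score (`p_a = 0.55`) still wins only about 62% of the time (the exact binomial value is roughly 0.621), which is the sense in which outcomes look close to coin flips.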

Clauset and Merritt have also studied competitive online games, where the underlying digital framework provides the opportunity for far greater data gathering, both in quantity and precision. Merritt describes the goal of this research:

Competition is ubiquitous in complex social systems, from informal online environments to professional sports, to economic interactions between firms. Traditional studies of competition use theory and small-scale controlled experiments to study the outcomes of competition, not the dynamics that occur within them. Here we are studying the dynamics of competition using a rich and vast data set from the video game Halo: Reach, novel quantitative methods, and big data processing systems (e.g. Hadoop, Hbase). There are two goals of this research. The first is to produce data-driven mathematical models that contribute to a general understanding of the interactions between participants and the environment. The second is designing new algorithms and tools that have the ability to predict, influence, and control the dynamics in these systems. In doing this, we will gain the ability to systematically design competitive social systems to the preferences of its participants as well as identify interesting behaviors that take place within them.

Pair this analysis with the social AI of today’s massively multiplayer games and there’s little doubt that you’re on Pro Big Data turf.

Their paper was published in the Journal of Quantitative Analysis in Sports, a journal of the American Statistical Association.
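Hadoop-style processing of the kind Merritt mentions follows the MapReduce pattern: a map step emits key/value pairs per record, the framework groups them by key, and a reduce step aggregates each group. A single-process sketch of that pattern in plain Python (not the Hadoop or HBase API), assuming a hypothetical event schema of `(match_id, killer, victim)`:

```python
from collections import defaultdict
from itertools import chain

# Hypothetical game events: (match_id, killer, victim).
events = [
    ("m1", "alice", "bob"),
    ("m1", "bob", "alice"),
    ("m1", "alice", "carol"),
    ("m2", "carol", "alice"),
]

def map_phase(event):
    """Emit (player, (kills, deaths)) pairs for one event."""
    _match, killer, victim = event
    return [(killer, (1, 0)), (victim, (0, 1))]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum kills and deaths for one player."""
    kills = sum(k for k, _ in values)
    deaths = sum(d for _, d in values)
    return key, (kills, deaths)

def run(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

print(run(events))
```

On a cluster, the map and reduce functions stay this simple; the framework distributes records across machines and performs the grouping step.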

Fantasy Fan Fodder

Fantasy football leagues have grown so rapidly that there are now an estimated 25 million participants generating more than $1B annually. According to the SAP/Intel team, surveys of these fans demonstrate an appetite for real-time data to improve their league performance. Hall of Fame wide receiver Jerry Rice was recruited to score the point. A fully persuaded Rice told eWeek that “With all the stats and data the fans now have, as opposed to when I played, it’s amazing all the opportunities that are now available.”

Data volume and velocity are increasing. The systems behind fantasy football compile standings and individual player statistics from “more than 5 million sources,” according to the eWeek report. Each game adds another 20,000 data points, some as specific as a quarterback’s completion rate on third-down plays run to the left side of the field in the fourth quarter, a story in SFGate gushes.
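A stat that specific is a filter plus a ratio over play-by-play records. A minimal sketch, assuming hypothetical fields `quarter`, `down`, `direction` and `completed` on each pass attempt and a made-up sample:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pass:
    quarter: int
    down: int
    direction: str   # hypothetical encoding: "left", "middle" or "right"
    completed: bool

def completion_rate(passes, *, quarter, down, direction):
    """Completion rate for passes matching all three conditions, or None if none match."""
    subset = [p for p in passes
              if p.quarter == quarter and p.down == down and p.direction == direction]
    if not subset:
        return None
    return sum(p.completed for p in subset) / len(subset)

# Made-up play-by-play sample.
passes = [
    Pass(4, 3, "left", True),
    Pass(4, 3, "left", False),
    Pass(4, 3, "left", True),
    Pass(4, 3, "right", True),
    Pass(2, 3, "left", False),
]
print(completion_rate(passes, quarter=4, down=3, direction="left"))
```

Slices this narrow quickly run short of samples, which is one reason the systems behind them keep collecting more data points per game.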

Coach’s Coach

The Motley Fool’s Jake Mann was somewhat less impressed with the current crop of fantasy football tools. What brought out his fist pump was scoutPRO, a product from Competitive Sports Analysis (CSA), part of Gannett’s USA Today Sports Media Group. Sure, scoutPRO helps fantasy football and baseball fans improve their rosters, and Mann used it himself to help win his fantasy league. But he was more impressed by a demo of scoutPRO’s Coaching Edition, which at least one NFL team is testing.

This version of scoutPRO can be customized for players, style of play (e.g., spread offense vs. wishbone), optimal play calling (when to use a wildcat formation) and division scenarios that develop as playoff matchups become clear. More data on individual player performance could improve outcomes for contract negotiations, trades, even changes of position. Mann also believes the Big Data model could improve recruiting for college teams, where standardized performance data on high school athletes is difficult, if not impossible, to obtain.

High School Player Evaluation: Next Frontier?

Wired Sports, Streamed Live

What most prognosticators fail to see, though, is the ultimate data-enabled football game. Imagine real time data streaming from every player’s helmet while the Big Game is in progress. Read out foot-pounds of energy expended in every tackle or block, compared to every past collision of those players. See real time predictions for every facet of every play in multiple dashboards, updated as the game proceeds. See optimum receiver placement highlighted as defensive backs change position.
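Helmet sensors would not read out “energy expended” directly; one simple proxy a dashboard could show is a player’s kinetic energy at the moment of contact, KE = ½mv², which comes out in foot-pounds when mass is in slugs (weight in pounds divided by 32.174 ft/s²) and speed is in feet per second. A sketch, assuming weight and closing speed are known for each player:

```python
G_FT_PER_S2 = 32.174                 # standard gravity, converts weight (lb) to mass (slugs)
FT_PER_S_PER_MPH = 5280 / 3600       # 1 mph = 1.4667 ft/s

def kinetic_energy_ft_lb(weight_lb: float, speed_mph: float) -> float:
    """Kinetic energy in foot-pounds: KE = 1/2 * m * v**2,
    with mass in slugs and speed in ft/s."""
    mass_slugs = weight_lb / G_FT_PER_S2
    speed_fps = speed_mph * FT_PER_S_PER_MPH
    return 0.5 * mass_slugs * speed_fps ** 2

print(round(kinetic_energy_ft_lb(250, 20)))
```

A 250-pound player moving at 20 mph carries roughly 3,340 foot-pounds; because speed is squared, doubling it quadruples the energy, which is why speed matters more than size in a collision readout.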

Football’s play-stop-play format is far more conducive to data-driven strategy than the more fluid action of hockey, basketball, soccer or baseball. Those sports have called plays, but none has football’s clearly delineated play-by-play format. That format allows a data reset between plays, so strategy and counter-strategy can incorporate the results of every previous play.
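The between-plays reset can be sketched as a running estimate that is updated after every snap. This is a hypothetical illustration, not how any team or vendor does it: each play type keeps a success rate with add-one (Laplace) smoothing, and the caller greedily picks the play with the best current estimate. The play names are made up:

```python
from collections import defaultdict

class PlayTracker:
    """Running success estimates per play type, updated after every play.
    Add-one smoothing means an untried play starts at 0.5."""

    def __init__(self):
        self.successes = defaultdict(int)
        self.attempts = defaultdict(int)

    def record(self, play: str, success: bool) -> None:
        """Fold one play's result into the running totals."""
        self.attempts[play] += 1
        self.successes[play] += int(success)

    def estimate(self, play: str) -> float:
        return (self.successes[play] + 1) / (self.attempts[play] + 2)

    def best(self, options):
        """Greedy choice: the option with the highest current estimate."""
        return max(options, key=self.estimate)

tracker = PlayTracker()
tracker.record("inside run", False)
tracker.record("inside run", False)
tracker.record("screen pass", True)
print(tracker.best(["inside run", "screen pass", "play action"]))
```

After two stuffed runs and one successful screen, the estimates are 0.25, about 0.67 and 0.5, so the screen pass comes out on top. A purely greedy rule is easy for a defense to read, so a real counter-strategy would also need to account for the opponent adjusting.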

Then imagine all this feeding the countless fantasy leagues played simultaneously. That ain’t no trash talk.

Mark Underwood writes about knowledge engineering, Big Data security and privacy.

