Super Bowl Prediction Model – Towards Data Science

Predicting the winner of the 2019 Super Bowl based on regular season data, 1966–2019

Matthew Littman
NFL Super Bowl Trophy (Lombardi Trophy). Image source: Wikipedia

Written By:

Matthew Littman, M.S. Business Analytics, UCI

Addy Dam, M.S. Business Analytics, UCI

Contributor: Vedant Kshirsagar, M.S. Business Analytics, UCI

The Super Bowl is an enormously popular sporting event that takes place each year to determine the championship team of the National Football League (NFL). Millions of fans gather around televisions on a Sunday in February to celebrate this de facto national holiday. Broadcast in more than 170 countries, the Super Bowl is one of the most-watched sporting events in the world, featuring elaborate halftime shows, celebrity appearances, and hilarious commercials that add to the appeal. After more than 50 years of existence, the Super Bowl has become a legendary symbol of American culture.

With a regular season that begins in September, the Super Bowl closes out the season. The telecast of last year’s Super Bowl LIII drew a worldwide TV audience of about 98.2 million viewers. This makes it one of the trendiest topics of the year. For this reason, our team decided to build a prediction model to forecast the winner of Super Bowl LIV (54) using the data published on the official National Football League website, as well as several others.

The Super Bowl is a huge celebration event for Americans. With 98.2 million viewers worldwide, the Super Bowl is the most watched sporting event in the US. Major corporations are paying $5 to $5.5 million for a 30-second commercial, and the price continues to rise. These huge numbers and our passion for the game drew our attention to this project. This year also marks the 100th anniversary of America’s favorite sport.

NFL 100 year anniversary logo. Image source: Wikipedia

We used three different websites to collect all of our data. Using the Beautiful Soup package in Python, we began to scrape. We wanted to collect all stats from the past, starting with the first Super Bowl of 1966 and leading up to the present (Week 13 of the NFL season). Starting with the official NFL website, the stats fell under the two categories of Offense and Defense, with 11 tables for offense and 8 tables for defense. The official NFL website was our main resource for scraping the historical stats because of its extensive stat records and cleanliness. The only downfall of the website was that it did not contain the records of wins, losses, and ties for each team. Our project required these important metrics, so we included data from a website called Pro Football Reference. After scraping the records for each team dating all the way back to 1966, the last thing we needed was to know which teams actually won the Super Bowl in the past. After gathering all of this data, we merged the information on Team and Year so that each row represents one team in one year with all of its corresponding stats. After sorting by ascending year, our dataset contained 1,579 rows and 242 columns, almost 400k data points. It was ready for cleaning.
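
The merge on Team and Year can be sketched with pandas. This is a minimal illustration under stated assumptions, not our scraping code: the table names, the column names, and the `SB_Winner` flag are hypothetical, and the two tiny tables stand in for the scraped sources. The win totals follow from the 1972 Dolphins and 2007 Patriots seasons discussed later in this article.

```python
import pandas as pd

# Hypothetical stand-ins for two of the scraped sources.
records = pd.DataFrame({
    "Team": ["Miami Dolphins", "New England Patriots"],
    "Year": [1972, 2007],
    "Wins": [14, 16],
    "Losses": [0, 0],
})
champions = pd.DataFrame({
    "Team": ["Miami Dolphins"],
    "Year": [1972],
    "SB_Winner": [1],
})

# One row per (Team, Year); teams absent from the champions table get 0.
merged = (
    records
    .merge(champions, on=["Team", "Year"], how="left")
    .fillna({"SB_Winner": 0})
    .astype({"SB_Winner": int})
    .sort_values("Year")
    .reset_index(drop=True)
)
print(merged)
```

A left join keeps every team-season even when it has no match in the champions table, which is what a one-row-per-team-per-year layout needs.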

Figure 1. Scraped data sources
  • Column names were not intuitive and needed to reflect where the information came from.

After further exploration, it was apparent that certain stats were not recorded until certain years. For example, most of the defense and kicking stats were not recorded until 1991. The offensive line stats were not recorded until 2009. In total, 89 of our 242 columns contained missing data, each up to a different year.

Each team’s stats are independent events relative to that team for a specific year. This means that all calculations or comparisons must be done after grouping by year. We did not drop any columns yet, as we planned to build the model to select only the most important variables.

The first thing we looked at was the Super Bowl winners over the past 53 years, from 1966 to 2018. Figure 2 shows that the Pittsburgh Steelers and the New England Patriots are leading the pack with 6 wins each, followed by the San Francisco 49ers and the Dallas Cowboys with 5 wins each. The New York Giants and the Green Bay Packers are close behind with 4 wins each. The Washington Redskins and Denver Broncos each sit at 3 wins, followed by the Oakland Raiders, Miami Dolphins, and Baltimore Ravens, who each have 2 wins, and the bottom 11 teams each have one win.

Figure 2. Ranking of Super Bowl winners from 1966 to 2018

Figure 3 below shows the distribution of wins and losses of all the teams over the last 53 years. Each red circle represents the regular season record of a team that did not win the Super Bowl, whereas the green circles represent the Super Bowl winners. Note that some of them are stacked because teams can have the same records. Notice that no team has won the Super Bowl with fewer than 9 wins, except the 1982 Redskins, but there were only 9 games in that season due to a strike. There are two key points labeled in Figure 3: the 1972 Miami Dolphins and the 2007 New England Patriots. In 1972, the Miami Dolphins went undefeated in the regular season and won the Super Bowl. After their impressive season, the Dolphins came up with a tradition to celebrate once every team has lost at least once in a season, meaning they remain the only undefeated team. In 2007, the New England Patriots nearly broke that 35-year tradition as they won all 16 games, but unfortunately lost the Super Bowl. It is important to note that there used to be only 14 games in the regular season, but in 1978 this changed to 16 games.

Figure 3. Super Bowl wins and losses from 1966 to 2018

Looking at touchdown statistics, there are 8 ways to score a touchdown in a football game: 4 from the offense and 4 from the defense. As shown in Figure 4, over the entire history of the NFL, receiving and rushing touchdowns make up the majority of all touchdowns, corresponding to 55% and 36% of the total, respectively. This is an average across all teams regardless of Super Bowl winners or losers. Interestingly enough, when looking at specific Super Bowl winning teams and their averages, the central tendency is around 55% and 36%, but the ranges vary greatly. This leads us to believe that the teams that won in those years were doing something different from everyone else and were either running more or throwing more depending on the year. This exploration shows us that a team’s percentage of rushing or receiving touchdowns could be an important predictor of the Super Bowl winner.

Figure 4. Distribution of 8 types of touchdowns

We looked at several statistics that differentiate Super Bowl winners from everyone else. It is natural that the win count for the regular season is a big factor in deciding who will win the Super Bowl. Historically, Super Bowl winners make 3.18 more Offense Rushing Attempts per game on average. This suggests that teams should attempt more running plays. Offense Passing average is also 0.75 yards higher among Super Bowl winners than among non-winners. This suggests that teams should run plays where they throw the ball farther to increase their chances of winning.

The stark difference between winners and losers for Offense Game Stats Turnovers suggests that turnovers are a huge differentiator when it comes to who wins the Super Bowl. Lastly, Defense Passing Sacks are 7.15 points higher among Super Bowl winners compared to everyone else. Strong defenses lead to more sacks, which can potentially lead to more turnovers!

Table 1. Comparison of key statistics between Super Bowl winners and losers

After exploring all of the data, it was difficult to find trends or correlations among all of the variables because of the sheer volume. With over 240 stats, the insight that could be gained was mostly based on prior knowledge of the sport. This game sense helped direct our attention to those fields.

The first issue that needed to be addressed within our dataset was the missing data. The problem with dropping the columns that contained missing values is that many of them capture crucial aspects of the game, and dropping them would discard information for the years where we did have values. Filling these values with the means of their respective columns did not make sense because some columns contained about half missing values and half present values, depending on when the NFL started recording them. Using the mean would severely skew the results of those columns, but most importantly, team performance varies greatly throughout the years and a mean does not reflect that variation. The missing stats should reflect the team’s performance for that year, which led us to use more advanced predictive techniques to fill in the missing data.

Linear Regression

The first approach used a linear regression where the dependent variable was one of the columns that contained missing data, and the independent variables were the other columns. The model was trained on the existing data for those columns so that each missing observation would be predicted from the other columns for that team and that year. It should be noted that for all of the predictive imputation methods, the data was normalized within each year in order to adjust for scaling issues when predicting. To further boost predictive accuracy, a technique known as recursive feature elimination was performed for each column. Recursive feature elimination (or RFE) is a way to reduce the number of columns to only the relevant columns. It does this by taking a subset of all of the columns and then allows you to assign a decision function that decides which columns of that subset are the most important. In our case, we used a logistic regression, which returns the variables with the highest coefficients. These coefficients measure the overall impact of that variable on the dependent variable. After removing the least impactful column among that subset, the process repeats by taking a different subset. It repeats this process until the number of important columns that are left equals the number of columns that the user specifies, which in our case was 20. This means that each column with missing data that was to be predicted used a different set of columns, specific to that column, for its linear prediction. After predicting all of the missing data points, the points had to be re-aggregated with the old dataframe. Unfortunately, after comparing the existing values to the predicted values, it was clear that the predicted results were not consistent with the existing values and therefore could not be used.
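
The core of the regression imputation idea can be sketched as follows. This is a simplified illustration, not our project code: it assumes scikit-learn’s `LinearRegression`, assumes the predictor columns are complete, and leaves out the per-column feature selection step. The function name `impute_with_regression` and the toy values are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def impute_with_regression(X, target_col):
    """Fill NaNs in one column by regressing it on the other columns.

    Assumes every other column is complete for every row.
    """
    X = X.astype(float).copy()
    missing = np.isnan(X[:, target_col])
    predictors = np.delete(X, target_col, axis=1)

    if missing.any():
        # Train only on rows where the target column was recorded.
        model = LinearRegression()
        model.fit(predictors[~missing], X[~missing, target_col])
        # Predict the target for rows where it was never recorded.
        X[missing, target_col] = model.predict(predictors[missing])
    return X


# Toy data (made-up values) where column 2 = column 0 + 2 * column 1.
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 9.0],
    [4.0, 0.0, 4.0],
    [2.0, 2.0, np.nan],
])
filled = impute_with_regression(data, target_col=2)
print(filled[4, 2])
```

On real data the relationship is never this clean, which is one reason the predicted values can end up inconsistent with the recorded ones.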

K-Nearest Neighbors

After consultation with a previous statistics professor, Mohamed Abdelhamid, it was suggested that we use the method of K-Nearest Neighbors (or KNN). This technique looks at every observation as a set of all of its attributes and plots each observation in n-dimensional space, where n is the number of columns. For the values that are missing, KNN attempts to solve this by finding the minimum distance between the current observation with its missing value and its K nearest neighbors, where K is the number of neighbors that you are looking for. An example is shown in Figure 5, where the black value is the observation whose attribute we wish to impute.

Figure 5. KNN visual for classification in order to impute missing values.

Once the number of neighbors has been set, the algorithm takes the average of all of the neighbors’ values for the value you are trying to replace. In our case, we used the 5 nearest neighbors. If multiple columns need imputed values, as in our case, the algorithm sorts the columns that need replacing from the least number of values needed to the most. This algorithm only gets better with fewer missing values for a column, so this sorting helps by doing the best predictions first. This method worked very well in theory, but again could not be used because the predicted values were not consistent with the values that already existed.
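
A small sketch of neighbor-averaged imputation is shown below. It uses scikit-learn’s `KNNImputer` as a stand-in, which is an assumption: it may differ in details (such as how columns are ordered) from the implementation we used. The toy matrix is made up, and `n_neighbors=2` is used only because the example has so few rows.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up normalized stats; the last row is missing its third value.
X = np.array([
    [0.10, 0.20, 0.30],
    [0.12, 0.22, 0.34],
    [0.90, 0.80, 0.70],
    [0.11, 0.21, np.nan],
])

# Distances are measured on the columns both rows have in common,
# and the missing value becomes the mean of the nearest neighbors.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled[3, 2])
```

Here the two closest rows are the first two, so the imputed value is the average of 0.30 and 0.34.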

The two previous imputation methods had not worked, so a final method named multiple imputation by chained equations (or MICE) was tried. This method is also known as “fully conditional specification” or “sequential regression multiple imputation”. In their 2011 article about MICE, Azur et al. explain that this technique has been used for data that is missing at random. Our data is missing not at random (MNAR), but further research has shown that it depends on the dataset and that others have used it for data that is MNAR. The Python package “fancyimpute” was used for the MICE algorithm in this project.

MICE can be thought of as a combination of the linear regression technique that we came up with earlier and the KNN algorithm. It takes each column as the dependent variable and runs a regression, in this case a Bayesian ridge regression. This is done in order to use the other attributes related to that row to predict that value. The user specifies how many times they want the algorithm to cycle, and it creates one data frame per cycle. In detail, the algorithm works as follows:

  • Take each column with missing data (one out of our 89), and run the MICE algorithm for each of those columns.

Figure 6 gives a visual for how the algorithm works. In the figure, the number of cycles has been set to 3, thereby creating 3 data frames, rather than the maximum of 1,000 used in our case.
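
The chained-equations idea can be sketched with scikit-learn’s `IterativeImputer` and a `BayesianRidge` estimator. This is an illustration under assumptions, not our fancyimpute code: the synthetic two-column data is made up, and averaging the completed datasets at the end is one possible pooling choice rather than a description of what we did.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Synthetic data: the second column is roughly half of the first.
rng = np.random.default_rng(0)
n = 200
a = rng.uniform(0, 1, n)
b = 0.5 * a + rng.normal(0, 0.01, n)
X = np.column_stack([a, b])

X_missing = X.copy()
X_missing[:20, 1] = np.nan  # hide 20 values so they can be imputed

cycles = 3  # one completed dataset per cycle, as in Figure 6
completed = []
for seed in range(cycles):
    imputer = IterativeImputer(
        estimator=BayesianRidge(),
        sample_posterior=True,  # draw imputations instead of point estimates
        max_iter=10,
        random_state=seed,
    )
    completed.append(imputer.fit_transform(X_missing))

# Assumption: pool the completed datasets by averaging them.
X_imputed = np.mean(completed, axis=0)
print(np.abs(X_imputed[:20, 1] - X[:20, 1]).max())
```

Drawing from the posterior with a different seed per cycle is what makes the completed datasets differ from one another.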

Figure 6. Visual representation of MICE algorithm with cycles = 3 for one column (Stef Van Buuren, 2011)

Evaluating the results of this method proved very promising. Most of the values made sense within their respective rows, and because the data was normalized before being fed into the algorithms, the resulting outputs followed a basic normal distribution, with the predicted values mostly falling within 1 standard deviation to the left and right of the mean, or between 0.3 and 0.7.

Figure 7. Normal distribution of the imputed output predictions. Image source:

Now that the missing data had been handled, it was time to deal with the unbalanced data issue. There have only been 53 past Super Bowl winners, and with 32 teams per season (there were fewer teams in earlier years), this leads to an imbalance of 1,525 teams being “non-winners” and 53 teams being winners. With only about 3% of our total population being winners, it is very difficult to have a model predict these values without intervention. This is similar to the problem of labeling transactions as fraudulent, where most of the transactions are genuine and only a few are fraudulent. Granted, unlike with the transactions, there is not a huge cost associated with predicting a Super Bowl winner as a non-winner. This imbalance makes modeling difficult. Below are some methods to deal with unbalanced data and their efficacy:

Table 2. Summary table of sampling methods described in original SMOTE research paper (Chawla, Bowyer, Hall, Kegelmeyer, 2002)

After researching the possible methods of sampling, whether oversampling the minority (which does not lead to better minority detection), undersampling the majority, or some combination of the two, the Synthetic Minority Oversampling Technique (or SMOTE) proved to be the best option.

In addition to undersampling the majority as mentioned in (Chawla et al, 2002), the minority oversampling method is the true genius behind this sampling technique. The first step is to decide how much oversampling needs to be done. In our case, we wanted a 50/50 split between “winners” and “non-winners”. SMOTE was implemented using the “imblearn” package in Python.

For example purposes, we will oversample the minority by 200%. This means that each of the minority points will be responsible for creating 2 more synthetic points in order to triple our count of minority data points. Next, the user must decide how many nearest neighbors to use for the generation; in (Chawla et al, 2002) they used 5. After these decisions, the algorithm does the following:

  • Select the first minority point to use.

(Note that these nearest neighbors are only the minority neighbors, not all points.)

  • Draw edges between these 5 points.
Figure 8. Synthetic samples visual from Kaggle resampling strategies article (Rafjaa, 2017)
Table 3. SMOTE results before and after
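
The 200% oversampling example can be sketched in NumPy, following the general recipe from Chawla et al. (2002): each minority point picks one of its nearest minority neighbors and a synthetic point is placed at a random position on the edge between them. This is a simplified illustration, not the imblearn implementation we used; the function name `smote_sketch` and the toy points are made up.

```python
import numpy as np


def smote_sketch(minority, n_new_per_point=2, k=5, seed=0):
    """Create synthetic minority points by interpolating toward neighbors."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than points
    synthetic = []
    for i, point in enumerate(minority):
        # Distances to the other minority points only (never the majority).
        dists = np.linalg.norm(minority - point, axis=1)
        dists[i] = np.inf  # exclude the point itself
        neighbor_idx = np.argsort(dists)[:k]
        for _ in range(n_new_per_point):
            neighbor = minority[rng.choice(neighbor_idx)]
            gap = rng.uniform(0, 1)  # random position along the edge
            synthetic.append(point + gap * (neighbor - point))
    return np.array(synthetic)


# Six made-up minority points inside the unit square.
minority = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
    [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
])
new_points = smote_sketch(minority, n_new_per_point=2, k=5)
print(len(minority) + len(new_points))  # minority count after oversampling
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.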

As mentioned above in the linear regression portion of the data imputation section, recursive feature elimination (RFE) is a convenient way of reducing the number of columns to only those that matter most. Now that the missing data had been filled in and the unbalanced minority had been balanced, we ran RFE to filter out the top 20 most impactful columns for who will win the Super Bowl. They are as follows:

Figure 9. Top 20 most impactful columns after using RFE
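
A minimal sketch of RFE with a logistic regression estimator, using scikit-learn, is shown below. The data is synthetic and the target is constructed so that only two columns matter; in our project the selector was asked for 20 columns rather than 2. scikit-learn’s `RFE` fits the estimator on the remaining columns, drops the one with the smallest coefficient magnitude, and repeats.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_rows, n_cols = 400, 10
X = rng.normal(size=(n_rows, n_cols))

# Made-up target: only columns 0 and 1 carry signal.
signal = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (signal + rng.normal(scale=0.5, size=n_rows) > 0).astype(int)

selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=2,  # our project used 20
    step=1,                  # remove one column per round
)
selector.fit(X, y)

kept = np.flatnonzero(selector.support_)
print(kept)  # indices of the columns that survived elimination
```

The `ranking_` attribute of the fitted selector records the order in which the other columns were eliminated, which can help when inspecting why a column was dropped.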

With the 20 most impactful columns selected, all of the coefficients make sense except for the offensive line stats. These stats only started being recorded in 2009 and therefore had the most missing values. When these columns are taken out, however, the coefficients of the other columns no longer make sense. Wins became negative, and as previously mentioned in our exploratory analysis, this should clearly be a positive coefficient, considering that the more wins you have, the more likely you are to make it to the Super Bowl. Our hypothesis is that these three columns act as added noise that helps the other columns do a better job and avoid overfitting, so we left them in our model. To validate, we checked a correlation table to ensure there was no multicollinearity among the variables. Even though there is a strong negative correlation between wins and losses, we decided to keep both of these as indicators that our model is performing correctly. This correlation table can be found below.

Figure 10. Correlation table with 20 columns used for model

After using RFE, we went back and performed a two-sample independent t-test between Super Bowl winners and non-winners and selected the columns that it labeled as significant. We ran just those selected columns through our model, and we also tried filtering the dataset by those columns and then performing RFE on the t-test filtered results. Overall, the best method in terms of accuracy and recall was to just use RFE.

We have now fixed all of the missing data, balanced our minority class, and selected the 20 most impactful columns to feed into the prediction algorithm. We chose to use a logistic regression because of its explainability and ease of use, although a neural network could have potential given the amount of numeric data that we have. This could possibly be explored in the future.

We used an 80–20% train-validation split, where we trained on a random sample of 80% of the data from 1966–2018 and then validated our model on the other 20%. This produced a test accuracy of 95%.
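
The split-and-validate step can be sketched with scikit-learn. This is an illustration on synthetic data, not a reproduction of our results: the 20 random columns, the made-up target, and the use of `stratify` are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 20 selected columns and a winner flag.
rng = np.random.default_rng(42)
n_rows = 1000
X = rng.normal(size=(n_rows, 20))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n_rows) > 0).astype(int)

# Train on a random 80% of the rows and hold out the other 20%.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_val)
acc = accuracy_score(y_val, y_pred)
cm = confusion_matrix(y_val, y_pred)        # rows: actual, columns: predicted
win_prob = model.predict_proba(X_val)[:, 1]  # probability of the "winner" class
print(acc)
print(cm)
```

The `predict_proba` output is what allows teams to be ranked by their estimated chance of winning rather than only labeled winner or non-winner.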

Table 4. Confusion matrix from logistic regression showing actual versus predicted results
Table 5. Classification report from logistic regression validating accuracy, recall, and precision

With a recall of 89% for non-winners and 100% for winners, as well as an overall accuracy of 95%, our model errs on the side of predicting winners when they are actually non-winners. It also predicted zero winners as non-winners, which means that our model is more likely to say that a team is a winner when it is not. This is actually favorable when considering an actual prediction for a team during the season. We would rather have our model give more teams a “potential” chance than label everyone as non-winners. With a very promising confusion matrix, we plotted our ROC curve, which showed an area under the curve of 0.95.

Figure 11. ROC curve shows an AUC of 0.95

Predicted Winners

After training and validation, it was time to put our model to the test by trying to predict the current 2019 season. Our model is able to rank the teams by their probability of winning the Super Bowl, with the highest-probability teams at the top, as seen in Figure 12. The San Francisco 49ers and the New England Patriots are predicted to have a 96% and 76% chance of winning the Super Bowl, respectively. In the current 2019 NFL playoff standings, the 49ers are slated to be in the Wild Card slot, with a record of 10 wins and 2 losses. The Patriots are one of the division leaders, also with a record of 10 wins and 2 losses. Further, all of the teams to which our model gave a non-zero chance are teams slated to be in the playoffs, or teams with a good chance to make the playoffs. The only exception is the Carolina Panthers.

Figure 12. Winners that our model predicts vs. current NFL results

Predicted Losers

Not only does our model predict the likelihood of teams becoming the champions with high accuracy on our validation data, it also predicts the teams with low to zero chance of winning the Super Bowl. As seen in Figure 13, our model predicts the Lions, Cardinals, Falcons, Giants, Dolphins, and Bengals to have zero chance of winning the Super Bowl, and this makes sense considering they have already been eliminated from even making the playoffs. Our model also predicts several other teams to have zero chance, and these teams are what is known as being “in the hunt”. This means that they still have a chance of making the playoffs if they can improve their records within the last 4 games of the season. Our model does not think that these teams can do that. For Figure 12, the bottom teams have been cut out to improve visibility, and for Figure 13, the middle teams were cut out. This explains the different team names listed in each of the figures.

Figure 13. Losers that our model predicts vs. current eliminated NFL teams

To conclude, the method that gives our model the highest prediction accuracy is a sequential approach using MICE, SMOTE, RFE, and logistic regression. This technique gives us a 0.95 area under the curve, and a recall rate of 0.89 for non-winners and 1.00 for winners. In addition, the F1 scores are 0.94 for non-winners and 0.95 for winners. This statistic represents the harmonic mean of precision and recall.
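
As a quick check of the harmonic mean, the non-winner F1 score can be recomputed from the reported recall. The recall of 0.89 is the value quoted above. The non-winner precision of 1.00 is an inference, not a quoted figure: a winner recall of 1.00 means no winner was labeled a non-winner, so every predicted non-winner was correct. The helper name `f1_from` is made up for this sketch.

```python
def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Non-winner class: recall is reported above, precision is inferred.
f1_non_winners = f1_from(precision=1.00, recall=0.89)
print(round(f1_non_winners, 2))
```

The harmonic mean is pulled toward the lower of the two inputs, so a class cannot score a high F1 on the strength of precision or recall alone.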

Some of the most significant attributes in predicting the winner of Super Bowl 2019 are:

  1. Proportion of offense receiving touchdowns

For future projects, we would like to explore the feasibility of using neural networks with our data, as well as predicting matchups at the game level using player statistics, weather data, whether it is a home or an away game, and team matchup history.

In addition, we think that a quantitative approach to injury prediction, enabling measures for prevention, could be beneficial for players as well as for team management. This would be challenging and may not be possible, considering many injuries happen as accidents rather than through degradation.

Using the same dataset, it would be interesting to discover whether a good defense or a good offense matters more for winning the Super Bowl.

Lastly, we think that building a recommendation system for buying/trading players or building a fantasy football team, similar to Netflix’s advanced recommendation system, could be extremely helpful. A model such as this would need to capture team dynamics and what criteria make up good teams, then evaluate a current team and recommend players based on the team’s weaknesses.

All of our code can be found at


(Currently working on making a web app that will use the code)

Super Bowl History

History.com Editors. “Super Bowl History.” History.com, A&E Television Networks, 11 May 2018, https://www.history.com/topics/sports/super-bowl-history.

Super Bowl Viewership

Perez, Sarah. “Super Bowl LIII Set Streaming Records, While TV Viewership Saw Massive Drop.” TechCrunch, TechCrunch, 5 Feb. 2019,

Multiple Imputation by Chained Equations

Drakos, Georgios. “Handling Missing Values in Machine Learning: Part 2.” Medium, Towards Data Science, 5 Oct. 2018,

Azur, Melissa J, et al. “Multiple Imputation by Chained Equations: What Is It and How Does It Work?” International Journal of Methods in Psychiatric Research, U.S. National Library of Medicine, Mar. 2011,

Buuren, S. van, and K. Groothuis-Oudshoorn. “MICE: Multivariate Imputation by Chained Equations in R.” Stef Van Buuren, 1 Dec. 2011, https://stefvanbuuren.name/publication/2011-01-01_vanbuuren2011a/.

Synthetic Minority Oversampling Technique

Chawla, Nitesh, et al. “SMOTE: Synthetic Minority Over-Sampling Technique.” View of SMOTE: Synthetic Minority Over-Sampling Technique, Journal of Artificial Intelligence Research, June 2002,

Rafjaa. “Resampling Strategies for Imbalanced Datasets.” Kaggle, Kaggle, 15 Nov. 2017,

Recursive Feature Elimination

Li, Susan. “Building A Logistic Regression in Python, Step by Step.” Medium, Towards Data Science, 27 Feb. 2019,

Keng, Brian, et al. Diving into Data, 20 Dec. 2014, https://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/
