Our Data Process
A linear regression is a statistical form of predictive analysis. It involves plotting a data set that pairs one or more explanatory variables, like a person's height, with a response variable, like that person's weight. With technology, we can fit a line to this data that more or less predicts the value of the response variable for any given value of the explanatory variable. The relationship between two variables may be strong or weak, positive or negative, or may even take on a shape other than a line.
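To make the idea of fitting a line concrete, here is a minimal sketch of an ordinary least-squares fit with a single explanatory variable. The function names `fit_line` and `predict` are our own for illustration, and the heights and weights are made-up numbers, not data from this project.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # How x and y move together around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted response for a given explanatory value."""
    return slope * x + intercept


# Made-up heights (inches) and weights (pounds), for illustration only.
heights = [68, 70, 72, 74, 76]
weights = [180, 195, 205, 225, 240]

slope, intercept = fit_line(heights, weights)
print(f"predicted weight at 73 in: {predict(73, slope, intercept):.1f} lb")
```

With more than one explanatory variable the same idea applies, but the fit is usually done with a statistics library rather than by hand.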
We can simplify our project into two variables: PRE-DRAFT SUCCESS (explanatory) and POST-DRAFT SUCCESS (response). By plotting pre-draft success against post-draft success, we can find out which college and NFL Combine statistics correlate strongly with post-draft success and are therefore good predictors of it.
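One common way to measure how strongly a statistic tracks post-draft success is the Pearson correlation coefficient, which runs from -1 to 1. The sketch below ranks a few pre-draft stats by the strength of their correlation with a response value. The stat names (`forty_yard_dash`, `bench_press_reps`), the helper names, and all numbers are hypothetical placeholders, not values from our sheet.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def rank_predictors(stats, response):
    """Return (name, r) pairs sorted by |r|, strongest correlation first."""
    scored = [(name, pearson_r(values, response)) for name, values in stats.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical stat names and made-up values, for illustration only.
pre_draft = {
    "forty_yard_dash": [4.4, 4.5, 4.6, 4.7, 4.8],
    "bench_press_reps": [20, 15, 25, 18, 22],
}
post_draft_grade = [85, 80, 72, 70, 62]

for name, r in rank_predictors(pre_draft, post_draft_grade):
    print(f"{name}: r = {r:+.2f}")
```

A negative value can still be a strong predictor (a lower 40-yard dash time is better), which is why the ranking uses the absolute value of r.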
Before we ask computers to do much of the heavy statistical work for us, we must find enough data to determine a player's PRE-DRAFT SUCCESS and POST-DRAFT SUCCESS. It is often said that for every explanatory variable in a regression, you should have at least 50 data points. With upwards of 30 explanatory variables, we needed at least 1,500 players' worth of data. This took the form of the 2012, 2013, and 2014 draft classes (as well as our explanatory, pre-draft stats for 2022).
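The sample-size estimate is simple multiplication, sketched here so the rule of thumb can be re-run with a different number of variables. The function name `min_players_needed` is our own, and the default of 50 is just the rule of thumb quoted above.

```python
def min_players_needed(n_explanatory, points_per_variable=50):
    """Rule-of-thumb minimum sample size for a regression."""
    return n_explanatory * points_per_variable


print(min_players_needed(30))  # 30 variables * 50 points each = 1500
```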
Below you can find information about specific statistics throughout our process AND our full sheet of data.
Check out our data:
Pre-draft statistics
Our pre-draft statistics contained stats from each individual player's NFL Combine performance and college career. The NFL Combine is an organized pre-draft skills test put on by the NFL. Scouts and front office members watch players from the upcoming draft run drills that test speed, strength, and technique. College stats included position-specific stats from the two football seasons prior to the player entering the draft. These were collected from all over the internet, as there is no central database for all of these statistics. Many players attended small, unknown colleges, and it took specific Google searches for specific players and stats.
Post-draft statistics
Figuring out how to quantify a player's NFL success was one of the first hurdles of this project. After realizing that our expected data collection may have just doubled, we began looking for a single way to quantify an entire career. This led us to a stat called PFF (Pro Football Focus) Grade. This grade is created by statisticians at PFF who watch the film and grade each player's impact on every play. It is averaged over a game or season, giving an overall PFF Grade. We settled on it because so much laborious work goes into a single descriptive stat that essentially judges how 'good' a player is, which is exactly what we needed. We averaged each player's best five qualifying years in their career and weighted each grade based on how many years they qualified. Essentially, the PFF Grade is our response variable.
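As a rough sketch of how a career number like this could be computed, the function below averages a player's best five qualifying season grades and then scales the result by how many of those five seasons the player actually has. The scaling rule is an assumption made for illustration, since the exact weighting is not spelled out here, and the name `career_grade` and the sample grades are hypothetical.

```python
def career_grade(season_grades, max_seasons=5):
    """Average the best qualifying season grades, scaled by seasons available.

    ASSUMPTION: the weight is (seasons used / max_seasons). This is one
    possible reading of "weighted by how many years they qualified",
    not a confirmed formula.
    """
    if not season_grades:
        return None  # no qualifying seasons, so no career grade
    # Keep only the highest grades, up to max_seasons of them.
    best = sorted(season_grades, reverse=True)[:max_seasons]
    average = sum(best) / len(best)
    weight = len(best) / max_seasons
    return average * weight


# Made-up season grades, for illustration only.
print(career_grade([80, 70, 90, 60, 75, 85]))  # six qualifying seasons, best five used
print(career_grade([80, 70]))                  # only two qualifying seasons
```

Under this assumed weighting, a player with only two qualifying seasons ends up with a lower career grade than a player with the same average over five, which rewards longevity.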