Where does your team stand...
This is the third and final article in a series in which I use data from the 2019 season to determine which variables affected NFL franchises the most, along with other forms of analysis.
The 2020 NFL season is upon us, and despite my reservations about whether the season will start, let alone finish, now is an appropriate time to group the NFL teams from the 2019 season. For this statistical analysis, I will create a cluster dendrogram using four variables: win percentage, point differential, OSRS (Offensive Simple Rating System), and DSRS (Defensive Simple Rating System).
Important Notes:
The main purpose of this article is to group teams based on their similarities in the variables mentioned above. Skip to Model Interpretation if you do not want to read the statistics-heavy sections.
Only regular-season statistics used
All data from Pro Football Reference
As previously mentioned in my article “Does Strength of Schedule Impact Win Percentage in the NFL?”, win percentage is the proportion, in decimal format, of games an NFL team won during the 2019 regular season. The distribution is typically approximately normal, symmetric, and centered around 0.5, as seen in the histogram.
As previously mentioned in my article “Evaluating the Predictability of Win Percentage with Point Differential in the 2019 NFL Season”, point differential is the sum of a team’s margins of victory and defeat over an entire season. As expected, the center is about zero, and the distribution appears to be approximately normal (or perhaps slightly skewed right).
Point differential was removed from the final model.
OSRS measures the quality of a team’s offense relative to the average (calculated by Pro Football Reference). Any team above zero is above average and any team below zero is below average. As the histogram shows, the distribution is approximately normal with a potential outlier at 11, the Baltimore Ravens.
DSRS measures the quality of a team’s defense relative to the average (calculated by Pro Football Reference). Any team above zero is above average and any team below zero is below average. As the histogram shows, the distribution is approximately normal with a potential outlier at -9.1, the Miami Dolphins.
Although most people, including myself, would say that numerous factors besides a team’s quality lead to winning, I decided to include win percentage because it loosely represents the formidability of a team.
I included OSRS because I needed to represent both sides of the football, and OSRS was how I could incorporate offense.
I included DSRS in the model because it was a simple way to represent the defensive side of the football.
I removed point differential because all of the other variables were highly correlated with it. In fact, all of the correlation coefficients were at least 0.8, so it added little information that the remaining variables did not already carry.
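The correlation check can be sketched in a few lines. The original analysis was done in R; this is a Python illustration only, and the five data points are made-up values for hypothetical teams, not real 2019 figures. The `pearson` helper and the 0.8 cutoff variable are names introduced here for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values for five hypothetical teams, NOT real 2019 data.
point_diff = [120, 45, 0, -60, -110]
win_pct = [0.813, 0.625, 0.500, 0.375, 0.250]

r = pearson(point_diff, win_pct)

# Assumed rule of thumb mirroring the article: treat |r| >= 0.8 as "highly correlated".
CUTOFF = 0.8
is_redundant = abs(r) >= CUTOFF
print(f"r = {r:.3f}, redundant: {is_redundant}")
```

When a variable is this strongly correlated with the others, keeping it mostly repeats information the distance calculation already has from them.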
In preparation for a cluster dendrogram, I had to standardize all of the data points for the variables I had selected. I then applied the Euclidean distance formula to the standardized values, comparing each franchise against every other franchise.
Standardization (sometimes known as z-scoring) means taking a team’s individual data point or statistic, subtracting the mean from it, and then dividing the difference by the standard deviation.
Eagles win percentage 2019 = 0.563
Mean win percentage 2019 = 0.500
Standard deviation win percentage 2019 = 0.198
Standardized Score for the Eagles = (0.563 - 0.500)/0.198 ≈ 0.318
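The Eagles calculation above can be reproduced with a small Python sketch (the article's own work was done in R). The function names are introduced here for illustration, and `standardize` assumes the sample standard deviation; the article does not say which form was used.

```python
import statistics

def z_score(value, mean, std_dev):
    """Standardize one value: subtract the mean, then divide by the standard deviation."""
    return (value - mean) / std_dev

# Rounded figures quoted in the article for the Eagles' 2019 win percentage.
eagles_z = z_score(0.563, 0.500, 0.198)
print(round(eagles_z, 2))

def standardize(values):
    """Standardize a whole column of values.
    Assumption: sample standard deviation (n - 1 denominator)."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return [z_score(v, mean, std_dev) for v in values]
```

With the rounded inputs shown, the Eagles come out at roughly 0.32 standard deviations above the league mean. Standardizing every variable this way keeps any one of them from dominating the distance calculation just because it is measured on a larger scale.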
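The Euclidean distance between two teams is the square root of the sum of the squared differences between their standardized scores for each variable in the model (win percentage, OSRS, and DSRS).
EaglesW% = Eagles Win Percentage
Repeat this for every possible two-team combination. The lower the score, the more related the two teams are.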
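As a rough Python sketch of this step (the article itself used R), the distance for every two-team combination can be computed as below. The team names and standardized values are hypothetical placeholders, not real 2019 figures, and the three-value layout (win percentage, OSRS, DSRS) is an assumption based on the variables kept in the final model.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors of standardized scores."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical standardized (win %, OSRS, DSRS) scores, NOT real 2019 data.
teams = {
    "Team A": (1.2, 0.9, 0.8),
    "Team B": (1.0, 1.1, 0.6),
    "Team C": (-1.3, -0.7, -1.0),
}

# One distance per unordered pair of teams.
distances = {
    (t1, t2): euclidean(teams[t1], teams[t2])
    for t1, t2 in combinations(teams, 2)
}

# The pair with the smallest distance is the most similar pair.
closest_pair = min(distances, key=distances.get)
print(closest_pair, round(distances[closest_pair], 3))
```

For the 32 NFL teams this produces 496 pairwise distances, which are the input to the clustering step.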
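Height is the distance at which two teams or clusters are joined in the dendrogram; the lower the join, the more similar they are.
Complete linkage means that the distance between two clusters is measured by their two most dissimilar teams, so teams joined lower in the dendrogram are more related based on the variables chosen.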
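To make height and complete linkage concrete, here is a minimal Python sketch of agglomerative clustering with complete linkage. It is not the code behind the article's dendrogram, which was produced in R; the function names, the `n_clusters` parameter, and the four hypothetical teams with made-up standardized scores are all assumptions for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(cluster_a, cluster_b, points):
    """Cluster distance = largest pairwise distance between their members."""
    return max(euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b)

def agglomerate(points, n_clusters=1):
    """Repeatedly merge the two closest clusters until n_clusters remain.
    Returns (clusters, merges); each merge records the height (distance) of the join."""
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_linkage(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical standardized (win %, OSRS, DSRS) scores, NOT real 2019 data.
points = {
    "Team A": (1.2, 0.9, 0.8),
    "Team B": (1.0, 1.1, 0.6),
    "Team C": (-1.3, -0.7, -1.0),
    "Team D": (-1.0, -0.9, -1.3),
}

clusters, merges = agglomerate(points, n_clusters=2)
for left, right, height in merges:
    print(left, "+", right, "joined at height", round(height, 3))
```

Stopping at `n_clusters` corresponds to cutting the dendrogram at a chosen height; how many clusters to keep is a judgment call by the analyst, not something the algorithm decides.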
Important: The graph is not a team ranking; the order of the teams does not matter.
These teams were the real championship contenders, as seen by the fact that 3 of the final four teams are in this group. These teams were typically above average to elite in all of the statistical categories the model incorporated.
However, I would not consider the Packers a championship contender, and if I had included point differential they would not have been in this tier, because their differential was 63 compared to 100+ for the rest of the group.
The Jaguars, Cardinals, and Jets all had young quarterbacks and young depth charts at many other positions. The rest of the teams had top-end talent but were never able to put it together, whether because of a dysfunctional organization or a star quarterback retiring. However, the fact that most of these teams were either immensely talented underachievers or young teams is probably a coincidence.
There is not much to say; these were some of the worst teams in the league.
Most of these teams were vying for a wild-card position and were built around their defenses. Three of these teams were in the top five for DSRS.
All of these teams were borderline playoff teams that could potentially have hoisted a Lombardi if the ball had bounced their way, but ultimately they were pretenders, not contenders.
The Ravens’ 2019 regular season was extremely dominant, and as a result they were essentially elite in every statistic. As previously mentioned, their OSRS was the best in the league at 11 (standardized score of 2.68). Thus, they were not extremely similar to many teams; however, they were somewhat related to cluster one (the contenders).
With a quarterback in Jameis Winston who threw for 33 touchdowns and 30 interceptions, and a defense that looked worse than it really was because of the turnovers, the Buccaneers were the anomaly of the 2019 NFL season. Thus, they had their own cluster but were loosely similar to the borderline playoff teams.
Side Note: The choice of five clusters was my interpretation of the dendrogram, not something the model determined.
The NFL is an ever-changing league, and if I ran this model a year from now the output would probably be completely different. Additionally, if I rebuilt this model, I would use a different data set and variables that more specifically represent one aspect of football, for instance, variables that represent special teams.
Follow us on Twitter @flyphillyfly2
All data came from Pro Football Reference.
All calculations, models, and graphs were produced using R (RStudio).
Featured photo from KeithJJ