Adam Arians

Regression Candidate

Session:  Fall 2006

12/11/10

 

 

Introduction

The QB rating is heavily used to value quarterbacks at every level.  Oftentimes, millions of dollars are tied directly to these ratings.  There are 3 variations of the current formula, depending on the league (NCAA, high school, NFL, AFL).  However, all 3 use a linear multivariable formula with the same 4 variables:  completions per attempt, touchdowns per attempt, interceptions per attempt, and yards per attempt.

 

The purpose of this project is to develop a new NFL QB rating, called “QB win rating,” using linear regression.  The goal is to associate a QB’s performance more closely with winning.  Individual QB statistics for NFL teams will be regressed against the winning percentage of their teams.

 

A strong predictor of wins is not expected since the QB’s performance is only one contributing factor in the game.  Defense, special teams and rushing offense are also very important. 

 

Data

Team-level NFL passing data was taken from the 2008 and 2009 seasons for all 32 teams (source:   http://espn.go.com/nfl/statistics/team/_/stat/passing/year), giving 64 team-season win totals, each from a 16-game season.  Win% will initially be regressed against these variables:

Response Variable:  Win% = Wins in a Season/16

 

Additional notes:

 

Initial Model:

Win% = α + β₁Comp + β₂YSack + β₃TDs + β₄Yards + β₅INTs + β₆Sacks

 

Model #1

Considering that a QB’s passing ability is only one of several critical factors in winning a football game, the initial model explains a substantial share of the variability in winning, with R² = 0.541.

 

Win% = 0.087 + (0.057)Comp + (0.151)YSack + (1.251)TDs + (0.095)Yards + (-5.689)INTs + (-2.461)Sacks

 

Regression Statistics

                 α         Comp      YSack     TDs       Yards     INTs      Sacks
Coefficient      0.0868    0.0567    0.1507    1.2513    0.0948    -5.6891   -2.4613
Standard Error   0.4114    0.7320    0.3694    2.8613    0.0446    2.5379    2.6833
t Stat           0.2111    0.0775    0.4079    0.4373    2.1272    -2.2417   -0.9173
P-value          0.8336    0.9385    0.6849    0.6635    0.0377    0.0289    0.3629

R²         SEy        F stat     df    SSreg      SSresid
0.54075    0.144427   11.18591   57    1.399962   1.188964

Based on the p-values, completion % is the weakest predictor of Winning %.  At first this may seem surprising because it is one of the variables in the existing QB ratings in football.  However, it is likely to be highly correlated with the yardage variable (also in the existing QB ratings).  Yards per attempt (plus sacks) and interceptions per attempt have the lowest p-values and are thus the best predictors of Winning %.
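The table entries can be cross-checked by hand: each t Stat is the coefficient divided by its standard error, R² is SSreg / (SSreg + SSresid), and the F stat is (SSreg / k) / (SSresid / df) with k = 6 predictors. A minimal sketch of that arithmetic in Python, using only the Model #1 numbers reported above (the variable names are my own, and small differences from the table come from the 4-decimal rounding of the inputs):

```python
# Model #1 values copied from the regression table above.
coef = {"Comp": 0.0567, "YSack": 0.1507, "TDs": 1.2513,
        "Yards": 0.0948, "INTs": -5.6891, "Sacks": -2.4613}
std_err = {"Comp": 0.7320, "YSack": 0.3694, "TDs": 2.8613,
           "Yards": 0.0446, "INTs": 2.5379, "Sacks": 2.6833}
ss_reg, ss_resid, df_resid = 1.399962, 1.188964, 57

# t statistic = coefficient / standard error.
t_stat = {name: coef[name] / std_err[name] for name in coef}

# R^2 and F follow from the sums of squares.
k = len(coef)
r_squared = ss_reg / (ss_reg + ss_resid)
f_stat = (ss_reg / k) / (ss_resid / df_resid)

# Rank predictors from weakest to strongest by |t|
# (within one model, a smaller |t| means a larger p-value).
weakest_first = sorted(t_stat, key=lambda name: abs(t_stat[name]))
for name in weakest_first:
    print(f"{name:6s} t = {t_stat[name]:7.3f}")
print(f"R^2 = {r_squared:.4f}, F = {f_stat:.2f}")
```

The ranking puts Comp first (weakest) and Yards and INTs last (strongest), matching the reading of the p-values above.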

 

Comp will be removed for the next regression.

 

Model #2

Reducing the model to 5 factors predicts Winning % just as well as the previous model:  R² = 0.541.

 

Win% = 0.111 + (0.151)YSack + (1.282)TDs + (0.096)Yards + (-5.756)INTs + (-2.487)Sacks

Regression Statistics

                 α         YSack     TDs       Yards     INTs      Sacks
Coefficient      0.1114    0.1506    1.2823    0.0964    -5.7555   -2.4872
Standard Error   0.2602    0.3663    2.8089    0.0389    2.3682    2.6395
t Stat           0.4282    0.4113    0.4565    2.4814    -2.4303   -0.9423
P-value          0.6701    0.6824    0.6497    0.0160    0.0182    0.3499

R²         SEy        F stat     df    SSreg      SSresid
0.540702   0.143184   13.65592   58    1.399837   1.189089

Based on the p-values, interceptions per attempt and yards per attempt are still the strongest predictors.  Yards lost from sacks (YSack) will be removed next since it is the weakest predictor based on the p-values.  This is expected since it should be highly correlated with sacks per attempt.
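The collinearity argument can be checked directly by computing the correlation between the two sack variables. A minimal sketch of that check follows; the helper name `pearson_r` is my own, and the four sample values are made-up placeholders (including their units), not figures from the ESPN data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder values only: sacks per attempt and sack yardage per attempt
# for four hypothetical teams.
sacks = [0.04, 0.05, 0.07, 0.09]
sack_yards = [0.26, 0.34, 0.45, 0.60]

r = pearson_r(sacks, sack_yards)
print(f"r = {r:.3f}")
```

A value of r close to 1 on the real 64 observations would support treating one of the two sack variables as redundant.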

 

Model #3

Reducing the model to 4 factors predicts Winning % just as well as the previous model:  R² = 0.539.

 

Win% = 0.097 + (1.238)TDs + (0.098)Yards + (-5.616)INTs + (-1.503)Sacks

Regression Statistics

                 α         TDs       Yards     INTs      Sacks
Coefficient      0.0965    1.2379    0.0983    -5.6160   -1.5029
Standard Error   0.2558    2.7870    0.0383    2.3273    1.1056
t Stat           0.3774    0.4442    2.5646    -2.4132   -1.3593
P-value          0.7072    0.6585    0.0129    0.0189    0.1792

R²         SEy        F stat     df    SSreg      SSresid
0.539362   0.142172   17.27081   59    1.396369   1.192557

Yards per attempt and interceptions per attempt remain as the strongest predictors of Winning %.  Touchdowns per attempt will be removed next since it is the weakest predictor. 
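The drop-the-weakest-predictor procedure followed from Model #1 onward can be sketched in code. This is a minimal illustration, not the tool used for this report: the function names `ols` and `backward_eliminate`, the columns `signal` and `noise`, and the toy data are all hypothetical, and the stopping rule |t| >= 2 is my own rough stand-in for "p-value below 0.05".

```python
import math

def ols(X, y):
    """Least-squares fit. Returns (coefficients, t statistics)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t = [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, t

def backward_eliminate(columns, y, min_abs_t=2.0):
    """Refit repeatedly, dropping the predictor with the smallest |t|."""
    kept = dict(columns)
    while kept:
        names = list(kept)
        X = [[1.0] + [kept[name][i] for name in names] for i in range(len(y))]
        _, t = ols(X, y)
        # Skip t[0], the intercept, which is always retained.
        t_by_name = dict(zip(names, t[1:]))
        weakest = min(names, key=lambda name: abs(t_by_name[name]))
        if abs(t_by_name[weakest]) >= min_abs_t:
            break
        del kept[weakest]
    return list(kept)

# Toy data: y depends on "signal" only; "noise" is unrelated by construction.
columns = {
    "signal": [-1, -1, -1, -1, 1, 1, 1, 1],
    "noise": [-1, -1, 1, 1, -1, -1, 1, 1],
}
y = [0.2, 0.4, 0.2, 0.4, 0.6, 0.8, 0.6, 0.8]
print(backward_eliminate(columns, y))
```

On the toy data the loop drops `noise` and keeps `signal`. Ranking by |t| is equivalent to ranking by p-value within a single fit because every coefficient shares the same residual degrees of freedom.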

 

Model #4    

Reducing the model to 3 factors predicts Winning % just as well as the previous model:  R² = 0.538.

 

Win% = 0.065 + (0.11)Yards + (-5.555)INTs + (-1.466)Sacks

Regression Statistics

                 α         Yards     INTs      Sacks
Coefficient      0.0650    0.1104    -5.5549   -1.4660
Standard Error   0.2441    0.0268    2.3076    1.0951
t Stat           0.2661    4.1222    -2.4072   -1.3387
P-value          0.7911    0.0001    0.0192    0.1857

R²         SEy        F stat     df    SSreg      SSresid
0.537822   0.141218   23.27336   60    1.392381   1.196545

Yards per attempt and interceptions per attempt have p-values below 0.05.  Sacks per attempt does not (p = 0.186), so it is the weakest of the 3 factors and will be removed for model 5.

 

Model #5

Reducing the model to only 2 factors lowers R² more than any previous model change:  R² = 0.524.

 

Win% = -0.162 + (0.131)Yards + (-5.07)INTs

Regression Statistics

                 α         Yards     INTs
Coefficient      -0.1620   0.1308    -5.0703
Standard Error   0.1767    0.0222    2.2938
t Stat           -0.9166   5.8949    -2.2105
P-value          0.3630    0.0000    0.0308

R²         SEy        F stat     df    SSreg      SSresid
0.524018   0.142131   33.57806   61    1.356644   1.232282

Both remaining factors have p-values below 0.05, so no more factors will be removed.
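To put the five fits side by side, the sketch below recomputes the drop in R² at each removal step and the standard error of the estimate, SEy = sqrt(SSresid / df), from the figures reported in the regression tables. The variable names are my own; nothing here goes beyond arithmetic on the reported values.

```python
import math

# (label, R^2, residual df, SSresid) copied from the five regression tables.
models = [
    ("Model #1", 0.540750, 57, 1.188964),
    ("Model #2", 0.540702, 58, 1.189089),
    ("Model #3", 0.539362, 59, 1.192557),
    ("Model #4", 0.537822, 60, 1.196545),
    ("Model #5", 0.524018, 61, 1.232282),
]

# Drop in R^2 caused by each removal step (Model #1 -> #2, #2 -> #3, ...).
r2_drops = [models[i][1] - models[i + 1][1] for i in range(len(models) - 1)]

# Standard error of the estimate: SEy = sqrt(SSresid / residual df).
se_y = {label: math.sqrt(ss / df) for label, _, df, ss in models}
lowest_se = min(se_y, key=se_y.get)

for (label, r2, _, _), drop in zip(models[1:], r2_drops):
    print(f"{label}: R^2 = {r2:.4f} (drop {drop:.4f}), SEy = {se_y[label]:.4f}")
print("Lowest SEy:", lowest_se)
```

On these numbers the step from Model #4 to Model #5 costs far more R² than any earlier step, and Model #4 has the smallest SEy of the five fits.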

 

Model #4 Predicted Win% vs Actual Win% Plot

 

 

Conclusion

If a “QB Win Rating” is to be used as a statistical metric for QBs, I recommend using model #4.  This model has only 3 factors (Yards per attempt, Interceptions per attempt, and Sacks per attempt) and still explains 53.8% of the variability in team wins: 

 

QB Win Rating* = [0.065 + (0.11)Yards + (-5.555)INTs + (-1.466)Sacks] x 100

*Winning % response variable is multiplied by 100 to get non-decimal “QB Win Rating”
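As a worked illustration, the rating can be evaluated for a single stat line. This is a sketch under stated assumptions: the function name `qb_win_rating` and the sample inputs are hypothetical, and the units (yards per attempt as a plain number, interception and sack rates as fractions of attempts) are inferred from the size of the coefficients rather than stated in the data section.

```python
def qb_win_rating(yards_per_attempt, ints_per_attempt, sacks_per_attempt):
    """Model #4 fitted equation, multiplied by 100."""
    win_pct = (0.065
               + 0.11 * yards_per_attempt
               - 5.555 * ints_per_attempt
               - 1.466 * sacks_per_attempt)
    return 100 * win_pct

# Hypothetical stat line (assumed units): 7.0 yards per attempt,
# 3% of attempts intercepted, 6% of attempts ending in a sack.
rating = qb_win_rating(7.0, 0.03, 0.06)
print(round(rating, 1))
```

For this made-up stat line the rating comes out to about 58, which corresponds to a predicted winning percentage of roughly 0.58.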

 

An R² this high is remarkable given that a QB is involved in fewer than 50% of the plays during the course of a football game.  It could reflect a correlation between offensive and defensive performance, or the importance of the quarterback position.

 

This regression works well for NFL quarterbacks, but if it were expanded to high school and college, I expect that the model would not be as good a predictor.  In the NFL, most quarterbacks tend to pass much more than they run, so passing factors give a good predictor of winning.  In high school and college, however, it is much more common for quarterbacks to be athletic and run for more yards.  Running statistics such as “Rush Yards per attempt” are likely to be important factors in a regression model with winning as the response variable.