Linear regression with multiple predictors

Lecture 16

Dr. Greg Chism

University of Arizona
INFO 511

Model selection and overfitting

R-squared (\(R^2\))

R-squared is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.

\[ R^2 = 1 - \frac{RSS}{TSS} \]

\(R^2\) broken down

Residuals
Mean of observations
Sums of squares
\(R^2\)

Residuals are the differences between the observed values and the predicted values from a regression model.
If \(y_i\) is an observed value and \(\hat{y}_i\) is the predicted value, the residual \(e_i\) is given by:

\(e_i = y_i - \hat{y}_i\)

The mean \(\bar{y}\) is the average of all observed values, calculated as

\(\bar{y}=\frac{1}{n} \sum_{i=1}^{n}y_i\)
Where \(n\) is the number of observations.

These are measures of variability within the data set.
Residual Sum of Squares (RSS):

\(RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\)

This measures the total deviation of the predicted values from the observed values.

Total Sum of Squares (TSS):

\(TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2\)
This measures the total deviation of the observed values from their mean.

R-squared is calculated as:

\(R^2 = 1 - \frac{RSS}{TSS}\)
This value ranges from 0 to 1 and indicates how well the independent variables explain the variability of the dependent variable.

Adjusted R-squared (\(R^2_{adj}\))

Formula
Key points
Degrees of freedom

\[ R^2_{adj} = 1 - \frac{RSS / df_{res}}{TSS / df_{tot}} \]

\(df_{res}\) represents the degrees of freedom of the residuals, which is the number of observations minus the number of predictors minus one.
\(df_{tot}\) represents the degrees of freedom of the total variability, which is the number of observations minus one.

Penalizes Complexity: Adjusted R-squared decreases when unnecessary predictors are added to the model, discouraging overfitting.
Comparability: It is more reliable than R-squared for comparing models with different numbers of predictors.
Value Range: Unlike R-squared, adjusted R-squared can be negative if the model is worse than a simple mean model, though it typically ranges from 0 to 1.

\(df_{res}\): Degrees of freedom related to the estimate of the population variance around the model’s predictions.
\(df_{tot}\): Degrees of freedom related to the estimate of the population variance around the mean of the observed values.

In pursuit of Occam’s Razor

Occam’s Razor states that among competing hypotheses that predict equally well, the one with the fewest assumptions should be selected.
Model selection follows this principle.
We only want to add another variable to the model if the addition of that variable brings something valuable in terms of predictive power to the model.
In other words, we prefer the simplest best model, i.e. parsimonious model.