Linear regression with multiple predictors

Lecture 16

Dr. Greg Chism

University of Arizona
INFO 511 - Spring 2025

Model selection and overfitting

R-squared ($R^2$)

R-squared is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.

$$R^2 = 1 - \frac{RSS}{TSS}$$

$R^2$ broken down

  • Residuals
  • Mean of observations
  • Sums of squares
  • $R^2$
  • Residuals are the differences between the observed values and the predicted values from a regression model.

  • If $y_i$ is an observed value and $\hat{y}_i$ is the corresponding predicted value, the residual $e_i$ is given by:

    $$e_i = y_i - \hat{y}_i$$

  • The mean $\bar{y}$ is the average of all observed values, calculated as

    $$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

  • where $n$ is the number of observations.

  • These are measures of variability within the data set.

  • Residual Sum of Squares (RSS):

    $$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

  • This measures the total deviation of the predicted values from the observed values.
  • Total Sum of Squares (TSS):

    $$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

  • This measures the total deviation of the observed values from their mean.

  • R-squared is calculated as:

    $$R^2 = 1 - \frac{RSS}{TSS}$$

  • This value ranges from 0 to 1 and indicates how well the independent variables explain the variability of the dependent variable.
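The pieces above can be computed directly in NumPy. The observed values and predictions below are illustrative, not from any dataset in the lecture:

```python
import numpy as np

# Hypothetical observed values and model predictions (illustrative only)
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.1, 7.3, 8.9, 10.9])

residuals = y - y_hat                  # e_i = y_i - y_hat_i
rss = np.sum(residuals ** 2)           # Residual Sum of Squares
tss = np.sum((y - y.mean()) ** 2)      # Total Sum of Squares
r_squared = 1 - rss / tss              # R^2 = 1 - RSS/TSS
```

Here the predictions track the observations closely, so $R^2$ comes out near 1.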

Adjusted R-squared ($R^2_{adj}$)

  • Formula
  • Key points
  • Degrees of freedom

$$R^2_{adj} = 1 - \frac{RSS / df_{res}}{TSS / df_{tot}}$$

  • $df_{res}$ represents the degrees of freedom of the residuals: the number of observations minus the number of predictors, minus one.

  • $df_{tot}$ represents the degrees of freedom of the total variability: the number of observations minus one.

  • Penalizes Complexity: Adjusted R-squared can decrease when unnecessary predictors are added to the model, discouraging overfitting.

  • Comparability: It is more reliable than R-squared for comparing models with different numbers of predictors.

  • Value Range: Unlike R-squared, adjusted R-squared can be negative if the model is worse than a simple mean model, though it typically ranges from 0 to 1.

  • $df_{res}$: Degrees of freedom related to the estimate of the population variance around the model’s predictions.

  • $df_{tot}$: Degrees of freedom related to the estimate of the population variance around the mean of the observed values.
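The adjusted formula maps directly to code. A minimal sketch, reusing the same illustrative data as before (the function name `adjusted_r_squared` is ours, not from any library):

```python
import numpy as np

def adjusted_r_squared(y, y_hat, p):
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)          # Residual Sum of Squares
    tss = np.sum((y - np.mean(y)) ** 2)     # Total Sum of Squares
    df_res = n - p - 1                      # residual degrees of freedom
    df_tot = n - 1                          # total degrees of freedom
    return 1 - (rss / df_res) / (tss / df_tot)

# Illustrative values with a single predictor, so p = 1
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.1, 7.3, 8.9, 10.9])
r2_adj = adjusted_r_squared(y, y_hat, p=1)
```

Because $df_{tot}/df_{res} > 1$ whenever $p \ge 1$, the adjusted value is always a bit smaller than plain $R^2$ for an imperfect fit.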

In pursuit of Occam’s Razor

  • Occam’s Razor states that among competing hypotheses that predict equally well, the one with the fewest assumptions should be selected.

  • Model selection follows this principle.

  • We add another variable to the model only if it contributes meaningful predictive power.

  • In other words, we prefer the simplest adequate model, i.e., a parsimonious model.
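This trade-off can be demonstrated on simulated data: adding an irrelevant predictor never lowers $R^2$, while adjusted $R^2$ applies a penalty. The simulation below is our own illustration, fitting ordinary least squares with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)   # true model uses x only
noise = rng.normal(size=n)               # irrelevant predictor

def fit_metrics(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ beta
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    p = X1.shape[1] - 1                  # number of predictors
    r2 = 1 - rss / tss
    r2_adj = 1 - (rss / (len(y) - p - 1)) / (tss / (len(y) - 1))
    return r2, r2_adj

r2_small, adj_small = fit_metrics(x.reshape(-1, 1), y)
r2_big, adj_big = fit_metrics(np.column_stack([x, noise]), y)
# R^2 never decreases when a predictor is added; adjusted R^2 is penalized.
```

Comparing the two fits shows why adjusted $R^2$ is the better yardstick for choosing the parsimonious model.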

🔗 datasciaz.netlify.app
