AE 12: Ultimate candy ranking

Application exercise

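In this application exercise, we will fit a full regression model and then remove one term at a time, comparing the resulting models by \(R^2_{adj}\). We start by loading the packages: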

import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

Examine the data

  • We will use the candy_rankings.csv dataset for this analysis.
candy_rankings = pd.read_csv('data/candy_rankings.csv')
candy_rankings.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   competitorname    85 non-null     object 
 1   chocolate         85 non-null     bool   
 2   fruity            85 non-null     bool   
 3   caramel           85 non-null     bool   
 4   peanutyalmondy    85 non-null     bool   
 5   nougat            85 non-null     bool   
 6   crispedricewafer  85 non-null     bool   
 7   hard              85 non-null     bool   
 8   bar               85 non-null     bool   
 9   pluribus          85 non-null     bool   
 10  sugarpercent      85 non-null     float64
 11  pricepercent      85 non-null     float64
 12  winpercent        85 non-null     float64
dtypes: bool(9), float64(3), object(1)
memory usage: 3.5+ KB

Exercises

Use the variables:

chocolate, fruity, nougat, pricepercent, sugarpercent, sugarpercent*chocolate, pricepercent*fruity
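
One way to turn this variable list into a model formula is sketched below. It assumes that winpercent is the response, which the list above does not state, and it writes the two interactions with `:` so that every term appears exactly once. In the formula syntax used by statsmodels, `a*b` is shorthand for `a + b + a:b`, so `sugarpercent*chocolate` already implies both main effects as well as the interaction.

```python
# Assumption: winpercent is the response; the term list mirrors the variables above.
response = "winpercent"
terms = [
    "chocolate",
    "fruity",
    "nougat",
    "pricepercent",
    "sugarpercent",
    "sugarpercent:chocolate",  # interaction only (":" instead of "*")
    "pricepercent:fruity",     # interaction only
]

# Join the terms into a single formula string for statsmodels
full_formula = f"{response} ~ " + " + ".join(terms)
print(full_formula)
```

Keeping the terms in a list makes it easy to remove one at a time in the later exercises.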

Exercise 1

Create the full model and show the \(R^2_{adj}\):
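
For reference, the usual definition of adjusted \(R^2\) for a model fit to \(n\) observations with \(p\) predictors (not counting the intercept) is:

```latex
R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Unlike \(R^2\), this quantity can decrease when a term that explains little is added, which is what makes it usable for comparing models of different sizes.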

# add code here
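
A minimal sketch of what such a cell might look like is shown here on synthetic data, so that it runs without the course files. The data frame `toy`, its coefficients, and the choice of winpercent as the response are all assumptions made for illustration; only `smf.ols(...).fit()` and the `rsquared_adj` attribute are standard statsmodels features.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for candy_rankings (values are made up, not real results)
rng = np.random.default_rng(0)
n = 85
toy = pd.DataFrame({
    "chocolate": rng.random(n) < 0.4,
    "fruity": rng.random(n) < 0.4,
    "nougat": rng.random(n) < 0.3,
    "pricepercent": rng.random(n),
    "sugarpercent": rng.random(n),
})
toy["winpercent"] = (
    40 + 15 * toy["chocolate"] + 5 * toy["sugarpercent"] + rng.normal(0, 10, n)
)

# Full model: five main effects plus the two interactions
full_model = smf.ols(
    "winpercent ~ chocolate + fruity + nougat + pricepercent + sugarpercent"
    " + sugarpercent:chocolate + pricepercent:fruity",
    data=toy,
).fit()

print(f"Adjusted R-squared: {full_model.rsquared_adj:.3f}")
```

With the real data, `toy` would be replaced by `candy_rankings`.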

Is the model a good fit of the data?

Add response here.

Exercise 2

Produce all possible models removing 1 term at a time from the full model. Describe what is being removed above each code cell.
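
The "remove one term at a time" step can be sketched as a small helper that builds one formula per removed term. The function name `drop_one_formulas` is hypothetical, and winpercent as the response is an assumption; the exercise itself asks for one hand-written cell per model.

```python
# Assumption: winpercent is the response and these are the terms of the full model.
response = "winpercent"
full_terms = [
    "chocolate", "fruity", "nougat", "pricepercent", "sugarpercent",
    "sugarpercent:chocolate", "pricepercent:fruity",
]

def drop_one_formulas(response, terms):
    """Return {removed_term: formula}, each formula leaving out exactly one term."""
    formulas = {}
    for removed in terms:
        kept = [t for t in terms if t != removed]
        formulas[removed] = f"{response} ~ " + " + ".join(kept)
    return formulas

candidates = drop_one_formulas(response, full_terms)
for removed, formula in candidates.items():
    print(f"without {removed}: {formula}")
```

Each formula could then be fit with `smf.ols(formula, data=candy_rankings).fit()` and its `rsquared_adj` stored in `models` under a descriptive key. Whether to allow removing a main effect while its interaction stays in the model is a modelling choice that this sketch does not make for you.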

# Blank dictionary to store new models
models = {}
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here

Exercise 3

Compare all models using the following framework (reuse the same pattern in the later exercises):

best_model_step1 = max(models, key=models.get)
print(f'Best model in Exercise 2: {best_model_step1} with Adjusted R-squared: {models[best_model_step1]}')
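
To see how this comparison works, here is a tiny example with made-up numbers. `max(models, key=models.get)` iterates over the dictionary keys and returns the key whose value is largest. The model names and scores are hypothetical placeholders, not results from the candy data.

```python
# Hypothetical adjusted R-squared values, made up purely to show the mechanics.
models = {
    "model_a": 0.41,
    "model_b": 0.43,
    "model_c": 0.39,
}

# Return the key (model name) with the highest stored value
best_model = max(models, key=models.get)
print(f"Best model: {best_model} with Adjusted R-squared: {models[best_model]}")
```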
  • Which model is best:

Add response here.

Exercise 4

Create all possible models removing 1 term at a time from the model selected in the previous exercise. Again, describe what is being removed above each code cell.
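
A single elimination step can also be written as a loop. The sketch below deliberately uses plain NumPy least squares instead of statsmodels so that the adjusted \(R^2\) arithmetic is visible. The function names and the synthetic columns are assumptions for illustration only.

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R^2 of an OLS fit of y on X (X must include an intercept column)."""
    n, k = X.shape  # k counts the intercept, so n - k equals n - p - 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

def drop_one_step(y, columns):
    """Score every model obtained by removing exactly one column."""
    scores = {}
    for dropped in columns:
        kept = [columns[name] for name in columns if name != dropped]
        X = np.column_stack([np.ones(len(y))] + kept)
        scores[f"without {dropped}"] = adjusted_r2(y, X)
    return scores

# Synthetic data: one informative predictor and two pure-noise predictors
rng = np.random.default_rng(1)
n = 85
columns = {
    "x_signal": rng.normal(size=n),
    "x_noise_a": rng.normal(size=n),
    "x_noise_b": rng.normal(size=n),
}
y = 50 + 8 * columns["x_signal"] + rng.normal(scale=5, size=n)

models = drop_one_step(y, columns)
best = max(models, key=models.get)
print(best, round(models[best], 3))
```

Repeating this step on the columns of the winning model, until no removal improves the score, gives the same procedure that the exercises walk through by hand.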

# Blank dictionary to store new models
models = {}
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here

Exercise 5

Compare all models using the same framework as before, storing the result as best_model_step2 = max(models, key=models.get):

# add code here
  • Which model is best:

Add response here.

Exercise 6

Create all possible models removing 1 term at a time from the model selected in the previous exercise. Again, describe what is being removed above each code cell.

# Blank dictionary to store new models
models = {}
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here
  • Add what is being removed here.
# add code here

Exercise 7

Compare all models using the same framework as before, storing the result as best_model_step3 = max(models, key=models.get):

# add code here
  • Which model is best:

Add response here.

  • Show the final model summary and coefficients:
# add code here
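
As a hedged illustration of this last step, the sketch below fits a small model on synthetic data and prints its summary table and coefficients. The data frame `toy` and the formula are placeholders; the model you actually report should be whichever one your comparisons selected. `summary()` and `params` are standard attributes of a fitted statsmodels OLS result.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (values are made up, not real results)
rng = np.random.default_rng(42)
n = 85
toy = pd.DataFrame({
    "chocolate": rng.random(n) < 0.4,
    "sugarpercent": rng.random(n),
})
toy["winpercent"] = (
    40 + 15 * toy["chocolate"] + 5 * toy["sugarpercent"] + rng.normal(0, 10, n)
)

# Placeholder "final" model; substitute the formula chosen in the exercises
final_model = smf.ols("winpercent ~ chocolate + sugarpercent", data=toy).fit()

print(final_model.summary())  # full regression table
print(final_model.params)     # coefficients only, as a pandas Series
```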