AE 09: Hypothesis testing

Application exercise

An article in the The Tucson Citizen-Times published in the summer of 2020 claims that the average price per guest (ppg) for properties in Tucson is $100 on Airbnb. To evaluate their claim we will use a dataset on 50 randomly selected Asheville Airbnb listings in July 2024. These data can be found in data/tucson.csv.

Let’s load the packages we’ll use first.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

And then the data.

# add code here

Hypotheses

  • Write out the correct null and alternative hypothesis. Do this in both words and in proper notation.

Add response here.

Observed data

Our goal is to use calculate the probability of a sample statistic at least as extreme as the one observed in our data if in fact the null hypothesis is true.

  • Calculate and report the sample statistic below using proper notation.
# add code here

\[\bar{x} = 116.24\]

The null distribution

Let’s use simulation-based methods to conduct the hypothesis test specified above.

Generate

We’ll start by generating the null distribution.

  • Generate the null distribution and call it null_dist
np.random.seed(4321)
# add code here
  • Take a look at null_dist. What does each element in this distribution represent?
# add code here

Visualize

  • Question: Before you visualize the distribution of null_dist – at what value would you expect this distribution to be centered? Why?

Add response here.

  • Create an appropriate visualization for your null distribution. Does the center of the distribution match what you guessed in the previous question?
# add code here
  • Now, add a vertical red line on your null distribution that represents your sample statistic.
# add code here
  • Question: Based on the position of this line, does your observed sample mean appear to be an unusual observation under the assumption of the null hypothesis?

Add response here.

p-value

Above, we eyeballed how likely/unlikely our observed mean is. Now, let’s actually quantify it using a p-value.

  • Question: What is a p-value?

Add response here.

  • Demo: Visualize the p-value. Note that the two-sided approach would visualize two lines, one for the sample mean and another for \(H_0 - (\mu - H_0)\) or \(100 - (\mu - 100)\)
# add code here

Conclusion

  • What is the conclusion of the hypothesis test based on the p-value you calculated? Make sure to frame it in context of the data and the research question. Use a significance level of 5% to make your conclusion.

Add response here.

  • Interpret the p-value in context of the data and the research question.

Add response here.

Get real…

  • Question: What we did above was a “toy example” to illustrate hypothesis test. What would you change to make this a real, more robust analysis?

Add response here.

  • Work through the analysis again with these changes.
# Generate a larger random dataset for a more robust analysis
data_large = np.random.normal(loc=75, scale=10, size=1000)

# Recalculate the sample mean
sample_mean_large = np.mean(data_large)
print(f"Sample Mean (x̄) with larger sample = {sample_mean_large}")

# Generate the null distribution with a larger sample
null_dist_large = np.random.normal(loc=75, scale=10, size=10000)

# Recalculate the p-value
p_value_large = (null_dist_large >= sample_mean_large).mean()
print(f"P-value with larger sample = {p_value_large}")
Sample Mean (x̄) with larger sample = 75.53742491599797
P-value with larger sample = 0.4791