import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
AE 09: Hypothesis testing
An article in The Tucson Citizen-Times published in the summer of 2020 claims that the average price per guest (ppg) for properties in Tucson is $100 on Airbnb. To evaluate this claim, we will use a dataset of 50 randomly selected Tucson Airbnb listings from July 2024. These data can be found in data/tucson.csv.
Let’s load the packages we’ll use first.
And then the data.
# add code here
Hypotheses
- Write out the correct null and alternative hypothesis. Do this in both words and in proper notation.
Add response here.
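One possible way to write these hypotheses in notation is sketched below. It assumes a two-sided test, which is what the p-value demo later in this exercise describes, and it uses \(\mu\) to stand for the true mean price per guest of Tucson Airbnb listings; that symbol choice is an assumption, not something fixed by the exercise.

```latex
\[
H_0: \mu = 100 \qquad \text{vs.} \qquad H_A: \mu \neq 100
\]
```

In words: the null hypothesis says the true mean price per guest is $100, as the article claims, and the alternative says it differs from $100.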
Observed data
Our goal is to calculate the probability of a sample statistic at least as extreme as the one observed in our data if in fact the null hypothesis is true.
- Calculate and report the sample statistic below using proper notation.
# add code here
\[\bar{x} = 116.24\]
The null distribution
Let’s use simulation-based methods to conduct the hypothesis test specified above.
Generate
We’ll start by generating the null distribution.
- Generate the null distribution and call it null_dist.
np.random.seed(4321)
# add code here
- Take a look at null_dist. What does each element in this distribution represent?
# add code here
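One common simulation-based way to build a null distribution for a mean is to shift the sample so that it is centered at the null value and then bootstrap it. The sketch below assumes that approach, which may not be the one intended for this exercise. Because the contents of data/tucson.csv are not shown here, it uses a simulated stand-in array instead of the real prices; the names `ppg`, `null_value`, and `n_reps` are illustrative, not taken from the dataset.

```python
import numpy as np

rng = np.random.default_rng(4321)

# Hypothetical stand-in for the 50 observed prices per guest;
# replace this with the price-per-guest column from data/tucson.csv.
ppg = rng.gamma(shape=4, scale=29, size=50)

null_value = 100  # mean claimed under the null hypothesis
n_reps = 1000     # number of bootstrap resamples (an arbitrary choice)

# Shift the sample so that its mean equals the null value
ppg_shifted = ppg - ppg.mean() + null_value

# Each element of null_dist is the mean of one resample,
# drawn with replacement from the shifted sample
null_dist = np.array([
    rng.choice(ppg_shifted, size=ppg_shifted.size, replace=True).mean()
    for _ in range(n_reps)
])
```

Under this approach, each element of `null_dist` is a sample mean we might have seen from 50 listings if the true mean really were $100.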
Visualize
- Question: Before you visualize the distribution of null_dist, at what value would you expect this distribution to be centered? Why?
Add response here.
- Create an appropriate visualization for your null distribution. Does the center of the distribution match what you guessed in the previous question?
# add code here
- Now, add a vertical red line on your null distribution that represents your sample statistic.
# add code here
- Question: Based on the position of this line, does your observed sample mean appear to be an unusual observation under the assumption of the null hypothesis?
Add response here.
p-value
Above, we eyeballed how likely/unlikely our observed mean is. Now, let’s actually quantify it using a p-value.
- Question: What is a p-value?
Add response here.
- Demo: Visualize the p-value. Note that the two-sided approach would visualize two lines, one for the sample mean \(\bar{x}\) and another for \(\mu_0 - (\bar{x} - \mu_0)\), or \(100 - (\bar{x} - 100)\), where \(\mu_0 = 100\) is the mean under the null hypothesis.
# add code here
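The two-sided calculation can be sketched as follows. This is a minimal illustration, not the exercise's solution: the null distribution here is a simulated placeholder (normal draws centered at 100 with an assumed spread of 8), and only the observed mean of 116.24 comes from the exercise. The names `diff`, `lower_cutoff`, and `upper_cutoff` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4321)

# Hypothetical stand-in null distribution of sample means centered at 100;
# in the exercise this would be the simulated null_dist.
null_value = 100
null_dist = rng.normal(loc=null_value, scale=8, size=1000)

x_bar = 116.24  # observed sample mean reported earlier in the exercise

# Distance of the observed mean from the null value
diff = abs(x_bar - null_value)

# The two cutoffs that the vertical lines would mark
lower_cutoff = null_value - diff  # 100 - (116.24 - 100)
upper_cutoff = null_value + diff  # the observed mean itself

# Two-sided p-value: proportion of simulated means at least as far
# from the null value as the observed mean, in either direction
p_value = np.mean(np.abs(null_dist - null_value) >= diff)
print(f"Two-sided p-value = {p_value}")
```

Counting both tails matches an alternative hypothesis that the mean differs from $100 in either direction.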
Conclusion
- What is the conclusion of the hypothesis test based on the p-value you calculated? Make sure to frame it in context of the data and the research question. Use a significance level of 5% to make your conclusion.
Add response here.
- Interpret the p-value in context of the data and the research question.
Add response here.
Get real…
- Question: What we did above was a “toy example” to illustrate hypothesis testing. What would you change to make this a real, more robust analysis?
Add response here.
- Work through the analysis again with these changes.
# Generate a larger random dataset for a more robust analysis
data_large = np.random.normal(loc=75, scale=10, size=1000)
# Recalculate the sample mean
sample_mean_large = np.mean(data_large)
print(f"Sample Mean (x̄) with larger sample = {sample_mean_large}")
# Generate the null distribution with a larger sample
null_dist_large = np.random.normal(loc=75, scale=10, size=10000)
# Recalculate the p-value
p_value_large = (null_dist_large >= sample_mean_large).mean()
print(f"P-value with larger sample = {p_value_large}")
Sample Mean (x̄) with larger sample = 75.53742491599797
P-value with larger sample = 0.4791