AE 07: Practicing with probabilities
Goal
Understand and calculate basic probabilities using a real-world dataset.
Data
A cohort study on coffee consumption and mortality from:
Did not die |
Died | |
---|---|---|
Does not drink coffee | 5438 | 1039 |
Drinks coffee occasionally | 29712 | 4440 |
Drinks coffee regularly | 24934 | 3601 |
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788283/
Definitions:
Event A: The person died.
Event B: The person is a non-coffee drinker.
Two important rules
Suppose we have events \(A\) and \(B\), with probabilities \(P(A)\) and \(P(B)\) of occurring. Based on the Kolmogorov axioms:
- Complement Rule: \(P(A^c) = 1 - P(A)\)
- Inclusion-Exclusion: \(P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\)
Exercises
Calculate the following probabilities for a randomly selected person in the cohort:
Calculate the total number of people in the cohort:
\(Total=5438+1039+29712+4440+24934+3601\)
Add response here.
Calculate \(\small{P(A)}\): Probability that the person died
\(\small{P(A)} = \frac{\text{Total number of people who died}}{\text{Total number of people in the cohort}}\)
Add response here.
Calculate \(\small{P(B)}\): Probability that the person is a non-coffee drinker
\(\small{P(B)} = \frac{\text{Total number of non-coffee drinkers}}{\text{Total number of people in the cohort}}\)
Add response here.
Calculate \(\small{P(A \text{ and } B)}\): Probability that the person died and is a non-coffee drinker
\(\small{P(A \cap B)}=\frac{\text{Number of people who died and are non-coffee drinks}}{\text{Total number of people in the cohort}}\)
Add response here.
Calculate \(\small{P(A \text{ or } B)}\) using the formula for the union of two events:
\(\small{P(A \cup B)}=\small{P(A)}+\small{P(B)}-\small{P(A \cap B)}\)
Add response here.
Calculate \(\small{P(A \text{ or } B^c)}\): Probability that the person died or is a non-coffee drinker
\(\small{P(A \cup B^c)}=\small{P(A)}+\small{P(B^c)}-\small{P(A \cap B^c)}\)
Add response here.
Note: \(\small{P(B^c)}\) is the complement of event \(B\) (i.e., people who drink coffee occasionally or regularly).
Discussion Questions:
What do these probabilities tell us about the relationship between coffee consumption and mortality in this cohort?
Are there any limitations to interpreting these probabilities as causal effects?
How might additional factors or confounders influence these probabilities?