Lab: Probability rules
In this lab, we’ll practice evaluating probability rules using R. This is based on the Probability rules chapter.
While you can complete all the exercises in your browser, we recommend also practicing in RStudio. Using an editor like RStudio will help you build real-world skills for writing, running, and saving your R code.
Load packages and data
First, let’s load the necessary packages and data. We also create summaries of the education and income variables to be used in the exercises. Note that the data are not real, so the resulting values are somewhat arbitrary.
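The setup code itself is not reproduced on this page. As a rough sketch of how objects like these might be created, the block below simulates a stand-in for d_bl and builds edu_summary and income_summary with dplyr. The simulated rows, the sample size, and the level names other than "University" and "Medium" are assumptions made for illustration, not the lab's actual data.

```r
library(dplyr)

set.seed(123)  # fixed seed so the simulated values are reproducible

# Hypothetical stand-in for the lab data; the real d_bl is not shown here
d_bl <- data.frame(
  education = sample(c("Primary", "Secondary", "University"), 500, replace = TRUE),
  income    = sample(c("Low", "Medium", "High"), 500, replace = TRUE)
)

# One row per education level, with the share of observations in that level
edu_summary <- d_bl |>
  count(education) |>
  mutate(proportion = n / sum(n))

# Same idea for income
income_summary <- d_bl |>
  count(income) |>
  mutate(proportion = n / sum(n))

income_summary
```

Because each proportion is a count divided by the total count, the proportions within each summary add up to 1 by construction.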
1 Check that the sum of all probabilities is 1
Do the numbers in our income variable add up to 1? You can use the income_summary object we created above.
The income_summary data frame has a column called proportion that contains the probability for each income level.
To calculate the sum, use sum() on the proportion column.
To check if it equals 1, compare the sum to 1 using ==.
# Calculate the sum of proportions for income
sum_income_proportions <- sum(income_summary$proportion) #<1>
# Check if it equals 1
sum_equals_one <- sum_income_proportions == 1 #<2>
# Display the result
cat("Sum of income proportions:", sum_income_proportions, "\n")
cat("Does it equal 1?", sum_equals_one, "\n")
- 1: Sum all proportions in the income_summary data frame
- 2: Check if the sum equals exactly 1
The sum should equal 1, confirming that all probabilities add up correctly!
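One caveat when comparing with ==: proportions are stored as floating-point numbers, so a sum that is 1 on paper may differ from 1 by a tiny rounding error, and an exact comparison can then return FALSE. The sketch below shows a tolerance-based check. The helper sums_to_one() and the example proportions are hypothetical and not part of the lab.

```r
# Exact comparison of decimals can fail because of rounding
(0.1 + 0.2) == 0.3

# Hypothetical helper: TRUE when the values sum to 1 within a small tolerance
sums_to_one <- function(p, tol = 1e-8) {
  abs(sum(p) - 1) < tol
}

# Made-up proportions for illustration only
p <- c(0.1, 0.2, 0.3, 0.4)

sums_to_one(p)                # tolerance-based check
isTRUE(all.equal(sum(p), 1))  # equivalent check using base R's all.equal()
```

If the exact check with == ever reports FALSE for proportions that clearly should add up to 1, a tolerance-based comparison like this is a reasonable next step.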
2 Complement rule
The probability of an event not occurring is 1 minus the probability that it will occur.
Let’s check this for the “Medium” income level.
\[ P(\text{not Medium}) = 1 - P(\text{Medium}) \]
Calculate the complement rule for Medium income.
You already have p_medium calculated. To find the complement, subtract it from 1.
# Get the probability of Medium income
p_medium <- income_summary |>
  filter(income == "Medium") |>
  pull(proportion)
# Calculate the complement (probability of NOT Medium income)
p_not_medium <- 1 - p_medium #<1>
# Display the results
cat("P(Medium income) =", round(p_medium, 4), "\n")
cat("P(not Medium income) =", round(p_not_medium, 4), "\n")
cat("Check complement rule, sum =", round(p_medium + p_not_medium, 4), "\n")
- 1: Apply the complement rule: \(P(\text{not } A) = 1 - P(A)\)
The sum of P(Medium) and P(not Medium) should equal 1, confirming the complement rule.
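Another way to convince yourself of the complement rule is to compute P(not Medium) directly, by adding up the proportions of every other income level, and compare it with 1 - P(Medium). The sketch below does this in base R on a made-up income distribution; the level names "Low" and "High" and all three proportions are assumptions for illustration.

```r
# Made-up income distribution for illustration only
income_toy <- data.frame(
  income     = c("Low", "Medium", "High"),
  proportion = c(0.30, 0.45, 0.25)
)

p_medium <- income_toy$proportion[income_toy$income == "Medium"]

# Route 1: complement rule
p_not_medium_rule <- 1 - p_medium

# Route 2: add up every level that is not "Medium"
p_not_medium_direct <- sum(income_toy$proportion[income_toy$income != "Medium"])

# Compare the two routes with a tolerance rather than ==
all.equal(p_not_medium_rule, p_not_medium_direct)
```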
3 Addition rule
The addition rule gives the probability that event A or event B occurs (or both):
\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
Calculate the probability of having either “University” education OR being in the “Medium” income group.
You have:
- \(P(A)\) = p_university
- \(P(B)\) = p_medium_income
- \(P(A \cap B)\) = p_both
Combine them using the addition rule.
# Get individual probabilities
p_university <- edu_summary |>
  filter(education == "University") |>
  pull(proportion)
p_medium_income <- income_summary |>
  filter(income == "Medium") |>
  pull(proportion)
# Calculate probability of both University education AND Medium income
# (|> binds more tightly than /, so this is matching rows / total rows)
p_both <- d_bl |>
  filter(education == "University" & income == "Medium") |>
  nrow() / nrow(d_bl)
# Apply addition rule
p_either <- p_university + p_medium_income - p_both #<1>
# Display results
cat("P(University education) =", round(p_university, 4), "\n")
cat("P(Medium income) =", round(p_medium_income, 4), "\n")
cat("P(Both) =", round(p_both, 4), "\n")
cat("P(Either) =", round(p_either, 4), "\n")
- 1: Apply the addition rule: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
The addition rule accounts for the overlap between the two events to avoid double-counting.
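To see the double-counting correction at work, the addition rule can be checked against a direct count of rows that satisfy either condition. The following is a small sketch in base R on an eight-row made-up data frame (toy); the rows and the level names other than "University" and "Medium" are invented for illustration and are not the lab's d_bl.

```r
# Made-up data for illustration only
toy <- data.frame(
  education = c("University", "University", "University", "Secondary",
                "Secondary", "Primary", "Primary", "Primary"),
  income    = c("Medium", "High", "Medium", "Medium",
                "Low", "Low", "Medium", "High")
)

is_university <- toy$education == "University"
is_medium     <- toy$income == "Medium"

# The mean of a logical vector is the proportion of TRUE values
p_university <- mean(is_university)              # 3 of 8 rows
p_medium     <- mean(is_medium)                  # 4 of 8 rows
p_both       <- mean(is_university & is_medium)  # 2 of 8 rows

# Addition rule versus counting rows that meet either condition
p_either_rule   <- p_university + p_medium - p_both
p_either_direct <- mean(is_university | is_medium)

all.equal(p_either_rule, p_either_direct)
```

Without subtracting p_both, the two rows that are both "University" and "Medium" would be counted twice, giving 7/8 instead of the 5/8 obtained by direct counting.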
4 Conditional probability
The probability of event B occurring given that event A has occurred:
\[P(B|A) = \frac{P(A \cap B)}{P(A)}\]
Calculate the probability of having “Medium” income given that someone has “University” education.
You have:
- \(P(A \cap B)\) = p_both (probability of both University education AND Medium income)
- \(P(A)\) = p_university (probability of University education)
Combine them using the conditional probability formula.
# Get probability of University education
p_university <- edu_summary |>
  filter(education == "University") |>
  pull(proportion)
# Calculate probability of both University education AND Medium income
# (|> binds more tightly than /, so this is matching rows / total rows)
p_both <- d_bl |>
  filter(education == "University" & income == "Medium") |>
  nrow() / nrow(d_bl)
# Calculate conditional probability
p_medium_given_university <- p_both / p_university #<1>
# Display result
cat("P(Medium income | University education) =", round(p_medium_given_university, 4), "\n")
- 1: Apply the conditional probability formula: \(P(B|A) = P(A \cap B) / P(A)\)
This tells us the probability of having medium income among those with university education.
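Conditioning on an event amounts to restricting attention to the rows where that event holds. As a sketch of this idea, the block below computes P(Medium | University) twice on a made-up data frame: once with the formula and once by subsetting to the University rows first. The data frame toy and its contents are invented for illustration.

```r
# Made-up data for illustration only
toy <- data.frame(
  education = c("University", "University", "University", "Secondary",
                "Secondary", "Primary", "Primary", "Primary"),
  income    = c("Medium", "High", "Medium", "Medium",
                "Low", "Low", "Medium", "High")
)

is_university <- toy$education == "University"
is_medium     <- toy$income == "Medium"

# Route 1: conditional probability formula
p_university <- mean(is_university)              # 3 of 8 rows
p_both       <- mean(is_university & is_medium)  # 2 of 8 rows
p_medium_given_university <- p_both / p_university

# Route 2: keep only the University rows, then take the share with Medium income
university_rows <- toy[is_university, ]
p_direct <- mean(university_rows$income == "Medium")  # 2 of 3 rows

all.equal(p_medium_given_university, p_direct)
```

Both routes give 2/3 here: dividing by P(A) rescales the joint probability so that the University group becomes the whole population under consideration.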
5 Summary
In this lab, you learned:
- Probability basics: How to verify that probabilities sum to 1
- Complement rule: \(P(\text{not } A) = 1 - P(A)\)
- Addition rule: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
- Conditional probability: \(P(B|A) = P(A \cap B) / P(A)\)