Lab: Probability rules

In this lab, we’ll practice evaluating probability rules using R. This is based on the Probability rules chapter.

Tip

While you can complete all the exercises in your browser, we recommend also practicing in RStudio. Using an editor like RStudio will help you build real-world skills for writing, running, and saving your R code.

Load packages and data

First, let’s load the necessary packages and data. We also create summaries of the education and income variables for use in the exercises. Note that these are not real data, so the exact values are somewhat arbitrary.
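The setup code itself is not shown on this page. As a rough sketch of what it might look like, assuming the dplyr package and randomly generated data: only the object names (d_bl, edu_summary, income_summary), the column names (education, income, proportion), and the levels "University" and "Medium" come from the exercises below. The other level names, the sample size, the sampling weights, and the seed are hypothetical.

```r
library(dplyr)

# Hypothetical simulated data: size, levels, weights, and seed are assumptions
set.seed(123)
n_obs <- 500
d_bl <- data.frame(
  education = sample(c("Primary", "Secondary", "University"), n_obs,
                     replace = TRUE, prob = c(0.2, 0.5, 0.3)),
  income = sample(c("Low", "Medium", "High"), n_obs,
                  replace = TRUE, prob = c(0.3, 0.5, 0.2))
)

# One row per education level, with the share of observations in that level
edu_summary <- d_bl |>
  count(education) |>
  mutate(proportion = n / sum(n))

# Same summary for income
income_summary <- d_bl |>
  count(income) |>
  mutate(proportion = n / sum(n))
```

With objects shaped like these, the filter() and pull() calls in the exercises should run as written.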

1 Check that the sum of all probabilities is 1

Do the proportions for our income variable add up to 1? You can use the income_summary object we created above.

The income_summary data frame has a column called proportion that contains the probability for each income level.

To calculate the sum, use sum() on the proportion column.

To check if it equals 1, compare the sum to 1 using ==.

# Sum all proportions in the income_summary data frame
sum_income_proportions <- sum(income_summary$proportion)

# Check if the sum equals exactly 1
sum_equals_one <- sum_income_proportions == 1

# Display the result
cat("Sum of income proportions:", sum_income_proportions, "\n")
cat("Does it equal 1?", sum_equals_one, "\n")

The sum should equal 1, confirming that all probabilities add up correctly!
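One caveat: proportions are floating-point numbers, so an exact == comparison can return FALSE because of tiny rounding errors even when the probabilities are valid (in R, 0.1 + 0.2 == 0.3 is FALSE). A more forgiving check compares within a small tolerance. The sketch below uses hypothetical proportions in place of income_summary$proportion:

```r
# Hypothetical proportions standing in for income_summary$proportion
proportions <- c(0.3, 0.5, 0.2)
total <- sum(proportions)

# all.equal() compares within a small numerical tolerance;
# isTRUE() turns its result into a single TRUE or FALSE
sum_is_one <- isTRUE(all.equal(total, 1))

cat("Sum of proportions:", total, "\n")
cat("Equals 1 within tolerance?", sum_is_one, "\n")
```

If the exact comparison in the exercise ever reports FALSE while the printed sum looks like 1, this tolerance-based check is a reasonable thing to try.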

2 Complement rule

The probability of an event not occurring is 1 minus the probability that it will occur.

Let’s check this for the “Medium” income level.

\[ P(\text{not Medium}) = 1 - P(\text{Medium}) \]

Apply the complement rule to the Medium income level.

You already have p_medium calculated. To find the complement, subtract it from 1.

# Get the probability of Medium income
p_medium <- income_summary |>
  filter(income == "Medium") |>
  pull(proportion)

# Apply the complement rule: P(not A) = 1 - P(A)
p_not_medium <- 1 - p_medium

# Display the results
cat("P(Medium income) =", round(p_medium, 4), "\n")
cat("P(not Medium income) =", round(p_not_medium, 4), "\n")
cat("Check complement rule, sum =", round(p_medium + p_not_medium, 4), "\n")

The sum of P(Medium) and P(not Medium) should equal 1, confirming the complement rule.
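Because p_not_medium is defined as 1 - p_medium, adding the two back together will always give 1. A more independent check is to add up the proportions of every income level other than Medium and compare that to the complement. The sketch below does this in base R with a hypothetical income summary (the level names "Low" and "High" and all proportions are assumptions):

```r
# Hypothetical income summary; the real one comes from the setup step
income_summary <- data.frame(
  income = c("Low", "Medium", "High"),
  proportion = c(0.3, 0.5, 0.2)
)

p_medium <- income_summary$proportion[income_summary$income == "Medium"]

# Route 1: complement rule
p_not_medium_rule <- 1 - p_medium

# Route 2: add up every level that is not Medium
p_not_medium_direct <- sum(
  income_summary$proportion[income_summary$income != "Medium"]
)

cat("Complement rule:", p_not_medium_rule, "\n")
cat("Direct sum:", p_not_medium_direct, "\n")
cat("Same answer?",
    isTRUE(all.equal(p_not_medium_rule, p_not_medium_direct)), "\n")
```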

3 Addition rule

The addition rule gives the probability that event A or event B occurs (or both):

\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

Calculate the probability of either having “University” education OR being in the “Medium” income group (or both).

You have:

  • P(A) = p_university
  • P(B) = p_medium_income
  • P(A ∩ B) = p_both

Combine them using the addition rule.

# Get individual probabilities
p_university <- edu_summary |>
  filter(education == "University") |>
  pull(proportion)

p_medium_income <- income_summary |>
  filter(income == "Medium") |>
  pull(proportion)

# Calculate probability of both University education AND Medium income
p_both <- d_bl |>
  filter(education == "University" & income == "Medium") |>
  nrow() / nrow(d_bl)

# Apply the addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_either <- p_university + p_medium_income - p_both

# Display results
cat("P(University education) =", round(p_university, 4), "\n")
cat("P(Medium income) =", round(p_medium_income, 4), "\n")
cat("P(Both) =", round(p_both, 4), "\n")
cat("P(Either) =", round(p_either, 4), "\n")

The addition rule accounts for the overlap between the two events to avoid double-counting.
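To see that subtracting the overlap gives the right answer, the result of the addition rule can be compared with a direct count of the people who satisfy at least one of the two conditions. The sketch below uses base R and a tiny hypothetical stand-in for d_bl (10 made-up rows, not the lab data):

```r
# Hypothetical stand-in for d_bl: 10 made-up people
d_bl <- data.frame(
  education = c("University", "University", "University", "Secondary",
                "Secondary", "Secondary", "Primary", "Primary",
                "University", "Secondary"),
  income = c("Medium", "High", "Medium", "Medium", "Low",
             "Medium", "Low", "Medium", "Low", "High")
)

is_university <- d_bl$education == "University"
is_medium <- d_bl$income == "Medium"

# mean() of a logical vector is the proportion of TRUE values
p_university <- mean(is_university)
p_medium_income <- mean(is_medium)
p_both <- mean(is_university & is_medium)

# Addition rule versus counting "A or B" directly
p_either_rule <- p_university + p_medium_income - p_both
p_either_direct <- mean(is_university | is_medium)

cat("Addition rule:", p_either_rule, "\n")
cat("Direct count:", p_either_direct, "\n")
```

In this toy data 4 people have University education, 5 have Medium income, and 2 have both, so both routes give 0.7. Adding 0.4 and 0.5 without subtracting the overlap would give 0.9, counting those 2 people twice.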

4 Conditional probability

The probability of event B occurring given that event A has occurred:

\[P(B|A) = \frac{P(A \cap B)}{P(A)}\]

Calculate the probability of having “Medium” income given that someone has “University” education.

You have:

  • P(A ∩ B) = p_both (probability of both University education AND Medium income)
  • P(A) = p_university (probability of University education)

Combine them using the conditional probability formula.

# Get probability of University education
p_university <- edu_summary |>
  filter(education == "University") |>
  pull(proportion)

# Calculate probability of both University education AND Medium income
p_both <- d_bl |>
  filter(education == "University" & income == "Medium") |>
  nrow() / nrow(d_bl)

# Apply the conditional probability formula: P(B|A) = P(A ∩ B) / P(A)
p_medium_given_university <- p_both / p_university

# Display result
cat("P(Medium income | University education) =", round(p_medium_given_university, 4), "\n")

This tells us the probability of having medium income among those with university education.
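Another way to read "given University education" is to restrict the data to the University rows first and then take the share with Medium income. The sketch below compares that direct approach with the formula, using base R and a small hypothetical stand-in for d_bl (10 made-up rows, not the lab data):

```r
# Hypothetical stand-in for d_bl: 10 made-up people
d_bl <- data.frame(
  education = c("University", "University", "University", "Secondary",
                "Secondary", "Secondary", "Primary", "Primary",
                "University", "Secondary"),
  income = c("Medium", "High", "Medium", "Medium", "Low",
             "Medium", "Low", "Medium", "Low", "High")
)

is_university <- d_bl$education == "University"
is_medium <- d_bl$income == "Medium"

# Route 1: the formula P(B|A) = P(A and B) / P(A)
p_both <- mean(is_university & is_medium)
p_university <- mean(is_university)
p_formula <- p_both / p_university

# Route 2: keep only University rows, then take the share with Medium income
p_direct <- mean(is_medium[is_university])

cat("Formula:", p_formula, "\n")
cat("Direct subset:", p_direct, "\n")
```

Here 2 of the 4 people with University education have Medium income, so both routes give 0.5. Dividing by P(A) in the formula is what rescales the joint probability to the University subgroup.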

5 Summary

In this lab, you learned:

  1. Probability basics: How to verify that probabilities sum to 1
  2. Complement rule: \(P(\text{not } A) = 1 - P(A)\)
  3. Addition rule: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
  4. Conditional probability: \(P(B|A) = P(A \cap B) / P(A)\)