Lab: Testing two means

In this lab, we’ll practice calculating p-values and confidence intervals for the difference between two means. This is based on the Testing two means and contingency tables chapter.

If you are working in RStudio, first load the packages and data:

Calculate the mean difference and the associated two-sided p-value for post-treatment LSAS scores (lsas_post) between the therapist-guided and the wait-list group using the t.test() function.

You can calculate it using the t.test() function:

df_clean %>%
  filter(trt != "self-guided") %>% # filters away the self-guided group
  t.test(______ ~ trt, data = ., var.equal = TRUE) # performs a t-test, but on what variable?

Or, if you want to do it manually, you can use the following formulas.

To test the difference between two means, we can use the Student’s independent groups t-test:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\text{pooled}}} \]

where \(\bar{X}_1\) and \(\bar{X}_2\) are the means of the first and second group, and where the pooled standard error is:

\[ SE_{\text{pooled}} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

and the pooled variance is:

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

The full solution is:

#using the t.test() function
df_clean %>%
  filter(trt != "self-guided") %>%
  t.test(lsas_post ~ trt, data = ., var.equal = TRUE)

# or manually
x_bar_tg <- mean(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
x_bar_wl <- mean(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
s_2_tg <- var(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
s_2_wl <- var(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
n_tg <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "therapist-guided"]))
n_wl <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "waitlist"]))
sp2 <- ((n_tg - 1) * s_2_tg + (n_wl - 1) * s_2_wl) / (n_tg + n_wl - 2) # pooled variance
SE_pooled <- sqrt(sp2 * (1 / n_tg + 1 / n_wl)) # pooled standard error

# and put them together
# (the sign of t depends on which mean is subtracted from which;
# the two-sided p-value is the same either way)
t_value <- (x_bar_wl - x_bar_tg) / SE_pooled

# find the p-value
df <- n_tg + n_wl - 2 # degrees of freedom
p_value <- 2 * (1 - pt(abs(t_value), df)) # multiplied by two to get the two-tailed p-value
p_value
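As a sanity check on the formulas above, the manual calculation can be compared with the output of t.test(). The sketch below uses simulated data (the data frame toy and its values are made up for illustration and are not the lab's df_clean), and it assumes trt is stored as a character vector so that the groups are ordered alphabetically.

```r
# Hypothetical data for illustration only (not the lab's df_clean)
set.seed(1)
toy <- data.frame(
  trt = rep(c("therapist-guided", "waitlist"), each = 30),
  lsas_post = c(rnorm(30, mean = 60, sd = 15), rnorm(30, mean = 75, sd = 15))
)

# Split the outcome by group
x_tg <- toy$lsas_post[toy$trt == "therapist-guided"]
x_wl <- toy$lsas_post[toy$trt == "waitlist"]
n_tg <- length(x_tg)
n_wl <- length(x_wl)

# Pooled variance, pooled standard error and t statistic
sp2 <- ((n_tg - 1) * var(x_tg) + (n_wl - 1) * var(x_wl)) / (n_tg + n_wl - 2)
se_pooled <- sqrt(sp2 * (1 / n_tg + 1 / n_wl))
t_manual <- (mean(x_tg) - mean(x_wl)) / se_pooled
p_manual <- 2 * pt(abs(t_manual), df = n_tg + n_wl - 2, lower.tail = FALSE)

# Same test with t.test(); with a character grouping variable the first
# group alphabetically ("therapist-guided") is the one subtracted from
res <- t.test(lsas_post ~ trt, data = toy, var.equal = TRUE)

# Both comparisons should return TRUE
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```

Using pt(..., lower.tail = FALSE) gives the same upper-tail probability as 1 - pt(...), and is a little more accurate for very small p-values.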

Do you think the assumption of equal variance between the groups is justified? Remember that you can use the var() function to get the variance.

You can check the variance of each group:

var(df_clean$______[df_clean$trt == "_____"], na.rm = TRUE)
var(df_clean$_____[df_clean$trt == "therapist-guided"], na.rm = TRUE)

The full solution is:

var(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
var(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)

These results show rather similar variances, although they are not exactly equal.
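Two variances are often easier to compare as a ratio, where a value close to 1 points to a similar spread in both groups. The following is a minimal sketch on simulated data (toy and its values are hypothetical, not the lab's df_clean). It uses tapply() to get both variances in one call.

```r
# Hypothetical data for illustration only (not the lab's df_clean)
set.seed(2)
toy <- data.frame(
  trt = rep(c("therapist-guided", "waitlist"), each = 30),
  lsas_post = c(rnorm(30, mean = 60, sd = 15), rnorm(30, mean = 75, sd = 17))
)

# Variance of lsas_post within each group (na.rm is passed on to var())
vars <- tapply(toy$lsas_post, toy$trt, var, na.rm = TRUE)
vars

# Standard deviations are on the same scale as the LSAS scores
sqrt(vars)

# Ratio of the larger to the smaller variance (always >= 1)
var_ratio <- max(vars) / min(vars)
var_ratio
```

How large a ratio is acceptable is a judgment call, which is one reason the Welch test is a common default.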

How do the results differ if you use the Welch t-test instead, which does not assume equal variances? You can look at the different settings of the t.test() function by typing ?t.test into RStudio.

You can again calculate it using the t.test() function, but you need to change the setting of the var.equal argument:

df_clean %>%
  filter(trt != "_____") %>% # here we want to filter away the "self-guided" group
  t.test(______ ~ trt, data = ., var.equal = _____) # and get rid of the assumption of equal variances (before it was set to TRUE)

The full solution is:

df_clean %>%
  filter(trt != "self-guided") %>%
  t.test(lsas_post ~ trt, data = ., var.equal = FALSE)

The results are very similar to those obtained when the variances were assumed to be equal, so the difference in variance between the groups does not affect our results to any large degree.
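For reference, the Welch test can also be computed by hand. It replaces the pooled standard error with \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\) and uses the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below runs on simulated data (all values are hypothetical, not the lab's df_clean) and checks the result against t.test().

```r
# Hypothetical data for illustration only (not the lab's df_clean)
set.seed(3)
x_tg <- rnorm(30, mean = 60, sd = 12)
x_wl <- rnorm(35, mean = 75, sd = 20)
n_tg <- length(x_tg)
n_wl <- length(x_wl)

# Each group's contribution to the squared standard error: s^2 / n
v_tg <- var(x_tg) / n_tg
v_wl <- var(x_wl) / n_wl

# Welch t statistic (no pooling of the variances)
se_welch <- sqrt(v_tg + v_wl)
t_welch <- (mean(x_tg) - mean(x_wl)) / se_welch

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_welch <- (v_tg + v_wl)^2 / (v_tg^2 / (n_tg - 1) + v_wl^2 / (n_wl - 1))
p_welch <- 2 * pt(abs(t_welch), df = df_welch, lower.tail = FALSE)

# Compare with t.test(); all three comparisons should return TRUE
res <- t.test(x_tg, x_wl, var.equal = FALSE)
all.equal(unname(res$statistic), t_welch)
all.equal(unname(res$parameter), df_welch)
all.equal(res$p.value, p_welch)
```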

Also calculate the 95% confidence interval for the difference in LSAS post-treatment scores (lsas_post) between the therapist-guided and wait-list group. Again, you can use the t.test() function.

You can again calculate it using the t.test() function

df_clean %>%
  filter(trt != "_____") %>%
  t.test(______ ~ trt, data = ., var.equal = TRUE)

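Or, if you want to do it manually, use the formula for the z-based 95% CI for the difference between two means. In practice, the unknown population variances \(\sigma_1^2\) and \(\sigma_2^2\) are replaced by the sample variances \(s_1^2\) and \(s_2^2\), which is reasonable for large samples: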

\[ (\bar{X}_1 - \bar{X}_2) \;\pm\; z_{\alpha/2} \cdot SE_{z} \]

where

\[ SE_{z} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

The full solution is:

#using the t.test() function
df_clean %>%
  filter(trt != "self-guided") %>%
  t.test(lsas_post ~ trt, data = ., var.equal = TRUE)

# or manually

#get the parts of the formula
x_bar_tg <- mean(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
x_bar_wl <- mean(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
s_2_tg <- var(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
s_2_wl <- var(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
n_tg <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "therapist-guided"]))
n_wl <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "waitlist"]))

# z-value 95% CI
se_z <- sqrt((s_2_wl / n_wl) + (s_2_tg / n_tg))
lcl <- (x_bar_wl - x_bar_tg) - 1.96 * se_z
ucl <- (x_bar_wl - x_bar_tg) + 1.96 * se_z
print(c(lcl, ucl))
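The manual z-based interval will not match the interval printed by t.test(..., var.equal = TRUE) exactly, because t.test() uses the pooled standard error together with a quantile from the t distribution instead of 1.96. The sketch below, on simulated data (all values are hypothetical, not the lab's df_clean), computes both versions and checks the t-based one against t.test().

```r
# Hypothetical data for illustration only (not the lab's df_clean)
set.seed(4)
x_tg <- rnorm(30, mean = 60, sd = 15)
x_wl <- rnorm(30, mean = 75, sd = 15)
n_tg <- length(x_tg)
n_wl <- length(x_wl)
mean_diff <- mean(x_tg) - mean(x_wl)

# z-based interval: unpooled standard error and the normal quantile
se_z <- sqrt(var(x_tg) / n_tg + var(x_wl) / n_wl)
ci_z <- mean_diff + c(-1, 1) * qnorm(0.975) * se_z

# t-based interval: pooled standard error and the t quantile
df_pooled <- n_tg + n_wl - 2
sp2 <- ((n_tg - 1) * var(x_tg) + (n_wl - 1) * var(x_wl)) / df_pooled
se_pooled <- sqrt(sp2 * (1 / n_tg + 1 / n_wl))
ci_t <- mean_diff + c(-1, 1) * qt(0.975, df = df_pooled) * se_pooled

# The t-based interval should reproduce the one from t.test()
res <- t.test(x_tg, x_wl, var.equal = TRUE)
all.equal(as.numeric(res$conf.int), ci_t)

ci_z
ci_t
```

As the sample sizes grow, the t quantile approaches the normal quantile and the two intervals converge.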

Conduct a dependent (paired) t-test for the difference in LSAS scores between post-treatment (lsas_post) and 12-month follow-up (lsas_fu12) and interpret its meaning. Again, type ?t.test into RStudio to get some information on how the function works.

You can again calculate it using the t.test() function

t.test(df_clean$_____, df_clean$lsas_post, paired = ____)

Or if you want to do it manually:

For a paired samples t-test, the statistic is:

\[ t = \frac{\bar{D}}{SE_D} \]

where:

\(\bar{D}\) = mean of the difference scores

\(SE_D\) = the standard error of the difference scores, calculated as \(s_D / \sqrt{n}\), with \(s_D\) = the standard deviation of the differences and \(n\) = the number of paired observations

The full solution is:

# using the t-test function
t.test(df_clean$lsas_fu12, df_clean$lsas_post, paired = TRUE)

# or manually
diff_post_12FU <- df_clean$lsas_fu12 - df_clean$lsas_post
mean_diff <- mean(diff_post_12FU, na.rm = TRUE)
sd_diff <- sd(diff_post_12FU, na.rm = TRUE)
n <- sum(!is.na(diff_post_12FU))
se_diff <- sd_diff / sqrt(n)
t_value <- mean_diff / se_diff
df <- n - 1
p_value <- 2 * (1 - pt(abs(t_value), df))
p_value

The interpretation is that, for the full group, LSAS scores at the 12-month follow-up differ significantly from LSAS scores at post-treatment. The sign of the mean difference (lsas_fu12 minus lsas_post) shows whether scores went up or down.
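A paired t-test is equivalent to a one-sample t-test on the difference scores, which is why the manual calculation only needs the mean and standard deviation of the differences. A minimal sketch on simulated data (the vectors post and fu12 are hypothetical, not the lab's df_clean):

```r
# Hypothetical data for illustration only (not the lab's df_clean)
set.seed(5)
post <- rnorm(40, mean = 65, sd = 15)
fu12 <- post + rnorm(40, mean = -3, sd = 8)

# Paired t-test on the two measurements
res_paired <- t.test(fu12, post, paired = TRUE)

# One-sample t-test on the difference scores, testing a mean of 0
res_one <- t.test(fu12 - post, mu = 0)

# Both approaches give the same t statistic, df and p-value
all.equal(unname(res_paired$statistic), unname(res_one$statistic))
all.equal(unname(res_paired$parameter), unname(res_one$parameter))
all.equal(res_paired$p.value, res_one$p.value)
```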

Also calculate the 95% confidence interval for the mean difference in LSAS scores between post-treatment (lsas_post) and 12-month follow-up (lsas_fu12), using the z-value formula, and provide an interpretation of its meaning.

We can get the z-based confidence interval for this difference using a familiar formula for large samples:

\[ \bar{D} \;\pm\; z_{\alpha/2} \cdot SE_D \]

where \(SE_D\) = the standard error of the difference scores, calculated as \(s_D / \sqrt{n}\), with \(s_D\) = the standard deviation of the differences and \(n\) = the number of paired observations

The full solution is:

diff_post_12FU <- df_clean$lsas_fu12 - df_clean$lsas_post
mean_diff <- mean(diff_post_12FU, na.rm = TRUE)
sd_diff <- sd(diff_post_12FU, na.rm = TRUE)
n <- sum(!is.na(diff_post_12FU))
se_diff <- sd_diff / sqrt(n)

# and putting it together
lcl <- mean_diff - 1.96 * se_diff
ucl <- mean_diff + 1.96 * se_diff
print(c(lcl, ucl))
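The value 1.96 used above is the rounded 97.5th percentile of the standard normal distribution, \(z_{\alpha/2}\) with \(\alpha = 0.05\). The sketch below shows how to obtain it with qnorm() and wraps the interval formula in a small helper. The function ci_z() is a hypothetical name introduced here for illustration, not part of any package.

```r
# Critical z-values for two-sided intervals
z_95 <- qnorm(1 - 0.05 / 2) # about 1.96
z_99 <- qnorm(1 - 0.01 / 2) # about 2.58

# Hypothetical helper: z-based CI from a mean difference and its standard error
ci_z <- function(mean_diff, se_diff, conf_level = 0.95) {
  alpha <- 1 - conf_level
  z_crit <- qnorm(1 - alpha / 2)
  c(lower = mean_diff - z_crit * se_diff,
    upper = mean_diff + z_crit * se_diff)
}

# Example with made-up numbers: mean difference -3, standard error 1.2
ci_z(mean_diff = -3, se_diff = 1.2)
ci_z(mean_diff = -3, se_diff = 1.2, conf_level = 0.99)
```

A higher confidence level gives a larger critical value and therefore a wider interval.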