Lab: Testing two means
In this lab, we’ll practice calculating p-values and confidence intervals for the difference between two means. This is based on the Testing two means and contingency tables chapter.
If you are working in RStudio, first load packages and data:
Calculate the mean difference and the associated two-sided p-value for post-treatment LSAS scores (lsas_post) between the therapist-guided and the wait-list group using the t.test() function.
You can calculate it using the t.test() function:
df_clean %>%
  filter(trt != "self-guided") %>% # filters away the self-guided group
  t.test(______ ~ trt, data = ., var.equal = TRUE) # performs a t-test, but on what variable?
Or, if you want to do it manually, you can use the following formulas.
To test the difference between two means, we can use the Student’s independent groups t-test:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\text{pooled}}} \]
where \(\bar{X}_1\) and \(\bar{X}_2\) are the means of the first and second group, and where the pooled standard error is:
\[ SE_{\text{pooled}} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]
and the pooled variance is:
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
The full solution is:
# using the t.test() function
df_clean %>%
  filter(trt != "self-guided") %>%
  t.test(lsas_post ~ trt, data = ., var.equal = TRUE)
# or manually
x_bar_tg <- mean(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
x_bar_wl <- mean(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
s_2_tg <- var(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
s_2_wl <- var(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
n_tg <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "therapist-guided"]))
n_wl <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "waitlist"]))
sp2 <- ((n_tg - 1) * s_2_tg + (n_wl - 1) * s_2_wl) / (n_tg + n_wl - 2) # pooled variance
SE_pooled <- sqrt(sp2 * (1 / n_tg + 1 / n_wl)) # pooled standard error
# and put them together
t_value <- (x_bar_wl - x_bar_tg) / SE_pooled
# find the p-value
df <- n_tg + n_wl - 2
p_value <- 2 * (1 - pt(abs(t_value), df)) # multiplied by two to get the two-tailed p-value
p_value
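As a sanity check on the manual route, the sketch below wraps the pooled-variance formulas in a small helper and compares the result with t.test(). The helper name pooled_t and the simulated vectors g1 and g2 are hypothetical and are not part of the lab data. Note that the sign of t depends on which group mean is subtracted from which, while the two-sided p-value does not.

```r
# Simulated scores for two hypothetical groups (NOT the lab data)
set.seed(1)
g1 <- rnorm(40, mean = 60, sd = 20)
g2 <- rnorm(45, mean = 75, sd = 20)

# Hypothetical helper: Student's independent-groups t-test by hand
pooled_t <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  # pooled variance and pooled standard error
  sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
  se <- sqrt(sp2 * (1 / n1 + 1 / n2))
  t_value <- (mean(x1) - mean(x2)) / se
  df <- n1 + n2 - 2
  # two-sided p-value from the upper tail of |t|
  p_value <- 2 * pt(abs(t_value), df, lower.tail = FALSE)
  c(t = t_value, df = df, p = p_value)
}

manual <- pooled_t(g1, g2)
builtin <- t.test(g1, g2, var.equal = TRUE)

# Both comparisons should return TRUE
all.equal(unname(manual["t"]), unname(builtin$statistic))
all.equal(unname(manual["p"]), builtin$p.value)
```

If the two routes disagree on your own data, a common cause is missing values being handled differently in the means, variances and group sizes.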
Do you think the assumption of equal variance between the groups is justified? Remember that you can use the var() function to get the variance.
You can check the variance of each group:
var(df_clean$______[df_clean$trt=="_____"], na.rm=T)
var(df_clean$_____[df_clean$trt=="therapist-guided"], na.rm=T)
The full solution is:
var(df_clean$lsas_post[df_clean$trt=="waitlist"], na.rm=T)
var(df_clean$lsas_post[df_clean$trt=="therapist-guided"], na.rm=T)
These results show rather similar variances, although they are not exactly equal.
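One way to compare group variances without repeating the var() call is to compute them all at once and look at their ratio. The sketch below uses a simulated data frame; the names sim, group and score are hypothetical stand-ins, not the lab data. How close to 1 the ratio needs to be is a judgment call rather than a fixed rule.

```r
# Simulated data frame with a hypothetical grouping variable and outcome
set.seed(2)
sim <- data.frame(
  group = rep(c("a", "b"), each = 50),
  score = c(rnorm(50, mean = 70, sd = 18), rnorm(50, mean = 60, sd = 20))
)

# Variance of the outcome within each group
group_vars <- tapply(sim$score, sim$group, var, na.rm = TRUE)
group_vars

# Ratio of the larger to the smaller variance; values near 1 suggest similar spread
var_ratio <- max(group_vars) / min(group_vars)
var_ratio
```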
How do the results differ if you use the Welch t-test instead, which does not assume equal variances? You can have a look at the different settings of the t.test() function by typing ?t.test() into RStudio.
You can again calculate it using the t.test() function, but you need to change the setting of the var.equal argument:
df_clean %>%
  filter(trt != "_____") %>% # here we want to filter away the "self-guided" group
  t.test(______ ~ trt, data = ., var.equal = _____) # and get rid of the assumption of equal variances (before it was set to TRUE)
The full solution is:
df_clean %>%
  filter(trt != "self-guided") %>%
  t.test(lsas_post ~ trt, data = ., var.equal = FALSE)
The results are very similar to those obtained when the variances were assumed to be equal, so the difference in variance between the groups does not affect our results to any large degree.
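For readers who want to see what changes under the hood, here is a minimal sketch of the Welch calculation done by hand: each group keeps its own variance in the standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation. The vectors g1 and g2 are simulated for illustration and are not the lab data.

```r
# Simulated groups with deliberately different spread and size (NOT the lab data)
set.seed(3)
g1 <- rnorm(30, mean = 60, sd = 15)
g2 <- rnorm(50, mean = 70, sd = 25)

# Per-group variance of the mean (s^2 / n)
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Unpooled standard error and t statistic
se_welch <- sqrt(v1 + v2)
t_welch <- (mean(g1) - mean(g2)) / se_welch

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_welch <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))
p_welch <- 2 * pt(abs(t_welch), df_welch, lower.tail = FALSE)

# Compare with the built-in Welch test; each comparison should return TRUE
builtin <- t.test(g1, g2, var.equal = FALSE)
all.equal(t_welch, unname(builtin$statistic))
all.equal(df_welch, unname(builtin$parameter))
all.equal(p_welch, builtin$p.value)
```

When the group variances and sizes are similar, the Welch degrees of freedom end up close to the pooled value of n1 + n2 - 2, which is why the two tests tend to agree in that situation.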
Also calculate the 95% confidence interval for the difference in LSAS post-treatment scores (lsas_post) between the therapist-guided and wait-list group. Again, you can use the t.test() function.
You can again calculate it using the t.test() function:
df_clean %>%
  filter(trt != "_____") %>%
  t.test(______ ~ trt, data = ., var.equal = TRUE)
Or, if you want to do it manually, use the formula for the z-score 95% CI of a mean difference:
\[ (\bar{X}_1 - \bar{X}_2) \;\pm\; z_{\alpha/2} \cdot SE_{z} \]
where
\[ SE_{z} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
The full solution is:
# using the t.test() function
df_clean %>%
  filter(trt != "self-guided") %>%
  t.test(lsas_post ~ trt, data = ., var.equal = TRUE)
# or manually
# get the parts of the formula
x_bar_tg <- mean(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
x_bar_wl <- mean(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
s_2_tg <- var(df_clean$lsas_post[df_clean$trt == "therapist-guided"], na.rm = TRUE)
s_2_wl <- var(df_clean$lsas_post[df_clean$trt == "waitlist"], na.rm = TRUE)
n_tg <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "therapist-guided"]))
n_wl <- sum(!is.na(df_clean$lsas_post[df_clean$trt == "waitlist"]))
# z-value 95% CI
se_z <- sqrt((s_2_wl / n_wl) + (s_2_tg / n_tg))
lcl <- (x_bar_wl - x_bar_tg) - 1.96 * se_z
ucl <- (x_bar_wl - x_bar_tg) + 1.96 * se_z
print(c(lcl, ucl))
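The manual interval uses the normal critical value 1.96 with an unpooled standard error, whereas t.test() with var.equal = TRUE uses a t critical value on n1 + n2 - 2 degrees of freedom with the pooled standard error, so the two intervals should be expected to differ slightly. The sketch below shows how the t critical value approaches the z value as the degrees of freedom grow; the particular df values are arbitrary choices for illustration.

```r
# Critical value for a 95% interval under the normal distribution
z_crit <- qnorm(0.975)

# Critical values under the t distribution for a few illustrative df values
dfs <- c(10, 30, 100, 1000)
t_crit <- qt(0.975, dfs)

# The ratio shows how much wider a t-based interval is than a z-based one
# for the same standard error
data.frame(df = dfs, t_crit = t_crit, ratio_to_z = t_crit / z_crit)
```

The t critical value is always somewhat larger than the z value, and the gap shrinks as the sample gets larger, which is why the z formula is described as a large-sample approach.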
Conduct a dependent t-test for the difference in LSAS scores between post-treatment (lsas_post) and 12-month follow-up (lsas_fu12) and interpret its meaning. Again, type ?t.test() into RStudio to get some info on how the function works.
You can again calculate it using the t.test() function:
t.test(df_clean$_____, df_clean$lsas_post, paired = ____)
Or if you want to do it manually:
For a paired samples t-test, the statistic is:
\[ t = \frac{\bar{D}}{SE_D} \]
where:
\(\bar{D}\) = mean of the difference scores
\(SE_D\) = the standard error of the difference scores, calculated as \(s_D / \sqrt{n}\), with \(s_D\) = the standard deviation of the differences and \(n\) = the number of paired observations
The full solution is:
# using the t.test() function
t.test(df_clean$lsas_fu12, df_clean$lsas_post, paired = TRUE)
# or manually
diff_post_12FU <- df_clean$lsas_fu12 - df_clean$lsas_post
mean_diff <- mean(diff_post_12FU, na.rm = TRUE)
sd_diff <- sd(diff_post_12FU, na.rm = TRUE)
n <- sum(!is.na(diff_post_12FU))
se_diff <- sd_diff / sqrt(n)
t_value <- mean_diff / se_diff
df <- n - 1
p_value <- 2 * (1 - pt(abs(t_value), df))
p_value
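The manual calculation works because a paired t-test is the same thing as a one-sample t-test on the difference scores, tested against zero. The sketch below illustrates this with simulated measurements; the vectors first and second are hypothetical and are not the lab data.

```r
# Simulated repeated measurements on the same 60 hypothetical participants
set.seed(4)
first <- rnorm(60, mean = 70, sd = 20)
second <- first - rnorm(60, mean = 3, sd = 10)

# Paired t-test on the two measurements
paired <- t.test(second, first, paired = TRUE)

# One-sample t-test on the difference scores against a mean of zero
one_sample <- t.test(second - first, mu = 0)

# Each comparison should return TRUE
all.equal(paired$statistic, one_sample$statistic)
all.equal(paired$parameter, one_sample$parameter)
all.equal(paired$p.value, one_sample$p.value)
```

With real data that contain missing values, only participants with both measurements contribute to the difference scores, which is why the manual version counts n with sum(!is.na(...)).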
The interpretation is that, for the full group, the mean LSAS score at the 12-month follow-up differs significantly from the mean post-treatment LSAS score.
Also calculate the 95% confidence interval for the mean difference in LSAS scores between post-treatment (lsas_post) and 12-month follow-up (lsas_fu12), using the z-value formula, and provide an interpretation of its meaning.
We can get the z-score confidence interval of this difference using a familiar formula for large samples:
\[ \bar{D} \;\pm\; z_{\alpha/2} \cdot SE_D \]
where \(SE_D\) = the standard error of the difference scores, calculated as \(s_D / \sqrt{n}\), with \(s_D\) = the standard deviation of the differences and \(n\) = the number of paired observations
The full solution is:
diff_post_12FU <- df_clean$lsas_fu12 - df_clean$lsas_post
mean_diff <- mean(diff_post_12FU, na.rm = TRUE)
sd_diff <- sd(diff_post_12FU, na.rm = TRUE)
n <- sum(!is.na(diff_post_12FU))
se_diff <- sd_diff / sqrt(n)
# and putting it together
lcl <- mean_diff - 1.96 * se_diff
ucl <- mean_diff + 1.96 * se_diff
print(c(lcl, ucl))