Classification

In this chapter, we will look at how we can work with classification problems in R. You will learn how to evaluate sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). These measures help us understand how well a model or test can identify or predict binary outcomes (e.g., disease status, treatment response). You will not work on the models themselves; that is the topic of upcoming courses and chapters.

Load packages and data

library(tidyverse)
library(tidymodels)
library(here)
library(knitr)

We load both the outcomes and baseline data, and join them together using the id variable.

d_outcomes <- read_csv(here("data", "steps_clean.csv"))
d_bl <- read_rds(here("data", "steps_baseline.rds"))

d <- d_outcomes |>
  select(
    -group,
    -ends_with("_screen")
  ) |>
  full_join(
    d_bl,
    by = "id"
  )

For the purposes of this chapter we need a binary outcome. We will use the lsas_post variable as our outcome of interest, and evaluate how well we can predict it using the lsas_screen values.
d_nomiss <- d |>
  filter(!is.na(lsas_post))

d_nomiss <- d_nomiss |>
  mutate(
    lsas_post_bin = if_else(lsas_post < 50, "Low", "High"),
    lsas_post_bin = factor(lsas_post_bin, levels = c("Low", "High"))
  )
In the following code chunk, we fit a simple logistic regression model so that we have some concrete predictions to work with when evaluating the classification metrics. The theory and practice of fitting statistical models are covered in a later course, but we felt it would be silly to skip it here. We are using the tidymodels package, which provides a unified framework for fitting and evaluating models. There are many other packages that can be used as well, and even functions in base R.
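As a point of reference, here is a minimal sketch of how the same model could be fitted with base R's glm(). It is not used in the rest of the chapter, and it simply reuses the d_nomiss data and formula from above.

# a base-R alternative to the tidymodels code below (not used further)
# note: glm() treats the second factor level ("High") as the event
lsas_fit_glm <- glm(
  lsas_post_bin ~ lsas_screen + group,
  family = binomial(),
  data = d_nomiss
)
summary(lsas_fit_glm)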
lsas_fit <- logistic_reg() |> # specify the model type
  set_engine("glm") |> # we use the standard glm engine
  set_mode("classification") |> # specify classification mode
  fit(lsas_post_bin ~ lsas_screen + group, data = d_nomiss) # outcome ~ predictor(s)

lsas_pred <- predict(lsas_fit, new_data = d_nomiss) |> # predict on the original data
  bind_cols(d_nomiss)
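By default, predict() returns the predicted class for a classification model. If you also want the predicted probabilities, you can ask for them with type = "prob". We do not need them for the confusion matrix below, and lsas_prob is just an illustrative name.

# optional: predicted probabilities instead of predicted classes
lsas_prob <- predict(lsas_fit, new_data = d_nomiss, type = "prob") |>
  bind_cols(d_nomiss)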
# create a confusion matrix: true outcome ~ predicted outcome
lsas_conf_mat <- conf_mat(lsas_pred, truth = lsas_post_bin, estimate = .pred_class)

true_pos <- lsas_conf_mat$table[[1]]
false_neg <- lsas_conf_mat$table[[2]]
false_pos <- lsas_conf_mat$table[[3]]
true_neg <- lsas_conf_mat$table[[4]]

lsas_conf_mat

          Truth
Prediction Low High
      Low   16    7
      High  27  119
Evaluate classification manually
Now that we have the predictions, we can calculate the metrics of interest: sensitivity, specificity, positive predictive value, and negative predictive value.
First, let’s look at the prevalence of the outcome. If the outcome is rare, this will have a big impact on the classification metrics.
prevalence <- (true_pos + false_neg) / (true_pos + false_neg + true_neg + false_pos)
The prevalence of having LSAS <50 after treatment is 0.25.
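As a quick check, we can plug in the counts from the confusion matrix above (the same numbers as true_pos, false_neg, false_pos, and true_neg):

# 43 actual positives (16 + 27) out of 169 participants in total
(16 + 27) / (16 + 27 + 119 + 7) # = 43 / 169, approximately 0.25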
Sensitivity
Sensitivity is the proportion of true positives out of all actual positives.
sensitivity <- true_pos / (true_pos + false_neg)
Sensitivity is 0.37.
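Using the raw counts: 16 of the 43 participants who actually had LSAS <50 were correctly predicted.

16 / (16 + 27) # true positives / all actual positives, approximately 0.37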
Specificity
Specificity is the proportion of true negatives out of all actual negatives.
specificity <- true_neg / (true_neg + false_pos)
Specificity is 0.94.
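Again with the raw counts: 119 of the 126 participants who did not reach LSAS <50 were correctly predicted.

119 / (119 + 7) # true negatives / all actual negatives, approximately 0.94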
Positive predictive value
Positive predictive value is the proportion of true positives out of all predicted positives. See the difference compared to sensitivity?
pos_pred_val <- true_pos / (true_pos + false_pos)
Positive predictive value is 0.7.
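With the raw counts the difference is easy to see: the numerator is the same as for sensitivity, but the denominator is now the predicted positives rather than the actual positives.

16 / (16 + 7)  # true positives / all predicted positives, approximately 0.70
16 / (16 + 27) # compare: sensitivity divides by all actual positives, approximately 0.37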
Negative predictive value
Negative predictive value is the proportion of true negatives out of all predicted negatives. Again, look at the difference compared to specificity.
neg_pred_val <- true_neg / (true_neg + false_neg)
Negative predictive value is 0.82.
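And the corresponding comparison for the negatives:

119 / (119 + 27) # true negatives / all predicted negatives, approximately 0.82
119 / (119 + 7)  # compare: specificity divides by all actual negatives, approximately 0.94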
Evaluate classification using tidymodels
…of course there are built-in functions for this! But it’s good practice to do it manually when you learn these concepts.
class_metrics <- metric_set(accuracy, sens, spec, ppv, npv)

kable(class_metrics(lsas_pred, truth = lsas_post_bin, estimate = .pred_class), digits = 2)
| .metric  | .estimator | .estimate |
|----------|------------|-----------|
| accuracy | binary     |      0.80 |
| sens     | binary     |      0.37 |
| spec     | binary     |      0.94 |
| ppv      | binary     |      0.70 |
| npv      | binary     |      0.82 |
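Each metric also has its own function in yardstick, so you do not need a metric_set() if you are only interested in one of them. For example:

# the same sensitivity and PPV as in the table above
sens(lsas_pred, truth = lsas_post_bin, estimate = .pred_class)
ppv(lsas_pred, truth = lsas_post_bin, estimate = .pred_class)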
One particularly difficult outcome to predict is suicide. Even in the most severe populations (e.g., people with previous suicide attempts who have been hospitalised), the 1-year risk of suicide is 5%. Therefore, most individuals classified as “high risk” of suicide will not die by suicide. The PPV is low in all models, regardless of whether you use a simple logistic regression or an ensemble machine learning model.
Compare this to another common use-case of classification: predicting remission status after treatment. For many psychiatric disorders, we can expect remission rates of 40-60%. In these cases, our classification models will be more balanced, and the PPV will be higher.
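To get a feel for how strongly the base rate drives PPV, we can compute it from sensitivity, specificity, and prevalence using Bayes' theorem. The helper function and the 80%/90% values below are made up purely for illustration; they are not taken from the model above.

# PPV as a function of prevalence for a hypothetical test with
# 80% sensitivity and 90% specificity (illustrative numbers only)
ppv_from_prevalence <- function(sens, spec, prev) {
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}

ppv_from_prevalence(sens = 0.80, spec = 0.90, prev = 0.05) # rare outcome: PPV around 0.30
ppv_from_prevalence(sens = 0.80, spec = 0.90, prev = 0.50) # common outcome: PPV around 0.89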
Can you think of other examples from your own research where the base-rate of the event is low?