Classification

In this chapter, we will look at how we can work with classification problems in R. You will learn how to evaluate sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). These measures help us understand how well a model or test can identify or predict binary outcomes (e.g., disease status, treatment response). You will not work on the models themselves; that is the topic of upcoming courses and chapters.

Load packages and data

library(tidyverse)
library(tidymodels)
library(here)
library(knitr)

We load both the outcomes and baseline data, and join them together using the id variable.

d_outcomes <- read_csv(here("data", "steps_clean.csv"))
d_bl <- read_rds(here("data", "steps_baseline.rds"))

d <- d_outcomes |>
  select(
    -group,
    -ends_with("_screen")
  ) |>
  full_join(
    d_bl,
    by = "id"
  )

For the purposes of this chapter we need a binary outcome. We will use the lsas_post variable as our outcome of interest, and evaluate how well we can predict it using the lsas_screen values.
d_nomiss <- d |>
  filter(!is.na(lsas_post))

d_nomiss <- d_nomiss |>
  mutate(
    lsas_post_bin = if_else(lsas_post < 50, "Low", "High"),
    lsas_post_bin = factor(lsas_post_bin, levels = c("Low", "High"))
  )
In the following code chunk, we fit a simple logistic regression model so that we have some concrete predictions to work with when evaluating the classification metrics. The theory and practice of fitting statistical models are covered in a later course, but we felt it would be silly to skip it here. We are using the tidymodels package, which provides a unified framework for fitting and evaluating models. There are many other packages that can be used as well, and even functions in base R.
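As a point of reference, here is a minimal sketch of how the same model could be fitted with base R's glm(). It is not used in the rest of the chapter, and it simply reuses the d_nomiss data and formula from above.

# a base-R alternative to the tidymodels code below (not used further)
# note: glm() treats the second factor level ("High") as the event
lsas_fit_glm <- glm(
  lsas_post_bin ~ lsas_screen + group,
  family = binomial(),
  data = d_nomiss
)
summary(lsas_fit_glm)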
lsas_fit <- logistic_reg() |> # specify the model type
  set_engine("glm") |> # we use the standard glm engine
  set_mode("classification") |> # specify classification mode
  fit(lsas_post_bin ~ lsas_screen + group, data = d_nomiss) # outcome ~ predictor(s)

lsas_pred <- predict(lsas_fit, new_data = d_nomiss) |> # predict on the original data
  bind_cols(d_nomiss)
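By default, predict() returns the predicted class for a classification model. If you also want the predicted probabilities, you can ask for them with type = "prob". We do not need them for the confusion matrix below, and lsas_prob is just an illustrative name.

# optional: predicted probabilities instead of predicted classes
lsas_prob <- predict(lsas_fit, new_data = d_nomiss, type = "prob") |>
  bind_cols(d_nomiss)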
# create a confusion matrix: true outcome ~ predicted outcome
lsas_conf_mat <- conf_mat(lsas_pred, truth = lsas_post_bin, estimate = .pred_class)

true_pos <- lsas_conf_mat$table[[1]]
false_neg <- lsas_conf_mat$table[[2]]
false_pos <- lsas_conf_mat$table[[3]]
true_neg <- lsas_conf_mat$table[[4]]

lsas_conf_mat

          Truth
Prediction Low High
      Low   16    7
      High  27  119
Evaluate classification manually
Now that we have the predictions, we can calculate the metrics of interest: sensitivity, specificity, positive predictive value, and negative predictive value.
First, let’s look at the prevalence of the outcome. If the outcome is rare, this will have a big impact on the classification metrics.
prevalence <- (true_pos + false_neg) / (true_pos + false_neg + true_neg + false_pos)
The prevalence of having LSAS <50 after treatment is 0.25.
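As a quick check, we can plug in the counts from the confusion matrix above (the same numbers as true_pos, false_neg, false_pos, and true_neg):

# 43 actual positives (16 + 27) out of 169 participants in total
(16 + 27) / (16 + 27 + 119 + 7) # = 43 / 169, approximately 0.25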
Sensitivity
Sensitivity is the proportion of true positives out of all actual positives.
sensitivity <- true_pos / (true_pos + false_neg)
Sensitivity is 0.37.
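Using the raw counts: 16 of the 43 participants who actually had LSAS <50 were correctly predicted.

16 / (16 + 27) # true positives / all actual positives, approximately 0.37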
Specificity
Specificity is the proportion of true negatives out of all actual negatives.
specificity <- true_neg / (true_neg + false_pos)
Specificity is 0.94.
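Again with the raw counts: 119 of the 126 participants who did not reach LSAS <50 were correctly predicted.

119 / (119 + 7) # true negatives / all actual negatives, approximately 0.94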
Positive predictive value
Positive predictive value is the proportion of true positives out of all predicted positives. See the difference compared to sensitivity?
pos_pred_val <- true_pos / (true_pos + false_pos)
Positive predictive value is 0.7.
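With the raw counts the difference is easy to see: the numerator is the same as for sensitivity, but the denominator is now the predicted positives rather than the actual positives.

16 / (16 + 7)  # true positives / all predicted positives, approximately 0.70
16 / (16 + 27) # compare: sensitivity divides by all actual positives, approximately 0.37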
Negative predictive value
Negative predictive value is the proportion of true negatives out of all predicted negatives. Again, look at the difference compared to specificity.
neg_pred_val <- true_neg / (true_neg + false_neg)
Negative predictive value is 0.82.
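And the corresponding comparison for the negatives:

119 / (119 + 27) # true negatives / all predicted negatives, approximately 0.82
119 / (119 + 7)  # compare: specificity divides by all actual negatives, approximately 0.94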
Evaluate classification using tidymodels
…of course there are built-in functions for this! But it’s good practice to do it manually when you learn these concepts.
class_metrics <- metric_set(accuracy, sens, spec, ppv, npv)

kable(class_metrics(lsas_pred, truth = lsas_post_bin, estimate = .pred_class), digits = 2)
| .metric  | .estimator | .estimate |
|----------|------------|-----------|
| accuracy | binary     |      0.80 |
| sens     | binary     |      0.37 |
| spec     | binary     |      0.94 |
| ppv      | binary     |      0.70 |
| npv      | binary     |      0.82 |
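Each metric also has its own function in yardstick, so you do not need a metric_set() if you are only interested in one of them. For example:

# the same sensitivity and PPV as in the table above
sens(lsas_pred, truth = lsas_post_bin, estimate = .pred_class)
ppv(lsas_pred, truth = lsas_post_bin, estimate = .pred_class)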
One particularly difficult outcome to predict is suicide. Even in the most severe populations (e.g., people with previous suicide attempts who have been hospitalised), the 1-year risk of suicide is 5%. Therefore, most individuals classified as “high risk” of suicide will not die by suicide. The PPV is low in all models, regardless of whether you use a simple logistic regression or an ensemble machine learning model.
Compare this to another common use-case of classification: predicting remission status after treatment. For many psychiatric disorders, we can expect remission rates of 40-60%. In these cases, our classification models will be more balanced, and the PPV will be higher.
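To get a feel for how strongly the base rate drives PPV, we can compute it from sensitivity, specificity, and prevalence using Bayes' theorem. The helper function and the 80%/90% values below are made up purely for illustration; they are not taken from the model above.

# PPV as a function of prevalence for a hypothetical test with
# 80% sensitivity and 90% specificity (illustrative numbers only)
ppv_from_prevalence <- function(sens, spec, prev) {
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}

ppv_from_prevalence(sens = 0.80, spec = 0.90, prev = 0.05) # rare outcome: PPV around 0.30
ppv_from_prevalence(sens = 0.80, spec = 0.90, prev = 0.50) # common outcome: PPV around 0.89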
Can you think of other examples from your own research where the base-rate of the event is low?