Calibration is the process of adjusting a model's predicted probabilities so that they match the observed frequencies of events. The goal is that when a model predicts a certain probability for an outcome, that probability accurately reflects the true likelihood of the outcome occurring.
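To see what miscalibration looks like before adjusting for it, one brief sketch (assuming the probably and modeldata packages are installed) plots observed event rates against binned predicted probabilities:

library(probably)
library(modeldata)

# a well-calibrated model tracks the diagonal; systematic deviation
# suggests the probabilities need adjustment
cal_plot_breaks(two_class_example, truth = truth, estimate = Class1)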

Usage

adjust_probability_calibration(x, method = NULL, ...)

Arguments

x

A tailor().

method

Character. One of "logistic", "multinomial", "beta", "isotonic", "isotonic_boot", or "none", corresponding to the functions in the probably package (probably::cal_estimate_logistic(), probably::cal_estimate_multinomial(), and so on). The default is "logistic", which, despite its name, fits a generalized additive model. Note that when fit.tailor() is called, the method may be changed to "none" if there is insufficient data.

...

Optional arguments to pass to the corresponding function in the probably package. These arguments must be named.

Value

An updated tailor() containing the new operation.

Details

The "logistic" and "multinomial" methods fit models that predict the observed classes as a function of the predicted class probabilities. These models remove any overt systematic trends from the linear predictor and correct new predictions. The underlying code fits that model using mgcv::gam(). If smooth = FALSE is passed to the ..., it uses stats::glm() for binary outcomes or nnet::multinom() for 3+ classes.

The "isotonic" method uses stats::isoreg() to force the predicted probabilities to increase with the observed outcome class. This creates a step function that maps new predictions to values that are monotonically increasing with the binary (0/1) form of the outcome. One side effect is that there are fewer, perhaps far fewer, unique predicted probabilities. For three or more classes, this is done using a one-versus-all strategy that ensures the probabilities add to 1.0. The "isotonic_boot" method resamples the data and generates multiple isotonic regressions that are averaged and used to correct the predictions. The result may not be perfectly monotonic, but the number of unique calibrated predictions increases with the number of bootstrap samples (controlled by passing the times argument through ...).
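As a small sketch, the number of bootstrap resamples can be raised from its default via ...:

# average 25 bootstrapped isotonic fits; more resamples give more
# unique calibrated probability values
tlr_iso <-
  tailor() |>
  adjust_probability_calibration(method = "isotonic_boot", times = 25)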

Beta calibration (Kull et al., 2017) assumes that the probability estimates follow a Beta distribution. This leads to a sigmoidal model that can be fit to the data via maximum likelihood. There are a few different ways to parameterize the model; see the options of betacal::beta_calibration() to select a specific sigmoidal form.
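As a hedged sketch, probably::cal_estimate_beta() exposes shape_params and location_params arguments (forwarded here through ...) that select among these parameterizations; the values below mirror what we believe are its defaults:

tlr_beta <-
  tailor() |>
  adjust_probability_calibration(
    method = "beta",
    # assumed arguments of probably::cal_estimate_beta(); two shape
    # parameters and one location parameter select the default model
    shape_params = 2,
    location_params = 1
  )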

Data Usage

This adjustment requires estimation and, as such, different subsets of data should be used to train it and evaluate its predictions.

Note that, when calling fit.tailor(), if the calibration data have zero or one row, the method is changed to "none".

References

Kull, Meelis, Telmo Silva Filho, and Peter Flach. "Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers." Artificial Intelligence and Statistics, PMLR, 2017.

https://aml4td.org/chapters/cls-metrics.html#sec-cls-calibration

Examples

library(modeldata)
library(tailor)

# split example data
set.seed(1)
in_rows <- sample(c(TRUE, FALSE), nrow(two_class_example), replace = TRUE)
d_calibration <- two_class_example[in_rows, ]
d_test <- two_class_example[!in_rows, ]

head(d_calibration)
#>    truth      Class1       Class2 predicted
#> 1 Class2 0.003589243 0.9964107574    Class2
#> 3 Class2 0.110893522 0.8891064779    Class2
#> 4 Class1 0.735161703 0.2648382969    Class1
#> 6 Class1 0.999275071 0.0007249286    Class1
#> 7 Class1 0.999201149 0.0007988510    Class1
#> 8 Class1 0.812351997 0.1876480026    Class1

# specify calibration
tlr <-
  tailor() |>
  adjust_probability_calibration(method = "logistic")

# train the tailor on the calibration subset
tlr_fit <- fit(
  tlr,
  d_calibration,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

# apply to predictions on another subset of data
head(d_test)
#>     truth       Class1      Class2 predicted
#> 2  Class1 0.6786210540 0.321378946    Class1
#> 5  Class2 0.0162399603 0.983760040    Class2
#> 9  Class2 0.4570372595 0.542962741    Class2
#> 10 Class2 0.0976377342 0.902362266    Class2
#> 16 Class1 0.9919162773 0.008083723    Class1
#> 17 Class2 0.0004250834 0.999574917    Class2

predict(tlr_fit, d_test)
#> # A tibble: 244 × 4
#>    truth     Class1    Class2 predicted
#>    <fct>  <dbl[1d]> <dbl[1d]> <fct>    
#>  1 Class1    0.459     0.541  Class2   
#>  2 Class2    0.0396    0.960  Class2   
#>  3 Class2    0.333     0.667  Class2   
#>  4 Class2    0.0954    0.905  Class2   
#>  5 Class1    0.964     0.0359 Class1   
#>  6 Class2    0.0328    0.967  Class2   
#>  7 Class2    0.112     0.888  Class2   
#>  8 Class1    0.665     0.335  Class1   
#>  9 Class1    0.957     0.0434 Class1   
#> 10 Class2    0.0534    0.947  Class2   
#> # ℹ 234 more rows