Re-calibrate classification probability predictions
Source: R/adjust-probability-calibration.R
Calibration is the process of adjusting a model's predicted probabilities so that they match the observed frequencies of events. The goal is that, when the model predicts a given probability for an outcome, that probability accurately reflects the true likelihood of the outcome occurring.
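As a rough illustration of the idea (not part of this package's API), one can bin predicted probabilities and compare each bin's average prediction to the observed event rate; for a well-calibrated model the two agree. A minimal sketch, assuming the dplyr package and a hypothetical data frame preds with a predicted probability column .pred_yes and an observed outcome column truth:

library(dplyr)

preds %>%
  mutate(bin = cut(.pred_yes, breaks = seq(0, 1, by = 0.1))) %>%
  group_by(bin) %>%
  summarize(
    mean_predicted = mean(.pred_yes),      # average predicted probability per bin
    observed_rate  = mean(truth == "yes")  # observed frequency of the event per bin
  )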
Arguments
- x
  A tailor().
- method
  Character. One of "logistic", "multinomial", "beta", "isotonic", or "isotonic_boot", corresponding to the functions from the probably package: probably::cal_estimate_logistic(), probably::cal_estimate_multinomial(), etc., respectively.
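For instance, choosing a different method only changes which probably estimator is used; the remaining methods follow the same naming pattern in probably. A minimal sketch:

library(tailor)

# beta calibration, estimated via probably::cal_estimate_beta()
tailor() %>%
  adjust_probability_calibration(method = "beta")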
Data Usage
This adjustment requires estimation and, as such, different subsets of data
should be used to train it and evaluate its predictions. See the section
by the same name in ?workflows::add_tailor()
for more information on
preventing data leakage with postprocessors that require estimation. When
situated in a workflow, tailors will automatically be estimated with
appropriate subsets of data.
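A minimal sketch of that workflow pattern, assuming the workflows and parsnip packages and a hypothetical formula; when such a workflow is fit, the tailor is estimated on an appropriate subset of the data rather than on the same rows used to train the model:

library(workflows)
library(parsnip)

wf <-
  workflow() %>%
  add_formula(class ~ .) %>%       # hypothetical preprocessor
  add_model(logistic_reg()) %>%    # hypothetical model
  add_tailor(
    tailor() %>%
      adjust_probability_calibration(method = "logistic")
  )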
Examples
if (FALSE) {
library(tailor)
library(modeldata)
# split example data
set.seed(1)
in_rows <- sample(c(TRUE, FALSE), nrow(two_class_example), replace = TRUE)
d_calibration <- two_class_example[in_rows, ]
d_test <- two_class_example[!in_rows, ]
head(d_calibration)
# specify calibration
tlr <-
tailor() %>%
adjust_probability_calibration(method = "logistic")
# train tailor on a subset of data. situate in a modeling workflow with
# `workflows::add_tailor()` to avoid having to specify column names manually
tlr_fit <- fit(
tlr,
d_calibration,
outcome = c(truth),
estimate = c(predicted),
probabilities = c(Class1, Class2)
)
# apply to predictions on another subset of data
head(d_test)
predict(tlr_fit, d_test)
}