Re-calibrate numeric predictions
Source: R/adjust-numeric-calibration.R
adjust_numeric_calibration.Rd
Calibration for regression models adjusts the model's predictions to correct for correlated (systematic) errors, so that predicted values align closely with the observed values across the entire range of outputs.
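The core idea of linear calibration can be sketched in base R: regress the observed outcome on the raw predictions, then use the fitted line as the corrected prediction. This is an illustration under stated assumptions, not the probably package's code. probably::cal_estimate_linear() may fit a smooth term rather than a straight line, so its results (including those in the Examples) will differ. The helper `calibrate()` is hypothetical.

```r
# Hypothetical base-R sketch of linear recalibration; not probably's
# implementation, which may fit a smooth (non-linear) term instead.
set.seed(1)
y <- rnorm(100)
y_pred <- y / 2 + rnorm(100)   # predictions that are systematically off
d_calibration <- data.frame(y = y, y_pred = y_pred)

# Model the observed outcome as a linear function of the raw prediction
cal_model <- lm(y ~ y_pred, data = d_calibration)

# Calibrated prediction = fitted value of y given the raw prediction
calibrate <- function(new_pred) {
  unname(predict(cal_model, newdata = data.frame(y_pred = new_pred)))
}

calibrated <- calibrate(d_calibration$y_pred)

# On the calibration data, regressing y on the calibrated values gives
# intercept 0 and slope 1 (up to rounding): no linear miscalibration remains.
coef(lm(d_calibration$y ~ calibrated))
```

A property of least squares with an intercept is that the calibrated predictions also have the same mean as the observed outcome on the data used to fit them, which is one reason they must be evaluated on separate data.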
Arguments
- x
A tailor().
- method
Character. One of "linear", "isotonic", or "isotonic_boot", corresponding to the probably package functions probably::cal_estimate_linear(), probably::cal_estimate_isotonic(), or probably::cal_estimate_isotonic_boot(), respectively.
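To give a rough feel for the two isotonic options, here is a base-R sketch built on stats::isoreg(). It assumes that "isotonic" means fitting a non-decreasing step function from prediction to outcome and that "isotonic_boot" means averaging such fits over bootstrap resamples. This is not the probably package's implementation, and the helpers `iso_fit()` and `iso_boot_fit()` (and the default of 10 resamples) are hypothetical.

```r
# Hypothetical base-R sketch of isotonic and bootstrapped isotonic
# recalibration; not the probably package's implementation.
set.seed(1)
y <- rnorm(100)
y_pred <- y / 2 + rnorm(100)

# Isotonic: fit a non-decreasing step function mapping y_pred to y
iso_fit <- function(pred, obs) {
  fit <- isoreg(pred, obs)
  knots <- sort(pred)       # fit$yf is aligned with the sorted predictions
  fitted_vals <- fit$yf
  function(new_pred) {
    # Take the fitted value at the largest knot <= new_pred;
    # inputs below the smallest knot get the first fitted value
    fitted_vals[pmax(findInterval(new_pred, knots), 1L)]
  }
}

# Bootstrapped isotonic: average isotonic fits from resampled rows
iso_boot_fit <- function(pred, obs, times = 10) {
  fits <- lapply(seq_len(times), function(i) {
    idx <- sample.int(length(pred), replace = TRUE)
    iso_fit(pred[idx], obs[idx])
  })
  function(new_pred) {
    Reduce(`+`, lapply(fits, function(f) f(new_pred))) / length(fits)
  }
}

iso_calibrate <- iso_fit(y_pred, y)
boot_calibrate <- iso_boot_fit(y_pred, y)

new_pred <- sort(rnorm(20))
iso_out <- iso_calibrate(new_pred)   # a monotone step function
boot_out <- boot_calibrate(new_pred) # averaging smooths the steps
```

Both calibrators are monotone in the raw prediction. Averaging over resamples trades the sharp jumps of a single isotonic fit for a smoother, still non-decreasing mapping.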
Data Usage
This adjustment requires estimation and, as such, different subsets of data
should be used to train it and evaluate its predictions. See the section
by the same name in ?workflows::add_tailor()
for more information on
preventing data leakage with postprocessors that require estimation. When
situated in a workflow, tailors will automatically be estimated with
appropriate subsets of data.
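Outside a workflow, the split has to be done by hand. The base-R sketch below assumes a simple random 50/50 split and a plain linear calibrator (neither is prescribed by tailor). It estimates the calibration on one subset and evaluates root mean squared error only on rows the calibrator never saw. The `rmse()` helper is hypothetical.

```r
# Hypothetical sketch: estimate a calibrator on one subset, evaluate on another
set.seed(1)
d <- data.frame(y = rnorm(200))
d$y_pred <- d$y / 2 + rnorm(200)

# Randomly assign half of the rows to the calibration subset
cal_rows <- sample.int(nrow(d), size = nrow(d) / 2)
d_calibration <- d[cal_rows, ]
d_test <- d[-cal_rows, ]

# Estimate the calibration (here a straight line) on the calibration subset only
cal_model <- lm(y ~ y_pred, data = d_calibration)

# Compare raw and calibrated predictions on held-out rows
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse_raw <- rmse(d_test$y, d_test$y_pred)
rmse_cal <- rmse(d_test$y, predict(cal_model, newdata = d_test))
c(raw = rmse_raw, calibrated = rmse_cal)
```

Evaluating on d_calibration instead would reuse the rows the calibrator was estimated from and give an optimistic picture of its performance.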
Examples
library(tailor)
library(tibble)
# create example data
set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_calibration
#> # A tibble: 100 × 2
#> y y_pred
#> <dbl> <dbl>
#> 1 -0.626 -0.934
#> 2 0.184 0.134
#> 3 -0.836 -1.33
#> 4 1.60 0.956
#> 5 0.330 -0.490
#> 6 -0.820 1.36
#> 7 0.487 0.960
#> 8 0.738 1.28
#> 9 0.576 0.672
#> 10 -0.305 1.53
#> # ℹ 90 more rows
# specify calibration
tlr <-
tailor() |>
adjust_numeric_calibration(method = "linear")
# train the tailor on a subset of data. Situate it in a modeling workflow with
# `workflows::add_tailor()` to avoid having to specify column names manually
tlr_fit <- fit(tlr, d_calibration, outcome = y, estimate = y_pred)
# apply to predictions on another subset of data
d_test
#> # A tibble: 100 × 2
#> y y_pred
#> <dbl> <dbl>
#> 1 0.409 1.10
#> 2 1.69 -0.203
#> 3 1.59 2.76
#> 4 -0.331 -0.549
#> 5 -2.29 0.512
#> 6 2.50 2.76
#> 7 0.667 0.416
#> 8 0.541 0.838
#> 9 -0.0134 -1.03
#> 10 0.510 0.578
#> # ℹ 90 more rows
predict(tlr_fit, d_test)
#> # A tibble: 100 × 2
#> y y_pred
#> <dbl> <dbl>
#> 1 0.409 0.497
#> 2 1.69 0.162
#> 3 1.59 0.580
#> 4 -0.331 -0.0230
#> 5 -2.29 0.408
#> 6 2.50 0.580
#> 7 0.667 0.386
#> 8 0.541 0.463
#> 9 -0.0134 -0.319
#> 10 0.510 0.421
#> # ℹ 90 more rows
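Switching calibrators only requires changing the method argument. The sketch below repeats the example with method = "isotonic", assuming the tailor and probably packages are installed. Its printed output is not shown here and will differ from the linear results above.

```r
library(tailor)
library(tibble)

# Same simulated data as the example above
set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

# Specify an isotonic calibration instead of a linear one
tlr_iso <- tailor() |>
  adjust_numeric_calibration(method = "isotonic")

# Train on the calibration subset, then apply to the test subset
tlr_iso_fit <- fit(tlr_iso, d_calibration, outcome = y, estimate = y_pred)
preds_iso <- predict(tlr_iso_fit, d_test)
preds_iso
```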