Re-calibrate numeric predictions
Source: R/adjust-numeric-calibration.R
adjust_numeric_calibration.Rd
Calibration for regression models involves adjusting the model's predictions to account for systematic errors, ensuring that predicted values align closely with actual observed values across the entire range of outputs.
Arguments
- x
A tailor().
- method
Character. One of "linear", "isotonic", "isotonic_boot", or "none", corresponding to the function from the probably package probably::cal_estimate_linear(), probably::cal_estimate_isotonic(), or probably::cal_estimate_isotonic_boot(), respectively. The default is "linear", which, despite its name, fits a generalized additive model. Note that when fit.tailor() is called, the value may be changed to "none" if there is insufficient data.
- ...
Optional arguments to pass to the corresponding function in the probably package. These arguments must be named.
Value
An updated tailor() containing the new operation.
Details
The "linear" method fits a model that predicts the observed outcome from the predicted values. This model is used to remove any overt systematic trends from the data, equivalent to removing the model residuals from new data. The underlying code fits that model using mgcv::gam(). If smooth = FALSE is passed via ..., it uses stats::lm() instead.
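The idea behind the linear variant can be sketched in base R. This is a simplified illustration using stats::lm() (the analogue of passing smooth = FALSE), not the tailor internals:

```r
# Sketch of linear calibration: regress the observed outcome on the raw
# predictions, then replace new predictions with that model's fitted values.
set.seed(1)
obs  <- rnorm(100)
pred <- obs / 2 + rnorm(100)            # biased, noisy predictions

cal_fit <- stats::lm(obs ~ pred)        # the smooth = FALSE analogue

# Applying the fit to new predictions removes the systematic trend:
new_pred   <- rnorm(10) / 2
calibrated <- predict(cal_fit, newdata = data.frame(pred = new_pred))
```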
The isotonic method uses stats::isoreg() to force the predicted values to increase with the observed outcome. This creates a step function that maps new predictions to values that are monotonically increasing with the outcome. One side effect is that there are fewer, perhaps far fewer, unique predicted values. The "isotonic boot" method resamples the data and generates multiple isotonic regressions that are averaged and used to correct the predictions. The result may not be perfectly monotonic, but the number of unique calibrated predictions increases with the number of bootstrap samples (controlled by passing the times argument via ...).
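What the isotonic step does can be sketched with stats::isoreg() directly, outside of tailor:

```r
# Sketch of isotonic calibration: isoreg() yields a monotone step function
# mapping raw predictions to calibrated values.
set.seed(1)
obs  <- rnorm(100)
pred <- obs / 2 + rnorm(100)

iso     <- stats::isoreg(pred, obs)  # monotone fit of obs on pred
step_fn <- as.stepfun(iso)           # step function, usable on new data

calibrated <- step_fn(c(-1, 0, 1))   # calibrate three new predictions
# Side effect noted above: typically fewer unique values than the raw
# predictions, since many inputs fall on the same step.
n_unique <- length(unique(step_fn(pred)))
```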
Data Usage
This adjustment requires estimation and, as such, different subsets of data should be used to train it and evaluate its predictions.
Note that, when calling fit.tailor(), if the calibration data have zero or one row, the method is changed to "none".
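Extra arguments for the probably functions are passed through ... as named arguments. For instance, a bootstrapped isotonic calibration with a specific number of resamples might be specified as below (times = 25 is an arbitrary illustrative value):

```r
library(tailor)

# `times` is forwarded to probably::cal_estimate_isotonic_boot()
tlr_boot <-
  tailor() |>
  adjust_numeric_calibration(method = "isotonic_boot", times = 25)
```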
Examples
library(tibble)
# create example data
set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_calibration
#> # A tibble: 100 × 2
#> y y_pred
#> <dbl> <dbl>
#> 1 -0.626 -0.934
#> 2 0.184 0.134
#> 3 -0.836 -1.33
#> 4 1.60 0.956
#> 5 0.330 -0.490
#> 6 -0.820 1.36
#> 7 0.487 0.960
#> 8 0.738 1.28
#> 9 0.576 0.672
#> 10 -0.305 1.53
#> # ℹ 90 more rows
# specify calibration
tlr <-
tailor() |>
adjust_numeric_calibration(method = "linear")
# train tailor on a subset of data
tlr_fit <- fit(tlr, d_calibration, outcome = y, estimate = y_pred)
# apply to predictions on another subset of data
d_test
#> # A tibble: 100 × 2
#> y y_pred
#> <dbl> <dbl>
#> 1 0.409 1.10
#> 2 1.69 -0.203
#> 3 1.59 2.76
#> 4 -0.331 -0.549
#> 5 -2.29 0.512
#> 6 2.50 2.76
#> 7 0.667 0.416
#> 8 0.541 0.838
#> 9 -0.0134 -1.03
#> 10 0.510 0.578
#> # ℹ 90 more rows
predict(tlr_fit, d_test)
#> # A tibble: 100 × 2
#> y y_pred
#> <dbl> <dbl>
#> 1 0.409 0.497
#> 2 1.69 0.162
#> 3 1.59 0.580
#> 4 -0.331 -0.0230
#> 5 -2.29 0.408
#> 6 2.50 0.580
#> 7 0.667 0.386
#> 8 0.541 0.463
#> 9 -0.0134 -0.319
#> 10 0.510 0.421
#> # ℹ 90 more rows