Re-calibrate numeric predictions — adjust_numeric_calibration

Calibration for regression models adjusts the model's predictions to correct for correlated errors, so that predicted values align closely with observed values across the entire range of outputs.

Usage

adjust_numeric_calibration(x, method = NULL, ...)

Arguments

x: A tailor().
method: Character. One of "linear", "isotonic", "isotonic_boot", or "none". The first three correspond to the probably package functions probably::cal_estimate_linear(), probably::cal_estimate_isotonic(), and probably::cal_estimate_isotonic_boot(), respectively. The default is "linear" which, despite its name, fits a generalized additive model. Note that when fit.tailor() is called, the value may be changed to "none" if there is insufficient data.
...: Optional arguments to pass to the corresponding function in the probably package. These arguments must be named.

Value

An updated tailor() containing the new operation.

Details

The "linear" method fits a model that predicts the observed outcome values from the predicted values. This model is used to remove any overt systematic trends from the data, equivalent to removing the model residuals from new data. By default, the underlying code fits that model using mgcv::gam(); if smooth = FALSE is passed via ..., it uses stats::lm() instead.
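To make the idea concrete, the following base R sketch mimics what the smooth = FALSE variant does conceptually: regress the observed outcome on the raw prediction, then use that fit to map new predictions. It is an illustration under stated assumptions, not the code used by tailor or probably; the helper names calibrate() and rmse() are hypothetical.

```r
# Conceptual sketch only: base R, no tailor or probably required.
set.seed(1)
d_calibration <- data.frame(y = rnorm(100))
d_calibration$y_pred <- d_calibration$y / 2 + rnorm(100)
d_test <- data.frame(y = rnorm(100))
d_test$y_pred <- d_test$y / 2 + rnorm(100)

# Regress the observed outcome on the raw prediction (calibration set only).
cal_fit <- lm(y ~ y_pred, data = d_calibration)

# Hypothetical helper: map raw predictions to calibrated ones.
calibrate <- function(data) unname(predict(cal_fit, newdata = data))

# Hypothetical helper: root mean squared error.
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# On the calibration set the linear fit cannot do worse than the raw
# predictions, because the identity map is one of the candidate lines.
rmse_raw_cal <- rmse(d_calibration$y, d_calibration$y_pred)
rmse_adj_cal <- rmse(d_calibration$y, calibrate(d_calibration))

# A fair comparison uses data the calibrator has not seen.
rmse_raw_test <- rmse(d_test$y, d_test$y_pred)
rmse_adj_test <- rmse(d_test$y, calibrate(d_test))

print(c(raw = rmse_raw_test, calibrated = rmse_adj_test))
```

The default "linear" method replaces the straight line with a smooth term, so its calibrated values will generally differ from the ones produced by this sketch.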

The "isotonic" method uses stats::isoreg() to force the predicted values to increase with the observed outcome. This creates a step function that will map new predictions to values that are monotonically increasing with the outcome. One side effect is that there are fewer, perhaps far fewer, unique predicted values.

The "isotonic_boot" method resamples the data and generates multiple isotonic regressions that are averaged and used to correct the predictions. The result may not be perfectly monotonic, but the number of unique calibrated predictions increases with the number of bootstrap samples (controlled by passing the times argument to ...).
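The two isotonic variants can be sketched in base R as follows. This is a simplified illustration that assumes a plain average of bootstrap step functions; it is not how probably implements either method, and n_boot is a hypothetical stand-in for the role played by the times argument.

```r
# Conceptual sketch only: base R, no tailor or probably required.
set.seed(1)
d_calibration <- data.frame(y = rnorm(100))
d_calibration$y_pred <- d_calibration$y / 2 + rnorm(100)
d_test <- data.frame(y = rnorm(100))
d_test$y_pred <- d_test$y / 2 + rnorm(100)

# Single isotonic fit: observed outcome as a non-decreasing
# step function of the raw prediction.
iso_fit <- isoreg(x = d_calibration$y_pred, y = d_calibration$y)
iso_map <- as.stepfun(iso_fit)
cal_iso <- iso_map(d_test$y_pred)

# Bootstrapped variant: refit on resampled rows, then average
# the resulting step functions at each test prediction.
n_boot <- 50  # hypothetical name; plays the role of `times`
boot_preds <- vapply(
  seq_len(n_boot),
  function(i) {
    idx <- sample(nrow(d_calibration), replace = TRUE)
    fit_i <- isoreg(d_calibration$y_pred[idx], d_calibration$y[idx])
    as.stepfun(fit_i)(d_test$y_pred)
  },
  numeric(nrow(d_test))
)
cal_boot <- rowMeans(boot_preds)

# Compare how many distinct calibrated values each approach yields.
print(c(
  isotonic = length(unique(cal_iso)),
  isotonic_boot = length(unique(cal_boot))
))
```

Comparing the two counts shows the trade-off described above: the single fit collapses predictions onto a small set of steps, while averaging over resamples spreads them out again.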

Data Usage

This adjustment requires estimation and, as such, different subsets of data should be used to train it and evaluate its predictions.

Note that, when calling fit.tailor(), if the calibration data have zero or one row, the method is changed to "none".

Examples

library(tibble)

# create example data
set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

d_calibration
#> # A tibble: 100 × 2
#>         y y_pred
#>     <dbl>  <dbl>
#>  1 -0.626 -0.934
#>  2  0.184  0.134
#>  3 -0.836 -1.33 
#>  4  1.60   0.956
#>  5  0.330 -0.490
#>  6 -0.820  1.36 
#>  7  0.487  0.960
#>  8  0.738  1.28 
#>  9  0.576  0.672
#> 10 -0.305  1.53 
#> # ℹ 90 more rows

# specify calibration
tlr <-
  tailor() |>
  adjust_numeric_calibration(method = "linear")

# train tailor on a subset of data.
tlr_fit <- fit(tlr, d_calibration, outcome = y, estimate = y_pred)
#> Registered S3 method overwritten by 'butcher':
#>   method                 from    
#>   as.character.dev_topic generics

# apply to predictions on another subset of data
d_test
#> # A tibble: 100 × 2
#>          y y_pred
#>      <dbl>  <dbl>
#>  1  0.409   1.10 
#>  2  1.69   -0.203
#>  3  1.59    2.76 
#>  4 -0.331  -0.549
#>  5 -2.29    0.512
#>  6  2.50    2.76 
#>  7  0.667   0.416
#>  8  0.541   0.838
#>  9 -0.0134 -1.03 
#> 10  0.510   0.578
#> # ℹ 90 more rows

predict(tlr_fit, d_test)
#> # A tibble: 100 × 2
#>          y  y_pred
#>      <dbl>   <dbl>
#>  1  0.409   0.497 
#>  2  1.69    0.162 
#>  3  1.59    0.580 
#>  4 -0.331  -0.0230
#>  5 -2.29    0.408 
#>  6  2.50    0.580 
#>  7  0.667   0.386 
#>  8  0.541   0.463 
#>  9 -0.0134 -0.319 
#> 10  0.510   0.421 
#> # ℹ 90 more rows
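
The options mentioned in Details are forwarded through ... and must be named. The sketch below only builds the specifications and does not fit them; it assumes the tailor and probably packages are installed, and the object names tlr_lm and tlr_boot are illustrative.

```r
library(tailor)

# "linear" method with the smoother switched off, so that
# stats::lm() is used instead of mgcv::gam()
tlr_lm <-
  tailor() |>
  adjust_numeric_calibration(method = "linear", smooth = FALSE)

# bootstrapped isotonic regression with 25 resamples
tlr_boot <-
  tailor() |>
  adjust_numeric_calibration(method = "isotonic_boot", times = 25)
```

Either specification can then be trained with fit() and applied with predict() in the same way as tlr above.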