Calibration for regression models adjusts the model's predictions to correct for correlated errors, so that predicted values align closely with observed values across the entire range of outputs.

Usage

adjust_numeric_calibration(x, method = NULL)

Arguments

x

A tailor().

method

Character. One of "linear", "isotonic", or "isotonic_boot", corresponding to probably::cal_estimate_linear(), probably::cal_estimate_isotonic(), or probably::cal_estimate_isotonic_boot() from the probably package, respectively.
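
To give a rough intuition for how the "linear" and "isotonic" options differ, the following is a conceptual sketch in base R only. It is not the probably implementation: it assumes a plain stats::lm() fit for the linear case and stats::isoreg() for the isotonic case, and the names calib and new_pred are made up for illustration.

```r
# Conceptual sketch only; the probably functions may differ in detail
# (for example, by using smoothing or resampling).
set.seed(1)

# Hypothetical calibration data: observed outcome and raw model prediction
calib <- data.frame(y = rnorm(100))
calib$y_pred <- calib$y / 2 + rnorm(100)

# Hypothetical new raw predictions to be calibrated
new_pred <- c(-1, 0, 1)

# "linear"-style idea: regress the outcome on the raw prediction,
# then use the fitted line to map new predictions
lin_fit <- lm(y ~ y_pred, data = calib)
lin_cal <- predict(lin_fit, newdata = data.frame(y_pred = new_pred))

# "isotonic"-style idea: fit a monotone non-decreasing step function
# from raw prediction to outcome, then evaluate it at new predictions
iso_fit <- isoreg(calib$y_pred, calib$y)
iso_fun <- as.stepfun(iso_fit)
iso_cal <- iso_fun(new_pred)

lin_cal
iso_cal
```

The isotonic mapping is constrained to be non-decreasing in the raw prediction, whereas the linear mapping is a single straight line; which is preferable depends on the data.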

Data Usage

This adjustment requires estimation and, as such, different subsets of data should be used to train it and evaluate its predictions. See the section by the same name in ?workflows::add_tailor() for more information on preventing data leakage with postprocessors that require estimation. When situated in a workflow, tailors will automatically be estimated with appropriate subsets of data.
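
Outside of a workflow, the split has to be made by hand. The sketch below shows one way to do that, assuming the tailor, tibble, and probably packages are installed; the object names (preds, d_calibration, d_evaluation) and the 50/50 split are illustrative choices, not recommendations from the package.

```r
library(tailor)
library(tibble)

set.seed(1)

# Hypothetical held-out predictions: observed outcome and raw prediction
preds <- tibble(y = rnorm(200), y_pred = y / 2 + rnorm(200))

# Reserve one subset for estimating the calibration and a disjoint
# subset for evaluating the calibrated predictions
calib_rows <- sample(nrow(preds), 100)
d_calibration <- preds[calib_rows, ]
d_evaluation <- preds[-calib_rows, ]

# Estimate the calibration on the calibration subset only
tlr_fit <-
  tailor() %>%
  adjust_numeric_calibration(method = "linear") %>%
  fit(d_calibration, outcome = y, estimate = y_pred)

# Apply it to rows that played no part in the estimation
calibrated <- predict(tlr_fit, d_evaluation)

# Compare root mean squared error before and after calibration
sqrt(mean((d_evaluation$y - d_evaluation$y_pred)^2))
sqrt(mean((calibrated$y - calibrated$y_pred)^2))
```

Because no row appears in both subsets, the second error estimate is not inflated by the calibration having seen the rows it is scored on.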

Examples

library(tailor)
library(tibble)

# create example data
set.seed(1)
d_potato <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

d_potato
#> # A tibble: 100 × 2
#>         y y_pred
#>     <dbl>  <dbl>
#>  1 -0.626 -0.934
#>  2  0.184  0.134
#>  3 -0.836 -1.33 
#>  4  1.60   0.956
#>  5  0.330 -0.490
#>  6 -0.820  1.36 
#>  7  0.487  0.960
#>  8  0.738  1.28 
#>  9  0.576  0.672
#> 10 -0.305  1.53 
#> # ℹ 90 more rows

# specify calibration
tlr <-
  tailor() %>%
  adjust_numeric_calibration(method = "linear")

# train the tailor on one subset of data. when the tailor is situated in a
# modeling workflow with `workflows::add_tailor()`, the column names do not
# need to be specified manually
tlr_fit <- fit(tlr, d_potato, outcome = y, estimate = y_pred)

# apply to predictions on another subset of data
d_test
#> # A tibble: 100 × 2
#>          y y_pred
#>      <dbl>  <dbl>
#>  1  0.409   1.10 
#>  2  1.69   -0.203
#>  3  1.59    2.76 
#>  4 -0.331  -0.549
#>  5 -2.29    0.512
#>  6  2.50    2.76 
#>  7  0.667   0.416
#>  8  0.541   0.838
#>  9 -0.0134 -1.03 
#> 10  0.510   0.578
#> # ℹ 90 more rows

predict(tlr_fit, d_test)
#> # A tibble: 100 × 2
#>          y  y_pred
#>      <dbl>   <dbl>
#>  1  0.409   0.497 
#>  2  1.69    0.162 
#>  3  1.59    0.580 
#>  4 -0.331  -0.0230
#>  5 -2.29    0.408 
#>  6  2.50    0.580 
#>  7  0.667   0.386 
#>  8  0.541   0.463 
#>  9 -0.0134 -0.319 
#> 10  0.510   0.421 
#> # ℹ 90 more rows