Apply an equivocal zone to a binary classification model.
Source: R/adjust-equivocal-zone.R
Equivocal zones describe intervals of predicted probabilities that are deemed too uncertain or ambiguous to be assigned a hard class. Rather than predicting a hard class when the probability is very close to a threshold, tailors using this adjustment predict "[EQ]".
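Concretely, the zone is the interval (threshold - value, threshold + value). Below is a minimal base-R sketch of that routing logic using a hypothetical helper; it is an illustration only, not the package's implementation, and the boundary handling is an assumption:
# hypothetical helper showing how one probability would be routed;
# the real adjustment acts on whole prediction columns via a tailor
classify_with_zone <- function(prob_class1, threshold = 1 / 2, value = 1 / 4) {
  if (abs(prob_class1 - threshold) < value) {
    "[EQ]"                                  # too close to the threshold
  } else if (prob_class1 >= threshold) {
    "Class1"
  } else {
    "Class2"
  }
}
classify_with_zone(0.70)  # inside (0.25, 0.75), so "[EQ]"
classify_with_zone(0.80)  # outside the zone, so "Class1"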
Arguments
- x: A tailor().
- value: A numeric value (between zero and 1/2) or hardhat::tune(). The value is the size of the buffer around the threshold.
- threshold: A numeric value (between zero and one) or hardhat::tune(). Defaults to adjust_probability_threshold(threshold) if previously set in x, or 1/2 if not.
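To illustrate the threshold default, here is a sketch using only functions referenced on this page (it assumes the tailor package is attached and a pipe is available, as in the Examples below):
# the equivocal zone here is centered at 0.6 rather than 1/2, since a
# probability threshold was set earlier in the same tailor
tlr_custom <-
  tailor() %>%
  adjust_probability_threshold(threshold = 0.6) %>%
  adjust_equivocal_zone(value = 0.1)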
Details
This function transforms the class prediction column, estimate, to have type
class_pred from probably::class_pred(). You can loosely think of this
column type as a factor, except that it has a possible entry [EQ] that is
not a level and will be excluded from performance metric calculations.
As a result, the output column has the same levels as the input, plus a
possible [EQ] entry that tidymodels functions know to exclude from further
analyses.
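As a small sketch of the class_pred type itself (assuming probably's class_pred() constructor, where which marks the equivocal entries):
library(probably)
cp <- class_pred(
  factor(c("Class1", "Class2", "Class1"), levels = c("Class1", "Class2")),
  which = 2  # flag the second prediction as equivocal
)
cp          # prints with an [EQ] entry
levels(cp)  # "[EQ]" is not a level: only "Class1" and "Class2"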
Data Usage
This adjustment doesn't require estimation, so the same data used to train it
with fit() can safely be passed to predict(); fitting this adjustment just
collects metadata on the supplied column names and does not risk data leakage.
Examples
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
library(modeldata)
head(two_class_example)
#> truth Class1 Class2 predicted
#> 1 Class2 0.003589243 0.9964107574 Class2
#> 2 Class1 0.678621054 0.3213789460 Class1
#> 3 Class2 0.110893522 0.8891064779 Class2
#> 4 Class1 0.735161703 0.2648382969 Class1
#> 5 Class2 0.016239960 0.9837600397 Class2
#> 6 Class1 0.999275071 0.0007249286 Class1
# `predicted` gives hard class predictions based on probabilities
two_class_example %>% count(predicted)
#> predicted n
#> 1 Class1 277
#> 2 Class2 223
# when probabilities are within (.25, .75), consider them equivocal
tlr <-
tailor() %>%
adjust_equivocal_zone(value = 1 / 4)
tlr
#>
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#>
#> • Add equivocal zone of size 0.25.
# fit by supplying column names. situate in a modeling workflow
# with `workflows::add_tailor()` to avoid having to do so manually
tlr_fit <- fit(
tlr,
two_class_example,
outcome = c(truth),
estimate = c(predicted),
probabilities = c(Class1, Class2)
)
tlr_fit
#>
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#>
#> • Add equivocal zone of size 0.25. [trained]
# adjust hard class predictions
predict(tlr_fit, two_class_example) %>% count(predicted)
#> # A tibble: 3 × 2
#> predicted n
#> <clss_prd> <int>
#> 1 [EQ] 86
#> 2 Class1 229
#> 3 Class2 185
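To inspect or drop the [EQ] predictions downstream, a hedged sketch (it assumes probably's is_equivocal() and that coercing a class_pred to a factor turns [EQ] entries into NA):
library(probably)
preds <- predict(tlr_fit, two_class_example)
sum(is_equivocal(preds$predicted))  # how many predictions fell in the zone
preds %>%
  mutate(predicted = as.factor(predicted)) %>%  # [EQ] becomes NA (assumed)
  count(predicted)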