Open
Labels: feature (a feature request or enhancement)
Description
I’d like to propose adding a utility function to {rsample} that allows appending new data to either the training or testing set of an rsplit or rset object after resampling.
This comes up in my workflow when I want to assess the marginal impact of a cohort on model performance: I train on the data with and without the additional set, then compare performance on a common test set.
I wrote a small helper function that could serve as a starting point, but it could likely be improved by leveraging the full internal structure and capabilities of {rsample}.
library(rsample)
add_data_rsample <- function(x, new_data, into = c("train", "test")) {
  # Accept either a single split (rsplit) or a set of resamples (rset)
  stopifnot(inherits(x, "rsplit") || inherits(x, "rset"))
  into <- match.arg(into)
  if (inherits(x, "rsplit")) {
    # Rebuild the split with new_data appended to the chosen partition
    if (into == "train") {
      y <- make_splits(dplyr::bind_rows(training(x), new_data), testing(x))
    } else {
      y <- make_splits(training(x), dplyr::bind_rows(testing(x), new_data))
    }
  } else {
    # Apply the same change to every split, keeping the original ids
    new_splits <- purrr::map(x$splits, ~ add_data_rsample(.x, new_data, into))
    y <- manual_rset(new_splits, x$id)
  }
  y
}
mt_cv <- vfold_cv(mtcars[1:24, ], v = 4)
mt_cv
#> # 4-fold cross-validation
#> # A tibble: 4 × 2
#> splits id
#> <list> <chr>
#> 1 <split [18/6]> Fold1
#> 2 <split [18/6]> Fold2
#> 3 <split [18/6]> Fold3
#> 4 <split [18/6]> Fold4
add_data_rsample(mt_cv, mtcars[25:32, ], into = "train") # + 8 rows in train
#> # Manual resampling
#> # A tibble: 4 × 2
#> splits id
#> <list> <chr>
#> 1 <split [26/6]> Fold1
#> 2 <split [26/6]> Fold2
#> 3 <split [26/6]> Fold3
#> 4 <split [26/6]> Fold4
add_data_rsample(mt_cv$splits[[1]], mtcars[25:32, ], into = "test") # + 8 rows in test
#> <Analysis/Assess/Total>
#> <18/14/32>
Created on 2025-07-10 with reprex v2.1.1
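For context, the with/without comparison I mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the `lm()` formula, the RMSE metric, the helper name `rmse_on_test`, and the use of the last 8 rows of `mtcars` as the "additional cohort" are all made up for the example. It calls `make_splits()` directly so that it runs on its own.

```r
library(rsample)

set.seed(1)
base_data  <- mtcars[1:24, ]
extra_data <- mtcars[25:32, ]  # hypothetical additional cohort

# Baseline split: train/test drawn from the base data only
split_base <- initial_split(base_data, prop = 0.75)

# Same test set, training set augmented with the extra cohort
split_plus <- make_splits(
  rbind(training(split_base), extra_data),
  testing(split_base)
)

# Hypothetical helper: fit on the training part, return RMSE on the test part
rmse_on_test <- function(split) {
  fit  <- lm(mpg ~ wt + hp, data = training(split))
  pred <- predict(fit, newdata = testing(split))
  sqrt(mean((testing(split)$mpg - pred)^2))
}

# Compare performance on the common test set
res <- c(without = rmse_on_test(split_base), with = rmse_on_test(split_plus))
res
```

Because both splits share the same test rows, any difference between the two values comes from the extra training data alone.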
I hope this function can be useful to others.
Thanks for your consideration, and thanks again for your great work on {rsample}!
DanChaltiel