Feature request: add supplementary data into train/test after resampling #578

@abichat

I’d like to propose adding a utility function to {rsample} that allows appending new data to either the training or testing set of an rsplit or rset object after resampling.

This comes up in my workflow when I want to assess the marginal impact of a cohort on model performance: I train on the data with and without the additional set, then compare performance on a common test set.

I wrote a small helper function that could serve as a starting point, but it could likely be improved by leveraging the full internal structure and capabilities of {rsample}.

library(rsample)

# Append `new_data` to the training or testing set of an rsplit,
# or to every split of an rset.
add_data_rsample <- function(x, new_data, into = c("train", "test")) {
  stopifnot(inherits(x, c("rsplit", "rset")))

  into <- match.arg(into)

  if (inherits(x, "rset")) {
    # Apply the helper to each split, then rebuild the rset with the original ids
    new_splits <- purrr::map(x$splits, ~ add_data_rsample(., new_data, into))
    return(manual_rset(new_splits, x$id))
  }

  # x is a single rsplit: rebuild it with the augmented set
  if (into == "train") {
    make_splits(dplyr::bind_rows(training(x), new_data), testing(x))
  } else {
    make_splits(training(x), dplyr::bind_rows(testing(x), new_data))
  }
}

mt_cv <- vfold_cv(mtcars[1:24, ], v = 4)
mt_cv
#> #  4-fold cross-validation 
#> # A tibble: 4 × 2
#>   splits         id   
#>   <list>         <chr>
#> 1 <split [18/6]> Fold1
#> 2 <split [18/6]> Fold2
#> 3 <split [18/6]> Fold3
#> 4 <split [18/6]> Fold4
add_data_rsample(mt_cv, mtcars[25:32, ], into = "train") # + 8 rows in train
#> # Manual resampling 
#> # A tibble: 4 × 2
#>   splits         id   
#>   <list>         <chr>
#> 1 <split [26/6]> Fold1
#> 2 <split [26/6]> Fold2
#> 3 <split [26/6]> Fold3
#> 4 <split [26/6]> Fold4
add_data_rsample(mt_cv$splits[[1]], mtcars[25:32, ], into = "test") # + 8 rows in test
#> <Analysis/Assess/Total>
#> <18/14/32>

Created on 2025-07-10 with reprex v2.1.1
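
The with/without comparison mentioned above could look roughly like this for a single split. This is only a sketch: the model (`lm(mpg ~ wt + hp)`), the RMSE helper `rmse_on_test()`, and the choice of `mtcars[25:32, ]` as the extra cohort are illustrative assumptions, and it calls `make_splits()` directly so that it runs on its own.

```r
library(rsample)

set.seed(1)
base_data    <- mtcars[1:24, ]
extra_cohort <- mtcars[25:32, ]  # hypothetical additional cohort

# Baseline split: 18 rows for training, 6 for testing
split_base <- initial_split(base_data, prop = 0.75)

# Same test set, training set augmented with the extra cohort
split_plus <- make_splits(
  rbind(training(split_base), extra_cohort),
  assessment = testing(split_base)
)

# Hypothetical helper: fit on the training set, return RMSE on the test set
rmse_on_test <- function(split) {
  fit  <- lm(mpg ~ wt + hp, data = training(split))
  pred <- predict(fit, newdata = testing(split))
  sqrt(mean((testing(split)$mpg - pred)^2))
}

# Both models are scored on the same six test rows
c(without = rmse_on_test(split_base), with = rmse_on_test(split_plus))
```

Because both splits share the same assessment rows, any difference between the two values reflects only the change in training data.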

I hope this function can be useful to others.

Thanks for your consideration, and thanks again for your great work on {rsample}!
