Generates a perturbed version of a dataset using one of three strategies: bootstrap resampling, subsampling without replacement, or Gaussian noise injection.
Usage
perturb_data(
data,
method = c("bootstrap", "subsample", "noise"),
frac = 0.8,
noise_sd = 0.05,
response_col = NULL
)Arguments
- data
A data frame.
- method
Character string specifying the perturbation method. One of
"bootstrap"(default),"subsample", or"noise".- frac
Fraction of rows to retain for subsampling. Ignored for other methods. Must be in \((0, 1]\). Default is
0.8.- noise_sd
Noise level as a fraction of each column's standard deviation. Ignored unless
method = "noise". Default is0.05.- response_col
Optional character string naming the response (outcome) column to exclude from noise injection. Useful when you want to perturb predictors only and leave the outcome unchanged. Ignored for
method = "bootstrap"andmethod = "subsample". WhenNULL(default) all numeric columns including the response receive noise.
Value
A data frame with the same columns as data. The number of
rows equals nrow(data) for bootstrap and noise, and
floor(frac * nrow(data)) for subsampling.
Examples
set.seed(1)
d_boot <- perturb_data(mtcars, method = "bootstrap")
d_sub <- perturb_data(mtcars, method = "subsample", frac = 0.7)
d_nois <- perturb_data(mtcars, method = "noise", noise_sd = 0.1)
# Perturb predictors only, leave the response (mpg) unchanged:
d_pred_only <- perturb_data(mtcars, method = "noise",
noise_sd = 0.1, response_col = "mpg")