Package index • ReproStat

Package Overview

High-level entry point and the S3 class that all diagnostic functions operate on. Start here if you are reading the reference for the first time.

ReproStat, ReproStat-package: ReproStat: Reproducibility Diagnostics for Statistical Modeling
run_diagnostics(): Run reproducibility diagnostics
print(<reprostat>): Print a reprostat object

Data Perturbation

perturb_data() is the building block underneath run_diagnostics(). Use it directly when you need fine-grained control over how the data are perturbed, for example to pass the perturbed datasets to an external modelling pipeline. Three strategies are available:

- Bootstrap ("bootstrap"): draws n rows with replacement, mimicking ordinary sampling variability.
- Subsampling ("subsample"): draws m = ⌊ρn⌋ rows without replacement, where ρ is the subsampling fraction, stressing robustness to sample composition.
- Noise injection ("noise"): adds Gaussian noise scaled to each predictor’s standard deviation, simulating measurement error.
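The three strategies can be sketched in base R as follows. This is only an illustration of the ideas, not the package's implementation: perturb_sketch() and its arguments (frac, noise_sd, predictors) are hypothetical names and do not reflect the actual perturb_data() signature, which is documented on its own reference page.

```r
# Illustrative base-R sketch of the three perturbation strategies.
# perturb_sketch() and all of its arguments are hypothetical; they are
# NOT the perturb_data() interface.
perturb_sketch <- function(data,
                           method = c("bootstrap", "subsample", "noise"),
                           frac = 0.8, noise_sd = 0.05, predictors = NULL) {
  method <- match.arg(method)
  n <- nrow(data)

  if (method == "bootstrap") {
    # Draw n rows with replacement.
    idx <- sample.int(n, size = n, replace = TRUE)
    return(data[idx, , drop = FALSE])
  }

  if (method == "subsample") {
    # Draw m = floor(frac * n) rows without replacement.
    m <- floor(frac * n)
    idx <- sample.int(n, size = m, replace = FALSE)
    return(data[idx, , drop = FALSE])
  }

  # Noise injection: add Gaussian noise whose standard deviation is a
  # fraction (noise_sd) of each predictor's own standard deviation.
  if (is.null(predictors)) {
    predictors <- names(data)[vapply(data, is.numeric, logical(1))]
  }
  for (p in predictors) {
    data[[p]] <- data[[p]] + rnorm(n, mean = 0, sd = noise_sd * sd(data[[p]]))
  }
  data
}

set.seed(1)
boot  <- perturb_sketch(mtcars, "bootstrap")
sub   <- perturb_sketch(mtcars, "subsample", frac = 0.8)
noisy <- perturb_sketch(mtcars, "noise", predictors = c("wt", "hp"))
```

In this sketch the bootstrap sample keeps all 32 rows of mtcars (with repeats), the subsample keeps floor(0.8 × 32) = 25 distinct rows, and the noisy copy changes only the listed predictors while leaving the response untouched.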

perturb_data(): Perturb a dataset

Stability Metrics

coef_stability(): Coefficient stability
pvalue_stability(): P-value stability
selection_stability(): Selection stability
prediction_stability(): Prediction stability

Reproducibility Index

The Reproducibility Index (RI) aggregates the four stability components into a single 0–100 score using a per-component normalisation and a simple average. ri_confidence_interval() estimates uncertainty in that score by resampling the stored perturbation draws, so no additional model fitting is required.

RI quick-reference guide

| RI | Interpretation |
|---|---|
| 90–100 | Highly stable under the chosen perturbation design |
| 70–89 | Moderately stable; overall pattern is dependable |
| 50–69 | Mixed stability; inspect component breakdown |
| < 50 | Low stability; results may be fragile |

These are interpretive anchors, not universal cutoffs. Always inspect the component decomposition alongside the aggregate score.
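To make the "simple average plus resampling" idea concrete, here is a minimal base-R sketch. It rests on several assumptions that are not taken from the package: the component scores are simulated, they are assumed to be already normalised to [0, 1] per perturbation draw, and the interval is a plain percentile bootstrap. The helpers ri_sketch() and ri_ci_sketch() are hypothetical and are not the package's reproducibility_index() or ri_confidence_interval().

```r
# Hypothetical per-draw component scores, assumed already normalised to [0, 1].
# Rows are perturbation draws; columns are the four stability components.
set.seed(42)
B <- 200
draws <- cbind(
  coef       = rbeta(B, 8, 2),
  pvalue     = rbeta(B, 6, 3),
  selection  = rbeta(B, 9, 1),
  prediction = rbeta(B, 7, 2)
)

# Simple average of the four component means, rescaled to 0-100.
ri_sketch <- function(x) 100 * mean(colMeans(x))

# Percentile bootstrap over the stored draws: resample rows with
# replacement and recompute the score. No model is refitted.
ri_ci_sketch <- function(x, R = 1000, level = 0.95) {
  stats <- replicate(R, {
    idx <- sample.int(nrow(x), replace = TRUE)
    ri_sketch(x[idx, , drop = FALSE])
  })
  alpha <- (1 - level) / 2
  quantile(stats, probs = c(alpha, 1 - alpha), names = FALSE)
}

ri <- ri_sketch(draws)
ci <- ri_ci_sketch(draws)
```

Because the resampling reuses stored draws, the cost of the interval in a sketch like this depends on the number of bootstrap replicates R rather than on how expensive the original model fits were.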

reproducibility_index(): Reproducibility index
ri_confidence_interval(): Bootstrap confidence interval for the reproducibility index

Cross-Validation Ranking Stability

cv_ranking_stability() evaluates model-selection stability: given several candidate formulas, which one wins most consistently across repeated K-fold cross-validation? It records each model’s rank in every repeat and summarises the distribution of those ranks. Two summary statistics are particularly useful:

- top1_frequency: proportion of repeats in which a model ranked first. High values mean the model is a consistently strong choice.
- mean_rank: average rank across all repeats (lower is better).

Note that the model with the lowest mean error does not necessarily have the highest top-1 frequency.

cv_ranking_stability() supports the same four backends as run_diagnostics().
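The rank summaries can be illustrated with a small hand-built example in base R. The error matrix below is made up for illustration (five repeats, three hypothetical models A, B and C), and the calculation is a sketch of the idea rather than the package's internal code.

```r
# Made-up cross-validation errors: rows are repeats, columns are models.
errors <- cbind(
  A = c(1.0, 1.0, 1.0, 1.0, 3.0),  # usually best, one very bad repeat
  B = c(1.1, 1.1, 1.1, 1.1, 1.1),  # never far off, rarely first
  C = c(1.5, 1.5, 1.5, 1.5, 1.5)
)

# Rank the models within each repeat (1 = lowest error).
ranks <- t(apply(errors, 1, rank, ties.method = "min"))

top1_frequency <- colMeans(ranks == 1)  # A = 0.8, B = 0.2, C = 0.0
mean_rank      <- colMeans(ranks)       # A = 1.4, B = 1.8, C = 2.8
mean_error     <- colMeans(errors)      # A = 1.4, B = 1.1, C = 1.5
```

Here model B has the lowest mean error, yet model A ranks first in four of the five repeats. Looking at rank-based summaries next to the average error shows whether a winner is consistent or merely good on average.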

cv_ranking_stability(): Cross-validation ranking stability
plot_cv_stability(): Plot cross-validation ranking stability
plot_cv_stability_gg(): ggplot2-based CV ranking stability plot

Visualization

Two families of plotting helpers are provided. The base-graphics functions (plot_stability(), plot_cv_stability()) have no external dependencies. The ggplot2 variants (plot_stability_gg(), plot_cv_stability_gg()) return ggplot objects that can be further customised with standard ggplot2 layers and themes. Both families draw the plot as a side effect; the ggplot2 variants return their ggplot object invisibly.

plot_stability(): Plot stability diagnostics
plot_stability_gg(): ggplot2-based stability plot