
Package Overview

High-level entry point and the S3 class that all diagnostic functions operate on. Start here if you are reading the reference for the first time.

ReproStat, ReproStat-package
ReproStat: Reproducibility Diagnostics for Statistical Modeling
run_diagnostics()
Run reproducibility diagnostics
print(<reprostat>)
Print a reprostat object

Data Perturbation

perturb_data() is the building block underneath run_diagnostics(). Use it directly when you need fine-grained control over how the data are perturbed, for example to pass the perturbed datasets to an external modelling pipeline. Three strategies are available (n is the number of rows in the data and ρ is the subsampling fraction):

- Bootstrap ("bootstrap"): draws n rows with replacement, mimicking ordinary sampling variability.
- Subsampling ("subsample"): draws m = ⌊ρn⌋ rows without replacement, stressing robustness to sample composition.
- Noise injection ("noise"): adds Gaussian noise scaled to each predictor's standard deviation, simulating measurement error.
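The three strategies can be sketched in a few lines. The snippet below is a Python illustration of the ideas only, not ReproStat's R implementation or API. `perturb_rows()` and its arguments are hypothetical, and rows are plain dicts. Two further choices are assumptions made for the example: the noise scale factor `noise_sd`, which is a multiple of each predictor's standard deviation, and the default ρ = 0.8.

```python
import math
import random
import statistics


def perturb_rows(rows, method="bootstrap", rho=0.8, noise_sd=0.1,
                 predictors=(), rng=None):
    """Return one perturbed copy of `rows` (a list of dicts); `rows` is not modified.

    Hypothetical helper illustrating the three strategies.
    """
    rng = rng or random.Random()
    n = len(rows)
    if method == "bootstrap":
        # n rows drawn with replacement: ordinary sampling variability
        return [dict(rng.choice(rows)) for _ in range(n)]
    if method == "subsample":
        # m = floor(rho * n) rows drawn without replacement
        m = math.floor(rho * n)
        return [dict(r) for r in rng.sample(rows, m)]
    if method == "noise":
        # Gaussian noise on each predictor, sd = noise_sd * sd(predictor) (assumed scaling)
        out = [dict(r) for r in rows]
        for col in predictors:
            sd = statistics.stdev(r[col] for r in rows)
            for r in out:
                r[col] += rng.gauss(0.0, noise_sd * sd)
        return out
    raise ValueError(f"unknown method: {method!r}")
```

Each call returns one perturbed copy. Repeating it B times yields the set of perturbed datasets that downstream model fits would consume.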

perturb_data()
Perturb a dataset

Stability Metrics

Four complementary views of how model outputs move across perturbation runs. Each function takes a reprostat object returned by run_diagnostics() and returns a numeric summary.

| Function | Question answered | Output scale |
|---|---|---|
| coef_stability() | How much do coefficient estimates vary? | variance (lower = more stable) |
| pvalue_stability() | How often is each predictor significant? | frequency in [0, 1] |
| selection_stability() | Do predictors keep the same direction/inclusion? | proportion in [0, 1] |
| prediction_stability() | How much do predictions change? | variance (lower = more stable) |

pvalue_stability() and selection_stability() measure different things. The former asks whether the binary significant/not-significant decision is stable. The latter asks whether each predictor's direction or inclusion pattern is stable.
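The two variance-based metrics follow the same pattern. First, refit the model on each perturbed dataset. Then take the variance across runs of each coefficient (coefficient stability) and of the predictions at fixed evaluation points (prediction stability). This Python sketch uses a one-predictor least-squares fit and bootstrap resampling. Several details are assumptions for illustration and do not describe how coef_stability() or prediction_stability() are implemented: the function names, averaging the per-point prediction variances into one number, and skipping degenerate resamples.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def coef_and_prediction_variance(xs, ys, x_eval, B=200, seed=0):
    """Variance of each coefficient, and mean prediction variance, over B bootstrap refits."""
    rng = random.Random(seed)
    n = len(xs)
    fits = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap row indices
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope undefined, skip it
            continue
        fits.append(fit_line(bx, [ys[i] for i in idx]))
    intercepts, slopes = zip(*fits)
    coef_var = {"intercept": statistics.variance(intercepts),
                "slope": statistics.variance(slopes)}
    # variance of the prediction at each evaluation point, averaged over the points
    pred_var = statistics.fmean(
        statistics.variance([a + b * x0 for a, b in fits]) for x0 in x_eval
    )
    return coef_var, pred_var
```

On noise-free data every refit recovers the same line, so both summaries are essentially zero. Adding noise to the response makes them grow.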

coef_stability()
Coefficient stability
pvalue_stability()
P-value stability
selection_stability()
Selection stability
prediction_stability()
Prediction stability
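Toy inputs make the difference between significance stability and direction stability easy to see. The Python sketch below makes these assumptions:

- `significance_frequency()` and `sign_consistency()` are hypothetical helpers.
- α = 0.05 is an assumed threshold.
- "Direction" is taken to mean the sign of the estimate.
- The per-run estimates and p-values are made-up illustration values, not output from ReproStat.

```python
def significance_frequency(pvalues, alpha=0.05):
    """Share of runs in which the predictor is significant at level alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def sign_consistency(estimates):
    """Share of runs whose estimate has the majority sign (exact zeros count for neither)."""
    positive = sum(e > 0 for e in estimates)
    negative = sum(e < 0 for e in estimates)
    return max(positive, negative) / len(estimates)


# Made-up per-run results for two predictors across four perturbation runs.
runs = {
    # x1: significant in only half the runs, but the sign never flips
    "x1": {"estimates": [0.8, 0.5, 0.3, 0.6], "pvalues": [0.01, 0.20, 0.03, 0.40]},
    # x2: significant in every run, but the sign flips half the time
    "x2": {"estimates": [0.9, -0.7, 0.8, -0.6], "pvalues": [0.001, 0.002, 0.001, 0.003]},
}

for name, r in runs.items():
    print(name,
          "significant in", significance_frequency(r["pvalues"]),
          "| same sign in", sign_consistency(r["estimates"]))
```

A predictor can therefore look unstable by one measure and stable by the other, which is why the two are reported separately.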

Reproducibility Index

The Reproducibility Index (RI) aggregates the four stability components into a single 0–100 score using a per-component normalisation and a simple average. ri_confidence_interval() estimates the uncertainty in that score by resampling the stored perturbation draws, so no additional model fitting is required.

RI quick-reference guide:

| RI | Interpretation |
|---|---|
| 90–100 | Highly stable under the chosen perturbation design |
| 70–89 | Moderately stable; overall pattern is dependable |
| 50–69 | Mixed stability; inspect component breakdown |
| < 50 | Low stability; results may be fragile |

These are interpretive anchors, not universal cutoffs. Always inspect the component decomposition alongside the aggregate score.
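The Python code below sketches "per-component normalisation plus a simple average" by mapping each component onto [0, 1] and averaging the four scores. The normalisations are assumptions chosen only to make the idea concrete, and reproducibility_index() may normalise differently:

- Variances use 1/(1 + v).
- Significance frequencies use |2f - 1|, so 0.5 (a coin flip) scores 0.
- Selection proportions are averaged as-is.

```python
import statistics


def variance_score(variances):
    """Map non-negative variances to (0, 1]; 1 means no variation (assumed 1/(1+v) form)."""
    return statistics.fmean(1.0 / (1.0 + v) for v in variances)


def decisiveness_score(frequencies):
    """Map significance frequencies to [0, 1]: 0.5 -> 0 (coin flip), 0 or 1 -> 1."""
    return statistics.fmean(abs(2.0 * f - 1.0) for f in frequencies)


def reproducibility_score(coef_vars, sig_freqs, sel_props, pred_var):
    """Hypothetical RI: average of four [0, 1] component scores, scaled to 0-100."""
    components = {
        "coef": variance_score(coef_vars),
        "pvalue": decisiveness_score(sig_freqs),
        "selection": statistics.fmean(sel_props),
        "prediction": variance_score([pred_var]),
    }
    return 100.0 * statistics.fmean(components.values()), components
```

Returning the component scores alongside the total mirrors the advice to inspect the decomposition, not just the aggregate. Note that 1/(1 + v) depends on the scale of the coefficients, so a real normalisation needs more care than this sketch.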

reproducibility_index()
Reproducibility index
ri_confidence_interval()
Bootstrap confidence interval for the reproducibility index
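Because the perturbation draws are already stored, a confidence interval can be obtained by resampling those draws and recomputing the score, with no refitting. The Python sketch below shows a percentile bootstrap over stored per-run values. `percentile_ci()` and `toy_score()` are hypothetical. The percentile method, R = 1000 replicates and the toy score are assumptions, not necessarily what ri_confidence_interval() does.

```python
import random
import statistics


def percentile_ci(draws, score_fn, R=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for score_fn(draws).

    Each replicate resamples the stored per-run draws with replacement and
    recomputes the score; no model is refitted.
    """
    rng = random.Random(seed)
    B = len(draws)
    reps = sorted(
        score_fn([draws[rng.randrange(B)] for _ in range(B)]) for _ in range(R)
    )
    lo = reps[int((1 - level) / 2 * (R - 1))]
    hi = reps[int((1 + level) / 2 * (R - 1))]
    return lo, hi


def toy_score(slopes):
    """Hypothetical stand-in for the RI: 100 / (1 + variance of stored slope draws)."""
    return 100.0 / (1.0 + statistics.variance(slopes))
```

Each replicate only recomputes a summary over existing numbers, so the interval is cheap even when the original model fits were expensive.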

Cross-Validation Ranking Stability

cv_ranking_stability() evaluates model-selection stability: given several candidate formulas, which one wins most consistently across repeated K-fold cross-validation? It records each model's rank in every repeat and summarises the distribution of those ranks. Two summary statistics are particularly useful:

- top1_frequency: the proportion of repeats in which a model ranked first. High values mean the model is a consistently strong choice.
- mean_rank: the average rank across all repeats (lower is better).

The model with the lowest mean error need not have the highest top-1 frequency. A model that is usually a close second can average a lower error than one that usually wins by a small margin but occasionally performs badly.
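The bookkeeping behind top1_frequency and mean_rank can be sketched in three steps. Compute each candidate's cross-validated error in every repeat. Rank the candidates within each repeat (rank 1 = lowest error). Then summarise the ranks per candidate. This Python version is illustrative only, and the following are all assumptions, not the interface of cv_ranking_stability(): `repeated_kfold_errors()`, `rank_summary()`, the `fit(train_x, train_y) -> predict` interface, mean squared error as the CV criterion, and tie-breaking by candidate order.

```python
import random
import statistics


def repeated_kfold_errors(xs, ys, models, K=5, repeats=20, seed=0):
    """Mean squared CV error of each candidate in each repeat.

    `models` maps a name to fit(train_x, train_y) -> predict(x) (hypothetical interface).
    """
    rng = random.Random(seed)
    n = len(xs)
    out = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                      # new random fold split per repeat
        folds = [idx[k::K] for k in range(K)]
        errs = {}
        for name, fit in models.items():
            sq = []
            for fold in folds:
                held_out = set(fold)
                train = [i for i in idx if i not in held_out]
                predict = fit([xs[i] for i in train], [ys[i] for i in train])
                sq.extend((ys[i] - predict(xs[i])) ** 2 for i in fold)
            errs[name] = statistics.fmean(sq)
        out.append(errs)
    return out


def rank_summary(errors_by_repeat):
    """errors_by_repeat: list of {model_name: cv_error} dicts, one per repeat."""
    models = list(errors_by_repeat[0])
    ranks = {m: [] for m in models}
    for errs in errors_by_repeat:
        # lowest error gets rank 1; sorted() is stable, so ties keep candidate order
        order = sorted(models, key=lambda m: errs[m])
        for r, m in enumerate(order, start=1):
            ranks[m].append(r)
    return {
        m: {"top1_frequency": sum(r == 1 for r in rs) / len(rs),
            "mean_rank": statistics.fmean(rs),
            "mean_error": statistics.fmean(e[m] for e in errors_by_repeat)}
        for m, rs in ranks.items()
    }
```

For example, suppose model A has errors 1.0, 1.0, 1.0 and 3.0 across four repeats and model B has 1.1 in every repeat. A has a top1_frequency of 0.75, but B has the lower mean error (1.1 versus 1.5).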

cv_ranking_stability() supports the same four backends as run_diagnostics().

cv_ranking_stability()
Cross-validation ranking stability
plot_cv_stability()
Plot cross-validation ranking stability
plot_cv_stability_gg()
ggplot2-based CV ranking stability plot

Visualization

Two families of plotting helpers are provided. The base-graphics functions (plot_stability(), plot_cv_stability()) have no external dependencies. The ggplot2 variants (plot_stability_gg(), plot_cv_stability_gg()) build their plots with ggplot2. Both families are called for their side effect of drawing the plot. The ggplot2 variants also return the ggplot object invisibly, so it can be further customised with standard ggplot2 layers and themes.

plot_stability()
Plot stability diagnostics
plot_stability_gg()
ggplot2-based stability plot