Package index
Package Overview
High-level entry point and the S3 class that all diagnostic functions operate on. Start here if you are reading the reference for the first time.
- ReproStat ReproStat-package - ReproStat: Reproducibility Diagnostics for Statistical Modeling
- run_diagnostics() - Run reproducibility diagnostics
- print(<reprostat>) - Print a reprostat object
Data Perturbation
perturb_data() is the building block underneath run_diagnostics(). Use it directly when you need fine-grained control over how the data are perturbed, for example to pass the perturbed datasets to an external modelling pipeline.

Three strategies are available:

- Bootstrap ("bootstrap"): draws n rows with replacement, mimicking ordinary sampling variability.
- Subsampling ("subsample"): draws m = ⌊ρn⌋ rows without replacement, stressing robustness to sample composition.
- Noise injection ("noise"): adds Gaussian noise scaled to each predictor's standard deviation, simulating measurement error.

- perturb_data() - Perturb a dataset
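
The three strategies described above can be sketched in base R. This is a minimal illustration of the ideas only: it does not call ReproStat and is not the code behind perturb_data(). The subsampling fraction `rho`, the scale factor `noise_level`, and the choice of `mtcars` columns are assumptions made for the example.

```r
# Illustrative base-R versions of the three strategies (not ReproStat code).
set.seed(1)
dat <- mtcars[, c("mpg", "wt", "hp")]   # mpg = response, wt and hp = predictors
n <- nrow(dat)

# Bootstrap: draw n rows with replacement.
boot <- dat[sample.int(n, size = n, replace = TRUE), ]

# Subsampling: draw m = floor(rho * n) rows without replacement.
rho <- 0.8                               # hypothetical subsampling fraction
m <- floor(rho * n)
sub <- dat[sample.int(n, size = m, replace = FALSE), ]

# Noise injection: add Gaussian noise scaled to each predictor's SD.
noise_level <- 0.05                      # hypothetical scale factor
noisy <- dat
for (v in c("wt", "hp")) {
  noisy[[v]] <- dat[[v]] + rnorm(n, mean = 0, sd = noise_level * sd(dat[[v]]))
}
```

Each perturbed data frame can then be handed to whatever fitting code an external pipeline uses; the response column is left untouched in the noise variant.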
Stability Metrics
Four complementary views of how model outputs move across perturbation runs. Each function takes a reprostat object returned by run_diagnostics() and returns a numeric summary.

| Function | Question answered | Unit |
|---|---|---|
| coef_stability() | How much do coefficient estimates vary? | variance (lower = more stable) |
| pvalue_stability() | How often is each predictor significant? | frequency in [0, 1] |
| selection_stability() | Do predictors keep the same direction/inclusion? | proportion in [0, 1] |
| prediction_stability() | How much do predictions change? | variance (lower = more stable) |

pvalue_stability() and selection_stability() measure different things: the former asks about the stability of a binary significance decision; the latter asks about the direction or inclusion pattern of each predictor.

- coef_stability() - Coefficient stability
- pvalue_stability() - P-value stability
- selection_stability() - Selection stability
- prediction_stability() - Prediction stability
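
To make the four questions concrete, the sketch below computes comparable quantities by hand from bootstrap refits of a linear model in base R. It does not use ReproStat, and the package's own definitions may differ. The number of runs `B`, the threshold `alpha`, the model formula, and the use of sign agreement as a stand-in for "direction" are all assumptions made for illustration.

```r
# Hand-rolled analogues of the four stability views (not ReproStat code).
set.seed(42)
dat <- mtcars
B <- 200        # hypothetical number of perturbation runs
alpha <- 0.05   # hypothetical significance threshold

# Refit the same model on B bootstrap resamples.
fits <- lapply(seq_len(B), function(b) {
  idx <- sample.int(nrow(dat), replace = TRUE)
  lm(mpg ~ wt + hp, data = dat[idx, ])
})

coefs <- t(sapply(fits, coef))                                   # B x 3
pvals <- t(sapply(fits, function(f) summary(f)$coefficients[, "Pr(>|t|)"]))
preds <- sapply(fits, predict, newdata = dat)                    # n x B

coef_var  <- apply(coefs, 2, var)          # variance of each coefficient
sig_freq  <- colMeans(pvals < alpha)       # how often each term is significant
sign_cons <- apply(coefs, 2, function(b)   # share of runs with the majority sign
  max(mean(b > 0), mean(b < 0)))
pred_var  <- mean(apply(preds, 1, var))    # average per-observation prediction variance
```

A term can be significant in most runs (`sig_freq` near 1) while its sign never changes (`sign_cons` equal to 1), or keep a stable sign while rarely reaching significance, which is why the two summaries are reported separately.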
Reproducibility Index
The Reproducibility Index (RI) aggregates the four stability components into a single 0–100 score using a per-component normalisation and a simple average. ri_confidence_interval() estimates uncertainty in that score by resampling the stored perturbation draws, so no additional model fitting is required.

RI quick-reference guide

| RI | Interpretation |
|---|---|
| 90–100 | Highly stable under the chosen perturbation design |
| 70–89 | Moderately stable; overall pattern is dependable |
| 50–69 | Mixed stability; inspect component breakdown |
| < 50 | Low stability; results may be fragile |

These are interpretive anchors, not universal cutoffs. Always inspect the component decomposition alongside the aggregate score.

- reproducibility_index() - Reproducibility index
- ri_confidence_interval() - Bootstrap confidence interval for the reproducibility index
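
The aggregation and the resampling-based interval can be illustrated with a small base-R sketch. It is not ReproStat's implementation: the source does not specify the normalisation, so the example starts from simulated per-run component scores that are assumed to be already scaled to [0, 1]. The matrix `draws`, the helper names `ri` and `ri_band`, and the handling of values between the table's integer bands (for example 89.5) are all hypothetical.

```r
# Simple-average index and percentile bootstrap interval (illustrative only).
set.seed(7)
B <- 100
# Simulated stand-in for stored per-run component scores, already in [0, 1].
draws <- cbind(
  coef       = rbeta(B, 8, 2),
  pvalue     = rbeta(B, 9, 1),
  selection  = rbeta(B, 9, 1),
  prediction = rbeta(B, 7, 3)
)

# Index = 100 * simple average of the four component means.
ri <- function(d) 100 * mean(colMeans(d))
point <- ri(draws)

# Resample the stored runs (rows); no model is refitted.
boot_ri <- replicate(1000, ri(draws[sample.int(B, replace = TRUE), , drop = FALSE]))
ci <- quantile(boot_ri, c(0.025, 0.975))

# Map a score to the quick-reference bands, treated here as [lower, upper).
ri_band <- function(x) {
  as.character(cut(
    x, breaks = c(-Inf, 50, 70, 90, Inf), right = FALSE,
    labels = c("Low stability", "Mixed stability",
               "Moderately stable", "Highly stable")
  ))
}
ri_band(point)
```

Because only the stored rows are resampled, the interval is cheap to compute, but it reflects variability across the perturbation runs already drawn rather than any new perturbation design.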
Cross-Validation Ranking Stability
cv_ranking_stability() evaluates model-selection stability: given several candidate formulas, which one wins most consistently across repeated K-fold cross-validation? It records each model's rank in every repeat and summarises the distribution of those ranks.

Two summary statistics are particularly useful:

- top1_frequency: proportion of repeats in which a model ranked first. High values mean the model is a consistently strong choice.
- mean_rank: average rank across all repeats (lower is better).

The model with the lowest mean error does not necessarily have the highest top-1 frequency: a model can win narrowly in most repeats yet lose badly in the rest.

cv_ranking_stability() supports the same four backends as run_diagnostics().

- cv_ranking_stability() - Cross-validation ranking stability
- plot_cv_stability() - Plot cross-validation ranking stability
- plot_cv_stability_gg() - ggplot2-based CV ranking stability plot
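
The two summaries, and the way they can disagree with mean error, can be worked through by hand. The base-R sketch below uses a made-up error matrix (rows are repeats, columns are hypothetical candidate models `m1` to `m3`); it does not call cv_ranking_stability(), and the numbers are invented purely to illustrate the arithmetic.

```r
# Made-up CV errors: 5 repeats x 3 candidate models (illustrative only).
errors <- rbind(
  c(1.00, 1.02, 1.20),
  c(1.01, 1.03, 1.25),
  c(0.99, 1.00, 1.22),
  c(1.40, 1.02, 1.21),
  c(1.45, 1.01, 1.24)
)
colnames(errors) <- c("m1", "m2", "m3")

# Rank the models within each repeat (1 = lowest error).
ranks <- t(apply(errors, 1, rank))

top1_frequency <- colMeans(ranks == 1)   # m1 = 0.6, m2 = 0.4, m3 = 0.0
mean_rank      <- colMeans(ranks)        # m1 = 1.8, m2 = 1.6, m3 = 2.6
mean_error     <- colMeans(errors)       # m1 = 1.170, m2 = 1.016, m3 = 1.224
```

Here `m1` ranks first most often, but `m2` has both the lowest mean error and the lowest mean rank, because `m1` performs poorly in the two repeats it does not win.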
Visualization
Two families of plotting helpers are provided. The base-graphics functions (plot_stability(), plot_cv_stability()) have no external dependencies. The ggplot2 variants (plot_stability_gg(), plot_cv_stability_gg()) return ggplot objects that can be further customised with standard ggplot2 layers and themes. Both families are called for their side effects; the ggplot2 variants additionally return a ggplot object invisibly.
- plot_stability() - Plot stability diagnostics
- plot_stability_gg() - ggplot2-based stability plot