Computes a composite reproducibility index (RI) on a 0–100 scale by aggregating normalized versions of coefficient, p-value, selection, and prediction stability.
Arguments
- diag_obj
A
reprostatobject fromrun_diagnostics.
Value
A list with components:
indexScalar RI on a 0–100 scale.
componentsNamed numeric vector with four sub-scores:
coef,pvalue,selection,prediction.pvalueisNAforbackend = "glmnet";selectionis always available.
Details
Each component is mapped to \([0, 1]\) as follows:
- Coefficient component (\(C_\beta\))
\(C_\beta = \frac{1}{p}\sum_{j=1}^{p} \exp\!\left(-S_{\beta,j} / (|\hat\beta_j^{(0)}| + \bar\beta)\right)\), where \(\bar\beta = \max(\mathrm{median}_j|\hat\beta_j^{(0)}|,\, 10^{-4})\) is a global scale reference. This prevents the exponential from collapsing to zero for weakly-identified (near-zero) predictors. Includes all model terms (intercept and predictors).
- P-value component (\(C_p\))
\(C_p = \frac{1}{p'}\sum_{j \neq \text{intercept}} |2 S_{p,j} - 1|\), where \(S_{p,j}\) is the proportion of perturbation iterations in which term \(j\) is significant at level
alphaand \(p'\) is the number of predictor terms. Values near 0 or 1 (consistent decisions) score high; 0.5 (random) scores zero.NAforbackend = "glmnet".- Selection component (\(C_\mathrm{sel}\))
For
"lm","glm","rlm": the mean sign consistency across predictors — the proportion of perturbation iterations in which each predictor's estimated sign agrees with the base-fit sign. Captures stability of effect direction, which is distinct from significance stability (\(C_p\)). For"glmnet": the mean non-zero selection frequency — proportion of perturbation iterations in which each predictor's coefficient is non-zero. Always available (neverNA). Excludes the intercept.- Prediction component (\(C_\mathrm{pred}\))
\(C_\mathrm{pred} = \exp(-\bar S_\mathrm{pred} / (\mathrm{Var}(y) + \varepsilon))\). Prediction variance relative to outcome variance drives the decay.
The RI is \(100 \times (C_\beta + C_p + C_\mathrm{sel} +
C_\mathrm{pred}) / k\), where \(k\) is the number of non-NA
components. For backend = "glmnet", \(C_p\) is NA so
the RI is based on three components (\(k = 3\)): \(C_\beta\),
\(C_\mathrm{sel}\), and \(C_\mathrm{pred}\). All other backends
contribute all four components (\(k = 4\)).
Comparability across backends: because "glmnet" uses three
components and "lm"/"glm"/"rlm" use four, RI values
are not directly comparable across backends.
Examples
set.seed(1)
d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50)
reproducibility_index(d)
#> $index
#> [1] 97.50225
#>
#> $components
#> coef pvalue selection prediction
#> 0.9562648 0.9600000 1.0000000 0.9838253
#>