Compute Posterior Predictive Check (PPC) Statistics

This function performs a Bayesian Posterior Predictive Check for count data models. It compares the observed Pearson Chi-squared statistic against a distribution of statistics calculated from data replicated from the posterior predictive distribution.

Usage

compute_ppc_stats(
  samples_matrix,
  data_response,
  family = c("poisson", "nbinomial", "quasi-poisson"),
  dispersion = NULL
)

Arguments

samples_matrix: A matrix of posterior samples of the expected count rate (mu). Rows correspond to observations, and columns correspond to Monte Carlo samples.
data_response: A numeric vector of the observed count data.
family: Character string defining the likelihood family: "poisson", "nbinomial", or "quasi-poisson".
dispersion: Optional numeric value for the dispersion parameter (e.g., size 'k' for Negative Binomial). Defaults to 1 if NULL for "nbinomial" or "quasi-poisson".

Value

A list of class "ppc_stats" containing:

T_obs: The observed Pearson Chi-squared statistic.
T_rep: A vector of Pearson Chi-squared statistics from replicated data.
p_value: The Bayesian p-value.
family: The likelihood family used.

Details

The function calculates the Pearson Chi-squared statistic for the observed data $T_{obs}$ and for each replicated dataset $T_{rep}$ generated from the posterior samples.

The statistic is defined as: $$T(y, \mu) = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{Var(y_i | \mu_i)}$$

Interpretation:

Bayesian p-value: Represents the probability that the replicated data is "more extreme" than the observed data.
- $p_{B} \approx 0.5$: Indicates an excellent model fit.
- $p_{B} < 0.05$ or $p_{B} > 0.95$: Indicates a significant lack of fit.
Overdispersion: If $T_{obs}$ is consistently larger than the $T_{rep}$ distribution ($p \approx 1$).
Underdispersion: If $T_{obs}$ is consistently smaller than the $T_{rep}$ distribution ($p \approx 0$).

Examples

if (FALSE) { # \dontrun{
#--- Setup mock data: 10 observations, 100 posterior samples
set.seed(123)
n_obs <- 10
n_sim <- 100

# Expected counts (mu) from a hypothetical model
mu_samples <- matrix(rlnorm(n_obs * n_sim, meanlog = 1),
  nrow = n_obs, ncol = n_sim
)

# Observed counts (y) - let's assume the model is a good fit
y_obs <- rpois(n_obs, rowMeans(mu_samples))

#--- Compute PPC statistics for a Poisson family
ppc_results <- compute_ppc_stats(
  samples_matrix = mu_samples,
  data_response = y_obs,
  family = "poisson"
)

#--- View the summary
print(ppc_results)

#--- Visualize the fit
plot(ppc_results)

} # }

Usage

Arguments

Value

Details

See also

Examples