Skip to contents

This function performs a Bayesian Posterior Predictive Check for count data models. It compares the observed Pearson Chi-squared statistic against a distribution of statistics calculated from data replicated from the posterior predictive distribution.

Usage

compute_ppc_stats(
  samples_matrix,
  data_response,
  family = c("poisson", "nbinomial", "quasi-poisson"),
  dispersion = NULL
)

Arguments

samples_matrix

A matrix of posterior samples of the expected count rate (mu). Rows correspond to observations, and columns correspond to Monte Carlo samples.

data_response

A numeric vector of the observed count data.

family

Character string defining the likelihood family: "poisson", "nbinomial", or "quasi-poisson".

dispersion

Optional numeric value for the dispersion parameter (e.g., size 'k' for Negative Binomial). Defaults to 1 if NULL for "nbinomial" or "quasi-poisson".

Value

A list of class "ppc_stats" containing:

T_obs

The observed Pearson Chi-squared statistic.

T_rep

A vector of Pearson Chi-squared statistics from replicated data.

p_value

The Bayesian p-value.

family

The likelihood family used.

Details

The function calculates the Pearson Chi-squared statistic for the observed data \(T_{obs}\) and for each replicated dataset \(T_{rep}\) generated from the posterior samples.

The statistic is defined as: $$T(y, \mu) = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{Var(y_i | \mu_i)}$$

Interpretation:

  • Bayesian p-value: Represents the probability that the replicated data is "more extreme" than the observed data.

    • \(p_{B} \approx 0.5\): Indicates an excellent model fit.

    • \(p_{B} < 0.05\) or \(p_{B} > 0.95\): Indicates a significant lack of fit.

  • Overdispersion: If \(T_{obs}\) is consistently larger than the \(T_{rep}\) distribution (\(p \approx 1\)).

  • Underdispersion: If \(T_{obs}\) is consistently smaller than the \(T_{rep}\) distribution (\(p \approx 0\)).

See also

Examples

if (FALSE) { # \dontrun{
#--- Setup mock data: 10 observations, 100 posterior samples
set.seed(123)
n_obs <- 10
n_sim <- 100

# Expected counts (mu) from a hypothetical model
mu_samples <- matrix(rlnorm(n_obs * n_sim, meanlog = 1),
  nrow = n_obs, ncol = n_sim
)

# Observed counts (y) - let's assume the model is a good fit
y_obs <- rpois(n_obs, rowMeans(mu_samples))

#--- Compute PPC statistics for a Poisson family
ppc_results <- compute_ppc_stats(
  samples_matrix = mu_samples,
  data_response = y_obs,
  family = "poisson"
)

#--- View the summary
print(ppc_results)

#--- Visualize the fit
plot(ppc_results)

} # }