
Evaluates the geometric properties of spatial folds to assess whether they are spatially independent enough for cross-validation. The function checks whether block sizes and inter-block gaps are sufficient relative to a prior or model-estimated spatial range (\(\rho\)).

Usage

check_folds(object, ...)

# S3 method for class 'DataFolds'
check_folds(object, rho = NULL, plot = TRUE, ...)

Arguments

object

A DataFolds object created by create_folds().

...

Additional arguments.

rho

Numeric. Optional. The practical range (km) of spatial autocorrelation, taken either from an exploratory analysis or from the posterior of a fitted Bayesian integrated model (e.g., derived from the Matérn range parameter).

plot

Logical. If TRUE, returns a diagnostic plot.

Value

An object of class GeoDiagnostic.

Details

The function assesses independence by comparing the minimum gap between folds with the practical range (\(\rho\)), i.e. the distance at which spatial correlation drops to 0.1:

  • Contiguous: Gap = 0. High risk of spatial leakage; observations in test folds are spatially correlated with training data.

  • Weakly Independent: 0 < Gap < \(\rho\). A physical gap exists, but correlation at the fold boundary remains above 0.1.

  • Independent: Gap \(\ge \rho\). Spatial correlation at the boundary is at or below 0.1, which meets the usual requirement for spatial independence in most blocked cross-validation applications (Roberts et al. 2017).
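The three categories above amount to a simple threshold rule. The following is a minimal sketch of that rule only, not the package's implementation; `classify_gap()` and its arguments are hypothetical names introduced for illustration, and both inputs are assumed to be in kilometres.

```r
# Hypothetical helper (not part of isdmtools): label a minimum inter-fold
# gap by comparing it with the practical range. Both values are in km.
classify_gap <- function(gap_km, rho_km) {
  stopifnot(is.numeric(gap_km), is.numeric(rho_km),
            length(rho_km) == 1, rho_km > 0, all(gap_km >= 0))
  ifelse(
    gap_km == 0, "Contiguous",
    ifelse(gap_km < rho_km, "Weakly Independent", "Independent")
  )
}

# Gaps of 0, 40, 150 and 200 km against an assumed range of 150 km
classify_gap(c(0, 40, 150, 200), rho_km = 150)
```

A gap exactly equal to the range is labelled "Independent", matching the \(\ge\) in the definition above.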

The spatial range can be estimated with cv_spatial_autocor() from the blockCV package (Valavi et al. 2018) or with any other tool. The blockCV authors recommended this value, which is reported in metres, as the optimal block size for their spatial blocking scheme; convert it to kilometres before passing it as rho. Alternatively, if a covariance model is fitted to an experimental variogram, the 10% practical range (the distance at which correlation falls to 0.1) can be obtained by interpolation. Note that the packages available for estimating the range from observed spatial data use different parameterisations. The helper function solve_practical_range() derives a unified practical range for a Matérn covariance fitted to the data with the INLA, geoR or spatstat packages.
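To make the idea of a 10% practical range concrete, the sketch below finds numerically the distance at which a Matérn correlation function falls to 0.1. It is an illustration under stated assumptions, not the code behind solve_practical_range(): `matern_cor()` and `practical_range()` are hypothetical names, and the correlation is assumed to be parameterised by a scale `kappa` (in 1/km) and a smoothness `nu`.

```r
# Matern correlation at distance d (km), assuming the parameterisation
# (kappa * d)^nu * K_nu(kappa * d) * 2^(1 - nu) / gamma(nu).
matern_cor <- function(d, kappa, nu) {
  x <- kappa * d
  out <- 2^(1 - nu) / gamma(nu) * x^nu * besselK(x, nu)
  out[d == 0] <- 1  # limiting value at zero distance
  out
}

# Hypothetical helper: distance at which the correlation equals `threshold`.
practical_range <- function(kappa, nu, threshold = 0.1) {
  # Expand the upper bracket until the correlation is below the threshold
  upper <- 1 / kappa
  while (matern_cor(upper, kappa, nu) > threshold) upper <- 2 * upper
  uniroot(
    function(d) matern_cor(d, kappa, nu) - threshold,
    lower = 1e-8 / kappa, upper = upper, tol = 1e-9
  )$root
}

# Exponential case (nu = 0.5): the result should agree with log(10) / kappa
rho_km <- practical_range(kappa = 0.02, nu = 0.5)

# A range reported in metres must be converted before use as rho
range_m <- 150000
range_km <- range_m / 1000
```

For `nu = 0.5` the Matérn correlation reduces to `exp(-kappa * d)`, which gives a closed-form check on the numerical root.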

References

  • Roberts DR, Bahn V, Ciuti S, Boyce MS, Elith J, Guillera-Arroita G, Hauenstein S, Lahoz-Monfort JJ, Schröder B, Thuiller W, et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography (2017) 40:913–929. doi:10.1111/ecog.02881.

  • Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. bioRxiv (2018). doi:10.1101/357798.

See also

DataFolds-methods for interacting with DataFolds objects.

Other diagnostic tools: EnvDiagnostic-methods, GeoDiagnostic-methods, check_env_balance(), summarise_fold_diagnostics()

Examples

if (FALSE) { # \dontrun{
library(sf)
library(terra)
library(ggplot2)
library(isdmtools)

# Generate the data as a list of sf objects
set.seed(42)
presence_data <- data.frame(
  x = runif(100, 0, 4),
  y = runif(100, 6, 13),
  site = rbinom(100, 1, 0.6)
) |> st_as_sf(coords = c("x", "y"), crs = 4326)

count_data <- data.frame(
  x = runif(50, 0, 4),
  y = runif(50, 6, 13),
  count = rpois(50, 5)
) |> st_as_sf(coords = c("x", "y"), crs = 4326)

datasets_list <- list(Presence = presence_data, Count = count_data)

ben_coords <- matrix(c(0, 6, 4, 6, 4, 13, 0, 13, 0, 6), ncol = 2, byrow = TRUE)
ben_sf <- st_sfc(st_polygon(list(ben_coords)), crs = 4326)
ben_sf <- st_sf(data.frame(name = "Benin"), ben_sf)

# Create Folds using create_folds()
folds <- create_folds(
  datasets_list,
  region_polygon = ben_sf,
  k = 5,
  cv_method = "cluster"
)

# Check Spatial Independence
# Assuming the autocorrelation practical range (rho) is 150 km
spat_diag <- check_folds(folds, rho = 150, plot = TRUE)

# View results
print(spat_diag)
plot(spat_diag)
} # }