Is Moran’s I robust?

4 May, 2012 (12:15) | R, Spatial analysis | By: Manuel Gimond

Moran’s I (both global and local) is a measure of spatial autocorrelation. ¬†One form of its equation looks like this:

  I = \frac{N}{\sum_{i} \sum_{j} w_{ij}} \frac{\sum_{i} \sum_{j} w_{ij} (X_i - \bar X)(X_j - \bar X)}{\sum_{i} (X_i - \bar X )^2}

Note the residual and spread terms. As such, Moran’s I can be sensitive to outliers. The following example demonstrates this point:

library(spdep)
library(maptools)
# Load NC data
NC <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],
      ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
rn <- sapply(slot(NC, "polygons"), function(x) slot(x, "ID"))
NB <- read.gal(system.file("etc/weights/ncCR85.gal", package="spdep")[1], 
      region.id=rn)
n  <- length(NB)
# Define some variables
set.seed(4956)
x.norm <- rnorm(n)     # Normal distribution
rho    <- 0.3          # autoregressive parameter
W      <- nb2listw(NB) # Generate spatial weights
# Generate spatially autocorrelated datasets 
# (one normally distributed the other skewed)
x.norm.auto <- invIrW(W, rho) %*% x.norm # Generate autocorrelated values
x.skew.auto <- exp(x.norm.auto) # Transform orginal data to create a 'skewed' version

You can map the simulated data as follows:

# Append data to NC polygon object
auto_sk <- as.vector(x.skew.auto)
auto_no <- as.vector(x.norm.auto)
rn      <- as.integer(rn)
NCa     <- spCbind(NC,auto_sk)
NCa     <- spCbind(NCa,auto_no)
NCa     <- spCbind(NCa,rn)   # Needed if the GAL file is to be used in GeoDa
# Plot map (skewed data in this example)
brks <- quantile(auto_sk, seq(0,1,1/10))
cols <- grey.colors(length(brks)-1, 0.95, 0.55, 2.2)
plot(NCa, col=cols[findInterval(auto_sk, brks, all.inside=TRUE)])

# Compute Moran's I for both datasets
I.norm = moran.test(x.norm.auto, listw=W)
I.skew = moran.test(x.skew.auto, listw=W)
I.norm$p.value;I.skew$p.value
# The above approach makes assumptions about the Moran's I when in fact, it may not be.
# To correct for this, we can generate an empirical sampling distribution function
# using a Monte Carlo procedure.
MCI.norm <- moran.mc(x.norm.auto, listw=W, nsim=9999)
MCI.skew <- moran.mc(x.skew.auto, listw=W, nsim=9999)
# Compare the P-values
MCI.norm$p.value;MCI.skew$p.value

The p-values from the Monte Carlo simulations are 0.0133 and 0.1672 for the normal and skewed data respectively. The test results under different data distributions provide us with two distinct conclusions at a 5% significance level!

oopar <- par(mfrow=c(1,2), mar=c(4,4,3,2)+0.1)
moran.plot(x.norm.auto,W)
moran.plot(x.skew.auto,W)
par(oopar)

NOTE: To export the spatial data to a shapefile for further analysis in GeoDa:

# Write map object to shapefile
writePolyShape(NCa,"NCa",factor2char = F)

–> If using Geoda, the weight file can be found under C:\Program Files\R\R-2.12.2\library\spdep\etc\weights