R: Does Regression make sense? The area vs. perimeter example

9 November, 2011 (11:47) | R | By: Manuel Gimond

This is an example from the book Statistics (3rd edition, by Freedman, Pisani and Purves), where the authors remind the reader: “…even if the association [between variables] looks linear… [does] the regression make sense?” As an example, they show that the correlation between the area and the perimeter of rectangles is strong, yet regressing one on the other is “…silly.  The investigator should have looked at the two other variables, length and width…”, which determine both area and perimeter.
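To see why length and width are the variables to look at, here is a minimal sketch. It is not from the book: the seed, the variable names (`len`, `wid`) and the fitted-model names are assumptions made for illustration. It shows that perimeter is an exact linear function of length and width, and that area becomes one on the log scale, whereas the area vs. perimeter fit leaves scatter unexplained.

```r
# Illustrative sketch only; the seed and all names are assumptions.
set.seed(1)
len = runif(50, 1, 8)    # rectangle lengths
wid = runif(50, 1, 8)    # rectangle widths
area  = len * wid
perim = 2 * (len + wid)

# Perimeter is exactly 2*len + 2*wid, so the fitted slopes
# are both 2 (up to floating-point error) with a zero intercept.
fit.perim = lm(perim ~ len + wid)
print(round(coef(fit.perim), 6))

# Area is a product, which is additive on the log scale:
# log(area) = log(len) + log(wid), so both slopes are 1.
fit.area = lm(log(area) ~ log(len) + log(wid))
print(round(coef(fit.area), 6))

# By contrast, area regressed on perimeter is only an approximation.
fit.silly = lm(area ~ perim)
print(summary(fit.silly)$r.squared)
```

The two exact fits carry no statistical information; they restate the geometry, which is the point the authors make about choosing the right variables.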

<pre># Generate random rectangles with dimensions ranging
# from 1 to 8 units
a = runif(50,1,8)
b = runif(50,1,8)
area = a * b
perim = 2 * (a + b)

# Plot Area vs Perimeter
plot(area ~ perim,pch=19,col="bisque2")

# Run a regression model
model = lm(area ~ perim)
abline(model,col="blue")
# Label the plot with the correlation coefficient
# (cor() returns r, not R^2)
m.cor = round(cor(area,perim),digits=2)
x.txt = min(perim) + 0.2 * diff(range(perim))
y.txt = min(area) + 0.9 * diff(range(area))

text(x.txt,y.txt,substitute(r == A, list(A = m.cor)))