R: Does Regression make sense? The area vs. perimeter example
This example comes from the book Statistics (3rd edition) by Freedman, Pisani and Purves, where the authors remind the reader: “…even if the association [between variables] looks linear… [does] the regression make sense?” To illustrate, they show that the correlation between the area and the perimeter of rectangles is strong, yet this regression is “…silly. The investigator should have looked at the two other variables, length and width…”, which determine both area and perimeter.
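The authors' point can be sketched in a few lines of R. This is only an illustration, not code from the book: the seed and the names `len`, `wid`, `fit_perim` and `fit_dims` are assumptions made for this example. Since area is the product of length and width, a regression of log(area) on log(length) and log(width) should recover the relationship exactly, whereas perimeter only bounds the area from above (a rectangle with perimeter P has area at most P^2/16, reached by a square).

```r
# Hypothetical illustration; the seed and variable names are assumptions
set.seed(1)
len <- runif(50, 1, 8)      # rectangle length
wid <- runif(50, 1, 8)      # rectangle width
area  <- len * wid
perim <- 2 * (len + wid)

# Regression on perimeter alone: strong association, but residual scatter remains
fit_perim <- lm(area ~ perim)
r2_perim  <- summary(fit_perim)$r.squared   # strictly below 1

# Regression on the variables that actually determine area:
# area = len * wid  implies  log(area) = log(len) + log(wid)
fit_dims <- lm(log(area) ~ log(len) + log(wid))
coef(fit_dims)   # intercept near 0, both slopes near 1 (up to rounding error)

# Perimeter only caps the area: no rectangle exceeds perim^2 / 16
all(area <= perim^2 / 16 + 1e-12)
```

The log transform is used here because the exact relationship is multiplicative; a plain `lm(area ~ len + wid)` would fit better than the perimeter model but would still not be exact.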
<pre># Generate random rectangles with dimensions ranging
# from 1 to 8 units
a = runif(50,1,8)
b = runif(50,1,8)
area = a * b
perim = 2 * (a + b)

# Plot Area vs Perimeter
plot(area ~ perim,pch=19,col="bisque2")

# Run a regression model
model = lm(area ~ perim)
abline(model,col="blue")

# Annotate the plot with the correlation coefficient.
# cor() returns r, not R^2, so the label uses r.
m.cor = round(cor(area,perim),digits=2)

# Place the label 20% along the x-axis range and 90% up the y-axis range
x.txt = 0.2 * (max(range(perim)) - min(range(perim))) + min(range(perim))
y.txt = 0.9 * (max(range(area)) - min(range(area))) + min(range(area))
text(x.txt,y.txt,substitute(r == A, list(A = m.cor)))
</pre>
