{"id":418,"date":"2012-04-09T12:00:41","date_gmt":"2012-04-09T16:00:41","guid":{"rendered":"http:\/\/web.colby.edu\/mgimond\/?p=418"},"modified":"2012-05-02T09:48:04","modified_gmt":"2012-05-02T13:48:04","slug":"a-basic-transformation-and-regression-example-using-census-data","status":"publish","type":"post","link":"https:\/\/web.colby.edu\/mgimond\/2012\/04\/09\/a-basic-transformation-and-regression-example-using-census-data\/","title":{"rendered":"A basic transformation and regression example using Census data"},"content":{"rendered":"<p>The data used in this example are 2010 ACS Census data of per capita income and fraction of population with a Bachelor&#8217;s degree or greater. The data file can be downloaded\u00a0<a href=\"http:\/\/web.colby.edu\/mgimond\/files\/2012\/04\/inc_ed.csv\">here<\/a>.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n\r\n# Before running the following lines, the user shoud run all\r\n# functions found at the bottom of the page\r\n#\r\n# Read datafile\r\ndat = read.csv(&quot;inc_ed.csv&quot;, header=T)\r\nattach(dat)\r\n\r\n# Generate a scatter plot of Y vs X where x and y scales\r\n# are one SD unit in length. Plot the 45\u00b0 SD line along\r\n# with the mean vertical and horizontal lines\r\nSDplot(percE_E,B19301_001E)\r\n\r\n# Compute OLS\r\nM1 = lm(B19301_001E ~ percE_E)\r\n\r\n# Plot regression line\r\nabline(M1,col=&quot;blue&quot;)\r\n\r\n# Plot OLS results\r\nOP = par(mfrow=c(2, 2))\r\nplot(M1)\r\npar(OP)\r\n\r\n# The above assumes normally distributed data when in fact\r\n# the data are not perfectly normal.\r\n# Create a 'ladder' of transformations. The trans() function creates png files,\r\n# so check the directory for image files after running trans().\r\ntrans(c(&quot;B19301_001E&quot;,&quot;percE_E&quot;))\r\n\r\n# Based on a visual analysis of the transformations,\r\n# re-express the data as follows:\r\ntY = BoxCox(B19301_001E,0.33)\r\ntX = BoxCox(percE_E,0.5)\r\n\r\n# Generate scatter plot of transformed data\r\nSDplot(tX,tY)\r\n\r\n# Rerun lm model with transformed data\r\nM2 = lm(tY ~ tX)\r\n\r\n# Plot regression line\r\nabline(M2,col=&quot;blue&quot;)\r\n\r\n# Plot lm diagnostic plots\r\nOP = par(mfrow=c(2, 2))\r\nplot(M2)\r\npar(OP)\r\n\r\ndetach(dat)\r\n\r\n######################################################################\r\n##############\r\n############## FUNCTIONS\r\n##############\r\n######################################################################\r\n#\r\n# ===================================================================\r\n# ===== Standardization function\r\n# ===================================================================\r\n#\r\nstd = function(x,y) {\r\n (x - mean(x,na.rm=T)) \/ sd(x,na.rm=T)\r\n}\r\n#\r\n# ===================================================================\r\n# ===== Create an SD plot\r\n# ===================================================================\r\n#\r\nSDplot = function(X,Y) {\r\n sdX = sd(X,na.rm=T) # Compute x SD\r\n sdY = sd(Y, na.rm=T) # Compute y SD\r\n muX = mean(X,na.rm=T)\r\n muY = mean(Y,na.rm=T)\r\n a = muY - sdY\/sdX * muX # Get y-intercept for SD line\r\n #\r\n par(pty=&quot;s&quot;)  # Keep box square\r\n plot(Y ~ X, asp = (sdX\/sdY),pch=16,cex=0.7,col=&quot;bisque&quot;)\r\n abline(a = a, b = sdY\/sdX, lty=4) # Plot SD line\r\n abline(v = muX,lty = 3, col=&quot;grey&quot;) # Plot SDx = 0\r\n abline(h = muY,lty = 3, col=&quot;grey&quot;) # Plot SDy = 0\r\n}\r\n#\r\n# ===================================================================\r\n# ===== Box Cox transformation\r\n# ===================================================================\r\n#\r\nBoxCox = function(x, p) {\r\n #\r\n # x: Number (or vector) to be re-expressed.\r\n # p: Power.\r\n #\r\n if(p == 0) {\r\n log(x)\r\n } else {\r\n # (x^p - 1)\/p, up to a recentering and rescaling:\r\n if (p &lt; 0) {\r\n -(x^p)\r\n } else {\r\n x^p\r\n }\r\n }\r\n}\r\n\r\n# ===================================================================\r\n# ===== Transformation function\r\n# ===================================================================\r\ntrans = function(asFields){\r\n # The powers to examine.\r\n pp = c(2, 1, 1\/2, 0.33, 0, -0.33, -1\/2, -1, -2)\r\n #\r\n # Output parameters\r\n sColor = &quot;bisque&quot; # Histogram bar color (RGB in hexadecimal)\r\n xLwd = 2 # Weight of the smoothed lines\r\n nBreaks = 40 # Number of cells for the histograms\r\n sPrefix = &quot;transf_&quot; # Prefix for output file names\r\n sMetaFormat = &quot;png&quot; # Metafile format specifier\r\n # (&quot;wmf&quot;, &quot;emf&quot;, &quot;png&quot;, &quot;jpg&quot;, &quot;jpeg&quot;, &quot;bmp&quot;,\r\n # &quot;tif&quot;, &quot;tiff&quot;, &quot;ps&quot;, &quot;eps&quot;, &quot;pdf&quot;)\r\n #\r\n # Compute dimensions of the array of histograms.\r\n nRows = max(1,floor(length(pp)^(1\/2)))\r\n nCols = ceiling(length(pp)\/nRows)\r\n win.graph(width=1000,height=900)\r\n mfrowOld = par(mfrow=c(nRows, nCols)) # Saves the setting\r\n #\r\n # Create a &quot;ladder&quot; for each specified field.\r\n for (sField in asFields) {\r\n #\r\n # Extract non-missing values into the vector x.\r\n x = eval(as.name(sField))\r\n x = x[!is.na(x)]\r\n #\r\n # Create a &quot;ladder&quot; of histograms.\r\n for (p in pp) {\r\n z = BoxCox(x, p)\r\n hist(z, breaks=nBreaks, probability=TRUE, col=sColor, main=paste(sField, &quot;: &quot;, p))\r\n lines(density(z), lwd=xLwd)\r\n }\r\n #\r\n # Save it to a file.\r\n sFileName = paste(sPrefix, sField, &quot;.&quot;, sMetaFormat, sep=&quot;&quot;)\r\n savePlot(filename=sFileName, type=sMetaFormat)\r\n }\r\n par(mfrow=mfrowOld)\r\n}\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The data used in this example are 2010 ACS Census data of per capita income and fraction of population with a Bachelor&#8217;s degree or greater. The data file can be downloaded\u00a0here.<\/p>\n","protected":false},"author":1199,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[12807],"tags":[],"_links":{"self":[{"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/posts\/418"}],"collection":[{"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/users\/1199"}],"replies":[{"embeddable":true,"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/comments?post=418"}],"version-history":[{"count":8,"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/posts\/418\/revisions"}],"predecessor-version":[{"id":429,"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/posts\/418\/revisions\/429"}],"wp:attachment":[{"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/media?parent=418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/categories?post=418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/web.colby.edu\/mgimond\/wp-json\/wp\/v2\/tags?post=418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}