Awhile ago I wrote a small piece on how navigating and mapping with the
acs package. One of the problems I ran into with the previous one was an inefficient way to get the data for every state and county. I am not sure where I saw it, but it dawned on me to try the wildcard (*), which worked. With that recently there is new data, and some easier ways to get the bits you want. You can view details about the ACS, see upcoming data releases, and look up tables here. I have really only explored these income levels, but the ACS data seems to be expansive and endlessly interesting. Remember you will need to request an API for the ACS data from ACS request form.
Let’s talk packages for just a moment. Obviously the
acs package, but I also normally use
plyr for light data manipulation so there are a couple uses of handy functions from those. The base maps come from
viridis & finally the new
hrbrthemes available on CRAN for the plotting aesthetics. (check out the themes if you haven’t, they are great)
If you just want a link to the entire code, see here.
library(acs) library(ggplot2) library(plyr) library(tidyr) library(maps) library(viridis) library(hrbrthemes)
Once we have all of our packages loaded we can generate the base maps, we will want county and state level data. Originally I had hodgepodge ways to do this but the
viridis package page helped me immensely with it.
# load the boundary data for all counties county.df<-map_data("county") # rename fields for later merge names(county.df)[5:6]<-c("state","county") state.df<-map_data("county")
Now we can use the key you generated and grab our data, I am using table 'B19001', which again is the income level data. A table lookup is found here.
api.key.install(key="pasteyourkeyhere") #this gets us all the state and counties us.county<-geo.make(state="*", county="*") us.raw<-acs.fetch(endyear=2015, geography=us.county, table.number="B19001", col.names="pretty") #these get our column names raw.cols<-data.frame(email@example.com) strings<-strsplit(as.character(raw.cols$us.raw.acs.colnames),":") col.names<-sapply(strings,"[",2)
Now we have an acs object, and a list of the column names, now let's make something a little easier to play with.
income.brackets<-data.frame(county=geography(us.raw)[], us.transport=as.numeric(estimate(us.raw)), paste0(str_pad(us.raw@geography$state, 2, "left", pad="0"), str_pad(us.raw@geography$county, 3, "left", pad="0"), str_pad(us.raw@geography$tract, 6, "left", pad="0")), us.raw@estimate, stringsAsFactors = FALSE)
Now we have the 'income.brackets' object that has column names as oddly formatted labels of the income levels. We have the column names from earlier so we will replace those columns, rename some objects and clean it up.
names(income.brackets)[4:20]<-col.names names(income.brackets)<-'zip' income.brackets[5:20] <- 100*(income.brackets[5:20]/income.brackets[,4])
You'll noticed we also calculated the percent of the group at each different income level. Now that it is cleaned we can gather the data into a tidy form. One thing you will notice at this point is that we still have every income level. We are going to be expanding this data to match with county data so we will also subset the data for a certain income level. I chose this to be the highest one.
#small & tidy frame new<-income.brackets%>%gather(key=level,value=rate,4:20) new$level<-trimws(new$level) new<-subset(new,level == '$200,000 or more') # clean up county names and find the states, the parish bits are for Louisiana new$state<-tolower(gsub("^.*County, |^.*Parish, ", "", new$county)) new$county<-tolower(gsub(" County,.*| Parish,.*", "", new$county)) choropleth<-merge(county.df, new, by=c("state","county")) choropleth<-choropleth[order(choropleth$order), ]
Now we plot, thanks to the new
hrbrthemes package we have the excellent and visually stunning
ggplot(choropleth, aes(long, lat, group = group)) + geom_polygon(aes(fill = rate), colour = alpha("gray", 1 / 2), size = 0.2) + geom_polygon(data = state.df, colour = "white", fill = NA,cex=.25) + scale_fill_viridis(option="B",direction = -1)+theme(legend.position = 'bottom')+ labs(title='Income level by county',subtitle='Percent of total households with an income greater or equal to $200,000')+ theme_ipsum()