Update: Mapping with acs


Awhile ago I wrote a small piece on how navigating and mapping with the acs package. One of the problems I ran into with the previous one was an inefficient way to get the data for every state and county. I am not sure where I saw it, but it dawned on me to try the wildcard (*), which worked. With that recently there is new data, and some easier ways to get the bits you want. You can view details about the ACS, see upcoming data releases, and look up tables here. I have really only explored these income levels, but the ACS data seems to be expansive and endlessly interesting. Remember you will need to request an API for the ACS data from ACS request form.

Code

Let’s talk packages for just a moment. Obviously the acs package, but I also normally use tidyr and plyr for light data manipulation so there are a couple uses of handy functions from those. The base maps come from maps. Then ggplot2,viridis & finally the new hrbrthemes available on CRAN for the plotting aesthetics. (check out the themes if you haven’t, they are great)

If you just want a link to the entire code, see here.



library(acs)
library(ggplot2)
library(plyr)
library(tidyr)
library(maps)
library(viridis)
library(hrbrthemes)

Once we have all of our packages loaded we can generate the base maps, we will want county and state level data. Originally I had hodgepodge ways to do this but the viridis package page helped me immensely with it.



# load the boundary data for all counties
county.df<-map_data("county")
# rename fields for later merge
names(county.df)[5:6]<-c("state","county")
state.df<-map_data("county")

Now we can use the key you generated and grab our data, I am using table 'B19001', which again is the income level data. A table lookup is found here.



api.key.install(key="pasteyourkeyhere")
#this gets us all the state and counties
us.county<-geo.make(state="*", county="*")
us.raw<-acs.fetch(endyear=2015, geography=us.county, table.number="B19001", col.names="pretty")
#these get our column names
raw.cols<-data.frame(us.raw@acs.colnames)
strings<-strsplit(as.character(raw.cols$us.raw.acs.colnames),":")
col.names<-sapply(strings,"[",2)

Now we have an acs object, and a list of the column names, now let's make something a little easier to play with.



income.brackets<-data.frame(county=geography(us.raw)[[1]], us.transport=as.numeric(estimate(us.raw)),
                            paste0(str_pad(us.raw@geography$state, 2, "left", pad="0"), 
                                              str_pad(us.raw@geography$county, 3, "left", pad="0"), 
                                              str_pad(us.raw@geography$tract, 6, "left", pad="0")), 
                                       us.raw@estimate, stringsAsFactors = FALSE)

Now we have the 'income.brackets' object that has column names as oddly formatted labels of the income levels. We have the column names from earlier so we will replace those columns, rename some objects and clean it up.



names(income.brackets)[4:20]<-col.names
names(income.brackets)[3]<-'zip'
income.brackets[5:20] <- 100*(income.brackets[5:20]/income.brackets[,4])

You'll noticed we also calculated the percent of the group at each different income level. Now that it is cleaned we can gather the data into a tidy form. One thing you will notice at this point is that we still have every income level. We are going to be expanding this data to match with county data so we will also subset the data for a certain income level. I chose this to be the highest one.



#small & tidy frame
new<-income.brackets%>%gather(key=level,value=rate,4:20)
new$level<-trimws(new$level)
new<-subset(new,level == '$200,000 or more')

# clean up county names and find the states, the parish bits are for Louisiana
new$state<-tolower(gsub("^.*County, |^.*Parish, ", "", new$county))
new$county<-tolower(gsub(" County,.*| Parish,.*", "", new$county))
choropleth<-merge(county.df, new, by=c("state","county"))
choropleth<-choropleth[order(choropleth$order), ]

Now we plot, thanks to the new hrbrthemes package we have the excellent and visually stunning theme_ipsum



ggplot(choropleth, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate), colour = alpha("gray", 1 / 2), size = 0.2) +
  geom_polygon(data = state.df, colour = "white", fill = NA,cex=.25)  +
  scale_fill_viridis(option="B",direction = -1)+theme(legend.position = 'bottom')+
  labs(title='Income level by county',subtitle='Percent of total households with an income greater or equal to $200,000')+
  theme_ipsum()