I guess what started as one post about ACS data is now a series. The #rstats community is so productive that just as I finally figure out the extent of one package, someone has made a streamlined, optimized, or shiny new one. Kyle Walker’s new tidycensus package is the latest in that long line, and before you go any further I encourage you to follow the link and read his brief introduction.
I have posted a couple of times about working with ACS data in R, mainly via the acs package. The first time I wrote nearly a novel about household income in Nebraska and retrieving that data. The second time I used newfound knowledge to get household income over $200k for the U.S., with some updated aesthetic elements. Some of the complications I ran into in those first tutorials were with the data pull. The pull from the API creates an acs object, which I find unwieldy at times, and that is only one of many places where tidycensus helps.
Before you start please get your API Key for the Census Bureau Data API.
    library(albersusa)
    library(tidycensus)
    library(tidyverse)
    library(ggthemes) #just for theme_map()
    library(viridis)

    census_api_key("yourkeyhere")

    income <- get_acs(geography = "county", variables = "B19013_001", geometry = TRUE)
Simple as that! The resulting pull is already tidied and in an easy-to-use data frame instead of an acs object. This saves loads of time and numerous lines of code. Here is where I deviate a little from Kyle's tutorial and use the Albers projection from Bob Rudis' albersusa package, available on GitHub. (I am sure there is a better way to do this than the merge I did, so if you have thoughts drop me a line!) The last line is where I rename the Albers projection geometry column to 'geometry' (from 'geometry.x'), because geom_sf looks for it.
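    #merge
    cty_sf <- counties_sf("aeqd")
    #build a key that matches the NAME column from get_acs()
    cty_sf$NAME <- paste0(cty_sf$name, ' ', cty_sf$lsad, ', ', cty_sf$state)
    cty_income <- left_join(cty_sf, income, by = c('NAME'))
    #rename only the geometry.x column (the Albers geometry) to geometry
    names(cty_income)[names(cty_income) == 'geometry.x'] <- 'geometry'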
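Because the merge above joins on a pasted-together name string, it is worth checking which counties fail to match before plotting. Here is a minimal sketch of that check using toy data frames. The `shapes` and `acs` tables and their values are made up for illustration; they only mimic the `name`, `lsad`, `state`, and `NAME` columns used above.

```r
library(dplyr)

# Toy stand-ins (made-up rows) for the county shapes and the ACS pull
shapes <- tibble(
  name  = c("Lancaster", "Douglas", "Anchorage"),
  lsad  = c("County", "County", "Municipality"),
  state = c("Nebraska", "Nebraska", "Alaska")
)
acs <- tibble(
  NAME     = c("Lancaster County, Nebraska", "Douglas County, Nebraska"),
  estimate = c(50000, 55000)  # illustrative values, not real ACS figures
)

# Build the same "<name> <lsad>, <state>" key used in the merge
shapes <- mutate(shapes, NAME = paste0(name, " ", lsad, ", ", state))

# Shape rows with no matching ACS row would end up with NA estimates
unmatched <- anti_join(shapes, acs, by = "NAME")
print(unmatched$NAME)
```

Any names printed here would show up as blank counties on the map, so an empty result is what you are hoping for on the real data.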
Finally we plot it using the 'inferno' or 'B' color palette from viridis to get the image below.
    #plot
    cty_income %>%
      ggplot(aes(fill = estimate, color = estimate)) +
      geom_sf() +
      scale_fill_viridis(option = "B", direction = -1) +
      scale_color_viridis(option = "B", direction = -1) +
      labs(title = 'Household Income by County',
           subtitle = 'Median household income from the 2011-2015 ACS') +
      theme_map(base_size = 11) #optional
Interesting. It seems like some of the darkest areas correspond with big metropolitan areas. I think it would be more interesting, though, if we could compare that median household income to something else. Luckily, all we have to do is make another pull, this time for variable 'B25105_001', which is median monthly housing costs. Then we multiply it by 12 to get an annual figure, calculate it as a percent of median household income, and plot it, replacing just the 'fill' and 'color' elements.
    #add data
    hc <- get_acs(geography = "county", variables = "B25105_001", geometry = TRUE)
    hc$estimated <- hc$estimate * 12 #monthly costs to annual
    #match on GEOID so each housing cost lines up with the right county
    cty_income$percent <- hc$estimated[match(cty_income$GEOID, hc$GEOID)] / cty_income$estimate * 100

    #plot again
    cty_income %>%
      ggplot(aes(fill = percent, color = percent)) +
      geom_sf() +
      scale_fill_viridis(option = "inferno") +
      scale_color_viridis(option = "inferno") +
      labs(title = 'Percent of Median Income toward Housing Costs',
           subtitle = 'Median housing costs divided by\nmedian household income by county from the 2011-2015 ACS') +
      theme_map(base_size = 11) #optional
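To make the arithmetic concrete, here is a small sketch of the same percent calculation on toy data. The numbers are made up and are not real ACS estimates, and `income_toy` and `housing_toy` are hypothetical names. It looks up housing costs by GEOID, since two separate pulls are not guaranteed to come back in the same row order.

```r
# Made-up numbers for illustration only; not real ACS estimates
income_toy <- data.frame(
  GEOID    = c("31109", "31055", "02020"),
  estimate = c(60000, 48000, 80000)         # annual median household income
)
housing_toy <- data.frame(
  GEOID    = c("02020", "31109", "31055"),  # deliberately in a different order
  estimate = c(1500, 1000, 800)             # monthly median housing costs
)

# Look up each county's housing cost by GEOID instead of by row position
idx <- match(income_toy$GEOID, housing_toy$GEOID)

# Annualize the monthly cost, then express it as a percent of income
income_toy$percent <- housing_toy$estimate[idx] * 12 / income_toy$estimate * 100

print(income_toy)
```

In this toy case the first two counties come out at 20 percent and the third at 22.5 percent. Dividing by row position instead would have paired each income with the wrong county's housing cost.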
With both of these datasets we can start to get a better picture of how income is spread. Though this is an attempt at a very raw cost-of-living calculation, I think it paints a more interesting picture. There seems to be a pretty distinct band straight through the middle of the country. I guess you have to pay to live on the coast.