Creating a Density Map in R with ZIP codes

Below is a tutorial that helps take ZIP code data and, with R, get rough latitude and longitude data from them as well as County. Then using ggplot2 we can create a nice visual of the data plotted at the county level. The first section was written as part of a larger project and I like to keep it around as it was one of the first tutorials on this website. The latter part though is being continually updated because my ability and the availability of new packages in R are always changing.

First we will need a file with some ZIP code data. Let’s visit the USDA Local Food Directories page via the link below and get some Farmer’s Market Data.

U.S. Farmer’s Markets.

Directly below the table there is an ‘Export to Excel’ button. Which we can download and load directly into R using read.csv which gets us something looking like this. (With many more columns)

Capture

Now that we have that loaded we can start exploring some of the data. This post, because it was written so long ago, stopped at just plotting the csv points to a map, we are going to do a little more now. But we still want to get the latitude and longitude for the ZIP codes. So let’s start with getting a count of markets by ZIP, then merging our ZIP codes data with those counts. Make sure to have the packages at the top installed.



library(zipcode)
library(tidyverse)
library(maps)
library(viridis)
library(ggthemes)
library(albersusa)#installed via github
#data
fm<-read.csv('Export.csv',stringsAsFactors = F) #the file we just downloaded
data(zipcode)
fm$zip<- clean.zipcodes(fm$zip)
#size by zip
fm.zip<-aggregate(data.frame(count=fm$FMID),list(zip=fm$zip,county=fm$County),length)
fm<- merge(fm.zip, zipcode, by='zip')

Great! Now we have a dataframe that contains the ZIP code, a count of the farmer's markets, the city, state latitude and longitude. So we want to map this, to show all of our farmer's markets by ZIP, and we will use the viridis package for the color scale. *Update: If you have been here before you will notice the base plot has changed, I wanted to update it to make the whole thing look a bit better.



us<-map_data('state')

ggplot(fm,aes(longitude,latitude)) +
  geom_polygon(data=us,aes(x=long,y=lat,group=group),color='gray',fill=NA,alpha=.35)+
  geom_point(aes(color = count),size=.15,alpha=.25) +
  xlim(-125,-65)+ylim(20,50)

We loaded the map of the U.S. with map_data then we just plotted our dataframe as points. Which looks a little something like the plot below. You can mess with the color scheme by changing the option and direction as seen here.
new plot

Expanding on this idea

UPDATE: One problem I have is whether or not to revisit old posts. One view I have is that it was the extent of my knowledge at the time and the other is that it is a tutorial and I should improve on them when I can. Cue the recent emails and comments about expanding the graph to include Alaska, Hawaii, as well as less cumbersome code and here we are. Luckily with a couple more lines and an additional package or two we can make an updated and better looking map.



fm.counties<-aggregate(fm$count,by=list(fm$county,fm$state),sum)
names(fm.counties)[1:2]<-c('county','state')
cty_sf <- counties_sf("aeqd")
cty_sf$county<-as.character(cty_sf$name)
cty_sf$state<-as.character(cty_sf$iso_3166_2)
data.fm<-left_join(cty_sf,fm.counties,by=c('state','county'))
data.fm$x<-log(data.fm$x)
data.fm$x[is.na(data.fm$x)]<-0

Now we plot, and with the data from albersusa we joined to the farmer's market data we can use the new geom_sf (if you are having trouble try installing the dev version of ggplot2 via github).



data.fm %>%
  ggplot(aes(fill = x, color = x)) + 
  geom_sf() + 
  scale_fill_viridis(option = "B",direction=-1) + 
  scale_color_viridis(option = "B",direction=-1) +
  theme_map(base_size=11)

  • Pingback: N00b’s guide to mapping in R | Brian Sarnacki()

  • Reblogged this on Austin+Wehrwein and commented:

    Here is a re-blog of one of the most popular tutorials I have done, with R and a creating a map from a csv of zipcodes.

  • Zoe

    Hi,
    Im trying to work with this (new to R) and finding that the zip codes package is not available for R 3.1.2. Is there a workaround for this?

    • Austin W.

      Hello Zoe,

      Thanks for your question. I don’t believe there is an option for the zipcode package for that old of a R version. The most recent version of R is 3.3.0, here is a link to update your R if you can, if that is not an option we can try to work through something, here is the windows download link
      https://cran.r-project.org/bin/windows/base/

  • Frank

    Hi James,
    in your response you’re referring to R 3.3.0. It seems that the package ggmap does not work with 3.3.0 (yet?). Do you have a work-around (i.e. how to use get_map with R 3.3.0)?

  • Frank

    Please ignore my previous comment about ggmap and R 3.3.0, thanks.

    It helps to use the correct spelling 🙂

  • Sketching Sketcher

    Any other updates to this?
    Once correction: changed first ggplot command to:
    ggplot(fm,aes(longitude,latitude)) +
    geom_polygon(data=us,aes(x=long,y=lat,group=group),color=’gray’,fill=NA,alpha=.35)+
    geom_point(aes(color = count),size=.15,alpha=.25) +
    theme_bw()+
    xlim(-125,-65)+ylim(20,50)

    Also, getting error at: > fm$county<-tolower(fm$county)

    • Hello, terribly sorry for the errors, I will take a look and post back

    • So for the county merge I left out this section, which is now fixed on the website:

      fm.zip<-aggregate(data.frame(count=fm$FMID),list(zip=fm$zip,county=fm$County),length)

      • Sketching Sketcher

        Thanks! I wonder though: if I want to map to a zip code polygon, what’s best? I have had a lot of trouble using the choroplethr package. Do you have an example that maps zip code level data? Thanks!

        • Sure no problem, sorry about the error in the first place. As far as the choroplethr package, this tutorial does not use that, I just named an object in R ‘choropleth’ which can be confusing. The result should be like the last image, where it maps the zipcodes as polygons, colored by the value. Let me know if there is something else you mean.

          • Sketching Sketcher

            The last/bottom map on this page is a USA map that appears to show counties. Can a method be used (not the choropleth package) to show a map of ZIP codes? If so, how?

          • Oh, terribly sorry I misunderstood your question, I have not looked directly into the choroplethr package nor have I tried to map by ZIP codes. I won’t be of much help to you there. I have read though that trying to do ZIP codes on a US scale may not render nicely or quickly. Most people suggest mapping by the ZIP if you are looking at a subset of data. If that is what you want to do, I think finding a shapefile of ZIPs for whatever area you are looking for wouldn’t be too difficult! I hope this helps 🙂

  • Brad

    Thanks for the tutorial. When I use your code, it cant find the theme_awhstin. Do I need to update to a newer version to get this?

    • Hi Brad, thanks for reaching out. Unfortunately theme_austin is my own custom theme. But if you download the hrbrthemes package from CRAN there is a much better option called theme_ipsum!

      • Brad

        Thanks for the tip – that looks great!

  • Martin

    Is there any way you can demonstrate drawing a map of a specific state?
    I’ve tried using subset method for a specific state, but the size of the map stays the same and it doesn’t really zoom into specific states that I chose.

    Thanks!

    • Hello Martin, thanks for reaching out. The subset is a good start and should work if you subset both the ‘state_df’ and ‘county_df’ what I would suggest is simply adding your state to the criteria for creating the data for the maps like so;

      county_df <- map_data("county","nebraska") #or the state you want
      state_df <- map_data("state","nebraska")

      • Martin

        Awhistin, thank you very much for the reply.

        I am not too sure how I should go on about what you mean by subset both ‘state_df’ and ‘county_df’.
        my current code looks something like:

        us<-map_data('state')
        #specificss <- subset(us, region %in% c( "virginia", "maryland", "north carolina"))

        ggplot(fm,aes(longitude,latitude)) +
        geom_polygon(data=specificss, aes(x=long,y=lat,group=group), color='gray',fill=NA,alpha=0.35) +
        geom_point(aes(color = count), size=0.15, alpha=0.25) +
        theme_grey()+
        xlim(-125,-65)+ylim(20,50)

        The problem is, it still shows other states even though I have subsetted them out; for example, the map doesn't seem to zoom into those three states that I want.

        Thank you very much in advance

        • Hi Martin, are you also subsetting the ‘fm’ data for your states? What I mean is you have also subset your ‘fm’ data for those states as well since the geom_polygon is only plotting the data to fill the map but the underlying base from ggplot(). So try something like this;

          subset(fm,state %in% c(‘MD’,’VA’,’NC’))

          • Martin

            Hello Awhstin,

            Sorry, I was doing it all wrong. I was still working on the first part of your code (code that is for the very first picture) and trying to figure out how I would be able to incorporate subset states and counties lol. I figured out after looking at the second part of your code! thank you very much!

            By the way, is there any way that I would be able to draw the map by purely using given Zip codes other than Longtitude + Latitude? I know zipcode package will be able to pull long + lat info just by looking at the zip code.

            Best,
            Martin

  • miwang3

    Thank you for this guide! How do I add AK and HI to this code?

    • Hi, if you look at the original data exported, it includes AK, HI and even some others. The reason they do not show up is because we merge via this line ‘choropleth <- merge(county_df, fm, by = c("state", "county"))' the county map with the farmer's market data. This county_df is built from the 'map_data('county')' line. This data only has the lower 48. I would suggest visiting https://rud.is/b/2016/03/29/easier-composite-u-s-choropleths-with-albersusa/ to see how to implement the Albers projection. I will, when I get the chance, work on an addendum to this post that includes the Albers projection.

      https://uploads.disquscdn.com/images/121adc9056d3ba99df4769bcfcfcfa03581691f8a92601d5fccc994a52921e8d.png

      • miwang3

        Austin,

        Thank you for your reply. Yes, neither state data nor county data has AK/HI. Results from both of the commands below return 49 states. I will explore Albers projection next. Thank you for the tip.

        us<-map_data('state')
        unique(us$region)

        county_df <- map_data("county")
        unique(county_df$region)

        Min

        • Min,

          Here is a little bit of code I think you can add to what you currently have to get all 50 states. (Be sure to have the albersusa package downloaded off of github).

          us<-data.frame(counties_composite())
          names(us)[7:8]<-c('statename','state')
          names(us)[4]<-c('county')

          #put the pieces together
          choropleth <- merge(us, fm, by = c("state", "county"))
          cmap <- fortify(counties_composite(), region="fips")

          gg <- ggplot()
          gg <- gg + geom_map(data=cmap, map=cmap,
          aes(x=long, y=lat, map_id=id),
          color="#2b2b2b", size=0.05, fill=NA)
          gg <- gg + geom_map(data=choropleth, map=cmap,
          aes(fill=count, map_id=fips),
          color="#2b2b2b",size=.05)+scale_fill_viridis(option="magma",direction=-1)+theme_map()
          gg

          Which should get you this, which is straight from the example on the vignette. Hope that helps,
          -Austin

          https://uploads.disquscdn.com/images/14915ccf0e13dd685231fb8819c269153dd24a5fbc4739f148c7cd99104d77ee.png

          • miwang3

            Wow, thank you Austin. It works!

            Min

  • Sean

    I am having trouble with the albersusa library. I am getting errors saying it is not available in R version 3.3 or 3.4. I could use some advice. Thanks!

    • Hello Sean,
      The albersusa package is available on Github, make sure you have devtools installed and run:

      devtools::install_github(“hrbrmstr/albersusa”)

      • Sean

        Thanks! I figured that part out. My current issue is trying to map my data. So, I have a list of states, counties, and revenue. I want to color code based on the revenue. When I track my data though, I get the line data.ds<-left_join(cty_sf,ds.counties,by=c("state","county")) . At this point, my revenue data is gone and i cannot figure out how to get it back. Suggestions?

        • Sean,

          If there is no data at all for any of the counties after the join then I would maybe suspect that your county/state data types are not the same. Look into those, to see if your cty_sf or ds.counties data types are both characters, or both factors.

          That is where I would start!

          • Sean

            That is my script. My original data has a column for County, State, and Revenue. I cannot figure out what is wrong.

          • So when you run class(ds.counties$county) & class(cty_sf$county) they are the same? (The same goes for state)

          • Sean

            Actually they are different. Not sure why or how to fix it. I’m a bit out of my comfort zone at this point lol : )

          • No problem, we will hopefully get you there. Can you tell me which one is which? cty_sf$county should be ‘character’ so if you run:

            ds.counties$county<-as.character(ds.counties$county)
            ds.counties$state<-as.character(ds.counties$state)

            Then try your join. For future reference I imagine that the ds.counties may be 'factor' data. If you read in a csv or excel you can set a parameter within that call, 'stringsAsFactors = F', which would make those strings as characters.

          • Sean

            Yes, cty_sf$county is a character column . I ran those two commands, but it hasn’t seemed to change my result. I already had the stringsAsFactors set to false when i imported my csv. Perhaps I am not understanding how all this works. There are 998 rows in my csv. However, when I look at the data.ds (after the join) There are 3142 rows.

          • Ah, I see, so the left_join command is joining the data you have, data.ds to the cty_sf. It will retain all the data from the cty_sf and all the matches it has from your data. IF you would like the join to join on only matches you can switch the join around, thus:

            data.ds<-left_join(ds.counties,cty_sf,by=c("state","county"))

            Gets you only the data that matches your ds.counties data. One caveat in doing this though is that when you go to make the map there will be lots of counties that will not have any 'fill' or color because the data is not there.

          • Sean

            That is good to know. I did not realize that. That has my data frame correct. However, it is not plotting anymore. I had an [incorrect] plot before, but now there is nothing plotting.

          • Ah. Well I am getting to the extent of being able to help without seeing the script, your instance, errors and the data. I would suggest inspecting the data after the join, some column names may have changed since you switched around the join. If you run, nrow(data.ds) and it comes up as 0 that means there is no data so the join did not work. So you can fiddle with that. If you would like more assistance visit my consulting page, and put your contact info in the contact form, and we can discuss it further.

          • Sean

            could i have a link to your consulting page?

  • Nikhil Malgatti

    Hi, I am getting this error with GEOM_SF().

    Error in geom_sf() : could not find function “geom_sf”

    I have loaded all related library, not sure why I am getting error.

    • Afternoon Nikhil,

      I believe it is because you are not using the dev or github version of ggplot2 (which I mentioned in that blurb just before trying the final plot) which has geom_sf(). Try running (but make sure you have devtools installed):

      devtools::install_github(“tidyverse/ggplot2”)