A while ago I was reading James Cheshire’s website, Spatial.ly, about mapping flows in R. I remember thinking that it would be really fun to replicate once I found some data. Fast-forward to two days ago, when I stumbled upon this fantastic dataset from Divvy Bikes. I have long been interested in Divvy bikes but had no idea how to find their data, and had never even tried to look.
These are very large datasets that need a little massaging to get them into the shape we want, but my code is attached, and I think there is a ton of room for some sweet data analysis and visualizations. The first image you see below is the raw shapefile of the Chicago Waterways, available from Chicago Public Data, mapped and zoomed in on Chicago.
The second is the data plotted without grouping by our calculated age range, and the third is the cleaned and spruced-up viz. The code block below the series will reproduce the third version.
Some things to note about the code below. There is a ton of data here, so it can take a while to run if you aren’t operating on heavy machinery. There is a really nice new package called waffle that makes very clean waffle plots, and I wanted to use it here to showcase the number of riders by age range in just Quarter 1. There is a pretty distinct destination for riders in Chicago, which makes this dataset different from the one James used in his visualization, but I believe some interesting work could be done with stations, particularly the ones with the most trips to and from them. I will save that for another day. The data for this post can be found by following these blue words.
waffle(time/250, rows=10, size=0.25, colors=c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F", "#E5C494"), xlab="1 square == 250 riders", legend_pos = "top")
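The `waffle()` call above expects `time` to be a named vector of rider counts, one entry per age range. Here is a minimal sketch of one way such a vector could be built with base R. The toy `trips` data frame, the `birthyear` column name, the reference year, and the seven bin edges are all assumptions for illustration (seven bins simply matches the seven colors in the call), not necessarily what the full script does.

```r
# Toy stand-in for the Quarter 1 trips table.
# ASSUMPTION: the real file's column name for birth year may differ.
trips <- data.frame(
  birthyear = c(1990, 1985, 1972, 1964, 1995, 1950, 1988, 1979)
)

# ASSUMPTION: reference year used to turn birth year into age.
data_year <- 2015
trips$age <- data_year - trips$birthyear

# ASSUMPTION: seven age bins, chosen to match the seven colors passed to waffle().
# cut() uses right-closed intervals by default, so age 29 falls in "20-29".
breaks <- c(0, 19, 29, 39, 49, 59, 69, Inf)
labels <- c("<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70+")
trips$age_range <- cut(trips$age, breaks = breaks, labels = labels)

# Named vector of rider counts per age range, the shape waffle() expects.
counts <- table(trips$age_range)
time <- setNames(as.vector(counts), names(counts))

print(time)
```

With the real data, dividing this vector by 250 gives the "1 square == 250 riders" scaling used in the plot.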
I am sorry for throwing all of the code into one big chunk and not explaining a ton of it. I hope the comments help.