Five minutes of Twitter Data Tutorial

I decided to revisit and revamp this tutorial to include the steps for creating a new Twitter app and setting up your streaming API connection. The process has gotten a little simpler, so create your Twitter app, harvest your tweets, and do some sentiment analysis already… The packages you will need are:

library(RCurl)
library(streamR)
library(ggplot2)
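
If you don't have these installed yet, something along these lines should work; ROAuth and maps are used later in the post (maps supplies the state outlines for map_data):

install.packages(c("ROAuth", "RCurl", "streamR", "ggplot2", "maps"))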

To get this to work you need to sign in to your Twitter account and set up an app via dev.twitter.com. It takes roughly a minute, as seen here:

[Screenshot: the app creation page on dev.twitter.com]

Once you click on “Create New App,” just fill in the required fields and generate your access tokens. Then it is as simple as following the setup straight from the streamR documentation:

*Important* This code will not run by copy-paste alone, because you need your own unique Twitter Consumer Key and Consumer Secret.

library(ROAuth)

# Twitter OAuth endpoints
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL  <- "https://api.twitter.com/oauth/access_token"
authURL    <- "https://api.twitter.com/oauth/authorize"

# Paste in the key and secret from your app's page on dev.twitter.com
consumerKey    <- "xxxxxyyyyyzzzzzz"
consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"

my_oauth <- OAuthFactory$new(consumerKey = consumerKey,
                             consumerSecret = consumerSecret,
                             requestURL = requestURL,
                             accessURL = accessURL,
                             authURL = authURL)

# Opens a browser window; authorize the app and enter the PIN when prompted
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

Two notes: first, if you are having SSL problems, make sure the URLs above use “https”; second, make sure you paste your own consumer key and consumer secret into the corresponding lines.
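
Once the handshake succeeds, you can optionally save the authenticated object so you don't have to repeat the handshake in future sessions (the file name here is just an example):

save(my_oauth, file = "my_oauth.Rdata")
# in a later session: load("my_oauth.Rdata")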

Harvesting Tweets

To start, we use filterStream from the streamR package. The first argument is the file name for the Twitter data output (an empty string keeps the tweets in memory), followed by the bounding-box locations the tweets have to fall into and the time to harvest them, in seconds. The first example runs for 30 seconds over the continental US; the second grabs worldwide tweets for 300 seconds.

tweetsUS    <- filterStream(file.name = "", locations = c(-125, 25, -66, 50), timeout = 30, oauth = my_oauth)
tweetsWorld <- filterStream(file.name = "", locations = c(-180, -90, 180, 90), timeout = 300, oauth = my_oauth)

Then we parse the raw JSON into a data frame:

tweets <- parseTweets(tweetsUS, verbose = FALSE)
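
It's worth a quick look at what came back. The column names below are the ones streamR's parseTweets typically returns, though they may differ slightly between versions:

nrow(tweets)                                      # number of tweets captured
head(tweets[, c("text", "lat", "lon", "time_zone")])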

Now we can use ggplot2, together with the maps package for map_data, to make a good-looking graph. This first graph plots all the tweets from the US stream.

I will try to explain each part of the ggplot call so it is easy to edit for those who don't know how to use it:

  • geom_point: this layer takes your tweet data, maps lon and lat to the X and Y axes, sets the point size, uses alpha for opacity, and sets the color.
  • theme: these elements are set to element_blank() to remove them from the graph (otherwise you would have tick marks, axis text, and titles), and panel.background changes the background color.
map <- map_data("state")

ggplot(map) +
  geom_point(data = tweets, aes(x = lon, y = lat), size = .5, alpha = .25, color = "gray") +
  theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank(),
        panel.background = element_rect(fill = "white"),
        axis.ticks = element_blank(), axis.text = element_blank(),
        axis.title = element_blank())

The plot below was made by coloring each point by time zone.
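
The code for that version isn't shown here, but a minimal sketch looks like this, assuming parseTweets returned a time_zone column (drop the NAs so the legend stays readable):

tweetsTZ <- tweets[!is.na(tweets$time_zone), ]

ggplot(map) +
  geom_point(data = tweetsTZ, aes(x = lon, y = lat, color = time_zone), size = .5, alpha = .5) +
  theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank(),
        panel.background = element_rect(fill = "white"),
        axis.ticks = element_blank(), axis.text = element_blank(),
        axis.title = element_blank())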

Enjoy!

[Figure: map of US tweets, colored by time zone]