This post shows how to scrape attraction-specific data from Trip Advisor using a CSV of URLs.
This is less a tutorial and more an addendum. There is already fantastic code from Hadley Wickham on how to harvest hotel data from Trip Advisor, but a project came across my desk to harvest data for our attraction. The differences in the HTML elements between hotel pages and attraction pages are small, but they matter. A couple of years ago Trip Advisor changed the way dates appear on the website, and Hadley's code, which tries to convert the date strings into dates, runs into a problem. The code in this loop harvests the dates as plain strings so they can be cleaned up afterward. It also runs off a CSV of URLs, which you can build with a gsub function based on your specific attraction.
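As a rough sketch of the URL step, the snippet below builds a small CSV of review-page URLs with gsub. The base URL, the `-or<offset>-` page marker, the 10-reviews-per-page step, and the file name `attraction_urls.csv` are all assumptions made for illustration; check them against the pages for your own attraction before relying on them.

```r
# Hypothetical first review page for an attraction (placeholder IDs and names).
base_url <- "https://www.tripadvisor.com/Attraction_Review-g00000-d000000-Reviews-Example_Attraction-Example_City.html"

# Assumed paging scheme: 10 reviews per page, first 5 pages.
offsets <- seq(0, 40, by = 10)

# Insert "or<offset>-" after "Reviews-" for every page except the first.
urls <- vapply(offsets, function(off) {
  if (off == 0) {
    base_url
  } else {
    gsub("Reviews-", paste0("Reviews-or", off, "-"), base_url, fixed = TRUE)
  }
}, character(1))

# Save one URL per row so the scraping loop can read them back in.
write.csv(data.frame(url = urls), "attraction_urls.csv", row.names = FALSE)
```

Reading the file back with `read.csv("attraction_urls.csv", stringsAsFactors = FALSE)` gives a `url` column you can loop over.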
There is a commented-out section that addresses the date issue mentioned earlier. Use it if you notice that dates before a certain point are being dropped.
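To illustrate the "harvest as strings, clean afterward" idea, here is a minimal sketch of the cleanup step. The sample strings and the "Reviewed Month day, year" format are assumptions, not what Trip Advisor necessarily serves today, and `%B` depends on an English locale. Anything that does not match the format comes back as `NA` instead of stopping the run, which makes dropped dates easy to spot.

```r
# Hypothetical date strings as they might come out of the scraping loop.
raw_dates <- c("Reviewed March 5, 2016", "Reviewed 3 days ago")

# Strip the assumed "Reviewed " prefix, then parse what is left.
stripped <- gsub("^Reviewed ", "", raw_dates)
parsed <- as.Date(stripped, format = "%B %d, %Y")

# Keep the raw string next to the parsed date so failures can be inspected.
dates <- data.frame(raw = raw_dates, date = parsed, stringsAsFactors = FALSE)
unparsed <- dates$raw[is.na(dates$date)]
```

The strings left in `unparsed` are the ones that need separate handling, such as relative dates.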
To be clear, the backbone of this code comes from Hadley’s post. I just wanted to show how to get attraction-specific data from a CSV of URLs.
Uses for this could be:
- Comparing multiple attractions in a geographic area
- Tracking sentiment of your attraction over time
- Visitor engagement scoring
- Viewing visitors by member location