Below is the source code to create everything, and here is the CSV of spells.
I solemnly swear I am up to no good.
I am usually pretty on top of social media posts about a fun dataset or a new way to visualize something, but somehow one on spells in Harry Potter slipped by me. I, like millions of others, am a reasonably big fan of the Harry Potter book series by J.K. Rowling. I even like to mess about in R in my Ravenclaw robe from time to time, an image of which may be too nerdy to share on this post. So as I was reading through my Twitter feed one morning, I came across a link someone had shared to the public Tableau Workbook that started the magic.
This Workbook, created by Skyler Johnson, beautifully details the breakdown of spell usage across the entire Harry Potter book series. I wanted not only to share the original, because of how awesome it is, but also, I thought to myself, to try to replicate it in R. The Tableau Workbook has a few more complicated elements that I could not immediately see how to replicate: the bar along the top axis, the stat summary labels, and the specific grouping, to name a few. Luckily, we have the internet.
As you can see from the chart in the tl;dr section, I was able to recreate it, but as I was messing with different things I thought to myself, "this is too much fun to just recreate that graph." I wanted to use some of my background to look at how the spells themselves are used and how they grow through the books. Using the syuzhet package as a starting point, I calculated the sentiment for each time a spell was used, then aggregated the mean sentiment for each spell to create the sentiment chart below.
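A minimal sketch of that per-use scoring and aggregation, assuming the CSV has one row per spell usage with a `spell` column and a `text` column holding the surrounding sentence. Both column names and the example sentences are hypothetical stand-ins, not taken from the original data:

```r
library(syuzhet)

# Hypothetical rows: one per time a spell is cast, plus the sentence around it.
# The real CSV's column names and text will differ.
spells <- data.frame(
  spell = c("Lumos", "Lumos", "Crucio", "Crucio"),
  text  = c("The wand tip lit up with a warm, friendly glow.",
            "Harry smiled as the light filled the room.",
            "He screamed in terrible pain.",
            "The curse left him broken and afraid."),
  stringsAsFactors = FALSE
)

# Score each usage with the Bing lexicon: +1 per positive word, -1 per negative word
spells$sentiment <- get_sentiment(spells$text, method = "bing")

# Mean sentiment per spell (one row per spell)
spell_sentiment <- aggregate(sentiment ~ spell, data = spells, FUN = mean)
spell_sentiment
```

Averaging per spell, rather than summing, keeps the most frequently cast spells from dominating the chart simply because they appear more often.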
There is a lot of negative sentiment going on here. The Bing method I used only classifies words as positive or negative, based on a lexicon compiled by Bing Liu. I think that this may be an issue of granularity. The negative sentiment we are seeing is, in my opinion, most likely fear or anger, but we should do some more testing to prove that. syuzhet helps us with that as well. The idea is to see which spells align closely with which emotions, adding a more specific layer of understanding. The syuzhet function for this is get_nrc_sentiment, which scores text against the NRC emotion lexicon (anger, anticipation, disgust, fear, joy, sadness, surprise and trust, plus negative and positive). Running it gives us a data frame like the one in the image below.
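A rough sketch of getting one mean score per emotion per spell. As before, the `spell` and `text` columns and the sentences are hypothetical placeholders for the real CSV:

```r
library(syuzhet)

# Hypothetical usage rows (not the original data)
spells <- data.frame(
  spell = c("Lumos", "Lumos", "Crucio", "Crucio"),
  text  = c("The wand tip lit up with a warm, friendly glow.",
            "Harry smiled as the light filled the room.",
            "He screamed in terrible pain.",
            "The curse left him broken and afraid."),
  stringsAsFactors = FALSE
)

# One row per usage, one column per NRC category:
# anger, anticipation, disgust, fear, joy, sadness, surprise, trust, negative, positive
nrc <- get_nrc_sentiment(spells$text)

# Mean of every NRC column within each spell; the result is still wide,
# with a "spell" column followed by one column per emotion
nrc_by_spell <- aggregate(nrc, by = list(spell = spells$spell), FUN = mean)
nrc_by_spell
```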
To plot this in a way that, I believe, is a little easier to digest and understand, we need to reshape the data from wide to long format. We want the emotion columns turned into rows, so that each spell has one row per emotion holding the mean value for that emotion. Luckily there is a handy package for that as well, reshape2, and the specific function is melt. Properly applied, it gives us all the data we need in an easy-to-plot format.
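A small illustration of that wide-to-long step with melt. The table below uses made-up values and only four emotions, purely to show the shape change; it is not the real output:

```r
library(reshape2)

# Hypothetical wide table: one row per spell, one column per emotion (made-up values)
nrc_by_spell <- data.frame(
  spell   = c("Avadakedavra", "Lumos"),
  anger   = c(0.2, 0.0),
  fear    = c(0.8, 0.1),
  sadness = c(0.9, 0.0),
  joy     = c(0.0, 0.5)
)

# Wide -> long: "spell" is kept as an identifier, every other column becomes
# a row with the emotion name in "emotion" and its value in "mean_score"
nrc_long <- melt(nrc_by_spell, id.vars = "spell",
                 variable.name = "emotion", value.name = "mean_score")
nrc_long
```

Two spells times four emotions gives eight rows, and the long layout maps directly onto plot aesthetics such as `x = emotion`, `y = spell`, `fill = mean_score`.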
As you can see, this finer-grained view of the emotions associated with each spell is more revealing. A spell such as “Avadakedavra”, though very negative, is more closely related to feelings of sadness or fear than to anger. What else can we get from this data? I think tons. But to start a conversation about what to do next, I think we need to know when spells first appear. I like tile plots like the one above, so let’s make another, this time just for spell counts in each book. This will help us see which spells are used in which books, and when.
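One way to sketch the spell-count tile plot with ggplot2, assuming one row per usage with hypothetical `book` and `spell` columns. The rows below are invented, and only three books are listed; the real data would list all seven in publication order:

```r
library(ggplot2)

# Hypothetical usage records (invented for illustration)
usage <- data.frame(
  book  = c("POA", "POA", "POA", "GOF", "GOF", "DH"),
  spell = c("expectopatronum", "expectopatronum", "lumos",
            "lumos", "avadakedavra", "expectopatronum")
)

# Fix the book order so the x-axis is chronological rather than alphabetical
usage$book <- factor(usage$book, levels = c("POA", "GOF", "DH"))

# Count casts per (book, spell); zero-count combinations are kept as empty tiles
counts <- as.data.frame(table(book = usage$book, spell = usage$spell))

p <- ggplot(counts, aes(x = book, y = spell, fill = Freq)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = Freq)) +
  labs(x = "Book", y = "Spell", fill = "Uses")
p
```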
Great! I am fairly confident this is accurate: look at the tile for Prisoner of Azkaban (POA) and the spell “expectopatronum”. If you remember, that is the book where Harry learns the spell and uses it throughout. I am sure I will spend many more hours poring over this for fun tidbits, hopefully in the coming weeks. Next projects? I will be looking into how a spell progresses, or how it is used, across each book. There is just too much here to let it sit.