Batman villains & ggraph

Holy ifelse() statements, Batman!

If you are like me, Batman cartoons, movies, and television shows have been abundant since you were a kid. It all started with the ‘Bat-Man’ first appearing in comic books in 1939, and the character has since come in many iterations, from the dark and brooding to the fun and campy. Sadly, the world recently lost the original Batman, Adam West, who starred in the 1960s Batman TV series. I recently stumbled upon an article on Mental Floss that detailed the different villains from that series and decided to do something with it.

Code

Before we start, the ggraph walkthroughs on nodes, edges, and layouts (at data-imaginist.com) are really handy. Looking into ggraph was a little daunting at first because my mind struggles to grasp the concept of relational graphing with all those nodes and edges, so reading those first really helped. The first step is getting our data, which we will scrape from this cool article on Mental Floss.


library(rvest)
library(ggraph)
library(igraph)
library(tidyverse)
library(extrafont)

#rvest
url<-'http://mentalfloss.com/article/60213/visual-guide-all-37-villains-batman-tv-series'
page<-read_html(url) #read the page once and reuse it

#villain headings: a leading number followed by the name
chars<-page%>%
  html_nodes('div h4')%>% html_text()%>%data.frame(stringsAsFactors = FALSE)
chars$name<-sub(".+?. ", "", chars$.) #drop the leading number
chars$id<-as.integer(lapply(strsplit(chars$.,'. '),'[',1)) #keep the leading number as an id

#appearance strings (seasons and episodes), one per villain in page order
apps<-page%>%
  html_nodes('strong i')%>% html_text()%>%data.frame(stringsAsFactors = FALSE)

apps$id<-1:37 #assumes the page yields exactly 37 appearance strings
villians<-inner_join(apps,chars,by='id')
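
As a rough offline sketch of what the id/name split is after, here is the same idea on made-up headings. The heading strings are placeholders I am assuming (a number, a period, then the name); the real page text may differ, and the stricter patterns here are my own, not the ones used above.

```r
# Hypothetical headings standing in for the scraped h4 text
headings <- c("1. Villain One (Actor One)", "10. Villain Ten (Actor Ten)")

# The leading digits become the id
id <- as.integer(sub("^([0-9]+)\\..*$", "\\1", headings))
# Everything after "number, period, spaces" is the name
name <- sub("^[0-9]+\\. *", "", headings)

parsed <- data.frame(id = id, name = name, stringsAsFactors = FALSE)
print(parsed)
```

Anchoring the pattern to the start of the string and escaping the period makes it harder for a stray character in a name to be swallowed by the match.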

Now we have a data frame with four columns: the seasons and episodes (in an ugly string), an id, the original name, and the cleaned name. The goal is to get one row for every appearance in every season. This was pretty stream-of-consciousness, and there may be a better way to structure this data than one row per appearance. To get the season number we use the handy tidyr::separate_rows to first separate seasons, then episodes. Finally we keep only the rows that hold numeric values to get the desired data frame.


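#massage data
#one row per season chunk
raw.seasons<-separate_rows(villians,..x,sep="SEASON ")
#the season number is the first character of each chunk
raw.seasons$..y<-as.integer(substr(raw.seasons$..x,1,1))
#split on every non-digit so each number gets its own row
raw.seasons<-separate_rows(raw.seasons,..x,sep= "[^0-9]")
raw.seasons$..x<-as.numeric(raw.seasons$..x)
#keep only the rows that held a number
batman<-subset(raw.seasons,!is.na(..x))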
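
To see what the two separate_rows passes do, here is a small sketch on a single made-up appearance string. The string format, the column names `apps` and `season`, and the villain name are all assumptions for illustration; the scraped strings may look different.

```r
library(tidyr)

# Hypothetical appearance string; a stand-in for one scraped row
toy <- data.frame(
  name = "Example Villain",
  apps = "SEASON 1 (EPISODES 3, 4) SEASON 2 (EPISODES 11)",
  stringsAsFactors = FALSE
)

# 1. One row per season chunk
by_season <- separate_rows(toy, apps, sep = "SEASON ")
# 2. The season number is the first character of each chunk
by_season$season <- as.integer(substr(by_season$apps, 1, 1))
# 3. Split on every non-digit; most of the pieces are empty strings
by_episode <- separate_rows(by_season, apps, sep = "[^0-9]")
by_episode$apps <- as.numeric(by_episode$apps)
# 4. Keep only the pieces that were real numbers
result <- subset(by_episode, !is.na(apps))
print(result)
```

In this toy case the season number itself also survives the filter as its own row, alongside the episode numbers, because it is just another run of digits in the string.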

Now it is looking more like something we can work with. Based on the tutorials I mentioned earlier, the data needs to be structured a certain way (a 'from' column and a 'to' column) to feed into the ggraph plot. So we rename the columns (more to help me learn than out of real necessity), then reorder them. I realized as I was working through this that I needed to add an extra level of depth, prefixing each episode number with its season, because seasons and episodes were not showing how I wanted. It seems straightforward in hindsight, but I did not consider that episode 11 of season 3 and episode 11 of season 2 would collapse into the same node. Finally we clean the names up a little, then create the igraph object that ggraph will plot.


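#arrange to plot
#rename by column name rather than by position so the result does not depend on column order
names(batman)[match(c('id','..y','name','..x'),names(batman))]<-c('from','season','char','to')
batman<-batman[,c('from','to','season','char')]
batman$to<-paste0(batman$season,batman$to) #prefix the episode with its season so each episode node is unique
batman$from<-gsub(' \\(','\n\\(',batman$char) #this bit makes nice names
#create igraph object
graph<-graph_from_data_frame(batman)
V(graph)$degree<-degree(graph)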
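
The episode 11 problem is easy to reproduce on a tiny edge list. This is only a sketch with made-up villains; `edges`, `g_shared`, and `g_split` are names I am introducing for illustration.

```r
library(igraph)

# Two hypothetical villains, each appearing in episode 11 of a different season
edges <- data.frame(
  from    = c("Villain A", "Villain B"),
  episode = c(11, 11),
  season  = c(2, 3),
  stringsAsFactors = FALSE
)

# Episode number alone: both edges point at the same "11" node
g_shared <- graph_from_data_frame(
  data.frame(from = edges$from, to = as.character(edges$episode))
)

# Season-prefixed episode: "211" and "311" are separate nodes
g_split <- graph_from_data_frame(
  data.frame(from = edges$from, to = paste0(edges$season, edges$episode))
)

vcount(g_shared)  # 3 nodes: two villains sharing one episode node
vcount(g_split)   # 4 nodes: two villains, two episode nodes
```

Because the show only has single-digit season numbers, the first character of each pasted id is always the season, so the ids stay unambiguous.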

Data! Now the part I wanted to get to when I started this thing: the plot. If you are a part of the Twitter-verse I suggest giving the ggraph creator, @thomasp85, a follow. He shares great thoughts and posts about the #rstats community, and his tweets were ultimately what made me want to give his package a try. This is the section where the opening quote, a retro Robin-style exclamation, comes from. As I was learning and messing with the package I ran into problems making the plot look how I wanted. So to create visual differences between the episode nodes and the character nodes, I started using ifelse statements. I guess you could say it got out of hand.


#prep and plot
#node names containing a digit are the season/episode nodes; the rest are villains
n.names<-grep("[[:digit:]]",V(graph)$name,value=TRUE)

ggraph(graph,layout='fr') + 
  geom_edge_link(aes(colour = factor(season)))+
  geom_node_point(aes(size=ifelse(V(graph)$name %in% n.names,1,degree)),
                  colour=ifelse(V(graph)$name %in% n.names,'#363636','#ffffff'),
                  show.legend = F)+ 
  theme_graph(background = 'grey20', text_colour = 'white',title_size = 30)+ 
  theme(legend.position = 'bottom')+ 
  scale_edge_color_brewer('Season',palette = 'Dark2')+ 
  geom_node_text(aes(label=name,fontface='bold'),
                 color=ifelse(V(graph)$name %in% n.names,'grey40','white'),
                 size=ifelse(V(graph)$name %in% n.names,1.75,2.5),repel = T,check_overlap = T)+
  labs(title='Batman Villains',subtitle='Plotting 37 Batman villains across 3 seasons with\nnode ends representing season & episode number',
       caption='ggraph walkthroughs available at: http://www.data-imaginist.com/\n Data from: http://mentalfloss.com/article/60213/visual-guide-all-37-villains-batman-tv-series')
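
If the repeated ifelse() tests get hard to read, one option is to compute the episode/villain flag once and reuse it for every style. A small base R sketch with made-up node names; `is_episode` and `node_style` are names I am introducing, not part of the script above.

```r
# Hypothetical node names: two villains and two season/episode nodes
node_names <- c("Villain A\n(Actor A)", "11", "Villain B\n(Actor B)", "23")

# TRUE for season/episode nodes, FALSE for villains
is_episode <- grepl("[[:digit:]]", node_names)

# Derive every style from the one flag
node_style <- data.frame(
  name         = node_names,
  colour       = ifelse(is_episode, "#363636", "#ffffff"),
  label_colour = ifelse(is_episode, "grey40", "white"),
  label_size   = ifelse(is_episode, 1.75, 2.5),
  stringsAsFactors = FALSE
)
print(node_style)
```

The same flag could be stored on the graph as a vertex attribute so the plotting code refers to it by name rather than repeating the test.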



Great! The villains are the white nodes with labels. The spots where many edges converge without a white node are the origins of each season. There are more than a few interesting things in the data now that it is plotted. The first is the split between characters who made single-season or single-episode appearances and those who appeared in several. The Joker and Penguin stand out as they are seen in all three seasons. I know these to be classic villains, but the surprise is King Tut. Though not in as many episodes, he did appear in at least two episodes a season.
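
The same comparison can be checked in numbers rather than by eye. Here is a minimal base R sketch on made-up appearances; the data, `toy_apps`, and the villain names are placeholders, with columns named like the edge list built earlier.

```r
# Hypothetical appearances: one row per villain per episode
toy_apps <- data.frame(
  char   = c("Villain A", "Villain A", "Villain A",
             "Villain B", "Villain C", "Villain C"),
  season = c(1, 2, 3, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Number of distinct seasons each villain shows up in
seasons_per_villain <- tapply(toy_apps$season, toy_apps$char,
                              function(s) length(unique(s)))

# Villains present in all three seasons
all_seasons <- names(seasons_per_villain)[seasons_per_villain == 3]
print(seasons_per_villain)
print(all_seasons)
```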

So maybe the real question after all this is: when are we getting a Christopher Nolan film with King Tut?