Game of Thrones - Battles Analysis

Edwin Varghese
Apr 29, 2017

Game of Thrones is a popular television series about kings fighting for the ‘Iron Throne’. The story takes place on the fictional continents of Westeros and Essos. Read more to know about the series - Know about Game of Thrones. Here I’ll show some analysis and visualizations based on data about the battles. The dataset is available on Kaggle. (Kaggle is a platform that hosts data science competitions. Check out the website if you are interested.)

Today, we’ll see some analysis performed on the battles.csv data. Let’s first check the structure of the data before getting on to the analysis.

Code:

#=========================================================================================

#Reading the data from the directory

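got = read.csv("battles.csv", stringsAsFactors = TRUE) #factors match the str() output below (R >= 4.0 reads strings as character by default)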

str(got) #To determine structure of the dataset

Output:

'data.frame': 38 obs. of 25 variables:

$ name : Factor w/ 38 levels "Battle at the Mummer's Ford",..: 13 1 7 14 18 10 25 5 3 17 ...

$ year : int 298 298 298 298 298 298 298 299 299 299 ...

$ battle_number : int 1 2 3 4 5 6 7 8 9 10 ...

$ attacker_king : Factor w/ 5 levels "","Balon/Euron Greyjoy",..: 3 3 3 4 4 4 3 2 2 2 ...

$ defender_king : Factor w/ 7 levels "","Balon/Euron Greyjoy",..: 6 6 6 3 3 3 6 6 6 6 ...

$ attacker_1 : Factor w/ 11 levels "Baratheon","Bolton",..: 10 10 10 11 11 11 10 9 9 9 ...

$ attacker_2 : Factor w/ 8 levels "","Bolton","Frey",..: 1 1 1 1 8 8 1 1 1 1 ...

$ attacker_3 : Factor w/ 3 levels "","Giants","Mormont": 1 1 1 1 1 1 1 1 1 1 ...

Etc..

#=========================================================================================

The structure shows 25 columns (variables) and 38 rows (observations). The majority of the variables are categorical, which makes data modeling harder, but we can still build some good visualizations from this data set.
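To put a number on "mostly categorical", the column classes can be tallied. The snippet below is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical stand-ins, not rows from battles.csv); the same two lines can be run on `got`.

```r
# Hypothetical stand-in for the battles data (values invented for illustration)
toy <- data.frame(
  name          = c("Battle A", "Battle B", "Battle C"),
  year          = c(298L, 298L, 299L),
  attacker_king = c("King X", "King Y", "King X"),
  battle_type   = c("ambush", "siege", "ambush"),
  stringsAsFactors = TRUE
)

# Class of every column, then a count per class
col_classes <- sapply(toy, class)
class_counts <- table(col_classes)
print(class_counts)
```

Replacing `toy` with `got` should give the split between factor and integer columns for the real data.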

View(got)

This opens the data as a table in the source window, which gives an easier and more effective overview than reading the structure output. Let’s move on to the analysis.

Visualization & Analysis

How many types of battles were fought?

Battle type distribution using ‘plotly’ package.

Code:

#=========================================================================================

#creating an advanced pie chart with plotly

#Count the battles of each battle type

#=========================================================================================

library(ggplot2)
library(plotly)
packageVersion('plotly')

#counting the battles per battle type, skipping battles with no recorded type
#(plotly sums the values of repeated labels, so it needs one row per battle type)
battle_type = as.character(got$battle_type)
battle_type = battle_type[!is.na(battle_type) & battle_type != ""]
battle_counts = as.data.frame(table(battle_type)) #columns: battle_type, Freq

#now plotting the advanced pie chart with plotly (uploading it requires the plotly api)
Sys.setenv("plotly_username"="edwinvarghese4442")
Sys.setenv("plotly_api_key"="XXXXXXXXXXXXXXXXXX")

pie_pltly = plot_ly(battle_counts, labels = ~battle_type, values = ~Freq, type = 'pie',
  textposition = 'inside',
  textinfo = 'label',
  insidetextfont = list(color = '#FFFFFF'),
  hoverinfo = 'label+value+percent',
  marker = list(line = list(color = '#FFFFFF', width = 1)),
  showlegend = TRUE) %>%
  #The 'pull' attribute can also be used to create space between the sectors
  layout(title = 'Battle distribution category wise',
    xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
    yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

pie_pltly

chart_link = plotly_POST(pie_pltly) #uploads the chart; newer plotly versions offer api_create() instead

chart_link

#=========================================================================================

How frequent were the attacking kings’ attacks?

Code:

#=========================================================================================

#Attacking Kings Battles Timeline

#=========================================================================================

got = got[!(is.na(got$attacker_king) | got$attacker_king==""), ] #removing battles with no attacking king

got = got[!(is.na(got$attacker_outcome) | got$attacker_outcome==""), ] #removing battles with no recorded outcome

gpoint_a = ggplot(got, aes(x = battle_number, y = attacker_king)) +
  geom_point(aes(color = factor(attacker_outcome)), size = 5) + #size is fixed, so it stays outside aes()
  theme_bw() +
  ggtitle("Battles Timeline") +
  xlab("Battles----->") + ylab("Attacking Kings") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_blank())

gpoint_a

#=========================================================================================

This shows that attacks by ‘Joffrey/Tommen Baratheon’ were the most frequent. Along the battles axis, their attacks are spread consistently across the timeline. ‘Robb Stark’ can also be seen attacking up to around the 27th battle, after which the attacks fall back.

Note: There were not enough distinct years to create a timeline, so I built one from the battle numbers instead. Hence the title “Battles Timeline”.

What was the percentage Win/Loss of Attacking Kings?

Code:

#=========================================================================================

# Plotting with ggplot

#=========================================================================================

got = got[!(is.na(got$attacker_king) | got$attacker_king==""), ]

got = got[!(is.na(got$attacker_outcome) | got$attacker_outcome==""), ]

library(dplyr)

gotnew = as.data.frame(got %>%

group_by(attacker_outcome, attacker_king) %>%

tally %>%

group_by(attacker_king) %>%

mutate(pct=(100*n)/sum(n)))

gnew = ggplot(data= gotnew, aes(x= gotnew$attacker_king, y = n, fill= gotnew$attacker_outcome)) +

geom_bar(stat = "identity") +

geom_text(aes(label = round(pct,2)), position=position_stack(vjust=0.5),size=3)+

scale_y_continuous(limits = c(0,35))

fill = c( "#E1B378","#5F9EA0") #custom colours

gnew = gnew + scale_fill_manual(values = fill)

gnew = gnew + theme(legend.position = "bottom",

legend.direction = "horizontal",

legend.title = element_blank(),

panel.background = element_blank(),

axis.line = element_line(color = "black"),

panel.grid.minor = element_line(color = "grey"))

gnew = gnew + ggtitle("Battle Win/Loss Statistics") +

xlab("Attacking Kings") +

ylab("Number of Battles") +

theme(plot.title = element_text(hjust = 0.5))

gnew

#=========================================================================================

Here we can observe that Joffrey/Tommen Baratheon holds the upper hand, winning 92.86% of the battles they started. It’s also interesting that Balon/Euron Greyjoy has not been defeated in any battle they started.

Note: These statistics cover only the battles in which the Kings were on the attacking side. When they are on the defending side, it’s a different story, as we’ll see in a later analysis.
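The percentages on the bars are each king's share of wins and losses within his own attacks. As a minimal sketch of that arithmetic in base R, using made-up outcomes (the kings and results in `toy` are hypothetical, not taken from battles.csv):

```r
# Hypothetical outcomes (invented for illustration, not taken from battles.csv)
toy <- data.frame(
  attacker_king    = c("King X", "King X", "King X", "King X", "King Y", "King Y"),
  attacker_outcome = c("win", "win", "win", "loss", "win", "loss")
)

# Rows are kings, columns are outcomes
counts <- table(toy$attacker_king, toy$attacker_outcome)

# margin = 1 normalises within each row, so every king sums to 100
pct <- round(100 * prop.table(counts, margin = 1), 2)
print(pct)
```

Normalising per king rather than over the whole table is what makes a king with few attacks comparable to one with many.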

What are the geographical priorities of the Kings when going to war?

#=========================================================================================

# Radar chart plot

#=========================================================================================

library(fmsb)

got = read.csv("battles.csv", stringsAsFactors = TRUE) #droplevels() below needs factor columns

got <- got[!(got$attacker_king == ""), ]

got <- got[!is.na(got$attacker_king), ]

got$attacker_king = droplevels(got$attacker_king)

levels(got$attacker_king)

spiderR = table(got$attacker_king, got$region)

spiderR = as.data.frame.matrix(spiderR)

spiderR = rbind(9,0, spiderR)

colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9), rgb(0.8,0.2,0.1) )

colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4), rgb(0.8,0.2,0.1,0.5) )

radarchart( spiderR , axistype=1 ,

#custom polygon

pcol=colors_border , pfcol=colors_in , plwd=2 , plty=1,

#custom the grid

cglcol="grey", cglty=1, axislabcol="grey", caxislabels=seq(0,9,2.25), cglwd=0.8,

#custom labels

vlcex=0.8

)

legend(x=1.4, y=1, legend = rownames(spiderR[-c(1,2),]), bty = "n", pch=20 , col=colors_in , text.col = "black", cex=1.2, pt.cex=3)

#================================================================================

All the Kings have different priorities, although the Lannisters and the Starks seem to share an interest in fighting in the Riverlands.
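A note on the `rbind(9, 0, spiderR)` step: `radarchart()` from fmsb reads the first row of its input as the maximum of each axis and the second row as the minimum, and draws one polygon for each remaining row. The sketch below builds that layout from a made-up table (the kings, regions and counts are hypothetical) and derives the maximum from the data instead of hard-coding it.

```r
# Hypothetical battles (invented for illustration)
toy <- data.frame(
  attacker_king = c("King X", "King X", "King X", "King Y", "King Y"),
  region        = c("North", "North", "South", "South", "West")
)

# King-by-region counts as a plain data frame
counts <- as.data.frame.matrix(table(toy$attacker_king, toy$region))

# Row 1 = axis maximum, row 2 = axis minimum, remaining rows = one polygon per king
axis_max <- max(unlist(counts))
radar_input <- rbind(axis_max, 0, counts)
rownames(radar_input)[1:2] <- c("max", "min")
print(radar_input)
```

A data frame shaped like `radar_input` is what gets passed to `radarchart()`; the legend then uses the row names from the third row onward.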

What are the different battle types used by each King?

Code:

#================================================================================

#Radar chart plot

#================================================================================

library(fmsb)
got = read.csv("battles.csv", stringsAsFactors = TRUE) #droplevels() below needs factor columns

got <- got[!(got$attacker_king == ""), ]
got <- got[!is.na(got$attacker_king), ]
got <- got[!(got$battle_type == ""), ]
got <- got[!is.na(got$battle_type), ]
got$attacker_king = droplevels(got$attacker_king)
got$battle_type = droplevels(got$battle_type)
levels(got$attacker_king)
levels(got$battle_type)

spiderR2 = table(got$attacker_king, got$battle_type)
spiderR2 = as.data.frame.matrix(spiderR2)
View(spiderR2)
spiderR2 = rbind(6,0, spiderR2)

colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9), rgb(0.8,0.2,0.1) )
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4), rgb(0.8,0.2,0.1,0.5) )

radarchart( spiderR2 , axistype=1 ,
  #custom polygon
  pcol=colors_border , pfcol=colors_in , plwd=2 , plty=1,
  #custom the grid
  cglcol="grey", cglty=1, axislabcol="grey", caxislabels=seq(0,6,1.5), cglwd=0.8,
  #custom labels
  vlcex=0.9
)

legend(x=1.4, y=1, legend = rownames(spiderR2[-c(1,2),]), bty = "n", pch=20 , col=colors_in , text.col = "black", cex=1.2, pt.cex=3)

#=========================================================================================

The output suggests that Joffrey/Tommen Baratheon (the Lannisters) have an army large enough to execute both sieges and ambushes, whereas Robb Stark prefers ambushes and is quite successful with them. It also suggests that the Starks do not have as strong an army as the Lannisters, and that ambush attacks are their best way to win battles.

Battle Win/Loss statistics including both Attacker Kings and Defender Kings

#=========================================================================================

# GGPLOT with 3 categorical variables

#=========================================================================================

library(ggplot2)

got <- got[!(got$attacker_king == ""), ]

got <- got[!is.na(got$attacker_king), ]

got <- got[!(got$attacker_outcome == ""), ]

got <- got[!is.na(got$attacker_outcome), ]

got <- got[!(got$defender_king == ""), ]

got <- got[!is.na(got$defender_king), ]

#removed blank and NA from the datasets

table(got$attacker_king, got$attacker_outcome)

ggplot(data = got, aes(x = interaction(attacker_king, defender_king),
                       fill = attacker_outcome)) +
  geom_bar(position = 'stack', stat = 'count') +
  geom_text(stat = "count", position = position_stack(vjust = 0.5),
            aes(label = ..count..)) +
  coord_flip() +
  scale_y_continuous(limits = c(0,20)) +
  ggtitle("Attacker/Defender King Win-Loss Statistics") +
  xlab("Attacker_King . Defender_King") + ylab("Count") +
  theme_bw() + #applied first so that the theme() settings below are not reset
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.title = element_blank())

#=========================================================================================

Here we can observe the outcomes of the battles with the Kings on both the attacking and defending sides. We can cross-check this ggplot graph with the table function, as shown below.

Note: The plot combines 3 categorical variables; the interaction function is used to put the attacker and defender kings on one axis. You can also facet the graph using the facet functions in order to get separate plots.
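The faceting mentioned in the note can be sketched as follows, assuming ggplot2 is installed. The data frame is a made-up stand-in (the kings and outcomes are hypothetical); `facet_wrap()` draws one panel per defending king in place of the `interaction()` axis.

```r
library(ggplot2)

# Hypothetical battles (invented for illustration)
toy <- data.frame(
  attacker_king    = c("King X", "King X", "King Y", "King Y", "King Z"),
  defender_king    = c("King Y", "King Z", "King X", "King X", "King X"),
  attacker_outcome = c("win", "loss", "win", "win", "loss")
)

# One panel per defender; bars count battles per attacker, filled by outcome
p <- ggplot(toy, aes(x = attacker_king, fill = attacker_outcome)) +
  geom_bar() +
  facet_wrap(~ defender_king) +
  coord_flip() +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())

p
```

Facets keep the attacker labels short, at the cost of spreading the comparison over several panels.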

#=========================================================================================

got$opponent_outcome = ifelse(got$attacker_outcome == "win","loss","win")

#created a new variable called opponent_outcome

table(got$attacker_king, got$attacker_outcome)

table(got$defender_king, got$opponent_outcome )

#=========================================================================================

What are the attacker commanders' winning statistics?

#=========================================================================================

View(got)

ggplot() +
  geom_bar(data = got, aes(x = attacker_commander, fill = attacker_outcome)) +
  coord_flip() +
  theme_bw() +
  ggtitle("Attacker commander Win stats") +
  xlab("Attacker Commanders") + ylab("Count") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(legend.position = "bottom", legend.direction = "horizontal", legend.title = element_blank())

#=========================================================================================

Battle between two major clans

If we look at the bigger picture, one thing stands out: the Starks and the Lannisters are very competitive, and each has a significant number of wins. It's also evident from the data that to win against each other, they have to use strategies that suit their strengths. Hence, we'll now look only at the Starks and the Lannisters in the following analysis.

For this, we have to create two data frames: one with the Lannisters as attacking king and the Starks as defender king, and the other with the Starks as attacking king and the Lannisters as defender king.

Code:

#=========================================================================================

ga = got[got[,4] == "Joffrey/Tommen Baratheon" & got[,5] == "Robb Stark", c(4:5, 14:15,18,19,20,21,24)]
View(ga)

ga1 = got[got[,4] == "Robb Stark" & got[,5] == "Joffrey/Tommen Baratheon", c(4:5, 14:15,18,19,20,21,24)]
View(ga1)

gotcomb = rbind(ga, ga1)
View(gotcomb)

#=========================================================================================
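Selecting columns by position (`c(4:5, 14:15, ...)`) breaks silently if the column order ever changes. A minimal sketch of the same kind of filter using column names follows. The toy data frame and its values are hypothetical, and the `attacker_size` and `defender_size` column names are an assumption about battles.csv that should be checked against `names(got)`.

```r
# Hypothetical rows (invented for illustration)
toy <- data.frame(
  attacker_king    = c("King X", "King Y", "King X", "King Z"),
  defender_king    = c("King Y", "King X", "King Z", "King X"),
  attacker_outcome = c("win", "loss", "win", "win"),
  attacker_size    = c(20000, 6000, 15000, 1000),   # assumed column name
  defender_size    = c(10000, 12000, 4000, 8000),   # assumed column name
  note             = c("", "", "", "")
)

# Columns to keep, by name rather than by position
keep <- c("attacker_king", "defender_king", "attacker_outcome",
          "attacker_size", "defender_size")

# Battles between King X and King Y, in either direction
x_vs_y <- toy$attacker_king == "King X" & toy$defender_king == "King Y"
y_vs_x <- toy$attacker_king == "King Y" & toy$defender_king == "King X"
pair <- toy[x_vs_y | y_vs_x, keep]
print(pair)
```

Combining both directions in one logical condition gives the same result as building two data frames and stacking them with `rbind()`, apart from row order.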

Stark/Lannisters Stats

(Thank you for bearing with the clarity of the image. This is the best I could get.)

Conclusion

Since we don't have a meaningful number of continuous independent variables, regression modelling is not possible. We can still do some analysis based on the spreadsheet.

In battle no. 17, the attacking king was Joffrey/Tommen Baratheon with an army of 20000, whereas Robb Stark had an army of only 10000, and still the Starks managed to win. It is worth noting that both the region and the size of the army favored the Lannisters. The defender commander for this battle was Edmure Tully, which may be one of the reasons the Starks won. Looking at the bigger picture, for the Starks to win they will have to scale up their army strength, and their battles should be more of the ambush type, because it has proven very effective in the past. The Lannisters will have to be more cautious about their defence strategy, because the Starks can attack.