Note: this page presents our analysis of the Naruto world, based on the skills we learned in the Social Network class.

Story Origin

Ninja is a well-known ancient profession in Japan. Their work included protecting important people, sneaking into enemy camps to collect intelligence, carrying out secret missions, assassinations, and so on. The profession is very similar to the military officers Jin Yiwei of the official intelligence agency of the Ming Dynasty in China, except that ninjas have taken on a more legendary color through dramatic interpretation in artistic works. One of the most popular comics on this topic is Naruto, which tells a passionate, warm and inspirational ninja story. Our interest is therefore to analyze the ninja world described in this comic by means of complex networks and natural language processing. Without further ado, let us walk into the world of Naruto together.

Introduction

This classic comic revolves around the bond between the two protagonists, Naruto Uzumaki and Sasuke Uchiha. The storyline will not be introduced in detail here; interested readers can look it up themselves. Naruto has a grand world with a large number of characters and related concepts, which makes it very suitable for network analysis. By constructing a complex network and performing basic network analysis, we can examine the number of characters, the out-degree and in-degree, the nature of the network, and the network diagram based on the overall character connections. In this work, most ninja characters have a corresponding ninja village and a graduation age from the ninja academy. The village attribute divides the ninjas into different groups and implies a certain relationship between them, while the graduation age attribute reflects a ninja's talent in a way. One of our research interests is whether the communities defined by the village attribute are consistent with the communities found by an algorithm based on node connections. Apart from these goals, we also generated wordclouds, calculated centrality and implemented sentiment analysis. The most fun parts, however, were investigating whether geniuses are all alone and whether they change the world, comparing sentiment across villages, tracking the sentiment variation of Sasuke (one of the two protagonists) over paragraphs, and implementing a character pairing analysis.

Network Analysis

Our datasets are extracted from the Fandom wiki pages using the API provided by the website and libraries such as JSON, urllib and BeautifulSoup. The first dataset is the text content of the wiki page of every character that appears in the comic. The second dataset is the light novels of this comic. For the first dataset, we crawled the village and graduation age from each character page as node attributes. The network edges are generated from the hyperlinks on the corresponding node pages. After constructing the network, it is easy to obtain the number of nodes and edges by calling the built-in methods of NetworkX. The network has 1324 nodes and 11056 edges; note that isolated nodes, which have no connection to any other node, have already been removed.
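As a minimal sketch of the edge-building step, the snippet below turns a mapping from page titles to linked titles into directed edges and drops isolated nodes. It uses plain Python rather than NetworkX so that it runs anywhere. The `page_links` data and the helper names are hypothetical, and ignoring self-links and links to non-character pages is our assumption here, not a documented detail of the crawl.

```python
# Hypothetical input: page title -> titles linked from that page.
page_links = {
    "Naruto Uzumaki": ["Sasuke Uchiha", "Kakashi Hatake", "Ramen"],
    "Sasuke Uchiha": ["Naruto Uzumaki", "Itachi Uchiha"],
    "Kakashi Hatake": ["Naruto Uzumaki"],
    "Itachi Uchiha": ["Sasuke Uchiha"],
    "Lonely Extra": [],  # links to nobody and nobody links here
}


def build_edges(page_links):
    """Return directed (source, target) pairs between known character pages."""
    characters = set(page_links)
    edges = set()
    for source, targets in page_links.items():
        for target in targets:
            # Assumption: keep only links to character pages, skip self-links.
            if target in characters and target != source:
                edges.add((source, target))
    return edges


def connected_nodes(edges):
    """Nodes that appear in at least one edge (isolated nodes are dropped)."""
    return {node for edge in edges for node in edge}


edges = build_edges(page_links)
nodes = connected_nodes(edges)
print(len(nodes), "nodes,", len(edges), "edges")
```

In this toy input, "Ramen" is not a character page and "Lonely Extra" has no links in either direction, so neither ends up in the network.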

Degree Analysis

The degree of a node is the number of adjacent nodes it has, which is the simplest but most important feature of a node. The greater the degree, the more important the node is to some extent. For directed networks, degree is divided into two categories: out-degree and in-degree. Out-degree is the number of edges from the current node to other nodes, and in-degree is the number of edges from other nodes to the current node. Once the network has been generated, the degree analysis is not complex, since the third-party library NetworkX provides many methods for it. The figure below shows the top 5 nodes by out-degree and by in-degree. The result is basically consistent with our expectations: they are indeed the most important characters in this comic. However, the appearance of "Naruto Musasabi" in third place on the in-degree list was unexpected. We found that this is a fictional character from a novel written by Jiraiya, who is a real character in the story, and that he is the namesake of the protagonist Naruto Uzumaki, which gives him a high in-degree. The fact that the top in-degree values are greater than the top out-degree values can also be explained: most minor characters cite several important characters but are rarely cited by others, which leads to very large in-degrees for the important characters.
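The in-degree and out-degree definitions above can be sketched in a few lines. This is a plain-Python illustration on a hypothetical toy edge list, not the real data; on a NetworkX `DiGraph`, the `in_degree` and `out_degree` views give the same numbers.

```python
from collections import Counter

# Toy directed edges (source, target); hypothetical, not the real data.
edges = [
    ("Iruka Umino", "Naruto Uzumaki"),
    ("Sakura Haruno", "Naruto Uzumaki"),
    ("Sakura Haruno", "Sasuke Uchiha"),
    ("Naruto Uzumaki", "Sasuke Uchiha"),
    ("Sasuke Uchiha", "Naruto Uzumaki"),
]

# Out-degree counts edges leaving a node, in-degree counts edges arriving.
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)


def top_k(degree, k=5):
    """The k nodes with the highest degree, highest first."""
    return degree.most_common(k)


print("top in-degree:", top_k(in_degree, 2))
print("top out-degree:", top_k(out_degree, 2))
```

Even in this tiny example the protagonist collects more incoming than outgoing links, which is the same pattern described above for the full network.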

Both distributions use 100 bins. We used 'Frequency' as the y-label, following the example given in lecture 4.

For the in-degree distribution, most values lie between 0 and 50 (more exactly, between 0 and 20). A limited number of nodes have extremely high in-degrees, which means that the tail of this power-law (long-tail) distribution is quite long. The out-degree distribution shows that, unlike the in-degree case, many nodes have non-zero out-degree values. Besides, the variance of the out-degree frequencies is considerably smaller than that of the in-degree frequencies, and there are not many outliers.

The degree distribution of the Naruto network clearly resembles a power-law distribution. Most nodes have between 0 and 50 links, and a few have a considerable number of links. The nodes with high degree are those with high in-degree; these protagonists can be considered hubs of the network. An ER random network, by contrast, follows a binomial distribution, which can be roughly approximated by a Poisson distribution because the number of nodes is large. Most of its degrees lie between 10 and 25.
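To make the binomial versus Poisson remark concrete, here is a small sketch that compares the two probability mass functions. It assumes, and this is our assumption rather than something stated above, that the ER network is undirected and has the same number of nodes and edges as the Naruto network, which gives a mean degree of 2L/N, about 16.7.

```python
import math

N = 1324   # nodes reported above
L = 11056  # edges reported above

# Assumption: the random network is undirected with the same N and L,
# so each of the N*(N-1)/2 possible pairs is linked with probability p.
p = 2 * L / (N * (N - 1))
mean_degree = p * (N - 1)  # equals 2L/N


def binomial_pmf(k, n, p):
    """Probability that a node has exactly k of n possible links."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson approximation with the same mean degree."""
    return math.exp(-lam) * lam**k / math.factorial(k)


ks = (5, 10, 17, 25, 35)
pairs = [(k, binomial_pmf(k, N - 1, p), poisson_pmf(k, mean_degree)) for k in ks]
for k, b, q in pairs:
    print(k, round(b, 4), round(q, 4))

# Share of nodes expected to have a degree between 10 and 25.
mass_10_25 = sum(binomial_pmf(k, N - 1, p) for k in range(10, 26))
print("mean degree:", round(mean_degree, 2), "mass in 10..25:", round(mass_10_25, 3))
```

Under these assumptions the two curves are nearly indistinguishable, and the bulk of the probability mass sits in the 10 to 25 range mentioned above.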

Centrality Analysis

Centrality analysis helps us find the most important nodes in the network. We used three different centrality measures: degree centrality, betweenness centrality and eigenvector centrality. The results are shown in the figure below.

The left figure shows degree centrality, the middle one betweenness centrality, and the right one eigenvector centrality. According to all 3 methods, the top 3 characters are Naruto Uzumaki, Kakashi Hatake and Sasuke Uchiha, which fits our expectations perfectly. The fourth and fifth characters, however, differ between methods. The heroine Sakura Haruno and Naruto's namesake Naruto Musasabi each appear twice, while Orochimaru, one of the most important villains, and Boruto Uzumaki, the protagonist's son, each show up once. This difference comes from the different logic behind each algorithm; more details are given in our explainer notebook. Overall, the results are reasonable because these are all key characters whose stories run through the whole work.
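To illustrate why the measures can rank characters differently, here is a sketch of two of them on a hypothetical toy graph. Degree centrality only counts neighbors, while eigenvector centrality also rewards being connected to well-connected nodes. The power iteration below is a simplified stand-in written for this page, not the NetworkX implementation, and betweenness centrality is left out for brevity.

```python
import math

# Toy undirected graph as adjacency sets (hypothetical, not the real network).
graph = {
    "Naruto": {"Sasuke", "Sakura", "Kakashi", "Iruka"},
    "Sasuke": {"Naruto", "Sakura", "Kakashi"},
    "Sakura": {"Naruto", "Sasuke", "Kakashi"},
    "Kakashi": {"Naruto", "Sasuke", "Sakura"},
    "Iruka": {"Naruto"},
}


def degree_centrality(graph):
    """Degree divided by the largest possible degree, n - 1."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration on A + I; the shift avoids oscillation on bipartite graphs."""
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors.
        new = {node: score[node] + sum(score[n] for n in graph[node]) for node in graph}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {node: v / norm for node, v in new.items()}
        if max(abs(new[n] - score[n]) for n in graph) < tol:
            return new
        score = new
    return score


dc = degree_centrality(graph)
ec = eigenvector_centrality(graph)
for name in sorted(graph, key=ec.get, reverse=True):
    print(name, round(dc[name], 2), round(ec[name], 3))
```

Both measures agree on the top node in this toy graph; differences tend to appear further down the ranking, much like the fourth and fifth places discussed above.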

Community Analysis

Here you can find all 13 communities given by the Louvain method in the community distribution figure. The x-axis shows the index of each community and the y-axis the number of nodes in that community.

Community
0 Boruto Uzumaki & Ino Yamanaka & Shino Aburame
1 Naruto Musasabi & Sakura Haruno & Neji Hyūga
2 Kabuto Yakushi & Gaara & Killer B
3 Sasuke Uchiha & Orochimaru & Itachi Uchiha
4 Shikamaru Nara & Hinata Hyūga & Chōji Akimichi
5 Kakashi Hatake & Tsunade & Anbu Commander
6 Might Guy & Sai & Kankurō
7 Naruto Uzumaki & Iruka Umino & Torune Aburame
8 Jiraiya & Nagato & Hanzō
9 Yamato & Utakata & Chiriku

We named every community after the three characters with the highest degree in that community; the result is shown in the table above. Note that the three very small communities are neglected in the following analysis. In community 7, you can see that Naruto Uzumaki, Iruka Umino and Torune Aburame are grouped together, and the other communities also meet our expectations.
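The Louvain method searches for a partition with high modularity, so one simple way to compare two groupings of the same network (for example, communities found by the algorithm versus groups defined by an attribute such as village) is to compute the modularity of each. The sketch below does this for an undirected, unweighted toy graph; the graph, the two partitions and the function name are all hypothetical, and this is not necessarily how the comparison was done in our notebook.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # community -> number of edges inside it
    degree_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by one edge (toy graph).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
by_structure = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
by_attribute = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}

q_structure = modularity(edges, by_structure)
q_attribute = modularity(edges, by_attribute)
print(round(q_structure, 3), round(q_attribute, 3))
```

A partition that follows the link structure scores clearly higher than one that cuts across it, which is the kind of gap one would look for when checking whether an attribute lines up with the detected communities.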

Network Overview

The overall network of the Naruto world is shown in the figures below. The nodes that are obviously larger than the others are the critical protagonists of the network; they have higher degrees. The connections (links) among these nodes are drawn in light yellow.

The first network graph is drawn based on the communities analyzed above. Each node is colored according to its community, and the community information is shown in the legend.

The second network graph is drawn based on the villages extracted from the wiki pages at the beginning. Each node is colored according to its village. There are 33 clans in this comic, so you can see 33 different colors marking the nodes.

The third network graph is drawn based on the grad_age (graduation age) attribute extracted from the wiki pages at the beginning.

Text Analysis

This section introduces our text analysis, which covers sentiment analysis, wordcloud generation, novel analysis, etc.

Wordcloud Generation

The wordcloud generated through TF-IDF is shown in the figure below. The more important a word is, the bigger it appears, so Naruto, Sasuke, Boruto, etc. are much bigger than other words because they are the names of main characters. Meanwhile, it is noticeable that team, village, attack, time and so on are also highly frequent. This can be explained by the core values of this comic (or anime): teaming up to carry out all kinds of missions to maintain peace, attack intruders and protect the village.

The result in the figure below makes sense as far as our knowledge is concerned. The word Naruto shows up in every wordcloud, as it is both the name of the anime and the name of the protagonist. In addition, the names of the characters with a high degree in each community are prominent in that community's wordcloud, such as Boruto in the first community and Sakura in the second. What's more, the three protagonists Naruto, Sasuke and Kakashi appear not only in their own communities but also in other communities, such as Might Guy & Sai & Kankurō and Naruto Musasabi & Sakura Haruno & Neji Hyūga.

Sentiment Analysis

The sentiment analysis aims to find the happiest and saddest characters in this comic. We used a provided sentiment reference dataframe in which every word has a happiness rank and a corresponding mean happiness value. These values are used to obtain the overall mean sentiment value of each character's wiki page. After the calculation, we rank the characters' sentiment values in descending order (a bigger sentiment value means a happier character).
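A minimal sketch of this scoring step: look up every word of a page in the happiness list and average the values that are found. The handful of scores below are the ones quoted in the tables on this page; the example sentences, the tokenizer and the function names are hypothetical, and the real reference list is much larger.

```python
import re

# A few happiness scores quoted in the tables on this page; the real
# reference list is much larger.
happiness = {"father": 7.06, "mother": 7.68, "home": 7.14,
             "kill": 1.56, "battle": 2.98, "opponent": 3.90}


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def mean_sentiment(text, scores):
    """Average score of the tokens found in `scores`; None if none are found."""
    values = [scores[t] for t in tokenize(text) if t in scores]
    return sum(values) / len(values) if values else None


# Hypothetical one-sentence "pages" for illustration only.
happy_page = "Her father and mother welcomed her home."
sad_page = "He vowed to kill every opponent in battle."

happy_score = mean_sentiment(happy_page, happiness)
sad_score = mean_sentiment(sad_page, happiness)
print(round(happy_score, 2), round(sad_score, 2))
```

Words without a score are simply skipped, so a page is judged only by the vocabulary that the reference list covers.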

The sentiment values clearly do not vary much and fluctuate within a limited range. Most characters' sentiment values are between 5.4 and 5.5, and the distribution looks like a normal distribution. The result is shown in the figure above.

After sorting the sentiment values, it is not hard to find the top 5 happiest characters as well as the top 5 saddest characters. Let's have a look at the tables below.

Top 5 happiest characters
   Character            Sentiment
0  Himawari Uzumaki     5.786766
1  Asura Ōtsutsuki      5.753659
2  Kushina Uzumaki      5.740877
3  Hamura Ōtsutsuki     5.733134
4  Hagoromo Ōtsutsuki   5.722735

Top 5 saddest characters
   Character            Sentiment
0  Hidan                5.172518
1  Gyūki                5.215784
2  Rōshi                5.241013
3  Zabuza Momochi       5.254157
4  Kina Kodon           5.258733

According to the tables, the happiest character is Himawari Uzumaki while the saddest character is Hidan. Our guess was that Himawari Uzumaki's wiki page contains many words with high sentiment values while Hidan's page is the opposite. But is this guess correct, and how can we verify it? As a small challenge, we wrote a short script that counts the words on each wiki page and reports their sentiment values. We found that father, brother and mother each show up 20 times or more on the happiest character's page. On the other hand, the page of the saddest character, Hidan, contains terrible words like kill, which appears an astonishing 17 times. This can be explained by the stories of these two characters. The happiest character is a 10-year-old girl who was born after the world returned to peace (at the end of the story), so we have good reason to believe that she is the happiest character in Naruto. The saddest one, however, is a villain who has killed many ninjas throughout the story because he adores death. 'Kill' is a very sad word in our analysis, but for Hidan it is the source of his happiness. You can find the evidence in the tables below.

Himawari Uzumaki's page
   Word       Counts   Sentiment
0  father     26       7.06
1  brother    21       7.22
2  mother     20       7.68
3  home       14       7.14
4  13         18       6.24

Hidan's page
   Word       Counts   Sentiment
0  opponent   20       3.90
1  kill       17       1.56
2  body       16       5.96
3  despite    13       4.48
4  battle     13       2.98
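A word-count check of this kind can be sketched as follows. This is an illustration under our own assumptions rather than the exact script from the notebook: the sample text and the function name are hypothetical, and only words that have a score are counted.

```python
from collections import Counter
import re


def top_scored_words(text, scores, k=5):
    """(word, count, score) for the k most frequent words that have a score."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in scores)
    return [(word, n, scores[word]) for word, n in counts.most_common(k)]


# Scores quoted in the table above; the sample text is made up.
scores = {"kill": 1.56, "battle": 2.98, "body": 5.96}
text = "Kill or be killed. The battle left his body broken; he would kill again."

rows = top_scored_words(text, scores, k=2)
for word, count, score in rows:
    print(word, count, score)
```

Note that "killed" is not counted as "kill" here because the sketch does no stemming or lemmatization; whether the original script normalized word forms is not something this example tries to reproduce.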

Original Analysis

In this section, we present our most interesting research topics. The research here combines the theoretical knowledge and analysis skills we learned in the course to mine hidden information from the network and the texts, and to compare our results with conventional wisdom. Complex networks and natural language processing are both meant for discovering, explaining and then solving practical problems in life, rather than for analysis in isolation.

Question 1: Are geniuses all alone? / Are geniuses changing the world?

In the Naruto world, every child in a ninja village is admitted to the Ninja Academy at a very young age to learn basic ninjutsu (the skills needed to become a ninja). Only after graduating from the academy are they considered professional ninjas. In most cases, gifted children graduate at an early age, which also matches the situation in our real world. In line with our experience, the centrality of a character reflects its importance in the comic world. Out-degree illustrates how much a character cares about other characters, and in-degree demonstrates how much a character is noticed. Therefore, to answer this question, we investigated the characters with a young graduation age (the geniuses) together with their degree and eigenvector centrality, relying on the graduation age attribute collected in the previous steps.

From the graduation age distribution figure, it is easy to see that most ninjas graduate at 12 years old. Naruto Uzumaki also belongs to this group: he is not a talented ninja, although he is very hard-working and ambitious. Only a few real geniuses become professional ninjas before their 9th birthday. These are the characters considered here.

   Name              Graduation age   In-degree   Out-degree   Eigenvector centrality
0  Kakashi Hatake    5                247         108          0.220451
1  Jiraiya           6                128         58           0.132034
2  Orochimaru        6                194         65           0.197742
3  Naruto Musasabi   6                2476        8            0.178412
4  Tsunade           6                143         60           0.136061
5  Yamato            6                65          55           0.098044
6  Itachi Uchiha     7                117         41           0.129140
7  Might Guy         7                103         64           0.117126
8  Sasori            7                57          31           0.072870
9  Baki              8                13          20           0.026534

From the table above, we can see that most geniuses are important characters in the comic. Kakashi (Naruto's supervisor), Jiraiya (Naruto's master), Itachi Uchiha (Sasuke Uchiha's brother) and others help drive the storyline, and their centrality and degree values are all high compared with ordinary characters, which means they change the world to some extent. As for the social relationships of these geniuses, Kakashi Hatake has many friends, while Naruto Musasabi does not seem to care about others since his out-degree is only 8. Besides, Baki is a real genius but receives little attention. Overall, being a genius does not usually bring loneliness. Geniuses are not always alone.

Question 2: Ninjas from which village are the saddest?

The dog-eat-dog world of Naruto is cruel and filled with tragedy. With no power standing above all others, there are wars between countries every year (more accurately, between villages, because each country built a ninja village to train ninjas as its military force). The five largest villages, Konohagakure, Kirigakure, Kumogakure, Iwagakure and Sunagakure, tend to choose small villages as battlefields, which spreads hatred and sadness. One of the most powerful villains in Naruto comes from a small village; he destroyed Naruto Uzumaki's hometown for revenge. In short, the world is stuck in a cycle of vengeance. Both the villains and the heroes aim to bring peace back to the world. The villains plan to become the power above all others and eliminate all unstable factors. The heroes know that peace kept by force lasts only as long as the ruler's life, so it is not a long-term plan; they are determined to unite the world by love instead. In this question, we would like to find the village with the most tragedy. Our guess is Amegakure. As a small village between three great countries, it has frequently served as a battleground during the various ninja wars, making most of its population war refugees. The following analysis evaluates our guess.

Village Sentiment
0 Yugakure 5.191124
1 Ishigakure 5.278589
2 Shangri-la 5.306823
3 Howling Wolf Village 5.324129
4 Kumogakure 5.367670
6 Kirigakure 5.371250
13 Iwagakure 5.421362
18 Sunagakure 5.449992
21 Konohagakure 5.488923

Note that living in one of the 5 great villages, Konohagakure, Kirigakure, Kumogakure, Sunagakure and Iwagakure, usually does not mean being relatively safe. Great countries can also be destroyed easily by wars.

   Word      Counts   Sentiment
0  beast     218      3.36
1  ha        155      6.00
2  attack    145      2.42
3  one       121      5.40
4  war       107      1.80

   Word      Counts   Sentiment
0  team      1926     6.26
1  ha        1735     6.00
2  time      1272     5.74
3  one       1268     5.40
4  village   1257     6.28

We noticed that beast appears frequently in Kumogakure's text. The tailed beasts are 9 powerful monsters in Naruto, sealed in 9 human hosts (jinchūriki) in the five great countries. When a jinchūriki loses control of his or her tailed beast, the consequences are disastrous: an out-of-control tailed beast destroys everything until some powerful ninja stops it and seals it again. Therefore, the tailed beasts may be a vital reason for the sadness in Kumogakure. Konohagakure is the most powerful village in the world and the hometown of the main characters. That may be why Konohagakure is the happiest of the five great villages.

We found that the ninjas from Yugakure are the saddest. In fact, the saddest character, Hidan, is from Yugakure. Ironically, Yugakure is a small village with strong inclinations towards pacifism; ideally, it ought to be the happiest village in the world. However, Yugakure's transformation from a military force was not supported by all, Hidan being a notable objector. He slaughtered many villagers and left the village. Hidan has a very negative influence on the sentiment of Yugakure.

WAR IS CRUEL. In the world of Naruto, all villages, great or small, face the pressure of war every day. It is ironic that the villagers of Yugakure, who love peace the most, go through the worst. We should cherish peaceful times, which were won through difficult struggles: war has never been far away from us.

Question 3: Novel Analysis: Sasuke’s sentiment change over paragraphs?

Let's take a look at the light novel Sasuke Shinden: The Teacher's Star Pupil. It is fanfiction, so the information it gives may more or less reflect the ideas of fans. To start, we simply compute how the sentiment of the hero, Sasuke, changes over paragraphs.

Even with much fluctuation, most paragraphs have a sentiment around 5.5, approximately the average sentiment of the novel. As the story develops, the sentiment drops sharply many times but soon returns to normal values. Based on the graph, we guess that the climax of the story lies around paragraphs 1900 to 2400, where the sentiment fluctuates the most. The sharp drops in sentiment are probably related to the occurrence of fighting.

Question 4: Novel Analysis: Sasuke’s Character Pairing

Sasuke Uchiha is one of the main characters in Naruto and the hero of the novel Sasuke Shinden: The Teacher's Star Pupil. In this part, we ask more of the light novel: in the author's view, who is Sasuke's character pairing? Character pairing, also known as CP, is an important part of ACG culture. A pairing refers to the characters who make up the romantic focus of a fanfiction, but they do not need to be actual lovers. For example, Naruto Uzumaki is sometimes regarded as Sasuke Uchiha's character pairing, even though they are only comrades. Here we assume that Sasuke is more pleased when he is with his CP than when he is with others. The question therefore becomes finding the character names in each paragraph and comparing the corresponding sentiments.
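The idea of comparing sentiment across paragraphs that mention a given pair can be sketched like this. It is a toy illustration under our own assumptions: names are matched by simple lowercase substring search, the three paragraphs are made up, and only two of the happiness scores quoted on this page are used.

```python
import re


def paragraph_sentiment(paragraph, scores):
    """Mean score of the scored words in one paragraph; None if there are none."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    values = [scores[w] for w in words if w in scores]
    return sum(values) / len(values) if values else None


def pairing_sentiment(paragraphs, hero, candidates, scores):
    """Mean sentiment of the paragraphs that mention both hero and candidate."""
    result = {}
    for name in candidates:
        values = []
        for paragraph in paragraphs:
            text = paragraph.lower()
            if hero in text and name in text:
                s = paragraph_sentiment(paragraph, scores)
                if s is not None:
                    values.append(s)
        result[name] = sum(values) / len(values) if values else None
    return result


# Hypothetical paragraphs; the scores for "home" and "battle" are quoted on this page.
scores = {"home": 7.14, "battle": 2.98}
paragraphs = [
    "Sasuke walked home with Naruto.",
    "Sasuke met Karin after the battle.",
    "Naruto ate alone at home.",
]

result = pairing_sentiment(paragraphs, "sasuke", ["naruto", "karin", "itachi"], scores)
print(result)
```

The third paragraph is ignored for every candidate because Sasuke is not mentioned in it, and a candidate who never shares a paragraph with him gets no score at all.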

There are too many characters in Naruto, and searching for all the character names in the novel would be time-consuming. Therefore, based on our investigation, we list 4 candidates here: Naruto Uzumaki, Itachi Uchiha, Sakura Haruno and Karin. Naruto Uzumaki is Sasuke's closest comrade and they have common ideals. Itachi Uchiha is Sasuke's brother and the one who supports him the most. Sakura Haruno is Sasuke's wife and they have a daughter. Karin is Sasuke's teammate and she has loved Sasuke at a distance for many years.

First, let's make clear who shares the most story with Sasuke in the manga series. This may give us some hints about the answer to this question. Term frequency is used here to analyze the character pages. Because the pages differ in length, we use term frequency with an adjustment for page length. We define In-TF as the frequency of the word 'sasuke' on each candidate's page. In-TF reflects the candidate's affection for Sasuke.

Term frequency here shows the degree of a character's "love" for Sasuke. Karin is devoted to Sasuke. Naruto seems to be the least likely candidate for CP.

We define Out-TF as the frequency of a candidate's name on Sasuke's page. Out-TF reflects, in a sense, Sasuke's affection for the candidate. The original term frequency is used here; no adjustment is necessary because the denominator would be the same for every candidate.
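The two measures can be sketched as below. We assume here that the "adjustment" for In-TF means dividing the count by the number of words on the page, which is our reading of the description rather than a confirmed detail; the three mini pages and the function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def in_tf(candidate_page, target="sasuke"):
    """Length-adjusted frequency of `target` on a candidate's page."""
    words = tokens(candidate_page)
    return words.count(target) / len(words) if words else 0.0


def out_tf(target_page, candidate_name):
    """Raw count of a candidate's name on the target's page."""
    return tokens(target_page).count(candidate_name)


# Hypothetical mini pages for illustration only.
karin_page = "Karin admires Sasuke. Karin follows Sasuke everywhere."
naruto_page = ("Naruto trains hard, eats ramen, and dreams of becoming "
               "Hokage with Sasuke.")
sasuke_page = "Sasuke fought Naruto. Naruto never gave up on Sasuke. Karin healed him."

print("In-TF:", round(in_tf(karin_page), 3), round(in_tf(naruto_page), 3))
print("Out-TF:", out_tf(sasuke_page, "naruto"), out_tf(sasuke_page, "karin"))
```

In-TF needs the division because each candidate has a page of a different length, while every Out-TF count comes from the same page and can be compared directly.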

Here things look different. According to this analysis, Naruto is the most likely to be Sasuke's CP. Surprisingly, Sasuke doesn't seem to care much about Karin. Her affection is doubtless unrequited.

Name Sentiment
0 Naruto Uzumaki 5.546862
1 Sakura Haruno 5.534066
2 Itachi Uchiha 5.487603
3 Karin 5.328333

Contrary to our guess, in the author's eyes Naruto Uzumaki and Sakura Haruno are the most likely to be Sasuke's CP, and their scores are close. Karin gets the lowest score, significantly lower than everyone else. Sad story T_T.

Download our dataset to play around & See our explainer notebook for more details

Click here to find our data set: Data Set
Click here to find our explainer notebook: Explainer Notebook