The United Nations (UN) is an international organization that came into existence after World War II to save succeeding generations from the scourge of war, to uphold international law, and to promote socio-economic progress around the world, among other aims (Charter of the United Nations, 2015). Its high-level political forums provide a platform for discourse and facilitate the discussions and deliberations that lead to decisions affecting the world community at large. The General Assembly is one of the principal organs of the United Nations. The General Debate (GD) is held at the start of each session of the United Nations General Assembly (UNGA) as per the rules of procedure (General Assembly of the United Nations, n.d.), usually around September or October. The member states of the UN are often represented by their respective heads of state or government, and the statements issued by these high-level officials reflect the issues of importance to their states. This paper finds that the overall sentiment expressed toward the issues under discussion is positive.
The United Nations General Debate Corpus (UNGDC), created by Jankin Mikhaylov et al. (2019), provides these statements in English for all nations speaking at the GD from 1970 (UNGA Session 25) to 2018 (UNGA Session 73). I use this dataset to examine differences in average sentiment across countries and across years. Building on the work of Baturo & Dasandi (2017), I structurally model topics only for the P5 member nations, namely the United States of America, the Russian Federation, the United Kingdom of Great Britain and Northern Ireland, the People’s Republic of China, and the French Republic. Finally, I present a model that classifies speeches among the P5 member nations of the UN based on word choice.
The following section describes the data. The paper then details the preprocessing steps taken, alongside the methods applied to assess sentiment. Structural topic modeling is then employed to obtain 16 distinct and mutually independent topics. There is an overall positive sentiment within the GD, with the P5 mainly discussing the millennium development goals alongside post-conflict development; topics revolving around the principle of self-determination are also discussed. Finally, this paper proposes a model that classifies a speech among the P5 countries as a function of the word choice in the speech.
The UNGDC contains all the speeches, in English, made by representatives of 200 different member nations of the UN at the GD from 1970 to 2018. Figure 1 shows the total number of speeches made by each country. Some nations, such as South Sudan, became members as late as 2011 and therefore have very few speeches (8 in this case). The dataset has 8,093 observations (documents), with country-year as the unit of observation. An example of the dataset is provided in Table 1.
Figure 1: Frequency of Speeches by Representatives at the United Nations General Debate
Table 1: First 3 Rows of the UNGDC
The ‘doc_id’ column is simply the filename of the individual text document, while the ‘text’ column contains the actual text of the representative’s speech. The ‘Country’ column holds the iso3c code for a country, the ‘Session’ column records the session of the UNGA at which the speech was made, and the ‘Year’ column records the year in which that session took place. The country of the speaking representative is treated as a categorical variable, while session and year are both continuous variables.
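To make the unit of observation concrete, the sketch below builds a toy frame mirroring the UNGDC layout in Python with pandas; the filenames and text are illustrative placeholders, not values taken from the corpus.

```python
import pandas as pd

# Illustrative rows mirroring the UNGDC layout (values are hypothetical).
ungdc = pd.DataFrame({
    "doc_id":  ["USA_25_1970.txt", "GBR_25_1970.txt"],
    "text":    ["Mr. President, ...", "Mr. President, ..."],
    "country": ["USA", "GBR"],   # iso3c code
    "session": [25, 25],         # UNGA session number
    "year":    [1970, 1970],     # year the session took place
})

# Country is treated as categorical; session and year stay numeric.
ungdc["country"] = ungdc["country"].astype("category")
print(ungdc.dtypes)
```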
Since we are examining speeches, i.e. text data, the general bag-of-words approach is employed, whereby each of the 8,093 documents is represented by a set of words and phrases of up to 4-grams that takes no account of grammar or word order. A simple approach is used to tokenize these terms, since part-of-speech tagging has no particular effect on the tokenization in our case. This can be seen from Figures 2(a) and 2(b), where both approaches yield similar results: Figure 2(a) depicts a word cloud without part-of-speech tagging, while Figure 2(b) depicts one with it. This also saves us a computationally more expensive preprocessing step.
The corpus is tokenized by removing punctuation, symbols, numbers, and URLs, and the tokens are then expanded into up-to-4-gram tokens, so individual terms in the data contain up to 4 words. These tokens are turned into a document term matrix by lowercasing all letters and removing all separators. Stop words are also removed, as they have generally created problems in topic modeling with this dataset in previous experience (Assignment 10, Week 12). Finally, the terms are lemmatized using the lexicon package’s hash lemma table (Lexicon Package - RDocumentation, n.d.). The document term matrix is restricted to the P5 nations: it has 243 documents with 508,877 features and 99.33% sparsity. I trim it with a minimum document frequency of 20 and a maximum document frequency of 170 to get a 77.58% sparse matrix with 2,640 features. The minimum row sum of this document term matrix is 287, the median 958, and the mean 1,062.
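The actual pipeline is built in R; the sketch below reproduces the same bag-of-words steps with scikit-learn's CountVectorizer in Python (lowercasing, stop-word removal, up-to-4-grams, and document-frequency trimming). The toy corpus and the scaled-down `min_df`/`max_df` thresholds are illustrative assumptions; the paper's actual cutoffs are 20 and 170.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "International peace and security remain our priority.",
    "Economic development and free trade drive global growth.",
    "Peace, development, and security are linked.",
]

# Up to 4-grams, lowercased, English stop words removed. The paper trims
# with min_df=20 and max_df=170; the thresholds here are scaled down so
# the toy corpus survives the trimming.
vec = CountVectorizer(ngram_range=(1, 4), lowercase=True,
                      stop_words="english", min_df=1, max_df=3)
dtm = vec.fit_transform(docs)

# Sparsity: share of zero cells in the document term matrix.
sparsity = 1.0 - dtm.nnz / (dtm.shape[0] * dtm.shape[1])
print(dtm.shape, round(sparsity, 2))
```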
Sentiment Differences Across Nations & Years
The tokenized corpus was used for sentiment analysis by country. I employed the Lexicoder Sentiment Dictionary (Young & Soroka, 2012) in R to gauge the overall sentiment (Lexicoder Sentiment Dictionary (2015) — Data_dictionary_LSD2015, n.d.). First, overall negative and positive sentiment scores were calculated by summing the ‘negative’ and ‘neg_positive’ columns for the overall negative sentiment and the ‘positive’ and ‘neg_negative’ columns for the overall positive sentiment. A net sentiment was then calculated by subtracting the negative sentiment from the positive sentiment and aggregating it at the country level and at the year level, so that each row contains either a country’s average across all years or the average sentiment of all nations in a given year.
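The scoring step can be sketched as follows; the counts are hypothetical per-document LSD tallies, where ‘neg_positive’ counts negated positive terms and ‘neg_negative’ counts negated negative terms, following the structure of the LSD2015 dictionary.

```python
import pandas as pd

# Hypothetical per-document LSD counts.
counts = pd.DataFrame({
    "country": ["USA", "USA", "RUS"],
    "year":    [2000, 2001, 2000],
    "positive": [40, 25, 30], "neg_negative": [5, 2, 3],
    "negative": [10, 20, 12], "neg_positive": [3, 6, 2],
})

# Overall positive = positive + negated-negative terms;
# overall negative = negative + negated-positive terms.
counts["pos_total"] = counts["positive"] + counts["neg_negative"]
counts["neg_total"] = counts["negative"] + counts["neg_positive"]
counts["net"] = counts["pos_total"] - counts["neg_total"]

# Aggregate net sentiment to the country level and the year level.
by_country = counts.groupby("country")["net"].mean()
by_year = counts.groupby("year")["net"].mean()
print(by_country)
```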
Figure 3 (a): Overall Sentiment Aggregated to Country
Figure 3 (b): Overall Sentiment Aggregated to Year
Figure 3 (a) shows the overall sentiment of each country, categorized by region. The general sentiment at the GD has been positive when averaged across all years. Japan has the most positive sentiment in its speeches across the years, followed by Nepal, Romania, and Egypt; among the P5 member nations, the United States has the most positive aggregate sentiment. Some of the lowest positive sentiments across the years have been expressed by Israel, Eritrea, and Cuba. Figure 3 (b) shows the sentiment exhibited at the GD as a whole in each year. The year with the most positive overall sentiment was 1975, while the lowest positive sentiment at the GD was in 2001, when the GD was held two months after 9/11 (UN General Assembly Begins Annual High-Level Debate, 2001). Overall, we see a positive sentiment in the UN, which is a good sign.
Structural Topic Modeling for P5 Member States
Figure 4: 16-topic Topic Modeling
Baturo & Dasandi (2017) present an optimal search that compares semantic coherence and exclusivity and suggests a 16-topic model for this data, owing to its largest positive residual on the full dataset. They also suggest that topic prevalence is affected not only by country but also by time; thus, topic prevalence is specified over country and year. Topic content may also vary from country to country, and hence country is specified as the content covariate. The same approach is applied here to the subset of the data for the P5 member nations. Figure 4 provides the list of topics modeled using the structural topic model approach. The topics can be broadly classified as follows:
Topic 1 - Economic Development
This topic is related to economic development due to the prevalence of words such as ‘world_trade_organ’, ‘privat’, ‘free_trade’, ‘global_economi’, and ‘trade_and_invest’, among others.
Topic 2 - Millennium Development Goals
This topic is related to the millennium development goals due to the prevalence of words such as ‘strateg_stabil’, ‘intern_crimin’, ‘drug_traffick’, ‘millennium_develop_goal’, and ‘atom_energi_agenc’, among others.
Topic 3 - Europe
This topic is related to Europe due to the prevalence of words such as ‘central_europ’, ‘western_europ’, ‘racism’, and ‘warsaw’, among others.
Topic 4 - Weapons of Mass Destruction
This topic is related to weapons of mass destruction due to the prevalence of words such as ‘iraq’, ‘weapon_of_mass_destruct’, ‘prolifer_of_weapon’, ‘biolog’, and ‘jerusalem’, among others.
Topic 5 - Post-World War Rebuilding
This topic is related to post-world war rebuilding due to the prevalence of words such as ‘mistak’, ‘peac_coexist’, ‘berlin’, and ‘peopl_of_the_world’, among others.
Topic 6 - International Peace & Security
This topic is related to international peace and security due to the prevalence of words such as ‘yugoslavia’, ‘republ_of_korea’, ‘congo’, ‘arm_control’, and ‘nato’, among others.
Topic 7 - Cold War
This topic is related to the Cold War due to the prevalence of words such as ‘nation_liber’, ‘nuclear_war’, ‘ussr’, ‘east-west’, ‘use_of_forc’, and ‘non-interfer’, among others.
Topic 8 - Peacekeeping Operations
This topic is related to peacekeeping operations due to the prevalence of words such as ‘unit_nation_peacekeep’, ‘peacekeep_oper’, ‘consensus’, and ‘unit_nation_system’, among others.
Topic 9 - Terrorism
This topic is related to terrorism due to the prevalence of words such as ‘attack’, ‘fight_against_terror’, ‘terror’, and ‘terrorist’, among others.
Topic 10 - Self-determination
This topic is related to the principle of self-determination due to the prevalence of words such as ‘central_america’, ‘kampuchea’, ‘viet_nam’, and ‘self-determin’, among others.
Topic 11 - Developing Economies
This topic is related to developing economies due to the prevalence of words such as ‘industri_countri’, ‘world_bank’, ‘raw’, ‘third_world’, and ‘econom_order’, among others.
Topic 12 - Middle East & Northern Africa
This topic is related to the Middle East and Northern Africa (MENA) due to the prevalence of words such as ‘isra_and_palestinian’, ‘syrian’, ‘libya’, ‘syria’, and ‘arab’, among others.
Topic 13 - Al-Qaeda
This topic is related to al-Qaeda due to the prevalence of words such as ‘sudan’, ‘afghan’, ‘terrorist_attack’, and ‘terrorist’, among others; al-Qaeda has been linked to both Sudan and Afghanistan.
Topic 14 - Decolonization
This topic is related to decolonization due to the prevalence of words such as ‘rhodesia’, ‘zimbabw’, ‘central_europ’, ‘south-east_asia’, and ‘intern_co-oper’, among others.
Topic 15 - Post-Conflict Development
This topic is related to post-conflict development due to the prevalence of words such as ‘peace-keep_oper’, ‘commonwealth_of_independ_state’, ‘end_of_the_cold’, ‘peac_process’, and ‘cooper_in_europ’, among others.
Topic 16 - Sustainable Development
This topic is related to sustainable development due to the prevalence of words such as ‘climat_chang’, ‘polit_process’, ‘infrastructur’, ‘unit_nation_framework’, and ‘act_togeth’, among others.
Figure 5: Correlation between Topics
These topic models yield somewhat different results compared to the work of Baturo & Dasandi (2017). This is probably due to differences in the preprocessing of the text data, alongside the inclusion of only the P5 nations in our topic models: the P5 member nations, i.e. the United States, Russia, China, France, and the United Kingdom, may have different agendas in their speeches compared to the rest of the United Nations membership. We also see from Figure 5 that the correlation between each pair of topics is almost 0; the topics are thus largely independent of one another.
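As a sanity check on reading near-zero topic correlations as evidence of largely independent topics, the sketch below computes a correlation matrix from simulated document-topic proportions. In the paper's R workflow the equivalent quantity would come from the fitted model itself (e.g. stm's topicCorr()), so the Dirichlet draws here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical document-topic proportions: 243 documents x 16 topics,
# drawn from a symmetric Dirichlet so each row sums to 1.
theta = rng.dirichlet(np.ones(16), size=243)

# 16 x 16 correlation matrix between topic proportion columns.
corr = np.corrcoef(theta, rowvar=False)
off_diag = corr[~np.eye(16, dtype=bool)]
print(round(float(np.abs(off_diag).mean()), 3))
```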
Figure 6: Topics 2, 15, 10 & 7 by Country Proportion
Figure 6 shows the distribution of four selected topics (2, 15, 10, and 7) by nation. The Russian Federation talks the most about the millennium development goals (topic 2) compared to the other four nations. The People’s Republic of China talks the most about post-conflict development (topic 15) and self-determination (topic 10). The Russian Federation, alongside the United States, talks a great deal about the Cold War (topic 7).
Figure 7: Topic Proportion by Year
Figure 7 plots the four most discussed topics by year. The topic relating to the Cold War has been on the decline, and topics relating to the principle of self-determination have also declined. The topic of post-conflict development has remained fairly constant, since conflicts have persisted throughout the period. Finally, there has been a stark increase in the discussion of the millennium development goals, which is a good sign for fostering sustainability and development in the world.
Classifying Speeches based on Choice of Words
One of the primary problems with a classification approach is that we do not know the function linking a country to its speeches: the country may not be a function of the choice of words, but the words may instead be a function of the country, and we are not sure of the data generation process (Monroe et al., 2008). To explore this issue, I present a supervised learning model that classifies countries based on their speeches; its results suggest that the country may indeed be recoverable from word choice. My outcome variable is a multinomial categorical variable taking values of 0 for the United States, 1 for the United Kingdom, 2 for France, 3 for China, and 4 for Russia. I use XGBoost in R with the objective hyper-parameter set to ‘multi:softmax’ for multinomial classification, to predict the country from the speech, i.e. the choice of words.
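The classifier itself is fit in R; the sketch below shows the same bag-of-words-to-label setup in Python on a hypothetical mini-corpus, with multinomial logistic regression standing in for XGBoost's ‘multi:softmax’ objective. The five toy vocabularies are illustrative assumptions; only the 0-4 label coding follows the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical mini-corpus; labels follow the paper's coding
# (0 = US, 1 = UK, 2 = France, 3 = China, 4 = Russia).
speeches = [
    "freedom markets democracy alliance",
    "commonwealth kingdom parliament",
    "francophonie republic liberty",
    "development sovereignty modernization",
    "federation security multipolar",
] * 4
labels = [0, 1, 2, 3, 4] * 4

# Bag-of-words features, then a multinomial classifier as a
# lightweight substitute for XGBoost's 'multi:softmax'.
X = CountVectorizer().fit_transform(speeches)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
preds = clf.predict(X)
print((preds == labels).mean())
```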
The confusion matrix shows that the algorithm did very well at predicting the right classes, with a few misclassifications: two US speeches classified as Russian ones and one Chinese speech classified as an American one. The algorithm was trained on 197 documents with 5 classes as the outcome variable; the test set comprises 46 documents, of which only 3 were misclassified. The model had a 93.48% test accuracy, with a 95% confidence interval of 82.21% to 98.63%. Figure 8 shows the overall statistics of the model, while Figure 9 shows the most important features used to distinguish documents.
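The reported 95% interval is consistent with the exact binomial (Clopper-Pearson) interval that R's binom.test computes for 43 correct predictions out of 46; assuming that convention, it can be reproduced in Python:

```python
from scipy.stats import beta

correct, n = 43, 46  # 93.48% test accuracy

# Exact (Clopper-Pearson) 95% binomial confidence interval,
# the interval R's binom.test reports for the accuracy.
lower = beta.ppf(0.025, correct, n - correct + 1)
upper = beta.ppf(0.975, correct + 1, n - correct)
print(round(lower, 4), round(upper, 4))
```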
Figure 8: Classification Statistics
Figure 9: Feature Importance
Assessing the relative costs of our analysis as per Quinn et al. (2010): the Lexicoder Sentiment Dictionary (Young & Soroka, 2012) is an off-the-shelf dictionary, so the pre-analysis costs of the sentiment analysis were effectively zero. The pre-analysis costs of structural topic modeling are also very low, and because I could draw on the work of Baturo & Dasandi (2017), the analysis costs of topic modeling were not high either.
There is an overall positive sentiment within the United Nations General Debate, a good sign of harmony among nations collaborating on issues of world order. The techniques applied show promising results for the quantitative analysis of real-world events. Further work can be done to draw inferences from such analysis; for example, understanding whether there is positive sentiment toward the proposed solutions to the problems discussed within a particular topic could help in understanding a nation's direction. Greater positive sentiment would suggest a promising and hopeful outlook for resolving that particular issue.
Some topics are of much greater importance than others, as seen from the topic modeling in this paper. We also get quite different results when taking different preprocessing steps, a useful reminder that topic models simply find the differences that exist within the tokens we feed to the algorithms. We also see that certain topics are discussed more by some nations than others, signaling the importance of those topics in their foreign policy. This could be used for studies seeking to understand the foreign policy of nations based on their statements at such international forums.
Finally, given the promising results of the classification model, with roughly 93% test accuracy, we can say that for the P5 nations in the United Nations, the country may be a function of the words used in its speeches. This motivates further exploratory analysis into the data generation process by which the choice of words is influenced by the country, such that the country can be mathematically expressed as country = function(words).
Baturo, A., & Dasandi, N. (2017). What drives the international development agenda? An NLP analysis of the united nations general debate 1970–2016. 2017 International Conference on the Frontiers and Advances in Data Science (FADS), 171–176. https://doi.org/10.1109/FADS.2017.8253221
Charter of the United Nations. (2015, August 10). https://www.un.org/en/charter-united-nations/index.html
General Assembly of the United Nations. (n.d.). United Nations. Retrieved February 11, 2021, from https://www.un.org/en/ga/about/ropga/sessions.shtml
Jankin Mikhaylov, S., Baturo, A., & Dasandi, N. (2019). United Nations General Debate Corpus [Data set]. Harvard Dataverse. https://doi.org/10.7910/DVN/0TJX8Y
Lexicoder Sentiment Dictionary (2015)—Data_dictionary_LSD2015. (n.d.). Retrieved May 15, 2021, from https://quanteda.io/reference/data_dictionary_LSD2015.html
lexicon package—RDocumentation. (n.d.). Retrieved May 15, 2021, from https://www.rdocumentation.org/packages/lexicon/versions/1.2.1
Monroe, B. L., Colaresi, M. P., & Quinn, K. M. (2008). Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict. Political Analysis, 16(4), 372–403. https://doi.org/10.1093/pan/mpn018
Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., & Radev, D. R. (2010). How to Analyze Political Attention with Minimal Assumptions and Costs. American Journal of Political Science, 54(1), 209–228. https://doi.org/10.1111/j.1540-5907.2009.00427.x
UN General Assembly begins annual high-level debate. (2001, November 10). UN News. https://news.un.org/en/story/2001/11/20112-un-general-assembly-begins-annual-high-level-debate
Young, L., & Soroka, S. (2012). Affective News: The Automated Coding of Sentiment in Political Texts. Political Communication, 29(2), 205–231. https://doi.org/10.1080/10584609.2012.671234