Unveiling a Global Perspective on National Security: Announcing the National Security and Defence Documents Dataset

Author: SCGA
An article from the SCGA editorial team.

By Professor Andrew Neal, Personal Chair of International Security, University of Edinburgh.

In an era of rapidly evolving global challenges, understanding how states perceive and articulate their security priorities has never been more critical. Today, we’re excited to announce the publication of a groundbreaking corpus and dataset of national security documents, offering unprecedented insights into the landscape of international security. This project was funded as an Insight Award by SCGA (with related support from other funders listed below).

This collection encompasses 575 documents from 113 countries, spanning the years 1987 to 2024. It represents the first-ever comprehensive corpus of public national security and defence documents on a global scale, providing a unique window into how countries across the world conceptualise and prioritise their security challenges.

The complete dataset has a DOI and is available through Edinburgh DataShare, the University of Edinburgh’s digital repository. It is released under a Creative Commons CC-BY 4.0 licence, meaning it can be freely shared and adapted with attribution.

Key Features of the Corpus and Dataset

1. Global Scope: With documents from 113 countries – every country that publishes such a document – this collection provides a truly international perspective on national security and defence documents.

2. Historical Depth: Covering nearly four decades, from 1987 to 2024, the corpus allows for analysis of security priorities before and after major global events like the end of the Cold War, NATO enlargement, 9/11, and the COVID-19 pandemic.

3. Diverse Document Types: The corpus includes national security strategies, defence white papers, and other top-level security documents.

4. Machine-Readable Format: All 575 documents are available as machine-readable PDFs, facilitating advanced analysis techniques.

5. Translations Included: Non-English documents have been machine-translated, ensuring accessibility for a wider audience.

6. Rich Metadata: The accompanying dataset provides extensive information on each country and document, including variables like UN region, regime type, economic data, and more.

Utility for Policy and Research

1. Comparative Analysis: Policymakers can now easily compare security strategies across different countries, regions, and political systems, informing more nuanced and context-aware policy decisions.

2. Trend Identification: With the right tools, researchers can track the evolution of security priorities over nearly 40 years and across regions, potentially identifying emerging threats or shifting global concerns.

3. Text as Data: The machine-readable format opens up possibilities for advanced textual analysis, including natural language processing and AI-driven insights across a substantial corpus of 575 documents.

4. Interdisciplinary Research: The dataset’s broad scope encourages cross-disciplinary studies, bridging gaps between security studies, international relations, peace and conflict studies, development studies, economics, and more.

5. Accountability Tool: By making these documents easily accessible, the corpus can serve as a resource for civil society to hold governments accountable for their stated security priorities and actions.

6. Educational Resource: For students and educators, this corpus provides a rich source of primary documents for studying global security dynamics across different historical periods.

The publication of this corpus and dataset marks a significant step forward in our understanding of global security perspectives. It offers a unique opportunity to move beyond Western-centric analyses and gain insights into how countries across the world have conceptualised and prioritised their security challenges since the late 1980s.

NLP Methodology

Our research team has developed a sophisticated Natural Language Processing (NLP) method to analyse this corpus. Here’s an overview of our approach:

1. Semantic Encoding: We use Google’s pre-trained Universal Sentence Encoder model (USE4) to encode each sentence as a vector. This allows us to capture the semantic meaning of text beyond simple keyword matching.

2. Topic Formulation: We develop topics expressed as single sentences in natural language. These topics are then encoded using the same USE model.

3. Semantic Similarity Search: Our system compares the encoded topics to every sentence in the corpus, calculating a similarity score. This allows us to find sentences that are semantically similar to our topics, even if they use different words.

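4. Threshold Adjustment: We can adjust the similarity threshold to balance precision and recall: a higher threshold returns fewer, more closely matching sentences, while a lower one returns more sentences at the cost of more false positives.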

5. N-gram Filtering: For quantitative analysis, we can apply an n-gram filter to refine results, ensuring we capture specific phrases or word combinations.

6. Human Expertise: While our NLP system greatly enhances our ability to analyse large volumes of text, human expertise remains crucial in formulating topics, interpreting results, and drawing meaningful conclusions.

This NLP technology allows us to identify patterns, track the evolution of concepts, and uncover insights that would be impractical to discover through manual analysis alone. We are publishing the code behind our NLP tools alongside our dataset to allow scholars to reproduce our results and conduct their own research. Although we cannot offer support for implementing the code, we are open to collaborations with other researchers interested in using our analysis tools to explore this rich dataset further.
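The search, threshold, and n-gram steps described above can be illustrated with a minimal sketch. It is not the project’s published code. To stay self-contained it swaps the Universal Sentence Encoder for a toy bag-of-words encoder, and it assumes cosine similarity as the score, which the description above does not specify. The function names and example sentences are hypothetical and do not come from the corpus.

```python
import math
import re
from collections import Counter


def toy_encode(text):
    """Stand-in encoder: a bag-of-words count vector (NOT the USE model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(topic, sentences, threshold):
    """Return (score, sentence) pairs at or above the threshold, best first."""
    topic_vec = toy_encode(topic)
    scored = [(cosine(topic_vec, toy_encode(s)), s) for s in sentences]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


def ngram_filter(matches, ngram):
    """Keep only matches whose text contains the given word sequence."""
    target = ngram.lower().split()
    n = len(target)
    kept = []
    for score, sentence in matches:
        words = re.findall(r"[a-z]+", sentence.lower())
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            kept.append((score, sentence))
    return kept


# Invented example sentences, for illustration only.
SENTENCES = [
    "Terrorism remains a serious threat to national security.",
    "Cyber attacks pose a major threat to critical infrastructure.",
    "The armed forces will modernise their equipment.",
]
TOPIC = "Terrorism is a threat to national security."

strict = search(TOPIC, SENTENCES, threshold=0.6)  # favours precision
loose = search(TOPIC, SENTENCES, threshold=0.2)   # favours recall
cyber_only = ngram_filter(loose, "cyber attacks")
```

With these invented sentences, the stricter threshold keeps only the terrorism sentence, the looser one also admits the cyber sentence, and the n-gram filter then narrows the looser result to the cyber sentence. Because the toy encoder only counts shared words, it cannot match paraphrases; that is the gap a sentence-embedding model is meant to close.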

Enhanced Computational and Collaborative Capabilities

In an exciting development, we are expanding our project’s computational capabilities through Eleanor, the University of Edinburgh’s Cloud Service for Research. This enhancement will significantly boost our ability to conduct advanced analyses and facilitate collaboration with partners worldwide.

By leveraging cloud computing, we’re creating a flexible, accessible platform for our computational text analysis system. This setup will allow collaborators to engage with our dataset and tools without complex local installations, fostering easier partnerships and expanding the project’s reach.

By hosting our system on Eleanor, we aim to create a lasting research resource that will outlive the current project timeline and serve as a valuable legacy for future researchers. This approach will not only enhance our current research outcomes but also set a model for collaborative, cloud-based research in the field of security studies.

Examples of Analytical Outputs

To illustrate the capabilities of our methods and NLP tools, here are a few examples of insights we’ve been able to extract from the corpus:

1. Threat Diffusion: Our tools allow us to track the emergence and spread of specific threat concepts over time. For instance, we can observe that ‘terrorism’ (unsurprisingly) became a dominant threat topic across many countries’ documents following the 9/11 attacks in 2001. Similarly, we’ve tracked the rising prominence of ‘cyber threats’ and many other threat concepts in security documents.

2. Linguistic Variations in Threat Construction: Our semantic similarity search has revealed nuanced ways countries articulate threats. For example, we found a spectrum of threat qualifiers such as “serious,” “major,” “fundamental,” and “significant.” This nuance challenges binary notions of threat and non-threat.

3. Regional Variations: Our tools have helped identify regional patterns in threat perception. For instance, we found that corruption is cited as a security threat more frequently in documents from Global South countries than in those from the Global North.

4. Emerging Concepts: We’ve tracked the evolution of concepts like “climate change” in security documents. Early mentions often framed it as an environmental issue, but over time, more documents began referring to it as a “threat multiplier” with wide-ranging security implications.

5. Comparative Analysis: Our tools allow for quick comparisons between countries. For example, we found that US and UK documents tend to identify the widest range of threats, while many smaller countries focus on a narrower set of security concerns.
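As a rough illustration of how outputs such as threat diffusion over time might be tabulated, the sketch below computes the share of documents per year that match a topic. It assumes each document has already been flagged as matching or not by a similarity search. The record layout, the function name, and every value in it are invented placeholders, not results from the corpus.

```python
from collections import defaultdict

# Hypothetical records: (country, year, document matched the topic?).
# Placeholder values only; these are not findings from the dataset.
RECORDS = [
    ("A", 1999, False),
    ("B", 1999, False),
    ("A", 2003, True),
    ("B", 2003, False),
    ("C", 2003, True),
    ("C", 2010, True),
]


def share_by_year(records):
    """Return {year: fraction of that year's documents matching the topic}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _country, year, matched in records:
        totals[year] += 1
        hits[year] += int(matched)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


shares = share_by_year(RECORDS)
for year, share in shares.items():
    print(f"{year}: {share:.0%} of documents match")
```

Reporting a share rather than a raw count matters here because the number of documents published varies from year to year, so raw counts would partly reflect publication volume rather than the spread of a concept.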

For those interested in collaborative research using our advanced NLP tools, we welcome enquiries and potential partnerships.

Contributors:

Andrew W. Neal is Professor of International Security and Director of Postgraduate Research Programmes in the School of Social and Political Science at the University of Edinburgh. Professor Neal leads this ongoing project analysing national security and defence documents globally. His research focuses on critical security studies, including topics such as critical maritime infrastructure protection, parliamentary security politics, securitisation, the security implications of Scottish independence, and the works of Michel Foucault. His most recent book is Security as Politics: Beyond the State of Exception (Edinburgh UP, 2019).

Independent researcher and consultant Roy B. Gardner develops bespoke technologies for automated semantic content analysis based on machine learning and natural language processing. He is technology lead on the Comparative Constitutions Project at the University of Texas at Austin, where he works on analytical tools for comparing national constitutions, and technology lead for the Edinburgh Law School-based PeaceRep project, where he works on textual analysis of peace agreements.

Luc Wilson (1st Class MA hons in International Relations) worked as a research assistant to the PI to build our corpus and dataset.

Yuemiao Ma, a final-year PhD student at Moray House School of Education, University of Edinburgh, worked as a research assistant to prepare our dataset for publication. Her PhD research is a qualitative, ethnographic study of how citizenship education is embedded in a secondary school’s Model United Nations club activities.

Anselm Vogler generously shared documents from his own collection, contributing significantly to the breadth of our corpus. See Vogler, A. (2023). ‘Barking up the tree wrongly? How national security strategies frame climate and other environmental change as security issues.’ Political Geography, 105. https://doi.org/10.1016/j.polgeo.2023.102893

Sebastián Briones Razeto and Nicole Jenne also kindly shared documents from their collection, further enhancing the comprehensiveness of our dataset. See Razeto, S. B., & Jenne, N. (2021). ‘Security and defence policy documents: a new dataset.’ Defense & Security Analysis, 1-18. https://doi.org/10.1080/14751798.2021.1959730

Acknowledgements:

This project has been made possible through the generous support of the following funders:

1. The Scottish Council on Global Affairs (SCGA) Project, ‘A comprehensive national security corpus and dataset’, funded as an Insight Award.

2. BA/Leverhulme Small Research Grants, ‘A history of threats: mapping changing security issues and their conceptualisations in national security documents globally’, SRG23\231686.

3. Foreign, Commonwealth and Development Office project: ‘Non-Western dynamics of peace and transition management’, FCDO Project Number: 300708-144, work package on ‘National Security Strategies, emergent powers and ‘Sustaining Peace’’.

4. School of Social and Political Science Research Adaptation Fund, financed by the Scottish Funding Council.

We are grateful for their support in advancing this important research.