Better political text classification using large language models



Nicholls, Thomas ORCID: 0000-0002-6971-8614 and Culpepper, Pepper D
Better political text classification using large language models. In: Information, Redistribution and Financial Regulation conference, 30 September to 1 October 2022, Oxford. (Unpublished)


Abstract

Comparative researchers in politics are deeply interested in how political discourse on different issues is conducted across a wide range of countries, and they increasingly use computational methods to classify texts at low cost and with high accuracy. Computer scientists are rapidly developing new deep learning models for language tasks, including supervised classification, but these are not yet widely used by political scientists. Such models have the potential to improve on the accuracy of current bag-of-words methods, while also making it possible to handle non-English source texts without further work. We present an improved method for supervised classification that uses a modern transformer language model, fine-tuned on a large unlabelled corpus and combined with a final softmax layer that estimates the probability of category membership. We train the resulting model on hand-labelled data and validate it by analysing a large corpus of news articles on banking. The results show improved classification performance for English-language inputs compared with traditional computational approaches. We also demonstrate that the same classifier can be applied to non-English texts with good classification performance. We suggest that similar methods using large deep learning models are now mature enough for wider adoption by political scientists whose interests are primarily substantive rather than methodological.
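The abstract describes a transformer encoder followed by a final softmax layer that turns a document representation into category-membership probabilities. The sketch below illustrates only that last step, as a generic linear-plus-softmax classification head, not the authors' code. It assumes the fine-tuned transformer has already produced a fixed-length document embedding (not shown here); the embedding values, weights, biases, and the labels `category_a` and `category_b` are all hypothetical placeholders. The `cross_entropy` function shows the per-example loss typically minimised when training such a head on hand-labelled data.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw per-category scores into probabilities that sum to 1."""
    # Subtracting the largest logit avoids overflow in exp() without changing the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(embedding: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Compute one logit per category: w_k . x + b_k (hypothetical trained parameters)."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def predict_proba(embedding: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float],
                  labels: Sequence[str]) -> Dict[str, float]:
    """Map a document embedding (assumed to come from the encoder) to a probability per label."""
    return dict(zip(labels, softmax(linear_head(embedding, weights, bias))))


def cross_entropy(probs: Sequence[float], true_index: int) -> float:
    """Training loss for one hand-labelled example: -log p(true category)."""
    return -math.log(probs[true_index])


if __name__ == "__main__":
    # Hypothetical 3-dimensional embedding and a two-category head.
    labels = ["category_a", "category_b"]
    embedding = [0.5, -1.0, 2.0]
    weights = [[0.2, 0.1, 0.4], [-0.3, 0.5, -0.1]]
    bias = [0.0, 0.1]
    print(predict_proba(embedding, weights, bias, labels))
```

Because the softmax outputs are normalised probabilities rather than hard labels, a researcher can choose a decision threshold or aggregate expected category shares across a corpus; in a real pipeline the weights would be learned jointly with, or on top of, the fine-tuned encoder.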

Item Type: Conference or Workshop Item (Unspecified)
Uncontrolled Keywords: text-as-data, supervised classification, transformers, deep learning, multilingual analysis, political science
Depositing User: Symplectic Admin
Date Deposited: 11 Oct 2022 08:00
Last Modified: 18 Jan 2023 20:37
URI: https://livrepository.liverpool.ac.uk/id/eprint/3165225