Deep learning models for multilingual supervised political text classification

Nicholls, Thomas ORCID: 0000-0002-6971-8614 and Culpepper, Pepper D
Deep learning models for multilingual supervised political text classification. In: COMPTEXT, 2022-5-5 - 2022-5-7, Dublin. (Unpublished)

BANKLASHCOMPTEXT22-Production.pdf - Author Accepted Manuscript



Comparative computational research in politics is frequently based on large corpora of multilingual news or political speech. A common way of handling multiple languages is to machine-translate all texts into English before downstream modelling. This works well in many cases, but it adds an extra step that can introduce error. Translation via the DeepL or Google Translate APIs is also expensive for large datasets. We present a method for supervised classification of large multilingual datasets using a pre-trained multilingual transformer model. We fine-tune an XLM-RoBERTa model on a large unlabelled corpus and add a final softmax layer that estimates the probability of category membership. We then train and validate the resulting model on hand-labelled data. Non-English texts are handled directly, without producing an intermediate translated representation. We validate the method by analysing a large (N > 1M) corpus of English-, French-, and German-language news articles on banking. The classifications address aspects of the politics of post-financial-crisis banking regulation, are theoretically informed, and have complex decision boundaries. Results are compared with a conventional approach that combines machine translation with a Support Vector Machine, here using the publicly available Opus-MT translation model running on local hardware.
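The abstract describes placing a softmax layer on top of a fine-tuned XLM-RoBERTa encoder to estimate category-membership probabilities. The following is a minimal, dependency-free sketch of that classification head only, and it rests on several assumptions. The pooled encoder output is replaced by hand-made four-dimensional toy vectors, and the labels are invented. Only the head is trained here, whereas full fine-tuning would typically update the encoder weights as well. `SoftmaxHead`, the dimensions, and the learning rate are hypothetical and are not the authors' code.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxHead:
    """Hypothetical linear layer + softmax mapping a pooled text vector to class probabilities."""

    def __init__(self, dim: int, n_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random weights, zero biases.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def predict_proba(self, h: List[float]) -> List[float]:
        # Logit for class k is W[k] . h + b[k]; softmax turns logits into probabilities.
        logits = [sum(w_i * h_i for w_i, h_i in zip(row, h)) + b_k
                  for row, b_k in zip(self.W, self.b)]
        return softmax(logits)

    def train_step(self, h: List[float], label: int, lr: float) -> float:
        """One SGD step on cross-entropy loss; returns the loss before the update."""
        p = self.predict_proba(h)
        loss = -math.log(p[label])
        # Gradient of cross-entropy w.r.t. the logits is (p - one_hot(label)).
        for k in range(len(self.b)):
            g = p[k] - (1.0 if k == label else 0.0)
            self.b[k] -= lr * g
            self.W[k] = [w - lr * g * x for w, x in zip(self.W[k], h)]
        return loss


# Toy stand-ins for pooled encoder outputs of hand-labelled texts (made up for illustration).
train = [
    ([1.0, 0.2, 0.0, 0.1], 1),
    ([0.9, 0.1, 0.1, 0.0], 1),
    ([0.0, 0.1, 1.0, 0.8], 0),
    ([0.1, 0.0, 0.9, 1.0], 0),
]

head = SoftmaxHead(dim=4, n_classes=2)
for epoch in range(200):
    for h, y in train:
        head.train_step(h, y, lr=0.5)

probs = head.predict_proba([0.95, 0.15, 0.05, 0.05])
print([round(p, 3) for p in probs])
```

The head only sees the encoder's vector representation. An article in French or German therefore passes through the same head as one in English, and no translated text is produced at any point.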

Item Type: Conference or Workshop Item (Unspecified)
Uncontrolled Keywords: text-as-data, supervised classification, transformers, deep learning
Divisions: Faculty of Humanities and Social Sciences > School of the Arts
Depositing User: Symplectic Admin
Date Deposited: 17 May 2022 08:29
Last Modified: 18 Jan 2023 21:02