A survey of safety and trustworthiness of large language models through the lens of verification and validation



Huang, Xiaowei ORCID: 0000-0001-6267-0366, Ruan, Wenjie, Huang, Wei, Jin, Gaojie, Dong, Yi ORCID: 0000-0003-3047-7777, Wu, Changshun, Bensalem, Saddek, Mu, Ronghui, Qi, Yi, Zhao, Xingyu
et al. (2024) A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artificial Intelligence Review, 57 (7). 175-. ISSN 0269-2821, 1573-7462


Abstract

Large language models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and deep learning models such as convolutional neural networks as independent processes to check the alignment of their implementations against the specifications, can be integrated and further extended throughout the lifecycle of the LLMs to provide rigorous analysis to the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support the quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.

Item Type: Article
Uncontrolled Keywords: AI Safety, Trustworthy AI, Verification and Validation, Safeguarding, Large Language Models, Generative AI
Divisions: Faculty of Science & Engineering
Faculty of Science & Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 20 Sep 2024 09:21
Last Modified: 23 May 2026 08:55
DOI: 10.1007/s10462-024-10824-0
Open Access URL: https://doi.org/10.1007/s10462-024-10824-0
URI: https://livrepository.liverpool.ac.uk/id/eprint/3184632