Text-based Question Difficulty Prediction: A Systematic Review of Automatic Approaches



Alkhuzaey, Samah, Grasso, Floriana ORCID: 0000-0001-8419-6554, Payne, Terry ORCID: 0000-0002-0106-8731 and Tamma, Valentina ORCID: 0000-0002-1320-610X
(2023) Text-based Question Difficulty Prediction: A Systematic Review of Automatic Approaches. International Journal of Artificial Intelligence in Education. pp. 1-53.

Text_based_AQG_IJAIED.pdf - Author Accepted Manuscript

Abstract

Designing and constructing pedagogical tests that contain items (i.e. questions) which equitably measure various types of skills for students at different levels is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent if student evaluations are to be objective and effective. Assessment quality and validity are therefore heavily reliant on the quality of the items included in the test. Moreover, the notion of difficulty is an essential factor that can determine the overall quality of the items and the resulting tests. Thus, item difficulty prediction is extremely important in any pedagogical learning environment. Although difficulty is traditionally estimated either by experts or through pre-testing, such methods are criticised for being costly, time-consuming, subjective and difficult to scale; consequently, the use of automatic approaches as proxies for these traditional methods is gaining more and more traction. In this paper, we provide a comprehensive and systematic review of methods for the a priori prediction of question difficulty. The aims of this review are to: 1) provide an overview of the research community regarding the publication landscape; 2) explore the use of automatic, text-based prediction models; 3) summarise influential difficulty features; and 4) examine the performance of the prediction models. Supervised machine learning prediction models were found to be the approach most commonly used to overcome the limitations of traditional item calibration methods. Moreover, linguistic features were found to play a major role in the determination of item difficulty levels, and several syntactic and semantic features have been explored by researchers in this area to explain the difficulty of pedagogical assessments.
Based on these findings, a number of challenges for the item difficulty prediction community are posed, including the need for a publicly available repository of standardised datasets and further investigation into alternative feature elicitation and prediction models.

Item Type: Article
Uncontrolled Keywords: Difficulty Prediction, Assessment, Question Difficulty, Systematic Review, Machine Learning, Natural Language Processing
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 23 Aug 2023 08:10
Last Modified: 15 Mar 2024 01:33
DOI: 10.1007/s40593-023-00362-1
Related URLs:
URI: https://livrepository.liverpool.ac.uk/id/eprint/3172298