Maintaining Curated Document Databases Using a Learning to Rank Model: The ORRCA Experience



Muhammad, Iqra, Bollegala, Danushka, Coenen, Frans, Gamble, Carol, Kearney, Anna and Williamson, Paula (2020) Maintaining Curated Document Databases Using a Learning to Rank Model: The ORRCA Experience.

bcsSGAI_AI2020_IQ.pdf - Author Accepted Manuscript

Abstract

Curated Document Databases play a critical role in helping researchers find relevant articles in the available literature. One such database is the ORRCA (Online Resource for Recruitment research in Clinical trials) database, which brings together published work in the field of clinical trials recruitment research into a single searchable collection. Document databases such as ORRCA require year-on-year updating because further relevant documents become available on a continuous basis. Updating a curated database is a labour-intensive and time-consuming task. Machine learning techniques can help to automate the update process and reduce the workload needed to screen articles for inclusion. This paper presents an automated approach to updating the ORRCA document repository, based on a learning to rank model. The approach is evaluated using the documents in the ORRCA database: data from the original ORRCA systematic review was used to train the learning to rank model, and data from the ORRCA 2015 and 2017 updates was used to evaluate the performance of the model. The evaluation demonstrated that significant resource savings can be made using the proposed approach.
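
The abstract does not state which features, ranking algorithm, or evaluation metric the authors used, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a simple pointwise ranker: smoothed log-odds term weights are learned from labelled training abstracts, candidate documents are ranked by summed weight, and the screening workload is estimated as the number of top-ranked documents a reviewer would need to read to reach a target recall. All function names and the toy texts are hypothetical and are not ORRCA data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def train_term_weights(docs, labels, smoothing=1.0):
    """Learn a smoothed log-odds weight per term from labelled documents."""
    pos, neg = Counter(), Counter()
    for text, label in zip(docs, labels):
        # Count each term once per document (document frequency).
        (pos if label else neg).update(set(tokenize(text)))
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    weights = {}
    for term in set(pos) | set(neg):
        p_rel = (pos[term] + smoothing) / (n_pos + 2 * smoothing)
        p_irr = (neg[term] + smoothing) / (n_neg + 2 * smoothing)
        weights[term] = math.log(p_rel) - math.log(p_irr)
    return weights


def score(text, weights):
    """Sum the weights of the distinct known terms in a document."""
    return sum(weights.get(term, 0.0) for term in set(tokenize(text)))


def rank(candidates, weights):
    """Order candidates from most to least likely relevant."""
    return sorted(candidates, key=lambda doc: score(doc, weights), reverse=True)


def screened_for_recall(ranked_labels, target_recall=1.0):
    """Count the top-ranked documents needed to reach the target recall."""
    total = sum(ranked_labels)
    if total == 0:
        return 0
    needed = math.ceil(target_recall * total)
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return position
    return len(ranked_labels)


# Toy training set standing in for the original review (1 = included).
train_docs = [
    "recruitment strategies for clinical trials",
    "patient recruitment in randomised trials",
    "protein folding simulation methods",
    "galaxy formation simulation",
]
train_labels = [1, 1, 0, 0]

# Toy candidates standing in for a later update, with reviewer decisions.
update_docs = {
    "simulation of galaxy clusters": 0,
    "trial recruitment interventions": 1,
    "recruitment barriers in clinical trials": 1,
}

weights = train_term_weights(train_docs, train_labels)
ranked = rank(list(update_docs), weights)
ranked_labels = [update_docs[doc] for doc in ranked]
to_screen = screened_for_recall(ranked_labels, target_recall=1.0)
work_saved = 1 - to_screen / len(ranked)
```

In this toy setting the two relevant candidates rank above the irrelevant one, so a reviewer reading from the top could stop after two of the three documents. A real system would need richer features, a proper learning to rank algorithm, and a stopping rule that does not rely on knowing the labels in advance.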

Item Type: Conference or Workshop Item (Unspecified)
Uncontrolled Keywords: 4605 Data Management and Data Science, 46 Information and Computing Sciences, 4609 Information Systems, Networking and Information Technology R&D (NITRD), Machine Learning and Artificial Intelligence
Depositing User: Symplectic Admin
Date Deposited: 16 Sep 2020 10:29
Last Modified: 18 Jul 2024 17:37
DOI: 10.1007/978-3-030-63799-6_26
URI: https://livrepository.liverpool.ac.uk/id/eprint/3101268