Tick parasitism classification from noisy medical records

Neill, JO, Bollegala, D ORCID: 0000-0003-4476-7003, Radford, AD ORCID: 0000-0002-4590-1334 and Noble, PJ
(2019) Tick parasitism classification from noisy medical records.

oneill_kdh_2019.pdf - Accepted Version



Much of the health information in the medical domain comes in the form of clinical narratives. The rich semantic information contained in these notes can be modeled to make inferences that assist medical practitioners in decision making, which is particularly important under time and resource constraints. However, building such assistive tools is made difficult by the ubiquity of misspellings, unsegmented words, and morphologically complex or rare medical terms. These reduce the vocabulary coverage of the commonly used pretrained distributed word representations that are passed as input to the parametric models making such predictions. This paper presents an ensemble architecture that combines in-domain and general word embeddings to overcome these challenges, and it achieves the best performance on a binary classification task when compared with various other baselines. We demonstrate our approach in the veterinary domain on the task of identifying tick parasitism in small animals. The best model achieves 84.29% test accuracy, showing some improvement over models that use only pretrained embeddings not specifically trained for the medical sub-domain of interest.
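The abstract links two ideas: noisy notes lower the vocabulary coverage of general pretrained embeddings, and combining in-domain with general embeddings helps. It does not say how the ensemble members are built or combined. The sketch below is therefore only an illustration under stated assumptions, not the authors' architecture. It assumes each member mean-pools its own embedding table and applies a logistic classifier, and that members are fused by soft voting (an unweighted average of probabilities). Every name, vector, and weight here (`general`, `in_domain`, `member_prob`, the toy note) is hypothetical and hand-written, not trained.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A word-embedding table: token -> dense vector (toy, hand-written values).
Embeddings = Dict[str, List[float]]
# One ensemble member: its embedding table plus a linear classifier (weights, bias).
Member = Tuple[Embeddings, List[float], float]


def coverage(tokens: Sequence[str], emb: Embeddings) -> float:
    """Fraction of tokens that have a vector in the embedding vocabulary."""
    if not tokens:
        return 0.0
    return sum(t in emb for t in tokens) / len(tokens)


def union_coverage(tokens: Sequence[str], tables: Sequence[Embeddings]) -> float:
    """Fraction of tokens covered by at least one of the embedding tables."""
    if not tokens:
        return 0.0
    return sum(any(t in e for e in tables) for t in tokens) / len(tokens)


def mean_pool(tokens: Sequence[str], emb: Embeddings, dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; OOV tokens are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def member_prob(tokens: Sequence[str], member: Member) -> float:
    """P(tick parasitism) from one member: logistic regression on pooled vectors."""
    emb, weights, bias = member
    x = mean_pool(tokens, emb, len(weights))
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble_prob(tokens: Sequence[str], members: Sequence[Member]) -> float:
    """Soft-voting ensemble: unweighted mean of the members' probabilities."""
    return sum(member_prob(tokens, m) for m in members) / len(members)


# Hypothetical toy tables: the general one lacks the misspelling "tik",
# while the in-domain one lacks common words such as "found" and "on".
general: Embeddings = {"tick": [1.0, 0.0], "found": [0.0, 1.0],
                       "on": [0.0, 0.0], "ear": [0.0, 0.5]}
in_domain: Embeddings = {"tik": [1.0, 0.0], "tick": [0.9, 0.1],
                         "ixodes": [1.0, 0.0], "ear": [0.0, 0.5]}

# Hand-set (untrained) classifier parameters, for illustration only.
members: List[Member] = [(general, [3.0, 0.0], -1.0),
                         (in_domain, [3.0, 0.0], -1.0)]

note = ["tik", "found", "on", "ear"]  # noisy, misspelled clinical note
# prints 0.75 0.5 1.0
print(coverage(note, general), coverage(note, in_domain),
      union_coverage(note, [general, in_domain]))
print(round(ensemble_prob(note, members), 4))
```

In this toy case, neither table covers the whole note, but together they do. The in-domain member, which sees the misspelled "tik", assigns a higher probability than the general member. Soft voting is just one plausible fusion choice. A learned weighting or concatenating both embeddings per token would be equally valid sketches of "combining" the two sources.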

Item Type: Conference or Workshop Item (Unspecified)
Depositing User: Symplectic Admin
Date Deposited: 24 Jun 2019 15:51
Last Modified: 30 Mar 2021 14:14
URI: https://livrepository.liverpool.ac.uk/id/eprint/3047268