Critical success index or F measure to validate the accuracy of administrative healthcare data identifying epilepsy in deceased adults in Scotland



Mbizvo, Gashirai K ORCID: 0000-0002-9588-2944, Simpson, Colin R, Duncan, Susan E, Chin, Richard FM and Larner, Andrew J (2023) Critical success index or F measure to validate the accuracy of administrative healthcare data identifying epilepsy in deceased adults in Scotland. Epilepsy Research, 199. 107275.

4) Mbizvo_CSI-full_manuscript_v5_combined.docx - Author Accepted Manuscript


Abstract

Background: Diagnostic accuracy studies of administrative epilepsy data are hampered by the lack of a reliable way to rank case-ascertainment algorithms in order of accuracy, because it is difficult to know how to prioritise positive predictive value (PPV) against sensitivity (Sens). The large numbers of true negative (TN) instances frequently found in epilepsy studies inflate negative predictive value (NPV) and specificity (Spec), usually to >90%, which makes it difficult to discriminate between algorithms on the basis of these measures. This study demonstrates the complementary value of two unitary metrics that combine PPV and sensitivity: the critical success index (CSI), taken from weather forecasting, and the F measure, taken from machine learning. We reanalyse data published in a diagnostic accuracy study of administrative epilepsy mortality data in Scotland.

Method: CSI was calculated as 1/[(1/PPV) + (1/Sens) - 1]. F measure was calculated as 2 × PPV × Sens/(PPV + Sens). CSI and F values range from 0 to 1, interpreted as 0 = inaccurate prediction and 1 = perfect accuracy. The published algorithms were reanalysed using both metrics and re-ranked according to CSI to allow comparison with the original rankings.

Results: CSI scores were conservative (range 0.02-0.826), always less than or equal to the lower of the corresponding PPV (range 39-100%) and sensitivity (range 2-93%). F values were less conservative (range 0.039-0.905), sometimes higher than either PPV or sensitivity, but were always higher than CSI. Low CSI and F values occurred when there was a large difference between PPV and sensitivity, e.g. CSI was 0.02 and F was 0.039 in an instance when PPV was 100% and sensitivity was 2%. Algorithms with both high PPV and high sensitivity performed best in terms of CSI and F measure, e.g. CSI was 0.826 and F was 0.905 in an instance when PPV was 90% and sensitivity was 91%.

Conclusion: CSI or F measure can combine PPV and sensitivity into a convenient single metric that is easier to interpret and rank in terms of diagnostic accuracy than the two measures taken separately. CSI and F prioritise instances where both PPV and sensitivity are high over instances where there is a large difference between them (even if one of the two is very high), allowing diagnostic accuracy thresholds based on combined PPV and sensitivity to be determined. Therefore, CSI or F measure may be a helpful complementary metric to report alongside PPV and sensitivity in diagnostic accuracy studies of administrative epilepsy data.
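The two formulas given in the Method can be checked against the worked values quoted in the Results. The following is a minimal sketch and not the authors' code: the function names `csi` and `f_measure`, the labels in `examples`, and the conversion of percentages to proportions are assumptions made for illustration. Only the two (PPV, sensitivity) pairs quoted in the abstract are used.

```python
def csi(ppv, sens):
    """Critical success index from PPV and sensitivity, both as proportions in (0, 1]."""
    return 1.0 / ((1.0 / ppv) + (1.0 / sens) - 1.0)


def f_measure(ppv, sens):
    """F measure (harmonic mean of PPV and sensitivity), both as proportions in (0, 1]."""
    return 2.0 * ppv * sens / (ppv + sens)


# The two instances quoted in the abstract, as (PPV, Sens) proportions.
# The labels are hypothetical and are not algorithm names from the study.
examples = {
    "PPV 100%, Sens 2%": (1.00, 0.02),
    "PPV 90%, Sens 91%": (0.90, 0.91),
}

# Rank from highest to lowest CSI, mirroring the re-ranking described in the Method.
ranked = sorted(examples.items(), key=lambda item: csi(*item[1]), reverse=True)

for label, (ppv, sens) in ranked:
    print(f"{label}: CSI={csi(ppv, sens):.3f}, F={f_measure(ppv, sens):.3f}")
# prints:
# PPV 90%, Sens 91%: CSI=0.826, F=0.905
# PPV 100%, Sens 2%: CSI=0.020, F=0.039
```

In both instances CSI is no higher than the lower of PPV and sensitivity, and F is higher than CSI, which agrees with the pattern reported in the Results.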

Item Type: Article
Uncontrolled Keywords: Humans, Epilepsy, Sensitivity and Specificity, Predictive Value of Tests, Algorithms, Adult, Delivery of Health Care, Scotland
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Systems, Molecular and Integrative Biology
Depositing User: Symplectic Admin
Date Deposited: 06 Dec 2023 08:32
Last Modified: 27 Jan 2024 01:58
DOI: 10.1016/j.eplepsyres.2023.107275
URI: https://livrepository.liverpool.ac.uk/id/eprint/3177186