Machine learning-Predicting Ames mutagenicity of small molecules



Chu, Charmaine SM ORCID: 0000-0003-1051-2598, Simpson, Jack D, O'Neill, Paul M and Berry, Neil G ORCID: 0000-0003-1928-0738
(2021) Machine learning-Predicting Ames mutagenicity of small molecules. Journal of Molecular Graphics & Modelling, 109. 108011.


Abstract

In modern drug discovery, detecting a compound's potential mutagenicity is crucial. However, the traditional method of mutagenicity detection, the Ames test, is costly and time-consuming: compounds must be synthesised before they can be tested, and the results are not always accurate or reproducible. It would therefore be advantageous to develop robust in silico models that can accurately predict the mutagenicity of a compound prior to synthesis, overcoming the inadequacies of the Ames test. After curation of a previously defined compound mutagenicity library, chemical fingerprints and molecular properties were calculated for over 5000 molecules. Using 8 classification modelling algorithms, including support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGB), a total of 112 predictive models were constructed. Their performance was assessed using 10-fold cross-validation and a hold-out test set, and some of the top-performing models were further assessed using the y-randomisation approach. We found that SVM and XGB models performed well during 10-fold cross-validation (AUROC >0.90, sensitivity >0.85, specificity >0.75, balanced accuracy >0.80, Kappa >0.65) and on the test set (AUROC >0.65, sensitivity >0.65, specificity >0.60, balanced accuracy >0.65, Kappa >0.30). We also identified the molecular properties that are most influential for mutagenicity prediction when combined with chemical molecular fingerprints. Using the Class A mutagenic compounds from the Ames/QSAR International Challenge Project, we verified that our models perform better, correctly predicting more mutagens than the StarDrop Ames mutagenicity prediction and the TEST mutagenicity prediction.
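
The threshold-based metrics quoted in the abstract (sensitivity, specificity, balanced accuracy and Kappa) can all be derived from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code: the function names and the toy labels are hypothetical, mutagens are assumed to be coded as 1 and non-mutagens as 0, and both classes are assumed to be present in the true labels. AUROC is omitted because it requires ranked prediction scores rather than hard class labels.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP for binary labels (assumed: 1 = mutagen, 0 = non-mutagen)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def classification_metrics(y_true, y_pred):
    """Return sensitivity, specificity, balanced accuracy and Cohen's kappa."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    n = tp + fn + tn + fp

    sensitivity = tp / (tp + fn)  # fraction of true mutagens that were flagged
    specificity = tn / (tn + fp)  # fraction of true non-mutagens that were cleared
    balanced_accuracy = (sensitivity + specificity) / 2

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }


# Toy labels for illustration only; these are not data from the study.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Balanced accuracy and Kappa are useful alongside sensitivity and specificity because both remain informative when mutagens and non-mutagens are unevenly represented in the data.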

Item Type: Article
Uncontrolled Keywords: Machine learning, Ames, Toxicity, Random forest, Support vector machine, Extreme gradient boosting
Divisions: Faculty of Science and Engineering > School of Physical Sciences
Depositing User: Symplectic Admin
Date Deposited: 17 Nov 2021 09:27
Last Modified: 18 Jan 2023 21:24
DOI: 10.1016/j.jmgm.2021.108011
URI: https://livrepository.liverpool.ac.uk/id/eprint/3143305