Effect of Increasing the Descriptor Set on Machine Learning Prediction of Small Molecule-Based Organic Solar Cells



Zhao, Zhi-Wen; del Cueto, Marcos; Geng, Yun; and Troisi, Alessandro (ORCID: 0000-0002-5447-5648)
(2020) Effect of Increasing the Descriptor Set on Machine Learning Prediction of Small Molecule-Based Organic Solar Cells. Chemistry of Materials, 32 (18). pp. 7777-7787.

Author Accepted Manuscript

Abstract

In this work, we analyzed a data set of 566 donor/acceptor pairs drawn from recently reported organic solar cells. We explored how different descriptors affect machine learning (ML) models that predict the power conversion efficiency (PCE) of these cells. The investigated descriptors fall into two main categories: structural descriptors (topological properties) and physical descriptors (energy levels, molecular size, light absorption, and mixing properties). In line with previous observations, ML predictions are more accurate when both structural and physical descriptors are used than when only one category is used. ML predictions also improve with larger and more varied data sets. Importantly, the structural descriptors contribute the most to the ML models. Some physical properties are highly correlated with PCE but do not notably improve prediction accuracy, because they carry information already encoded in the structural descriptors. Given that the descriptors differ significantly in computational cost, the analysis presented here can guide the construction of ML models that maximize predictive power while minimizing computational cost when screening large sets of candidates.
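The point that a physical property can correlate strongly with PCE yet add little once structural descriptors are included can be illustrated with a toy sketch. Everything in it is a hypothetical assumption rather than the authors' method or data: synthetic random numbers stand in for the 566 donor/acceptor pairs, random binary bits stand in for structural (topological) descriptors, a single linear function of those bits (`phys`) stands in for a physical descriptor, and plain closed-form ridge regression (`ridge_holdout_r2`, a hypothetical helper) stands in for whatever ML models the paper actually uses. Its numbers are not results from the paper.

```python
import numpy as np

# Toy illustration only: every quantity below is synthetic and hypothetical.
rng = np.random.default_rng(0)
n_samples, n_bits = 400, 32

# Hypothetical "structural" descriptors: random binary fingerprint-like bits.
X_struct = rng.integers(0, 2, size=(n_samples, n_bits)).astype(float)
w = rng.normal(size=n_bits)
v = rng.normal(size=n_bits)

# Hypothetical "physical" descriptor: almost fully determined by the structure.
phys = X_struct @ w + 0.1 * rng.normal(size=n_samples)

# Synthetic target standing in for PCE: depends on structure beyond what `phys` captures.
pce = X_struct @ (w + v) + 0.3 * rng.normal(size=n_samples)


def ridge_holdout_r2(X, y, alpha=1e-3, n_train=300):
    """Fit closed-form ridge regression on the first n_train rows; return R^2 on the rest."""
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]
    # Center features and target on the training split so no intercept column is needed.
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    A = X_tr - x_mean
    coef = np.linalg.solve(A.T @ A + alpha * np.eye(X.shape[1]), A.T @ (y_tr - y_mean))
    pred = (X_te - x_mean) @ coef + y_mean
    return 1.0 - np.sum((y_te - pred) ** 2) / np.sum((y_te - y_te.mean()) ** 2)


# Compare the three descriptor sets on the same held-out split.
descriptor_sets = {
    "structural": X_struct,
    "physical": phys[:, None],
    "combined": np.hstack([X_struct, phys[:, None]]),
}
r2 = {name: ridge_holdout_r2(X, pce) for name, X in descriptor_sets.items()}

print(f"corr(phys, PCE) = {np.corrcoef(phys, pce)[0, 1]:.2f}")
for name, score in r2.items():
    print(f"{name:>10s}: held-out R^2 = {score:.3f}")
```

In this toy setup the structural and combined sets score almost identically and well above the physical descriptor alone, even though `phys` is clearly correlated with the target, because `phys` is itself a function of the structural bits. A single train/test split is used instead of cross-validation only to keep the sketch short; exact values depend on the random seed.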

Item Type: Article
Depositing User: Symplectic Admin
Date Deposited: 09 Nov 2020 08:35
Last Modified: 18 Jan 2023 23:22
DOI: 10.1021/acs.chemmater.0c02325
URI: https://livrepository.liverpool.ac.uk/id/eprint/3106310