Metrics for Materials Discovery



Hargreaves, Cameron (2022) Metrics for Materials Discovery. PhD thesis, University of Liverpool.

201075817_Dec22.pdf - Author Accepted Manuscript

Abstract

The vast corpus of experimental solid-state data has enabled a variety of statistical methods to be applied in high-throughput materials discovery. There are many techniques for representing a material as a numeric vector, and many investigations apply the Euclidean distance between these vectors to judge similarity. This thesis investigates applications of non-Euclidean metrics, in particular optimal transport measures, or the Earth Mover’s Distance (EMD), to quantify the similarity between two materials for use in computational workflows, with a focus on solid state electrolytes (SSEs). Chapter 1 introduces the field of lithium-conducting SSEs for use in batteries, together with a primer on some of the machine learning concepts for readers without exposure to that field. The EMD is a function which returns the minimal quantity of work required to transform one distribution into another. Given its relevance to later chapters, a tutorial on how to compute the EMD using the simplest known technique is provided. In chapter 2 the discussion around the EMD is continued, and we introduce the workflow that has been developed for quantifying the chemical similarity of materials with the Element Movers Distance (ElMD). Given the effect that minor dopants can have on physical properties, it is imperative that we use techniques that capture nuanced differences in stoichiometry between materials. The relationships between the binary compounds of the ICSD are shown to be well captured using this metric. Larger-scale maps of materials space are generated and used to explore some of the known SSE chemistries. At the beginning of the PhD, there were no substantial datasets of lithium SSEs available; chapter 3 therefore outlines the lengthy process of gathering this data. This resulted in the Liverpool ionics dataset, containing 820 entries, with 403 unique compositions having conductivities measured at room temperature.
The performance of leading composition-based property prediction models against this dataset is rigorously assessed. The resultant classification model gives a strong enough improvement over human guesswork that it may be used for screening in future studies. At present, materials datasets are disparate and scattered. In chapter 4, we use the ElMD to investigate how different metric indexing methods may be used to partition gathered datasets of compositions. This enables very fast nearest-neighbour queries, allowing the automated retrieval of similar compounds across millions of records in milliseconds. Chapter 5 introduces Percifter, a technique for characterizing crystal structures based on the principles of persistent homology (PH). PH is an increasingly popular technique in materials science for describing the topology of a crystal. Percifter seeks to improve the stability of these representations under different choices of unit cell. These similarities may be observed directly, or compared through the EMD.
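
As an illustration of the EMD as described in the abstract (the minimal work required to transform one distribution into another), the following is a minimal sketch for the special case of two histograms on the same ordered one-dimensional scale. It is not taken from the thesis: the function name emd_1d, its parameters, and the choice of the cumulative-sum method are assumptions made for illustration, and the technique used in the thesis tutorial may differ.

```python
from itertools import accumulate


def emd_1d(positions, p, q):
    """EMD between histograms p and q defined on the same 1D positions.

    Illustrative sketch only (hypothetical helper, not the thesis code).
    Assumes positions are strictly increasing and p, q have equal total mass.
    """
    if not (len(positions) == len(p) == len(q)):
        raise ValueError("positions, p and q must have the same length")
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("p and q must have the same total mass")

    # Running surplus of mass that has to be carried past each position.
    carried = list(accumulate(pi - qi for pi, qi in zip(p, q)))
    # Distance between consecutive positions.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    # Work = mass carried across each gap times the length of that gap.
    return sum(abs(c) * g for c, g in zip(carried, gaps))


# Moving one unit of mass from position 0 to position 2 costs 2 units of work.
print(emd_1d([0, 1, 2], [1, 0, 0], [0, 0, 1]))  # 2
```

In one dimension the optimal transport plan never needs to cross itself, so the total work reduces to the mass carried across each gap multiplied by the gap length. General (non-ordered) ground distances require a linear programming or network flow solver instead.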

Item Type: Thesis (PhD)
Divisions: Faculty of Science and Engineering > School of Physical Sciences
Depositing User: Symplectic Admin
Date Deposited: 16 Jun 2023 13:29
Last Modified: 16 Jun 2023 13:29
DOI: 10.17638/03170917
Supervisors:
  • Dyer, Matthew
  • Kurlin, Vitaliy
URI: https://livrepository.liverpool.ac.uk/id/eprint/3170917