Machine learning approaches for diagnosis of autoimmune disease with the T-cell receptor repertoire



Kockelbergh, Hannah
(2024) Machine learning approaches for diagnosis of autoimmune disease with the T-cell receptor repertoire PhD thesis, University of Liverpool.

201525798_Oct2024.pdf - Author Accepted Manuscript


Abstract

Genetic risk factors for some autoimmune conditions implicate T cells in disease mechanisms that are incompletely understood. The T-cell receptor (TCR) is encoded by genes that are recombined from an assortment of gene segments in the nuclei of T cells. The resulting TCR repertoire of an individual is vastly diverse, a diversity that evolved to allow binding to a wide variety of pathogens. T cell activation is initiated by TCR binding, which leads to clonal expansion. A lineage of T cells expressing the same TCR participates in an active immune response, with some cells persisting to enable immunological memory. In autoimmune disease, T cells may be involved in an immune response directed against the host's own tissues or microbiome. Next-generation sequencing has enabled vast libraries of TCRs to be sequenced, which presents a unique opportunity to better understand autoimmune disease. From a set of TCR repertoire samples, patterns associated with a condition might be identified by interpreting a machine learning classification model. However, identical TCRs are rarely shared between individuals with the same condition, and unique TCR sequences vastly outnumber samples; together, these make it difficult to identify signatures of the TCR repertoire that predict autoimmune disease status. Promising TCR repertoire classification approaches therefore consider relationships between non-identical TCR sequences. Methods that split TCR sequences into kmers (short subsequences of length k) are computationally efficient and achieve performance comparable to, and more stable than, that of deep learning. This work investigates the utility of methods that augment kmer-based representations of the TCR repertoire. Throughout, methods are evaluated on real TCR repertoire datasets, including samples from patients with coeliac disease and inflammatory bowel disease, as well as from participants with known cytomegalovirus infection history. TCR repertoires are also simulated to guide methodological development.
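A minimal sketch of a kmer representation of a repertoire, assuming each sample is given as a list of TCR amino acid sequences (for example CDR3 strings) and using k = 3. The function names, the input format and the choice of k are illustrative assumptions, not the pipeline used in the thesis. Each sequence is split into overlapping kmers, counts are pooled per sample and normalised to frequencies, and samples are aligned on a shared kmer vocabulary to give a feature matrix that a classifier could take as input.

```python
from collections import Counter


def kmers(seq: str, k: int = 3) -> list[str]:
    """Return all overlapping substrings of length k (a sliding window)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def repertoire_kmer_frequencies(seqs: list[str], k: int = 3) -> dict[str, float]:
    """Pool kmers over every sequence in one repertoire and normalise to frequencies."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(kmers(seq, k))
    total = sum(counts.values())
    return {km: n / total for km, n in counts.items()} if total else {}


def feature_matrix(samples: list[list[str]], k: int = 3) -> tuple[list[str], list[list[float]]]:
    """Build one row per sample over the union of kmers seen in any sample.

    Kmers absent from a sample get frequency 0.0, so every row has the same width.
    """
    freqs = [repertoire_kmer_frequencies(s, k) for s in samples]
    vocab = sorted(set().union(*freqs))  # iterating a dict yields its keys
    rows = [[f.get(km, 0.0) for km in vocab] for f in freqs]
    return vocab, rows


if __name__ == "__main__":
    # Two toy "repertoires" with hypothetical sequences.
    vocab, X = feature_matrix([["CASSL", "CASSF"], ["CASRL"]], k=3)
    print(vocab)
    print(X)
```

Because sequences of any length map onto the same kmer vocabulary, repertoires of different sizes become fixed-width rows; normalising counts to frequencies is one simple way to reduce the influence of sequencing depth.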
To test the hypothesis that capturing the similarity of kmers in a TCR repertoire representation will improve generalisability, a novel approach employing a reduced amino acid alphabet is compared against alternatives, providing evidence for some limited utility of property-informed kmers. For certain TCR repertoire datasets, combining kmers into broad motifs leads to performance that surpasses or is equivalent to that of a kmer model. Next, the notion that some kmers may be more informative than others motivates an exploration of deviation-based kmer filters, which indicates that adequate regularisation removes the need for filtering. When filtering is applied before alphabet reduction, a marginal improvement is observed for certain datasets. The result of this work is methodology that may not significantly improve the generalisability of TCR repertoire classification in general, but that may be valuable when applied to other variable-length protein sequences in less complex settings. While TCR repertoire classification could lead to the discovery of autoimmune disease-associated biomarkers, these conclusions suggest that further methodological development may be needed, in parallel with more detailed TCR repertoire sequencing approaches, to realise this potential.
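The reduced alphabet and kmer filtering steps can be sketched as follows. The six-group physicochemical alphabet below is a hypothetical example rather than the grouping used in the thesis, and ranking kmers by the variance of their frequency across samples is one assumed reading of a "deviation-based" filter, not necessarily the thesis's definition. Mapping sequences into a reduced alphabet lets kmers that differ only by similar amino acids collapse into the same feature.

```python
from statistics import pvariance

# Hypothetical 6-group physicochemical alphabet (illustrative assumption only).
GROUPS = {
    "h": "AVLIMC",  # hydrophobic / aliphatic
    "r": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # conformationally special
}
# Invert to a per-residue lookup covering the 20 standard amino acids.
REDUCE = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Map each amino acid to its group label; raises KeyError on non-standard residues."""
    return "".join(REDUCE[aa] for aa in seq)


def variance_filter(samples: list[dict[str, float]], keep: int) -> list[str]:
    """Return the `keep` kmers whose frequency varies most across samples.

    Each sample is a {kmer: frequency} dict; kmers missing from a sample count as 0.0.
    Ties keep alphabetical order because Python's sort is stable.
    """
    vocab = sorted(set().union(*samples))
    spread = {km: pvariance([s.get(km, 0.0) for s in samples]) for km in vocab}
    return sorted(vocab, key=lambda km: spread[km], reverse=True)[:keep]


if __name__ == "__main__":
    print(reduce_sequence("CASSL"))
    toy = [{"CAS": 0.5, "ASS": 0.5}, {"CAS": 0.5, "ASF": 0.5}]
    print(variance_filter(toy, keep=2))
```

Under this grouping, `reduce_sequence("CASSL")` gives `"hhpph"`. The two functions are independent, so kmers can be selected with `variance_filter` on the full alphabet and only then mapped with `reduce_sequence`, which mirrors the filter-then-reduce ordering described above. In practice, a penalised classifier (for example an L1- or L2-regularised logistic regression) may make such a filter unnecessary, as the abstract reports.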

Item Type: Thesis (PhD)
Divisions: Faculty of Health & Life Sciences
Faculty of Health & Life Sciences > Inst. Population Health
Depositing User: Symplectic Admin
Date Deposited: 16 Jan 2025 10:37
Last Modified: 01 Jan 2026 02:31
DOI: 10.17638/03186205
URI: https://livrepository.liverpool.ac.uk/id/eprint/3186205