Recovering individual haplotypes and a contiguous genome assembly from pooled long-read sequencing of the diamondback moth (Lepidoptera: Plutellidae)



Whiteford, Samuel, Van't Hof, Arjen E, Krishna, Ritesh, Marubbi, Thea, Widdison, Stephanie, Saccheri, Ilik J ORCID: 0000-0003-0476-2347, Guest, Marcus, Morrison, Neil I and Darby, Alistair C ORCID: 0000-0002-3786-6209
(2022) Recovering individual haplotypes and a contiguous genome assembly from pooled long-read sequencing of the diamondback moth (Lepidoptera: Plutellidae). G3: Genes, Genomes, Genetics, 12 (10). jkac210.

jkac210.pdf - Author Accepted Manuscript
Available under License Creative Commons Attribution.

Abstract

The assembly of divergent haplotypes using noisy long-read data presents a challenge to the reconstruction of haploid genome assemblies, due to overlapping distributions of technical sequencing error, intralocus genetic variation, and interlocus similarity within these data. Here, we present a comparative analysis of assembly algorithms representing overlap-layout-consensus, repeat graph, and de Bruijn graph methods. We examine how postprocessing strategies attempting to reduce redundant heterozygosity interact with the choice of initial assembly algorithm and ultimately produce a series of chromosome-level assemblies for an agricultural pest, the diamondback moth, Plutella xylostella (L.). We compare evaluation methods and show that BUSCO analyses may overestimate haplotig removal processing in long-read draft genomes, in comparison to a k-mer method. We discuss the trade-offs inherent in assembly algorithm and curation choices and suggest that "best practice" is research question dependent. We demonstrate a link between allelic divergence and allele-derived contig redundancy in final genome assemblies and document the patterns of coding and noncoding diversity between redundant sequences. We also document a link between an excess of nonsynonymous polymorphism and haplotigs that are unresolved by assembly or postassembly algorithms. Finally, we discuss how this phenomenon may have relevance for the usage of noisy long-read genome assemblies in comparative genomics.

Item Type: Article
Uncontrolled Keywords: pool-seq, haplotype, assembly, Plutella xylostella
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Infection, Veterinary and Ecological Sciences
Depositing User: Symplectic Admin
Date Deposited: 05 Sep 2022 08:18
Last Modified: 18 Jan 2023 20:46
DOI: 10.1093/g3journal/jkac210
URI: https://livrepository.liverpool.ac.uk/id/eprint/3163206