Grammar-Based Compression for RNA



Onokpasa, Evarista
(2025) Grammar-Based Compression for RNA. PhD thesis, University of Liverpool.

201524239_Apr2025.pdf - Author Accepted Manuscript


Abstract

In bioinformatics, the benefit of compression is twofold: (a) it brings practical improvements in storage requirements and sometimes in computational efficiency, and (b) it is used as a direct means of generating biological insight. In this research, we improve on the joint-RNA (ribonucleic acid) compression in [32] using arithmetic coding and refined Probabilistic Context-Free Grammars (PCFGs). Given an RNA primary sequence and a refined PCFG, a secondary structure can be predicted and its accuracy calculated; with this, we establish the correlation between the prediction quality and the compression quality of a grammar. Having observed this remarkable property of RNA grammars, we launched an automatic search for grammars with better compression and prediction quality and obtained interesting results. Finally, we fine-tune the automatic grammar search with heuristics to obtain grammars with better compression and prediction.
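
To illustrate the link between a PCFG and compressed size that the abstract relies on, the following is a minimal sketch and not the grammar or coder used in the thesis. It assumes a toy unambiguous grammar over dot-bracket structures (S -> ( S ) S | . S | empty), hypothetical rule probabilities, and uniform emission probabilities for bases and base pairs. An arithmetic coder driven by these probabilities would produce an output close to the sum of -log2(p) over the rules used in the derivation, which is the quantity the function `code_length_bits` (a name chosen here for illustration) returns.

```python
import math

# Hypothetical probabilities for the toy grammar (they sum to 1):
#   S -> ( S ) S   "pair"
#   S -> . S       "unpaired"
#   S -> (empty)   "end"
RULE_PROB = {"pair": 0.25, "unpaired": 0.5, "end": 0.25}

# Assumed uniform emissions: 4 bases, 6 canonical/wobble base pairs.
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}
BASE_PROB = 1 / 4
PAIR_PROB = 1 / len(PAIRS)


def match_brackets(struct):
    """Map each '(' index to the index of its matching ')'."""
    partner, stack = {}, []
    for i, c in enumerate(struct):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner[stack.pop()] = i
        elif c != ".":
            raise ValueError(f"unexpected symbol {c!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def code_length_bits(seq, struct):
    """Ideal code length, in bits, of (seq, struct) under the toy PCFG."""
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    partner = match_brackets(struct)
    bits = 0.0
    # Each work item is a half-open interval [i, j) that one S must derive.
    work = [(0, len(struct))]
    while work:
        i, j = work.pop()
        if i == j:
            bits -= math.log2(RULE_PROB["end"])
        elif struct[i] == ".":
            bits -= math.log2(RULE_PROB["unpaired"]) + math.log2(BASE_PROB)
            work.append((i + 1, j))
        else:  # struct[i] == "(": apply S -> ( S ) S
            k = partner[i]
            if seq[i] + seq[k] not in PAIRS:
                raise ValueError(f"non-canonical pair at {i},{k}")
            bits -= math.log2(RULE_PROB["pair"]) + math.log2(PAIR_PROB)
            work.append((k + 1, j))  # S after the closing bracket
            work.append((i + 1, k))  # S enclosed by the pair
    return bits


if __name__ == "__main__":
    print(code_length_bits("GAAAC", "(...)"))
```

Under these assumptions, a grammar whose rule probabilities assign higher probability to the observed sequence and structure yields a smaller value, which is the sense in which a better-fitting grammar compresses better.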

Item Type: Thesis (PhD)
Uncontrolled Keywords: Compression, RNA
Divisions: Faculty of Science and Engineering
Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 18 Aug 2025 09:21
Last Modified: 18 Aug 2025 09:21
DOI: 10.17638/03192139
Supervisors:
  • Wong, Prudence
  • Wild, Sebastian
URI: https://livrepository.liverpool.ac.uk/id/eprint/3192139