Using Knowledge Graphs to enhance the utility of Curated Document Databases



Muhammad, Iqra
(2022) Using Knowledge Graphs to enhance the utility of Curated Document Databases. Doctor of Philosophy thesis, University of Liverpool.

Text
201453148_Nov2022_edited_version.pdf.pdf - Author Accepted Manuscript

Abstract

The research presented in this thesis is directed at the generation, maintenance and querying of Curated Document Databases (CDDs) stored as literature knowledge graphs. Literature knowledge graphs are graphs where the vertices represent documents and concepts, and the edges provide links between concepts, and between concepts and documents. The central motivation for the work was to provide CDD administrators with a useful mechanism for creating and maintaining CDDs represented as literature knowledge graphs, and for end users to utilise them. The central research question is “What are some appropriate techniques that can be used for generating, maintaining and utilizing literature knowledge graphs to support the concept of CDDs?”. The thesis thus addresses three issues associated with literature knowledge graphs: (i) their construction, (ii) their maintenance so that their utility can be continued, and (iii) the querying of such knowledge graphs. With respect to the first issue, the Open Information Extraction for Knowledge Graph Construction (OIE4KGC) approach is proposed, founded on the idea of using open information extraction. Two open information extraction tools were compared, the RnnOIE tool and the Leolani tool. The RnnOIE tool was found to be effective for the generation of triples from clinical trial documents. With respect to the second issue, two approaches are proposed for maintaining knowledge graph represented CDDs: the CN approach and the Knowledge Graph And BERT Ranking (GRAB-Rank) approach. The first approach used a feature vector representation, and the second a unique hybrid domain-specific document embedding. The hybrid domain-specific document embedding combines a Bidirectional Encoder Representations from Transformers (BERT) embedding with a knowledge graph embedding. This proposed embedding was used for document representation in a learning to rank (LETOR) model, the idea being to rank a set of potential documents. The GRAB-Rank embedding-based LETOR approach was found to be effective. For the third issue, the standard solution is to represent both the query to be addressed and the documents in the knowledge graph in a manner that allows the documents to be ranked with respect to the query. The solution proposed was to utilize a hybrid embedding for query resolution. Two forms of hybrid embedding were utilized: (i) a Continuous Bag-Of-Words (CBOW) embedding combined with a graph embedding, and (ii) BERT and Sci-BERT embeddings combined with a graph embedding. The evaluation indicates that the CBOW embedding combined with the graph embedding was effective.
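
The hybrid embedding idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the text embedding and the graph embedding are combined, so the concatenation used here is an assumption, and the cosine-similarity ranking is a simple stand-in for the LETOR model and the query resolution procedure rather than the thesis's method. The function names and the toy vectors are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def hybrid_embedding(text_vec, graph_vec):
    """Combine a text embedding with a graph embedding.

    Assumption: the two parts are normalized and concatenated. The abstract
    only states that the embeddings are "combined".
    """
    return l2_normalize(text_vec) + l2_normalize(graph_vec)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real text (e.g. BERT or CBOW) and graph embeddings.
docs = {
    "doc_a": hybrid_embedding([1.0, 0.0], [0.9, 0.1]),
    "doc_b": hybrid_embedding([0.0, 1.0], [0.1, 0.9]),
}
query = hybrid_embedding([1.0, 0.1], [0.8, 0.2])
ranking = rank_documents(query, docs)
```

In this sketch the query and the documents share one vector space, which is what allows documents to be ranked with respect to the query. A learned ranking model would replace the fixed cosine score.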

Item Type: Thesis (Doctor of Philosophy)
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 16 Jan 2023 10:11
Last Modified: 18 Jan 2023 19:43
DOI: 10.17638/03166230
Supervisors:
URI: https://livrepository.liverpool.ac.uk/id/eprint/3166230