On the use of text classification methods for text summarisation

Garcia Constantino, Matias
On the use of text classification methods for text summarisation. Doctor of Philosophy thesis, University of Liverpool.

Thesis_MatiasGarcia.pdf (PDF, 1MB) - Submitted version
Access to this file is embargoed (end date unspecified).
After the embargo period this will be available under the Creative Commons Attribution No Derivatives licence.

GarciaConstantinoMat_July2013_12957.pdf (PDF, 1MB) - Author Accepted Manuscript
Available under the Creative Commons Attribution No Derivatives licence.


This thesis describes research work undertaken in the fields of text and questionnaire mining. More specifically, the research is directed at the use of text classification techniques to summarise the free-text part of questionnaires. In this thesis, text summarisation is conceived of as a form of text classification, in that the classes assigned to text documents can be viewed as an indication (summarisation) of the main ideas of the original free text, but in a coherent and reduced form. This type of summary is considered because conventional text summarisation techniques are not deemed effective for summarising unstructured free text such as that found in questionnaires. Four approaches to the summarisation classification of free text from different sources, focused on the free-text part of questionnaires, are described. The first approach applies standard classification techniques to text summarisation and was motivated by the desire to establish a benchmark against which the more specialised summarisation classification techniques presented later in this thesis could be compared. The second approach, called Classifier Generation Using Secondary Data (CGUSD), addresses the case where the available data is not considered sufficient for training purposes (or where no data is available at all). The third approach, called Semi-Automated Rule Summarisation Extraction Tool (SARSET), presents a semi-automated classification technique to support document summarisation classification in which domain experts are more closely involved in the classifier generation process; the idea was that this might produce more effective summaries.
The fourth is a hierarchical summarisation classification approach, which assumes that text summarisation can be achieved by associating several class labels with each document, these labels together constituting the summary. For evaluation purposes, three types of text were considered: (i) questionnaire free text, (ii) text from medical abstracts and (iii) text from news stories.
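To make the idea of "summarisation as classification" concrete, the sketch below trains a standard multinomial naive Bayes classifier (with add-one smoothing) on a handful of labelled free-text responses, assigns a class label to each new response, and treats the count of responses per label as the summary. This is only an illustration of the general idea behind the first, benchmark-style approach, not the thesis's own implementation: the choice of naive Bayes, the tokenizer, the example responses and the class labels (`teaching`, `workload`) are all hypothetical assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; a deliberately simple (hypothetical) tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class (for priors)
        self.word_counts = defaultdict(Counter)      # token frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)         # log prior
            for tok in tokenize(doc):
                # add-one smoothing: unseen tokens do not zero out the likelihood
                score += math.log((self.word_counts[label][tok] + 1)
                                  / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def summarise(model, responses):
    """The 'summary' is the number of responses assigned to each class label."""
    return Counter(model.predict(r) for r in responses)


# Hypothetical labelled questionnaire responses (invented for illustration).
train_docs = [
    "the lectures were clear and well organised",
    "great lectures, very clear explanations",
    "too much coursework and tight deadlines",
    "the workload was heavy, too many assignments",
]
train_labels = ["teaching", "teaching", "workload", "workload"]

model = NaiveBayes().fit(train_docs, train_labels)
new_responses = [
    "clear and organised lectures",
    "deadlines were too tight",
    "heavy workload this term",
]
summary = summarise(model, new_responses)
print(summary)
```

Running the script prints a label-count summary in which two of the three new responses fall under `workload` and one under `teaching`. A hierarchical variant in the spirit of the fourth approach could apply a second classifier within each top-level label, so that each response receives a path of labels rather than a single one.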

Item Type: Thesis (Doctor of Philosophy)
Additional Information: Date: 2013-07 (completed)
Uncontrolled Keywords: Text mining, text classification, text summarisation, questionnaire data mining, questionnaire, data mining
Subjects: QA75
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 11 Feb 2014 14:41
Last Modified: 16 Dec 2022 04:39
DOI: 10.17638/00012957
URI: https://livrepository.liverpool.ac.uk/id/eprint/12957