Bollegala, Danushka ORCID: 0000-0003-4476-7003, Atanasov, Vincent, Maehara, Takanori and Kawarabayashi, Ken-ichi
(2018)
ClassiNet - Predicting Missing Features for Short-Text Classification.
ACM Transactions on Knowledge Discovery from Data, 12 (5).
55:1-55:1.
Text: ClassiNet.pdf - Author Accepted Manuscript (1MB)
Abstract
Short and sparse texts such as tweets, search engine snippets, product reviews, and chat messages are abundant on the Web. Classifying such short texts into a pre-defined set of categories is a common problem that arises in various contexts, such as sentiment classification, spam detection, and information recommendation. The fundamental problem in short-text classification is feature sparseness: the lack of feature overlap between a trained model and a test instance to be classified. To overcome feature sparseness, we propose ClassiNet, a network of classifiers trained to predict missing features in a given instance. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors, each of which predicts whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex v_i in the ClassiNet, with a one-to-one correspondence between feature predictors and vertices. The weight of the directed edge e_ij connecting a vertex v_i to a vertex v_j represents the conditional probability that, given that v_i exists in an instance, v_j also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance x, we find similar features from ClassiNet that did not appear in x, and append those features to the representation of x. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that ClassiNet yields statistically significant improvements in accuracy on short-text classification tasks, without requiring external resources such as thesauri to find related features.
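The feature-expansion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: instead of trained binary feature predictors, it estimates each edge weight as an empirical conditional probability from raw co-occurrence counts, which corresponds to the plain word co-occurrence graph that ClassiNets are said to generalize. The function names, the toy corpus, the parameter `k`, and the choice to score a candidate feature by summing incoming edge weights are all assumptions made for illustration. Graph propagation is not covered.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence_graph(instances):
    """Estimate edge weights graph[i][j] ~ P(j in x | i in x) from raw counts.

    Assumption: this count-based estimate stands in for the trained
    feature predictors described in the abstract.
    """
    feature_count = Counter()
    pair_count = Counter()
    for tokens in instances:
        features = set(tokens)  # presence/absence only, ignore repeats
        feature_count.update(features)
        pair_count.update(permutations(features, 2))  # ordered pairs (i, j)
    graph = {}
    for (i, j), n_ij in pair_count.items():
        graph.setdefault(i, {})[j] = n_ij / feature_count[i]
    return graph


def expand_instance(tokens, graph, k=2):
    """Return up to k features absent from the instance, ranked by score.

    Assumption: a candidate's score is the sum of edge weights from the
    features that are present; ties are broken alphabetically.
    """
    present = set(tokens)
    scores = Counter()
    for i in present:
        for j, weight in graph.get(i, {}).items():
            if j not in present:
                scores[j] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, _ in ranked[:k]]


# Hypothetical toy corpus of tokenized short texts.
corpus = [
    ["great", "phone", "battery"],
    ["great", "battery", "life"],
    ["phone", "screen", "cracked"],
    ["battery", "life", "short"],
]
graph = build_cooccurrence_graph(corpus)
expanded = expand_instance(["great", "phone"], graph, k=2)
print(expanded)
```

The returned features would be appended to the representation of the instance before classification. Swapping the count-based estimate for per-feature classifiers trained on unlabeled instances would move the sketch closer to what the abstract describes.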
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Short-texts, Feature sparseness, Classifier networks, Text classification |
Depositing User: | Symplectic Admin |
Date Deposited: | 05 Sep 2018 06:43 |
Last Modified: | 19 Jan 2023 01:25 |
DOI: | 10.1145/3201578 |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3025869 |