ClassiNet -- Predicting Missing Features for Short-Text Classification



Bollegala, D ORCID: 0000-0003-4476-7003, Atanasov, Vincent, Maehara, Takanori and Kawarabayashi, Ken-ichi
(2018) ClassiNet -- Predicting Missing Features for Short-Text Classification. ACM Transactions on Knowledge Discovery from Data, 12 (5). pp. 1-29.

This is the latest version of this item.

TKDD.pdf - Author Accepted Manuscript

Abstract

The fundamental problem in short-text classification is feature sparseness: the lack of feature overlap between a trained model and a test instance to be classified. To overcome this problem, we propose ClassiNet, a network of classifiers trained to predict missing features in a given instance. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors, each of which predicts whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet, with a one-to-one correspondence between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex $v_j$ represents the conditional probability that, given that the feature corresponding to $v_i$ occurs in an instance, the feature corresponding to $v_j$ also occurs in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features from ClassiNet that did not appear in $\vec{x}$ and append those features to the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that ClassiNet yields statistically significant improvements in accuracy on short-text classification tasks, without requiring external resources such as thesauri to find related features.
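
The graph construction and feature expansion described in the abstract can be illustrated with a minimal sketch. It is a simplification, not the authors' method: edge weights are estimated from explicit co-occurrence counts, which corresponds to the word co-occurrence graph that ClassiNets are said to generalize, rather than from trained feature predictors. The function names (`build_cooccurrence_graph`, `expand_instance`), the `top_k` parameter, the summed-weight scoring rule, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence_graph(instances):
    """Return {(i, j): weight}, where weight estimates P(j occurs | i occurs).

    Assumption: explicit co-occurrence counts stand in for the trained
    feature predictors described in the abstract.
    """
    feature_counts = Counter()
    pair_counts = Counter()
    for instance in instances:
        features = set(instance)
        feature_counts.update(features)
        # Every ordered pair of distinct features yields one directed edge.
        pair_counts.update(permutations(features, 2))
    return {
        (i, j): count / feature_counts[i]
        for (i, j), count in pair_counts.items()
    }


def expand_instance(instance, graph, top_k=2):
    """Append up to top_k absent features, ranked by summed incoming weight.

    Assumption: summing edge weights from the present features is a
    hypothetical scoring rule chosen for this sketch.
    """
    present = set(instance)
    scores = Counter()
    for (i, j), weight in graph.items():
        if i in present and j not in present:
            scores[j] += weight
    # Break ties alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(instance) + [feature for feature, _ in ranked[:top_k]]


# Toy corpus of tokenized short texts (illustrative only).
corpus = [
    ["cheap", "flight", "paris"],
    ["cheap", "hotel", "paris"],
    ["flight", "delay"],
]
graph = build_cooccurrence_graph(corpus)
expanded = expand_instance(["flight"], graph, top_k=2)
print(expanded)
```

Note that the edge weights are asymmetric: in the toy corpus, "delay" always co-occurs with "flight", but "flight" co-occurs with "delay" only half the time. That asymmetry is why the graph is directed.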

Item Type: Article
Additional Information: Accepted to ACM TKDD
Uncontrolled Keywords: cs.CL, cs.AI, cs.CV, cs.LG
Depositing User: Symplectic Admin
Date Deposited: 23 Apr 2018 06:53
Last Modified: 19 Jan 2023 06:35
DOI: 10.1145/3201578
URI: https://livrepository.liverpool.ac.uk/id/eprint/3020457

Available Versions of this Item