Bollegala, D (ORCID: 0000-0003-4476-7003), Atanasov, Vincent, Maehara, Takanori and Kawarabayashi, Ken-ichi (2018) ClassiNet -- Predicting Missing Features for Short-Text Classification. ACM Transactions on Knowledge Discovery from Data, 12 (5). pp. 1-29.
This is the latest version of this item.
Text: TKDD.pdf - Author Accepted Manuscript (864kB)
Abstract
The fundamental problem in short-text classification is feature sparseness: the lack of feature overlap between a trained model and a test instance to be classified. To overcome this problem, we propose ClassiNet, a network of classifiers trained to predict missing features in a given instance. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors, each of which predicts whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet, with a one-to-one correspondence between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex $v_j$ represents the conditional probability that, given that the feature corresponding to $v_i$ occurs in an instance, the feature corresponding to $v_j$ also occurs in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features in the ClassiNet that do not appear in $\vec{x}$ and append those features to the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that ClassiNet yields statistically significant improvements in accuracy on short-text classification tasks, without using any external resources such as thesauri to find related features.
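The feature-expansion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: instead of trained feature predictors, it estimates each edge weight from raw co-occurrence counts, which corresponds to the explicit word co-occurrence graph that ClassiNets are said to generalize. The function names `cooccurrence_edges` and `expand`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_edges(instances):
    """Estimate w[i][j] ~ P(j occurs | i occurs) from raw counts.

    Stand-in for the learned edge weights: a real ClassiNet would use
    trained feature predictors rather than explicit co-occurrence counts.
    """
    occurs = Counter()    # number of instances containing feature i
    together = Counter()  # number of instances containing both i and j
    for tokens in instances:
        feats = set(tokens)
        occurs.update(feats)
        together.update(permutations(feats, 2))  # ordered pairs, i != j
    edges = defaultdict(dict)
    for (i, j), n_ij in together.items():
        edges[i][j] = n_ij / occurs[i]
    return edges


def expand(tokens, edges, k=2):
    """Return the k absent features most strongly linked to the present ones."""
    present = set(tokens)
    scores = Counter()
    for i in present:
        for j, w in edges.get(i, {}).items():
            if j not in present:
                scores[j] += w
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, _ in ranked[:k]]


# Hypothetical unlabeled short texts, already tokenized.
unlabeled = [
    ["cheap", "flight", "ticket"],
    ["flight", "delay", "airport"],
    ["cheap", "hotel", "ticket"],
    ["airport", "flight", "ticket"],
]
edges = cooccurrence_edges(unlabeled)
expanded = expand(["flight"], edges, k=2)
print(expanded)
```

Here the one-word instance `["flight"]` would be augmented with the features returned by `expand` before classification. The graph-propagation variant mentioned in the abstract, which reaches indirectly related features, is not covered by this sketch.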
Item Type: | Article |
---|---|
Additional Information: | Accepted to ACM TKDD |
Uncontrolled Keywords: | cs.CL, cs.AI, cs.CV, cs.LG |
Depositing User: | Symplectic Admin |
Date Deposited: | 23 Apr 2018 06:53 |
Last Modified: | 19 Jan 2023 06:35 |
DOI: | 10.1145/3201578 |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3020457 |
Available Versions of this Item
- ClassiNet -- Predicting Missing Features for Short-Text Classification. (deposited 09 Apr 2018 08:11)
- ClassiNet -- Predicting Missing Features for Short-Text Classification. (deposited 23 Apr 2018 06:53) [Currently Displayed]