Topic-Based Instance and Feature Selection in Multilabel Classification
Abstract
Multilabel learning has been studied extensively in recent years because of its many applications across domains. It aims to annotate unseen data with labels on the basis of training data, which are often high dimensional at both the instance and feature levels and often contain noisy and redundant information at both levels. As an effective data preprocessing step, instance selection and feature selection should both be performed, to find the relevant training instances for each testing instance and the relevant features for each label, respectively. However, most existing methods overlook the input-output correlation in each kind of selection, which leads to performance degradation. This article presents a formulation of multilabel learning from a topic view that exploits the dependence between features and labels in a latent topic space. Because the relationship between the input and output spaces is well captured in this space, instance and feature selection can be performed effectively there. Results from extensive experiments on various benchmarks demonstrate the effectiveness of the proposed framework.
