Active Learning With Complementary Sampling for Instructing Class-Biased Multi-Label Text Emotion Classification

摘要

High-quality corpora have been very scarce for the text emotion research. Existing corpora with multi-label emotion annotations have been either too small or too class-biased to properly support a supervised emotion learning. In this article, we propose a novel active learning method for efficiently instructing the human annotations for a less-biased and high-quality multi-label emotion corpus. Specifically, to compensate annotation for the minority-class examples, we propose a complementary sampling strategy based on unlabeled resources by measuring a probabilistic distance between the expected emotion label distribution in a temporary corpus and an uniform distribution. Qualitative evaluations are also given to the unlabeled examples, in which we evaluate the model uncertainties for multi-label emotion predictions, their syntactic representativeness for the other unlabeled examples, and their diverseness to the labeled examples, for a high-quality sampling. Through active learning, a supervised emotion classifier gets progressively improved by learning from these new examples. Experiment results suggest that by following these sampling strategies we can develop a corpus of high-quality examples with significantly relieved bias for emotion classes. Compared to the learning procedures based on traditional active learning algorithms, our learning procedure indicates the most efficient learning curve and estimates the best multi-label emotion predictions.

关键词

Uncertainty Predictive models Annotations Training data Support vector machines Data models Syntactics Active learning complementary sampling class-biased multi-label classification text emotion