Abstract

This paper proposes a novel facial expression recognition network, called the Multi-Stage and Similar Expressions Label Distribution Learning Network (MSCL). Our method is motivated by the observation that, in complex in-the-wild scenes, labels are ambiguous between similar expressions: such expressions share inherently similar features that are difficult to distinguish, and they are sometimes even mislabeled by human annotators. The proposed network consists of three modules: a multi-stage multi-branch classification network (MSB), a multi-branch label distribution learning module (MLD), and a multi-branch similarity preserving module (MSP). MSB aggregates the features of similar expressions according to its first-stage predictions; MLD uses the aggregation results of MSB to extract the label distribution among similar expressions; and MSP applies a consistency constraint to minimize the differences among the branches. The proposed model is end-to-end, and each of its modules can be integrated with existing network architectures. Our method achieves 89.44% accuracy on the RAF-DB dataset and state-of-the-art results on the AffectNet dataset, with 63.25% and 66.56% accuracy on its AffectNet-8 and AffectNet-7 subsets, respectively.
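As a rough illustration of the kind of cross-branch consistency term a module like MSP could use, the sketch below computes the mean KL divergence between each branch's softmax output and the branch-averaged distribution. This is an assumption for illustration only, not the loss defined in this paper; the function names `softmax`, `kl`, and `branch_consistency_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def branch_consistency_loss(branch_logits):
    """Hypothetical consistency term: average KL from each branch's
    predicted distribution to the mean distribution over all branches.
    It is zero when all branches agree and grows as they diverge."""
    probs = [softmax(logits) for logits in branch_logits]
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    return sum(kl(p, mean) for p in probs) / len(probs)


if __name__ == "__main__":
    agreeing = [[2.0, 0.5, -1.0], [2.0, 0.5, -1.0], [2.0, 0.5, -1.0]]
    diverging = [[2.0, 0.5, -1.0], [-1.0, 0.5, 2.0], [0.0, 3.0, 0.0]]
    print(branch_consistency_loss(agreeing))   # close to 0
    print(branch_consistency_loss(diverging))  # clearly positive
```

Minimizing such a term pushes the branches toward agreeing predictions, which is one simple way to realize "minimizing the differences between multiple branches"; the actual MSP formulation may differ.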