Abstract

Ambiguous expressions and annotator subjectivity make annotation ambiguity a serious obstacle for facial expression recognition (FER). Ambiguous annotations occur between both similar and dissimilar classes; we call these two cases ambiguity and noise, respectively. Previous state-of-the-art approaches group the two categories under the single notion of uncertainty and adopt uncertainty learning to suppress uncertain samples. However, confusing ambiguous expressions with noisy-label expressions may bias the model toward easy samples and hurt its generalization capability. To solve this problem, we propose a novel approach to mine ambiguity and noise (MAN) in FER datasets. Specifically, we design a co-division module, which divides a dataset into clean, ambiguous, and noisy-label expressions based on the consistency and inconsistency between the predictions of two networks and the given labels. To effectively learn the clean expressions, improve discriminative ability, and avoid memorizing noisy labels, a tri-regularization module applies supervised learning, mutuality learning, and unsupervised learning to the three subsets, respectively. Extensive experiments show that MAN effectively mines the real ambiguity and noise and achieves state-of-the-art performance on both synthetic noisy datasets and popular benchmarks.
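
As an illustration of the co-division idea, the following minimal sketch assigns each sample to a subset by counting how many of the two networks agree with its given label. The exact decision rule is an assumption made for illustration (two agreements give clean, one gives ambiguous, none gives noisy); the abstract states only that the division relies on consistency and inconsistency between the two predictions and the labels. The function name `co_divide` and the subset names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical subset names, ordered by number of agreeing networks (0, 1, 2).
NOISY, AMBIGUOUS, CLEAN = "noisy", "ambiguous", "clean"


def co_divide(
    pred_a: Sequence[int],
    pred_b: Sequence[int],
    labels: Sequence[int],
) -> List[str]:
    """Split samples into clean / ambiguous / noisy subsets.

    Assumed rule (not specified in the abstract):
      both networks match the label  -> clean
      exactly one network matches    -> ambiguous
      neither network matches        -> noisy
    """
    if not (len(pred_a) == len(pred_b) == len(labels)):
        raise ValueError("predictions and labels must have the same length")

    by_agreement = (NOISY, AMBIGUOUS, CLEAN)
    subsets = []
    for a, b, y in zip(pred_a, pred_b, labels):
        # Count how many of the two predicted classes equal the given label.
        agreements = int(a == y) + int(b == y)
        subsets.append(by_agreement[agreements])
    return subsets


if __name__ == "__main__":
    # Predicted class indices from two networks and the annotated labels.
    print(co_divide([0, 1, 2, 3], [0, 2, 2, 4], [0, 1, 5, 3]))
```

Under this assumed rule, each subset could then be routed to a different training objective, mirroring the supervised, mutuality, and unsupervised learning that the tri-regularization module applies to the three subsets.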