Abstract

Head pose estimation (HPE) has wide industrial applications, such as online education, human-robot interaction, and automatic manufacturing. In this article, we address two key problems in HPE based on label learning and asymmetric relation cues: 1) how to bridge the gap between the strong prediction performance of networks and the incorrectly labeled pose images in HPE datasets and 2) how to take full advantage of the adjacent pose information around the centered pose image. To tackle the first problem, we reconstruct all the incorrect labels as a two-dimensional Lorentz distribution. Instead of directly adopting the angle values as hard labels, we assign part of the probability mass (soft labels) to adjacent labels so that the network learns discriminative feature representations. To address the second problem, we reveal the asymmetric relation nature of HPE datasets. The yaw and pitch directions are assigned different weights by introducing the half width at half maximum (HWHM) of the Lorentz distribution. Compared with traditional end-to-end frameworks, the proposed one can leverage the asymmetric relation cues to predict the head pose angle in incorrect-label scenarios. Extensive experiments on two public datasets and our infrared dataset demonstrate that the proposed ARHPE network significantly outperforms other state-of-the-art approaches.
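To make the soft-label idea concrete, the following is a minimal sketch of how a two-dimensional Lorentz-shaped soft label with direction-specific widths might be built. It is an illustration only, not the ARHPE implementation: the function name `lorentz_soft_label`, the 15-degree bin grid, the width values, and the choice of a product of two one-dimensional Lorentzians as the two-dimensional form are all assumptions that the abstract does not specify.

```python
import numpy as np


def lorentz_soft_label(yaw, pitch, yaw_bins, pitch_bins, gamma_yaw, gamma_pitch):
    """Build a normalized soft label over a (pitch, yaw) grid of pose bins.

    Assumption: the 2-D distribution is modeled as the product of two 1-D
    Lorentzians, each with its own half width at half maximum (gamma).
    """
    # 1-D Lorentzian profile along each direction, peaked at the annotated angle.
    yaw_profile = 1.0 / (1.0 + ((yaw_bins - yaw) / gamma_yaw) ** 2)
    pitch_profile = 1.0 / (1.0 + ((pitch_bins - pitch) / gamma_pitch) ** 2)

    # Outer product gives a grid indexed as [pitch_index, yaw_index].
    grid = np.outer(pitch_profile, yaw_profile)

    # Normalize so the soft label is a probability distribution.
    return grid / grid.sum()


# Hypothetical discretization: pose bins every 15 degrees from -90 to 90.
bins = np.arange(-90, 91, 15, dtype=float)

# Hypothetical widths: a wider yaw profile than pitch profile, so adjacent
# yaw poses receive more probability than adjacent pitch poses.
soft = lorentz_soft_label(
    yaw=30.0, pitch=0.0,
    yaw_bins=bins, pitch_bins=bins,
    gamma_yaw=15.0, gamma_pitch=10.0,
)
```

In this sketch the annotated pose keeps the largest probability, while its neighbors share the rest. Because `gamma_yaw` and `gamma_pitch` differ, the label is asymmetric between the two directions; which direction should be weighted more heavily, and by how much, is a design choice the abstract leaves to the full method.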