Abstract

Random hand gesture authentication (RGA) allows the probe hand gesture type to differ from the registered one. While this makes RGA highly user-friendly, it also poses a significant challenge: the authentication model must distill more abstract and complex identity features. Prior work on RGA mainly uses convolution operations to capture short-term behavioral information and does not distill robust behavioral features well. In this article, we propose a novel multiscale behavior analysis network (MSBA-Net) that focuses on capturing multiscale behavioral features for RGA. It simultaneously distills short-term behavioral information and models long-term behavioral relationships, in addition to extracting physiological features of hand gestures. Because hand motion can cause interframe semantic misalignment, we also propose an efficient semantic alignment strategy to mitigate this issue, which helps extract behavioral features accurately and improves model performance. The MSBA module is plug-and-play and can be integrated into existing 2-D CNNs to yield a powerful video understanding model (MSBA-Net). Extensive experiments on the SCUT-DHGA dataset demonstrate that MSBA-Net has compelling advantages over 20 other state-of-the-art methods. The code is available at https://github.com/SCUT-BIP-Lab/MSBA-Net.

Full text