Gesture recognition algorithm based on multi-scale feature fusion in RGB-D images

Abstract

With the rapid development of sensor technology and artificial intelligence, video-based gesture recognition in the era of big data makes human-computer interaction more natural and flexible, bringing a richer interactive experience to teaching, on-board control, electronic games, and similar applications. To achieve robust recognition under illumination changes, background clutter, rapid movement, and partial occlusion, an algorithm based on multi-level feature fusion in a two-stream convolutional neural network is proposed, which includes three main steps. First, a Kinect sensor acquires RGB-D images to build a gesture database, and data augmentation is applied to the training and test sets. Then, a two-stream convolutional neural network model with multi-level feature fusion is built and trained. Experimental results show that the proposed network model can robustly track and recognize gestures: compared with the single-channel model, the average detection accuracy improves by 1.08% and the mean average precision (mAP) improves by 3.56%. The average gesture recognition rate under occlusion and varying light intensity is 93.98%. Finally, on the ASL dataset, the LaRED dataset, and the 1-miohand dataset, the recognition accuracy is satisfactory compared with other methods.
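As a rough illustration of what multi-level fusion across two streams can look like, the sketch below pools the feature maps from each level of an RGB stream and a depth stream and concatenates them into one vector. This is a minimal sketch under stated assumptions, not the network described here: the function names `global_average_pool` and `fuse_multi_level`, the use of global average pooling, and fusion by concatenation are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# A feature map is stored as channels x height x width (nested lists).
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Reduce each channel of a C x H x W feature map to its mean value."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_multi_level(
    rgb_levels: Sequence[FeatureMap],
    depth_levels: Sequence[FeatureMap],
) -> List[float]:
    """Concatenate pooled features from every level of both streams.

    Assumption: fusion is plain concatenation; the actual fusion
    operator used by the proposed network is not specified here.
    """
    if len(rgb_levels) != len(depth_levels):
        raise ValueError("both streams must expose the same number of levels")
    fused: List[float] = []
    for rgb_fmap, depth_fmap in zip(rgb_levels, depth_levels):
        fused.extend(global_average_pool(rgb_fmap))
        fused.extend(global_average_pool(depth_fmap))
    return fused
```

Pooling before concatenation lets levels with different spatial resolutions contribute to the same fused vector, which a classifier head could then consume.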

Keywords

image processing; neural nets