Abstract
Music classification, which underpins music retrieval and recommendation, has made great progress with the development of Convolutional Neural Networks (CNNs). However, CNNs cannot capture the temporal information in music audio, which limits the prediction performance of the model. To address this issue, we propose a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model that learns local spatial features with the CNN and temporal dependencies with the LSTM. In addition, the traditional softmax loss function often lacks sufficient discriminative power for music classification. We therefore propose an additive angular margin and cosine margin softmax (AACM-Softmax) loss function, which improves classification by simultaneously minimizing intra-class variance and maximizing inter-class variance through combined margin penalties. Furthermore, we combine the CNN-LSTM model with the AACM-Softmax loss to further improve classification performance by learning discriminative features that incorporate temporal dependencies. Extensive experiments on music genre and music emotion datasets show that the proposed model consistently outperforms competing models.
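As an illustrative sketch of the combined-margin idea described above (the abstract does not include an implementation, so the module name, the scale s, and the margin values m1 and m2 are assumptions), the following PyTorch-style loss applies a hypothetical additive angular margin m1 and additive cosine margin m2 to the target-class logit before scaled softmax cross-entropy, in the spirit of ArcFace/CosFace-style combined margins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedMarginSoftmaxLoss(nn.Module):
    """Hypothetical sketch of a combined additive angular margin and
    cosine margin softmax loss: the target-class logit becomes
    s * (cos(theta_y + m1) - m2), while non-target logits stay s * cos(theta)."""

    def __init__(self, feat_dim, num_classes, s=30.0, m1=0.2, m2=0.1):
        super().__init__()
        self.s, self.m1, self.m2 = s, m1, m2
        # Class weight vectors; both weights and features are L2-normalized,
        # so their dot product is the cosine of the angle between them.
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, features, labels):
        # Cosine similarity between normalized features and class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Apply the angular margin m1 and cosine margin m2 to the target class only.
        target_logit = torch.cos(theta + self.m1) - self.m2
        one_hot = F.one_hot(labels, cosine.size(1)).float()
        logits = self.s * (one_hot * target_logit + (1.0 - one_hot) * cosine)
        return F.cross_entropy(logits, labels)
```

In a setup of this kind, the loss would replace the final softmax layer of the CNN-LSTM backbone, taking its last hidden representation as `features`; the margins push each embedding closer to its own class weight vector while enlarging the angular gap to other classes.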
Affiliation: Hohai University