Abstract
Micro-video recommendation has attracted extensive research attention as micro-video sharing platforms have grown in popularity. Traditional approaches treat micro-video recommendation as a matching task and ignore the rich relationships among users and micro-videos across modalities (e.g., visual, acoustic, and textual). Recently, approaches based on graph neural networks (GNNs) have shown promising performance on this task. However, they mainly focus on homogeneous graphs, which contain only one type of node or relation, and cannot be applied to a heterogeneous graph consisting of users, micro-videos, and related multi-modal information. In this paper, we propose a novel Heterogeneous Hierarchical Feature Aggregation Network (HHFAN) for personalized micro-video recommendation. Our goal is to exploit the complex relationships among users, micro-videos, and related multi-modal information in a modality-aware Heterogeneous Information Graph (M-HIG), and thereby generate high-quality user and micro-video embeddings for recommendation. The proposed model consists of two key components: (1) At the data structure level, we build a heterogeneous graph and use a random-walk-based sampling strategy to sample neighbors for users and micro-videos. (2) At the representation learning level, we design a hierarchical feature aggregation network, comprising intra-type and inter-type feature aggregation networks, to better capture the complex structure and rich semantic information in the heterogeneous graph. We evaluate our method on two real-world datasets, and the results demonstrate that the proposed model outperforms the baseline methods.
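The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the HHFAN implementation: the abstract does not specify the walk parameters or the form of the aggregation networks, so everything here is an assumption. The names `sample_typed_neighbors` and `aggregate`, the restart probability, the toy graph, and the feature vectors are hypothetical, and mean pooling and dot-product softmax weights stand in for whatever learned intra-type and inter-type networks the paper actually uses.

```python
import math
import random
from collections import Counter, defaultdict


def sample_typed_neighbors(adj, node_type, start, walk_len=200,
                           restart_p=0.3, top_k=2, seed=0):
    """Random walk with restart from `start` (assumed sampling strategy).

    Returns the `top_k` most frequently visited nodes for each node type.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_len):
        neighbors = adj.get(current, [])
        # Jump back to the start at a dead end or with probability restart_p.
        if not neighbors or rng.random() < restart_p:
            current = start
            continue
        current = rng.choice(neighbors)
        if current != start:
            visits[current] += 1
    grouped = defaultdict(list)
    for node, _ in visits.most_common():
        bucket = grouped[node_type[node]]
        if len(bucket) < top_k:
            bucket.append(node)
    return dict(grouped)


def mean_vec(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def aggregate(features, node, typed_neighbors):
    """Two-stage aggregation (a stand-in for learned networks).

    Intra-type: mean-pool the features of the sampled neighbors of each type.
    Inter-type: combine the node's own features and the per-type vectors
    with softmax weights derived from dot products with the node's features.
    """
    own = features[node]
    type_embs = [own] + [mean_vec([features[n] for n in nodes])
                         for nodes in typed_neighbors.values()]
    scores = [sum(a * b for a, b in zip(own, emb)) for emb in type_embs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * emb[i] for w, emb in zip(weights, type_embs))
            for i in range(len(own))]


# Hypothetical toy graph: two users, two micro-videos, two modality nodes.
ADJ = {
    "u1": ["v1", "v2"], "u2": ["v1"],
    "v1": ["u1", "u2", "img1"], "v2": ["u1", "txt1"],
    "img1": ["v1"], "txt1": ["v2"],
}
NODE_TYPE = {"u1": "user", "u2": "user", "v1": "video", "v2": "video",
             "img1": "visual", "txt1": "textual"}
FEATURES = {"u1": [1.0, 0.0], "u2": [0.5, 0.5], "v1": [0.0, 1.0],
            "v2": [1.0, 1.0], "img1": [0.2, 0.8], "txt1": [0.9, 0.1]}

neighbors_u1 = sample_typed_neighbors(ADJ, NODE_TYPE, "u1")
embedding_u1 = aggregate(FEATURES, "u1", neighbors_u1)
```

Grouping the sampled neighbors by type before pooling is what separates the two stages: each type first contributes a single summary vector, and only then are the types weighted against one another.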
Affiliation: Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences