Abstract
High redundancy among keyframes is a critical issue for prior summarization methods when dealing with user-created videos. To address this issue, we present a Graph Attention Network (GAT)-adjusted Bi-directional Long Short-Term Memory (Bi-LSTM) model for unsupervised video summarization. First, the GAT is adopted to transform each frame's visual features into higher-level features via a Contextual-Features-based Transformation (CFT) mechanism. Specifically, a novel Salient-Area-Size-based spatial attention model is presented to extract frame-wise visual features, based on the observation that humans tend to focus on sizable and moving objects. Second, the higher-level visual features are integrated with semantic features processed by the Bi-LSTM to refine each frame's probability of being selected as a keyframe. Extensive experiments demonstrate that our method outperforms state-of-the-art methods.
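To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the GAT-to-Bi-LSTM flow: graph attention contextualizes per-frame features, and a Bi-LSTM followed by a sigmoid head scores each frame's keyframe probability. All names (`FrameGATLayer`, `GATBiLSTMSummarizer`), the fully connected frame graph, the single attention head, and the layer sizes are illustrative assumptions, not the paper's implementation; the Salient-Area-Size spatial attention and the semantic-feature branch are omitted.

```python
# Hedged sketch only: assumes PyTorch; the fully connected frame graph,
# single attention head, and all sizes are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameGATLayer(nn.Module):
    """Single-head graph attention over frame-level features (CFT-style)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, x):                       # x: (T, dim), one node per frame
        h = self.proj(x)                        # (T, dim)
        T = h.size(0)
        # Pairwise attention logits over a fully connected frame graph.
        pairs = torch.cat(
            [h.unsqueeze(1).expand(T, T, -1),
             h.unsqueeze(0).expand(T, T, -1)], dim=-1)   # (T, T, 2*dim)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))   # (T, T)
        alpha = F.softmax(e, dim=-1)            # attention over neighbor frames
        return alpha @ h                        # contextualized features (T, dim)

class GATBiLSTMSummarizer(nn.Module):
    """Scores each frame's probability of being selected as a keyframe."""
    def __init__(self, dim=1024, hidden=256):
        super().__init__()
        self.gat = FrameGATLayer(dim)
        self.bilstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, feats):                   # feats: (T, dim) visual features
        ctx = self.gat(feats)                   # higher-level contextual features
        seq, _ = self.bilstm(ctx.unsqueeze(0))  # (1, T, 2*hidden)
        return torch.sigmoid(self.head(seq)).squeeze(-1).squeeze(0)  # (T,)

# Usage: score 120 frames of hypothetical 1024-d CNN features.
scores = GATBiLSTMSummarizer()(torch.randn(120, 1024))
```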
Affiliation: Wuhan University