TY - GEN
T1 - Learning deep semantic attributes for user video summarization
AU - Sun, Ke
AU - Zhu, Jiasong
AU - Lei, Zhuo
AU - Hou, Xianxu
AU - Zhang, Qian
AU - Duan, Jiang
AU - Qiu, Guoping
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/8/28
Y1 - 2017/8/28
N2 - This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. First, we use a natural language processing tool to discover a set of keywords from image and text corpora to form the semantic attributes of visual contents. Second, we train a deep convolutional neural network to extract visual features and predict the semantic attributes of video segments, which enables us to represent video contents with visual and semantic features simultaneously. Third, we construct a temporally constrained video segment affinity matrix and use a partially near-duplicate image discovery technique to cluster visually and semantically consistent video frames together. These frame clusters can then be condensed to form an informative and compact summary of the video. We present experimental results showing the effectiveness of the semantic attributes in assisting the visual features for video summarization, and our technique achieves state-of-the-art performance.
AB - This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. First, we use a natural language processing tool to discover a set of keywords from image and text corpora to form the semantic attributes of visual contents. Second, we train a deep convolutional neural network to extract visual features and predict the semantic attributes of video segments, which enables us to represent video contents with visual and semantic features simultaneously. Third, we construct a temporally constrained video segment affinity matrix and use a partially near-duplicate image discovery technique to cluster visually and semantically consistent video frames together. These frame clusters can then be condensed to form an informative and compact summary of the video. We present experimental results showing the effectiveness of the semantic attributes in assisting the visual features for video summarization, and our technique achieves state-of-the-art performance.
KW - Bundling Center Clustering
KW - Deep Convolutional Neural Network
KW - Semantic Attribute
KW - Video Summarization
UR - http://www.scopus.com/inward/record.url?scp=85030251231&partnerID=8YFLogxK
U2 - 10.1109/ICME.2017.8019411
DO - 10.1109/ICME.2017.8019411
M3 - Conference Proceeding
AN - SCOPUS:85030251231
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 643
EP - 648
BT - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
PB - IEEE Computer Society
T2 - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Y2 - 10 July 2017 through 14 July 2017
ER -