TY - JOUR
T1 - Visual aesthetic understanding
T2 - Sample-specific aesthetic classification and deep activation map visualization
AU - Zhang, Chao
AU - Zhu, Ce
AU - Xu, Xun
AU - Liu, Yipeng
AU - Xiao, Jimin
AU - Tillo, Tammam
N1 - Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2018/9
Y1 - 2018/9
N2 - Image aesthetic estimation using deep learning has recently achieved great success compared with traditional methods based on hand-crafted features. Similar to recognition problems, aesthetic estimation categorizes images as visually appealing or not. Nevertheless, it is desirable to understand why certain images are visually more appealing; specifically, which part of the image contributes to the aesthetic preference. In fact, most traditional approaches adopting hand-crafted features are, to some extent, able to capture part of an image's aesthetic and content information, while few such studies have been conducted in the context of deep learning. Moreover, we observe that aesthetic ratings are ambiguous, so many examples are uncertain in aesthetic level; this causes a highly imbalanced distribution of aesthetic ratings. To tackle these issues, we propose an end-to-end convolutional neural network (CNN) model that simultaneously performs aesthetic classification and understanding. To overcome the imbalanced aesthetic ratings, a sample-specific classification method that re-weights each sample's importance is proposed. We find that dropping ambiguous images, as commonly adopted by recent deep learning models, is a special case of the sample-specific method, and that increasing the weights of non-ambiguous images positively affects performance. To understand what is learned by the deep model, global average pooling (GAP) following the last feature map is employed to generate an aesthetic activation map (AesAM) and an attribute activation map (AttAM). AesAM and AttAM respectively represent the likelihood of the aesthetic level at each spatial location and the likelihood of different attributes. In particular, AesAM mainly accounts for what is learned by the deep model. Experiments are carried out on public aesthetic datasets, and state-of-the-art performance is achieved. Thanks to the introduction of AttAM, the aesthetic preference is explainable through visualization. Finally, a simple image cropping application based on the AesAM is presented. The code and trained model will be publicly available at https://github.com/galoiszhang/AWCU.
AB - Image aesthetic estimation using deep learning has recently achieved great success compared with traditional methods based on hand-crafted features. Similar to recognition problems, aesthetic estimation categorizes images as visually appealing or not. Nevertheless, it is desirable to understand why certain images are visually more appealing; specifically, which part of the image contributes to the aesthetic preference. In fact, most traditional approaches adopting hand-crafted features are, to some extent, able to capture part of an image's aesthetic and content information, while few such studies have been conducted in the context of deep learning. Moreover, we observe that aesthetic ratings are ambiguous, so many examples are uncertain in aesthetic level; this causes a highly imbalanced distribution of aesthetic ratings. To tackle these issues, we propose an end-to-end convolutional neural network (CNN) model that simultaneously performs aesthetic classification and understanding. To overcome the imbalanced aesthetic ratings, a sample-specific classification method that re-weights each sample's importance is proposed. We find that dropping ambiguous images, as commonly adopted by recent deep learning models, is a special case of the sample-specific method, and that increasing the weights of non-ambiguous images positively affects performance. To understand what is learned by the deep model, global average pooling (GAP) following the last feature map is employed to generate an aesthetic activation map (AesAM) and an attribute activation map (AttAM). AesAM and AttAM respectively represent the likelihood of the aesthetic level at each spatial location and the likelihood of different attributes. In particular, AesAM mainly accounts for what is learned by the deep model. Experiments are carried out on public aesthetic datasets, and state-of-the-art performance is achieved. Thanks to the introduction of AttAM, the aesthetic preference is explainable through visualization. Finally, a simple image cropping application based on the AesAM is presented. The code and trained model will be publicly available at https://github.com/galoiszhang/AWCU.
KW - Aesthetic understanding
KW - Sample-specific weighting
KW - Visual aesthetic quality assessment
UR - http://www.scopus.com/inward/record.url?scp=85047650385&partnerID=8YFLogxK
U2 - 10.1016/j.image.2018.05.006
DO - 10.1016/j.image.2018.05.006
M3 - Article
AN - SCOPUS:85047650385
SN - 0923-5965
VL - 67
SP - 12
EP - 21
JO - Signal Processing: Image Communication
JF - Signal Processing: Image Communication
ER -