Visual aesthetic understanding: Sample-specific aesthetic classification and deep activation map visualization

Chao Zhang; Ce Zhu; Xun Xu; Yipeng Liu; Jimin Xiao; Tammam Tillo

doi:10.1016/j.image.2018.05.006

Chao Zhang, Ce Zhu*, Xun Xu, Yipeng Liu, Jimin Xiao, Tammam Tillo

*Corresponding author for this work

Department of Intelligent Science

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)

Abstract

Image aesthetic estimation using deep learning has achieved great success compared with traditional methods based on hand-crafted features. Similar to a recognition problem, aesthetic estimation categorizes images as visually appealing or not. Nevertheless, it is desirable to understand why certain images are more visually appealing, specifically, which parts of the image contribute to the aesthetic preference. In fact, most traditional approaches that adopt hand-crafted features are, to some extent, able to understand part of an image's aesthetic and content information, while few such studies have been conducted in the context of deep learning. Moreover, we find that aesthetic ratings are ambiguous, so many examples are uncertain in aesthetic level. This causes a highly imbalanced distribution of aesthetic ratings. To tackle these issues, we propose an end-to-end convolutional neural network (CNN) model that simultaneously performs aesthetic classification and understanding. To overcome the imbalanced aesthetic ratings, we propose a sample-specific classification method that re-weights samples’ importance. We find that dropping ambiguous images, as commonly done by recent deep learning models, is a special case of the sample-specific method, and that performance improves as the weights of the non-ambiguous images increase. To understand what the deep model learns, global average pooling (GAP) following the last feature map is employed to generate an aesthetic activation map (AesAM) and an attribute activation map (AttAM). AesAM and AttAM represent, respectively, the likelihood of the aesthetic level at each spatial location and the likelihood of different attribute information. In particular, AesAM mainly accounts for what is learned in the deep model. Experiments are carried out on public aesthetic datasets, and state-of-the-art performance is achieved. Thanks to the introduction of AttAM, the aesthetic preference is explainable by visualization. Finally, a simple application to image cropping based on AesAM is presented. The code and trained model will be publicly available at https://github.com/galoiszhang/AWCU.
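
The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is at the repository linked above); it only assumes a sample-weighted binary cross-entropy and a class-activation-style map obtained from a GAP layer followed by a linear classifier. The function names, the rating midpoint of 5.0, and the ambiguity band `delta` are hypothetical choices made for illustration.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, weights):
    """Binary cross-entropy where each sample has its own weight.

    The loss is normalized by the total weight, so samples with
    weight 0 contribute nothing at all.
    """
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    per_sample = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    return float(np.sum(weights * per_sample) / np.sum(weights))

def ambiguity_weights(mean_scores, center=5.0, delta=1.0,
                      w_ambiguous=0.0, w_clear=1.0):
    """Hypothetical weighting rule (an assumption, not from the paper):
    ratings closer than `delta` to `center` are treated as ambiguous."""
    ambiguous = np.abs(mean_scores - center) < delta
    return np.where(ambiguous, w_ambiguous, w_clear)

def activation_map(features, class_weights):
    """CAM-style map: weight each channel of the last feature map
    (shape C x H x W) by the classifier weight for one class (shape C)."""
    return np.tensordot(class_weights, features, axes=([0], [0]))

# Made-up ratings and predictions, for illustration only.
scores = np.array([2.0, 4.8, 5.3, 7.5])
labels = (scores > 5.0).astype(float)
probs = np.array([0.2, 0.6, 0.4, 0.9])

# Weight 0 on ambiguous samples behaves like dropping them.
weights = ambiguity_weights(scores)
keep = weights > 0
print(weighted_cross_entropy(probs, labels, weights))
print(weighted_cross_entropy(probs[keep], labels[keep], np.ones(keep.sum())))

# With GAP followed by a linear layer, the class score equals the
# spatial mean of the activation map.
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
w = rng.standard_normal(8)
print(w @ features.mean(axis=(1, 2)), activation_map(features, w).mean())
```

Setting `w_ambiguous` to 0 reproduces the common practice of discarding ambiguous images, which is the special case the abstract mentions; intermediate values keep those images in training with reduced influence.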

Original language	English
Pages (from-to)	12-21
Number of pages	10
Journal	Signal Processing: Image Communication
Volume	67
DOIs	https://doi.org/10.1016/j.image.2018.05.006
Publication status	Published - Sept 2018

Keywords

Aesthetic understanding
Sample-specific weighting
Visual aesthetic quality assessment

Cite this

@article{ba12b6dac2564e509bca91c5d7282c9a,

title = "Visual aesthetic understanding: Sample-specific aesthetic classification and deep activation map visualization",

abstract = "Image aesthetic estimation using deep learning has achieved great success compared with traditional methods based on hand-crafted features. Similar to a recognition problem, aesthetic estimation categorizes images as visually appealing or not. Nevertheless, it is desirable to understand why certain images are more visually appealing, specifically, which parts of the image contribute to the aesthetic preference. In fact, most traditional approaches that adopt hand-crafted features are, to some extent, able to understand part of an image's aesthetic and content information, while few such studies have been conducted in the context of deep learning. Moreover, we find that aesthetic ratings are ambiguous, so many examples are uncertain in aesthetic level. This causes a highly imbalanced distribution of aesthetic ratings. To tackle these issues, we propose an end-to-end convolutional neural network (CNN) model that simultaneously performs aesthetic classification and understanding. To overcome the imbalanced aesthetic ratings, we propose a sample-specific classification method that re-weights samples{\textquoteright} importance. We find that dropping ambiguous images, as commonly done by recent deep learning models, is a special case of the sample-specific method, and that performance improves as the weights of the non-ambiguous images increase. To understand what the deep model learns, global average pooling (GAP) following the last feature map is employed to generate an aesthetic activation map (AesAM) and an attribute activation map (AttAM). AesAM and AttAM represent, respectively, the likelihood of the aesthetic level at each spatial location and the likelihood of different attribute information. In particular, AesAM mainly accounts for what is learned in the deep model. Experiments are carried out on public aesthetic datasets, and state-of-the-art performance is achieved. Thanks to the introduction of AttAM, the aesthetic preference is explainable by visualization. Finally, a simple application to image cropping based on AesAM is presented. The code and trained model will be publicly available at https://github.com/galoiszhang/AWCU.",

keywords = "Aesthetic understanding, Sample-specific weighting, Visual aesthetic quality assessment",

author = "Chao Zhang and Ce Zhu and Xun Xu and Yipeng Liu and Jimin Xiao and Tammam Tillo",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier B.V.",

year = "2018",

month = sep,

doi = "10.1016/j.image.2018.05.006",

language = "English",

volume = "67",

pages = "12--21",

journal = "Signal Processing: Image Communication",

issn = "0923-5965",

}

TY - JOUR

T1 - Visual aesthetic understanding

T2 - Sample-specific aesthetic classification and deep activation map visualization

AU - Zhang, Chao

AU - Zhu, Ce

AU - Xu, Xun

AU - Liu, Yipeng

AU - Xiao, Jimin

AU - Tillo, Tammam

PY - 2018/9

Y1 - 2018/9

N2 - Image aesthetic estimation using deep learning has achieved great success compared with traditional methods based on hand-crafted features. Similar to a recognition problem, aesthetic estimation categorizes images as visually appealing or not. Nevertheless, it is desirable to understand why certain images are more visually appealing, specifically, which parts of the image contribute to the aesthetic preference. In fact, most traditional approaches that adopt hand-crafted features are, to some extent, able to understand part of an image's aesthetic and content information, while few such studies have been conducted in the context of deep learning. Moreover, we find that aesthetic ratings are ambiguous, so many examples are uncertain in aesthetic level. This causes a highly imbalanced distribution of aesthetic ratings. To tackle these issues, we propose an end-to-end convolutional neural network (CNN) model that simultaneously performs aesthetic classification and understanding. To overcome the imbalanced aesthetic ratings, we propose a sample-specific classification method that re-weights samples’ importance. We find that dropping ambiguous images, as commonly done by recent deep learning models, is a special case of the sample-specific method, and that performance improves as the weights of the non-ambiguous images increase. To understand what the deep model learns, global average pooling (GAP) following the last feature map is employed to generate an aesthetic activation map (AesAM) and an attribute activation map (AttAM). AesAM and AttAM represent, respectively, the likelihood of the aesthetic level at each spatial location and the likelihood of different attribute information. In particular, AesAM mainly accounts for what is learned in the deep model. Experiments are carried out on public aesthetic datasets, and state-of-the-art performance is achieved. Thanks to the introduction of AttAM, the aesthetic preference is explainable by visualization. Finally, a simple application to image cropping based on AesAM is presented. The code and trained model will be publicly available at https://github.com/galoiszhang/AWCU.

KW - Aesthetic understanding

KW - Sample-specific weighting

KW - Visual aesthetic quality assessment

UR - http://www.scopus.com/inward/record.url?scp=85047650385&partnerID=8YFLogxK

U2 - 10.1016/j.image.2018.05.006

DO - 10.1016/j.image.2018.05.006

M3 - Article

AN - SCOPUS:85047650385

SN - 0923-5965

VL - 67

SP - 12

EP - 21

JO - Signal Processing: Image Communication

JF - Signal Processing: Image Communication

ER -
