Visual aesthetic understanding: Sample-specific aesthetic classification and deep activation map visualization

Chao Zhang, Ce Zhu*, Xun Xu, Yipeng Liu, Jimin Xiao, Tammam Tillo

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

18 Citations (Scopus)


Currently image aesthetic estimation using deep learning has achieved great success compared with the traditional methods by hand-crafted features. Similar to recognition problem, aesthetic estimation categorizes images into visually appealing or not. Nevertheless, it is desirable to understand why certain images are visually more appealing, in specific, which part of the image is contributing to the aesthetic preference. In fact, most traditional approaches adopting hand-crafted feature are, to some extent, able to understand part of image's aesthetic and content information while few studies have been conducted in the context of deep learning. Moreover, we discover that aesthetic rating is ambiguous so that many examples are uncertain in aesthetic level. This has caused a highly imbalanced distribution of aesthetic ratings. To tackle all these issues, we propose an end-to-end convolutional neural network (CNN) model which simultaneously implements aesthetic classification and understanding. To overcome the imbalanced aesthetic ratings, a sample-specific classification method that re-weights samples’ importance is proposed. We find that dropping out ambiguous image, as common adopted by recent deep learning models, is a special case of the sample-specific method, and also figure out that as the weights of the non-ambiguous images increase, the performance is positively affected. In order to understand what is learned in the deep model, global average pooling (GAP) following the last feature map is employed to generate aesthetic activation map (AesAM) and attribute activation map (AttAM). AesAM and AttAM respectively represent the likelihood of aesthetic level for spatial location, and the likelihood of different attribute information. In particular, AesAM mainly accounts for what is learned in deep model. Experiments are carried out on public aesthetic datasets and state-of-the-art performance is achieved. Thanks to the introduction of AttAM, the aesthetic preference is explainable by visualization. Finally, a simple application on image cropping based on the AesAM is presented. The code and trained model will be publicly available on

Original languageEnglish
Pages (from-to)12-21
Number of pages10
JournalSignal Processing: Image Communication
Publication statusPublished - Sept 2018


  • Aesthetic understanding
  • Sample-specific weighting
  • Visual aesthetic quality assessment

Cite this