Attributes and action recognition based on convolutional neural networks and spatial pyramid VLAD encoding

Shiyang Yan*, Jeremy S. Smith, Bailing Zhang

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)


Determining human attributes and recognizing actions in still images are two related and challenging tasks in computer vision. Both often arise in fine-grained domains, where the distinctions between categories are very small. Deep Convolutional Neural Network (CNN) models have demonstrated remarkable representation learning capability in a variety of applications. However, their success in attribute and action recognition has been limited, as the potential of CNNs to capture both the global and the local information of an image remains largely unexplored. This paper proposes to tackle the problem by applying a spatial pyramid Vector of Locally Aggregated Descriptors (VLAD) encoding on top of CNN features. With region proposals generated by Edgeboxes, a compact and efficient representation of an image is produced for the subsequent prediction of attributes and classification of actions. The proposed scheme is validated with competitive results on two benchmark datasets: 90.4% mean Average Precision (mAP) on the Berkeley Attributes of People dataset and 88.5% mAP on the Stanford 40 action dataset.
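The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook size, the pyramid levels (1x1 and 2x2), the signed square-root normalization, and the random stand-in data are all assumptions chosen for illustration, and the function names `vlad` and `spatial_pyramid_vlad` are hypothetical. In practice the descriptors would be CNN features of region proposals and the codebook would come from k-means.

```python
import numpy as np

def vlad(descriptors, centers):
    """Plain VLAD: sum residuals to the nearest center, then normalize."""
    k, d = centers.shape
    out = np.zeros((k, d))
    if len(descriptors) > 0:
        # Squared Euclidean distance from every descriptor to every center.
        dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for i in range(k):
            members = descriptors[nearest == i]
            if len(members) > 0:
                out[i] = (members - centers[i]).sum(axis=0)
    v = out.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))  # signed square-root normalization (assumed)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def spatial_pyramid_vlad(descriptors, positions, centers, levels=(1, 2)):
    """Concatenate one VLAD vector per pyramid cell.

    positions: (N, 2) region centers as (x, y), scaled to [0, 1).
    """
    parts = []
    for g in levels:
        # Grid cell index of each region at this pyramid level.
        cell = np.minimum((positions * g).astype(int), g - 1)
        for row in range(g):
            for col in range(g):
                mask = (cell[:, 0] == col) & (cell[:, 1] == row)
                parts.append(vlad(descriptors[mask], centers))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))    # stand-in for CNN features of 50 proposals
pos = rng.random(size=(50, 2))      # stand-in for normalized proposal centers
codebook = rng.normal(size=(4, 8))  # stand-in for a k-means codebook
encoding = spatial_pyramid_vlad(feats, pos, codebook)
print(encoding.shape)
```

With 4 codewords of dimension 8 and 5 pyramid cells (1 + 4), the concatenated vector has 5 x 4 x 8 = 160 entries. The fixed length regardless of how many proposals an image yields is what makes this kind of representation convenient as input to a classifier.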

Original language: English
Title of host publication: Computer Vision - ACCV 2016 Workshops, ACCV 2016 International Workshops, Revised Selected Papers
Editors: Chu-Song Chen, Kai-Kuang Ma, Jiwen Lu
Publisher: Springer Verlag
Number of pages: 15
ISBN (Print): 9783319545257
Publication status: Published - 2017
Event: 13th Asian Conference on Computer Vision, ACCV 2016 - Taipei, Taiwan, Province of China
Duration: 20 Nov 2016 to 24 Nov 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10118 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 13th Asian Conference on Computer Vision, ACCV 2016
Country/Territory: Taiwan, Province of China
City: Taipei

