Traffic scene recognition based on deep CNN and VLAD spatial pyramids

Fang Yu Wu; Shi Yang Yan; Jeremy S. Smith; Bai Ling Zhang

doi:10.1109/ICMLC.2017.8107758

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

9 Citations (Scopus)

Abstract

Traffic scene recognition is an important and challenging problem in Intelligent Transportation Systems (ITS). Recently, Convolutional Neural Network (CNN) models have achieved great success in many applications, including scene classification. The remarkable representation learning capability of CNNs remains to be further explored for solving real-world problems. Vector of Locally Aggregated Descriptors (VLAD) encoding has also proved to be a powerful method for capturing global contextual information. In this paper, we attempt to solve the traffic scene recognition problem by combining the feature representation capabilities of CNNs with the VLAD encoding scheme. More specifically, the CNN features of image patches generated by a region proposal algorithm are encoded with VLAD, which yields a compact representation of the image. To capture spatial information, spatial pyramids are used to encode the CNN features. We experimented with a dataset of 10 categories of traffic scenes and obtained satisfactory categorization performance.
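
The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which CNN, region proposal algorithm, codebook size, pyramid layout, or normalization was used, so everything here is an assumption. Short lists of floats stand in for the CNN features of image patches, the codebook `centers` is given by hand rather than learned (for example with k-means), patch positions are assumed to be normalized to [0, 1), and the function names `vlad` and `pyramid_vlad` are hypothetical.

```python
import math


def nearest(centers, x):
    """Index of the codebook centre closest to descriptor x (squared Euclidean)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
    )


def vlad(descriptors, centers):
    """VLAD: sum the residuals (x - centre) per centre, concatenate, L2-normalise.

    An empty descriptor list yields an all-zero vector (assumed convention).
    """
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for x in descriptors:
        k = nearest(centers, x)
        for j in range(dim):
            acc[k][j] += x[j] - centers[k][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def pyramid_vlad(patches, centers, levels=(1, 2)):
    """Concatenate one VLAD vector per cell of each g x g grid in `levels`.

    patches: list of ((cx, cy), descriptor), with the patch centre (cx, cy)
    normalised to [0, 1). The 1x1 and 2x2 grids are an assumed layout.
    """
    out = []
    for g in levels:
        for row in range(g):
            for col in range(g):
                cell = [
                    d
                    for (cx, cy), d in patches
                    if min(int(cx * g), g - 1) == col
                    and min(int(cy * g), g - 1) == row
                ]
                out.extend(vlad(cell, centers))
    return out
```

With K codebook centres of dimension D and the assumed 1x1 plus 2x2 grids, the final vector has 5 * K * D entries; such a vector would then be passed to a classifier, which the abstract does not specify.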

Original language	English
Title of host publication	Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	156-161
Number of pages	6
ISBN (Electronic)	9781538604069
DOIs	https://doi.org/10.1109/ICMLC.2017.8107758
Publication status	Published - 14 Nov 2017
Event	16th International Conference on Machine Learning and Cybernetics, ICMLC 2017 - Ningbo, China Duration: 9 Jul 2017 → 12 Jul 2017

Publication series

Name	Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017
Volume	1

Conference

Conference	16th International Conference on Machine Learning and Cybernetics, ICMLC 2017
Country/Territory	China
City	Ningbo
Period	9/07/17 → 12/07/17

Keywords

Convolutional Neural Network
Traffic scene recognition
Vector of Locally Aggregated Descriptors encoding

Access to Document

10.1109/ICMLC.2017.8107758

Cite this

Wu, F. Y., Yan, S. Y., Smith, J. S., & Zhang, B. L. (2017). Traffic scene recognition based on deep CNN and VLAD spatial pyramids. In Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017 (pp. 156-161). (Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017; Vol. 1). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICMLC.2017.8107758

Wu, Fang Yu ; Yan, Shi Yang ; Smith, Jeremy S. et al. / Traffic scene recognition based on deep CNN and VLAD spatial pyramids. Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 156-161 (Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017).

@inproceedings{cbd2b2760ceb4973bf58deb07212f215,

title = "Traffic scene recognition based on deep CNN and VLAD spatial pyramids",

abstract = "Traffic scene recognition is an important and challenging problem in Intelligent Transportation Systems (ITS). Recently, Convolutional Neural Network (CNN) models have achieved great success in many applications, including scene classification. The remarkable representation learning capability of CNNs remains to be further explored for solving real-world problems. Vector of Locally Aggregated Descriptors (VLAD) encoding has also proved to be a powerful method for capturing global contextual information. In this paper, we attempt to solve the traffic scene recognition problem by combining the feature representation capabilities of CNNs with the VLAD encoding scheme. More specifically, the CNN features of image patches generated by a region proposal algorithm are encoded with VLAD, which yields a compact representation of the image. To capture spatial information, spatial pyramids are used to encode the CNN features. We experimented with a dataset of 10 categories of traffic scenes and obtained satisfactory categorization performance.",

keywords = "Convolutional Neural Network, Traffic scene recognition, Vector of Locally Aggregated Descriptors encoding",

author = "Wu, {Fang Yu} and Yan, {Shi Yang} and Smith, {Jeremy S.} and Zhang, {Bai Ling}",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 16th International Conference on Machine Learning and Cybernetics, ICMLC 2017 ; Conference date: 09-07-2017 Through 12-07-2017",

year = "2017",

month = nov,

day = "14",

doi = "10.1109/ICMLC.2017.8107758",

language = "English",

series = "Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "156--161",

booktitle = "Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017",

}

Wu, FY, Yan, SY, Smith, JS & Zhang, BL 2017, Traffic scene recognition based on deep CNN and VLAD spatial pyramids. in Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017. Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017, vol. 1, Institute of Electrical and Electronics Engineers Inc., pp. 156-161, 16th International Conference on Machine Learning and Cybernetics, ICMLC 2017, Ningbo, China, 9/07/17. https://doi.org/10.1109/ICMLC.2017.8107758

Traffic scene recognition based on deep CNN and VLAD spatial pyramids. / Wu, Fang Yu; Yan, Shi Yang; Smith, Jeremy S. et al.
Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 156-161 (Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017; Vol. 1).

TY - GEN

T1 - Traffic scene recognition based on deep CNN and VLAD spatial pyramids

AU - Wu, Fang Yu

AU - Yan, Shi Yang

AU - Smith, Jeremy S.

AU - Zhang, Bai Ling

PY - 2017/11/14

Y1 - 2017/11/14

N2 - Traffic scene recognition is an important and challenging problem in Intelligent Transportation Systems (ITS). Recently, Convolutional Neural Network (CNN) models have achieved great success in many applications, including scene classification. The remarkable representation learning capability of CNNs remains to be further explored for solving real-world problems. Vector of Locally Aggregated Descriptors (VLAD) encoding has also proved to be a powerful method for capturing global contextual information. In this paper, we attempt to solve the traffic scene recognition problem by combining the feature representation capabilities of CNNs with the VLAD encoding scheme. More specifically, the CNN features of image patches generated by a region proposal algorithm are encoded with VLAD, which yields a compact representation of the image. To capture spatial information, spatial pyramids are used to encode the CNN features. We experimented with a dataset of 10 categories of traffic scenes and obtained satisfactory categorization performance.

AB - Traffic scene recognition is an important and challenging problem in Intelligent Transportation Systems (ITS). Recently, Convolutional Neural Network (CNN) models have achieved great success in many applications, including scene classification. The remarkable representation learning capability of CNNs remains to be further explored for solving real-world problems. Vector of Locally Aggregated Descriptors (VLAD) encoding has also proved to be a powerful method for capturing global contextual information. In this paper, we attempt to solve the traffic scene recognition problem by combining the feature representation capabilities of CNNs with the VLAD encoding scheme. More specifically, the CNN features of image patches generated by a region proposal algorithm are encoded with VLAD, which yields a compact representation of the image. To capture spatial information, spatial pyramids are used to encode the CNN features. We experimented with a dataset of 10 categories of traffic scenes and obtained satisfactory categorization performance.

KW - Convolutional Neural Network

KW - Traffic scene recognition

KW - Vector of Locally Aggregated Descriptors encoding

UR - http://www.scopus.com/inward/record.url?scp=85042518576&partnerID=8YFLogxK

U2 - 10.1109/ICMLC.2017.8107758

DO - 10.1109/ICMLC.2017.8107758

M3 - Conference Proceeding

AN - SCOPUS:85042518576

T3 - Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017

SP - 156

EP - 161

BT - Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 16th International Conference on Machine Learning and Cybernetics, ICMLC 2017

Y2 - 9 July 2017 through 12 July 2017

ER -

Wu FY, Yan SY, Smith JS, Zhang BL. Traffic scene recognition based on deep CNN and VLAD spatial pyramids. In Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 156-161. (Proceedings of 2017 International Conference on Machine Learning and Cybernetics, ICMLC 2017). doi: 10.1109/ICMLC.2017.8107758
