Multilabel Image Classification with Regional Latent Semantic Dependencies

Junjie Zhang; Qi Wu; Chunhua Shen; Jian Zhang; Jianfeng Lu

doi:10.1109/TMM.2018.2812605

Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

172 Citations (Scopus)

Abstract

Deep convolutional neural networks (CNNs) have demonstrated strong performance on single-label image classification, and progress has also been made in applying CNN methods to multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to multilabel image classification exploit the label dependencies in an image at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are then sent to recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images. We also set up an upper bound model (RLSD+ft-RPN) that uses bounding-box coordinates during training, and the experimental results show that our RLSD can approach the upper bound without using the bounding-box annotations, which is more realistic in real-world settings.

Original language	English
Article number	8310600
Pages (from-to)	2801-2813
Number of pages	13
Journal	IEEE Transactions on Multimedia
Volume	20
Issue number	10
DOIs	https://doi.org/10.1109/TMM.2018.2812605
Publication status	Published - Oct 2018
Externally published	Yes

Keywords

deep neural network
Multilabel image classification
semantic dependence

Cite this

@article{8e92081e9bb3497a814b59b46e57a215,

title = "Multilabel Image Classification with Regional Latent Semantic Dependencies",

abstract = "Deep convolution neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and various progress also has been made to apply CNN methods on multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to the multilabel image classification exploit the label dependencies in an image, at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The utilized model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are further sent to the recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images. Also, we set up an upper bound model (RLSD+ft-RPN) using bounding-box coordinates during training, and the experimental results also show that our RLSD can approach the upper bound without using the bounding-box annotations, which is more realistic in the real world.",

keywords = "deep neural network, Multilabel image classification, semantic dependence",

author = "Junjie Zhang and Qi Wu and Chunhua Shen and Jian Zhang and Jianfeng Lu",

note = "Publisher Copyright: {\textcopyright} 1999-2012 IEEE.",

year = "2018",

month = oct,

doi = "10.1109/TMM.2018.2812605",

language = "English",

volume = "20",

pages = "2801--2813",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

number = "10",

}

TY - JOUR

T1 - Multilabel Image Classification with Regional Latent Semantic Dependencies

AU - Zhang, Junjie

AU - Wu, Qi

AU - Shen, Chunhua

AU - Zhang, Jian

AU - Lu, Jianfeng

PY - 2018/10

Y1 - 2018/10

N2 - Deep convolution neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and various progress also has been made to apply CNN methods on multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to the multilabel image classification exploit the label dependencies in an image, at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The utilized model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are further sent to the recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images. Also, we set up an upper bound model (RLSD+ft-RPN) using bounding-box coordinates during training, and the experimental results also show that our RLSD can approach the upper bound without using the bounding-box annotations, which is more realistic in the real world.

AB - Deep convolution neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and various progress also has been made to apply CNN methods on multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to the multilabel image classification exploit the label dependencies in an image, at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The utilized model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are further sent to the recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images. Also, we set up an upper bound model (RLSD+ft-RPN) using bounding-box coordinates during training, and the experimental results also show that our RLSD can approach the upper bound without using the bounding-box annotations, which is more realistic in the real world.

KW - deep neural network

KW - Multilabel image classification

KW - semantic dependence

UR - http://www.scopus.com/inward/record.url?scp=85043450655&partnerID=8YFLogxK

U2 - 10.1109/TMM.2018.2812605

DO - 10.1109/TMM.2018.2812605

M3 - Article

AN - SCOPUS:85043450655

SN - 1520-9210

VL - 20

SP - 2801

EP - 2813

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

IS - 10

M1 - 8310600

ER -
