Human Detection Aided by Deeply Learned Semantic Masks

Xinyu Wang; Chunhua Shen; Hanxi Li; Shugong Xu

doi:10.1109/TCSVT.2019.2924912

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Human detection is one of the long-standing computer vision tasks, and it has been a cornerstone for many real-world applications, such as photo album organization, video surveillance, and autonomous driving. Benefiting from deep learning technologies such as convolutional neural networks, modern object detectors have achieved much improved accuracy in generic object detection tasks. In this paper, we aim to improve deep learning-based human detection. Our main idea is to exploit semantic context information for human detection by using deeply learned semantic features provided by semantic segmentation masks. The segmentation masks act as an attention mechanism, forcing the detectors to focus on the image regions where potential object candidates are likely to appear. Meanwhile, the extra segmentation mask channel can also guide the convolutional kernels to automatically learn more discriminative features, which make it easier to distinguish the foreground from the background. We implement our methods with two popular detection frameworks, Faster R-CNN and SSD, and experimentally analyze the effectiveness of the proposed methods. Evaluation results are provided on the widely used MS-COCO dataset and the very recent CrowdHuman dataset. Our proposed methods outperform the baseline detectors and achieve better performance on highly occluded human detection.
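The abstract above gives the segmentation mask two roles: a spatial attention signal that emphasizes likely-person regions, and an extra input channel that the convolutional kernels can learn from. The plain-Python sketch below illustrates both ideas on a toy channel-first feature map. It is a hypothetical illustration under stated assumptions, not the authors' implementation: the function names `concat_mask_channel` and `mask_attention`, the `floor` parameter, and the nested-list layout are all invented for this example. A real detector would obtain the mask from a learned segmentation network and operate on CNN tensors.

```python
from typing import List

# Hypothetical toy layout (assumption): features[c][y][x], channel-first.
FeatureMap = List[List[List[float]]]
# mask[y][x] in [0, 1], e.g. a per-pixel "person" probability from a segmenter.
Mask = List[List[float]]


def _check_shapes(features: FeatureMap, mask: Mask) -> None:
    """Raise if any feature channel does not match the mask's height/width."""
    h, w = len(mask), len(mask[0])
    for ch in features:
        if len(ch) != h or any(len(row) != w for row in ch):
            raise ValueError("feature map and mask must share spatial size")


def concat_mask_channel(features: FeatureMap, mask: Mask) -> FeatureMap:
    """Role 1 (hypothetical): append the mask as one extra channel, C -> C + 1."""
    _check_shapes(features, mask)
    copied = [[row[:] for row in ch] for ch in features]
    return copied + [[row[:] for row in mask]]


def mask_attention(features: FeatureMap, mask: Mask, floor: float = 0.0) -> FeatureMap:
    """Role 2 (hypothetical): reweight every channel pixel-wise by the mask.

    The weight is floor + (1 - floor) * mask, so floor > 0 keeps some
    background signal instead of zeroing it out entirely.
    """
    _check_shapes(features, mask)
    return [
        [
            [v * (floor + (1.0 - floor) * m) for v, m in zip(f_row, m_row)]
            for f_row, m_row in zip(ch, mask)
        ]
        for ch in features
    ]


features = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel, 2 x 2
mask = [[1.0, 0.0], [0.5, 1.0]]
print(mask_attention(features, mask))  # [[[1.0, 0.0], [1.5, 4.0]]]
```

With `floor=0.0` the mask acts as a hard gate, whereas a positive `floor` only damps background activations; which behaviour is preferable is a design choice this sketch leaves open.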

Original language	English
Article number	8746171
Pages (from-to)	2663-2673
Number of pages	11
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	30
Issue number	8
DOIs	https://doi.org/10.1109/TCSVT.2019.2924912
Publication status	Published - Aug 2020
Externally published	Yes

Keywords

convolutional neural network (CNN)
fully convolutional network
Human detection
instance segmentation
object detection
semantic segmentation

Cite this

@article{a7ab5fb6d19e4eb79f515edd9d7d14b2,

title = "Human Detection Aided by Deeply Learned Semantic Masks",

abstract = "Human detection is one of the long-standing computer vision tasks, and it has been a cornerstone for many real-world applications, such as photo album organization, video surveillance, and autonomous driving. Benefiting from deep learning technologies such as convolutional neural networks, modern object detectors have achieved much improved accuracy in generic object detection tasks. In this paper, we aim to improve deep learning-based human detection. Our main idea is to exploit semantic context information for human detection by using deeply learned semantic features provided by semantic segmentation masks. The segmentation masks act as an attention mechanism, forcing the detectors to focus on the image regions where potential object candidates are likely to appear. Meanwhile, the extra segmentation mask channel can also guide the convolutional kernels to automatically learn more discriminative features, which make it easier to distinguish the foreground from the background. We implement our methods with two popular detection frameworks, Faster R-CNN and SSD, and experimentally analyze the effectiveness of the proposed methods. Evaluation results are provided on the widely used MS-COCO dataset and the very recent CrowdHuman dataset. Our proposed methods outperform the baseline detectors and achieve better performance on highly occluded human detection.",

keywords = "convolutional neural network (CNN), fully convolutional network, Human detection, instance segmentation, object detection, semantic segmentation",

author = "Xinyu Wang and Chunhua Shen and Hanxi Li and Shugong Xu",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2020",

month = aug,

doi = "10.1109/TCSVT.2019.2924912",

language = "English",

volume = "30",

pages = "2663--2673",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

number = "8",

}

TY  - JOUR
T1  - Human Detection Aided by Deeply Learned Semantic Masks
AU  - Wang, Xinyu
AU  - Shen, Chunhua
AU  - Li, Hanxi
AU  - Xu, Shugong
PY  - 2020/8
Y1  - 2020/8
N2  - Human detection is one of the long-standing computer vision tasks, and it has been a cornerstone for many real-world applications, such as photo album organization, video surveillance, and autonomous driving. Benefiting from deep learning technologies such as convolutional neural networks, modern object detectors have achieved much improved accuracy in generic object detection tasks. In this paper, we aim to improve deep learning-based human detection. Our main idea is to exploit semantic context information for human detection by using deeply learned semantic features provided by semantic segmentation masks. The segmentation masks act as an attention mechanism, forcing the detectors to focus on the image regions where potential object candidates are likely to appear. Meanwhile, the extra segmentation mask channel can also guide the convolutional kernels to automatically learn more discriminative features, which make it easier to distinguish the foreground from the background. We implement our methods with two popular detection frameworks, Faster R-CNN and SSD, and experimentally analyze the effectiveness of the proposed methods. Evaluation results are provided on the widely used MS-COCO dataset and the very recent CrowdHuman dataset. Our proposed methods outperform the baseline detectors and achieve better performance on highly occluded human detection.
AB  - Human detection is one of the long-standing computer vision tasks, and it has been a cornerstone for many real-world applications, such as photo album organization, video surveillance, and autonomous driving. Benefiting from deep learning technologies such as convolutional neural networks, modern object detectors have achieved much improved accuracy in generic object detection tasks. In this paper, we aim to improve deep learning-based human detection. Our main idea is to exploit semantic context information for human detection by using deeply learned semantic features provided by semantic segmentation masks. The segmentation masks act as an attention mechanism, forcing the detectors to focus on the image regions where potential object candidates are likely to appear. Meanwhile, the extra segmentation mask channel can also guide the convolutional kernels to automatically learn more discriminative features, which make it easier to distinguish the foreground from the background. We implement our methods with two popular detection frameworks, Faster R-CNN and SSD, and experimentally analyze the effectiveness of the proposed methods. Evaluation results are provided on the widely used MS-COCO dataset and the very recent CrowdHuman dataset. Our proposed methods outperform the baseline detectors and achieve better performance on highly occluded human detection.
KW  - convolutional neural network (CNN)
KW  - fully convolutional network
KW  - Human detection
KW  - instance segmentation
KW  - object detection
KW  - semantic segmentation
UR  - http://www.scopus.com/inward/record.url?scp=85089542685&partnerID=8YFLogxK
U2  - 10.1109/TCSVT.2019.2924912
DO  - 10.1109/TCSVT.2019.2924912
M3  - Article
AN  - SCOPUS:85089542685
SN  - 1051-8215
VL  - 30
SP  - 2663
EP  - 2673
JO  - IEEE Transactions on Circuits and Systems for Video Technology
JF  - IEEE Transactions on Circuits and Systems for Video Technology
IS  - 8
M1  - 8746171
ER  - 