Attentive region embedding network for zero-shot learning

Guo Sen Xie; Li Liu; Xiaobo Jin; Fan Zhu; Zheng Zhang; Jie Qin; Yazhou Yao; Ling Shao

doi:10.1109/CVPR.2019.00961

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

253 Citations (Scopus)

Abstract

Zero-shot learning (ZSL) aims to classify images from unseen categories using only images from seen classes as training data. Existing ZSL works mainly leverage global features, or learn global regions, from which they construct embeddings into the semantic space. However, few of them study the discriminative power of local image regions (parts). In some sense these regions correspond to semantic attributes, they are more discriminative than attributes, and they can therefore assist semantic transfer between seen and unseen classes. In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches: the attentive region embedding (ARE) stream and the attentive compressed second-order embedding (ACSE) stream. ARE discovers multiple part regions under the guidance of attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed to suppress redundant attention regions (such as background). To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into AREN. In comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performance under the ZSL setting and compelling results under the generalized ZSL setting.
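The abstract describes the ARE stream only at a high level. As a rough illustration of the generic ideas it names, here is a minimal plain-Python sketch: attention-weighted pooling of local features into part features, a thresholding step that drops low-attention locations, a linear embedding into the attribute (semantic) space, a softmax compatibility loss over seen classes, and zero-shot prediction by highest compatibility with unseen-class attributes. This is not the authors' implementation. The mean-based threshold rule, averaging the parts before a single projection matrix `W`, and all function names are hypothetical choices made for illustration, and the second-order ACSE stream is not sketched.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def adaptive_threshold(att: Sequence[float]) -> Vector:
    # Hypothetical rule (an assumption, not the paper's): zero out locations
    # whose attention is below the map's mean, then renormalize. The maximum
    # is always >= the mean, so at least one location survives.
    mean = sum(att) / len(att)
    kept = [a if a >= mean else 0.0 for a in att]
    total = sum(kept)
    return [k / total for k in kept]


def region_embeddings(features: Sequence[Sequence[float]],
                      att_logits: Sequence[Sequence[float]]) -> List[Vector]:
    # features: R local feature vectors (one per spatial location), dim D.
    # att_logits: K rows of R scores, one row per (hypothetical) part map.
    # Each part feature is an attention-weighted sum of local features.
    dim = len(features[0])
    parts = []
    for logits in att_logits:
        weights = adaptive_threshold(softmax(logits))
        parts.append([sum(w * f[d] for w, f in zip(weights, features))
                      for d in range(dim)])
    return parts


def embed(parts: Sequence[Sequence[float]],
          W: Sequence[Sequence[float]]) -> Vector:
    # Assumption: average the K part features, then project linearly into
    # the A-dimensional attribute space with W (A rows of D weights).
    dim = len(parts[0])
    pooled = [sum(p[d] for p in parts) / len(parts) for d in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in W]


def compatibility_scores(embedding: Sequence[float],
                         class_attributes: Sequence[Sequence[float]]) -> Vector:
    # Bilinear compatibility: inner product with each class attribute vector.
    return [sum(e * a for e, a in zip(embedding, attrs))
            for attrs in class_attributes]


def compatibility_loss(embedding: Sequence[float],
                       seen_attributes: Sequence[Sequence[float]],
                       label: int) -> float:
    # Softmax cross-entropy over the seen classes' compatibility scores.
    probs = softmax(compatibility_scores(embedding, seen_attributes))
    return -math.log(probs[label])


def predict(embedding: Sequence[float],
            candidate_attributes: Sequence[Sequence[float]]) -> int:
    # Zero-shot prediction: the class whose attributes score highest.
    scores = compatibility_scores(embedding, candidate_attributes)
    return max(range(len(scores)), key=scores.__getitem__)
```

With toy inputs (four locations, two part maps, an identity `W`), each part map keeps only its most-attended location, the averaged embedding is `[0.5, 0.5]`, and among the unseen-class attribute vectors `[[1.0, 0.0], [1.0, 1.0], [0.0, 0.2]]` the sketch predicts index 1. In a real network the features, attention logits and `W` would be learned end to end by minimizing the compatibility loss on seen classes.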

Original language	English
Title of host publication	Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Publisher	IEEE Computer Society
Pages	9376-9385
Number of pages	10
ISBN (Electronic)	9781728132938
DOIs	https://doi.org/10.1109/CVPR.2019.00961
Publication status	Published - Aug 2019
Event	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, United States (16 Jun 2019 → 20 Jun 2019)

Conference

Conference	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Country/Territory	United States
City	Long Beach
Period	16/06/19 → 20/06/19

Keywords

Categorization
Deep Learning
Recognition: Detection
Representation Learning
Retrieval

Cite this

@inproceedings{75c342a565f4442491e60b87e7d816d2,

title = "Attentive region embedding network for zero-shot learning",

abstract = "Zero-shot learning (ZSL) aims to classify images from unseen categories, by merely utilizing seen class images as the training data. Existing works on ZSL mainly leverage the global features or learn the global regions, from which, to construct the embeddings to the semantic space. However, few of them study the discrimination power implied in local image regions (parts), which, in some sense, correspond to semantic attributes, have stronger discrimination than attributes, and can thus assist the semantic transfer between seen/unseen classes. In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches, i.e., the attentive region embedding (ARE) stream, and the attentive compressed second-order embedding (ACSE) stream. ARE is capable of discovering multiple part regions under the guidance of the attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed for suppressing redundant (such as background) attention regions. To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into the AREN. In the comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performances under ZSL setting, and compelling results under generalized ZSL setting.",

keywords = "Categorization, Deep Learning, Recognition: Detection, Representation Learning, Retrieval",

author = "Xie, {Guo Sen} and Li Liu and Xiaobo Jin and Fan Zhu and Zheng Zhang and Jie Qin and Yazhou Yao and Ling Shao",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 ; Conference date: 16-06-2019 Through 20-06-2019",

year = "2019",

month = aug,

doi = "10.1109/CVPR.2019.00961",

language = "English",

pages = "9376--9385",

booktitle = "Conference on Computer Vision and Pattern Recognition (CVPR), 2019",

publisher = "IEEE Computer Society",

}

Xie, GS, Liu, L, Jin, X, Zhu, F, Zhang, Z, Qin, J, Yao, Y & Shao, L 2019, Attentive region embedding network for zero-shot learning. in Conference on Computer Vision and Pattern Recognition (CVPR), 2019., 8954350, IEEE Computer Society, pp. 9376-9385, 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, United States, 16/06/19. https://doi.org/10.1109/CVPR.2019.00961

TY - GEN

T1 - Attentive region embedding network for zero-shot learning

AU - Xie, Guo Sen

AU - Liu, Li

AU - Jin, Xiaobo

AU - Zhu, Fan

AU - Zhang, Zheng

AU - Qin, Jie

AU - Yao, Yazhou

AU - Shao, Ling

PY - 2019/8

Y1 - 2019/8

N2 - Zero-shot learning (ZSL) aims to classify images from unseen categories, by merely utilizing seen class images as the training data. Existing works on ZSL mainly leverage the global features or learn the global regions, from which, to construct the embeddings to the semantic space. However, few of them study the discrimination power implied in local image regions (parts), which, in some sense, correspond to semantic attributes, have stronger discrimination than attributes, and can thus assist the semantic transfer between seen/unseen classes. In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches, i.e., the attentive region embedding (ARE) stream, and the attentive compressed second-order embedding (ACSE) stream. ARE is capable of discovering multiple part regions under the guidance of the attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed for suppressing redundant (such as background) attention regions. To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into the AREN. In the comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performances under ZSL setting, and compelling results under generalized ZSL setting.

AB - Zero-shot learning (ZSL) aims to classify images from unseen categories, by merely utilizing seen class images as the training data. Existing works on ZSL mainly leverage the global features or learn the global regions, from which, to construct the embeddings to the semantic space. However, few of them study the discrimination power implied in local image regions (parts), which, in some sense, correspond to semantic attributes, have stronger discrimination than attributes, and can thus assist the semantic transfer between seen/unseen classes. In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches, i.e., the attentive region embedding (ARE) stream, and the attentive compressed second-order embedding (ACSE) stream. ARE is capable of discovering multiple part regions under the guidance of the attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed for suppressing redundant (such as background) attention regions. To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into the AREN. In the comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performances under ZSL setting, and compelling results under generalized ZSL setting.

KW - Categorization

KW - Deep Learning

KW - Recognition: Detection

KW - Representation Learning

KW - Retrieval

UR - http://www.scopus.com/inward/record.url?scp=85077526511&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2019.00961

DO - 10.1109/CVPR.2019.00961

M3 - Conference Proceeding

AN - SCOPUS:85077526511

SP - 9376

EP - 9385

BT - Conference on Computer Vision and Pattern Recognition (CVPR), 2019

PB - IEEE Computer Society

T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019

Y2 - 16 June 2019 through 20 June 2019

ER -