Outpainting by Queries

Kai Yao; Penglei Gao; Xi Yang; Jie Sun; Rui Zhang; Kaizhu Huang

doi:10.1007/978-3-031-20050-2_10

Kai Yao, Penglei Gao, Xi Yang, Jie Sun, Rui Zhang, Kaizhu Huang^*

^*Corresponding author for this work

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

10 Citations (Scopus)

Abstract

Image outpainting, which has been well studied with Convolutional Neural Network (CNN) based frameworks, has recently drawn more attention in computer vision. However, CNNs rely on inherent inductive biases to achieve effective sample learning, which may limit their performance ceiling. In this paper, motivated by the flexible self-attention mechanism with minimal inductive biases in the transformer architecture, we reframe the generalised image outpainting problem as a patch-wise sequence-to-sequence autoregression problem, enabling query-based image outpainting. Specifically, we propose a novel hybrid vision-transformer-based encoder-decoder framework, named Query Outpainting TRansformer (QueryOTR), for extrapolating visual context on all sides of a given image. The global modeling capacity of the patch-wise mode allows us to extrapolate images from the attention mechanism’s query standpoint. A novel Query Expansion Module (QEM) is designed to integrate information from the predicted queries based on the encoder’s output, hence accelerating the convergence of the pure transformer even with a relatively small dataset. To further enhance connectivity between patches, the proposed Patch Smoothing Module (PSM) re-allocates and averages the overlapped regions, thus providing seamless predicted images. We experimentally show that QueryOTR generates smooth, realistic, and visually appealing results compared with state-of-the-art image outpainting approaches. Code is available at https://github.com/Kaiseem/QueryOTR.
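
The idea of averaging overlapped regions between patches can be illustrated with a small NumPy sketch. This is not the authors' PSM implementation (see the linked repository for that); it only assumes that each predicted patch is a 2-D array with a known top-left position on the output canvas. The function name `average_overlapping_patches` and its parameters are hypothetical, chosen for this illustration.

```python
import numpy as np


def average_overlapping_patches(patches, positions, out_shape):
    """Paste patches onto a canvas and average pixels covered more than once.

    Hypothetical helper for illustration only, not the QueryOTR code.
    patches:   list of 2-D arrays
    positions: list of (top, left) canvas coordinates, one per patch
    out_shape: (height, width) of the output canvas
    """
    acc = np.zeros(out_shape, dtype=np.float64)    # running sum of pixel values
    count = np.zeros(out_shape, dtype=np.float64)  # how many patches cover each pixel
    for patch, (top, left) in zip(patches, positions):
        h, w = patch.shape
        acc[top:top + h, left:left + w] += patch
        count[top:top + h, left:left + w] += 1.0
    # Pixels that no patch covers keep the value 0 instead of dividing by zero.
    return acc / np.maximum(count, 1.0)


# Two 2x2 patches that share the middle column of a 2x3 canvas.
patches = [np.zeros((2, 2)), np.full((2, 2), 2.0)]
positions = [(0, 0), (0, 1)]
merged = average_overlapping_patches(patches, positions, (2, 3))
```

In this toy case the shared column receives the mean of the two patches, so the seam between them becomes a gradual transition rather than a hard edge.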

Original language	English
Title of host publication	Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings
Editors	Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	153-169
Number of pages	17
ISBN (Print)	9783031200496
DOIs	https://doi.org/10.1007/978-3-031-20050-2_10
Publication status	Published - 2022
Event	17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel; Duration: 23 Oct 2022 → 27 Oct 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13683 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	17th European Conference on Computer Vision, ECCV 2022
Country/Territory	Israel
City	Tel Aviv
Period	23/10/22 → 27/10/22

Keywords

Image outpainting
Query expanding
Transformer

Access to Document

10.1007/978-3-031-20050-2_10

Cite this

Yao, K., Gao, P., Yang, X., Sun, J., Zhang, R., & Huang, K. (2022). Outpainting by Queries. In S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, & T. Hassner (Eds.), Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings (pp. 153-169). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13683 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20050-2_10

Yao, Kai ; Gao, Penglei ; Yang, Xi et al. / Outpainting by Queries. Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings. editor / Shai Avidan ; Gabriel Brostow ; Moustapha Cissé ; Giovanni Maria Farinella ; Tal Hassner. Springer Science and Business Media Deutschland GmbH, 2022. pp. 153-169 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{6f3f21c46e964f35b8556e7e51bf099d,

title = "Outpainting by Queries",

abstract = "Image outpainting, which is well studied with Convolution Neural Network (CNN) based framework, has recently drawn more attention in computer vision. However, CNNs rely on inherent inductive biases to achieve effective sample learning, which may degrade the performance ceiling. In this paper, motivated by the flexible self-attention mechanism with minimal inductive biases in transformer architecture, we reframe the generalised image outpainting problem as a patch-wise sequence-to-sequence autoregression problem, enabling query-based image outpainting. Specifically, we propose a novel hybrid vision-transformer-based encoder-decoder framework, named Query Outpainting TRansformer (QueryOTR), for extrapolating visual context all-side around a given image. Patch-wise mode{\textquoteright}s global modeling capacity allows us to extrapolate images from the attention mechanism{\textquoteright}s query standpoint. A novel Query Expansion Module (QEM) is designed to integrate information from the predicted queries based on the encoder{\textquoteright}s output, hence accelerating the convergence of the pure transformer even with a relatively small dataset. To further enhance connectivity between each patch, the proposed Patch Smoothing Module (PSM) re-allocates and averages the overlapped regions, thus providing seamless predicted images. We experimentally show that QueryOTR could generate visually appealing results smoothly and realistically against the state-of-the-art image outpainting approaches. Code is available at https://github.com/Kaiseem/QueryOTR.",

keywords = "Image outpainting, Query expanding, Transformer",

author = "Kai Yao and Penglei Gao and Xi Yang and Jie Sun and Rui Zhang and Kaizhu Huang",

note = "Funding Information: Acknowledgments. The work was partially supported by the following: National Natural Science Foundation of China under no. 61876155; Jiangsu Science and Technology Programme under no. BE2020006-4; Key Program Special Fund in XJTLU under no. KSF-T-06 and no. KSF-E-37; Research Development Fund in XJTLU under no. RDF-19-01-21. Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 17th European Conference on Computer Vision, ECCV 2022; Conference date: 23-10-2022 through 27-10-2022",

year = "2022",

doi = "10.1007/978-3-031-20050-2_10",

language = "English",

isbn = "9783031200496",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "153--169",

editor = "Shai Avidan and Gabriel Brostow and Moustapha Ciss{\'e} and Farinella, {Giovanni Maria} and Tal Hassner",

booktitle = "Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings",

}

Yao, K, Gao, P, Yang, X, Sun, J, Zhang, R & Huang, K 2022, Outpainting by Queries. in S Avidan, G Brostow, M Cissé, GM Farinella & T Hassner (eds), Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13683 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 153-169, 17th European Conference on Computer Vision, ECCV 2022, Tel Aviv, Israel, 23/10/22. https://doi.org/10.1007/978-3-031-20050-2_10

Outpainting by Queries. / Yao, Kai; Gao, Penglei; Yang, Xi et al.
Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings. ed. / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. Springer Science and Business Media Deutschland GmbH, 2022. p. 153-169 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13683 LNCS).

TY - GEN

T1 - Outpainting by Queries

AU - Yao, Kai

AU - Gao, Penglei

AU - Yang, Xi

AU - Sun, Jie

AU - Zhang, Rui

AU - Huang, Kaizhu

N1 - Funding Information: Acknowledgments. The work was partially supported by the following: National Natural Science Foundation of China under no. 61876155; Jiangsu Science and Technology Programme under no. BE2020006-4; Key Program Special Fund in XJTLU under no. KSF-T-06 and no. KSF-E-37; Research Development Fund in XJTLU under no. RDF-19-01-21. Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2022

Y1 - 2022

N2 - Image outpainting, which is well studied with Convolution Neural Network (CNN) based framework, has recently drawn more attention in computer vision. However, CNNs rely on inherent inductive biases to achieve effective sample learning, which may degrade the performance ceiling. In this paper, motivated by the flexible self-attention mechanism with minimal inductive biases in transformer architecture, we reframe the generalised image outpainting problem as a patch-wise sequence-to-sequence autoregression problem, enabling query-based image outpainting. Specifically, we propose a novel hybrid vision-transformer-based encoder-decoder framework, named Query Outpainting TRansformer (QueryOTR), for extrapolating visual context all-side around a given image. Patch-wise mode’s global modeling capacity allows us to extrapolate images from the attention mechanism’s query standpoint. A novel Query Expansion Module (QEM) is designed to integrate information from the predicted queries based on the encoder’s output, hence accelerating the convergence of the pure transformer even with a relatively small dataset. To further enhance connectivity between each patch, the proposed Patch Smoothing Module (PSM) re-allocates and averages the overlapped regions, thus providing seamless predicted images. We experimentally show that QueryOTR could generate visually appealing results smoothly and realistically against the state-of-the-art image outpainting approaches. Code is available at https://github.com/Kaiseem/QueryOTR.

KW - Image outpainting

KW - Query expanding

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=85142680577&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-20050-2_10

DO - 10.1007/978-3-031-20050-2_10

M3 - Conference Proceeding

AN - SCOPUS:85142680577

SN - 9783031200496

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 153

EP - 169

BT - Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings

A2 - Avidan, Shai

A2 - Brostow, Gabriel

A2 - Cissé, Moustapha

A2 - Farinella, Giovanni Maria

A2 - Hassner, Tal

PB - Springer Science and Business Media Deutschland GmbH

T2 - 17th European Conference on Computer Vision, ECCV 2022

Y2 - 23 October 2022 through 27 October 2022

ER -

Yao K, Gao P, Yang X, Sun J, Zhang R, Huang K. Outpainting by Queries. In Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors, Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 153-169. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-20050-2_10