Arbitrary-Shaped Text Detection with Adaptive Text Region Representation

Xiufeng Jiang, Shugong Xu*, Shunqing Zhang, Shan Cao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Text detection/localization, an important task in computer vision, has seen substantial advances in methodology and performance with convolutional neural networks. However, the vast majority of popular methods use rectangles or quadrangles to describe text regions. These representations have inherent drawbacks, especially with densely adjacent text and loose regional text boundaries, which usually make arbitrarily shaped text difficult to detect. In this paper, we propose a novel text region representation method, with a robust pipeline, which can precisely detect densely adjacent text instances with arbitrary shapes. We consider a text instance to be composed of an adaptive central text region mask and a corresponding expanding ratio between the central text region and the full text region. More specifically, our pipeline generates adaptive central text regions and corresponding expanding ratios with a proposed training strategy, followed by a newly proposed post-processing algorithm which expands each central text region to the complete text instance using its expanding ratio. We demonstrate that our new text region representation is effective, and that the pipeline can precisely detect closely adjacent text instances of arbitrary shapes. Experimental results on common datasets demonstrate the superior performance of our work.
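
The representation described in the abstract (a central text region plus an expanding ratio) can be illustrated with a small sketch. This is not the authors' post-processing algorithm, which the abstract does not specify. It is a simple breadth-first region growth swapped in for illustration, under the assumption that the expanding ratio means full-region area divided by central-region area. The function name `expand_central_regions` and its inputs are hypothetical.

```python
from collections import deque


def expand_central_regions(central, ratios):
    """Grow labelled central regions outward until each reaches its target area.

    central: 2-D list of ints, 0 = background, k > 0 = central region of instance k.
    ratios:  dict mapping label k to an assumed ratio (full area / central area).
    Both the input format and this definition of the ratio are assumptions
    made for this sketch, not details taken from the paper.
    """
    h, w = len(central), len(central[0])
    out = [row[:] for row in central]

    # Current area of each central region.
    area = {}
    for row in central:
        for k in row:
            if k:
                area[k] = area.get(k, 0) + 1

    # Target area of each full text region (ratio defaults to 1.0: no growth).
    target = {k: round(ratios.get(k, 1.0) * a) for k, a in area.items()}

    # Multi-source breadth-first growth: every instance grows layer by layer,
    # and a pixel already claimed by one instance is never taken by another,
    # which keeps closely adjacent instances separate.
    queue = deque((y, x) for y in range(h) for x in range(w) if central[y][x])
    while queue:
        y, x = queue.popleft()
        k = out[y][x]
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if area[k] >= target[k]:
                break  # this instance has reached its target area
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 0:
                out[ny][nx] = k
                area[k] += 1
                queue.append((ny, nx))
    return out


if __name__ == "__main__":
    # Two adjacent instances on one row; each may double its area.
    for row in expand_central_regions([[1, 0, 0, 0, 2]], {1: 2, 2: 2}):
        print(row)
```

Growing from labelled seeds rather than from one merged mask is what lets neighbouring instances stay distinct, which is the property the abstract emphasises for dense adjacent text.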

Original language: English
Article number: 9104986
Pages (from-to): 102106-102118
Number of pages: 13
Journal: IEEE Access
Volume: 8
Publication status: Published - 2020
Externally published: Yes

Keywords

  • arbitrary-shaped
  • deformable convolutional network
  • scene text detection
  • text region representation
