Handwritten Chinese text recognition by integrating multiple contexts

Qiu Feng Wang^*; Fei Yin; Cheng Lin Liu

doi:10.1109/TPAMI.2011.264

^*Corresponding author for this work

CAS - Institute of Automation

Research output: Contribution to journal › Article › peer-review

163 Citations (Scopus)

Abstract

This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on the Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, far superior to the best results reported in the literature.
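The abstract reports a character-level accurate rate and correct rate. As a sketch of how such figures are usually computed, assuming the convention common in CASIA-HWDB evaluations: align the recognized string with the ground truth by minimum edit distance, count substitution (S), deletion (D) and insertion (I) errors, and take CR = (N - S - D)/N and AR = (N - S - D - I)/N, where N is the number of ground-truth characters. The function names below are hypothetical, and the tie-breaking among equal-cost alignments is an arbitrary choice that can shift errors between S, D and I (and hence CR, though not AR).

```python
from typing import Sequence, Tuple


def align_errors(truth: Sequence[str], result: Sequence[str]) -> Tuple[int, int, int]:
    """Count (substitutions, deletions, insertions) in a minimum edit-distance alignment."""
    n, m = len(truth), len(result)
    # dp[i][j] = (total_cost, subs, dels, ins) for truth[:i] aligned with result[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # ground-truth characters with no output: deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # output characters with no ground truth: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            mismatch = int(truth[i - 1] != result[j - 1])
            diag = (c + mismatch, s + mismatch, d, k)  # match or substitution
            c, s, d, k = dp[i - 1][j]
            up = (c + 1, s, d + 1, k)  # delete truth[i - 1]
            c, s, d, k = dp[i][j - 1]
            left = (c + 1, s, d, k + 1)  # insert result[j - 1]
            # Tuples compare by cost first; later fields only break ties (arbitrary choice).
            dp[i][j] = min(diag, up, left)
    _, subs, dels, ins = dp[n][m]
    return subs, dels, ins


def character_rates(truth: Sequence[str], result: Sequence[str]) -> Tuple[float, float]:
    """Return (correct rate, accurate rate) for one ground-truth/recognition pair."""
    if not truth:
        raise ValueError("ground truth must contain at least one character")
    subs, dels, ins = align_errors(truth, result)
    n = len(truth)
    correct_rate = (n - subs - dels) / n
    accurate_rate = (n - subs - dels - ins) / n
    return correct_rate, accurate_rate


if __name__ == "__main__":
    # One substituted character (字 recognized as 宇) out of six: CR = AR = 5/6.
    print(character_rates("手写汉字识别", "手写汉宇识别"))
```

For a whole test set, S, D, I and N would typically be summed over all text lines before dividing, rather than averaging per-line rates; insertion errors lower only AR, which is why AR is never higher than CR.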

Original language	English
Article number	6112767
Pages (from-to)	1469-1481
Number of pages	13
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	34
Issue number	8
DOIs	https://doi.org/10.1109/TPAMI.2011.264
Publication status	Published - 2012
Externally published	Yes

Keywords

Handwritten Chinese text recognition
candidate character augmentation
confidence transformation
geometric models
language models
maximum character accuracy training
refined beam search
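The keywords and abstract name the main ingredients of path evaluation and search: confidence transformation of classifier outputs, a path score that weights character-recognition, geometric and linguistic contexts, and beam search over the oversegmentation candidate lattice. The sketch below strings these together on a toy lattice under loudly stated assumptions: a per-class sigmoid stands in for the confidence transformation, a bigram table with a fixed floor stands in for the language model, the combining weights are set by hand rather than learned with the Maximum Character Accuracy criterion, and the search is a plain beam search, not the paper's refined variant. All names (`confidence_transform`, `beam_search`, the lattice layout) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical lattice layout: (start_point, end_point) -> (raw classifier scores per
# candidate class, geometric log-score of treating that span as one character).
Lattice = Dict[Tuple[int, int], Tuple[Dict[str, float], float]]
Path = Tuple[float, Tuple[str, ...]]


def confidence_transform(raw: Dict[str, float], alpha: float = 1.0,
                         beta: float = 0.0) -> Dict[str, float]:
    """Assumed per-class sigmoid mapping raw scores to probability-like confidences."""
    return {c: 1.0 / (1.0 + math.exp(-(alpha * s + beta))) for c, s in raw.items()}


def beam_search(lattice: Lattice, num_points: int, bigram: Dict[Tuple[str, str], float],
                weights: Tuple[float, float, float], beam_width: int = 3,
                floor: float = 1e-4) -> Optional[Path]:
    """Plain beam search over segmentation points 0..num_points (not the refined variant)."""
    w_char, w_geo, w_lm = weights
    # beams[p] holds (score, labels) for partial paths that end at segmentation point p.
    beams: List[List[Path]] = [[] for _ in range(num_points + 1)]
    beams[0] = [(0.0, ())]
    for p in range(num_points):
        # Prune: keep only the beam_width highest-scoring partial paths at point p.
        beams[p] = sorted(beams[p], reverse=True)[:beam_width]
        for (start, end), (raw, geo) in lattice.items():
            if start != p:
                continue
            probs = confidence_transform(raw)
            for score, labels in beams[p]:
                prev = labels[-1] if labels else "<s>"
                for c, pc in probs.items():
                    lm = bigram.get((prev, c), floor)  # fixed floor instead of real smoothing
                    # Weighted sum of log-scores from the three contexts.
                    step = w_char * math.log(pc) + w_geo * geo + w_lm * math.log(lm)
                    beams[end].append((score + step, labels + (c,)))
    return max(beams[num_points], default=None)


if __name__ == "__main__":
    toy: Lattice = {
        (0, 1): ({"a": 2.0, "o": 1.0}, -0.1),
        (1, 2): ({"b": 2.0, "h": 0.5}, -0.1),
        (0, 2): ({"d": 1.0}, -1.0),  # merged span: geometrically less plausible
    }
    lm = {("<s>", "a"): 0.5, ("a", "b"): 0.5, ("<s>", "o"): 0.1,
          ("o", "b"): 0.1, ("<s>", "d"): 0.1}
    print(beam_search(toy, 2, lm, weights=(1.0, 1.0, 1.0)))
```

In this sketch each step adds weighted log-scores, so the three weights form a small log-linear combination; that is the kind of weight vector a supervised criterion such as MCA would tune. Candidate character augmentation and any path-length normalization are deliberately left out.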

Cite this

@article{056726178cd4429ab3cab0fbccb5cb1c,
  title = "Handwritten {Chinese} text recognition by integrating multiple contexts",
  abstract = "This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on a Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved character-level accurate rate of 90.75 percent and correct rate of 91.39 percent, which are superior by far to the best results reported in the literature.",
  keywords = "Handwritten Chinese text recognition, candidate character augmentation, confidence transformation, geometric models, language models, maximum character accuracy training, refined beam search",
  author = "Wang, {Qiu Feng} and Yin, Fei and Liu, {Cheng Lin}",
  note = "Funding Information: This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 60825301 and 60933010.",
  year = "2012",
  doi = "10.1109/TPAMI.2011.264",
  language = "English",
  volume = "34",
  pages = "1469--1481",
  journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
  issn = "0162-8828",
  number = "8",
}

TY  - JOUR
T1  - Handwritten Chinese text recognition by integrating multiple contexts
AU  - Wang, Qiu Feng
AU  - Yin, Fei
AU  - Liu, Cheng Lin
N1  - Funding Information: This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 60825301 and 60933010.
PY  - 2012
Y1  - 2012
N2  - This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on a Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved character-level accurate rate of 90.75 percent and correct rate of 91.39 percent, which are superior by far to the best results reported in the literature.
AB  - This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric and linguistic contexts) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. In path search, we use a refined beam search algorithm to improve the search efficiency and, meanwhile, use a candidate character augmentation strategy to improve the recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning using a Maximum Character Accuracy criterion. We evaluated the recognition performance on a Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and combining multiple contexts improve the text line recognition performance significantly. On a test set of 1,015 handwritten pages, the proposed approach achieved character-level accurate rate of 90.75 percent and correct rate of 91.39 percent, which are superior by far to the best results reported in the literature.
KW  - Handwritten Chinese text recognition
KW  - candidate character augmentation
KW  - confidence transformation
KW  - geometric models
KW  - language models
KW  - maximum character accuracy training
KW  - refined beam search
UR  - http://www.scopus.com/inward/record.url?scp=84862652667&partnerID=8YFLogxK
U2  - 10.1109/TPAMI.2011.264
DO  - 10.1109/TPAMI.2011.264
M3  - Article
C2  - 22201052
AN  - SCOPUS:84862652667
SN  - 0162-8828
VL  - 34
SP  - 1469
EP  - 1481
JO  - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF  - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS  - 8
M1  - 6112767
ER  - 