Improving handwritten Chinese text recognition by unsupervised language model adaptation

Qiu-Feng Wang*, Fei Yin, Cheng-Lin Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

7 Citations (Scopus)

Abstract

This paper investigates the effects of unsupervised language model adaptation (LMA) in handwritten Chinese text recognition. Since no prior information about the recognition text is available, we use a two-pass recognition strategy. In the first pass, a generic language model (LM) is used to obtain a preliminary result, which is then used to choose the best-matched LMs from a set of pre-defined domains; the matched LMs are used in the second-pass recognition. Each LM is compressed to a moderate size via entropy-based pruning, tree-structure formatting and fewer-byte quantization. We evaluated the LMA for five LM types, including both character-level and word-level ones. Experiments on the CASIA-HWDB database show that language model adaptation improves the performance of each LM type in all domains. Documents in the ancient domain gained the largest improvement, with the character-level correct rate up by 5.87 percent and the accurate rate up by 6.05 percent.
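The abstract does not specify how the best-matched domain LMs are chosen from the first-pass result; a common criterion for this kind of unsupervised LMA is perplexity of the preliminary transcription under each candidate domain LM. The sketch below illustrates that idea only; the toy character-bigram LM, the `decode` function and all names are assumptions for illustration, not the authors' implementation.

```python
import math

def perplexity(text, bigram_probs, unk_prob=1e-6):
    """Perplexity of a character sequence under a toy character-bigram LM.

    bigram_probs: dict mapping (prev_char, char) -> probability.
    Unseen bigrams fall back to a small floor probability.
    """
    log_sum = 0.0
    prev = "<s>"
    for ch in text:
        log_sum += math.log(bigram_probs.get((prev, ch), unk_prob))
        prev = ch
    return math.exp(-log_sum / max(len(text), 1))

def select_domain_lm(first_pass_text, domain_lms):
    """Pick the pre-defined domain whose LM best matches the first-pass
    result, here measured by lowest perplexity (assumed criterion)."""
    return min(domain_lms.items(),
               key=lambda item: perplexity(first_pass_text, item[1]))[0]

# Two-pass usage sketch (decode() is a placeholder for the text recognizer):
# first_pass = decode(page_image, generic_lm)          # pass 1: generic LM
# domain     = select_domain_lm(first_pass, domain_lms)
# final      = decode(page_image, domain_lms[domain])  # pass 2: adapted LM
```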

Original language: English
Title of host publication: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Pages: 110-114
Number of pages: 5
DOIs
Publication status: Published - 2012
Externally published: Yes
Event: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 - Gold Coast, QLD, Australia
Duration: 27 Mar 2012 - 29 Mar 2012

Publication series

Name: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012

Conference

Conference: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Country/Territory: Australia
City: Gold Coast, QLD
Period: 27/03/12 - 29/03/12

Keywords

  • Handwritten Chinese text recognition
  • Language model adaptation
  • Language model compression
  • Two-pass recognition