TY - GEN
T1 - Improving handwritten Chinese text recognition by unsupervised language model adaptation
AU - Wang, Qiu Feng
AU - Yin, Fei
AU - Liu, Cheng Lin
PY - 2012
Y1 - 2012
N2 - This paper investigates the effects of unsupervised language model adaptation (LMA) in handwritten Chinese text recognition. Because no prior information about the text to be recognized is available, we use a two-pass recognition strategy. In the first pass, a generic language model (LM) is used to obtain a preliminary result, which is then used to choose the best-matched LMs from a set of pre-defined domains; the matched LMs are then used in the second recognition pass. Each LM is compressed to a moderate size via entropy-based pruning, tree-structure formatting, and fewer-byte quantization. We evaluated LMA for five LM types, including both character-level and word-level ones. Experiments on the CASIA-HWDB database show that language model adaptation improves performance for each LM type in all domains. Documents in the ancient domain gained the largest improvement: the character-level correct rate rose by 5.87 percent and the accurate rate by 6.05 percent.
AB - This paper investigates the effects of unsupervised language model adaptation (LMA) in handwritten Chinese text recognition. Because no prior information about the text to be recognized is available, we use a two-pass recognition strategy. In the first pass, a generic language model (LM) is used to obtain a preliminary result, which is then used to choose the best-matched LMs from a set of pre-defined domains; the matched LMs are then used in the second recognition pass. Each LM is compressed to a moderate size via entropy-based pruning, tree-structure formatting, and fewer-byte quantization. We evaluated LMA for five LM types, including both character-level and word-level ones. Experiments on the CASIA-HWDB database show that language model adaptation improves performance for each LM type in all domains. Documents in the ancient domain gained the largest improvement: the character-level correct rate rose by 5.87 percent and the accurate rate by 6.05 percent.
KW - Handwritten Chinese text recognition
KW - Language model adaptation
KW - Language model compression
KW - Two-pass recognition
UR - https://www.scopus.com/pages/publications/84862061467
U2 - 10.1109/DAS.2012.46
DO - 10.1109/DAS.2012.46
M3 - Conference Proceeding
AN - SCOPUS:84862061467
SN - 9780769546612
T3 - Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
SP - 110
EP - 114
BT - Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
T2 - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Y2 - 27 March 2012 through 29 March 2012
ER -