Common Sense Knowledge for Handwritten Chinese Text Recognition

Qiu Feng Wang; Erik Cambria; Cheng Lin Liu; Amir Hussain

doi:10.1007/s12559-012-9183-y

Corresponding author: Qiu Feng Wang

Research output: Contribution to journal › Article › peer-review

67 Citations (Scopus)

Abstract

Compared to human intelligence, computers are far short of common sense knowledge which people normally acquire during the formative years of their lives. This paper investigates the effects of employing common sense knowledge as a new linguistic context in handwritten Chinese text recognition. Three methods are introduced to supplement the standard n-gram language model: embedding model, direct model, and an ensemble of these two. The embedding model uses semantic similarities from common sense knowledge to make the n-gram probabilities estimation more reliable, especially for the unseen n-grams in the training text corpus. The direct model, in turn, considers the linguistic context of the whole document to make up for the short context limit of the n-gram model. The three models are evaluated on a large unconstrained handwriting database, CASIA-HWDB, and the results show that the adoption of common sense knowledge yields improvements in recognition performance, despite the reduced concept list hereby employed.
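The idea of using semantic similarity to make estimates for unseen n-grams more reliable can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the paper's embedding model: the function names, the tiny corpus, the hand-written similarity table, and the interpolation weight are all hypothetical, and the similarity scores stand in for whatever a common sense knowledge base would supply.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count bigrams and how often each word occurs as a bigram history."""
    history, bigrams = Counter(), Counter()
    for words in sentences:
        history.update(words[:-1])             # words that have a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return history, bigrams


def mle_bigram(prev, word, history, bigrams):
    """Maximum-likelihood P(word | prev); 0.0 for unseen histories or pairs."""
    if history[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / history[prev]


def similarity_smoothed_bigram(prev, word, history, bigrams, similarity, weight=0.5):
    """Mix the direct estimate with estimates borrowed from similar words.

    `similarity` maps a word to {neighbour: score}. The result is a score,
    not a renormalised probability distribution (a simplification).
    """
    direct = mle_bigram(prev, word, history, bigrams)
    neighbours = similarity.get(word, {})
    total = sum(neighbours.values())
    if total == 0:
        return direct  # nothing to borrow from
    borrowed = sum(
        score * mle_bigram(prev, other, history, bigrams)
        for other, score in neighbours.items()
    ) / total
    return (1 - weight) * direct + weight * borrowed


# Hypothetical toy data: "coffee" never follows "drink" in the corpus.
corpus = [["drink", "tea"], ["drink", "tea"], ["drink", "water"], ["read", "book"]]
similarity = {"coffee": {"tea": 0.8, "water": 0.2}}  # assumed similarity scores

history, bigrams = bigram_counts(corpus)
unseen = mle_bigram("drink", "coffee", history, bigrams)
smoothed = similarity_smoothed_bigram("drink", "coffee", history, bigrams, similarity)
print(unseen, smoothed)
```

In this sketch the unseen bigram gets a zero maximum-likelihood estimate, while the smoothed score is positive because "coffee" borrows evidence from words assumed to be semantically close to it.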

Original language	English
Pages (from-to)	234-242
Number of pages	9
Journal	Cognitive Computation
Volume	5
Issue number	2
DOIs	https://doi.org/10.1007/s12559-012-9183-y
Publication status	Published - Jun 2013
Externally published	Yes

Keywords

Common sense knowledge
Handwritten Chinese text recognition
Linguistic context
Natural language processing
n-gram

Cite this

@article{dc5bb347ffec44dfa05b48d8711f9ab7,

title = "Common Sense Knowledge for Handwritten Chinese Text Recognition",

abstract = "Compared to human intelligence, computers are far short of common sense knowledge which people normally acquire during the formative years of their lives. This paper investigates the effects of employing common sense knowledge as a new linguistic context in handwritten Chinese text recognition. Three methods are introduced to supplement the standard n-gram language model: embedding model, direct model, and an ensemble of these two. The embedding model uses semantic similarities from common sense knowledge to make the n-gram probabilities estimation more reliable, especially for the unseen n-grams in the training text corpus. The direct model, in turn, considers the linguistic context of the whole document to make up for the short context limit of the n-gram model. The three models are evaluated on a large unconstrained handwriting database, CASIA-HWDB, and the results show that the adoption of common sense knowledge yields improvements in recognition performance, despite the reduced concept list hereby employed.",

keywords = "Common sense knowledge, Handwritten Chinese text recognition, Linguistic context, Natural language processing, n-gram",

author = "Wang, Qiu Feng and Cambria, Erik and Liu, Cheng Lin and Hussain, Amir",

note = "Funding Information: This work has been supported in part by the National Basic Research Program of China (973 Program) Grant 2012CB316302, the National Natural Science Foundation of China (NSFC) Grants 60825301 and 60933010, and the Royal Society of Edinburgh (UK) and the Chinese Academy of Sciences within the China-Scotland SIPRA (Signal Image Processing Research Academy) Programme. The authors would like to thank Jia-jun Zhang for his aid in the machine translation process.",

year = "2013",

month = jun,

doi = "10.1007/s12559-012-9183-y",

language = "English",

volume = "5",

pages = "234--242",

journal = "Cognitive Computation",

issn = "1866-9956",

number = "2",

}

TY - JOUR

T1 - Common Sense Knowledge for Handwritten Chinese Text Recognition

AU - Wang, Qiu Feng

AU - Cambria, Erik

AU - Liu, Cheng Lin

AU - Hussain, Amir

N1 - Funding Information: This work has been supported in part by the National Basic Research Program of China (973 Program) Grant 2012CB316302, the National Natural Science Foundation of China (NSFC) Grants 60825301 and 60933010, and the Royal Society of Edinburgh (UK) and the Chinese Academy of Sciences within the China-Scotland SIPRA (Signal Image Processing Research Academy) Programme. The authors would like to thank Jia-jun Zhang for his aid in the machine translation process.

PY - 2013/6

Y1 - 2013/6

AB - Compared to human intelligence, computers are far short of common sense knowledge which people normally acquire during the formative years of their lives. This paper investigates the effects of employing common sense knowledge as a new linguistic context in handwritten Chinese text recognition. Three methods are introduced to supplement the standard n-gram language model: embedding model, direct model, and an ensemble of these two. The embedding model uses semantic similarities from common sense knowledge to make the n-gram probabilities estimation more reliable, especially for the unseen n-grams in the training text corpus. The direct model, in turn, considers the linguistic context of the whole document to make up for the short context limit of the n-gram model. The three models are evaluated on a large unconstrained handwriting database, CASIA-HWDB, and the results show that the adoption of common sense knowledge yields improvements in recognition performance, despite the reduced concept list hereby employed.

KW - Common sense knowledge

KW - Handwritten Chinese text recognition

KW - Linguistic context

KW - Natural language processing

KW - n-gram

UR - http://www.scopus.com/inward/record.url?scp=84877086482&partnerID=8YFLogxK

U2 - 10.1007/s12559-012-9183-y

DO - 10.1007/s12559-012-9183-y

M3 - Article

AN - SCOPUS:84877086482

SN - 1866-9956

VL - 5

SP - 234

EP - 242

JO - Cognitive Computation

JF - Cognitive Computation

IS - 2

ER -
