TY - GEN
T1 - Automatic Proofreading in Chinese
T2 - 8th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2019
AU - Wang, Qiufeng
AU - Liu, Minghuan
AU - Zhang, Weijia
AU - Guo, Yuhang
AU - Li, Tianrui
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - The rapid growth in the volume of text makes manual proofreading increasingly costly. Automatic proofreading, by contrast, offers substantial savings in time and human effort and is attracting growing research interest. In this paper, we propose two attention-based deep neural network models, combined with confusion sets, to detect and correct possible Chinese spelling errors at the character level. Our approaches first model the context of Chinese character embeddings using Long Short-Term Memory (LSTM) networks, then score the candidates from each character's confusion set through an attention mechanism, choosing the highest-scoring candidate as the prediction. We also define a new methodology for obtaining (preceding text, following text, candidates, target) quads and provide a supervised dataset for training and testing (our data has been released to the public at https://github.com/ccit-proofread). Performance evaluation indicates that our models achieve state-of-the-art performance and outperform a set of baselines.
AB - The rapid growth in the volume of text makes manual proofreading increasingly costly. Automatic proofreading, by contrast, offers substantial savings in time and human effort and is attracting growing research interest. In this paper, we propose two attention-based deep neural network models, combined with confusion sets, to detect and correct possible Chinese spelling errors at the character level. Our approaches first model the context of Chinese character embeddings using Long Short-Term Memory (LSTM) networks, then score the candidates from each character's confusion set through an attention mechanism, choosing the highest-scoring candidate as the prediction. We also define a new methodology for obtaining (preceding text, following text, candidates, target) quads and provide a supervised dataset for training and testing (our data has been released to the public at https://github.com/ccit-proofread). Performance evaluation indicates that our models achieve state-of-the-art performance and outperform a set of baselines.
KW - Attention mechanism
KW - Error correction of Chinese text
KW - Error detection of Chinese text
KW - LSTM model
UR - http://www.scopus.com/inward/record.url?scp=85075822914&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-32236-6_31
DO - 10.1007/978-3-030-32236-6_31
M3 - Conference Proceeding
AN - SCOPUS:85075822914
SN - 9783030322359
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 349
EP - 359
BT - Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Proceedings
A2 - Tang, Jie
A2 - Kan, Min-Yen
A2 - Zhao, Dongyan
A2 - Li, Sujian
A2 - Zan, Hongying
PB - Springer
Y2 - 9 October 2019 through 14 October 2019
ER -