Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks

Qiufeng Wang*, Minghuan Liu, Weijia Zhang, Yuhang Guo, Tianrui Li

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

3 Citations (Scopus)

Abstract

The rapid growth in the volume of text makes manual proofreading very costly. In comparison, automatic proofreading offers clear advantages in time and human resources, which has drawn more researchers to the problem. In this paper, we propose two attention-based deep neural network models combined with confusion sets to detect and correct possible Chinese spelling errors at the character level. Our proposed approaches first model the context of Chinese character embeddings using Long Short-Term Memory (LSTM) networks, then score the probability of each candidate from the character's confusion set through an attention mechanism, choosing the highest-scoring candidate as the prediction. We also define a new methodology for obtaining (preceding text, following text, candidates, target) quads and provide a supervised dataset for training and testing (our data has been released to the public at https://github.com/ccit-proofread). Performance evaluation indicates that our models achieve state-of-the-art performance and outperform a set of baselines.
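
The (preceding text, following text, candidates, target) quads and the "choose the highest-scoring candidate" step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Quad`, `build_quads`, and `pick_candidate`, the toy confusion set, and the placeholder scorer are all assumptions made for illustration, and the scorer merely stands in for the LSTM and attention models, which are not reproduced here.

```python
from typing import Callable, Dict, List, NamedTuple, Set


class Quad(NamedTuple):
    """Hypothetical container for one training or test example."""
    preceding: str         # text before the position being checked
    following: str         # text after the position being checked
    candidates: List[str]  # the character plus its confusion-set entries
    target: str            # the correct character at that position


def build_quads(sentence: str, confusion_sets: Dict[str, Set[str]]) -> List[Quad]:
    """Build one quad per character of a correct sentence that has a confusion set."""
    quads = []
    for i, ch in enumerate(sentence):
        confusable = confusion_sets.get(ch)
        if not confusable:
            continue  # no known confusable characters for this position
        candidates = sorted(confusable | {ch})
        quads.append(Quad(sentence[:i], sentence[i + 1:], candidates, ch))
    return quads


def pick_candidate(quad: Quad, score: Callable[[str, str, str], float]) -> str:
    """Return the candidate that the scoring function rates highest."""
    return max(quad.candidates,
               key=lambda c: score(quad.preceding, c, quad.following))


# Toy confusion set (an assumption, not taken from the released data).
toy_confusion_sets = {"在": {"再"}}
quads = build_quads("我在家", toy_confusion_sets)


def toy_score(preceding: str, candidate: str, following: str) -> float:
    # Placeholder scorer; a trained model would supply these scores instead.
    return 1.0 if candidate == "在" else 0.0


prediction = pick_candidate(quads[0], toy_score)
```

In this sketch, a position would be flagged as an error whenever the predicted character differs from the one actually written, and the prediction would serve as the suggested correction.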

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Proceedings
Editors: Jie Tang, Min-Yen Kan, Dongyan Zhao, Sujian Li, Hongying Zan
Publisher: Springer
Pages: 349-359
Number of pages: 11
ISBN (Print): 9783030322359
Publication status: Published - 2019
Externally published: Yes
Event: 8th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2019 - Dunhuang, China
Duration: 9 Oct 2019 to 14 Oct 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11839 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2019
Country/Territory: China
City: Dunhuang
Period: 9/10/19 to 14/10/19

Keywords

  • Attention mechanism
  • Error correction of Chinese text
  • Error detection of Chinese text
  • LSTM model
