Learner corpus and its application to automatic level checking using machine learning algorithms

Md Maruf Hasan, Oo Khaing Hnin

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

5 Citations (Scopus)


A learner corpus is a computerized textual database of the language produced by foreign language learners. Annotated learner corpora contain invaluable meta-information about learners and the errors they make. With proper feature extraction and machine learning techniques, it is possible to extract implicit and explicit knowledge from learner corpora and develop useful applications that support effective foreign language teaching and learning, such as automatic proficiency level checking, error-driven learning, and personalized learning. In this paper, we use a learner corpus and experiment with feature extraction and machine learning techniques to explore such applications. In particular, we report our experimental results on automatic proficiency level checking with the ID3 and C4.5 decision tree algorithms, Bayesian Net, and SVM. We also briefly outline other potential applications of learner corpora, such as error-driven learning, that use implicit and explicit features along with machine learning.
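To make the decision tree approach more concrete, here is a minimal Python sketch of information gain, the criterion ID3 uses to choose which feature to split on. The feature names (`article_errors`, `sentence_length`), their values, and the four learner records are hypothetical placeholders invented for illustration; they are not the features or data used in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target="level"):
    """Entropy reduction from splitting `rows` on a discrete `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy records: invented features and labels, not data from the paper.
learners = [
    {"article_errors": "many", "sentence_length": "short", "level": "beginner"},
    {"article_errors": "many", "sentence_length": "long", "level": "beginner"},
    {"article_errors": "few", "sentence_length": "short", "level": "advanced"},
    {"article_errors": "few", "sentence_length": "long", "level": "advanced"},
]

features = ["article_errors", "sentence_length"]
gains = {f: information_gain(learners, f) for f in features}

# ID3 would split first on the feature with the highest information gain.
best_feature = max(gains, key=gains.get)
```

In this toy setup, `article_errors` separates the two levels perfectly while `sentence_length` carries no information, so an ID3-style learner would split on `article_errors` first. C4.5 refines the same idea by normalizing the gain (gain ratio) and handling continuous features.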

Original language: English
Title of host publication: 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2008
Number of pages: 4
Publication status: Published - 2008
Externally published: Yes
Event: 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2008 - Krabi, Thailand
Duration: 14 May 2008 to 17 May 2008



  • Annotated learner corpora
  • Automatic proficiency level checking
  • Computer-assisted language learning (CALL)
  • Foreign language pedagogy
