Abstract
We describe an experiment on the automatic identification of bigram multiword expressions for Latvian and Lithuanian. Because these languages are considered under-resourced in terms of lexical resources and the availability or accuracy of specialised lexical tools (e.g. POS tagger, parser), our approach uses raw corpora and a combination of lexical association measures and supervised machine learning. We achieved 92.4% precision and 52.2% recall for Latvian, and 95.1% precision and 77.8% recall for Lithuanian.
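As a rough illustration of what a lexical association measure computed from a raw corpus can look like, the sketch below scores adjacent word pairs with pointwise mutual information (PMI). The abstract does not say which measures were used, so the choice of PMI, the function name `bigram_pmi`, and the toy English sentence are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent token pair.

    Hypothetical helper: PMI is only one possible association measure,
    chosen here as an assumption for illustration.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # relative frequency of the pair
        p_x = unigrams[w1] / n_uni   # relative frequency of the first word
        p_y = unigrams[w2] / n_uni   # relative frequency of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (assumed data, already tokenized by whitespace).
tokens = "new york is big and new york is old".split()
scores = bigram_pmi(tokens)

# List candidate bigrams from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In a hybrid setup of the kind the abstract describes, scores such as these could plausibly serve as input features for a supervised classifier that labels each candidate bigram as a multiword expression or not.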
Original language | English |
---|---|
Title of host publication | Proceedings of the International MultiConference of Engineers and Computer Scientists 2017, IMECS 2017 |
Editors | Oscar Castillo, S. I. Ao, Craig Douglas, David Dagan Feng, A. M. Korsunsky |
Publisher | Newswood Limited |
Pages | 706-709 |
Number of pages | 4 |
ISBN (Electronic) | 9789881404770 |
Publication status | Published - 2017 |
Event | 2017 International MultiConference of Engineers and Computer Scientists, IMECS 2017 - Hong Kong, Hong Kong; Duration: 15 Mar 2017 → 17 Mar 2017 |
Publication series
Name | Lecture Notes in Engineering and Computer Science |
---|---|
Volume | 2228 |
ISSN (Print) | 2078-0958 |
Conference
Conference | 2017 International MultiConference of Engineers and Computer Scientists, IMECS 2017 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 15/03/17 → 17/03/17 |
Keywords
- Hybrid approach
- Lexical association measures
- Machine learning
- Multi-word expressions
Cite this
Mandravickaitė, J., Krilavičius, T., & Man, K. L. (2017). Automatic Identification of Multi-Word Expressions for Latvian and Lithuanian. In O. Castillo, S. I. Ao, C. Douglas, D. D. Feng, & A. M. Korsunsky (Eds.), Proceedings of the International MultiConference of Engineers and Computer Scientists 2017, IMECS 2017 (pp. 706-709). (Lecture Notes in Engineering and Computer Science; Vol. 2228). Newswood Limited.