Automatic Identification of Multi-Word Expressions for Latvian and Lithuanian

Justina Mandravickaitė, Tomas Krilavičius, Ka Lok Man

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

We discuss an experiment on the automatic identification of bigram multi-word expressions for Latvian and Lithuanian. Because these languages are considered under-resourced, both in lexical resources and in the availability or accuracy of specialised lexical tools (e.g. POS taggers, parsers), our approach uses raw corpora and a combination of lexical association measures and supervised machine learning. We achieved 92.4% precision and 52.2% recall for Latvian, and 95.1% precision and 77.8% recall for Lithuanian.
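The abstract does not say which association measures or which classifier were used. As a hedged illustration only, the Python sketch below computes three standard association scores (pointwise mutual information, the Dice coefficient and the t-score) for every adjacent bigram in a raw, untagged token list, i.e. the kind of per-candidate features a supervised classifier could be trained on, plus the precision and recall metrics the abstract reports. The function names `bigram_features` and `precision_recall` and the choice of measures are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Score every adjacent bigram with standard lexical association measures.

    Assumption (illustrative only): `tokens` is a list of word forms from a raw,
    untagged corpus, so no POS tags or lemmas are needed. The measures chosen
    here (PMI, Dice, t-score) are common examples, not the paper's exact set.
    Returns {(w1, w2): {"freq", "pmi", "dice", "t_score"}}.
    """
    if len(tokens) < 2:
        return {}
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens) - 1  # number of bigram positions, used as the sample size
    features = {}
    for (w1, w2), f12 in bigrams.items():
        f1, f2 = unigrams[w1], unigrams[w2]
        # Expected co-occurrence count if w1 and w2 were independent.
        expected = f1 * f2 / n
        features[(w1, w2)] = {
            "freq": f12,
            # PMI = log2(P(w1 w2) / (P(w1) P(w2))) = log2(observed / expected)
            "pmi": math.log2(f12 / expected),
            # Dice = 2 f(w1 w2) / (f(w1) + f(w2))
            "dice": 2 * f12 / (f1 + f2),
            # t-score = (observed - expected) / sqrt(observed)
            "t_score": (f12 - expected) / math.sqrt(f12),
        }
    return features


def precision_recall(predicted, gold):
    """Precision and recall of predicted MWE bigrams against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    scores = bigram_features("new york is big new york is old".split())
    for pair, feats in sorted(scores.items()):
        print(pair, {k: round(v, 3) for k, v in feats.items()})
```

In a pipeline of this kind, each candidate bigram's scores would form one feature vector, labelled MWE or non-MWE from a gold list, so that a supervised learner weighs the measures jointly instead of relying on a hand-set threshold on any single one. Working from surface tokens alone keeps such an approach usable where POS taggers or parsers are missing or unreliable.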

Original language: English
Title of host publication: Proceedings of the International MultiConference of Engineers and Computer Scientists 2017, IMECS 2017
Editors: Oscar Castillo, S. I. Ao, Craig Douglas, David Dagan Feng, A. M. Korsunsky
Publisher: Newswood Limited
Pages: 706-709
Number of pages: 4
ISBN (Electronic): 9789881404770
Publication status: Published - 2017
Event: 2017 International MultiConference of Engineers and Computer Scientists, IMECS 2017 - Hong Kong, Hong Kong
Duration: 15 Mar 2017 to 17 Mar 2017

Publication series

Name: Lecture Notes in Engineering and Computer Science
Volume: 2228
ISSN (Print): 2078-0958

Conference

Conference: 2017 International MultiConference of Engineers and Computer Scientists, IMECS 2017
Country/Territory: Hong Kong
City: Hong Kong
Period: 15/03/17 to 17/03/17

Keywords

  • Hybrid approach
  • Lexical association measures
  • Machine learning
  • Multi-word expressions
