Concordance line sorting in the Prime Machine

Stephen Jeaco

doi:10.1075/ijcl.18056.jea

Corresponding author for this work: Stephen Jeaco

Department of Applied Linguistics

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Corpus data provide evidence of the patterning of language, and one way word usage can be analysed is through the study of concordance lines. While popular concordancers provide different sorting methods, they are typically only able to display lines in the order in which they occur in the corpus, randomly, or alphabetically by words in slots to the left or right of the word of interest. Less sophisticated users may find recognising patterns from these orderings quite challenging. This paper considers possible needs of language learners in terms of concordance ranking and introduces two methods which have been adopted and developed for The Prime Machine. The first method uses repeated patterns, measuring the number of matches made with other lines in the set. The second method incorporates collocation scores, providing examples with strong collocations from the entire corpus at the top of sampled concordance lines.
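The two ranking ideas summarised in the abstract can be illustrated with a minimal sketch. This is not the implementation used in The Prime Machine: the function names, the whitespace tokenisation, the two-word window, exact (position, word) matching, and the `colloc_scores` lookup table are all assumptions made for illustration only. The first function scores each line by how many of its context slots are shared with other lines in the set; the second places lines containing the highest-scoring collocate first, given scores that would have been precomputed over the whole corpus.

```python
from collections import Counter


def context_slots(line, node, window=2):
    """Return the set of (offset, word) pairs around the first occurrence of node."""
    tokens = line.lower().split()  # naive whitespace tokenisation (assumption)
    if node not in tokens:
        return set()
    i = tokens.index(node)
    return {
        (off, tokens[i + off])
        for off in range(-window, window + 1)
        if off != 0 and 0 <= i + off < len(tokens)
    }


def rank_by_repeated_patterns(lines, node, window=2):
    """Hypothetical ranking: score a line by its slot matches with other lines."""
    contexts = [context_slots(line, node, window) for line in lines]
    slot_counts = Counter(slot for ctx in contexts for slot in ctx)
    # Each slot contributes the number of OTHER lines that share it.
    scores = [sum(slot_counts[s] - 1 for s in ctx) for ctx in contexts]
    # sorted() is stable, so ties keep their original corpus order.
    order = sorted(range(len(lines)), key=lambda k: -scores[k])
    return [(lines[k], scores[k]) for k in order]


def rank_by_collocation(lines, node, colloc_scores, window=2):
    """Hypothetical ranking: score a line by its strongest collocate in the window.

    colloc_scores is an assumed dict mapping a collocate to a corpus-wide
    association score for the node word; how it is computed is out of scope.
    """
    def best(line):
        words = [w for _, w in context_slots(line, node, window)]
        return max((colloc_scores.get(w, 0.0) for w in words), default=0.0)

    scored = [(line, best(line)) for line in lines]
    return sorted(scored, key=lambda pair: -pair[1])
```

Calling `rank_by_repeated_patterns(lines, "strong")` on a small list of concordance lines returns `(line, score)` pairs with the most pattern-rich lines first, while `rank_by_collocation(lines, "strong", colloc_scores)` reorders the same lines by their strongest collocate. Taking the maximum rather than a sum in the second function is a design choice of this sketch, intended to stop long lines with many weak collocates from outranking a line with one strong collocate.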

Original language	English
Pages (from-to)	284-297
Number of pages	14
Journal	International Journal of Corpus Linguistics
Volume	26
Issue number	2
DOIs	https://doi.org/10.1075/ijcl.18056.jea
Publication status	Published - 14 Jul 2021

Keywords

Collocation
Concordance line ranking
Data-driven learning
Lexical patterning


Cite this

@article{490b1870c19145dfa9c3f0b4994de45d,
  title = "Concordance line sorting in the Prime Machine",
  abstract = "Corpus data provide evidence of the patterning of language, and one way word usage can be analysed is through the study of concordance lines. While popular concordancers provide different sorting methods, they are typically only able to display lines in the order in which they occur in the corpus, randomly, or alphabetically by words in slots to the left or right of the word of interest. Less sophisticated users may find recognising patterns from these orderings quite challenging. This paper considers possible needs of language learners in terms of concordance ranking and introduces two methods which have been adopted and developed for The Prime Machine. The first method uses repeated patterns, measuring the number of matches made with other lines in the set. The second method incorporates collocation scores, providing examples with strong collocations from the entire corpus at the top of sampled concordance lines.",
  keywords = "Collocation, Concordance line ranking, Data-driven learning, Lexical patterning",
  author = "Stephen Jeaco",
  note = "Publisher Copyright: {\textcopyright} John Benjamins Publishing Company.",
  year = "2021",
  month = jul,
  day = "14",
  doi = "10.1075/ijcl.18056.jea",
  language = "English",
  volume = "26",
  pages = "284--297",
  journal = "International Journal of Corpus Linguistics",
  issn = "1384-6655",
  number = "2",
}

TY  - JOUR
T1  - Concordance line sorting in the Prime Machine
AU  - Jeaco, Stephen
N1  - Publisher Copyright: © John Benjamins Publishing Company.
PY  - 2021/7/14
Y1  - 2021/7/14
AB  - Corpus data provide evidence of the patterning of language, and one way word usage can be analysed is through the study of concordance lines. While popular concordancers provide different sorting methods, they are typically only able to display lines in the order in which they occur in the corpus, randomly, or alphabetically by words in slots to the left or right of the word of interest. Less sophisticated users may find recognising patterns from these orderings quite challenging. This paper considers possible needs of language learners in terms of concordance ranking and introduces two methods which have been adopted and developed for The Prime Machine. The first method uses repeated patterns, measuring the number of matches made with other lines in the set. The second method incorporates collocation scores, providing examples with strong collocations from the entire corpus at the top of sampled concordance lines.
KW  - Collocation
KW  - Concordance line ranking
KW  - Data-driven learning
KW  - Lexical patterning
UR  - http://www.scopus.com/inward/record.url?scp=85110848975&partnerID=8YFLogxK
U2  - 10.1075/ijcl.18056.jea
DO  - 10.1075/ijcl.18056.jea
M3  - Article
AN  - SCOPUS:85110848975
SN  - 1384-6655
VL  - 26
SP  - 284
EP  - 297
JO  - International Journal of Corpus Linguistics
JF  - International Journal of Corpus Linguistics
IS  - 2
ER  -
