Key words when text forms the unit of study: Sizing up the effects of different measures

Stephen Jeaco

doi:10.1075/ijcl.18053.jea

Stephen Jeaco (corresponding author)

Department of Applied Linguistics

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Throughout the social sciences, there has been growing pressure to present effect sizes when publishing empirical data (see American Psychological Association, 2001; Parsons & Nelson, 2004). While it seems indisputable that for the majority of quantitative research foci, effect size is an essential element of statistical analysis, this paper argues that specifically for key word analysis in corpus linguistics, the means of reporting effect size must depend on the level of the unit of study of each investigation (single text, collection or large corpus). After exploring some main criticisms of the log-likelihood measure, this paper unpacks the parameters of different measures for keyness and how they might address underlying concerns. It maintains that for the exploration of foregrounded/deviant/salient/marked features in text, the use of log-likelihood scores to rank the results is still fit for purpose and coupled with Bayes Factors is a solid approach for key word analyses.
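As a rough illustration of the kind of ranking the abstract refers to, the sketch below scores a word by log-likelihood and then derives an approximate Bayes Factor from that score. It is not taken from the article. It assumes the commonly used two-cell form of the log-likelihood statistic (observed against expected frequency in the study text and in the reference corpus) and a BIC-style approximation in which the natural log of the combined token count is subtracted from the log-likelihood score. The function names, parameter names and example counts are hypothetical.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word (two-cell form, an assumption).

    freq_* are the word's frequencies; size_* are the token counts of the
    study text and the reference corpus.
    """
    combined_freq = freq_study + freq_ref
    combined_size = size_study + size_ref
    # Expected frequencies if the word were spread evenly across both samples.
    expected_study = size_study * combined_freq / combined_size
    expected_ref = size_ref * combined_freq / combined_size
    score = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def approx_bayes_factor(ll_score, size_study, size_ref):
    """BIC-style approximation (an assumption): LL minus ln(total tokens)."""
    return ll_score - math.log(size_study + size_ref)


def rank_key_words(counts_study, counts_ref, size_study, size_ref):
    """Return (word, LL, approximate Bayes Factor) tuples, highest LL first."""
    rows = []
    for word, freq_study in counts_study.items():
        ll = log_likelihood(freq_study, size_study,
                            counts_ref.get(word, 0), size_ref)
        rows.append((word, ll, approx_bayes_factor(ll, size_study, size_ref)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    study = {"whale": 40, "the": 600, "ship": 25}
    reference = {"whale": 5, "the": 60000, "ship": 300}
    for word, ll, bf in rank_key_words(study, reference, 10_000, 1_000_000):
        print(f"{word}\tLL={ll:.2f}\tBF~{bf:.2f}")
```

Ranking by the log-likelihood score and reporting the approximate Bayes Factor alongside it keeps the two roles separate: the first orders the candidates, the second indicates how much evidence stands behind each one.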

Original language	English
Pages (from-to)	125-154
Number of pages	30
Journal	International Journal of Corpus Linguistics
Volume	25
Issue number	2
DOIs	https://doi.org/10.1075/ijcl.18053.jea
Publication status	Published - 28 Aug 2020

Keywords

Effect size
Key word analysis
Keyness
Log-likelihood
Ranking

Access to Document

10.1075/ijcl.18053.jea
Licence: CC BY-NC

Cite this

@article{8041a862fdc74fa18478ee0f66a05192,

title = "Key words when text forms the unit of study: Sizing up the effects of different measures",

abstract = "Throughout the social sciences, there has been growing pressure to present effect sizes when publishing empirical data (see American Psychological Association, 2001; Parsons \& Nelson, 2004). While it seems indisputable that for the majority of quantitative research foci, effect size is an essential element of statistical analysis, this paper argues that specifically for key word analysis in corpus linguistics, the means of reporting effect size must depend on the level of the unit of study of each investigation (single text, collection or large corpus). After exploring some main criticisms of the log-likelihood measure, this paper unpacks the parameters of different measures for keyness and how they might address underlying concerns. It maintains that for the exploration of foregrounded/deviant/salient/marked features in text, the use of log-likelihood scores to rank the results is still fit for purpose and coupled with Bayes Factors is a solid approach for key word analyses.",

keywords = "Effect size, Key word analysis, Keyness, Log-likelihood, Ranking",

author = "Stephen Jeaco",

note = "Publisher Copyright: {\textcopyright} John Benjamins Publishing Company",

year = "2020",

month = aug,

day = "28",

doi = "10.1075/ijcl.18053.jea",

language = "English",

volume = "25",

pages = "125--154",

journal = "International Journal of Corpus Linguistics",

issn = "1384-6655",

number = "2",

}

TY - JOUR

T1 - Key words when text forms the unit of study

T2 - Sizing up the effects of different measures

AU - Jeaco, Stephen

N1 - Publisher Copyright: © John Benjamins Publishing Company

PY - 2020/8/28

Y1 - 2020/8/28

N2 - Throughout the social sciences, there has been growing pressure to present effect sizes when publishing empirical data (see American Psychological Association, 2001; Parsons & Nelson, 2004). While it seems indisputable that for the majority of quantitative research foci, effect size is an essential element of statistical analysis, this paper argues that specifically for key word analysis in corpus linguistics, the means of reporting effect size must depend on the level of the unit of study of each investigation (single text, collection or large corpus). After exploring some main criticisms of the log-likelihood measure, this paper unpacks the parameters of different measures for keyness and how they might address underlying concerns. It maintains that for the exploration of foregrounded/deviant/salient/marked features in text, the use of log-likelihood scores to rank the results is still fit for purpose and coupled with Bayes Factors is a solid approach for key word analyses.

AB - Throughout the social sciences, there has been growing pressure to present effect sizes when publishing empirical data (see American Psychological Association, 2001; Parsons & Nelson, 2004). While it seems indisputable that for the majority of quantitative research foci, effect size is an essential element of statistical analysis, this paper argues that specifically for key word analysis in corpus linguistics, the means of reporting effect size must depend on the level of the unit of study of each investigation (single text, collection or large corpus). After exploring some main criticisms of the log-likelihood measure, this paper unpacks the parameters of different measures for keyness and how they might address underlying concerns. It maintains that for the exploration of foregrounded/deviant/salient/marked features in text, the use of log-likelihood scores to rank the results is still fit for purpose and coupled with Bayes Factors is a solid approach for key word analyses.

KW - Effect size

KW - Key word analysis

KW - Keyness

KW - Log-likelihood

KW - Ranking

UR - http://www.scopus.com/inward/record.url?scp=85092322215&partnerID=8YFLogxK

U2 - 10.1075/ijcl.18053.jea

DO - 10.1075/ijcl.18053.jea

M3 - Article

AN - SCOPUS:85092322215

SN - 1384-6655

VL - 25

SP - 125

EP - 154

JO - International Journal of Corpus Linguistics

JF - International Journal of Corpus Linguistics

IS - 2

ER -
