An Evaluation of Part-of-Speech Features as Predictors in Native Language Identification Tasks Using a Mixed Effects Logistic Regression Model

My research topic is Native Language Identification (NLID), in which researchers attempt, from different perspectives, to identify an author’s first language from an anonymous English-as-a-second-language text. Previous research can be divided into typological and statistical approaches. Yevgeni Berzak et al. (2014) used typological features from the World Atlas of Language Structures to detect cross-linguistic transfer, achieving 72.2% accuracy on the Cambridge First Certificate in English dataset. Moshe Koppel et al. (2005) used ten-fold cross-validation experiments to train a model on predefined stylistic features, including part-of-speech (POS) bigrams, function words, letter n-grams and orthography, achieving 80.2% accuracy on the International Corpus of Learner English. However, despite its high accuracy, the machine learning method could not explain the reasons behind its predictions. My research models the predictive power of each POS feature using mixed effects logistic regression, based on the Multidimensional Analysis Tagger and the International Corpus Network of Asian Learners of English. It supplements previous research by providing more detailed linguistic explanations for POS-based NLID tasks.