Innovation in Language Learning

Edition 17

Accepted Abstracts

From Distributions To Labels: A Lexical Proficiency Analysis Using Learner Corpora

David Alfter, University of Gothenburg (Sweden)

Yuri Bizzoni, University of Gothenburg (Sweden)

Anders Agebjörn, University of Gothenburg (Sweden)

Abstract

In this work, we examine how information from learner essay corpora can be used to evaluate unseen learner essays. From a corpus of learner essays graded on the CEFR scale by well-trained human evaluators, we extract a list of word distributions over CEFR levels. To analyse unseen essays, we want to map each word to a CEFR level using this word list. However, mapping from a distribution to a single label is not trivial. Furthermore, the concept of “target level” cannot be applied in this case. Receptive vocabulary lists derived from reading comprehension texts in textbooks can be said to represent target levels at which the vocabulary should be understandable; productive vocabulary as observed in learner essays carries no such information. Hence, we use the concept of “significant onset of use”, which estimates the level at which a word starts to be used significantly often.
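The following Python sketch illustrates one possible reading of the two steps described above: building per-level word distributions from graded essays, then mapping each distribution to a single label. The abstract does not specify how “significantly often” is tested, so the `min_share` threshold, the function names, and the toy essays are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def word_distributions(essays):
    """Map each word to its relative frequency at every CEFR level.

    `essays` is an iterable of (level, tokens) pairs.
    """
    counts = defaultdict(Counter)  # word -> level -> raw count
    totals = Counter()             # level -> total number of tokens
    for level, tokens in essays:
        for token in tokens:
            counts[token][level] += 1
            totals[level] += 1
    return {
        word: {
            level: by_level[level] / totals[level] if totals[level] else 0.0
            for level in CEFR_LEVELS
        }
        for word, by_level in counts.items()
    }


def onset_level(distribution, min_share=0.2):
    """Return the lowest level holding at least `min_share` of the word's mass.

    Assumption: a fixed share threshold stands in for a real significance test.
    """
    total = sum(distribution.values())
    if total == 0:
        return None
    for level in CEFR_LEVELS:
        if distribution[level] / total >= min_share:
            return level
    return None


# Invented toy data, not taken from any real learner corpus.
toy_essays = [
    ("A1", ["jag", "är", "glad"]),
    ("A2", ["jag", "tycker", "om", "boken"]),
    ("B1", ["jag", "anser", "att", "boken", "är", "bra"]),
]

distributions = word_distributions(toy_essays)
labels = {word: onset_level(dist) for word, dist in distributions.items()}
```

With this toy data, `labels["jag"]` is `"A1"`, `labels["boken"]` is `"A2"`, and `labels["anser"]` is `"B1"`. Normalising by the number of tokens per level keeps levels with more essay text from dominating the distribution.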

In contrast to traditional frequency-based proficiency estimations, our approach includes information about learners. We look at the “diversity” of a word, i.e. how many different learners have used the word at each level. Preliminary analyses have shown that including diversity scores in the calculation of the distribution-to-label mapping yields more reliable and plausible mappings.
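As a rough illustration of the diversity idea, the sketch below counts how many distinct learners used each word at each level and expresses that as a fraction of the learners observed at that level. How diversity enters the actual mapping is not stated in the abstract; the multiplicative `diversity_weighted` helper, the function names, and the toy data are hypothetical.

```python
from collections import defaultdict


def learner_diversity(essays):
    """Return word -> level -> fraction of that level's learners who used the word.

    `essays` is an iterable of (learner_id, level, tokens) triples.
    """
    users = defaultdict(lambda: defaultdict(set))  # word -> level -> learner ids
    learners = defaultdict(set)                    # level -> all learner ids
    for learner_id, level, tokens in essays:
        learners[level].add(learner_id)
        for token in set(tokens):  # repeated use by one learner counts once
            users[token][level].add(learner_id)
    return {
        word: {
            level: len(ids) / len(learners[level])
            for level, ids in by_level.items()
        }
        for word, by_level in users.items()
    }


def diversity_weighted(relative_frequency, diversity):
    """Assumed combination: scale a relative frequency by learner diversity."""
    return relative_frequency * diversity


# Invented toy data: one learner repeats "hej", two learners each use "hus" once.
toy_essays = [
    ("s1", "A1", ["hej", "hej", "hej", "hej"]),
    ("s2", "A1", ["hus"]),
    ("s3", "A1", ["hus", "bil"]),
    ("s4", "A2", ["hus", "hej"]),
]

diversity = learner_diversity(toy_essays)
```

Here “hej” accounts for 4 of the 7 A1 tokens but comes from a single learner out of three, whereas “hus” accounts for 2 of 7 tokens but is used by two learners. Under the assumed product, both words end up with the same weighted score, which shows how diversity can damp frequency inflated by one prolific writer.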

Finally, we are investigating how to evaluate the mapping from distribution to label. We show that the distributional profile of words from the essays, informed by the essays' levels, consistently overlaps with our frequency-based, learner-augmented method, in the sense that words assigned the same proficiency level by our mapping tend to cluster together in a semantic space. In the absence of a gold standard, this information is useful for seeing how often a word is associated with the same level in two different models. In addition, it gives us a similarity measure that shows which words are more central to a given level and which are more peripheral.
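The two evaluation ideas in this paragraph, agreement between two labelings and centrality within a level, can be sketched as follows. The abstract does not say how the semantic space is built or which similarity measure is used, so this example assumes pre-computed word vectors and cosine similarity to a per-level centroid; the vectors, labels, and function names are invented for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def level_centroids(vectors, labels):
    """Average the vectors of all words that share a level label."""
    grouped = defaultdict(list)
    for word, level in labels.items():
        grouped[level].append(vectors[word])
    return {
        level: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for level, vecs in grouped.items()
    }


def centrality(vectors, labels):
    """Similarity of each word to the centroid of its own level."""
    centroids = level_centroids(vectors, labels)
    return {w: cosine(vectors[w], centroids[lvl]) for w, lvl in labels.items()}


def agreement(labels_a, labels_b):
    """Fraction of shared words that receive the same level in both models."""
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        return 0.0
    return sum(labels_a[w] == labels_b[w] for w in shared) / len(shared)


# Invented 2-dimensional toy vectors and labels.
toy_vectors = {"hus": [1.0, 0.0], "bil": [1.0, 0.1], "resa": [0.5, 0.8], "anse": [0.0, 1.0]}
toy_labels = {"hus": "A1", "bil": "A1", "resa": "A1", "anse": "B1"}

scores = centrality(toy_vectors, toy_labels)
```

In this toy space, “resa” has the lowest similarity to the A1 centroid, so it would be read as peripheral to that level, while “hus” and “bil” are more central. `agreement` can be applied to the labels produced by two different models to see how often they assign a word the same level.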

 
