Innovation in Language Learning

An Ensemble Classifier for Error Detection and Recommendation in the Use of Articles by Learners of English

Kiruthikaa Krishnamoorthy, National University of Ireland, Galway (Ireland)

Thomas Gaillat, Insight Centre for Data Analytics, NUI Galway (Ireland)

Abstract

Learner English can be classified into multiple proficiency levels based on learners' fluency. In a typical learning setup, texts written by learners are repeatedly evaluated by native English speakers or teachers. This process is time-consuming, and learners may wait a long time for feedback. The aim of our research is to propose a method that automates the evaluation process by leveraging machine learning classifiers to predict errors and offer recommended alternatives as feedback to learners, thereby reducing both the evaluation effort and the time taken to receive feedback. Our approach uses linguistic microsystems [1] as a modeling process. The focus of this paper is to model the native English article microsystem [2] and then use it as an evaluation system for learner texts. We present a classifier for article use in learner writing. The three articles (a, the, and zero/null) form a microsystem that appears simple to native speakers but is a complex issue for learners [5]. The article microsystem in English is influenced by multiple features. Hence, the preliminary task is to identify these features and then to represent them. The text and its POS tags are the foundational features, which are further enriched with features such as anaphoric links and countability. A vector representation of these features is used to train the machine learning model. The Brown Corpus [3] is used as the collection of native written text for training. The REALEC corpus [4], a learner corpus, contains errors that have been identified, categorized (as articles, tenses, etc.), and corrected by a human evaluator. An ideal system should be able to differentiate incorrect texts from correct ones. Hence, we use a combination of the REALEC and Brown corpora as a gold standard to test the system's ability not only to detect errors but also to validate the accuracy of its corrections.
The experiments show that the system's output agrees with that of a human evaluator with 72% accuracy. Therefore, the proposed system can be used as an error detection and correction strategy prior to human verification.
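
The pipeline described in the abstract (feature vector, ensemble prediction, comparison with the learner's choice) could be sketched roughly as follows. This is a minimal illustration only: the abstract does not specify the feature encoding, the component classifiers, or how their outputs are combined, so the tag inventory, the function names `encode`, `majority_vote` and `check_article`, and the use of majority voting are all assumptions.

```python
from collections import Counter

# Hypothetical tag inventory; the abstract only says POS tags are used.
POS_TAGS = ["NN", "NNS", "NNP", "JJ"]

# The three members of the article microsystem named in the abstract.
ARTICLES = ["a", "the", "zero"]


def encode(head_pos, countable, anaphoric):
    """Build a feature vector: one-hot POS of the head noun plus two
    binary flags for countability and an anaphoric link (assumed encoding)."""
    vec = [1 if head_pos == tag else 0 for tag in POS_TAGS]
    vec.append(1 if countable else 0)
    vec.append(1 if anaphoric else 0)
    return vec


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers by taking
    the most frequent one (assumed combination scheme)."""
    return Counter(predictions).most_common(1)[0][0]


def check_article(learner_article, predictions):
    """Flag an error when the learner's article differs from the
    ensemble's choice, and return that choice as the recommendation."""
    recommended = majority_vote(predictions)
    return {"error": learner_article != recommended,
            "recommendation": recommended}
```

For instance, if a learner wrote "a" where two of three hypothetical classifiers predict "the", `check_article("a", ["the", "the", "zero"])` would flag an error and recommend "the". In a real system the prediction list would come from models trained on vectors such as those produced by `encode`.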

Keywords: Article Microsystem, Learner Error Evaluation and Correction, Multi-Layer Annotations, Ensemble Classifier
