Innovation in Language Learning

Edition 17

Accepted Abstracts

CLARIN Resource Families for Language Learning and Teaching

Darja Fišer, University of Ljubljana (Slovenia)

Jakob Lenardič, Jožef Stefan Institute (Slovenia)

Abstract

CLARIN (https://www.clarin.eu) is a European Research Infrastructure whose primary aim is to make language resources and tools accessible to Humanities and Social Science researchers by providing access to certified data repositories (de Jong et al., 2018). The aim of this paper is to present CLARIN’s recent initiative, CLARIN Resource Families (https://www.clarin.eu/resource-families), and discuss its potential for language learning and teaching. CLARIN Resource Families are curated overviews of the language corpora available throughout the network of CLARIN centres. The resource families, which provide corpora in most European languages and beyond, are organized around specific types of language data. These currently comprise newspaper corpora, corpora of computer-mediated communication, corpora of parliamentary debates, L2-learner corpora, and parallel corpora, and will soon be extended with historical and spoken corpora as well. The overviews of over 120 corpora include the most important metadata and descriptions of corpus size, text sources, time periods, annotations and licences. They also provide direct links to concordancers and download pages whenever available. With their combination of rich annotations and powerful concordancers, the corpora can serve as invaluable didactic resources in general as well as specialized domains for a wide range of lexicographic, terminological, grammatical and stylistic classroom activities. While the corpora can be used in monolingual settings, they can also be compared and contrasted across languages. Of special importance for language teachers are the L2-learner corpora, which play a crucial role in second language research and pedagogy. We provide a user-friendly overview of 34 L2 corpora divided by modality into written, spoken, and multimodal corpora.
In addition to standard corpus metadata, such as corpus size, licence and annotation, we also provide metadata specific to L2 corpora, such as the target L2 language, the L1 backgrounds of the speakers, error annotation and, in the case of spoken corpora, prosodic mark-up.

de Jong et al. 2018. “CLARIN: Towards FAIR and Responsible Data Science Using Language Resources.” In N. Calzolari et al. (eds.), Proceedings of LREC 2018, May 7–12, Miyazaki, Japan.

Keywords: research infrastructures, on-line language resources, corpora
