The present paper explores a series of challenges faced by Romanian scholars in their attempt to build discipline-specific expert corpora for academic writing. Such corpora are useful for teaching and researching disciplinary writing in L1 Romanian and L2 English. Many study programs in Romania are taught in English (for instance IT, Political Science and Economics), and English has long been regarded as the main academic lingua franca (Mauranen & Randa 2008). As a result, most of the academic articles relevant to many disciplines are published in English, and papers written in English also have a broader impact. The study is based on a bilingual comparable corpus compiled within the DACRE project (Discipline-specific expert academic writing in Romanian and English: corpus-based contrastive analysis models), launched in 2021 and financed by the Romanian Executive Unit for Financing Higher Education, Research, Development and Innovation (UEFISCDI). Within this project, we aim to advance the popularisation of corpora in higher education and to create digital instruments and methodological models useful to the national and international language-related research community. The project seeks to uncover the salient linguistic and rhetorical features specific to each discipline (see Boettger 2016) and each language variety (Romanian, English L1 and L2), as extracted from peer-reviewed scientific articles. At the initial stage of corpus compilation, when assessing the linguistic resources to be included in the corpus, a multitude of challenges emerge. For example, the linguistic level of these resources is not consistent (see Yilmaz and Römer 2020). Other difficulties we encountered were data availability (open or subscription-based sources), the lack of recent resources for certain corpus batches, “multi-authorship” in determining L1 texts, and, most importantly, legal aspects (i.e. copyright).
By describing, comparing and analyzing data collection barriers, we propose a model for expert corpus building in English versus low-resource languages such as Romanian.
Keywords: Romanian vs English academic writing, bilingual expert corpora, discipline-specific writing, Romanian expert corpus, DACRE corpus.