Text-Based Glossary Generation by ChatGPT for FL Learners
Iglika Nikolova-Stoupak, Sorbonne Université (France)
Gaël Lejeune, Sorbonne Université (France)
Eva Schaeffer-Lacroix, Sorbonne Université (France)
Abstract
Vocabulary acquisition, a key aspect of foreign language (FL) learning, is challenging for learners for a number of reasons, ranging from a lack of shared vocabulary with their native language to personal constraints relating to time management and memory profiles. Vocabulary is learnt more easily and efficiently when encountered in a natural context rather than in isolation [4, 9]. Graded readers, which are adapted to specific levels of competence and have been shown to have a strong positive effect on a learner’s vocabulary skills [8], typically include definitions of the words that are most complex for the given proficiency level and most likely to be useful beyond the text at hand. In an attempt to reduce the time, effort and cost borne by FL teaching professionals, the present study seeks to define best practices for the automatic generation of glossaries based on original and adapted reading materials. Experiments will make use of Alice’s Adventures in Wonderland by Lewis Carroll (1865) and of GPT-4 and Gemini Pro, two state-of-the-art LLMs known for their multilingual capabilities [1, 2] and efficient use of long context [3, 6]. Three target languages from distinct language families and with varying levels of resource availability will be addressed: English, Japanese and Bulgarian. In addition, different generation scenarios will be tested, such as zero-shot (the model is given a text and asked to provide a glossary suitable for foreign language learners) and one-shot (the prompt also includes an example of a text accompanied by a glossary). The resulting glossaries will be analysed quantitatively, for instance in terms of the parts of speech and frequencies of the words included. In the case of English, professionally crafted glossaries will serve as a gold standard.
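The two prompting scenarios and the frequency-based part of the analysis can be illustrated with a minimal Python sketch. It is not the study's implementation: the instruction wording, the function names `build_prompt` and `headword_frequencies`, and the toy text are all assumptions made for illustration, and no call to GPT-4 or Gemini Pro is shown. The word counter relies on whitespace and punctuation boundaries, so it would suit English and Bulgarian but not unsegmented Japanese text, which would first need a tokenizer.

```python
import re
from collections import Counter

# Hypothetical instruction wording; the study's actual prompts are not given.
INSTRUCTION = (
    "Provide a glossary suitable for foreign language learners "
    "based on the following text."
)


def build_prompt(text, example=None):
    """Build a zero-shot prompt, or a one-shot prompt when `example` is given.

    `example` is an assumed (example_text, example_glossary) pair.
    """
    parts = [INSTRUCTION]
    if example is not None:
        example_text, example_glossary = example
        # One-shot: show a single worked text/glossary pair before the task.
        parts.append("Example text:\n" + example_text)
        parts.append("Example glossary:\n" + example_glossary)
    parts.append("Text:\n" + text)
    parts.append("Glossary:")
    return "\n\n".join(parts)


def headword_frequencies(text, headwords):
    """Count case-insensitive occurrences of each glossary headword in the text."""
    # [^\W\d_]+ matches runs of Unicode letters (Latin, Cyrillic, ...).
    counts = Counter(re.findall(r"[^\W\d_]+", text.lower()))
    return {word: counts[word.lower()] for word in headwords}


# Toy input for illustration only.
toy_text = (
    "Alice was beginning to get very tired of sitting by her sister "
    "on the bank. The bank was warm."
)
zero_shot = build_prompt(toy_text)
one_shot = build_prompt(toy_text, ("A cat sat.", "cat: a small animal"))
frequencies = headword_frequencies(toy_text, ["bank", "tired", "rabbit"])
```

Here `frequencies` maps each headword of a candidate glossary to its count in the source text, which is one simple way to compare the glossaries produced under the two scenarios.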
Keywords
large language models (LLMs), lexicography, multilingual glossaries, reading comprehension, vocabulary acquisition
REFERENCES
[1] Ahuja, K., Diddee, H., Hada, R., Ochieng, M., Ramesh, K., Jain, P., Nambi, A., et al. (2023). MEGA: Multilingual Evaluation of Generative AI. arXiv preprint arXiv:2303.12528.
[2] Buscemi, A., & Proverbio, D. (2024). ChatGPT vs Gemini vs LLaMA on Multilingual Sentiment Analysis. arXiv preprint arXiv:2402.01715.
[3] Du, Z., Jiao, W., Wang, L., Lyu, C., Pang, J., Cui, L., Song, K., Wong, D., Shi, S., & Tu, Z. (2023). On Extrapolation of Long-Text Translation with Large Language Models.
[4] Godwin-Jones, R. (2018). Contextualized Vocabulary Learning. Language Learning & Technology, 22(3), 1–19. https://doi.org/10125/44651
[5] Laufer, B., & Ravenhorst-Kalovski, G. (2010). Lexical Threshold Revisited: Lexical Text Coverage, Learners’ Vocabulary Size and Reading Comprehension. Reading in a Foreign Language, 22(1), 15–30.
[6] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the Middle: How Language Models Use Long Contexts.
[7] Nation, P., & Wang, K. (1999). Graded Readers and Vocabulary. Reading in a Foreign Language, 12(2), 355–379.
[8] Restrepo Ramos, F. D. (2015). Incidental Vocabulary Learning in Second Language Acquisition: A Literature Review. PROFILE Issues in Teachers’ Professional Development, 17(1), 157–166. http://dx.doi.org/10.15446/profile.v17n1.43957