The Future of Education

Edition 15

Accepted Abstracts

Unsupervised Clustering Approaches for Dyslexia Screening Using Web-Based Gamified Data: A High-Silhouette Case Study

Nora Fink, AI researcher (Germany)

Abstract

Dyslexia is a common learning disability with neurobiological origins, affecting over ten percent of the global population. Traditional screening and diagnosis rely on time-consuming in-person evaluations, which often delay effective interventions. Recent advances in web-based testing and machine learning suggest that data-driven approaches could support earlier dyslexia detection. In this work, we apply unsupervised learning to a large-scale dataset (3,644 participants on desktop and a further 1,395 on tablet, each described by 197 features) derived from a gamified linguistic assessment designed to gauge phonological, orthographic, and working-memory capabilities. We preprocess and scale the data, then compare four widely used clustering methods: K-Means, DBSCAN, Gaussian Mixture Models (GMM), and Agglomerative Clustering. Agglomerative Clustering (linkage = complete, n_clusters = 2) achieves a silhouette score of 0.8463, well above the next-best method (GMM, silhouette = 0.3316). We tune the hyperparameters of each method, discuss practical implications, and benchmark our approach against a previously published dyslexia study that used a different set of machine learning methods and reading-based metrics. Our findings indicate that certain unsupervised strategies can uncover meaningful clusters within linguistic-interaction data, potentially aiding early screening. These results are not a substitute for professional evaluation, but they highlight the promise of web-based, minimal-equipment, data-driven tools for dyslexia screening across multiple orthographies.
 
Keywords: Dyslexia, Unsupervised Learning, Agglomerative Clustering, Silhouette Score, Linguistic Assessment, Web-based Screening
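
The comparison described in the abstract (scale the features, fit several clusterers, rank them by silhouette score) can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: the abstract does not name a library, so the use of scikit-learn is an assumption, as are the function names make_toy_data and compare_clusterers, the synthetic two-group data standing in for the real 197-feature dataset, and the DBSCAN settings. Only the Agglomerative Clustering settings (complete linkage, two clusters) come from the abstract. For reference, the silhouette of a sample is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster; the score reported by silhouette_score is the mean over all samples.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def make_toy_data(seed=0):
    """Hypothetical stand-in data: two well-separated Gaussian groups."""
    rng = np.random.default_rng(seed)
    group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 10))
    group_b = rng.normal(loc=8.0, scale=1.0, size=(100, 10))
    return np.vstack([group_a, group_b])


def compare_clusterers(X, n_clusters=2, random_state=0):
    """Scale X, fit four clusterers, return {name: silhouette score or None}."""
    X_scaled = StandardScaler().fit_transform(X)
    models = {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=random_state),
        # eps and min_samples are assumed values chosen for the toy data.
        "DBSCAN": DBSCAN(eps=1.5, min_samples=5),
        "GMM": GaussianMixture(n_components=n_clusters,
                               random_state=random_state),
        # Settings reported in the abstract: complete linkage, two clusters.
        "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters,
                                                 linkage="complete"),
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X_scaled)
        mask = labels != -1  # DBSCAN labels noise points as -1; drop them
        n_labels = len(set(labels[mask]))
        # The silhouette is only defined for 2 <= n_labels <= n_samples - 1.
        if 2 <= n_labels < mask.sum():
            scores[name] = silhouette_score(X_scaled[mask], labels[mask])
        else:
            scores[name] = None
    return scores


scores = compare_clusterers(make_toy_data())
for name, score in scores.items():
    print(name, "n/a" if score is None else round(float(score), 4))
```

Scores obtained on this toy data say nothing about the real assessment data. Note also that excluding DBSCAN noise points before scoring is one possible convention among several, and it tends to raise the resulting silhouette, so scores computed this way are not strictly comparable across methods.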
 
 
