Comparison of Pose Estimation Models toward Nonverbal Feedback for Pre-service Teachers
Shota Shirasaka, Fukuoka Institute of Technology (Japan)
Takahisa Imagawa, Kyushu Institute of Technology (Japan)
Shuichi Enokida, Kyushu Institute of Technology (Japan)
Abstract
Teacher behavior significantly impacts student learning outcomes [1], and nonverbal cues such as gestures and eye contact enhance instructional effectiveness [2]. Pose estimation, the automated detection of body keypoints, enables objective analysis of such behavior. Classroom settings, however, pose specific challenges, including occlusion during board writing, when the teacher's back faces the camera, and rapid postural transitions during greetings. Two major approaches are available to address these challenges: CNN-based models offer faster inference at lower computational cost, while Transformer-based models achieve higher accuracy at the expense of greater computational resources [3]. This trade-off is critical for educational applications, which require both real-time feedback during microteaching and precise analysis of complex poses. However, systematic evaluation in authentic classroom environments remains limited. This study therefore extracts challenging scenes from ethically approved microteaching videos of 27 pre-service teachers (three minutes each) and compares representative CNN-based and Transformer-based models in terms of keypoint detection confidence, temporal stability, and processing speed. The comparison aims to clarify the strengths and limitations of each approach under educational constraints and to identify the optimal base model for future domain-specific adaptation. The findings will provide evidence-based guidelines for model selection in classroom applications, contributing to objective nonverbal feedback systems that support pre-service teachers in developing effective teaching skills.
Keywords: Nonverbal communication; Pose estimation; Pre-service teacher education; Microteaching
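The three comparison criteria named in the abstract (keypoint detection confidence, temporal stability, and processing speed) could be computed along the following lines. This is only a minimal sketch under stated assumptions, not the evaluation procedure of the study: the data layout (one list of (x, y, confidence) tuples per frame), the function names, and the use of mean frame-to-frame keypoint displacement as a stability proxy are all hypothetical choices made for illustration.

```python
from math import hypot
from statistics import mean

# Assumed (hypothetical) layout: `frames` is a list of poses, one per video
# frame; each pose is a list of (x, y, confidence) tuples, one per keypoint,
# with keypoints in the same order in every frame.


def mean_confidence(frames):
    """Average detection confidence over all keypoints in all frames."""
    return mean(c for pose in frames for (_, _, c) in pose)


def temporal_jitter(frames):
    """Mean frame-to-frame keypoint displacement in pixels.

    Used here as a simple stability proxy: lower values suggest smoother
    tracks. Genuine body motion also contributes, so the value is only
    meaningful when models are compared on the same scene.
    """
    steps = [
        hypot(x1 - x0, y1 - y0)
        for prev, curr in zip(frames, frames[1:])
        for (x0, y0, _), (x1, y1, _) in zip(prev, curr)
    ]
    return mean(steps)


def frames_per_second(n_frames, elapsed_seconds):
    """Processing speed as frames handled per second of wall-clock time."""
    return n_frames / elapsed_seconds


# Toy example: two frames, two keypoints each (values are made up).
example_frames = [
    [(0.0, 0.0, 0.9), (10.0, 10.0, 0.8)],
    [(3.0, 4.0, 0.7), (10.0, 10.0, 0.6)],
]
print(mean_confidence(example_frames))
print(temporal_jitter(example_frames))
print(frames_per_second(90, 3.0))
```

Running each candidate model on the same extracted scene and collecting these three values per model would give one row of a comparison table. In practice the elapsed time would be measured around the model's inference call on fixed hardware.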
References:
[1] Hattie, J. (2009). Visible Learning. Routledge.
[2] Mayer, R. E. (2020). Multimedia Learning (3rd ed.). Cambridge University Press.
[3] Xu, Y., Zhang, J., Zhang, Q., & Tao, D. (2022). ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation. Advances in Neural Information Processing Systems, 35.
The Future of Education