TY - JOUR
T1 - Fitting Bayesian Item Response Theory Models Using Deep Learning Computational Frameworks
AU - Luo, Nanyu
AU - Han, Yuting
AU - He, Jinbo
AU - Zhang, Xiaoya
AU - Ji, Feng
N1 - Publisher Copyright:
© 2026 AERA. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
PY - 2026/4
Y1 - 2026/4
N2 - PyTorch and TensorFlow are two widely adopted modern deep learning frameworks that provide comprehensive computational libraries for developing and fitting complex models. Motivated by the technical barriers in recent item response theory (IRT) work and the lack of practice-oriented tutorials, we demonstrate how modern deep learning platforms can be used for Bayesian IRT parameter estimation by providing a didactic yet in-depth introduction to PyTorch and TensorFlow in a psychometric context, framing IRT models as graphical models, and offering step-by-step guidance that bridges probabilistic machine learning and psychometrics. In this study, we illustrate how to leverage these platforms to estimate widely used psychometric models in educational testing, psychological measurement, and behavioral assessment, namely dichotomous and polytomous IRT models and their multidimensional extensions. We compare Hamiltonian Monte Carlo and variational inference estimators for these models in a unified computational environment. Simulation studies show that both approaches yield parameter estimates with low mean squared error and bias in low-dimensional settings, while also indicating that VI might underestimate aspects of posterior uncertainty in higher-dimensional scenarios. Nonetheless, for practitioners who prioritize computational efficiency and scalability, especially when Graphics Processing Unit (GPU) acceleration is available, VI remains a compelling option. Three empirical case studies further demonstrate how PyTorch- and TensorFlow-based implementations compare with established IRT software in applied settings. We conclude by discussing the broader potential of integrating contemporary deep learning tools and perspectives into psychometric research.
AB - PyTorch and TensorFlow are two widely adopted modern deep learning frameworks that provide comprehensive computational libraries for developing and fitting complex models. Motivated by the technical barriers in recent item response theory (IRT) work and the lack of practice-oriented tutorials, we demonstrate how modern deep learning platforms can be used for Bayesian IRT parameter estimation by providing a didactic yet in-depth introduction to PyTorch and TensorFlow in a psychometric context, framing IRT models as graphical models, and offering step-by-step guidance that bridges probabilistic machine learning and psychometrics. In this study, we illustrate how to leverage these platforms to estimate widely used psychometric models in educational testing, psychological measurement, and behavioral assessment, namely dichotomous and polytomous IRT models and their multidimensional extensions. We compare Hamiltonian Monte Carlo and variational inference estimators for these models in a unified computational environment. Simulation studies show that both approaches yield parameter estimates with low mean squared error and bias in low-dimensional settings, while also indicating that VI might underestimate aspects of posterior uncertainty in higher-dimensional scenarios. Nonetheless, for practitioners who prioritize computational efficiency and scalability, especially when Graphics Processing Unit (GPU) acceleration is available, VI remains a compelling option. Three empirical case studies further demonstrate how PyTorch- and TensorFlow-based implementations compare with established IRT software in applied settings. We conclude by discussing the broader potential of integrating contemporary deep learning tools and perspectives into psychometric research.
KW - deep learning
KW - item response theory
KW - Markov Chain Monte Carlo
KW - PyTorch
KW - TensorFlow
KW - variational inference
UR - https://www.scopus.com/pages/publications/105035654293
U2 - 10.3102/10769986261439301
DO - 10.3102/10769986261439301
M3 - Article
AN - SCOPUS:105035654293
SN - 1076-9986
JO - Journal of Educational and Behavioral Statistics
JF - Journal of Educational and Behavioral Statistics
ER -