In a contribution to the BEA 2024 Shared Task, we addressed the challenge of predicting the difficulty and response time of multiple-choice questions from the United States Medical Licensing Examination® (USMLE®), the licensing examination for physicians in the United States.

To predict these two variables, we evaluated various BERT-like pre-trained transformer models, combining each with Scalar Mixing and two custom 2-layer classification heads that use learnable Rational Activations as their activation function. This multi-task setup lets a single model predict both item difficulty and response time.
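The architecture described above can be sketched in PyTorch. This is an illustrative reconstruction, not the published implementation: the polynomial degrees of the Rational Activation, the head width, and the safe-denominator trick (a Padé-style `1 + |Q(x)|` to avoid poles) are all assumptions made for the sketch. It shows the three ingredients named in the text: an ELMo-style scalar mixture over all encoder layer outputs, and two 2-layer heads with learnable rational activations, one per task.

```python
import torch
import torch.nn as nn


class RationalActivation(nn.Module):
    """Learnable rational activation: P(x) / (1 + |Q(x)|).
    Degrees 3/2 are illustrative; the absolute value keeps the
    denominator positive so the function has no poles."""

    def __init__(self, num_degree: int = 3, den_degree: int = 2):
        super().__init__()
        self.p = nn.Parameter(torch.randn(num_degree + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(den_degree) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        numerator = sum(c * x**i for i, c in enumerate(self.p))
        denominator = 1 + torch.abs(sum(c * x**(i + 1) for i, c in enumerate(self.q)))
        return numerator / denominator


class ScalarMix(nn.Module):
    """Softmax-weighted sum of all encoder layer outputs (ELMo-style)."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_outputs: list) -> torch.Tensor:
        w = torch.softmax(self.weights, dim=0)
        return self.gamma * sum(wi * h for wi, h in zip(w, layer_outputs))


class RegressionHead(nn.Module):
    """2-layer head with a Rational Activation in between (width is assumed)."""

    def __init__(self, hidden_size: int, mid_size: int = 128):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, mid_size)
        self.act = RationalActivation()
        self.fc2 = nn.Linear(mid_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x))).squeeze(-1)


class DifficultyTimeModel(nn.Module):
    """Multi-task model: one shared scalar mix, two task-specific heads.
    In practice `layer_outputs` would be the hidden states of a BERT-like
    encoder run with output_hidden_states=True."""

    def __init__(self, num_layers: int = 12, hidden_size: int = 768):
        super().__init__()
        self.mix = ScalarMix(num_layers)
        self.difficulty_head = RegressionHead(hidden_size)
        self.time_head = RegressionHead(hidden_size)

    def forward(self, layer_outputs: list):
        pooled = self.mix(layer_outputs)[:, 0]  # representation of the first ([CLS]) token
        return self.difficulty_head(pooled), self.time_head(pooled)
```

Training would then minimize a joint loss, e.g. the sum of the two per-task regression losses, so that the shared encoder and scalar mix benefit from both supervision signals.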

The results were noteworthy: our models placed first of 43 participants in predicting item difficulty and fifth of 34 participants in predicting item response time. These results demonstrate the potential of transformer-based approaches to improve the evaluation processes of high-stakes exams like the USMLE®.

The publication can be found here: