New Pub: GRIPF at TSAR 2025 Shared Task: Towards controlled CEFR level simplification with the help of inter-model interactions

New Pub
Language learners make the fastest progress when reading texts that match their proficiency level. But most real-world texts are too hard—and manually adapting them is time-consuming. So the big question is: Can AI automatically simplify texts to a specific CEFR level without losing meaning? We explored exactly this in the TSAR 2025 Shared Task, where systems had to rewrite advanced English texts (B2+) to easier levels like A2 or B1. Our team submitted two different approaches: EZ-SCALAR and SAGA. EZ-SCALAR works like an expert panel of AI models. Two large language models (GPT-5 and Claude) each produce their own simplification, critique each other, refine their versions, and then a final “judge” model picks the best result. An extended version, EZ-SCALAR Lex, adds something extra: a vocabulary check using EFLLex, a…
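The expert-panel idea behind EZ-SCALAR can be sketched as a draft–critique–refine–judge loop. This is an illustrative sketch, not the actual implementation: the model callables below are hypothetical stand-ins for real LLM API calls (e.g., GPT-5 and Claude), and the prompts are simplified placeholders.

```python
def ez_scalar_round(text, target_level, model_a, model_b, judge):
    # Step 1: each model produces its own simplification.
    draft_a = model_a(f"Simplify to {target_level}: {text}")
    draft_b = model_b(f"Simplify to {target_level}: {text}")

    # Step 2: each model critiques the other's draft.
    critique_of_b = model_a(f"Critique this {target_level} simplification: {draft_b}")
    critique_of_a = model_b(f"Critique this {target_level} simplification: {draft_a}")

    # Step 3: each model refines its own draft using the critique it received.
    refined_a = model_a(f"Revise '{draft_a}' given this critique: {critique_of_a}")
    refined_b = model_b(f"Revise '{draft_b}' given this critique: {critique_of_b}")

    # Step 4: a judge model selects the better refined version.
    return judge(refined_a, refined_b)

# Toy usage with dummy callables standing in for the LLMs:
shorten = lambda prompt: prompt.split(": ", 1)[-1][:60]
pick_shorter = lambda a, b: a if len(a) <= len(b) else b
result = ez_scalar_round("Quantum entanglement links particles.", "A2",
                         shorten, shorten, pick_shorter)
```

EZ-SCALAR Lex would add a vocabulary check between steps 3 and 4, flagging words above the target CEFR level.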
Read More
New Pub: Characterizing students’ energy learning trajectories

New Pub
Helping students apply energy ideas to everyday situations is a core goal in physics education. But not all students get there—and it’s not just about who knows the most content. In a 10-week classroom study with 165 students, we tracked both their energy understanding and their affective and metacognitive factors (like emotions, cognitive load, and self-regulation). Using k-means clustering on their learning trajectories, we identified three distinct student groups that differed in the coherence of their energy knowledge development. The key insight: Students who learned the most also felt more positive, experienced lower cognitive load, and used stronger metacognitive strategies. Those who struggled often felt overwhelmed or disengaged. The takeaway is clear: supporting emotions and self-regulation is just as important as teaching physics content. Instruction that addresses these factors can…
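The clustering step can be illustrated with a minimal k-means over per-student score trajectories. This toy sketch uses synthetic data and a bare-bones Lloyd's algorithm; the study's actual features (which also include affective and metacognitive measures) and preprocessing are richer.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: X has one row per student, columns are
    scores at successive measurement points (a learning trajectory)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each trajectory to its nearest cluster center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned trajectories.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy data: three idealized trajectory shapes (steady growth, plateau,
# decline) plus noise, standing in for the real classroom measurements.
rng = np.random.default_rng(1)
shapes = np.array([[0.2, 0.4, 0.6, 0.8],
                   [0.5, 0.6, 0.6, 0.6],
                   [0.7, 0.6, 0.5, 0.4]])
X = np.vstack([s + rng.normal(0, 0.03, (20, 4)) for s in shapes])
labels, centers = kmeans(X, k=3)
```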
Read More
New Pub: TBA at BEA 2025 Shared Task: Transfer-Learning from DARE-TIES Merged Models for the Pedagogical Ability Assessment of LLM-Powered Math Tutors

New Pub
In the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-Powered Tutors, the goal was to evaluate how well LLM-based math tutors support students. The task focused on four aspects of feedback: spotting mistakes, identifying where the mistake happens, giving guidance, and providing actionable suggestions. For our submission, we built on FLAN-T5 models with a multi-step training pipeline. In addition to standard fine-tuning, we used model merging (DARE-TIES) to leverage information across all four labels – and saw clear improvements over plain fine-tuning. Our models achieved F1 scores between 52 and 69 and accuracies between 62% and 87%, ranking 11th, 8th, 11th, and 9th across the four tracks. Link to the paper: Link
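The "drop and rescale" step at the heart of DARE (the D in DARE-TIES) can be sketched on a single weight matrix. This is a minimal illustration under simplifying assumptions, not the merging pipeline itself: real merges operate over full checkpoints and combine several sparsified task vectors with TIES-style sign election.

```python
import numpy as np

def dare_drop_and_rescale(base, finetuned, p=0.9, seed=0):
    """DARE's core idea: keep only a random fraction (1 - p) of the
    fine-tuning deltas and rescale them by 1 / (1 - p), so the merged
    weights remain an unbiased estimate of the fine-tuned model."""
    rng = np.random.default_rng(seed)
    delta = finetuned - base                 # the task vector
    mask = rng.random(delta.shape) >= p      # drop each delta with probability p
    return base + mask * delta / (1.0 - p)   # rescale the survivors

# Toy example on one 4x4 weight matrix.
base = np.zeros((4, 4))
finetuned = np.ones((4, 4))
merged = dare_drop_and_rescale(base, finetuned, p=0.5)
```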
Read More
[Workshop] Introduction to Language Technology and Language Modeling for Education

New Pub
At the recent 19th Joint Summer School on Technology-Enhanced Learning in Rethymno, Greece, I (Sebastian Gombert) gave an introductory workshop on language technology and language modeling and their various use cases in education. These included short answer scoring, essay scoring, classifying texts according to the CEFR framework, and analyzing group communication in CSCL. Overall, the workshop was well-received and well-attended.
Read More
New pub: Predicting Item Difficulty and Item Response Time with Scalar-mixed Transformer Encoder Models and Rational Network Regression Heads

Artificial Intelligence, Assessment, Computational Psychometrics, Conference, Higher Education, Publication, Workshop
In a contribution to the BEA 2024 Shared Task, we addressed the challenge of predicting the difficulty and response time of multiple-choice questions from the United States Medical Licensing Examination® (USMLE®). This exam is an important assessment for medical professionals. To predict these variables, we evaluated various BERT-like pre-trained transformer models. We combined these models with Scalar Mixing and two custom 2-layer classification heads, using learnable Rational Activations as the activation function. This multi-task setup allowed us to predict both item difficulty and response time. The results were noteworthy. Our models placed first out of 43 participants in predicting item difficulty and fifth out of 34 participants in predicting item response time. This demonstrates the potential of advanced AI techniques in improving the evaluation processes of critical exams like the…
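The two key architectural pieces can be sketched in a toy forward pass: a scalar mix over the hidden states of all encoder layers, followed by a small regression head with rational activations. This is an illustrative NumPy sketch, not the trained system: random arrays stand in for the transformer's layer outputs, and the coefficients shown here are untrained defaults (in the real models, all of these are learned).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def scalar_mix(layer_outputs, s, gamma):
    """Scalar mixing: a learned softmax-weighted sum over the hidden
    states of all encoder layers, scaled by gamma."""
    w = softmax(s)
    return gamma * sum(wk * hk for wk, hk in zip(w, layer_outputs))

def rational(x, a=(0.0, 1.0, 0.0, 0.0), b=(0.0, 0.0)):
    """Rational activation in safe Pade form: P(x) / (1 + |Q(x)|).
    With these placeholder coefficients it reduces to the identity;
    during training, a and b are learned."""
    P = sum(ai * x**i for i, ai in enumerate(a))
    Q = sum(bj * x**(j + 1) for j, bj in enumerate(b))
    return P / (1.0 + np.abs(Q))

# Toy forward pass: 12 "layers" of 8-dim pooled outputs stand in for a
# BERT-like encoder; a 2-layer head with rational activations regresses
# a scalar target such as item difficulty.
rng = np.random.default_rng(0)
layers = [rng.normal(size=8) for _ in range(12)]
mixed = scalar_mix(layers, s=rng.normal(size=12), gamma=1.0)
W1, W2 = rng.normal(size=(16, 8)), rng.normal(size=(1, 16))
difficulty = (W2 @ rational(W1 @ mixed)).item()
```

In the multi-task setup, a second head of the same shape would regress item response time from the same mixed representation.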
Read More
Team led by Sebastian Gombert wins one of two tracks at BEA 2024 shared task on predicting Item Difficulty and Item Response Time

Artificial Intelligence, Assessment, Award, Computational Psychometrics, Conference, Higher Education, New Pub, Workshop
For standardized exams to be fair and reliable, they must include a diverse range of question difficulties to accurately assess test takers' abilities. Additionally, it is crucial to balance the time allotted per question to avoid making the test unnecessarily rushed or sluggish. The goal of this year's BEA shared task (competition) was to build systems that could predict Item Difficulty and Item Response Time for items taken from the United States Medical Licensing Examination (USMLE). EduTec member Sebastian Gombert designed systems that predict both variables simultaneously. These placed first out of 43 for predicting Item Difficulty and fifth out of 34 for predicting Item Response Time. They use modified versions of established transformer language models in a multitask setup. A corresponding system description paper titled Predicting Item…
Read More
New Pub: From the Automated Assessment of Student Essay Content to Highly Informative Feedback: a Case Study

Artificial Intelligence, Assessment, Computational Psychometrics, Empirical Study, Feedback, Higher Education, Journal, Publication, Special Issue, Technical paper
How can we give students highly informative feedback on their essays using natural language processing? In our new paper, led by Sebastian Gombert, we present a case study on using GBERT and T5 models to generate feedback for educational psychology students. In this paper: ➡ We implemented a two-step pipeline that segments the essays and predicts codes from the segments. The codes are used to generate feedback texts informing the students about the correctness of their solutions and the content areas they need to improve. ➡ We used 689 manually labelled essays as training data for our models. We compared GBERT, T5, and bag-of-words baselines for both steps. The results showed that the transformer-based models outperformed the baselines in both steps. ➡ We evaluated the feedback with a learner cohort…
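The two-step pipeline above can be sketched end to end: segment the essay, predict a content code per segment, then map codes to feedback templates. This is a hedged toy sketch, not the paper's system: the keyword "classifier", the code labels, and the feedback texts are hypothetical stand-ins for the trained GBERT/T5 models and the actual coding scheme.

```python
def segment(essay):
    # Step 1: naive sentence split; the paper uses a learned segmenter.
    return [s.strip() for s in essay.split(".") if s.strip()]

def predict_code(segment_text):
    # Step 2: dummy keyword classifier standing in for GBERT/T5 prediction.
    if "reinforcement" in segment_text.lower():
        return "theory_correct"
    return "theory_missing"

# Codes trigger feedback templates telling students what is correct
# and which content areas still need work (labels are illustrative).
FEEDBACK = {
    "theory_correct": "You correctly applied the theory.",
    "theory_missing": "Revisit the relevant theory for this section.",
}

def give_feedback(essay):
    return [(seg, FEEDBACK[predict_code(seg)]) for seg in segment(essay)]

feedback = give_feedback("Reinforcement increases behavior. The child cried.")
```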
Read More
Workshop: Hyperchalk – How to implement Self-hosted Whiteboard Tasks @ JTEL Summer School 2023

Artificial Intelligence, Computer-supported collaborative learning, Learning Analytics, Summer School, Workshop
In this workshop, which Lukas Menzel and I gave at the seventeenth JTEL Summer School, we explored the possibilities of Hyperchalk, our self-developed whiteboard tool. Hyperchalk is a backend for Excalidraw that integrates with learning management systems via LTI and collects a complete history of trace data. After a short kick-off presentation, we let the participants design their own learning activities using the whiteboard. All participants created small tasks that other participants then solved. As most participants had a strong background in teaching, these were inspired by practical experiences. The tasks covered various topics, from stochastics to K12-level geography. At the end of the workshop, we taught the participants how to administer the tool and how to set it up on their own servers. Overall, it was a successful…
Read More
Workshop: Large Language Models for Feedback Generation @ JTEL Summer School 2023

Artificial Intelligence, Feedback, Summer School, Workshop
At the seventeenth JTEL Summer School, Lukas Menzel and I had the pleasure of giving a workshop on the potentials and pitfalls of large language models for generating learner feedback. We kicked the event off with a general presentation on large language models. We explained the technical properties of well-known language models such as BERT or GPT. Following this, we went into different setups that can be used for feedback generation. On the one hand, this can involve training a BERT-based model to predict codes for input responses that trigger OnTask-style feedback rules. While this approach is stable regarding what feedback students receive, it is also inflexible, as such feedback cannot necessarily mirror all detailed intricacies that might occur in a student's response. For this reason, it can feel kind…
Read More
EC-TEL2022 best demo award goes to EduTec

Award, Conference
The demo paper Superpowers in the Classroom: Hyperchalk is an Online Whiteboard for Learning Analytics Data Collection and the corresponding tool Hyperchalk, by Lukas Menzel, Sebastian Gombert, Daniele Di Mitri and Hendrik Drachsler, received the Best Demo Award at the European Conference on Technology Enhanced Learning 2022. Hyperchalk hosts collaborative whiteboards for real-time collaboration and collects rich user data that can be used for learning analytics. Hyperchalk was initially developed for the ALICE project.
Read More