BEA 2026 Shared Task on Rubric-based Short Answer Scoring for German

New Pub
Are you a researcher in AI in education and/or natural language processing? Do you like shared tasks and machine learning competitions like the yearly SemEval tasks or Kaggle competitions? Then you might consider participating in the BEA 2026 Shared Task on Rubric-based Short Answer Scoring for German, co-organized by us and colleagues from IPN - Leibniz-Institut für die Pädagogik der Naturwissenschaften und Mathematik. For this shared task, we look at the natural language understanding task of rubric-based short answer scoring, a task for which models must score short answers to open-ended assessment questions with the help of provided textual scoring rubrics. The dataset we provide for this was collected in authentic German school contexts and scored by domain experts as part of the ALICE project funded by the Leibniz-Gemeinschaft. As…
New Pub: How Virtual Reality Mental Training Impacts Race Preparation in Recreational Runners

Augmented Reality, Computational Psychometrics, General education, Journal, New Pub, Publication
Can Virtual Reality Mental Training Help Recreational Runners Race Smarter? We’re glad to announce that our paper has just been published! 🎉 In this post, we share the key ideas and early findings from our study exploring how virtual reality (VR) mental training, grounded in cognitive-behavioral (CB) techniques, may support long-distance recreational runners in adopting race strategies and strengthening motivation within a coaching context. What happens when CB techniques like imagery and self-talk meet VR in a coaching context? An exploratory study of recreational long-distance runners provides intriguing early signals. Why this study matters: VR has been used in sports settings to support skill learning and performance, but it’s still relatively uncommon to see VR paired directly with CB mental training, especially practical tools like imagery and self-talk…
LAMASS@DiLea Study Report and Data Set Now Published

Empirical Study, Higher Education, Learning Analytics, Project, Publication, Report
What factors influence academic success and dropout rates in digital study formats at universities? What factors and effects at the subjective, curricular and institutional levels can be empirically measured and how do they interact in digital study formats? Can a factor analysis be used to compare digital and face-to-face study formats? The LAMASS@DiLea project team now presents the answers to these questions in the newly published project report “LAMASS-Studie: Studienerfolg und Studienabbruch in digitalen Studienformaten”. It is a comprehensive analysis of academic success and dropout rates in digital study formats. The results of the study are valuable for existing digital study formats and for degree programs that wish to boost their use of digital technology. The dataset has also been published and is available to download. Model of student dropout in…
New Pub: GRIPF at TSAR 2025 Shared Task: Towards controlled CEFR level simplification with the help of inter-model interactions

New Pub
Language learners make the fastest progress when reading texts that match their proficiency level. But most real-world texts are too hard—and manually adapting them is time-consuming. So the big question is: Can AI automatically simplify texts to a specific CEFR level without losing meaning? We explored exactly this in the TSAR 2025 Shared Task, where systems had to rewrite advanced English texts (B2+) to easier levels like A2 or B1. Our team submitted two different approaches: EZ-SCALAR and SAGA. EZ-SCALAR works like an expert panel of AI models. Two large language models (GPT-5 and Claude) each produce their own simplification, critique each other, refine their versions, and then a final “judge” model picks the best result. An extended version, EZ-SCALAR Lex, adds something extra: a vocabulary check using EFLLex, a…
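The panel workflow described above (two models draft, critique each other, refine, and a judge picks) can be sketched roughly as follows. This is only an illustration of the control flow, not the team's implementation: the `Model` type, the function name `simplify_with_panel`, and all prompt wordings are hypothetical, and any real system would plug in actual LLM clients in place of the plain callables.

```python
from typing import Callable

# Hypothetical stand-in: any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


def simplify_with_panel(text: str, level: str,
                        model_a: Model, model_b: Model, judge: Model) -> str:
    """Draft, cross-critique, refine, then let a judge choose (illustrative only)."""
    ask = f"Simplify this text to CEFR level {level}:\n{text}"
    draft_a, draft_b = model_a(ask), model_b(ask)

    # Each model critiques the other model's draft.
    critique_of_a = model_b(f"Critique this {level} simplification:\n{draft_a}")
    critique_of_b = model_a(f"Critique this {level} simplification:\n{draft_b}")

    # Each model revises its own draft in light of the critique it received.
    final_a = model_a(f"Revise your draft.\nDraft:\n{draft_a}\nCritique:\n{critique_of_a}")
    final_b = model_b(f"Revise your draft.\nDraft:\n{draft_b}\nCritique:\n{critique_of_b}")

    # The judge sees both candidates and answers with "A" or "B".
    verdict = judge(
        f"Original:\n{text}\n\nCandidate A:\n{final_a}\n\n"
        f"Candidate B:\n{final_b}\n\nAnswer with A or B."
    )
    return final_b if verdict.strip().upper().startswith("B") else final_a
```

Passing the models in as plain callables keeps the sketch testable with stubs and independent of any particular provider's API.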
New Pub: Characterizing students’ energy learning trajectories

New Pub
Helping students apply energy ideas to everyday situations is a core goal in physics education. But not all students get there—and it’s not just about who knows the most content. In a 10-week classroom study with 165 students, we tracked both their energy understanding and their affective and metacognitive factors (like emotions, cognitive load, and self-regulation). Using k-means clustering on their learning trajectories, we identified three distinct student groups that differed in the coherence of their energy knowledge development. The key insight: Students who learned the most also felt more positive, experienced lower cognitive load, and used stronger metacognitive strategies. Those who struggled often felt overwhelmed or disengaged. The takeaway is clear: supporting emotions and self-regulation is just as important as teaching physics content. Instruction that addresses these factors can…
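To make the clustering step above more concrete, here is a minimal k-means sketch on toy trajectories. It is an assumption-laden illustration, not the study's analysis: the `trajectories` values are invented, each trajectory is simply treated as a vector of scores at successive measurement points, and the deterministic farthest-first initialization is a choice made here for reproducibility. The study's actual features, preprocessing, and initialization are not described in this post.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic start: first point, then repeatedly the point farthest from all centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iterations: int = 100) -> List[int]:
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented toy data: one row per student, one column per measurement point.
trajectories = [
    [2, 2, 3, 3], [3, 2, 3, 2],    # roughly flat
    [2, 4, 5, 6], [3, 4, 6, 6],    # moderate growth
    [2, 5, 8, 10], [3, 6, 8, 9],   # steep growth
]
labels = kmeans(trajectories, k=3)
```

On this toy data, students with similar trajectory shapes end up sharing a cluster label; with real data, the number of clusters would need to be justified separately.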
New Pub: From Nervous to Noteworthy: Evaluating SPEAKS

Competence development, Conference, Higher Education
Public speaking can be nerve-wracking, but it’s also a skill every professional needs. Many students leave higher education feeling unprepared to speak confidently in front of an audience. Traditional courses exist, but providing enough guidance to every student is time- and resource-intensive. This is where SPEAKS comes in. SPEAKS (Speech content Preparation for Effective and Authentic Knowledge Sharing) is an educational software tool designed to guide students through preparing the content of their speeches. The tool and its evaluation were presented at ECEL 2025 in a paper authored by Nina Mouhammad, Jan Schneider, Roland Klemke and Daniele Di Mitri as part of the HyTea project, highlighting its potential to support students in developing better speech content and becoming more confident in public speaking. The tool uses a fully scripted, chat-based interface with…
New Pub: Optimizing Formative Assessment with Learning Analytics

Assessment, Learning Analytics, Literature review, New Pub
The teaching and learning processes in education need to be effective. This is something that all parents, teachers and educational scientists can agree on. To track learners’ achievements and educational progress, and ultimately to show whether the teaching and learning processes are effective, we rely on formative assessment. Learning analytics has the potential to assist in formative assessment, but so far not enough evidence has been collected to confirm this potential. As a result, many have reservations about the connection between the results of learning analytics and formative assessment models. If the results from learning analytics don’t match well with formative assessment approaches, teachers may be reluctant to trust, understand or use those insights to guide their teaching. This issue is addressed in a recently published study which introduces…
New Pub: ChatGPT in Education

Journal, New Pub, Research Methods
Early studies on the usage of ChatGPT in educational settings have reported substantial learning gains from ChatGPT applications. But how valid are these studies? Is using ChatGPT in education really as effective as it seems? A newly published paper takes a deeper look at key findings from past debates about media and teaching methods to reveal frequent conceptual challenges that arise in studies about the effectiveness of ChatGPT. When researchers compare different types of media for learning, they sometimes mix up the effects of the teaching style with the features of the technology. If the instructional methods and the technological features are confused with one another, it becomes difficult to interpret the actual effect of ChatGPT. To help pinpoint the conceptual difficulties of these efficacy studies,…
New Pub: Design, Development and Evaluation of HILA

Artificial Intelligence, Keynote, Learning Analytics, Publication
How can AI-supported learning analytics be integrated into educational processes in a meaningful way? How can they be designed, tested and further developed to effectively improve teaching and learning practices? These questions were addressed by Hendrik Drachsler in his keynote at Learning AID 2024 in Bochum, which has recently been published in the conference proceedings “Learning Analytics, Artificial Intelligence und Data Mining in der Hochschulbildung”. In his keynote, Hendrik stresses the importance of content-specific applications that address genuine educational needs and are supported by empirical evidence demonstrating their effectiveness. The key to fostering adaptive and sustainable learning experiences is to understand and accommodate learners’ individual needs. Hendrik argues that technological progress alone is not sufficient to improve education. His ongoing research shows that AI-supported learning analytics can only bring…
New Pub: TBA at BEA 2025 Shared Task: Transfer-Learning from DARE-TIES Merged Models for the Pedagogical Ability Assessment of LLM-Powered Math Tutors

New Pub
In the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-Powered Tutors, the goal was to evaluate how well LLM-based math tutors support students. The task focused on four aspects of feedback: spotting mistakes, identifying where the mistake happens, giving guidance, and providing actionable suggestions. For our submission, we built on FLAN-T5 models with a multi-step training pipeline. In addition to standard fine-tuning, we used model merging (DARE-TIES) to leverage information across all four labels, and saw clear improvements over plain fine-tuning. Our models achieved F1 scores between 52 and 69 and accuracies between 62% and 87%, ranking 11th, 8th, 11th, and 9th across the four tracks.
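For readers unfamiliar with DARE-TIES, the general idea can be sketched on toy parameter vectors. This is a simplified, unweighted illustration of the technique as commonly described (random drop-and-rescale of each fine-tuned model's parameter deltas, followed by sign election and averaging of agreeing deltas), not the pipeline used in the submission. The function names, the flat-list representation, and the default `drop_rate` are assumptions made here; a real merge would operate on FLAN-T5 weight tensors, typically through a dedicated merging library.

```python
import random
from typing import List, Sequence


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def dare(delta: Sequence[float], drop_rate: float, rng: random.Random) -> List[float]:
    """DARE step: drop each delta with probability drop_rate, rescale survivors by 1/(1 - drop_rate)."""
    scale = 1.0 / (1.0 - drop_rate)  # requires drop_rate < 1
    return [d * scale if rng.random() >= drop_rate else 0.0 for d in delta]


def dare_ties_merge(base: Sequence[float], finetuned: Sequence[Sequence[float]],
                    drop_rate: float = 0.5, seed: int = 0) -> List[float]:
    """Merge several fine-tuned parameter vectors into one (toy, unweighted variant)."""
    rng = random.Random(seed)
    deltas = [
        dare([f - b for f, b in zip(model, base)], drop_rate, rng)
        for model in finetuned
    ]
    merged = []
    for b, column in zip(base, zip(*deltas)):
        # TIES-style sign election: the sign of the summed deltas wins.
        elected = sign(sum(column))
        # Average only the non-zero deltas that agree with the elected sign.
        agreeing = [d for d in column if d != 0 and sign(d) == elected]
        merged.append(b + (sum(agreeing) / len(agreeing) if agreeing else 0.0))
    return merged
```

With `drop_rate=0.0` the DARE step keeps every delta unchanged, which isolates the sign-election behaviour: conflicting deltas are resolved in favour of the direction with the larger total magnitude instead of cancelling each other out in a plain average.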