AIED 2024 Interactive Event Demo - CMU PLUS
Carnegie Mellon University
Sanjit Kakarla, Rushil Gaddam, Danielle R. Thomas, Jionghao Lin, Erin Gatz, Shivang Gupta, Kenneth R. Koedinger
Summary
This document is a demo paper describing Personalized Learning Squared (PLUS), a tutoring platform that aims to improve math performance among low-income middle school students. It proposes using large language models (LLMs) to assess tutor performance, and it details the platform's approach, including scenario-based lessons and research-backed criteria for effective tutor responses.
Full Transcript
Personalized Learning Squared (PLUS): Implementation of Large Language Model-Facilitated Assessment of Tutors' Performance in Reacting to Students Making Errors

Sanjit Kakarla*, Rushil Gaddam*, Danielle R. Thomas, Jionghao Lin, Erin Gatz, Shivang Gupta, Kenneth R. Koedinger
Carnegie Mellon University, Human-Computer Interaction Institute, Pittsburgh, PA 15213, USA
*Corresponding authors. https://www.tutors.plus/en/
Demo Link: https://youtu.be/LAJVDuU9IRQ

PLUS Summary

Personalized Learning Squared (PLUS) is an integrated tutoring platform led by Carnegie Mellon University in collaboration with Carnegie Learning, Inc. and Stanford University. PLUS strives to double math performance among 10,000 low-income middle school students. It employs hybrid human-AI tutoring, combining AI-assisted tutoring software with human tutoring to provide high-impact math tutoring to economically disadvantaged and under-resourced middle school students, with a focus on enhancing the skills of novice tutors.

Tutors require robust training to assist students effectively, and trained tutors are in short supply: over 16 million low-income students are on a waitlist for high-quality tutoring. Novice, inexperienced tutors often lack scenario-based knowledge of the more nuanced skills and competencies of tutoring. Recognizing this, PLUS is built upon the SMART framework, which emphasizes Social-emotional learning, Mastery of content, Advocacy, building Relationships, and the use of Technology-based tools. Following this framework, PLUS offers over 20 scenario-based lessons on specific competencies, including Responding to Negative Self-Talk, Determining What Students Know, and Ensuring Conceptual Understanding, with the aim of enhancing the skills of novice tutors.

In this demo, we showcase how large language models (LLMs) can be used to assess tutor criteria related to the Reacting to Student Errors lesson from three perspectives: 1) the perspective of a tutor supervisor assessing tutor competencies; 2) the perspective of a tutor engaging with the lesson and the AI-generated constructive questions; and 3) the perspective of an AI evaluator employing LLMs to compare human and AI assessment of tutors. This research-backed PLUS approach can extend to other competencies to enhance the future of education.

Large language models such as GPT-4 and Gemini are becoming increasingly popular and offer new opportunities, such as providing explanatory feedback to tutors and assessing specific skills. These models hold the potential to advance student learning at only a fraction of the cost of human-led tutoring. The challenging and pivotal question with these technologies remains whether they can help assess highly nuanced and humanistic criteria, such as those involved in training educators. Ultimately, by comparing LLM assessments of tutoring with human evaluations, we aim to develop an approach that provides new insight into the use of artificial intelligence in education.

This demo specifically showcases the use of LLMs to assess the criteria of the Reacting to Errors lesson in the PLUS app, in which a tutor practices responding to a student who has made a math error. The first aspect of the demo shows the tutor supervisor perspective, highlighting the interface where supervisors can view all of their tutors' lessons along with the distribution of completed lessons, time spent, and performance per competency.
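To make the supervisor view concrete, the following is a minimal, hypothetical sketch of how per-competency summaries (lessons completed, time spent, mean performance) could be rolled up from lesson records. The record fields, score scale, and function name are illustrative assumptions, not the PLUS implementation.

```python
# Hypothetical illustration only: the fields and aggregation logic are assumptions
# about what a supervisor dashboard might compute, not PLUS code.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LessonRecord:
    tutor: str
    competency: str       # e.g., "Reacting to Errors"
    completed: bool
    minutes_spent: float
    score: float          # assumed lesson performance on a 0-1 scale


def summarize_by_competency(records: list[LessonRecord]) -> dict[str, dict[str, float]]:
    """Roll up completions, time spent, and mean score for each competency."""
    totals = defaultdict(lambda: {"completed": 0, "minutes": 0.0, "score_sum": 0.0})
    for r in records:
        t = totals[r.competency]
        t["minutes"] += r.minutes_spent
        if r.completed:
            t["completed"] += 1
            t["score_sum"] += r.score
    return {
        comp: {
            "completed": t["completed"],
            "minutes": t["minutes"],
            "mean_score": t["score_sum"] / t["completed"] if t["completed"] else 0.0,
        }
        for comp, t in totals.items()
    }


# Example: two records for one tutor, one completed lesson.
records = [
    LessonRecord("tutor_a", "Reacting to Errors", False, 4.0, 0.0),
    LessonRecord("tutor_a", "Responding to Negative Self-Talk", True, 12.5, 0.8),
]
print(summarize_by_competency(records))
```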
Next, the supervisor recognizes that the Reacting to Errors lesson has not yet been completed by the tutor and dynamically assigns it. The second aspect highlights the tutor's perspective on the lesson. Like all PLUS lessons, it presents a scenario to the tutor and gauges their initial understanding of the topic, then presents multiple-choice and open-response questions to provide situational, deliberate practice. Following the tutor's responses, the GPT-4-powered system assesses the tutor's constructive response, determining how closely it aligns with the research-recommended best approach.

The third aspect of the demo builds on that research-recommended approach: an effective response to a student error meets five main criteria. Effective tutor responses are 1) process- or effort-focused; 2) motivating; 3) indirect in addressing the student's error; 4) immediate; and 5) accurate. We then apply a few-shot prompt, shown in the demo on OpenAI's ChatGPT (GPT-4), whose examples are derived directly from the research-recommended approach. LLMs can also be used to create synthetic dialogues of sample tutoring situations; using our prompt-engineering technique, we prompt GPT-4 with sample transcriptions and organize the responses in a spreadsheet with columns for human coding and LLM coding. Generating "simulated" student work via LLMs is a pivotal tool for understanding a tutor's diagnostic skills before moving to real-life transcriptions. After human coders code these synthetic dialogues in a bias-mitigating fashion, we compare their codes with those of the LLMs, calculate interrater reliabilities, and draw conclusions about the ability of LLMs to assess particular criteria.
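The sketch below illustrates this coding-and-comparison workflow, assuming the OpenAI Python client and scikit-learn's Cohen's kappa (one common interrater-reliability statistic). The prompt wording, few-shot examples, helper name, and placeholder data are illustrative assumptions, not the demo's actual prompt or coded spreadsheet.

```python
# Illustrative sketch only: prompt text, examples, and data are assumptions,
# not the demo's actual few-shot prompt or spreadsheet.
from openai import OpenAI                      # assumes the openai>=1.0 client
from sklearn.metrics import cohen_kappa_score  # one common interrater statistic

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The five research-recommended criteria for responding to a student error.
CRITERIA = [
    "process- or effort-focused",
    "motivating",
    "indirect in addressing the student's error",
    "immediate",
    "accurate",
]

# Hypothetical few-shot examples; the demo derives its examples directly from
# the research-recommended approach.
FEW_SHOT_EXAMPLES = (
    'Student error: 3/4 + 1/4 = 4/8.\n'
    'Tutor: "Wrong. The answer is 1." -> 0\n'
    'Tutor: "Nice work adding the numerators! What should happen to the '
    'denominators when the pieces are the same size?" -> 1\n'
)


def llm_code(tutor_response: str, criterion: str) -> int:
    """Ask GPT-4 whether a tutor response meets one criterion (1 = yes, 0 = no)."""
    prompt = (
        f"{FEW_SHOT_EXAMPLES}\n"
        f"Criterion: the tutor response is {criterion}.\n"
        f'Tutor: "{tutor_response}"\n'
        "Reply with a single digit: 1 if the criterion is met, 0 otherwise."
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You assess tutor responses to students' math errors."},
            {"role": "user", "content": prompt},
        ],
    )
    return int(reply.choices[0].message.content.strip()[0])


# Placeholder "spreadsheet" rows: human codes vs. LLM codes for the "indirect"
# criterion over a few synthetic tutor responses.
responses = [
    "You're wrong, just carry the one.",
    "Good thinking so far! Can you walk me through how you lined up the digits?",
]
human_codes = [0, 1]
llm_codes = [llm_code(r, CRITERIA[2]) for r in responses]
print("Cohen's kappa:", cohen_kappa_score(human_codes, llm_codes))
```

In practice the same comparison would run over many coded dialogues and all five criteria, yielding one reliability estimate per criterion.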
Relevance to AIED

Past research has shown a significant learning gain of ~20% from pretest to posttest on PLUS lessons related to critical tutoring competencies. We hypothesize that by comparing LLM assessments of tutoring responses with human assessments, we can develop methods of providing tutors with real-time explanatory feedback while they engage in training. Having used generative AI to assess tutor responses to student errors, we plan to extend its use to additional criteria to give tutors more effective and targeted training. As AIED focuses on "AI in Education for a World in Transition," this demo aims to explore how AI can enhance tutor experiences and improve the effectiveness of professional learning. Training is an important aspect of effective tutoring, making this approach of using LLMs for timely feedback useful for addressing educational needs at scale and particularly relevant to AIED's mission.

References:

1. Chine, D. R., et al. "Educational equity through combined human-AI personalization: A propensity matching evaluation." International Conference on Artificial Intelligence in Education. Cham: Springer International Publishing, 2022.
2. Afterschool Alliance. America After 3PM: Demand Grows, Opportunity Shrinks (2020).
3. Thomas, D. R., Yang, X., Gupta, S., Adeniran, A., McLaughlin, E. A., & Koedinger, K. R. When the tutor becomes the student: Design and evaluation of efficient scenario-based lessons for tutors. In LAK23: 13th International Learning Analytics and Knowledge Conference, March 13-17, 2023, Arlington, TX, USA. ACM, New York, NY, USA (2023).
4. Kakarla, S., Thomas, D. R., Lin, J., Gupta, S., & Koedinger, K. R. Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math Errors. In AAAI 2024 Workshop AI4ED, Proceedings of Machine Learning Research 1:1–13, 2024.
5. Thomas, D. R., Lin, J., Gatz, E., Gurung, A., Gupta, S., Norberg, K., Fancsali, S. E., Aleven, V., Branstetter, L., Brunskill, E., & Koedinger, K. R. Improving Student Learning with Hybrid Human-AI Tutoring: A Three-Study Quasi-Experimental Investigation. In LAK24: 14th International Learning Analytics and Knowledge Conference, March 18-22, 2024, Kyoto, Japan (2024).