AI Tutor: Building Accurate Answers Using LLMs and RAG
The Education University of Hong Kong
Dong Chenxi
How to Build an AI Tutor that Can Adapt to Any Course and Provide Accurate Answers Using Large Language Model and Retrieval-Augmented Generation
Dong Chenxi
Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong

Abstract: This paper proposes a low-code solution for building an AI tutor that leverages advanced AI techniques to provide accurate and contextually relevant responses in a personalized learning environment. The OpenAI Assistants API allows the AI Tutor to easily embed, store, retrieve, and manage files and chat history, which is what makes a low-code solution possible. Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) generate sophisticated answers grounded in course-specific materials. The application organizes and retrieves relevant information efficiently through vector embeddings and similarity-based retrieval. The AI Tutor prototype demonstrates its ability to generate relevant, accurate answers with source citations. It represents a significant advancement in technology-enhanced tutoring systems, democratizing access to high-quality, customized educational support in higher education.

Keywords: Intelligent Tutoring Systems, Large Language Models, Retrieval-Augmented Generation, Higher Education

1. Introduction
The advent of artificial intelligence (AI) has set off a wave of transformation across many sectors, and education is a salient beneficiary. AI's capacity to enable personalized and adaptive learning experiences has propelled intelligent tutoring systems to the forefront of modern education (Kasneci et al., 2023). These AI-powered systems offer individualized feedback and interactive learning modules designed to meet each student's learning needs.
Nonetheless, developing AI tutors that deliver consistently accurate and dependable responses across diverse academic disciplines remains a challenge. A notable obstacle to the reliability of AI in education is 'information hallucination', a phenomenon in which AI-generated responses, while appearing valid, deviate from factual accuracy (Nye et al., 2023). Such inconsistencies can undermine confidence in AI-centric educational systems (Kasneci et al., 2023). Furthermore, customizing these systems to specific course content requires access to current and pertinent educational materials, a task often complicated by the multifaceted nature of academic disciplines (Lewis et al., 2020).
In this paper, we propose a low-code solution for building an AI tutor that leverages LLMs and RAG to address these challenges of accuracy, reliability, and customization. RAG combines vector embedding and storage with similarity-based retrieval algorithms, which enables the AI Tutor to organize and retrieve relevant information efficiently. The AI Tutor can therefore draw on course-specific materials to provide accurate and contextually relevant responses, potentially transforming traditional educational approaches. By combining the strengths of LLMs and RAG through the OpenAI Assistants API, the AI Tutor generates relevant and accurate answers to students' questions.

2. Related Work
AI's application in education spans a broad spectrum, from augmenting curriculum design to facilitating personalized learning experiences and engaging students. Moore et al. (2023) describe the role of AI as ranging from assistive to autonomous, predicting a shift from AI as a tool in the educator's arsenal to an independent teaching agent.
At present, AI is primarily assistive, but the trend points toward more sophisticated, AI-centric educational experiences.
In 2013, Jaques et al. developed PAT2Math, a system that used rule-based algorithms to guide learners through algebraic problem-solving. While PAT2Math provided immediate feedback by comparing student inputs to predefined solutions, it could not generate responses to answers that were not in its database, and it lacked the reasoning ability of large language models. Gan et al. (2019) proposed an AI-based math tutor framework that used a modified Item Response Theory (IRT) approach to gauge a learner's ability and provide tailored question-and-answer sessions. However, the system's reliance on a static question database constrained its flexibility, often leading to repetitive and predictable responses. The system also had a low tolerance for linguistic errors and struggled with sentences marred by typos or poor grammar, which could be particularly challenging for non-native English speakers.
In contrast, our AI Tutor leverages the latest LLM advancements through the OpenAI Assistants API. The strong natural language comprehension of the underlying model enables it to process a wide range of student queries effectively. Beyond addressing the limitations of previous systems, the AI Tutor enhances the learning experience with contextually relevant, naturally phrased responses. In addition, its low-code approach simplifies development and customization, making sophisticated AI tutoring accessible to a broader range of educators.

3. Technical Background
3.1. Large Language Models (LLMs)
LLMs are neural networks trained on vast amounts of text that can generate coherent and meaningful language. They are built on the transformer architecture, whose self-attention mechanism lets the model weigh the relationships between tokens in its input.
To optimize their performance, these models are first pre-trained on large datasets and then fine-tuned for specific tasks (Lewis et al., 2020). LLMs are used for many tasks, including translation, summarization, dialogue, and question-answering. Our AI Tutor system uses OpenAI's GPT-4-1106-preview model, which performs strongly on natural language tasks. Notably, GPT-4 scores above most human test takers on many academic exams, such as the SAT and GRE. For instance, on SAT Math it scored 700 out of 800, around the 89th percentile of human test takers (OpenAI, 2023). This highlights the model's robust reasoning capabilities. By integrating GPT-4, our AI Tutor can understand course materials, interpret diverse student questions, find relevant information, and provide natural, contextual responses. This helps overcome the limitations of earlier rule-based tutoring systems, which struggled with open-ended student inputs.

3.2. Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique in which the model consults an external knowledge base at query time, selecting and prioritizing the documents and passages most pertinent to the query before generating an answer. The process relies on vector embeddings: documents and queries are mapped to vectors, so that relevance can be measured as similarity between them (as shown in Figure 1). Lewis et al. (2020) demonstrated that RAG improves the quality of LLM-generated responses, particularly in question-answering tasks. Their research shows that RAG significantly outperforms purely parametric seq2seq models by combining pre-trained parametric memory with non-parametric memory. The flexibility of the RAG framework is evident in its adaptation across various domains (Siriwardhana et al., 2022).
This adaptability makes RAG well suited to customizing AI Tutor systems to specific learning requirements and preferences, enhancing their effectiveness and personalization.
Figure 1. Retrieval Augmented Generation (RAG) Framework for Question-Answer

4. The AI Tutor Design
The AI Tutor is designed to be a streamlined and effective educational tool built on the OpenAI Assistants API. Its technical architecture is shown in Figure 2; at its center is a core module that uses Retrieval-Augmented Generation (RAG) to provide accurate and contextually relevant responses. The Assistants API handles this core module and is responsible for:
- Ingesting and vectorizing course materials (PDFs) and storing them in a vector database.
- Transforming student queries into vector representations.
- Retrieving the most relevant course content by similarity, and having the model generate answers with source attribution.
- Managing chat history, which gives the AI Tutor memory of earlier turns in the conversation.
The pre-built functions of the OpenAI Assistants API handle all of the above features. A low-code solution is therefore possible, and developers do not need to write extensive code from scratch.
Figure 2. The AI Tutor Design

5. Demonstration & Reflection
We developed a web app prototype to showcase the AI Tutor's capabilities. The app enables users to enter their OpenAI API key, upload course materials in text format, and ask relevant questions. Users can also delete uploaded materials and generate a downloadable Q&A record in HTML format. A video demo is available at: https://youtu.be/UH0SjqU5tVI?si=cVuBlwdOADq1q7gx.
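The retrieval steps listed in Section 4 run inside the Assistants API, and their internal chunking, embedding model, and ranking are not described here. The following is only a minimal sketch of similarity-based retrieval with source attribution, under stated assumptions: a bag-of-words term-count vector stands in for a learned embedding model, an in-memory list stands in for the vector database, and the `VectorStore` class, `build_prompt` helper, file names, and text snippets are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # document name shown to the student as the citation
    text: str


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for the vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, Chunk]] = []

    def add(self, chunk: Chunk) -> None:
        # Ingestion: each chunk is vectorized once, at upload time.
        self.items.append((embed(chunk.text), chunk))

    def search(self, query: str, k: int = 2) -> list[tuple[float, Chunk]]:
        # Query time: vectorize the question and rank stored chunks by similarity.
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, Chunk]]) -> str:
    """Ground the LLM in the retrieved context and ask it to cite sources."""
    context = "\n".join(f"[{chunk.source}] {chunk.text}" for _, chunk in hits)
    return (
        "Answer using only the context below and cite sources in brackets.\n"
        f"{context}\nQuestion: {question}"
    )


# Hypothetical course snippets; file names are illustrative only.
store = VectorStore()
store.add(Chunk("notes_a.pdf", "Finance theory studies how to value assets and make financial decisions."))
store.add(Chunk("notes_b.pdf", "Present value discounts future cash flows at the interest rate."))
store.add(Chunk("notes_c.pdf", "A bond is a contract that promises fixed payments to the holder."))

hits = store.search("What is a contract?", k=1)
print(build_prompt("What is a contract?", hits))
```

Swapping `embed` for a real embedding model would keep the same structure: vectorize chunks once at upload time, vectorize each question at query time, and pass the top-ranked chunks, labelled with their sources, to the LLM so that it can cite them.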
In the demo, we uploaded course materials for "Finance Theory I," sourced from MIT OpenCourseWare (https://ocw.mit.edu/courses/15-401-finance-theory-i-fall-2008/). Figure 3 shows a screenshot of a Q&A session.
Figure 3. An Example of an AI Tutor Q&A Session
As Figure 3 illustrates, the AI Tutor performed well in several key aspects:
- Accuracy: The AI Tutor's responses were carefully evaluated for correctness and alignment with the posed questions. They closely adhered to the factual content of the course materials.
- Relevance: Each answer was assessed for its pertinence, with emphasis on alignment with the correct answer and support from direct citations of relevant sources.
- Citation: The AI Tutor incorporated citations effectively; in Figure 3, for example, the answer cites the lecture slides titled 'Lecture 1: Intro and Overview.' This feature lets students verify the answer independently and enhances the credibility of the responses.
We also evaluated the AI Tutor system more comprehensively using a set of 50 carefully crafted questions, as detailed in Table 1.
Table 1. Evaluation of AI Tutor's Performance on Different Types of Questions (50 questions in total).
Question Type | Total Amount | Example Question | Unsatisfactory Feedback | Main Issues (from feedback)
Summarization | 10 | "Summarize the Lecture 1 content." | 3 | AI Tutor fails to retrieve the relevant context
Quantitative | 20 | "How much is the loan after five years with a 6% rate?" | 7 | Uses the wrong equation; gives the wrong result
Qualitative | 20 | "What is a contract?" | 2 | Answers are still abstract and lack in-depth explanation

The AI Tutor's performance can be summarized as follows:
- Summarization: The AI Tutor achieved a 70% satisfaction rate. Feedback indicated that uninformative answers were often due to difficulty retrieving the relevant context. A summarization query such as "Summarize the Lecture 1 content." contains few content-bearing keywords, so similarity-based retrieval may not surface the breadth of material a good summary needs.
- Quantitative: The AI Tutor achieved a 65% satisfaction rate on quantitative questions. LLMs such as the one behind the AI Tutor face inherent challenges in mathematical problem-solving. They are trained primarily for text processing rather than numerical computation, which leads to issues such as improper mathematical notation, incorrect formula application, and inaccurate calculations. Limited multi-step reasoning and abstract problem-solving abilities further hinder their performance.
- Qualitative: The AI Tutor achieved a 90% satisfaction rate on qualitative questions. However, some responses were considered too abstract, indicating a need for more concise and contextually rich answers. Improving the AI's handling of conceptual questions could lead to more grounded and comprehensible explanations.
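The reported rates are consistent with reading each satisfaction rate as the share of questions in a category that were not flagged as unsatisfactory in Table 1. The short check below assumes that definition; it reproduces 70%, 65%, and 90%, and also computes a pooled rate over all 50 questions, which is derived here from Table 1 rather than reported in the text.

```python
# Table 1 counts: question type -> (questions asked, answers rated unsatisfactory).
TABLE_1 = {
    "Summarization": (10, 3),
    "Quantitative": (20, 7),
    "Qualitative": (20, 2),
}


def satisfaction_rate(total: int, unsatisfactory: int) -> float:
    """Percentage of answers NOT flagged as unsatisfactory (assumed definition)."""
    return 100 * (total - unsatisfactory) / total


for question_type, (total, unsatisfactory) in TABLE_1.items():
    print(f"{question_type}: {satisfaction_rate(total, unsatisfactory):.0f}%")

# Pooled over all question types (derived from Table 1, not reported in the text).
all_total = sum(total for total, _ in TABLE_1.values())
all_unsat = sum(unsat for _, unsat in TABLE_1.values())
print(f"Overall: {satisfaction_rate(all_total, all_unsat):.0f}% of {all_total} questions")
```

Under this definition, 38 of the 50 answers were satisfactory, a pooled rate of 76%, with quantitative questions contributing most of the unsatisfactory answers (7 of 12).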
This evaluation highlights where the AI Tutor needs improvement: retrieval for summarization tasks, computational accuracy for quantitative questions, and explanatory depth for qualitative questions.

6. Discussion and Future Directions
The AI Tutor exhibits several strengths. It can adapt to any course by drawing on course-specific materials, ensuring that responses are tailored to the course content and objectives. In addition, each answer cites relevant sources in the course materials. In the 50-question evaluation, the AI Tutor generated relevant and satisfactory answers to most questions.
However, the AI Tutor also has specific weaknesses. First, its output is not fully stable: answer quality can fluctuate. Second, it is weaker at summarization-type questions, which require concise and accurate summaries. Finally, the AI Tutor may hallucinate information, particularly in calculations and math-related problems, leading to potentially incorrect or misleading answers.
The economic assessment of the AI Tutor system confirms its financial feasibility. Figure 4 presents a cost breakdown and comparison for each Q&A session, assuming approximately 200 pages of lecture slides. Two types of costs are identified: one-time costs and recurrent costs. One-time costs cover embedding the course materials and are a fixed initial investment; as Figure 4 shows, they are minimal. Recurrent costs cover ongoing expenses for embedding user questions and for model input/output processing. The cost per question is approximately $0.024, giving daily expenses of around $2.40 for 100 questions. Institutions with limited budgets should weigh these recurrent costs carefully. To manage operational expenses, we recommend exploring lower-cost models such as GPT-3.5-turbo-1106.
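The recurrent-cost estimate is simple arithmetic on the per-question figures given for Figure 4 (0.024 USD with GPT-4 and 0.002 USD with GPT-3.5-turbo). The sketch below reproduces it under two assumptions: 100 questions per day is the illustrative volume used in the text, and the one-time embedding cost is omitted because the text does not itemize it. The model-version suffixes follow the model names used elsewhere in this paper.

```python
# Recurrent per-question cost in USD, as reported for Figure 4.
# The one-time cost of embedding the ~200 pages of slides is not itemized
# in the text, so it is deliberately left out of this estimate.
COST_PER_QUESTION_USD = {
    "gpt-4-1106-preview": 0.024,
    "gpt-3.5-turbo-1106": 0.002,
}


def daily_cost(model: str, questions_per_day: int) -> float:
    """Recurrent daily cost = per-question cost x number of questions per day."""
    # Rounding hides binary floating-point noise such as 2.4000000000000004.
    return round(COST_PER_QUESTION_USD[model] * questions_per_day, 4)


for model in COST_PER_QUESTION_USD:
    print(f"{model}: ${daily_cost(model, 100):.2f}/day for 100 questions")

ratio = COST_PER_QUESTION_USD["gpt-3.5-turbo-1106"] / COST_PER_QUESTION_USD["gpt-4-1106-preview"]
print(f"Cheaper model costs {ratio:.1%} of the GPT-4 price per question")
```

This gives about $2.40 per day with GPT-4 and $0.20 per day with GPT-3.5-turbo; the cheaper model costs roughly one-twelfth as much per question.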
GPT-3.5-turbo is less capable than GPT-4, but it is a significantly cheaper and faster alternative. Figure 4 illustrates the cost breakdown and comparison: the GPT-4 model costs 0.024 USD per Q&A session, while the GPT-3.5-turbo model reduces the cost to 0.002 USD, less than one-tenth of the current price. Balancing the use of advanced AI models with cost management is crucial to ensuring the accessibility and sustainability of AI tutors in educational settings.
Figure 4. Cost Breakdown and Comparison for Each Q&A Session in USD
We propose several directions for future improvement. Firstly, alternative LLMs can be explored to reduce the cost of the AI Tutor. Options include open-source local LLMs such as Llama 2, lower-cost models such as gpt-3.5-turbo, and free, open-source frameworks such as LangChain to reduce implementation costs. To further optimize costs and make fuller use of chat history, we propose the design illustrated in Figure 5. In practice, many students ask similar questions during a course's Q&A sessions. In such cases, the system can retrieve an answer from the existing chat history instead of re-running the entire Q&A loop, saving the LLM's input and output token costs. As depicted in Figure 5, a free, open-source sentence-embedding model (e.g., all-MiniLM-L6-v2) can be used to identify the most similar question in the chat history. Once identified, its answer can be reused, shortening the system's response time and enhancing the user experience.
Figure 5.
Optimized AI Tutor Design: Reusing Chat History for Low-Cost and Faster Responses
Secondly, improving the user interface design of the AI Tutor can significantly increase user engagement and satisfaction. Interactive and personalized features such as feedback mechanisms and user profiles can be integrated in the future. Thirdly, more scalable deployment is essential. Cloud platforms such as Google Cloud or Azure can accommodate a larger user base and keep the system running smoothly during peak usage.

7. Conclusion
This paper presents a low-code AI Tutor design using the OpenAI Assistants API. The low-code design simplifies development and reduces implementation complexity. The system uses LLMs and RAG techniques to build an adaptive knowledge base, delivering accurate and personalized responses to student queries. Integrated citations let students verify information sources. An evaluation with 50 diverse questions showed high satisfaction based on human feedback, supporting the AI Tutor's effectiveness. An economic feasibility analysis for a single course estimated a cost of about $2.40 per day (100 questions), and cost-effective alternatives such as the gpt-3.5-turbo model look promising for scalability. In conclusion, the AI Tutor is a low-code web application that employs LLMs and RAG techniques to deliver accurate and personalized responses in an intelligent tutoring system.

References
Cai, D., Wang, Y., Liu, L., & Shi, S. (2022, July). Recent advances in retrieval-augmented text generation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 3417-3419).
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Gan, W., Sun, Y., Ye, S., Fan, Y., & Sun, Y. (2019, October). AI-tutor: Generating tailored remedial questions and answers based on cognitive diagnostic assessment. In 2019 6th International Conference on Behavioral, Economic and Socio-Cultural Computing (BESC) (pp. 1-6). IEEE.
Gromyko, V. I., Kazaryan, V. P., Vasilyev, N. S., Simakin, A. G., & Anosov, S. S. (2017, August). Artificial intelligence as tutoring partner for human intellect. In International Conference of Artificial Intelligence, Medical Engineering, Education (pp. 238-247). Cham: Springer International Publishing.
Jaques, P. A., Seffrin, H., Rubi, G., de Morais, F., Ghilardi, C., Bittencourt, I. I., & Isotani, S. (2013). Rule-based expert systems to support step-by-step guidance in algebraic problem solving: The case of the tutor PAT2Math. Expert Systems with Applications, 40(14), 5456-5465. https://doi.org/10.1016/j.eswa.2013.04.008
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., ... & Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
Moore, S., Tong, R., Singh, A., Liu, Z., Hu, X., Lu, Y., ... & Stamper, J. (2023, June). Empowering education with LLMs - The next-gen interface and content generation. In International Conference on Artificial Intelligence in Education (pp. 32-37). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-030-78270-2_6
Nye, B., Mee, D., & Core, M. G. (2023). Generative large language models for dialog-based tutoring: An early consideration of opportunities and concerns. In AIED Workshops.
OpenAI. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.