Module 09 - Learners Guide
Introduction to Generative AI
13-04-2024

Contents
What is Generative AI
Generative AI Applications
Prompt Engineering
Retrieval Augmented Generation
AI Agents

What is Generative AI?
A type of artificial intelligence that can create entirely new and original content, such as:
Realistic images or videos for entertainment and advertising
Human-like text for chatbots and avatars
Audio for speaking, or music for the entertainment industry
Code for application development
It can also generate synthetic data, which can be used for various use cases.

Difference between Generative AI and Traditional AI
Generative AI focuses on creating entirely new, original content, data, or output. It is designed to generate content such as images, text, and music based on patterns learned from training data.
Traditional AI is discriminative in nature: it uses historical data to predict future outcomes. This includes tasks such as classification, regression, clustering, and reinforcement learning.

How does Generative AI work?
Data Representation: The model learns latent representations that capture the essential features and characteristics of the dataset.
Generation Process: The model can then produce new content by sampling from the learned distributions.
Evaluation and Refinement: Generated samples are evaluated using various metrics, and the model is refined based on this feedback.

Types of Generative AI Models
Generative Adversarial Networks (GANs)
Diffusion Models
Variational Autoencoders (VAEs)
Autoregressive Models (e.g., GPT-3, ChatGPT)

Image Generation Examples
Prompt: "Only the Eiffel Tower on Mars"
Prompt: "Selfies from the Stone Age"

Video Generation Examples
Snow dogs.
Source: https://www.youtube.com/watch?v=T6PjcjKL7yQ

Audio Generation Example

Foundation Models (FMs)
Data (text, images, audio, structured data) → Training → Foundation Model → Adaptation → Tasks (question answering, sentiment analysis, image captioning, information extraction, object recognition, instruction following)

Foundation Models (FMs): Some notable models
Text: GPT 3.5, ChatGPT, Gemini, Mixtral 8x7B, LLaMa 2, Claude
Image: Stable Diffusion, DALL-E 2, Midjourney, Imagen, GLIDE
Audio: MusicLM, Stable Audio, AIVA, Coqui, Voicebox
Multimodal: GPT 4, Gemini Pro, Claude 3

Large Language Models (LLMs)
Large text-to-text foundation models are known as LLMs.
Example:
Input text (prompt): Write a poem on birds flying.
Output text (completion):
In skies of azure, they take to flight,
Birds of feather, creatures of light.
With graceful wings, they paint the air,
A masterpiece, beyond compare.

Working of LLMs
LLMs generate text sequentially based on conditional probabilities, i.e., they predict the next word in a sentence from the sequence of words that precede it. Most do this with architectures such as the Transformer.

Large Language Models: Use of Transformers
Most of the current leading autoregressive models use Transformers as their core because of the following benefits:
Conditional generation by adding additional input information, such as a prompt or context.
Ability to generate contextually relevant sequences tailored to specific tasks or conditions.
Ability to capture long-range context through the self-attention mechanism.

Large Language Models (LLMs): Evolution of LLMs

Generative AI Applications

Generative AI Limitations
Lack of control: Without direct control over specific attributes or features of the generated content, outputs may not meet the exact requirements.
Data dependence: If the training data is biased, limited in scope, or contains errors, the generated content may inherit these shortcomings.
Ethical concerns: The use of generative AI also raises ethical concerns, as the generated content can be misused.

PROMPT ENGINEERING

What are prompts?
Prompts are instructions and context passed to a language model to achieve a desired task.
Prompt engineering is the practice of developing and optimizing prompts to efficiently use language models (LMs) for a variety of applications. It is a useful skill for AI engineers and researchers who want to improve and efficiently use language models.

What is prompt engineering?
Prompt engineering is a process of creating a set of prompts, or questions, that are used to guide the user toward a desired outcome. It is an effective tool for designers to create user experiences that are easy to use and intuitive. This method is often used in interactive design and software development, as it allows users to easily understand how to interact with a system or product.

Significance of prompt engineering
Enhance model efficiency
Improve task-specific performance
Understand model challenges
Increase model safety
Resource efficiency
Accelerate transfer of domain-specific information

Elements of a Prompt
A prompt is composed of the following components:
Instructions: Guidance on the task you want the model to perform.
Context: Extra information that can guide the model toward better responses.
Input data/Examples: The input or inquiry we seek a response for.
Output indicator: The type or format of the output.
Example:
Share a recipe for your favorite dish.
Dish: Spaghetti
Provide step-by-step instructions to prepare the dish.
Output: Recipe provided below:

How to create an effective prompt?
[Context] + [Specific Information] + [Intent/Goal] + [Response Format (if needed)] = Perfect Prompt
Be specific
State your intent
Use correct spelling and grammar
Direct the output format
Ask follow-up questions
Experiment with different phrasings
Prompt for fact-checking

Prompt Engineering Techniques
1. Zero-shot prompting
2. Few-shot prompting
3. Chain-of-thought (CoT) prompting
4. Generated knowledge prompting
5. Self-consistency
6. Tree-of-thought prompting
7. Multimodal CoT

Zero-shot prompting
In zero-shot prompting, the model performs a task without any explicit training examples for that task.
How does this work? The model is trained on a large dataset of text and code, so it can generalize to new tasks by using its understanding of language and the world.

Few-shot prompting
LLMs may fail on more complex tasks in the zero-shot setting, or when a task involves semantics they have not seen before. In these scenarios, few-shot prompting can help: we provide demonstrations in the prompt so that the model temporarily learns what we want (in-context learning) and generates the desired output.
The model can learn how to perform a task from just one example (1-shot). For more difficult tasks, we can experiment with increasing the number of demonstrations (e.g., 3-shot, 5-shot, 10-shot).

Chain-of-thought prompting
CoT improves the performance of language models by providing them with a step-by-step explanation of how to generate a response.
The main idea of CoT is that by showing the LLM a few exemplars in which the reasoning process is explained, the LLM will also show its reasoning process when answering the prompt. This explicit reasoning often leads to more accurate results.

Example of Chain-of-thought

Generated knowledge prompting
A simple prompting technique that adds knowledge to the prompt to improve performance on commonsense reasoning tasks.
First step: Generate knowledge from the language model.
Second step: Provide this knowledge as additional input while answering the question.
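The two-step generated knowledge prompting procedure described above can be sketched as follows. This is a minimal sketch, not a specific library's API: `LLM` is a hypothetical stand-in for any function that sends a prompt to a model and returns its text, the prompt wording is an assumption, and `stub_llm` returns canned replies so the example runs without a real model. The golf question is only illustrative.

```python
from typing import Callable

# Hypothetical type: any function that sends a prompt to an LLM and returns
# its text completion (e.g., a thin wrapper around a real client).
LLM = Callable[[str], str]


def knowledge_prompt(question: str) -> str:
    """Step 1 prompt: ask the model to generate knowledge relevant to the question."""
    return (
        "Generate some knowledge about the concepts in the question.\n"
        f"Question: {question}\n"
        "Knowledge:"
    )


def answer_prompt(question: str, knowledge: str) -> str:
    """Step 2 prompt: supply the generated knowledge as additional input."""
    return (
        f"Knowledge: {knowledge}\n"
        "Using the knowledge above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


def generated_knowledge_answer(question: str, llm: LLM) -> str:
    """Run both steps: generate knowledge, then answer with it in the prompt."""
    knowledge = llm(knowledge_prompt(question)).strip()      # step 1
    return llm(answer_prompt(question, knowledge)).strip()   # step 2


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real model, with a canned reply for each step."""
    if prompt.endswith("Knowledge:"):
        return "In golf, the player with the lowest number of strokes wins."
    return "No, the goal of golf is to get a lower score than others."


if __name__ == "__main__":
    question = "Part of golf is trying to get a higher point total than others. Yes or no?"
    print(generated_knowledge_answer(question, stub_llm))
```

Keeping the two prompts as separate functions makes it easy to inspect or log the generated knowledge before it is fed back in, which is where errors in this technique usually originate.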
Example (without adding general knowledge)

Step 1: Generate Knowledge

Output

Step 2: Add knowledge in prompt

Self-Consistency (SC)
A modified version of chain-of-thought (CoT) prompting.
Standard CoT takes a greedy approach, following a single reasoning path.
SC combines few-shot CoT prompting with sampling multiple reasonable reasoning paths to reach an answer.
The most consistent answer across these paths is taken as the correct answer.
It enhances the performance of language models on reasoning tasks.

Example

Tree-of-thoughts prompting (ToT)
An advanced technique, used when exploration or strategic lookahead is required.
Each thought serves as an intermediate step toward solving the problem.
The language model (LM) self-evaluates each intermediate step through a reasoning process.
ToT combines the reasoning capability of the LM with search algorithms (depth-first and breadth-first search).

How is ToT different from other techniques?

Modified Prompt
Question: Bob is in the living room. He walks to the kitchen, carrying a cup. He puts a ball in the cup and carries the cup to the bedroom. He turns the cup upside down, then walks to the garden. He puts the cup down in the garden, then walks to the garage. Where is the ball?

Answer

Multimodal CoT prompting
Multimodal CoT incorporates text and vision into a two-stage framework. The first stage is rationale generation based on multimodal information. It is followed by the second stage, answer inference, which leverages the informative generated rationales.
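The self-consistency procedure described above (sample several reasoning paths, keep the most consistent final answer) can be sketched as a majority vote. This is a minimal sketch under stated assumptions: the completions are hard-coded strings standing in for several samples of the same few-shot CoT prompt drawn at a nonzero temperature, and each path is assumed to end with "The answer is X." so the final answer can be extracted with a regular expression.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return X from a completion containing 'The answer is X.'.

    The pattern stops at the first space or period, so decimal answers
    would need a richer pattern.
    """
    match = re.search(r"[Tt]he answer is\s+([^\s.]+)", completion)
    return match.group(1) if match else None


def self_consistency(completions: Iterable[str]) -> Optional[str]:
    """Majority vote over the final answers of several sampled reasoning paths."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    # most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


# Illustrative completions (not real model output) for one prompt:
# "When I was 6 my sister was half my age. Now I am 70. How old is my sister?"
sampled_paths = [
    "When I was 6, my sister was half my age, so 3. She is 3 years younger. "
    "Now I am 70, so she is 70 - 3 = 67. The answer is 67.",
    "The age gap is 6 - 3 = 3 years and stays constant. 70 - 3 = 67. The answer is 67.",
    "Half of 70 is 35. The answer is 35.",
]

print(self_consistency(sampled_paths))  # prints 67
```

A single greedy CoT pass could have produced the flawed third path; voting across paths lets the two consistent ones outweigh it.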
RETRIEVAL AUGMENTED GENERATION

Knowledge components of LLM

Challenges faced by LLMs
Lack of domain knowledge
Outdated information
Hallucination
Fails to memorize certain facts
Forgets certain information from training data
Limited context length

Overcoming the limitations
The limitations of LLMs can be overcome by injecting knowledge.
Methods for knowledge injection:
Adding knowledge in the prompt
Finetuning
Retrieval Augmented Generation (RAG)
Finetuning: Takes a pre-trained model and further trains it on a smaller, specific dataset to improve performance on a particular task. Drawbacks: it requires computing infrastructure and risks overfitting.
Retrieval Augmented Generation (RAG): A retrieval system scans through a large source of data to find relevant context. No training is required.

Different techniques for injecting knowledge

Retrieval Augmented Generation (RAG)
RAG combines the capabilities of an LLM with external information.
It is similar to an open-book exam rather than a traditional (closed-book) exam.
It retrieves information from a vector store, augments the user query with the retrieved data, and passes the result to the LLM to generate the output.

Steps in RAG
1. Data collection and extraction
2. Data chunking
3. Document embeddings
4. Store embeddings in vectorstore
5. Retrieval
6. Generation

1. Data Collection and Extraction
Collect data from different sources. Sources can range from PDFs and websites to databases, with data such as manuals, a product database, and a list of FAQs.
Extract the data from these sources.

2. Data Chunking
The process of splitting the data into smaller, more manageable pieces.
Each chunk is focused on a specific topic.
This lets retrieval obtain the most relevant information instead of processing the entire document.
The size of each chunk is adjustable and varies depending on the data.
Examples: recursive character splitter, character splitter, etc.

Embeddings
Embeddings represent words and sentences numerically, as vectors (lists of numbers), because computers understand the language of numbers.
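The six RAG steps listed above can be sketched end to end in plain Python. This is a toy sketch, not any vector database's or framework's API: the word-window chunker, the bag-of-words `embed()` function (a stand-in for a real embedding model), the in-memory list used as a "vectorstore", and the prompt wording are all assumptions for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def chunk_words(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Step 2 (chunking): windows of max_words words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("need 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    return [" ".join(words[i:i + max_words])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Step 3 stand-in: a bag-of-words term-count vector.

    Real embedding models return dense vectors that capture meaning;
    this toy version only captures literal word overlap.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(documents: list[str]) -> list[tuple[str, Counter]]:
    """Step 4: an in-memory 'vectorstore' of (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_words(doc)]


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Step 5: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 6: merge the system instruction, retrieved context and user query."""
    system = "Answer the question using only the context below."
    joined = "\n".join(f"- {c}" for c in context)
    return f"{system}\n\nContext:\n{joined}\n\nQuestion: {query}"


# Step 1 (collection): a few hard-coded FAQ-style documents for illustration.
docs = [
    "Our support desk is open Monday to Friday, 9 am to 6 pm.",
    "Refunds are processed within 5 business days of receiving the returned item.",
    "The warranty covers manufacturing defects for one year from purchase.",
]
store = build_store(docs)
question = "How long do refunds take?"
print(build_prompt(question, retrieve(question, store)))
```

Here the refunds document is retrieved only because it shares the word "refunds" with the query. A paraphrase such as "How long until I get my money back?" shares no words with any chunk and scores zero against all of them, which is the gap that real, meaning-aware embedding models close.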
Embeddings capture semantic meaning rather than literal meaning: if two words are close together in vector space, they are contextually similar.

3. Document Embeddings
The split documents are transformed into embeddings, which are vector representations of the data.
Embeddings are good at capturing meaningful relationships between entities.
Embeddings help the system understand user queries and match them, based on meaning, with relevant chunks in the vectorstore. This works better than simple word-to-word comparison.
Examples: OpenAI embeddings, Gemini embeddings, Cohere embeddings, etc.

4. Store Embeddings in Vectorstore
Embeddings are indexed and stored in a special type of database called a vectorstore.
It keeps all vectors together in a vector space and can quickly find vectors that are near one another.
Vector databases are scalable, can handle complex data, and use fast search algorithms.
Examples: Pinecone, Redis, FAISS, Chroma, etc.

5. Retrieval
The user query is transformed into an embedding.
A similarity search is performed between the embedded query and the embedded chunks in the vectorstore.
The most relevant contents are retrieved.

6. Generation
The prompt for the LLM consists of three things: the user query, the retrieved context, and the system instruction.
The merged prompt is passed to the LLM to generate the answer.

Applications
Customer services: RAG-powered chatbots could leverage knowledge bases and customer history to provide personalized and informed support, improving customer satisfaction.
Research and market intelligence: RAG can rapidly synthesize insights from large corpora of news, reports, filings, etc.
Legal discovery and analysis: It helps to quickly find relevant precedents and arguments across large collections of case law.
Financial analysis: RAG could ingest earnings statements, press releases, and regulatory filings to generate investment insights and trading signals.

Advanced RAG techniques

Parent Document Retriever
Hierarchical structure:
Break the document into multiple chunks of text ("parent" chunks).
Break the parent chunks into smaller "child" chunks. Each child chunk is linked to its larger parent chunk, forming a hierarchical structure.
When queried, the system retrieves the relevant smaller chunks. When the number of retrieved child chunks from a parent reaches a threshold, the entire parent chunk is passed to the LLM for generation.

Hybrid Fusion Search
Combines two or more search algorithms, such as keyword search and vector (semantic) search.
Keyword search returns context containing the query keywords.
Semantic search uses context, meaning, and relationships between words.

Contextual Compressor
Consists of two parts: 1. Base retriever 2. Document compressor
The contextual compression retriever passes queries to the base retriever.
The base retriever extracts relevant documents and passes them to the document compressor.
The document compressor filters them and extracts meaningful, compressed information.

Possible approaches

AI AGENTS

Limitations of LLMs
Being closed systems, LLMs are unable to fetch the most recent data or specific domain knowledge.
This limitation can lead to potential errors or "hallucinations".
Autonomous agents are designed alongside LLMs to overcome these limitations, as well as the limitations of RAG and finetuning.
These agents demonstrate automated behaviors and use external data and tools, enhancing their accuracy.

AI Agent
A system that uses an LLM to analyze a problem, formulate a solution, and implement it using various tools.
Agents have complex reasoning capabilities, memory, and the means to execute tasks.
They can range from simple programs performing single tasks to complex systems managing intricate processes.

Components of an agent

Agent core
Central coordination module that manages the agent's logic and behavior.
General goals: Contains the agent's overall objectives.
Tools for execution: The list of tools the agent has access to.
Explanation for planning modules: Instructions on when to use different planning modules.
Relevant memory: A dynamic section filled with relevant memory items from past conversations.
Agent persona (optional): A description of the agent's character, used to make the agent prefer certain tools or to give its responses a unique style.

Memory module
Acts as a store for the agent's internal logs and user interactions. There are two types:
Short-term memory: Records the agent's process for answering a single user question.
Long-term memory: Logs the events between the user and the agent over weeks or months.
Retrieval from memory involves more than semantic similarity: a composite score combining semantic similarity, importance, recency, and other metrics is used.

Tools
Well-defined executable workflows that agents can use to execute tasks. They can often be thought of as specialized third-party APIs.
Examples of tools:
RAG pipeline: Used by agents to generate context-aware answers.
Code interpreter: Helps agents solve complex programmatic tasks.
Internet search API: Allows agents to search for information over the internet.
Simple API services: These could include a weather API or an API for an instant messaging application; they provide specific data or functionality.

Planning module
Complex problems can be tackled effectively using two techniques:
1. Task and question decomposition: Complex problems benefit from breaking down compound questions or inferred information.
2. Reflection or critic techniques: Methods like ReAct, Reflexion, and Graph of Thought enhance reasoning and responses. They refine the execution plan generated by the AI agent.
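The composite memory-retrieval score described in the memory module can be sketched as a weighted sum of semantic similarity, importance, and recency. This is a minimal sketch: the equal weights, the 0.995-per-hour exponential recency decay, the [0, 1] importance scale, and the two-dimensional toy embeddings are all assumptions for illustration, not values given in this guide.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    embedding: list[float]   # vector from some embedding model (toy values below)
    importance: float        # assumed to be pre-scored in [0, 1]
    created_at: float        # creation time, in hours


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency(item: MemoryItem, now: float, decay: float = 0.995) -> float:
    """Exponential decay: 1.0 for a brand-new memory, multiplied by `decay` per hour."""
    return decay ** (now - item.created_at)


def memory_score(item: MemoryItem, query_embedding: list[float], now: float,
                 w_sim: float = 1.0, w_imp: float = 1.0, w_rec: float = 1.0) -> float:
    """Composite score: weighted sum of similarity, importance and recency."""
    return (w_sim * cosine(query_embedding, item.embedding)
            + w_imp * item.importance
            + w_rec * recency(item, now))


def retrieve_memories(memories: list[MemoryItem], query_embedding: list[float],
                      now: float, k: int = 2) -> list[MemoryItem]:
    """Return the k memories with the highest composite score."""
    return sorted(memories, key=lambda m: memory_score(m, query_embedding, now),
                  reverse=True)[:k]


memories = [
    MemoryItem("User prefers vegetarian recipes", [1.0, 0.0], importance=0.9, created_at=0.0),
    MemoryItem("User asked about the weather", [0.0, 1.0], importance=0.1, created_at=95.0),
    MemoryItem("User is allergic to peanuts", [0.8, 0.6], importance=1.0, created_at=90.0),
]
for m in retrieve_memories(memories, query_embedding=[1.0, 0.0], now=100.0):
    print(m.text)
```

Ranked by similarity alone, the vegetarian-recipes memory (similarity 1.0) would come first; with importance and recency added, the recent, highly important allergy memory (similarity 0.8) outranks it. In practice each component is usually normalized to a common range before weighting.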
Applications
While the applications of agents are practically boundless, here are a few interesting cases:
"Chat with your data" agent
Swarm of agents
Recommendation and experience design agents
Customized AI author agents

"Chat with your data" agent
Useful for getting answers from our own data.
Challenges in a straightforward RAG pipeline:
Semantic similarity of source documents and complex data structures
Contextual understanding and complex query handling
Example: If a user wants to know the sales increase from Q1 to Q2 in 2024, the agent must find the sales for both quarters and calculate the difference.
Solution: The agent requires a planning module for question decomposition, a RAG pipeline for information retrieval, and memory modules for handling sub-questions.

Swarm of agents
A group of agents co-existing and collaborating in a single environment to solve problems.
Multi-agent frameworks: Popular because they allow low-cost application creation with diverse roles. Examples: MetaGPT, AutoGen, CrewAI, ChatDev.
Cost-effectiveness: Prototyping applications and games can be very cheap.
Applications: Useful for populating digital spaces for simulations, campaigns, UX design, and more.

Agents for recommendation and experience design
AI agents can power conversational recommendation systems.
Personalized experiences: They can create personalized experiences based on user preferences and selections.
E-commerce application: An AI agent on an e-commerce site can help compare products and provide recommendations.
Crafted conversations: Experiences like movie selection or hotel room booking can be crafted as conversations, not just decision-tree-style interactions.

Customized AI author agents
Personal AI authors can assist with tasks like co-authoring emails and preparing for meetings.
Regular authoring tools often struggle to tailor content for different audiences.
AI agents can use previous work to generate new content.
The agent can adapt the generated pitch to personal style and specific needs.

THANK YOU
FIND OUT MORE: www.tataelxsi.com

Confidentiality Notice
This document and all information contained herein is the sole property of Tata Elxsi Ltd. No intellectual property rights are granted by the delivery of this document or the disclosure of its content. This document shall not be reproduced or disclosed to a third party without the express written consent of Tata Elxsi Ltd. This document and its content shall not be used for any purpose other than that for which it is supplied.