Prompt Engineering Techniques

Summary

This document provides an overview of prompt engineering techniques for large language models (LLMs). It covers different types of prompts, strategies, and methods to improve the quality and efficiency of generating text and responses from LLMs. These techniques include zero-shot, few-shot, and chain-of-thought prompting, along with the importance of context, role, and input data in effective prompts.

Full Transcript

Generative Models and Human Preferences

Aligning models with human preferences: generative models, like GPT and LLaMA, need to understand and align with human intent.

Model Evaluation

Metrics: human evaluation and automated evaluation.

Bilingual Evaluation Understudy (BLEU)

A metric for evaluating the quality of generated text by comparing it to one or more reference texts. It calculates n-gram overlap.
Use: machine translation and text generation tasks.
Limitations: may not capture semantic meaning or fluency well, as it primarily focuses on exact matches.

BLEU: Example

Reference: "The cat sat on the mat." (6 unigrams)
Generated: "The cat is sitting on the mat." (7 unigrams)
Matching unigrams: "The", "cat", "on", "the", "mat" (5 matches)
Precision = (number of matching unigrams) / (total unigrams in generated) = 5/7 = 0.714

Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

Measures the overlap of n-grams, word sequences, and word pairs between the generated text and the reference text.
Use: summarization tasks.

ROUGE-1: Example

Reference summary: "The cat sat on the mat." (6 unigrams)
Generated summary: "The cat is on the mat." (6 unigrams)
Common (overlapping) unigrams: "the", "cat", "on", "mat" (4)
Recall = overlapping / reference = 4/6 = 0.67

Distinct-n

Measures the ratio of unique n-grams to the total number of n-grams in the generated outputs.
Use: text generation tasks.

Distinct-1: Example

Generated texts: "The cat sat." / "The cat ran."
Total unigrams: "The", "cat", "sat", "The", "cat", "ran" (6)
Unique unigrams: "The", "cat", "sat", "ran" (4)
Distinct-1 = 4/6 = 0.67

Quality vs. Quantity Metrics

Quality metrics: focus on the accuracy, fluency, and appropriateness of generated text (e.g., BLEU, ROUGE).
Quantity metrics: evaluate the volume of distinct and diverse outputs (e.g., Distinct-n).

Effective Addressing of User Queries

Strategies: iterative prompt refinement; utilizing few-shot and zero-shot learning.
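The three metric examples above can be reproduced in a few lines of Python. This is a minimal sketch, not a full BLEU/ROUGE implementation (unigrams only, no brevity penalty, naive whitespace tokenization), and it follows the slides' counting conventions: case-sensitive clipped counts for the precision example, case-folded word types for the ROUGE-1 example.

```python
from collections import Counter

def unigram_precision(reference, generated):
    # BLEU-1-style precision: clipped unigram matches / generated length.
    ref, gen = Counter(reference.split()), Counter(generated.split())
    return sum((ref & gen).values()) / sum(gen.values())

def rouge1_recall(reference, generated):
    # ROUGE-1 recall as computed on the slide: overlapping word types
    # (case-folded) divided by the reference length.
    ref_tokens = reference.lower().split()
    overlap = set(ref_tokens) & set(generated.lower().split())
    return len(overlap) / len(ref_tokens)

def distinct_1(outputs):
    # Distinct-1: unique unigrams / total unigrams across all outputs.
    tokens = [tok for text in outputs for tok in text.split()]
    return len(set(tokens)) / len(tokens)

print(round(unigram_precision("The cat sat on the mat.",
                              "The cat is sitting on the mat."), 3))  # 0.714
print(round(rouge1_recall("The cat sat on the mat.",
                          "The cat is on the mat."), 2))              # 0.67
print(round(distinct_1(["The cat sat.", "The cat ran."]), 2))         # 0.67
```

Note that production ROUGE implementations use clipped token counts rather than word types; the version above mirrors the slide's arithmetic so the numbers match.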
User query outcomes: enhanced relevance and accuracy in responses.

Prompt Engineering

Prompts: instructions and context passed to a language model to achieve a desired task.
Prompt engineering is the practice of developing and optimizing prompts to use language models efficiently for a variety of applications.
Prompt engineering is a useful skill for AI engineers and researchers to improve and efficiently use language models.

Prompt Engineering: Purpose and Types

Purpose: creating effective prompts to guide model behavior and outputs; crafting inputs to optimize model outputs.
Types of prompts:
Standard prompts: direct instructions or questions.
Chain-of-thought prompts: using intermediate reasoning steps.
In-context learning: demonstrating tasks through examples within the prompt; models learn from the examples provided.

Importance of Prompt Engineering

Enhances AI model performance.
Improves accuracy and relevance of outputs.
Enables more precise and nuanced interactions with AI.
Crucial for specialized applications and tasks.

Elements of a Prompt

Context, instruction/task, role, input data, output indicator.

Task

Define what the LLM should do. For example: classify, sort, filter, write, summarize, translate, etc.
Common task types: text summarization, question answering, text classification, code generation, role playing, reasoning.

Context

Context provides background information that helps the model understand the situation or topic better. It can include relevant details or specific instructions.
Example: "In the context of climate change, summarize the following article:"

Role

The role involves adopting a persona or specific character. It influences the tone and style of the output.
For example: "Act as an AI expert, ..."; "Imagine you are chatting with a friend, ..."

Input Data

The information provided to the model that it should generate a response about.

Output Indicator

Defines the expected format or type of response from the model.
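The five prompt elements listed above can be assembled mechanically. The sketch below is one illustrative template, not a standard format; the function name and the exact wording of each line are assumptions.

```python
def build_prompt(role, context, task, input_data, output_indicator):
    """Assemble a prompt from the five elements: role, context, task,
    input data, and output indicator."""
    return "\n".join([
        f"Act as {role}.",              # Role: persona, tone, and style
        f"Context: {context}",          # Context: background information
        f"Task: {task}",                # Instruction/Task: what the LLM should do
        f"Input: {input_data}",         # Input data: material to respond about
        f"Output: {output_indicator}",  # Output indicator: expected format
    ])

prompt = build_prompt(
    role="a financial analyst",
    context="the COVID-19 pandemic",
    task="compare its economic impact on two sectors",
    input_data="hospitality revenue fell 80% in 2020; tech revenue rose 50%",
    output_indicator="a table highlighting key differences",
)
print(prompt)
```

Separating the elements this way makes it easy to vary one element (say, the output indicator) while holding the rest of the prompt fixed.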
It can include length, style, or structure.
For example: "What are the advantages of AI? The output should be in bullet-point format and not exceed 7 points."

Elements of a Prompt: Example

"As a financial analyst (Role), analyze and compare (Task) the economic impacts of the COVID-19 pandemic (Context) on the hospitality and technology sectors. The hospitality sector saw an 80% decline in revenue in 2020, while the tech sector experienced a 50% increase in revenue (Input data). Present your findings in a table format highlighting key differences and implications for future investments (Output indicator)."

Text Summarization

Summarize the following text in one sentence:
"Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform. A prompt for a text-to-text language model can be a query, a command, or a longer statement including context, instructions, and conversation history. Prompt engineering may involve phrasing a query, specifying a style, providing relevant context, or assigning a role to the AI."

Question Answering

Answer the question based on the context below. Keep the answer short and concise. Respond "Unsure about answer" if not sure about the answer.
Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.
Question: What was OKT3 originally sourced from?
Answer:

Text Classification

Classify the text into neutral, negative, or positive.
Text: I think the match was not bad.
Sentiment:

Role Playing

The following is a conversation with an AI research assistant. The assistant's tone is technical and scientific.
Human: Hello, who are you?
AI: Greetings! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of black holes?
AI:

Code Generation

Table departments, columns = [DepartmentId, DepartmentName]
Table students, columns = [DepartmentId, StudentId, StudentName]
Create a MySQL query for all students in the Computer Science department.

Reasoning

What is 117*901?
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:

Reasoning with Steps

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
Solve by breaking the problem into steps. First, identify the odd numbers, then add them, and indicate whether the result is odd or even.

Example

Extract the names of cities in the following text.
Desired format: City:
Input: "I am going to London then Cardiff. After that, I will meet my friend in Norfolk."

Prompt Engineering Strategies

Single step: direct prompts for immediate responses; directly providing the desired task and observing the output.
Multi step: breaking tasks down into sequential prompts for complex queries; using intermediate steps to guide the model towards the solution.
Generalization capabilities: investigating and evaluating model performance across different languages, domains, and tasks; cross-domain adaptability.
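One plausible answer to the code-generation prompt above is a JOIN across the two tables. The sketch below checks such a query against an in-memory SQLite database (SQLite accepts the same JOIN syntax as MySQL for this query); the sample rows are illustrative assumptions, not data from the slides.

```python
import sqlite3

# Illustrative schema and rows matching the prompt's table definitions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (DepartmentId INTEGER, DepartmentName TEXT);
CREATE TABLE students (DepartmentId INTEGER, StudentId INTEGER, StudentName TEXT);
INSERT INTO departments VALUES (1, 'Computer Science'), (2, 'History');
INSERT INTO students VALUES (1, 100, 'Ada'), (1, 101, 'Alan'), (2, 102, 'Herodotus');
""")

# The kind of query the prompt asks the model to generate:
query = """
SELECT s.StudentId, s.StudentName
FROM students AS s
JOIN departments AS d ON s.DepartmentId = d.DepartmentId
WHERE d.DepartmentName = 'Computer Science'
ORDER BY s.StudentId;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(100, 'Ada'), (101, 'Alan')]
```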
Prompt Engineering Techniques

Zero-shot prompting, few-shot prompting, chain-of-thought prompting, meta prompting, generated knowledge prompting, self-consistency, prompt chaining, tree of thoughts, retrieval-augmented generation.

Prompt Engineering Techniques (continued)

Automatic prompt engineer, active-prompt, directional stimulus prompting, automatic reasoning and tool-use, program-aided language models, ReAct, reflection, multimodal CoT, graph prompting.

Zero-shot Prompting

The prompt directly instructs the model to perform a task without any additional examples to steer it. Adapts models to new tasks with no additional training data.
Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment:

Few-shot Prompting

The prompt instructs the model to perform a task with a few examples to steer it. Adapts models to new tasks with minimal additional training data.
Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment: neutral
Text: I think the food was okay.
Sentiment:

Few-shot Prompting: Format Demonstration

This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
(The labels in this demonstration are deliberately inconsistent: even with randomized labels, the exemplars teach the model the expected input/output format.)

Self-consistency

Generate multiple outputs for the same input and then aggregate or select the most coherent and consistent responses. Used to enhance the quality and reliability of the output.

Prompt Chaining

Uses the output of one prompt as the input for another, effectively creating a chain of prompts that guides the model through a multi-step reasoning or generation process.

Meta Prompting

Structured guidance that provides explicit instructions about the format or content of the desired output.
For example: "Summarize the following article [article] in 3-4 sentences, highlighting the main causes of climate change, its impacts on the environment, and any suggested solutions. Be concise but informative."
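The aggregation step of self-consistency can be sketched as a majority vote over sampled completions. Here `fake_model` is a stub standing in for temperature-sampled LLM outputs (an assumption for illustration); a real implementation would sample diverse reasoning paths and vote on their final answers.

```python
from collections import Counter

def self_consistency(sample_model, prompt, n=5):
    """Sample n completions for one prompt and return the majority answer."""
    answers = [sample_model(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub standing in for a sampled LLM: returns pre-canned final answers.
_samples = iter(["18", "26", "18", "18", "26"])
def fake_model(prompt):
    return next(_samples)

best = self_consistency(fake_model, "When I was 6 my sister was half my age...", n=5)
print(best)  # "18" wins the vote 3 to 2
```

Voting only works when answers can be compared for equality, so in practice the final answer is usually extracted from each reasoning chain (e.g., a number after "The answer is") before aggregation.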
Automatic Prompt Engineer

Uses algorithms or models to automatically generate, optimize, or refine prompts. This helps enhance performance by ensuring that prompts are effective and aligned with the desired outputs.

Generated Knowledge Prompting

Utilizes previously generated information or knowledge as context for subsequent prompts.

Program-Aided Language Models (PAL)

Models designed to enhance the capabilities of large language models by integrating them with programming functionalities or external tools.

Active-Prompt

Encourages the model to engage in a dynamic or interactive way, often requiring it to perform tasks or generate responses based on user inputs in real time.
For example:
You are a virtual dinner planner! I want to host a dinner party for four people.
1. What cuisine do you prefer? (e.g., Italian, Mexican, Indian)
2. Do you have any dietary restrictions? (e.g., vegetarian, gluten-free)
3. What is your budget per person?

Reflection

Encourages the model to think critically about its previous outputs, decisions, or interactions.
For example: "Reflect on your previous answer, ..."

Multimodal CoT

Enables models to process and reason about information coming from multiple modalities, such as text, images, audio, and more.
For example: "What can I make with these ingredients?"

Challenges and Limitations of Prompt Engineering

Sensitivity to prompt phrasing and variations: small changes can drastically alter outputs.
Lack of robustness across diverse contexts: difficulty generalizing to unseen tasks or formats.
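The program-aided idea can be sketched in a few lines: the LLM writes code for the quantitative part of a problem, and a runtime executes it. Here the model's completion is a hard-coded string standing in for a real LLM response (an assumption), and the word problem it solves is invented for illustration.

```python
# PAL sketch: offload arithmetic from the LLM to the Python runtime.
# The "completion" below stands in for code an LLM would generate when
# asked, e.g., "A bakery starts with 200 loaves, bakes 20 more, and
# sells 93. How many remain? Write Python to compute the answer."
model_completion = """
loaves_start = 200
loaves_baked = 20
loaves_sold = 93
answer = loaves_start + loaves_baked - loaves_sold
"""

namespace = {}
exec(model_completion, namespace)  # the runtime, not the LLM, does the math
result = namespace["answer"]
print(result)  # 127
```

The point of the technique is that the final answer comes from deterministic execution rather than token prediction, which avoids the arithmetic slips LLMs are prone to; real systems would also sandbox the executed code.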
Reinforcement learning from human feedback (RLHF): aligning models with human preferences through reinforcement learning.

Challenges in Instruction Tuning

Difficulty in capturing nuanced human preferences.
Ambiguity in instructions: varying interpretations can lead to inconsistent performance.
Ethical concerns around bias and misinformation.
Data scarcity: high-quality, diverse instructional data is limited.

Advancements in Instruction Tuning

Improved datasets and feedback loops.
Enhanced model interpretability.
Creation of large-scale, diverse instruction datasets (e.g., FLAN, T5).

Comparison

| Feature | Model Fine-Tuning | Instruction Tuning | Prompt Engineering |
|---|---|---|---|
| Method | Retrains the model | Retrains the model | Modifies the input |
| Data required | Labeled data | Instruction-output pairs | None |
| Computational cost | High | High | Low |
| Impact | Improves performance on a specific task | Improves instruction following | Guides model output without changing the model |
| Weights | Updated | Updated | Not updated |
| Pros | Can significantly improve performance on a specific task | Improves the model's ability to follow instructions and generate more relevant responses; leads to better generalization than standard fine-tuning | No model retraining is required; relatively inexpensive and fast |
| Cons | Requires a large labeled dataset; computationally expensive; can lead to overfitting; may negatively impact performance on other tasks | Requires substantial datasets of instruction-output pairs; computationally expensive | Can be time-consuming; requires skill and creativity; output quality is heavily dependent on prompt quality; does not improve the model's capabilities |

Conclusion

Importance of aligning generative models with human preferences: challenges and advancements.
Importance of prompt engineering and instruction tuning: progress and limitations.
Implications for future NLP applications.
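Supervised fine-tuning on instruction datasets starts by serializing instruction-output pairs into training strings. The sketch below uses one made-up template; the `### Instruction:` / `### Response:` markers and field names are assumptions for illustration, not a requirement of any particular framework.

```python
def format_example(pair):
    """Serialize one instruction-output pair into a single training string,
    so a causal LM can learn to continue the instruction with the response."""
    return (
        f"### Instruction:\n{pair['instruction']}\n"
        f"### Response:\n{pair['output']}"
    )

# Toy instruction dataset (invented examples):
dataset = [
    {"instruction": "Translate to French: Hello", "output": "Bonjour"},
    {"instruction": "Classify the sentiment: I loved it", "output": "positive"},
]
corpus = [format_example(p) for p in dataset]
print(corpus[0])
```

Whatever template is chosen, it must be applied consistently at training and inference time, since the model learns the markers themselves as part of the task format.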
Future Directions

Developing better evaluation frameworks for prompt effectiveness.
Integrating prompt and instruction tuning for improved performance.
Enhancing robustness and generalization in unseen tasks.
Need for more refined and accessible tuning methods.
Potential for improved human-AI interaction.
Continued focus on ethical AI.
