LLMs Explained PDF
Document Details
Tags
Related
- MemGPT: Towards LLMs as Operating Systems PDF
- AI Tutor: Building Accurate Answers Using LLMs and RAG
- NVIDIA Certified Associate: Gen AI and LLMs Cheat Sheet PDF
- In Defense of RAG in the Era of Long-Context Language Models PDF
- Chapter 3 Introduction to AI, Machine Learning, Deep Learning, and Large Language Models (LLMs).pdf
Summary
This document provides a general overview of Large Language Models (LLMs). It explains how LLMs learn from vast amounts of text data to understand human language patterns and generate various text forms. It also discusses their different types and applications.
Full Transcript
What are LLMs?
Generative AI is like a creative side of deep learning. It makes new stuff (text, images, audio, video) by learning from what already exists. Large Language Models (LLMs) take this up a notch. They're a fancy kind of generative AI that learns tons from lots of text. They get how humans talk, catching onto the patterns and context of our language.

Large language models refer to large, general-purpose language models that can be pre-trained and then fine-tuned for specific tasks. Pre-trained Large Language Models (LLMs) undergo an initial training phase where they learn a comprehensive understanding of language by analyzing diverse internet text. In this phase they acquire a broad grasp of general language patterns and contextual nuances. Fine-tuning involves subjecting these pre-trained LLMs to a specialized training regimen tailored to refine their ability in specific domains or tasks, such as medicine or gaming. Pre-training establishes the foundational linguistic knowledge, while fine-tuning tailors the model to excel in targeted applications, leveraging the acquired general expertise.

3 Kinds of Large Language Models
These are the three kinds of Large Language Models, and each needs prompting in a different way.
- Generic language models predict the next word based on the language in the training data. The predicted next word is a 'token' based on the language in the training data (a token is a unit of text, often a word or subword, processed by the model).
- Instruction-tuned models are trained to predict a response to the instructions given in the input.
- Dialog-tuned models are trained to have a dialog by predicting the next response. Dialog-tuned models are a special case of instruction-tuned models where requests are typically framed as questions to a chatbot. Dialog tuning is expected to be in the context of a longer back-and-forth conversation, and typically works better with natural, question-like phrasings.

Benefits of Using LLMs
1. A single model can be used for different tasks. This is a dream come true. These large language models, trained with petabytes of data and generating billions of parameters, are smart enough to solve different tasks including language translation, sentence completion, text classification, question answering, and more.
2. The fine-tuning process requires minimal field data. Large language models obtain decent performance even with little domain training data. In other words, they can be used for few-shot or even zero-shot scenarios. In machine learning, "few-shot" refers to training a model with minimal data, and "zero-shot" implies that a model can recognize things that have not explicitly been taught during training.
3. The performance keeps growing with more data and parameters. Take Google PaLM (Pathways Language Model), released in April 2022, as an example. PaLM is a dense decoder-only Transformer model with 540 billion parameters. PaLM achieves state-of-the-art performance in multiple language tasks and utilizes the new Pathways system, which allows efficient training across multiple TPU v4 Pods. The Pathways AI architecture enables PaLM to perform various tasks, learn new tasks quickly, and have a better understanding of the world.

How Are Large Language Models Trained?
Most LLMs are pre-trained on a large, general-purpose dataset. The purpose of pre-training is for the model to learn high-level features that can be transferred to the fine-tuning stage for specific tasks.
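As a concrete illustration of the pre-train-then-fine-tune pattern, the sketch below adapts a general-purpose checkpoint to a specialist domain. It assumes the Hugging Face transformers and datasets libraries, the public "gpt2" checkpoint, and a hypothetical medical_notes.txt corpus; it is a minimal sketch of the workflow, not a procedure taken from this course.

    # Minimal sketch: fine-tune a pre-trained, general-purpose model on a
    # hypothetical domain corpus. Assumes `transformers` and `datasets` are installed.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")      # pre-trained vocabulary
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained("gpt2")   # weights from general pre-training

    # Hypothetical fine-tuning data: one clinical note per line.
    notes = load_dataset("text", data_files={"train": "medical_notes.txt"})["train"]
    notes = notes.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                      batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt2-medical", num_train_epochs=1),
        train_dataset=notes,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()   # fine-tuning: adapt the general model to the medical domain

Pre-training from scratch follows the same loop, but starts from randomly initialized weights and a far larger, general-purpose corpus, which is what the steps below describe.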
The training process of a large language model involves:
- Pre-processing the text data to convert it into a numerical representation that can be fed into the model.
- Randomly assigning the model's parameters.
- Feeding the numerical representation of the text data into the model.
- Using a loss function to measure the difference between the model's outputs and the actual next word in a sentence.
- Optimizing the model's parameters to minimize the loss.
- Repeating the process until the model's outputs reach an acceptable level of accuracy.

How Do Large Language Models Work?
Think of a large language model like a super-smart text generator. It learns from tons of examples and uses deep neural networks to come up with stuff. This model is based on something called a transformer, which is like its brain. Unlike other systems that use a loop to understand how words relate in a sentence, this one uses self-attention. It's like the model figures out which words are the VIPs in a sentence and pays extra attention to them. Basically, it calculates a weighted sum to decide which words are the most important and connected to each other. It does this by giving each word a score, kind of like saying how crucial it is compared to the others in the sentence. For example, when reading "I want to eat cake", the model can give "cake" a higher score than "to" or "eat" when working out what the sentence is about.

Sample LLM Use Case
Let's take a look at an example of a text generation use case. A typical LLM use case is Question Answering: we ask the LLM a question and it generates a response.

What is Question Answering (QA) in Natural Language Processing (NLP)? Question answering is a subfield of natural language processing that deals with the task of automatically answering questions posed in natural language. QA systems are typically trained on a large amount of text and code, and they are able to answer a wide range of questions, including factual, definitional, and opinion-based questions. The key here is that domain knowledge was traditionally needed to develop these Question Answering models. With an LLM, a desired response can be obtained for each question. This is due to prompt design.

What is Prompt Design & Prompt Engineering?
Both involve the process of creating a prompt that is clear, concise, and informative. However, there are some key differences between the two.

Prompt Design is the process of creating a prompt that is tailored to the specific task that the system is being asked to perform. For example, if the system is being asked to translate a text from English to French, the prompt should be written in English and should specify that the translation should be in French.

Prompt Engineering is the process of creating a prompt that is designed to improve performance. This may involve using domain-specific knowledge, providing examples of the desired output, or using keywords that are known to be effective for the specific system (see the short sketch below).

In general, prompt design is a more general concept, while prompt engineering is a more specialized concept. Prompt design is essential, while prompt engineering is only necessary for systems that require a high degree of accuracy or performance.
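To illustrate "providing examples of the desired output", here is a small, purely hypothetical comparison of a plain prompt and an engineered few-shot prompt. The review-classification scenario and the send_to_model() call are invented for illustration; they are not part of any particular project or API.

    # Illustrative only: a plain prompt vs. an engineered prompt that adds
    # examples of the desired output (few-shot) and an explicit output constraint.
    plain_prompt = "Classify the sentiment of: 'The battery died after an hour.'"

    engineered_prompt = """You are a product-review classifier.
    Answer with exactly one word: Positive or Negative.

    Review: "Setup took two minutes and it just works."
    Sentiment: Positive

    Review: "The screen cracked on the first day."
    Sentiment: Negative

    Review: "The battery died after an hour."
    Sentiment:"""

    # response = send_to_model(engineered_prompt)   # hypothetical call to whatever model is in use

The engineered prompt trades brevity for predictability: the examples and the one-word constraint pin down the format the response should follow.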
Challenges & Limitations of LLMs
Despite their impressive capabilities, LLMs still face some challenges:
- Bias: LLMs are trained on data that reflects the biases of the real world. This can lead to biased outputs, particularly when dealing with sensitive topics. Bias can present itself as stating an opinion or subjective answer as a given fact, or as treating gender, race, class, and other stereotypes as a norm.
- Explainability: It can be difficult to understand how LLMs arrive at their outputs, making it challenging to assess their reliability and trustworthiness.
- Accessibility: Access to the large-scale computing resources and expertise required to train and operate LLMs can be limited.

This is why it is important that we continue to train LLMs, via our LLM Projects, to ensure they are honest, harmless, helpful, and usable.

LLM Task Types in LLM Projects
There are many different types of tasks in LLM projects. These may be to train LLMs by providing excellent examples, to show how different locales and languages may interact with them, or to test their limits. One thing that is important is that we provide genuine and authentic human content and interaction. Let's look at just some of the types of tasks you may be asked to complete in an LLM project.

Prompt Authoring
You may be asked to create the following types of prompts. Different clients may approach these types of prompts differently, or have different definitions. This is why it is very important that when you take part in a project you fully review the instructions and information for that specific project; at first glance they may seem familiar to your understanding of prompt authoring, but there may be nuances.

Generate: A prompt created to instruct the AI model to produce new or creative content or information. The request can best be described as "content creation" or "creative writing". It can include, but is not limited to, asking an AI model to write a story, poem, song, or email, or to provide a response, etc., based on the specific topic and details or scenarios provided in the prompt.

Brainstorming: A prompt created to generate ideas on a specific topic. It encourages discussion, creative thinking, and the possibility of different perspectives, possibilities, and solutions.

Chain of thought: A prompt that would generate a response showing a logical progression of thought(s). The prompt can be in the form of one problem, encouraging a flow of chained responses. Alternatively, it can be a sequence of requests/questions that build on one another, encouraging the AI model to respond in a progression of thoughts.

QA: A direct question requesting information on the given topic. The QA category can be open or closed ended. Different approaches and clients may have differing definitions of open and closed Q&As. Here is the definition we are using in this module:
- Open-ended QA: The question cannot be answered with information in the prompt. There may be a definitive answer (such as a given fact), or the question may give the LLM the freedom to answer with a detailed and unrestricted response.
- Closed-ended QA: The question must be answered with information in the prompt alone.

Classification: A prompt requesting the AI model to classify or categorize specific content based on the given topic. You must include the text to be classified in the prompt itself.

Rewrite: A prompt requesting specific text (sentence, paragraph, etc.) to be rewritten or paraphrased. The text for rewording must be provided in the prompt.

Extract: A prompt requesting key points or highlights to be extracted from a given text. The text needs to be provided in the prompt and must be at least a paragraph long (not 1-2 sentences).

Summarize: A prompt requesting the summary of a specific text. The text needs to be provided in the prompt and have a significant length in order to be summed up.

Translate: A prompt requesting a translation of a given word, sentence or text. The text for translation and the target language need to be provided in the prompt.
Code: A prompt requesting a code-related task or challenge for the given topic.

Conversation: A prompt requesting the generation of a conversation between two or more parties on a specific topic/subject matter. In this format, you will present a conversation that ends with the expectation that the LLM will continue the conversation.

Creating Complete Conversations
Some tasks may ask the user to act as both human and 'chatbot'. This is to create conversations from scratch, where the user writes the prompt/question and then writes what would be a desirable answer. In this type of task, the challenge is to be able to think like a user (not as yourself) and write the type of prompt they would, and also to write, as the chatbot, helpful, honest, and harmless answers that the user will be happy and satisfied with. These types of tasks are particularly useful to train and challenge models on specialist domains/topics. They are also useful to train models for a specific locale.

Creating Complex and Domain Specific Prompts
LLMs have become very good at answering basic prompts, for example, "What is the capital of Spain?" To take LLMs to the next level, many projects require prompts and answers that demonstrate complexity. Complexity can refer to two distinct things:
- a complex question is a very difficult question in terms of the topic.
- a complex question is complex due to how it is written. It may not be straightforward, or it may contain many parameters the answer must adhere to, for example, "I want ideas for wedding table settings but they must not include real flowers (groom is allergic) and must include the colours orange and yellow. Please provide 3 ideas in a list format".

Some tasks may require people with specific domain (topic) knowledge to create in-depth and specialised prompts and answers in domains such as medicine, coding, finance, technology, mathematics, etc.

Turns
A prompt and a responding answer is called a turn. Some projects require the user to produce one turn, that is, a prompt (question) and a response (answer). Other projects may require the user to provide a multi-turn conversation; a two-turn conversation, for example, consists of two of these prompt-and-response exchanges. Do not under any circumstances use a generative chat/LLM model such as ChatGPT when completing such tasks. Do not use the model to create content, check facts, brainstorm ideas, etc. Using a generative chat tool is a breach of the project rules and you may have your account terminated.

Red Teaming Tasks
Red teaming is like having someone play the bad guy to test how well something works. The goal is to find any weak spots or problems that might be missed otherwise. It's kind of like a practice round to make sure everything is strong and ready to go. In terms of LLM tasks, this may mean using an AI model and trying to get it to break its own policies or safety guidelines. These tasks may be quite sensitive in nature and challenging to complete. Even though you know you are pretending, it may be uncomfortable to use offensive language or to communicate unpleasant ideas. We offer support and guidance to those taking part in such projects, and not taking part in, or leaving, the project does not affect anyone's profile or future participation in other projects.

Labelling and Grading Projects
Labelling can include tasks such as looking at prompts and/or responses and labelling them for the topics present, etc.
Grading can include tasks that evaluate and assess a model response; this helps ensure the LLM is providing responses which are accurate and coherent. Labelling and grading can include:
- Accuracy: is the response accurate? Is it providing factually correct information and the outcome the user wants?
- Fidelity and Coherence: this assesses the coherence, relevance, and overall quality of the generated text.
- Locale suitability: is the generated text suitable for the locale of the user?

Glossary
Listed below are essential terms and concepts related to LLMs and AI, explained in a way that's easy for non-data scientists to understand.

Adapters: Tiny parts you can add to a trained model to help it do a specific task without changing it too much. Example: Adding a new skill to a computer game character without changing the whole game.
Alignment: AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system is competent at advancing some objectives, but not the intended ones.
Artificial Intelligence (AI): It's like a smart robot that can think and do things like humans. AI helps computers solve problems, make decisions, and understand our language. Example: Siri on your iPhone.
Base model/foundational model: A pre-trained language model that serves as the starting point for building more specific models for downstream tasks.
Bias: When a computer makes mistakes because its training data isn't balanced or representative. Example: A computer thinking all doctors are men because it mostly reads about male doctors.
Conversational AI: Artificial intelligence systems that are designed to engage in natural language conversations with users. These systems use advanced natural language processing (NLP) and natural language understanding (NLU) techniques to interpret and respond to user inputs in a way that simulates human-like conversation. Conversational AI can be implemented in various forms, including chatbots, virtual assistants, messaging apps, and voice-activated interfaces.
Data Augmentation: Making a dataset bigger and more diverse by creating new samples, like rephrasing sentences. Example: Changing "The cat is on the mat" to "The cat sits on the mat."
Attention Mechanism: A way for computers to focus on important parts of the input when creating an output. Example: A computer knowing "pizza" is the most important word in "I want to eat pizza."
Beam Search: A method for finding the best sequence of words when a computer generates text. Example: A computer choosing the most likely next word in a sentence.
BERT: A transformer model that helps computers understand language for tasks like guessing what people think about a movie. Example: A computer that knows if a review is positive or negative.
Chain-of-Thought Prompting: Chain-of-thought prompting (CoT) improves the reasoning ability of LLMs by prompting them to generate a series of intermediate steps that lead to the final answer of a multi-step problem.
Context Window / Context Length: The context window is the number of tokens that are considered when predicting the next token.
Deep Learning (DL): It's a way computers learn from many examples, like how you learn from experience. Deep learning uses special computer programs called neural networks to find patterns in data. Example: A computer learning to recognize cats in pictures.
Decoder: Part of a transformer that helps it create a response or answer. Example: A computer replying, "The weather today is sunny and warm."
Encoder: Part of a transformer that helps it understand and remember what you tell it. Example: A computer remembering the question, "What's the weather like today?"
Explainable AI (XAI): Making computers' decision-making processes easier for humans to understand. Example: A computer explaining why it thinks a certain movie is a comedy.
Few-Shot Learning: When a computer can learn a new task with just a few examples. Example: A computer that can learn your favorite songs after hearing them once or twice.
Fine-Tuning: Adjusting a trained model to be better at a specific task. Example: Teaching a computer to understand and answer questions about dinosaurs.
Foundation Models: Big AI models, like LLMs, that can be used for many different tasks. Example: A computer that can help with homework, write emails, and tell jokes.
GPT-3 and GPT-4: Transformer models that help computers generate text like a human, such as completing a sentence or writing a summary. Example: A computer writing a book report for you.
Grounding: Grounding in large language modeling is the process of associating words and phrases with their corresponding real-world entities and concepts.
Hallucination: In artificial intelligence (AI), a hallucination or artificial hallucination (also occasionally called confabulation or delusion) is a confident response by an AI that does not seem to be justified by its training data.
In-Context Learning: When a computer can change its behavior based on the input, without extra training. Example: A computer knowing how to answer a question about sports after talking about sports.
Large Language Model (LLM): A language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning.
Machine Learning (ML): A subfield of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to learn from and make predictions or decisions based on data. The primary goal of machine learning is to create systems that can automatically improve and adapt their performance over time without being explicitly programmed for each task.
Modality: A high-level data category. For example, numbers, text, images, video, and audio are five different modalities.
Natural Language Generation (NLG): Teaching computers to create human-like text. Example: A computer that can write a story or a poem.
Natural Language Processing (NLP): Teaching computers to understand, interpret, and create human language. Example: A computer that can chat with you or read your essay.
Natural Language Understanding (NLU): Teaching computers to understand and find meaning in human language. Example: A computer that knows the difference between "I like cats" and "I don't like cats."
Neural Network: A computer program that works like the human brain, using connected nodes (like brain cells) in layers. Example: A computer "brain" that can play a video game.
Perplexity: A way to measure how well a computer can predict text. Example: A lower perplexity means a computer is better at guessing what word comes next in a sentence.
Positional Encoding: A way transformers remember the order of words in a sentence. Example: Remembering that "the dog chased the cat" differs from "the cat chased the dog."
Pretraining: The first step in training an LLM, where it learns language from lots of text. Example: A computer reading lots of books and articles to learn how to write.
Prompt Engineering: A process of designing and constructing effective natural language prompts for use with large language models. Prompts are input patterns that are used to guide the behavior of LLMs and generate text that is relevant to a specific task or domain.
One-shot Prompting/Learning: A prompting technique that allows a model to process one labeled example before attempting a task.
Prompt Tuning: Changing the way you ask a computer a question to get a better answer. Example: Asking, "What's the capital of France?" instead of "Where's Paris?"
Red Teaming: Red teaming is a process of critical analysis and examination conducted by an independent group or individual to assess and challenge the effectiveness, security, or strategy of a system, organization, or plan. The goal of red teaming is to identify vulnerabilities, weaknesses, or potential shortcomings that might not be apparent to the entity being evaluated. It is often used in various fields, including cybersecurity, military operations, and business, to enhance preparedness and resilience by simulating adversarial perspectives and tactics.
Self-Attention: A way transformers focus on the most essential parts of a sentence. Example: Knowing that "cake" is the key word in "I want to eat cake."
Sequence-to-Sequence (Seq2Seq) Model: A type of model that changes one sequence, like text, into another sequence, like a translation. Example: A computer turning English text into French text.
T5: A transformer model that's good at both understanding and generating text, like translating one language to another. Example: A computer that can translate English to Spanish.
Temperature: A hyperparameter that controls the randomness of the model's output. A temperature of 0 means always output the highest-probability token (see the short sketch after this glossary).
Token: A unit of text, often a word or subword, processed by the model.
Tokenization: Breaking text into words or parts of words, called tokens, to help computers understand language. Example: Splitting the sentence "I have a dog" into the tokens "I", "have", "a", and "dog".
Transfer Learning: Using what a computer learned from one task to help it do another related task. Example: A computer that learned to recognize cats using that knowledge to recognize dogs.
Transformer: A particular type of neural network created by Google to understand and generate language in a better way. Example: A computer that can chat with you like a friend.
Unsupervised Learning: When a computer learns patterns without being told what's right or wrong. Example: A computer learning to group similar pictures together.
Vocabulary: The set of unique words or tokens a computer program can understand. Example: A computer knowing the words "apple", "banana", and "orange" but not "kiwi".
Zero-Shot Learning: When a computer can do a task without being trained on it. Example: A computer that can play a new game without practicing first.
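To make the Temperature and Token entries above a little more concrete, here is a toy sketch of temperature-based sampling over a next-token distribution. The three-word vocabulary and the scores are made up purely for illustration; real models choose among tens of thousands of tokens, but the rule is the same.

    # Toy illustration of the "Temperature" glossary entry (made-up vocabulary and scores).
    import math
    import random

    vocab = ["sunny", "rainy", "cloudy"]   # pretend these are the only possible next tokens
    logits = [2.0, 1.0, 0.5]               # the model's raw scores for each candidate token

    def sample_next_token(logits, temperature):
        if temperature == 0:
            # Temperature 0: always output the highest-probability token.
            return vocab[logits.index(max(logits))]
        scaled = [score / temperature for score in logits]
        exps = [math.exp(s) for s in scaled]
        probs = [e / sum(exps) for e in exps]          # softmax over the scaled scores
        return random.choices(vocab, weights=probs)[0]

    print(sample_next_token(logits, temperature=0))    # deterministic: always "sunny"
    print(sample_next_token(logits, temperature=1.5))  # higher temperature: more random choices

Higher temperatures flatten the probabilities so less likely tokens are picked more often, while lower temperatures sharpen them towards the single most likely token.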
Summary
Key Takeaways
Large Language Models (LLMs) are important machine learning tools that use deep learning algorithms to work with and understand human language. These models learn from huge amounts of text data to understand patterns and connections in language. LLMs can do various language-related tasks, like translating languages, analyzing feelings, having conversations like a chatbot, and more. They can understand complicated written information, recognize things and how they're connected, and create new text that makes sense and follows grammar rules!

Large language models are leading the way in artificial intelligence. They can do a lot, like generate text that seems human, help with easy language translation, figure out emotions in text, and even make computer code. These models are useful in different areas like technology, healthcare, marketing, and more. They're not just tools for language; they're key parts in shaping the future of artificial intelligence.