Intro to AI Course Notes
Ned Krastev
Summary
This document contains course notes on artificial intelligence, covering fundamental and advanced topics. It introduces the concepts of weak and strong AI, as well as important AI branches such as robotics and computer vision. The course also explores practical tools and technologies used in AI development.
Full Transcript
Table of Contents
1. Getting started
2. Data is essential for building AI
3. Key AI techniques
4. Important AI branches
5. Understanding Generative AI

Abstract
This course explores fundamental and advanced topics in artificial intelligence (AI), starting with comparisons between natural and artificial intelligence and a historical overview of AI development. It distinguishes between AI, data science, and machine learning, and discusses the concepts of weak versus strong AI. The data section covers structured versus unstructured data, data collection methods, labeling, and the increasing volume and quality of data. Key AI techniques introduced include various forms of machine learning—supervised, unsupervised, and reinforcement—and deep learning via neural networks. Special focus is given to AI branches like robotics, computer vision, predictive analytics, and generative AI, including a detailed examination of ChatGPT and other language models. The course also addresses practical tools and technologies in AI development such as Python, APIs, and open-source models, and concludes with discussions on AI job roles, ethics in AI, and the future of AI.

1. Getting started

1.1 Natural vs. Artificial Intelligence
- Demonstrating natural intelligence
Natural intelligence is demonstrated through diverse activities like driving on highways, solving complex math, crafting poetry, and mastering music. These skills, which are learned rather than innate, showcase our brain's remarkable ability to process vast amounts of information.
- The sophistication of human intelligence
The human brain is incredibly sophisticated, with its capacity to acquire and apply knowledge leading to:
o Technological innovations that far surpass the tools of past centuries.
o Tool creation, highlighting our unique ability as humans to enhance our productivity through innovation.
- Gutenberg's printing press
Gutenberg's printing press, invented in 1440, revolutionized the way knowledge is disseminated, making it one of the most significant machines in history. Despite its transformative impact, it operated under fixed parameters and did not possess the capability to learn or adapt.

1.2 Brief history of AI
The evolution of artificial intelligence: key milestones

Early Beginnings
- 1950 - Alan Turing's seminal question: Alan Turing publishes a paper asking, "Can machines think?" and introduces the Turing Test, setting a practical criterion for evaluating machine intelligence. If an interrogator can't distinguish between responses from a machine and a human, the machine is deemed to exhibit human-like intelligence.

Formal Recognition
- 1956 - Dartmouth Conference: The term "artificial intelligence" is coined, marking the formal start of AI as a field of study.
This conference brings together experts from various disciplines to discuss the potential for machines to simulate human intelligence.

Period of Stagnation
- 1960s and 70s - AI Winter: Challenges due to limited technology and data availability lead to reduced funding and interest, slowing AI progress.

Technological Resurgence
- 1997 - IBM's Deep Blue: Deep Blue defeats world chess champion Garry Kasparov, reigniting interest in AI.
- Late 1990s and early 2000s: A surge in computer power and the rapid expansion of the Internet provide the necessary resources for advanced AI research.

Advancements in Neural Networks
- 2006 - Geoffrey Hinton's deep learning paper: Revives interest in neural networks by introducing deep learning techniques that mimic the human brain's functions, requiring substantial data and computational power.
- 2011 - IBM's Watson on Jeopardy!: Demonstrates significant advances in natural language processing, as Watson competes and wins in the quiz show "Jeopardy!".
- 2012 - "Building High-Level Features" paper: Researchers from Stanford and Google publish a significant paper on using unsupervised learning to train deep neural networks, notably improving image recognition.

Breakthroughs in Language Processing
- 2017 - Introduction of transformers by Google Brain: These models transform natural language processing by efficiently handling data sequences, such as text, through self-attention mechanisms.
- 2018 - OpenAI's GPT: Launches the generative AI technology that uses transformers to create large language models (LLMs), leading to the development of ChatGPT in 2022.

1.3 Demystifying AI, Data science, Machine learning, and Deep learning

Artificial Intelligence (AI)
- Objective: AI aims to create machines that can mimic human intelligence by learning and acquiring new skills.
- Scope: AI is a broad field that encompasses various subfields, including machine learning, making it the overarching discipline focused on intelligent behavior in machines.

Machine Learning (ML)
- Definition: Machine learning is a key subfield of AI that uses statistical methods to enable machines to improve at tasks with experience.
- Functionality: Essentially, ML involves feeding input data into a model that processes it and produces an output. For example:
o An algorithm might analyze a user's movie-watching history to predict which movies they will likely enjoy next.
o Financial transaction data could be used to generate a credit score predicting a customer's loan repayment likelihood.
- Importance: ML is significant within AI because it provides the methods and technologies that allow computers to learn from and make predictions based on data.

Data Science
- Relationship with AI and ML: While data science includes AI and machine learning, it also encompasses a broader set of statistical methods.
- Tools and Methods: Beyond machine learning, data scientists employ traditional statistical methods like data visualization and statistical inference to extract insights from data.
- Applications:
o A data scientist might use ML algorithms to predict future client orders based on historical data.
o Alternatively, they could perform an analysis correlating client orders with store visits to derive actionable business insights.
- Scope: Data science is not only about creating predictive models but also about understanding and visualizing data patterns to support decision-making.

1.4 Weak vs Strong AI

Narrow AI
- Definition: Narrow AI refers to artificial intelligence systems that are designed to handle specific tasks.
- Examples and Applications: An example we discussed is a machine-learning algorithm that predicts movie recommendations based on a user's viewing history. This type of AI is pervasive in our daily lives and beneficial for businesses, handling defined and narrow tasks efficiently.

Semi-Strong AI
- Introduction of ChatGPT and GPT-3.5: OpenAI's release of ChatGPT and the GPT-3.5 models in 2022 marked a significant advancement towards semi-strong AI.
- Capabilities: Unlike narrow AI, ChatGPT can perform a broad range of tasks:
o Writing jokes
o Proofreading texts
o Recommending actions
o Creating visuals and formulas
o Solving mathematical problems
- Relation to the Turing Test: ChatGPT's ability to generate human-like responses aligns with Alan Turing's imitation game concept, suggesting it can pass the Turing Test, thus classifying it as semi-strong AI.

Artificial General Intelligence (AGI)
- Definition and Goal: AGI, also known as strong AI, aims to create machines that are generally more capable than humans across a variety of tasks.
- Current Research: While significant strides have been made, leading institutions like OpenAI and Google have not yet achieved AGI.
- Future Prospects and Ethical Considerations: Sam Altman of OpenAI suggests AGI could be developed in the near future. As we approach this possibility, it's crucial to consider the potential and ethical implications of machines that can surpass human intelligence and independently create new scientific knowledge.

2. Data is essential for building AI

2.1 Structured vs unstructured data

Types of Data
- Structured Data:
o Definition: Organized into rows and columns, making it easy to analyze.
o Example: A sales transactions spreadsheet with predefined fields in Excel.
- Unstructured Data:
o Definition: Lacks a defined structure and cannot be organized into rows and columns. Includes formats like text files, images, videos, and audio.
o Prevalence: Represents 80-90% of the world's data, making it the dominant form of data.

Evolution of Data Value
- Past Perception: Structured data was historically valued more due to its ease of analysis.
- Current Advancements: AI technologies have advanced to the point where unstructured data can now be transformed into valuable insights. Companies like Meta and Google are leading in this area, unlocking the potential of unstructured data.

Opportunities from Unstructured Data
- Business Applications: Analyzing unstructured data (e.g., photographs, videos, text messages, emails) is creating enormous opportunities for businesses to gain insights that were previously inaccessible.

2.2 How we collect data

The MNIST Database
- Overview: Often referred to as the "Hello World" of machine learning, the MNIST database consists of 70,000 images of handwritten digits, each 28x28 pixels in grayscale.
- Purpose: The goal is to train a machine learning algorithm to recognize handwritten digits, despite variations in individual handwriting.

Understanding Digital Representation
- Pixel Values: Each pixel in the image has a value between 0 (white) and 255 (black), representing different shades of gray.
- Binary Encoding: All digital information, including pixel values, is stored in binary form as combinations of 0s and 1s.
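To make the pixel representation above concrete, here is a minimal sketch assuming TensorFlow/Keras is available for the standard MNIST loader (any MNIST loader would do). It only shows that each digit is a 28x28 grid of integers between 0 and 255:

```python
# A minimal sketch of the MNIST pixel representation, assuming TensorFlow/Keras
# is installed for the standard loader. Purely illustrative.
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

image = x_train[0]                 # one handwritten digit, a 28x28 array
print(image.shape)                 # (28, 28)
print(image.min(), image.max())    # pixel intensities range from 0 to 255
print(y_train[0])                  # the label for this image

# Flattening the grid into a 784-long vector is how many models "see" the digit.
flat = image.reshape(-1)
print(flat[:10])                   # the first few raw pixel values
```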
Machine Learning Process
- Training: By examining these binary sequences, the computer learns to distinguish between different digits like 3, 6, or 9 based on their unique numerical patterns.
- Application: This training enables computers to recognize and differentiate digits from 0 to 9, showcasing a fundamental aspect of machine learning.

Broader Implications for AI
- Information Conversion: Similar to how images are converted into binary sequences, videos, sounds, and written text are also transformed into data that computers can process.
- Pattern Recognition: Through machine learning, computers learn to identify patterns and similarities in this data, which helps them perform various tasks.

Human Perception vs. AI
- Human Brain Capabilities: Humans can process vast amounts of information simultaneously through sensory perception.
- AI Aspiration: AI researchers aim to emulate this capability by developing systems that can interpret complex data from multiple sources like sensors, social media, and satellite images.

Importance of Data Quality
- Data Collection: Data is collected through various means such as web scraping, APIs, and big data analytics.
- Model Accuracy: The quality of data directly impacts the effectiveness of AI models. The adage "Garbage in, garbage out" highlights the necessity of high-quality data for producing reliable AI outputs.

2.3 Labelled and unlabelled data

Labeled Data
- Definition: Labeled data involves tagging each item in a dataset with specific labels that the AI model will learn to recognize and predict. For example, photos can be classified as 'dog' or 'not a dog', and comments can be labeled as positive, negative, or neutral.
- Process: This method requires a meticulous review and classification of each data item, which can be time-consuming and costly.
- Benefits: Labeled data significantly enhances the accuracy and reliability of AI models, making them more effective in real-world applications.

Unlabeled Data
- Definition: Unlabeled data does not come pre-tagged with labels. The AI model is tasked with analyzing the data and identifying patterns or classifications on its own.
- Application: This approach is often applied to large datasets where manual labeling is impractical due to resource constraints.
- Trade-offs: While less resource-intensive upfront, models trained on unlabeled data might not achieve the same level of accuracy as those trained on well-labeled datasets.

Practical Implications
- Choice of Method: The decision between using labeled or unlabeled data often depends on the specific requirements of the project, available resources, and desired model performance.
- Future Discussions: Subsequent lessons will delve deeper into how AI models learn from these types of data and the techniques used to enhance their learning process.

2.4 Metadata: Data that describes data

Impact of Digitalization on AI
- Data Growth: The rapid expansion of online platforms, mobile technology, cameras, social media, sensors, and Internet of Things devices has resulted in a massive increase in data generation.
- Quality and Quantity: Not only has the volume of data grown, but the quality has also improved significantly, as evidenced by the comparison of old mobile phone photos to modern smartphone images.

Challenges of Managing Data
- Unstructured Data: A large portion of this newly generated data is unstructured and too vast to manually label or organize effectively.
- Metadata as a Solution: To manage this overwhelming amount of data, metadata becomes essential. Metadata is data about data, providing summaries of key details such as asset type, author, creation date, usage, file size, and more.
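As a simple illustration of the idea, a metadata record for a single unstructured asset might look like the following sketch. The field names and values are illustrative, not a standard schema:

```python
# A minimal, illustrative metadata record for one unstructured asset (a video).
# The field names and values are hypothetical examples, not a standard schema.
video_metadata = {
    "asset_type": "video",
    "author": "marketing_team",
    "creation_date": "2024-05-01",
    "duration_seconds": 90,
    "file_size_mb": 48.2,
    "format": "mp4",
    "tags": ["product_demo", "spring_campaign"],
}

# Even without watching the video, systems can search, filter, and organize
# assets using records like this one.
print(video_metadata["asset_type"], video_metadata["tags"])
```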
3. Key AI techniques

3.1 Machine learning

Recap and Context
- AI Basics and History: Previously, we explored foundational concepts, the history of AI, and distinguished between weak and strong AI.
- Importance of Data: We emphasized that high-quality, abundant data is crucial for building effective AI systems.

Introduction to Machine Learning (ML)
- Analogy: Machine learning is likened to a student learning from a teacher, where the ML model is the student and the data scientist acts as the teacher. The teacher provides extensive, quality data (learning materials) to train the student.
- Training Process: Like preparing for an exam, the ML model is trained to recognize patterns and solve problems using unseen data. The quality and relevance of training data significantly influence the model's performance.

Practical Application in Business
- Real Estate Example:
o Scenario: A real estate agent wants to develop a mobile app to estimate home selling prices based on user inputs.
o Data Scientist's Role: The data scientist assesses the feasibility of the project, emphasizing the need for a comprehensive database of past transactions.
o Model Development: The model predicts house prices by analyzing past transactions (input x) to estimate selling prices (output y).
o Outcome: The successful app significantly boosts the real estate agent's business by predicting prices and collecting potential seller contacts.

Educational Takeaway
- Functionality of ML: This example illustrates how ML uses historical data to learn patterns and make predictions about new, unseen situations.
- Future Lessons: The next lesson will cover different types of machine learning models, further expanding on how these technologies are applied.

3.2 Supervised, Unsupervised, and Reinforcement learning

Supervised Learning
- Definition: Supervised learning uses labeled data to teach models how to predict outputs based on input data.
- Classification: An example is identifying whether an image contains a dog or not, using a dataset where each image is labeled as 'dog' or 'not dog.'
- Regression: Another use is in prediction, such as estimating house prices based on a dataset with known home features and prices.
- Key Point: The model is explicitly trained with known outputs, guiding its learning process.

Unsupervised Learning
- Definition: Unsupervised learning involves analyzing data without pre-labeled responses.
- Clustering: The model scans data to identify inherent patterns and group similar items, such as differentiating between images of dogs and cats without prior labels.
- Applications: Useful when labeling data is impractical or too costly, or when the relationships within data are unknown. Examples include identifying customer segments in a supermarket or determining popular property types in real estate.
- Key Point: The algorithm autonomously discovers relationships and patterns without direct input on the desired output.

Reinforcement Learning
- Definition: Reinforcement learning teaches models to make decisions by rewarding desired behaviors and penalizing undesired ones, optimizing for a specific goal without labeled data.
- Application and Dynamics: Commonly used in robotics and recommendation systems like Netflix's. The model learns from direct interaction with the environment, improving its recommendations based on user feedback, such as views, skips, and ratings.
- Key Point: Operates on a trial-and-error basis within defined rules, constantly adjusting actions based on feedback to achieve the best outcomes.
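To ground the supervised vs. unsupervised distinction, here is a minimal sketch using scikit-learn and toy data. The numbers are made up for illustration; a real project would use data such as the house transactions described above:

```python
# A minimal sketch contrasting supervised and unsupervised learning,
# assuming scikit-learn is installed. The toy data below is purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# --- Supervised (regression): known inputs AND known outputs ---
# Inputs: house size in square meters; outputs: known selling prices.
sizes = np.array([[50], [70], [90], [120], [150]])
prices = np.array([150_000, 200_000, 260_000, 340_000, 420_000])

reg = LinearRegression().fit(sizes, prices)   # learn from labeled examples
print(reg.predict([[100]]))                   # estimate the price of an unseen 100 m2 home

# --- Unsupervised (clustering): inputs only, no labels ---
# Customers described by (annual spend, visits per month); no segment labels given.
customers = np.array([[200, 2], [220, 3], [1500, 12], [1600, 10], [210, 1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)                         # groups discovered by the algorithm itself
```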
3.3 Deep learning

Deep learning, a sophisticated subset of machine learning, draws inspiration from the human brain's structure and function. This advanced AI methodology allows machines to process information in stages, mirroring how our brains interpret complex stimuli.

Human Brain vs. Artificial Neural Networks (ANN)
- Initial Perception: Just as our brain gets a general impression from a first glance (like recognizing a sunny beach day), the input layer of an ANN receives raw data and begins the processing journey.
- Deeper Analysis: Upon closer inspection, our brain identifies more details (like children around a sandcastle or odd facial features), similar to how subsequent layers in an ANN detect and interpret more complex data features.
- Complex Understanding: In both human perception and deep learning, deeper layers synthesize basic insights into higher-level concepts, enabling nuanced understanding and recognition of intricate patterns.

Practical Example of Deep Learning
- MNIST Dataset: Utilizing a dataset of handwritten digits, an ANN with multiple layers learns to recognize numbers by identifying and combining features such as edges and curves through its layers, ultimately determining the digit represented in an image.
- Layer Functions:
o Input Layer: Receives raw pixel data, with each pixel's brightness represented as an activation value.
o Hidden Layers: Sequentially refine and transform data, focusing on increasingly specific attributes.
o Output Layer: Produces the final decision, identifying the specific digit.

Significance of Deep Learning
- Pattern Recognition: By emulating aspects of human cognitive processes, ANNs excel in recognizing patterns and interpreting data from large, high-dimensional datasets.
- AI Advancements: Deep learning's ability to analyze complex patterns with high accuracy has driven significant advancements in AI, transforming many technological and scientific fields.

4. Important AI branches

4.1 Robotics

Historical Context
- Ancient Origins: Tales like the myth of Talos and the mechanical inventions of Al-Jazari show early human fascination with automata and mechanical beings.
- Renaissance Innovations: Leonardo Da Vinci's designs, such as the mechanical knight and lion, prefigured modern robotic concepts, illustrating a longstanding interest in replicating human and animal actions through machines.

Modern Robotics
- Definition: Robotics involves designing, constructing, and operating robots—machines capable of performing tasks either autonomously or with human-like capabilities.
- Interdisciplinary Field: The creation of robots requires a collaborative effort among mechanical engineers (for physical structure and mobility), electronics and electrical engineers (for operational control), and AI specialists (for decision-making and behavioral intelligence).

AI Integration in Robotics
- Role of AI: Advanced AI technologies drive the decision-making and perception capabilities of robots, equipping them with sensors and cameras to interact intelligently with their environment.
- Multi-Model Systems: Effective robots often integrate multiple AI models, including:
o Computer Vision: For object detection and environmental understanding.
o Simultaneous Localization and Mapping (SLAM): For navigation and mapping.
o Reinforcement Learning: For adaptive decision-making (a small code sketch follows after this list).
o Natural Language Processing (NLP): For understanding and generating human language.
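To make the reinforcement-learning component above more concrete, here is a minimal tabular Q-learning sketch on a toy one-dimensional corridor. The environment, rewards, and hyperparameters are invented for illustration of trial-and-error learning; real robotic systems use far richer state and action representations:

```python
# A minimal tabular Q-learning sketch: an agent in a 5-cell corridor learns to
# walk right to reach a goal. States, rewards, and parameters are toy values.
import random

N_STATES = 5            # cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]      # step left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# Q-table: estimated long-term reward for each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally, otherwise exploit the best known action.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])

        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else -0.01  # reward the goal, penalize wandering

        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy should prefer stepping right in every cell.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```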
Practical Applications and Future Potential
- Industrial Automation: Robots like the Tesla Bot are being developed to handle repetitive and hazardous tasks in industrial settings, enhancing efficiency and safety.
- Medical Robotics: Robots in healthcare are already performing precise interventions and complex surgeries, significantly impacting patient care.
- Broader Applications: The use of robots extends to various sectors including agriculture (harvesting robots), domestic tasks (cleaning robots), exploration (space robots), emergency response (search and rescue robots), and security (surveillance robots).

4.2 Computer vision

How Computer Vision Works
- Processing Images and Videos: Computers analyze still images and video frames, understanding nuances such as movement, shape changes, and color differences.
- Complexity of Videos: Unlike static images, videos consist of continuous image sequences (e.g., 30 frames per second), requiring more complex processing to maintain context and continuity.

Main Families of Computer Vision Models
- Convolutional Neural Networks (CNNs):
o Foundation: Essential for handling high-dimensional image data.
o Functionality: CNNs excel at recognizing spatial hierarchies in images, organizing elements based on depth and importance, and gradually learning from basic to complex features across layers.
- Transformers:
o Application: Increasingly used in computer vision, particularly in generative AI contexts.
- Generative Adversarial Networks (GANs):
o Purpose: Primarily used for creating realistic images.
- Specialized Networks:
o Examples: U-Net for medical image segmentation and EfficientNet for optimizing neural network performance and resource use.

Applications of Computer Vision
- Broad Impact: From self-driving cars and medical imaging to security and surveillance.
- Non-Robotic Uses: Face recognition software exemplifies computer vision applied in non-robotic contexts.
- Virtual Reality: Advances in VR are revolutionizing education, entertainment, and communication by enhancing immersive experiences.

4.3 Traditional ML

While AI innovations like ChatGPT and autonomous vehicles often capture public imagination, a substantial portion of AI's value lies in its application within traditional business operations. These applications might not make headlines as frequently, but they are fundamental in transforming various industries by enhancing efficiency and accuracy.

4.4 Generative AI

Generative AI refers to the branch of artificial intelligence capable of generating new data or content. It stands out because it creates novel outputs, rather than just processing existing data.
- Examples: ChatGPT and DALL-E are prominent examples, where ChatGPT generates textual content and DALL-E produces images based on descriptions.

Techniques in Generative AI
- Large Language Models (LLMs): These are neural networks trained on vast amounts of text data, predicting word relationships and subsequent words in sentences. LLMs are foundational for text-based applications like ChatGPT.
- Diffusion Models: Used primarily for image and video generation, these models start with a noise pattern and refine it into a detailed image, applying learned patterns to enhance realism.
- Generative Adversarial Networks (GANs): Introduced in 2014, GANs use two algorithms in tandem—one to generate content and the other to judge its realism, improving both through iterative enhancement.
- Neural Radiance Fields: Specialized for 3D modeling, these are used to create highly realistic three-dimensional environments.
- Hybrid Models: Combining techniques like LLMs and GANs, hybrid models leverage the strengths of multiple approaches to enhance content generation.

Impact and Applications
- Industry Revolution: Generative AI is pivotal in industries ranging from entertainment and media to architecture and healthcare, where it enables the creation of complex, realistic models and simulations.
- Corporate Influence: Big Tech firms are heavily investing in generative AI, driving forward innovations that can generate content across text, images, videos, audio, and more.
- Future Potential: As technology evolves, generative AI is set to profoundly impact how businesses operate, offering new ways to create and manipulate digital content.

5. Understanding Generative AI

5.1 Early approaches to Natural Language Processing (NLP)

Early Beginnings
Natural Language Processing, or NLP, is a field within computer science that focuses on enabling computers to understand, interpret, and generate human language. Originating in the 1950s, NLP initially relied on rule-based systems. These systems used explicit language grammar rules to process text. For example, a rule might dictate that sentences beginning with "Can you," "Will you," or "Is it" should be treated as questions, helping the system recognize "Can you help me?" as a question.

Shift to Statistical NLP
By the late 1980s and into the early 1990s, the field began to shift towards statistical NLP. This new approach moved away from rigid rule-based systems to probabilistic methods that analyze extensive data to understand language. This transition marked a significant development, as it utilized statistics to interpret language usage more dynamically.

Practical Example
In the 90s, statisticians would analyze sentences containing the word "can" to determine its use as a noun or a verb—important for understanding its meaning in context. For instance, "can" as a verb might indicate ability, while as a noun, it could refer to a container. Analyzing how "can" was used with surrounding words like "you" or "soda" helped predict its grammatical role in sentences.

Towards Modern NLP
This approach of analyzing word usage and context paved the way for the development of models that predict word meanings based on calculated probabilities. Such methods resemble early forms of machine learning, foreshadowing the sophisticated techniques like vector embeddings and deep learning that dramatically enhance NLP today.

5.2 Recent NLP advancements

Integration of Machine Learning
The 2000s marked a significant shift in NLP with the integration of machine learning techniques, greatly enhancing the capability to analyze and interpret large volumes of text data.

Role of Vector Embeddings
- Complex Representation: Vector embeddings transformed how textual information is processed by representing words and sentences as numerical arrays in high-dimensional spaces.
- Semantic Similarity: These embeddings capture complex linguistic relationships and meanings, enabling models to operate in spaces that are often several hundred to thousands of dimensions deep (a short similarity sketch appears below).

Emergence of Advanced ML Models
- Linguistic Nuances: New machine learning models began to recognize subtle linguistic elements like sarcasm and irony, which were challenging for earlier statistical methods.
- Use of Neural Networks: By the 2010s, neural networks with their deep, multi-layered structures became crucial for advancing NLP tasks such as translation, speech recognition, and text generation.
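To illustrate the semantic-similarity idea from the vector-embedding bullets above, here is a minimal sketch with tiny, hand-made 3-dimensional vectors. Real embeddings are learned from data and have hundreds or thousands of dimensions; these numbers are purely illustrative:

```python
# A minimal cosine-similarity sketch with toy 3-dimensional "embeddings".
# Real embeddings are learned and far higher-dimensional; these vectors are made up.
import numpy as np

embeddings = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: related meanings
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low: unrelated meanings
```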
Transformative Impact of Transformer Architecture
Introduced in 2017, the transformer architecture revolutionized NLP by facilitating the development of Large Language Models (LLMs) such as GPT and Gemini.
- Capabilities: Transformers enable better handling of sequential data and improve the learning of dependencies in text, significantly impacting the field.

5.3 From Language Models to Large Language Models (LLMs)

Game-Based Learning Analogy
In a team-based word association game, players use strategic word choices to help teammates guess a secret word. This mirrors how language models operate, using probabilistic predictions to determine the most likely next word in a sequence.

Language Models Explained
Language models predict words based on the context provided by surrounding text. They are inherently probabilistic, designed to fill in blanks in sentences or continue a sequence of text.

Types of Language Models
- Masked Language Models: These models can predict a missing word anywhere in a sentence by considering both the preceding and following context.
- Autoregressive Language Models: These predict the next word in a sequence using only the preceding words as context. Models like OpenAI's GPT are autoregressive, building each prediction based on previously generated words.

Statistical Learning and Model Training
Language models are trained on vast datasets, learning from diverse linguistic patterns. This training allows them to develop a probabilistic understanding of word associations, enabling them to generate coherent and contextually appropriate text.

Generative AI
The term "generative" highlights these models' ability to produce new content, making them a subset of Generative AI. They can generate a wide range of outputs based on the training data they have processed.

Expansion and Scalability
Initially focused on single languages, modern models are increasingly multilingual and trained on expanding datasets. The size of these models, often referred to as Large Language Models (LLMs), is growing, with models like GPT-3 and GPT-4 using hundreds of billions to over a trillion parameters.

5.4 The efficiency of LLM training: Supervised vs Self-supervised learning

Challenges of Supervised Learning
- High Costs: Supervised learning requires labeled data. For example, labeling 100,000 customer reviews at 30 cents each amounts to $30,000. For more complex data like medical records, the cost can soar to millions.
- Scalability Issues: The expense and effort to label data increase with the complexity, making supervised learning less feasible for large datasets such as the entire internet content, which would be prohibitively expensive and biased if labeled by few individuals.

Limitations of Unsupervised Learning
- Lack of Direction: While unsupervised learning does not require labeled data, it lacks specific objectives, making it difficult for models to learn structures that are meaningful for understanding or generating human language.
- Contextual Weaknesses: This approach struggles with language nuances since it does not prioritize context prediction from previous texts, which is crucial in language processing.
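Self-supervised learning, described next, sidesteps both problems by deriving training labels from the raw text itself. Here is a minimal sketch of how next-word (input, label) pairs can be generated automatically, using a toy whitespace tokenizer and an example sentence of my own:

```python
# A minimal sketch of self-supervised label creation: each next word in raw,
# unlabeled text becomes the training label for the words that precede it.
# The sentence and the whitespace "tokenizer" are toy simplifications.
text = "the cat sat on the mat"
tokens = text.split()

training_pairs = []
for i in range(1, len(tokens)):
    context = tokens[:i]     # input: everything seen so far
    target = tokens[i]       # label: the very next word
    training_pairs.append((context, target))

for context, target in training_pairs:
    print(context, "->", target)
# e.g. ['the'] -> cat, ['the', 'cat'] -> sat, ... with no human labeling required.
```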
Introduction to Self-Supervised Learning
- Balanced Approach: Self-supervised learning offers a solution by enabling models to generate their own labels from unlabeled data. This method leverages the inherent structure in data to predict context and learn effectively.
- Advancements with LLMs: Self-supervised learning has been instrumental in developing Large Language Models (LLMs) like ChatGPT. These models autonomously analyze text, generate labels, and use the context from surrounding words to predict subsequent content, thus enhancing their understanding and generation of natural language.

5.5 From N-Grams to RNNs to Transformers: The Evolution of NLP

N-grams
- Basics: N-grams predict the probability of a word based on the preceding n-1 words. Unigrams (n=1) predict words without any contextual basis, often leading to nonsensical choices, while bigrams (n=2) and trigrams (n=3) incorporate one or two preceding words respectively.
- Limitations: Although n-grams consider immediate predecessors, they lack an understanding of broader sentence context and fail to capture deeper semantic relationships.

Recurrent Neural Networks (RNNs)
- Advancements: RNNs marked a significant improvement by processing sequences of text and retaining information from previous inputs, which allows for context-aware predictions.
- Challenges: RNNs struggle with long text inputs due to the vanishing gradient problem, where the influence of earlier text diminishes in longer sequences.

Long Short-Term Memory Networks (LSTMs)
- Solution: LSTMs address RNN limitations with a gate architecture that helps retain or discard information selectively, improving the model's ability to manage long-term dependencies.
- Drawbacks: Despite their effectiveness, LSTMs are computationally expensive and slow to train, particularly on larger datasets, making them less scalable.

Transformers
- Innovation: Introduced in the seminal paper "Attention Is All You Need" in 2017, transformers revolutionize language modeling with an attention mechanism that assesses the relevance of different parts of the input data, allowing the model to focus on the most important segments.
- Efficiency and Scalability: By calculating attention scores and prioritizing certain words over others, transformers efficiently handle sequences without the computational overhead of LSTMs, enhancing scalability and performance in processing large volumes of text.

Implications for Large Language Models
The development of transformers has enabled the creation of powerful LLMs like ChatGPT by improving how machines understand and generate human-like text. This technology underpins the sophisticated capabilities of current AI models, allowing them to generate coherent and contextually aware language outputs on a large scale.

5.6 Phases in building LLMs

Model Design
- Architecture Selection: Developers choose an appropriate neural network architecture, such as transformers, CNNs, or RNNs, depending on the intended application.
- Depth and Parameters: Decisions about the model's depth and the number of parameters it will contain are crucial as they define the model's capabilities and limitations.

Dataset Engineering
- Data Collection: Involves gathering data from publicly available sources or proprietary datasets. The amount and quality of data can significantly influence the model's performance.
- Data Preparation: Cleansing and structuring of data are critical to ensure that the model trains on high-quality and relevant information (a small cleaning sketch follows below).
- Ethical Considerations: Developers must address key issues such as data diversity and potential biases within the training data.
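As a tiny illustration of the data-preparation step referenced above, here is a minimal sketch that normalizes and de-duplicates a toy list of raw documents. Real dataset-engineering pipelines are vastly larger, with filtering, language detection, and bias checks; this only shows the flavor of the work:

```python
# A minimal data-preparation sketch: normalize whitespace, drop empty entries,
# and remove exact duplicates from a toy corpus. Purely illustrative.
import re

raw_documents = [
    "  The   quick brown fox. ",
    "The quick brown fox.",          # duplicate after normalization
    "",                              # empty entry to drop
    "LLMs are trained on large text corpora.",
]

def normalize(doc: str) -> str:
    """Collapse repeated whitespace and trim the ends."""
    return re.sub(r"\s+", " ", doc).strip()

seen = set()
clean_corpus = []
for doc in raw_documents:
    doc = normalize(doc)
    if doc and doc not in seen:      # skip empties and exact duplicates
        seen.add(doc)
        clean_corpus.append(doc)

print(clean_corpus)   # ['The quick brown fox.', 'LLMs are trained on large text corpora.']
```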
Pretraining
- Initial Training: The model is trained on a large corpus of raw data, which helps in developing a basic understanding of language patterns and structures.
- Handling Bias: Special attention is needed to avoid training the model on data that could lead it to generate biased or offensive outputs.

Preliminary Evaluation
- Performance Assessment: Early evaluation of the model to understand its strengths and areas that require improvement, particularly in how it handles context and subtlety in language.

Post-training
- Supervised Finetuning: Enhances the model's performance using high-quality, targeted data.
- Incorporating Feedback: Refining the model further through human feedback and annotations to improve its accuracy and ethical behavior.

Finetuning
- Optimization: Adjusting the model's weights to optimize for specific tasks, improving speed and efficiency while potentially sacrificing some general capabilities.

Final Testing and Evaluation
- Comprehensive Review: Rigorous testing to assess the model's response quality, accuracy, speed, and ethical behavior, ensuring it meets end-user expectations and standards.

5.7 Prompt engineering vs Fine-tuning vs RAG: Techniques for AI optimization

Prompt Engineering
- Definition: Modifying how we interact with the model through specific instructions or examples without altering the model's underlying architecture or training data.
- Process: Involves crafting and refining verbal prompts to guide the model towards generating the desired outputs.
- Utility: Allows quick, iterative adjustments to how the model interprets and responds to queries, ideal for tuning AI behavior with minimal resources.

Retrieval-Augmented Generation (RAG)
- Definition: Enhancing model responses by integrating an external database that the model can query to pull in additional context or information.
- Implementation: Attaches a searchable database to the model, effectively expanding its knowledge base without changing its internal structure.
- Advantages: Provides a richer context for model responses, particularly useful in scenarios requiring detailed or expansive knowledge.

Fine-Tuning
- Definition: Involves retraining the model on new data or adjusting its neural network weights to improve or specialize its performance.
- Characteristics: This is more resource-intensive and requires additional data, often leading to substantial improvements in the model's accuracy and speed for specific tasks.
- Limitations: Unlike prompt engineering or RAG, fine-tuning is not iterative and can be computationally expensive, necessitating careful planning and execution.

5.8 The importance of foundation models

Specialized Machine Learning Models
- Narrow Focus: Traditionally, machine learning models were designed for specific tasks such as image recognition, speech transcription, or time series prediction. Each model excelled within its narrow domain but lacked versatility.
- Limited Applications: These models were confined to tasks they were explicitly trained for, such as identifying objects in images or analyzing sentiment in texts.

Introduction of Large Language Models (LLMs)
- Broader Capabilities: LLMs, trained on extensive and diverse data sets, exhibit a remarkable ability to perform general-purpose tasks across different fields.
- Versatility: Initially text-based, LLMs quickly evolved to handle a variety of data formats, including code, Excel files, PDFs, and even multimedia content like images and videos.
- Adaptability: With techniques like fine-tuning and prompt engineering, LLMs can be customized to excel in various applications beyond their initial training.
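To illustrate the prompt-engineering idea from section 5.7 (and the adaptation mentioned in the bullet just above), here is a minimal few-shot prompt-construction sketch. The labels, example reviews, and the send_to_model function are hypothetical placeholders, not a specific vendor's API:

```python
# A minimal few-shot prompt-engineering sketch: the model itself is untouched;
# only the text we send changes its behavior. The examples are made up, and
# send_to_model is a hypothetical placeholder for whatever LLM interface is used.
FEW_SHOT_EXAMPLES = [
    ("The delivery was late and the box was damaged.", "negative"),
    ("Great value for the price, works perfectly.", "positive"),
]

def build_prompt(review: str) -> str:
    """Assemble instructions plus labeled examples, then the new input."""
    lines = ["Classify each customer review as positive or negative.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = build_prompt("The app keeps crashing whenever I open it.")
print(prompt)
# response = send_to_model(prompt)   # hypothetical call to an LLM service
```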
Transition to Foundation Models
- Definition: Foundation models are a progression from LLMs, designed to serve as a versatile base for building applications across multiple disciplines.
- Characteristics: These models are characterized by their enormous size and capability to perform myriad tasks, making them a powerful tool for a broad range of applications.
- Strategic Importance: According to Sam Altman of OpenAI, the future of developing such models will likely be dominated by Big Tech and governmental bodies due to the significant resources required.

5.9 Buy vs Make: foundation models vs private models

Traditional Business Strategy
- Buy vs. Make: Businesses typically decide between outsourcing non-core activities for efficiency and retaining strategic, value-adding activities internally to maintain competitive advantage.

Challenges with AI and Foundation Models
- Resource Intensity: Building proprietary foundation models like LLMs is prohibitively expensive and resource-intensive, limiting this capability to a few well-funded organizations worldwide.
- Strategic Importance of AI: Despite AI being a core value-add for many businesses, the high cost and technical demands of developing custom models mean that most companies cannot afford to build their own from scratch.

Adapting to AI Realities
- Model-as-a-Service: Companies often turn to providers like OpenAI, which offer access to advanced models such as GPT through model-as-a-service arrangements. This allows businesses to leverage cutting-edge AI without the overhead of developing it.
- Core vs. Non-Core: The contradiction arises when AI is a core strategic asset, but access to the technology depends heavily on external sources. This shifts the strategic focus from building AI internally to effectively integrating and customizing external AI solutions.

Competitive Differentiation
- Skill in AI Adaptation: The real competitive advantage lies in a company's ability to tailor these external AI resources to specific business needs through techniques like prompt engineering, RAG (Retrieval-Augmented Generation), and fine-tuning (a small RAG sketch follows below).
- Demand for AI Expertise: As AI continues to be a central element of business strategy, there is a growing need for skilled AI engineers who can navigate these tools to enhance business applications and outputs.
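To close, here is a minimal sketch of the RAG pattern described in section 5.7: retrieve the most relevant internal documents for a question, then pass them to the model as extra context. The toy documents, the keyword-overlap "retriever", and the call_llm placeholder are all hypothetical stand-ins; production systems use vector embeddings and a real model API:

```python
# A minimal Retrieval-Augmented Generation (RAG) sketch. The documents and the
# keyword-overlap scoring are toy stand-ins for a real vector database, and
# call_llm would be a hypothetical wrapper around an actual model API.
KNOWLEDGE_BASE = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Premium support is available Monday to Friday, 9am to 6pm.",
    "Orders above 50 euros ship free within the EU.",
]

def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many question words they share (a toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # return call_llm(prompt)        # hypothetical call to an external LLM
    return prompt                    # returned here so the sketch runs on its own

print(answer("How long do refunds take?"))
```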