Artificial Intelligence Coursebook PDF

Summary

This textbook provides an introduction to artificial intelligence, covering its history, key concepts, and application areas. It includes a table of contents, learning objectives, and detailed sections on topics like the history of AI, modern AI systems, reinforcement learning, natural language processing, and computer vision. The book details specific developments in the field and the foundational thinkers who shaped it.

Full Transcript

ARTIFICIAL INTELLIGENCE
DLBDSEAIS01

MASTHEAD

Publisher: IU Internationale Hochschule GmbH
IU International University of Applied Sciences
Juri-Gagarin-Ring 152
D-99084 Erfurt

Mailing address:
Albert-Proeller-Straße 15-19
D-86675 Buchdorf
[email protected]
www.iu.de

DLBDSEAIS01
Version No.: 001-2024-0423
Hamidreza Kobdani; Kristina Schaaff
Cover image: Adobe Stock.

© 2024 IU Internationale Hochschule GmbH
This course book is protected by copyright. All rights reserved. This course book may not be reproduced and/or electronically edited, duplicated, or distributed in any kind of form without written permission by the IU Internationale Hochschule GmbH (hereinafter referred to as IU). The authors/publishers have identified the authors and sources of all graphics to the best of their abilities. However, if any erroneous information has been provided, please notify us accordingly.

TABLE OF CONTENTS

Introduction
  Signposts Throughout the Course Book
  Basic Reading
  Further Reading
  Learning Objectives

Unit 1: History of AI
  1.1 Historical Developments
  1.2 AI Winter
  1.3 Expert Systems
  1.4 Notable Advances

Unit 2: Modern AI Systems
  2.1 Narrow versus General AI
  2.2 Application Areas

Unit 3: Reinforcement Learning
  3.1 What is Reinforcement Learning?
  3.2 Markov Decision Process and Value Function
  3.3 Temporal Difference and Q-Learning

Unit 4: Natural Language Processing
  4.1 Introduction to NLP and Application Areas
  4.2 Basic NLP Techniques
  4.3 Vectorizing Data

Unit 5: Computer Vision
  5.1 Introduction to Computer Vision
  5.2 Image Representation and Geometry
  5.3 Feature Detection
  5.4 Semantic Segmentation

Appendix
  List of References
  List of Tables and Figures

INTRODUCTION

WELCOME

SIGNPOSTS THROUGHOUT THE COURSE BOOK

This course book contains the core content for this course. Additional learning materials can be found on the learning platform, but this course book should form the basis for your learning.

The content of this course book is divided into units, which are divided further into sections. Each section contains only one new key concept to allow you to quickly and efficiently add new learning material to your existing knowledge.

At the end of each section of the digital course book, you will find self-check questions. These questions are designed to help you check whether you have understood the concepts in each section.

For all modules with a final exam, you must complete the knowledge tests on the learning platform. You will pass the knowledge test for each unit when you answer at least 80% of the questions correctly.
When you have passed the knowledge tests for all the units, the course is considered finished and you will be able to register for the final assessment. Please ensure that you complete the evaluation prior to registering for the assessment. Good luck!

BASIC READING

Negnevitsky, M. (2011). Artificial intelligence: A guide to intelligent systems (3rd ed.). Addison-Wesley.

Russell, S. J., & Norvig, P. (2022). Artificial intelligence: A modern approach (4th ed., global ed.). Pearson.

FURTHER READING

Unit 1: Buchanan, B. G. (2005). A (very) brief history of artificial intelligence. AI Magazine, 26, 53–60.

Unit 2: Brett, J. M., Friedman, R., & Behfar, K. (2009). How to manage your negotiating team. Harvard Business Review, 87(9), 105–109.

Unit 3: Russell, S., & Norvig, P. (2021). Artificial intelligence: A modern approach (4th ed., global ed., eBook). Pearson Education. Chapter 23: Reinforcement Learning, pp. 840–870.

Unit 4: Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. http://arxiv.org/pdf/1301.3781v3

Unit 5: Russell, S., & Norvig, P. (2021). Artificial intelligence: A modern approach (4th ed., global ed., eBook). Pearson Education. Chapter 27: Computer Vision, pp. 988–1027.

LEARNING OBJECTIVES

In this course, you will get an introduction to the field of artificial intelligence. The discipline of artificial intelligence originates from various fields of study such as cognitive science and neuroscience. The course book starts with an overview of important events and paradigms that have shaped the current understanding of artificial intelligence. In addition, you will learn about the typical tasks and application areas of artificial intelligence.

On completion of this course book, you will understand the concepts behind reinforcement learning, which are comparable to the human way of learning in the real world by exploration and exploitation.
Moreover, you will learn about the fundamentals of natural language processing and computer vision. Both are important for artificial agents to be able to interact with their environment.

UNIT 1
HISTORY OF AI

STUDY GOALS

On completion of this unit, you will be able to …

– describe how artificial intelligence has developed as a scientific discipline.
– understand the phenomenon of the AI winters and what caused them.
– explain the importance of expert systems and how they have contributed to artificial intelligence.
– discuss the notable advances of artificial intelligence.

1. HISTORY OF AI

Introduction

This unit will discuss the history of artificial intelligence (AI). We will start with the historical developments of AI, which date back to Ancient Greece. We will also discuss the recent history of AI.

In the next step, we will learn about the AI winters. From a historical perspective, there have been different hype cycles in the development of AI because not all requirements for a performant system could be met at the time.

We will also examine expert systems and their development. The last section closes with a discussion of the notable advances in artificial intelligence, including modern concepts and their use cases.

Figure 1: Historical Development of AI
Source: Created on behalf of IU (2022).

The figure above illustrates the milestones in AI which will be discussed in the following sections.

1.1 Historical Developments

Even though historical views of artificial intelligence often start in the 1950s, when it was first applied in computer science, the first considerations about AI range back to 350 BCE. Therefore, we will first start with a brief overview of ancient artificial intelligence history before we explore the more recent history.

Aristotle, Greek Philosopher (384–322 BCE)

Aristotle was the first to formalize human thinking in a way that allows it to be imitated.
To formalize logical conclusions, he fully enumerated all possible categorical syllogisms (Giles, 2016).

Figure 2: Aristotle, Greek Philosopher (384–322 BCE)
Source: Pixabay (n.d.-a)

Syllogisms (Greek: syllogismós, "conclusion", "inference") use deductive reasoning to derive workable conclusions from two or more given propositions. Logical programming languages as they are used today are based on a contemporary equivalent of Aristotle's way of formalizing thinking in the way logical derivations are used. Modern algorithms in AI can be programmed such that they derive valid logical conclusions based on a given set of previously defined rules.

Leonardo da Vinci, Italian Polymath (1452–1519)

Leonardo da Vinci designed a hypothetical computing machine on paper, even though it was never put into practice. The machine had 13 registers, demonstrating that, based on a stored program in memory or mechanics, a black box can accept inputs and produce outputs. These early considerations about computing machinery are very important because progress in computing is a necessary precondition for any sort of development in AI.

René Descartes, French Philosopher (1596–1650)

The French philosopher Descartes believed that rationality and reason can be defined using principles from mechanics and mathematics. The ability to formulate objectives using equations is an important foundation for AI, as its objectives are defined mathematically. According to Descartes, rationalism and materialism are two sides of the same coin (Bracken, 1984). This links to the methods used in AI, where rational decisions are derived in a mathematical way.

Thomas Hobbes, British Philosopher (1588–1679)

Thomas Hobbes specified Descartes' theories about rationality and reason. In his work, he identified similarities between human reasoning and the computations of machines.
Hobbes described that, in rational decision-making, humans employ operations similar to calculus, such that they can be formalized in a way that is analogous to mathematics (Flasiński, 2016).

David Hume, Scottish Philosopher (1711–1776)

Hume made fundamental contributions to questions of logical induction and the concept of causal reasoning (Wright, 2009). For example, he combined learning principles with repeated exposure, which has had, among other things, a considerable influence on the learning curve (Russell & Norvig, 2022). The learning curve is a graphical representation of the ratio between a learning outcome and the time required to solve new tasks. Nowadays, many machine learning algorithms are based on the principle of deriving patterns or relations in data through repeated exposure.

Recent History of Artificial Intelligence

The recent history of AI started around 1956 when the seminal Dartmouth conference took place. The term "artificial intelligence" was first coined at this conference and a definition of the concept was proposed (Nilsson, 2009). In the following, we will discuss the key personalities, organizations, and concepts in the development of AI.

Key Personalities

During the decade of AI's inception, important personalities contributed to the discipline.

Alan Turing was an English computer scientist and mathematician who formalized and mechanized rational thought processes. In 1950, he conceptualized the well-known Turing test. This test examines whether an AI can communicate with a human observer without the observer being able to distinguish whether they are conversing with a machine or another human. If the human cannot identify an AI as such, it is considered a real AI (Turing, 1950).
The American scientist John McCarthy studied automata. It was he who first coined the term "artificial intelligence" during preparations for the Dartmouth conference (McCarthy et al., 1955). In cooperation with the Massachusetts Institute of Technology (MIT) and International Business Machines (IBM), he established AI as an independent field of study. He was the inventor of the programming language Lisp in 1958 (McCarthy, 1960). For more than 30 years, Lisp was used in a variety of AI applications, such as fraud detection and robotics. In the 1960s, he founded the Stanford Artificial Intelligence Laboratory, which has had a significant influence on research on implementing human capabilities, like reasoning, listening, and seeing, in machines (Feigenbaum, 2012).

American researcher Marvin Minsky, a founder of the MIT Artificial Intelligence Laboratory in 1959, was another important participant in the Dartmouth conference. Minsky combined insights from AI and cognitive science (Horgan, 1993).

With a background in linguistics and philosophy, Noam Chomsky is another scientist who contributed to the development of AI. His works on formal language theory and the development of the Chomsky hierarchy still play an important role in areas such as natural language processing (NLP). Besides that, he is well known for his critical views on topics such as social media.

Key Institutions

The most influential institutions involved in the development of AI are Dartmouth College and MIT. Since the Dartmouth conference, there have been several important conferences at Dartmouth College discussing the latest developments in AI. Many of the early influential AI researchers have taught at MIT, making it a key institution for AI research. Companies such as IBM and Intel, and government research institutions, such as the Defense Advanced Research Projects Agency (DARPA), have also contributed much to AI by funding research on the subject (Crevier, 1993).
Key Disciplines Contributing to the Development of AI

Many research areas have contributed to the development of artificial intelligence. The most important areas are decision theory, game theory, neuroscience, and natural language processing:

– Decision theory combines mathematical probability and economic utility. It provides the formal criteria for decision-making in AI regarding economic benefit and dealing with uncertainty.
– Game theory is an important foundation for rational agents to learn strategies to solve games. It is based on the research of the Hungarian–American mathematician John von Neumann (1903–1957) and the German–American economist and game theorist Oskar Morgenstern (1902–1977) (Leonard, 2010).
– The insights from neuroscience about how the brain works are increasingly used in artificial intelligence models, especially as the importance of artificial neural networks (ANNs) is increasing. Nowadays, many models in AI try to emulate the way the brain stores information and solves problems.
– Natural language processing (NLP) combines linguistics and computer science. The goal of NLP is to process not only written language (text) but also spoken language (speech).

High-level programming languages are important for programming AI. They are closer to human language than low-level programming languages, such as machine code or assembly language, and allow programmers to work independently from the hardware's instruction sets. Some of the languages that have become closely associated with AI are Lisp, Prolog, and Python:

– Lisp was developed by John McCarthy and is one of the oldest programming languages. The name comes from "list processing", as Lisp is able to process character strings in a unique way (McCarthy, 1960). Even though it dates back to the 1960s, it has not only been used for early AI programming but is still relevant today.
– Another early AI programming language is Prolog, which was specially designed to prove theorems and solve logical formulas.
– Nowadays, the general-purpose high-level programming language Python is the most important programming language. As Python is open source, there are extensive libraries which help programmers create applications in a very efficient way.

There are three important factors that have contributed to the recent progress in artificial intelligence:

– the increasing availability of massive amounts of data, which are required to develop and train AI algorithms,
– large improvements in the data processing capacity of computers, and
– new insights from mathematics, cognitive science, philosophy, and machine learning.

These factors support the development of approaches that were previously impossible, be it because of a lack of processing capability or a lack of training data.

1.2 AI Winter

The term "AI winter" first appeared in the 1980s. It was coined by AI researchers to describe periods when interest, research activities, and funding of AI projects significantly decreased (Crevier, 1993). The term might sound a bit dramatic. However, it reflects the culture of AI, which is known for its excitement and exuberance.

Historically, the term has its origin in the expression "nuclear winter", an after-effect of a hypothetical nuclear world war in which the atmosphere is overcome by ashes, sunlight cannot reach the Earth's surface, temperatures drop excessively, and nothing is able to grow. Transferred to AI, the term marks periods when interest in and funding of AI technologies were significantly reduced, causing a reduction in research activities. Downturns like this are usually based on exaggerated expectations of the capabilities of new technologies that cannot realistically be met.

There have been two AI winters.
The first lasted approximately from 1974 to 1980 and the second from 1987 to 1993 (Crevier, 1993).

The First AI Winter (1974–1980)

During the Cold War between the former Soviet Union and the United States (US), automatic language translation was one of the major drivers of funding for AI research activities (Hutchins, 1997). As there were not enough translators to meet the demand, expectations were high that this task could be automated. However, the promised outcomes in machine translation could not be met. Early attempts to automatically translate language failed spectacularly. One of the big challenges at that time was handling word ambiguities. For instance, the English sentence "out of sight, out of mind" was translated into Russian as the equivalent of "invisible idiot" (Hutchins, 1995).

When the Automatic Language Processing Advisory Committee evaluated the results of the research that had been generously funded by the US, it concluded that machine translation was neither more accurate, nor faster, nor cheaper than employing human translators (Automatic Language Processing Advisory Committee, 1966).

Additionally, perceptrons, which were at that time a popular model of neurally inspired AI, had severe shortcomings: even simple logical functions, such as exclusive or (XOR), could not be represented in those early systems.

The Second AI Winter (1987–1993)

The second AI winter started around 1987 when the AI community became more pessimistic about developments. One major reason for this was the collapse of the Lisp machine business (a Lisp machine is a type of computer that supports the Lisp language), which led to the perception that the industry might end (Newquist, 1994). Moreover, it turned out that it was not possible to develop the early successful examples of expert systems beyond a certain point. Those expert systems had been the main driver of the renewed interest in AI systems after the first AI winter.
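The perceptron limitation mentioned in the discussion of the first AI winter can be demonstrated in a few lines. The following sketch (an illustration of our own; the function names and the weight grid are chosen for this example, not taken from any historical system) brute-forces weights and biases for a single linear threshold unit and shows that AND is realizable while XOR is not:

```python
import itertools

def perceptron(w1, w2, b, x1, x2):
    """Single linear threshold unit: outputs 1 if the weighted sum exceeds 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# Truth tables for two Boolean functions over inputs (x1, x2).
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def solvable(table, grid):
    """Search a grid of weights/biases for a perceptron matching the table."""
    return any(
        all(perceptron(w1, w2, b, x1, x2) == y for (x1, x2), y in table.items())
        for w1, w2, b in itertools.product(grid, repeat=3)
    )

grid = [x / 2 for x in range(-4, 5)]  # candidate values -2.0, -1.5, ..., 2.0
print(solvable(AND, grid))  # True: AND is linearly separable
print(solvable(XOR, grid))  # False: no single perceptron computes XOR
```

No grid, however fine, would succeed for XOR: the four input points cannot be separated by a single line, which is exactly the shortcoming of early perceptron systems described above.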
The reason for these limitations was that the growth of fact databases was no longer manageable, and results were unreliable for unknown inputs, i.e., inputs on which the machines had not been trained.

However, there are also arguments that there is no such thing as AI winters, and that they are myths spread by a few prominent researchers and organizations who had lost money (Kurzweil, 2014). While the interest in Lisp machines and expert systems decreased, AI was still deeply embedded in many other types of processing operations, such as credit card transactions.

Causes of the AI Winters

There are several conditions that can cause AI winters. The three most important requirements for the success of artificial intelligence are algorithms and experience with them, computing capacity, and the availability of data. The past AI winters occurred because not all requirements were met.

During the first AI winter, there were already powerful algorithms. However, for successful results, it is necessary to process a huge amount of data, which requires a lot of memory capacity as well as high processing speed. At the time, there were not enough data available to properly train those algorithms. Therefore, the expectations of interested parties and investors could not be met. As the funded research was unable to produce the promised results, the funding was stopped.

By the 1980s, computing capacity had increased enough to train the available algorithms on small data sets. However, as approaches from machine learning and deep learning became integral parts of AI in the late 1980s, there was a greater need for large data sets to train AI systems, which became an issue. The lack of labeled training data, even though computing capacity would have been available, created the perception that several of the AI projects had failed.
As the AI winters show, it is impossible to make progress toward developing algorithms for AI unless there is enough computing capacity (i.e., data storage and processing speed) and training data.

The Next AI Winter

Nowadays, all three aspects mentioned above are fully met. There is enough computational power to train the available algorithms on a large number of existing data sets. The figure below summarizes the preconditions for AI to be successful.

Figure 3: Important Aspects of AI
Source: Created on behalf of IU (2022).

However, the question of whether there might be another AI winter in the future can hardly be answered. If a hyped concept gets a lot of funding but does not perform, it might be defunded, which could cause another AI winter. Nevertheless, AI technologies are nowadays embedded in many other fields of research. If low-performing projects are defunded, there is always room for new developments. Therefore, everybody is free to decide whether AI winters are simply a myth or if the concept really matters.

1.3 Expert Systems

One of the key concepts in the history of artificial intelligence is the expert system. Expert systems belong to the group of knowledge-based systems. As the name suggests, the goal of expert systems is to emulate the decision- and solution-finding process using the domain-specific knowledge of an expert. The word "expert" refers to a human with specialized experience and knowledge in a given field, such as medicine or mechanics. Since problems in any given domain may be similar to each other, but never quite alike, solving problems in that domain cannot be accomplished by memorization alone. Rather, problem-solving is supplemented by a method that involves matching or applying experiential knowledge to new problems and application scenarios.

Components of an Expert System

Expert systems are designed to help a non-expert user make decisions based on the knowledge of an expert.
The figure below illustrates the typical components of an expert system:

Figure 4: Components of an Expert System
Source: Created on behalf of IU (2022).

Expert systems are composed of a body of formalized expert knowledge from a specific application domain, which is stored in the knowledge base. The inference engine uses the knowledge base to draw conclusions from the rules and facts it contains. It implements rules of logical reasoning to derive new facts, rules, and conclusions not explicitly contained in the given corpus of the knowledge base. A user interface enables the non-expert user to interact with the expert system to solve a given problem from the application domain.

Types of Expert Systems

With respect to the representation of knowledge, three approaches to expert systems can be distinguished:

– Case-based systems store examples of concrete problems together with a successful solution. When presented with a novel, previously unseen case, the system tries to retrieve a solution to a similar case and apply this solution to the case at hand. The key challenge is defining a suitable similarity measure to compare problem settings.
– Rule-based systems represent the knowledge base in the form of facts and if-A-then-B-type rules that describe relations between facts.
– If the problem class to be solved can be categorized as a decision problem, the knowledge can be represented in a decision tree. The latter is typically generated by analyzing a set of examples.

Development of Expert Systems

Historically, expert systems are an outgrowth of earlier attempts at implementing a general problem solver. This approach is primarily associated with the researchers Herbert A. Simon and Allen Newell, who, in the late 1950s, used a combination of insights from cognitive science and mathematical models of formal reasoning to build a system intended to solve arbitrary problems by successive reduction to simpler problems (Kuipers & Prasad, 2021).
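The division of labor between knowledge base and inference engine described above can be made concrete with a minimal forward-chaining sketch. This is a toy illustration of our own: the facts, rules, and function names are invented for the example and do not come from any historical system.

```python
def forward_chain(facts, rules):
    """Minimal inference engine: repeatedly apply if-A-then-B rules
    until no new facts can be derived.

    `rules` is a list of (premises, conclusion) pairs, where `premises`
    is a set of facts; together they play the role of the knowledge base.
    """
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Toy "medical" knowledge base, purely for illustration.
rules = [
    ({"fever", "cough"}, "flu-suspected"),
    ({"flu-suspected", "high-risk-patient"}, "refer-to-doctor"),
]
derived = forward_chain({"fever", "cough", "high-risk-patient"}, rules)
print(sorted(derived))
# → ['cough', 'fever', 'flu-suspected', 'high-risk-patient', 'refer-to-doctor']
```

Note how "refer-to-doctor" is not stated anywhere as a fact: it follows only by chaining two rules, which is precisely the kind of conclusion "not explicitly contained in the given corpus of the knowledge base" described above.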
While this attempt was ultimately considered a failure when compared to its lofty goals, it has nevertheless proven highly influential in the development of cognitive science.

One of the initial insights gained from the attempt at general problem solving was that the construction of a domain-specific problem solver should, at least in principle, be easier to achieve. This led the way to thinking about systems that combined domain-specific knowledge with domain-dependent, apposite reasoning patterns. Edward Feigenbaum, who worked at Stanford University, the leading academic institution for the subject at the time, defined the term expert system and built the first practical examples while leading the Heuristic Programming Project (Kuipers & Prasad, 2021). The first notable application was Dendral, a system for identifying organic molecules. In the next step, expert systems were established to help with medical diagnoses of infectious diseases based on given data and rules (Woods, 1973). The expert system that evolved out of this was called MYCIN, which had a knowledge base of around 600 rules. However, it took until the 1980s for expert systems to reach the height of research interest, leading to the development of commercial applications.

The main achievement of expert systems was their role in pioneering the idea of a formal, yet accessible representation of knowledge. This representation was explicit in the sense that it was formulated as a set of facts and rules that were suitable for creation, inspection, and review by a domain expert. This approach thus clearly separates domain-specific business logic from the general logic needed to run the program, the latter being encapsulated in the inference engine. In stark contrast, more conventional programming approaches implicitly represent both internal control and business logic in the form of program code that is hard to read and understand by people who are not IT experts.
At least in principle, the approach championed by expert systems enabled even non-programmers to develop, improve, and maintain a software solution. Moreover, it introduced the idea of rapid prototyping, since the fixed inference engine enabled the creation of programs for entirely different purposes simply by changing the set of underlying rules in the knowledge base.

However, a major downside of the classical expert system paradigm, which also finally led to a sharp decline in its popularity, was likewise related to the knowledge base. As expert systems were engineered for a growing number of applications, many interesting use cases required larger and larger knowledge bases to satisfactorily represent the domain in question. This proved problematic in two different aspects:

1. Firstly, the computational complexity of inference grows faster than linearly in the number of facts and rules. This means that for many practical problems the system's answering times were prohibitively high.
2. Secondly, as a knowledge base grows, proving its consistency by ensuring that no constituent parts contradict each other becomes exceedingly challenging.

Additionally, rule-based systems in general lack the ability to learn from experience. Existing rules cannot be modified by the expert system itself; updates of the knowledge base can only be made by the expert.

1.4 Notable Advances

After illustrating the downturns of the AI winters, it is time to shift the focus to the prosperous times when artificial intelligence made huge advances. After an overview of the research topics that were in focus in the respective eras, we will examine the most important developments in adjacent fields of study and how they relate to the progress in artificial intelligence. Finally, we will examine the future prospects of AI.

Nascent Artificial Intelligence (1956–1974)

In the early years, AI research was dominated by "symbolic" AI.
In this approach, rules from formal logic are used to formalize thought processes as the manipulation of symbolic representations of information. Accordingly, AI systems developed during this era deal with the implementation of logical calculus. In most cases, this is done by implementing a search strategy, where solutions are derived in a step-by-step procedure. The steps in this procedure are either inferred logically from a preceding step or systematically derived using backtracking over possible alternatives to avoid dead ends.

The early years were also the period when the first attempts at natural language processing were developed. The first approaches to language processing focused on highly limited environments and settings; therefore, it was possible to achieve initial successes. The simplification of working environments, a "microworld" approach, also yielded good results in the fields of computer vision and robot control.

In parallel, the first theoretical models of neurons were developed. The research focus was on the interaction between those cells (i.e., computational units) to implement basic logical functions in networks.

Knowledge Representation (1980–1987)

The focus of the first wave of AI research was primarily on logical inference. In contrast, the main topics of the second wave were driven by the attempt to solve the problem of knowledge representation. This shift in focus was caused by the insight that, in day-to-day situations, intelligent behavior is based not only on logical inference but much more on general knowledge about the way the world works. This knowledge-based way of viewing intelligence was the origin of early expert systems. The main characteristic of these technologies was that domain-relevant knowledge was systematically stored in databases. Using these databases, a set of methods was developed to access that knowledge in an efficient, effective way.
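The step-by-step search with backtracking that characterized early symbolic AI can be illustrated with a classic toy problem. The sketch below is our own illustration (not any historical system): it places n queens on an n×n chessboard by extending a partial solution one row at a time and abandoning any branch that runs into a dead end.

```python
def solve_queens(n, cols=()):
    """Depth-first search with backtracking for the n-queens problem.

    `cols` records, row by row, the column of each queen placed so far.
    A placement is extended only if it conflicts with no earlier queen;
    a `None` return signals a dead end and triggers backtracking.
    """
    row = len(cols)
    if row == n:
        return cols  # complete, conflict-free placement found
    for col in range(n):
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(cols)):  # no shared column or diagonal
            result = solve_queens(n, cols + (col,))
            if result is not None:
                return result
    return None  # every column fails in this row: backtrack

print(solve_queens(6))  # prints one conflict-free placement (column per row)
```

The same derive-or-backtrack pattern underlies many classic symbolic AI systems, from theorem provers to Prolog's resolution strategy, with logical inference steps taking the place of queen placements.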
The emerging interest in AI after the first AI winter was also accompanied by an upturn in governmental funding at the beginning of the 1980s, with projects such as the Alvey project in the UK and the Fifth Generation Computer project of the Japanese government (Russell & Norvig, 2022). Additionally, in this period the early setbacks of neurally inspired AI approaches could be addressed by new network models and the use of backpropagation as a training method in layered networks of computational units.

Learning from Data (Since 1993)

During the 1990s, there were major advances of AI in games, when the computer system Deep Blue was able to beat Garry Kasparov, the world chess champion at that time. Aside from this notable but narrow success, AI methods have become widely used in the development of real-world applications. Successful approaches in the subfields of AI have gradually found their way into everyday life, often without being explicitly labeled as AI.

In addition, since the early 1990s, a growing number of ideas from decision theory, mathematics, statistics, and operations research have contributed significantly to AI becoming a rigorous and mature scientific discipline. The paradigm of intelligent agents has become especially popular. In this context, the concept of intelligent agents from economic theory combines with the notions of objects and modularity from computer science to form the idea of entities that can act intelligently. This allows a shift in perspective from AI as an imitation of human intelligence to the study of intelligent agents and a broader study of intelligence in general.

The advances in AI since the 1990s have been supported by a significant increase in data storage and computational capacities.
Along with this, during the rise of the internet, there has been an unprecedented increase in the variety, velocity, and volume of generated data, which also supported the AI boom. In 2012 the latest upturn in interest in AI research started, when deep learning was developed based on advances in connectionist machine learning models. The increase in data processing and information storage capabilities, combined with larger data corpora, brought theoretical advances in machine learning models into practice. With deep learning, new performance levels could be achieved in many machine learning benchmark problems. This led to a revival of interest in well-established learning models, like reinforcement learning, and created space for new ideas, like adversarial learning.

Adjacent Fields of Study

There are many fields of study that continuously contribute to AI research. The most influential fields are described in the following.

Linguistics

Linguistics can be broadly described as the science of natural language. It deals with exploring the structural (grammatical) and phonetic properties of interpersonal communication. To understand language, it is necessary to understand the context and the subject matter in which it is used. In his book Syntactic Structures, Noam Chomsky (1957) made an important contribution to linguistics and, therefore, to natural language processing. Since our thoughts are so closely linked to language as a form of representation, one could take it a step further and link creativity and thought to linguistic AI. For example, how is it possible that a child says something it has never said before? In AI, we understand natural language as a medium of communication in a specific context. Therefore, language is much more than just a representation of words.
Cognition

In the context of AI, cognition refers to different capabilities such as perception, reasoning, intelligence, learning and understanding, and thinking and comprehension. This is also reflected in the word “recognition”. A large part of our current understanding of cognition is a combination of psychology and computer science. In psychology, theories and hypotheses are formed from observations of humans and animals. In computer science, behavior is modeled based on what has been observed in psychology. When modeling the brain by a computer, we have the same principle of stimulus and response as in the human brain. When the computer receives a stimulus, an internal representation of that stimulus is made. The response to that stimulus can lead to the original model being modified. Once we have a well-working computer model for a specific situation, the next step will be to find out how decisions are made. As decisions based on AI are involved in more and more areas of our lives, it is important that the reasoning process is highly transparent to an external observer. Therefore, explainability (the ability to explain how a decision has been made) is becoming increasingly important. However, approaches based on deep learning still lack explainability.

Games

When relating games to AI, this includes much more than gambling or computer games. Rather, games refer to learning, probability, and uncertainty. In the early twentieth century, game theory was established as a mathematical field of study by Oskar Morgenstern and John von Neumann (Leonard, 2010). In game theory, a comprehensive taxonomy of games was developed and, in connection with this, some gaming strategies that have been proven to be optimal. Another discipline related to game theory is decision theory.
While game theory is more about how the moves of one player affect the options of another player, decision theory deals with usefulness and uncertainty, i.e., utility and probability. Both are not necessarily about winning but more about learning, experimenting with possible options, and finding out what works based on observations. Games like chess, checkers, and poker are usually played for the challenge of winning or for entertainment. Nowadays, machines can play better than human players. Until 2016, people believed that the game of Go might be an unsolvable challenge for computers because of its combinatorial complexity. The objective of the game is to surround the most territory on a board with 19 horizontal and 19 vertical lines. Even though the ruleset is quite simple, the complexity comes from the large size of the game board and the resulting number of possible moves. This complexity makes it impossible to apply methods that have been used for games like chess and checkers. However, in 2015 DeepMind developed the system AlphaGo based on deep networks and reinforcement learning, and in 2016 it became the first system to beat Lee Sedol, one of the world’s best Go players (Silver et al., 2016). Not long after AlphaGo, DeepMind developed the system AlphaZero (Silver et al., 2018). In contrast to AlphaGo, which learned from Go knowledge in past game records, AlphaZero learns only from intensive self-play following the set of rules. This system turned out to be even stronger than AlphaGo. It is also remarkable that AlphaZero even found some effective and efficient strategies which had, so far, been missed by Go experts.

The Internet of Things

It has only been a few years since the term “Internet of Things” (IoT) first came up. IoT connects physical and virtual devices using technologies from information and communication technology.
In our everyday lives, we are surrounded by a multitude of physical devices that are always connected, such as phones, smart home devices, cars, and wearables. The communication between those devices produces a huge amount of data, which links IoT to AI. While IoT itself is only about connecting devices and collecting data, AI can help add intelligent behavior to the interaction between those machines. Having intelligent devices integrated into our everyday lives not only creates opportunities but also many new challenges. For instance, data about medication based on physical measurements from a wearable device could be used positively, to remind a person about medication intake, but also to decide about a possible increase in their health insurance rate. Therefore, topics like the ethics of data use and privacy violations become increasingly important in the face of the new fields of use of AI.

Quantum Computing

Quantum computing is based on the physical theory of quantum mechanics. Quantum mechanics deals with the behavior of subatomic particles, which follow different rules than those described by theories from classical physics. For instance, in quantum mechanics it is possible for an electron to be in two different states at the same time. Quantum mechanics assumes that physical systems can be characterized using a wave function describing the probabilities of the system being in a particular state. The goal is to exploit these quantum properties to build supercomputers where new algorithmic approaches can be implemented, allowing them to outperform classical machines (Giles, 2018). The kind of information processing from quantum computing is well suited for the probabilistic approach which is inherent in many AI technologies. Therefore, quantum computers offer the possibility of accelerating AI applications and thus achieving a real advantage in processing speed.
However, due to the early stage of development of these systems, the use of quantum computing for AI has hardly been researched.

The Future of AI

It is always highly speculative to try to assess the impact of a research area or new technology on the future, as future prospects will always be biased by previous experiences. Therefore, we do not attempt to predict the long-term future of AI. Nevertheless, we want to examine the directions of developments in AI and the supporting technologies. The Gartner hype cycle is frequently used to evaluate the potential of new technologies (Gartner, 2021). The hype cycle is presented in a diagram where the y-axis represents the expectations towards a new technology and time is plotted on the x-axis. The time axis is characterized by five phases:

1. In the discovery phase, a technological trigger or breakthrough generates significant interest and triggers the innovation.
2. The peak phase of exaggerated expectations is usually accompanied by much enthusiasm. Even though there may be successful applications, most of them struggle with early problems.
3. The period of disillusionment shows that not all expectations can be met.
4. In the period of enlightenment, the value of the innovation is recognized. There is an understanding of the practical uses and advantages, but also of the limitations of the new technology.
5. In the last period, a plateau of productivity is reached, and the new technology becomes the norm. The final level of this plateau depends on whether the technology is adopted in a niche or a mass market.

Figure 5: The Gartner Hype Cycle
Source: Created on behalf of IU (2022) based on Gartner (2018).

The hype cycle has some similarities with the inverted U-shape of a normal distribution, except that the right end of the curve leads into an increasing slope that eventually flattens out.
In 2021, the hype cycle for artificial intelligence showed the following trends (Gartner, 2021): In the innovation trigger phase, subjects appear such as composite AI (a combination of different approaches from AI) and general AI (the ability of a machine to perform humanlike intellectual tasks). Moreover, topics like human-centered AI and responsible AI show that human integration is becoming increasingly important for the future of AI. Deep neural networks, which have been the driver for new levels of performance in many machine learning applications over the past decades, are still at the peak phase of inflated expectations, or hype. Moreover, topics like knowledge graphs and smart robots appear in that phase. In the disillusionment phase, we find topics like autonomous vehicles, which have experienced defunding as the high expectations in this area could not be met. So far, none of the topics of AI have yet reached the plateau of productivity, which would reflect the general acceptance of this area and the productive use of the related technologies.

SUMMARY

Research about artificial intelligence has been of interest for a long time. The first theoretical thoughts about artificial intelligence date back to Greek philosophers like Aristotle. Those early considerations were continued by philosophers like Hobbes and Descartes. Since the 1950s, artificial intelligence has also become an important component of computer science and made important contributions in areas such as knowledge representation in expert systems, machine learning, and the modeling of neural networks. In the past decades, there have been several ups and downs in AI research. They were caused by a cycle between innovations accompanied by high expectations and disappointment when those expectations could not be met, often because of technical limitations. Over time, AI has been shaped by different paradigms from multiple disciplines. The most popular paradigm nowadays is deep learning.
New fields of application like IoT or quantum computing offer a vast number of opportunities for how AI can be used. However, it remains to be seen how intelligent behavior will be implemented in machines in the future.

UNIT 2
MODERN AI SYSTEMS

STUDY GOALS

On completion of this unit, you will be able to…

– explain the difference between narrow and general artificial intelligence systems.
– name the most important application areas for artificial intelligence.
– understand the importance of artificial intelligence for corporate activities.

2. MODERN AI SYSTEMS

Introduction

Artificial intelligence has become an integral part of our everyday life. There are several examples where we do not even notice the presence of AI, be it in Google Maps or smart replies in Gmail. There are two categories of AI that will be explained in the following unit: narrow and general AI. Organizations like Gartner, McKinsey, or PricewaterhouseCoopers (PwC) predict a mind-blowing future for AI. Reports like the PwC report (2018) estimate that AI might contribute 15.7 trillion USD to the global economy. Therefore, after discussing the two categories of AI, we will focus on the most important application areas of AI. Additionally, we will explore how modern AI systems can be evaluated.

2.1 Narrow versus General AI

Recent research topics in artificial intelligence distinguish between two types: artificial narrow intelligence (ANI), also referred to as weak artificial intelligence, and artificial general intelligence (AGI), or strong artificial intelligence. In ANI, systems are built to perform specialized functions in controlled environments, whereas AGI comprises open-ended, flexible, and domain-independent forms of intelligence like that which is expressed by human beings.
Even though many people believe that we already have some sort of strong artificial intelligence, current approaches are still implemented in a domain-specific way and lack the necessary flexibility to be considered AGI. However, there is a large consensus that it is only a matter of time until artificial intelligence will be able to outperform human intelligence. Results from a survey of 352 AI researchers indicate that there is a 50 percent chance that algorithms might reach that state by 2060 (Grace et al., 2017). In the following, we will have a closer look at the underlying concepts of weak and strong artificial intelligence.

Artificial Narrow Intelligence

The term ANI, or weak AI, reflects the current and foreseeable state of artificial intelligence. Systems based on ANI can already solve complex problems or tasks faster than humans. However, the capabilities of those systems are limited to the use cases for which they have been designed. In contrast to the human brain, narrow systems cannot generalize from a specific task to a task from another domain. For example, a particular device or system which can play chess will probably not be able to play another strategy game, like Go or Shogi, without being explicitly programmed to learn that game. Voice assistants such as Siri or Alexa can be seen as a sort of hybrid intelligence, combining several weak AIs. Those tools are able to interpret natural language and analyze the words with their databases in order to complete different tasks. However, they are only able to solve a limited number of problems for which their algorithms are suitable and for which they have been trained. For instance, they would currently not be able to analyze pictures or optimize traffic. In short, ANI includes the display of intelligence with regard to complex problem solving and the display of intelligence relative to one single task.
Artificial General Intelligence

The reference point against which AGI is measured and judged is the versatile cognitive abilities of humans. The goal of AGI is not only to imitate the interpretation of sensory input, but also to emulate the whole spectrum of human cognitive abilities. This includes all abilities currently represented by ANI, as well as the ability of domain-independent generalization, meaning that knowledge of one task can be applied to another task in a different domain. It might also include motivation and volition. Some philosophical sources go one step further and require AGI to have some sort of consciousness or self-awareness (Searle, 1980). Developing an AGI would require the following system capabilities:

– cognitive ability to function and learn in multiple domains
– intelligence on a human level across all domains
– independent ability to solve problems
– problem-solving abilities at an average human level over multiple domains
– abstract thinking abilities without drawing directly on past experience
– the cognitive skill to form new ideas about hypothetical concepts
– perception of the whole environment in which the system acts
– self-motivation and self-awareness

Considering the current state of AGI, it is difficult to imagine developing a system that meets these requirements. In addition, both types of AI also relate to the concept of superintelligence. This concept goes even further than current conceptions and describes the idea that an intelligent system can reach a level of cognition beyond human capabilities, which might be achieved by a recursive cycle of self-improvement. However, this level of AI is above AGI and still very abstract.

2.2 Application Areas

Due to the latest advances in computational and data storage capabilities, applications for AI have been continuously increasing in the past years. The options for where AI can be applied are almost endless.
The growing interest is also corroborated by an increase in research activities. According to the annual AI Index (Zhang et al., 2021), from 2019 to 2020, the number of journal publications on AI grew by 34.5 percent. Since 2010, the number of AI papers has increased more than twenty-fold. The most popular research topics have been natural language processing and computer vision, which are important for various areas of application. In a global survey about the state of AI, McKinsey & Company (2021) identified the following industries as the main fields of AI adoption: High Tech/Telecom, Automotive and Assembly, Financial Services, Business, Legal and Professional Services, Healthcare/Pharma, and Consumer Goods/Retail. In the following section, we will have a closer look at these fields.

AI adoption: The use of AI capabilities such as machine learning in at least one business function is called AI adoption.

Figure 6: Application Areas of AI
Source: Created on behalf of IU (2022).

The figure above summarizes the most important domains in which AI is used.

High Tech and Telecommunication

Due to the constant increase of global network traffic and network equipment, there has been a rapid growth of AI in telecommunication. In this area, AI can not only be used to optimize and automate networks but also to ensure that the networks are healthy and secure. Used for predictive maintenance, AI can help fix network issues even before they occur. Moreover, network anomalies can be accurately predicted when using self-optimizing networks. Big data makes it possible to easily detect network anomalies and therefore prevent fraudulent behavior within them.

Automotive and Assembly

In the past years, autonomous driving has become a huge research topic. It will drastically transform the automotive industry in the next decades from a steel-driven to a software-driven industry.
Nowadays, cars are already equipped with many sensors to ensure the driver’s safety, for example for lane keeping or emergency braking assistance. Intelligent sensors can also detect technical problems of the car or risks from the driver – such as fatigue or being under the influence of alcohol – and initiate appropriate actions. As in high tech and telecommunication, AI can be used in assembly processes for predictive maintenance and to fix inefficiencies in the assembly line. Moreover, using computer vision, it is already possible to detect defects faster and more accurately than a human.

Financial Services

Financial services offer numerous applications for artificial intelligence. Intelligent algorithms enable financial institutions to detect and prevent fraudulent transactions and money laundering much earlier than was previously possible. Computer vision algorithms can be used to precisely identify counterfeit signatures by comparing them to scans of the originals stored in a database. Additionally, many banks and brokers already use robo-advising: based on a user's investment profile, accurate recommendations about future investments can be made (D’Acunto et al., 2019). Portfolios can also be optimized based on AI applications.

Business, Legal, and Professional Services

Especially in industries where paperwork and repetitive tasks play an important role, AI can help to make processes faster and more efficient. Significant elements of routine workflows are currently being automated using robotic process automation (RPA), which can drastically reduce administrative costs. Systems in RPA do not necessarily have to be enabled with intelligent AI capabilities. However, methods such as natural language processing and computer vision can help enhance those processes with more intelligent business logic.

Robotic process automation: The automated execution of repetitive, manual, time-consuming, or error-prone tasks by software bots is described as robotic process automation.

The ongoing developments in big data technologies can help companies extract more information from their data. Predictive analytics can be used to identify current and future trends in the markets a company operates in and react accordingly. Another important use case is the reduction of risk and fraud, especially in legal, accounting, and consulting practices. Intelligent agents can help to identify potentially fraudulent patterns, which allows for earlier responses.

Healthcare and Pharma

In the last few years, healthcare and pharma have been the fastest growing areas adopting AI. AI-based systems can help detect diseases based on their symptoms. For instance, recent studies have been able to use AI-based systems to detect COVID-19 based on cough recordings (Laguarta et al., 2020). AI can offer many advantages not only in diagnostics. Intelligent agents can be used to monitor patients according to their needs. Moreover, regarding medication, AI can help find an optimal combination of prescriptions to avoid side effects. Wearable devices – such as heart rate or body temperature trackers – can be used to constantly observe the vital parameters of a person. Based on this data, an agent can give advice about the wearer’s condition. Moreover, in case critical anomalies are detected, it is possible to initiate an emergency call.

Consumer Goods and Retail

The consumer goods and retail industry focuses on predicting customer behavior. Websites track how a customer’s profile changes based on their visits. This allows for personal purchase predictions for each customer. This data can not only be used to make personalized shopping recommendations but also to optimize the whole supply chain and guide future research and development. Market segmentation is, nowadays, no longer based on geographical regions such as province or country.
Modern technologies allow customer behavior to be segmented on a street-by-street basis. This information can be used to fine-tune operations and decide whether store locations should be kept or closed. Additionally, the recent improvement in natural language processing technologies is increasingly used for chatbots and conversational interfaces. When it comes to customer retention and customer service, a well-developed artificial agent is key to ensuring customer satisfaction.

Evaluation of AI Systems

As the above-mentioned examples illustrate, the application areas for modern AI systems are almost unlimited. More and more companies manage to support their business models with AI or even create completely new ones. Therefore, it is important to carefully evaluate new systems. When evaluating AI systems, it is crucial that all data sets are independent from each other and follow a similar probability distribution. To develop proper models for AI applications, the available data is split into three data sets:

1. Training data set: As the name indicates, this data set is used to fit the parameters of an algorithm during the training process.
2. Development set: This data set is often also referred to as a validation set. It is used to evaluate the performance of the model developed using the training set and for further optimization. It is important that the development set contains data that have not been included in the training data.
3. Test set: Once the model is finalized using the training and the development set, the test set can be used for a final evaluation of the model. As for the development set, it is important that the data in the test set have not been used before. The test set is only used once, to validate the model and to ensure that it is not overfitted.

When developing and tuning algorithms, metrics should be in place to evaluate how well an algorithm performs independently and compared to other systems.
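The three-way split described above can be sketched in a few lines of Python. The split fractions, the seed, and the function name are illustrative assumptions; the essential property is that the three sets are disjoint, so development and test data are never seen during training.

```python
import random

def split_data(samples, train_frac=0.8, dev_frac=0.1, seed=42):
    """Shuffle a data set and split it into training, development, and test sets.

    The fractions and seed are illustrative; what matters is that each sample
    ends up in exactly one of the three sets.
    """
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]   # remainder becomes the test set
    return train, dev, test
```

For 100 samples with the default fractions, this yields 80 training, 10 development, and 10 test samples. Shuffling before splitting is one simple way to approximate the requirement that all three sets follow a similar probability distribution.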
In a binary classification task, accuracy, precision, recall, and F-score are metrics that are commonly used for this purpose. For example, fraud detection in financial services is a binary classification task: a financial transaction can either be categorized as fraudulent or not. Based on this, we will have four categories of classification results:

1. True positives (TP): samples that were correctly classified as positive, i.e., fraudulent transactions labeled as fraud
2. False positives (FP): all results that wrongly indicate a sample to be positive even though it is negative, i.e., a non-fraudulent transaction being categorized as fraud
3. True negatives (TN): classification results that were correctly classified as negative, i.e., non-fraudulent transactions that were also labeled as such
4. False negatives (FN): classification results that were wrongly classified as negative even though they should have been positive, i.e., fraudulent transactions that were classified as non-fraud

The classification results can be displayed in a confusion matrix, also known as an error matrix. This is shown in the table below.

Figure 7: The Confusion Matrix
Source: Created on behalf of IU (2022).

Using these categories, the above-mentioned metrics can be computed. Accuracy is an indicator of how many samples were classified correctly. It can be computed as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

It measures which percentage of the total predictions was correct.
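As a minimal sketch, the four counts and the accuracy metric can be computed directly from a list of true labels and a list of predicted labels. The function names and the choice of 1 as the positive label are illustrative assumptions:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, and FN for a binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    """Share of all predictions that were correct: (TP + TN) / total."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)
```

For instance, with true labels [1, 0, 1, 1, 0] and predictions [1, 0, 0, 1, 1], three of the five samples are classified correctly, giving an accuracy of 0.6. The remaining metrics in this section can be computed from the same four counts.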
Precision denotes the number of positive samples that were classified correctly in relation to all samples predicted in this class:

Precision = TP / (TP + FP)

Recall indicates how many of the actual positive samples were identified correctly in relation to the total number of samples that should have been identified as such:

Recall = TP / (TP + FN)

Finally, the F-score combines precision and recall in one score:

F = 2 · (Precision · Recall) / (Precision + Recall)

In classification tasks with more than two classes, the metrics can be calculated for every class. In the end, the values can be averaged to obtain one metric for all classes.

SUMMARY

There are two types of AI: narrow and general. Current AI systems all belong to the category of ANI. ANI can solve complex problems faster than humans. However, its capabilities are limited to the domain for which it has been programmed. Even though the term ANI might suggest a limitation, it is embedded in many areas of our lives. In contrast to that, AGI (AI which has the cognitive abilities to transfer knowledge to other areas of application) remains a theoretical construct, but is still an important research topic. The application areas for AI are almost unlimited. AI has had a significant impact on today’s corporate landscape. Use cases, such as the optimization of service operations, the enhancement of products based on AI, and the automation of manual processes, can help companies optimize their business functions. Those use cases stretch across a wide range of industries, be it automotive and assembly, financial services, healthcare and pharma, consumer goods, and many more.

UNIT 3
REINFORCEMENT LEARNING

STUDY GOALS

On completion of this unit, you will be able to …

– explain the basic principles of reinforcement learning.
– understand Markov decision processes.
– use the Q-learning algorithm.

3. REINFORCEMENT LEARNING

Introduction

Imagine you are lost in a labyrinth and have to find your way out.
As you are there for the first time, you do not know which way to go to reach the door and leave. Moreover, there are dangerous fields in the labyrinth, and you should avoid stepping on them. You have four actions you can perform: move up, down, left, or right. As you do not know the labyrinth, the only way to find your way out is to see what happens when you perform random actions. During the learning process, you will find out that there are fields in the labyrinth that reward you by letting you escape. However, there are also fields where you will receive a negative reward, as they are dangerous to step on. After some time, you will manage to find your way out without stepping on the dangerous fields, based on the experience you have gained walking around. This process of learning by reward is called reinforcement learning.

Figure 8: Initial Situation in the Labyrinth
Source: Created on behalf of IU (2022).

In this unit, you will learn more about the basic ideas of reinforcement learning and the underlying principles. Moreover, you will get to know algorithms, such as Q-learning, that can help optimize the learning experience.

3.1 What is Reinforcement Learning?

Generally, in machine learning, there exist three techniques to train a specific learning model: supervised, unsupervised, and reinforcement learning. In supervised learning, a machine learns how to solve a problem based on a previously labeled data set. Typical application areas for supervised learning are regression and classification problems such as credit risk estimation or spam detection. Training those kinds of algorithms takes much effort because it requires a large amount of pre-labeled training data. In unsupervised learning, training is performed using unlabeled data to discover the underlying patterns. Based on the input data, clusters are identified which can later be used for classification.
This approach is often used to organize massive amounts of unstructured data, such as customer behavior, to identify relevant peer groups. Reinforcement learning techniques follow a more explorative approach. Algorithms based on this approach improve themselves by interacting with the environment. In contrast to supervised and unsupervised learning, no predefined data is required. An agent learns on an unknown set of data based on the reward the environment returns to the agent. The following table summarizes the basic terms of reinforcement learning.

Table 1: Basic Terms of Reinforcement Learning

Agent: performs actions in an environment and receives rewards for doing so
Action (A): the set of all possible actions the agent can perform
Environment (E): the scenario the agent must explore
States (S): the set of all possible states in the given environment
Reward (R): immediate feedback from the environment to reward an agent's action
Policy (π): the policy the agent applies to determine the next action based on the current state
Value (V): the long-term value of the current state s using the policy π

Source: Created on behalf of IU (2022).

Within the process of reinforcement learning, the agent starts in a certain state s_t ∈ S and applies an action a_t ∈ A(s_t) to the environment E, where A(s_t) is the set of actions available at state s_t. The environment reacts by returning a new state s_{t+1} and a reward r_{t+1} to the agent. In the next step, the agent will apply the next action a_{t+1} to the environment, which will again return a new state and a reward. In the introductory example, you are acting as the agent in the labyrinth environment. The actions you can perform are to move up, down, left, or right. After each move, you reach another state by moving to another field in the labyrinth. Each time you perform an action, you receive a reward from the environment. It will be positive if you reach the door or negative if you step on a dangerous field.
From your new position, the whole learning cycle starts again. Your goal is to maximize your reward. The process of receiving a reward as a function of a state-action pair can be formalized as follows:

f(s_t, a_t) = r_{t+1}

The whole process of agent-environment interaction is illustrated in the figure below.

Figure 9: The Process of Reinforcement Learning

Source: Created on behalf of IU (2022).

The process of an action being selected from a given state, transitioning to a new state, and receiving a reward happens repeatedly. For a sequence of discrete time steps t = 0, 1, 2, … starting at the state s_0 ∈ S, the agent-environment interaction leads to a sequence:

s_0, a_0, r_1, s_1, a_1, r_2, s_2, a_2, r_3, s_3, …

The goal of the agent is to maximize the reward it receives during the learning process. The cycle continues until the agent ends in a terminal state. The total reward R after a time T can be computed as the sum of rewards received up to that point:

R_t = r_{t+1} + r_{t+2} + … + r_T

This reward is also referred to as the value V^π(s) of the state s using the strategy π. In our example, the maximum reward will be received once you reach the exit of the labyrinth. We will have a closer look at the value function in the next section.

3.2 Markov Decision Process and Value Function

To be able to evaluate different paths in the labyrinth, we need a suitable approach to compare interaction sequences. One method to formalize sequential decision-making is the Markov Decision Process (MDP). In the following, we will discuss how MDPs work.

The Markov Decision Process

MDPs are used to estimate the probability of a future event based on a sequence of possible events. If a present state holds all the relevant information about past actions, it is said to have the "Markov property".
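The Markov property can be illustrated with a small Markov chain. In the sketch below, the states and transition probabilities are invented for illustration; the point is that the successor state is sampled using only the current state, never the earlier history:

```python
import random

# Hypothetical two-state Markov chain: transition probabilities depend
# only on the current state, not on how we got there (Markov property).
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(state, rng):
    # Sample the successor from the current state's distribution only.
    states = list(transitions[state])
    weights = [transitions[state][s] for s in states]
    return rng.choices(states, weights=weights)[0]

rng = random.Random(0)
chain = ["sunny"]
for _ in range(5):
    chain.append(next_state(chain[-1], rng))  # only chain[-1] matters
```

However long the chain grows, `next_state` never inspects anything but the latest state, which is exactly what the Markov property demands.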
In reinforcement learning, the Markov property is critical because all decisions and values are functions of the present state (Sutton & Barto, 2018), i.e., decisions are made depending on the environment's state. When a task in reinforcement learning satisfies the Markov property, it can be modeled as an MDP. The process representing the sequence of events in an MDP is called a Markov chain. If the Markov property is satisfied, in every state of a Markov chain, the probability that another state is reached depends solely on two factors: the present state and the transition probability of reaching the next state.

MDPs consist of the following components:

– states S
– actions A
– rewards for an action at a certain state, r_a = R(s, a, s′)
– transition probabilities T_a(s, s′) for the actions to move from one state to the next state

Because of the Markov property, the transition function depends only on the current state:

P(s_{t+1} | s_t, a_t, s_{t−1}, a_{t−1}, …) = P(s_{t+1} | s_t, a_t) = T_{a_t}(s, s′)

The equation above states that the probability P of transitioning from state s_t to state s_{t+1}, given an action a_t, depends only on the current state s_t and action a_t, and not on any previous states or actions.

Which action is picked in a certain state is described by the policy π:

π(s, a) = P(a_t = a | s_t = s)

Using our labyrinth example, the position at which you stand offers no information about the sequence of states you took to get there. However, your position in the labyrinth represents all the required information for the decision about your next state, which means it has the Markov property.

The Value Function

In addition to the previously explained concepts, reinforcement learning algorithms use value functions. Value functions estimate how good it is for an agent to be in a given state and to perform a specific action in that state (Sutton & Barto, 2018).
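The MDP components listed in this section can be written down directly as data. The states, transition probabilities, rewards, and policy below are made up for illustration; the sketch samples one step (s_t, a_t) → (s_{t+1}, r_{t+1}):

```python
import random

# A toy MDP with invented numbers: T[s][a] maps successor states to
# probabilities, and R[(s, a, s')] is the reward for that transition.
T = {
    "start": {"go": {"goal": 0.9, "pit": 0.1}},
    "goal": {},   # terminal state: no actions
    "pit": {},    # terminal state: no actions
}
R = {("start", "go", "goal"): 1.0, ("start", "go", "pit"): -1.0}

def policy(state):
    # A trivial deterministic policy pi(s): pick the first available action.
    return next(iter(T[state]))

def step(state, action, rng):
    # Sample s_{t+1} from T_a(s, s') and look up the associated reward.
    succ = list(T[state][action])
    probs = [T[state][action][s] for s in succ]
    s_next = rng.choices(succ, weights=probs)[0]
    return s_next, R[(state, action, s_next)]

rng = random.Random(1)
a = policy("start")
s_next, r = step("start", a, rng)
```

Note that `step` uses only the current state and action, mirroring the transition equation above.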
Previously, we learned that the value of a state can be computed as the sum of rewards received within the learning process. Additionally, a discount rate can be used to evaluate the rewards of future actions at the present state. The discount rate indicates the likelihood of reaching a reward state in the future. This helps the agent select actions more precisely according to the expected reward. An action a_{t+1} will then be chosen to maximize the expected discounted return:

V^π(s) = E_π[ r_{t+1} + γ r_{t+2} + … + γ^{T−1} r_T | s_t = s ] = E_π[ Σ_{k=0}^∞ γ^k r_{t+k+1} | s_t = s ]

where γ is the discount rate, with 0 ≤ γ ≤ 1, denoting the certainty of the expected return. A value of γ closer to 1 indicates a higher likelihood of future rewards. Especially in scenarios where the length of time the process will take is not known in advance, it is important to set γ < 1, as otherwise the value function will not converge.

The following figure illustrates which action the agent should optimally perform in the respective states of the labyrinth to maximize the reward, i.e., trying to reach the exit and avoid the dangerous fields.

Figure 10: Transitions in the Labyrinth

Source: Created on behalf of IU (2022).

3.3 Temporal Difference and Q-Learning

So far, we have discussed model-based reinforcement learning. That means that an agent tries to understand the model of the environment. All decisions are based on a value function, which depends on the current state and the future state in which the agent will end up. In contrast to this, model-free approaches analyze the quality of an action to evaluate their actions. Q-learning is a very well-known model-free reinforcement learning algorithm and is based on the concept of temporal difference learning. In the following, we will explain the underlying concepts of temporal difference and Q-learning.
Temporal Difference Learning

As temporal difference (TD) learning is a model-free approach, no model of the learning environment is required. Instead, learning happens directly from experience in a system that is partially unknown. As the name indicates, TD learning makes predictions based on the fact that there is often a correlation between subsequent predictions.

The most prominent example to illustrate the principle of TD learning, by Sutton (1988), is about forecasting the weather. Let's say we want to predict the weather on a Monday. In a supervised learning approach, one would use the prediction of every day and compare it to the actual outcome. The model would be updated once it is Monday. In contrast to that, a TD approach compares the prediction of each day to the prediction of the following day, i.e., it considers the temporal difference between subsequent days and updates the prediction of one day based on the result of the previous day. Therefore, TD learning makes better use of the experience gathered over time.

Q-Learning

One well-known algorithm based on TD learning is Q-learning. After initialization, the agent conducts random actions which are then evaluated. Based on the outcome of an action, the agent adapts its behavior for subsequent actions. The goal of the Q-learning algorithm is to maximize the quality function Q(s, a), i.e., to maximize the cumulative reward while being in a given state s by predicting the best action a (van Otterlo & Wiering, 2012). During the learning process, Q(s, a) is iteratively updated using the Bellman equation:

Q(s, a) = r + γ max_{a′} Q(s′, a′)

Bellman equation: The Bellman equation computes the expected reward in an MDP of taking an action in a certain state. The reward is broken into the immediate reward and the total future expected reward.

All Q-values computed during the learning process are stored in the Q-matrix. In every iteration, the matrix is used to find the best possible action.
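The iterative update can be sketched with a small Q-table. The two-state problem below and all its numbers are invented for illustration, and the update uses a common variant of the Bellman rule with an added learning rate α, which moves the stored value gradually toward the Bellman target rather than overwriting it:

```python
# A minimal Q-learning sketch on a made-up two-state problem.
# Q is the Q-matrix: Q[state][action] -> estimated quality of that pair.
gamma, alpha = 0.9, 0.5   # discount rate and (assumed) learning rate

Q = {s: {a: 0.0 for a in ("left", "right")} for s in ("s0", "s1")}

def update(Q, s, a, r, s_next):
    # Bellman-style update with learning rate alpha:
    # move Q(s, a) toward r + gamma * max_a' Q(s', a').
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def best_action(Q, s):
    # Exploitation: look up the action with the maximum Q-value.
    return max(Q[s], key=Q[s].get)

# One observed transition: in s0, action "right" led to s1 with reward 1.
update(Q, "s0", "right", 1.0, "s1")
```

After this single update, `best_action(Q, "s0")` already prefers "right", because its Q-value has moved halfway toward the Bellman target of 1.0.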
When the agent has to perform a new action, it looks for the maximum Q-value of the state-action pair.

The Q-Learning Algorithm

In the following, we will outline the Q-learning algorithm. The algorithm consists of an initialization and an iteration phase. In the initialization phase, all values in the Q-table are set to 0. In the iteration phase, the agent performs the following steps:

1. Choose an action for the current state. In this phase, there are two different strategies that can be followed:
   – Exploration: perform random actions in order to find out more information about the environment
   – Exploitation: perform actions based on the information which is already known about the environment from the Q-table. The goal is to maximize the return
2. Perform the chosen action.
3. Evaluate the outcome and get the value of the reward. Based on the result, the Q-table is updated.

SUMMARY

Reinforcement learning deals with finding the best strategy for how an agent should behave in an environment to achieve a certain goal. The learning process of that agent happens based on a reward system which either rewards the agent for good decisions or punishes it for bad ones.

To model the process of the agent moving in the environment, Markov decision processes can be used. A value function can be applied to the system to better evaluate the quality of future decisions. The Q-learning algorithm is a model-free approach from temporal difference learning in which the agent gathers information about the environment based on exploration and exploitation. Overall, the reinforcement learning process is very similar to learning through trial and error in real life.

UNIT 4
NATURAL LANGUAGE PROCESSING

STUDY GOALS

On completion of this unit, you will be able to…

– explain the historical background of NLP.
– name the most important areas of application.
– distinguish between statistical- and rule-based NLP techniques.
– understand how to vectorize data.

4. NATURAL LANGUAGE PROCESSING

Introduction

Natural language processing (NLP) is one of the major application domains in artificial intelligence.

NLP can be divided into three subdomains: speech recognition, language understanding, and language generation. Each will be addressed in the following sections. After an introduction to NLP and its application areas, you will learn more about the basic NLP techniques and how data vectorization works.

4.1 Introduction to NLP and Application Areas

NLP is an interdisciplinary field with roots in computer science (especially the area of artificial intelligence), cognitive science, and linguistics. It deals with processing, understanding, and generating natural language (Kaddari et al., 2021). In human-computer interaction, NLP plays a key role when it comes to making the interaction more natural. Therefore, the goal of NLP is to use and interpret language on a level similar to that of humans. This does more than just help humans interact with the computer using natural language; there are many interesting use cases, ranging from automatic machine translation to generating text excerpts, or even complete literary works.

As mentioned above, there are three subdomains in NLP:

1. Speech recognition: identifies words in spoken language and includes speech-to-text processing
2. Natural language understanding: extracts the meaning of words and sentences as well as reading comprehension
3. Natural language generation: the ability to generate meaningful sentences and texts

All these subdomains build on methods from artificial intelligence and form the basis for the areas of application of NLP.

Historical Developments

Early NLP research dates back to the seventeenth century, when Descartes and Leibniz conducted some early theoretical research about NLP (Schwartz, 2019). It became a technical discipline in the mid-1950s.
The geopolitical tension between the former Soviet Union and the United States led to an increased demand for English-Russian translators. Therefore, attempts were made to outsource translation to machines. Even though the first results were promising, machine translation turned out to be much more complex than originally thought, especially as no significant progress could be seen. In 1964, the Automatic Language Processing Advisory Committee classified NLP technology as "hopeless" and decided to temporarily stop research funding in this area. This was seen as the start of the NLP winter.

Almost 20 years after the NLP winter began, NLP started to regain interest. This was due to the following three developments:

1. Increase of computing power: Computing power significantly increased, following Moore's law, allowing for more computationally intensive algorithms.
2. Shift of paradigms: Early language models were based on a grammatical approach that tried to implement complex rule-based systems to deal with the complexity of everyday language. More recent research had shifted towards models that are based on statistical and decision-theoretic foundations, such as decision trees.
3. Part-of-speech tagging (POS): For this technique, a text is split into smaller units, i.e., individual sentences, words, or sub-words. Using POS tagging, grammatical word functions and categories are added to a given text. This makes it possible to describe speech using Markov models. In contrast to approaches that consider the whole history, this is a major reduction of complexity.

Markov models: In a Markov model, the next state is defined based on the current state and a set of transition probabilities.

Taken together, the shift to statistical, decision-theoretic, and machine learning models increased the robustness of NLP, especially concerning the ability to deal with unknown constellations.
Moreover, the improved computing power allowed a much bigger amount of training data to be processed, which was now available because of the growing amount of electronic literature. This opened up big opportunities for the available algorithms to learn and improve.

NLP and the Turing Test

One of the early pioneers in AI was the mathematician and computer scientist Alan Mathison Turing. In his research, he formed the theoretical foundation of what became the Turing test (Turing, 1950). In the test, a human test person uses a chat to interview two chat partners: another human and a chatbot. Both try to convince the test person that they are human. If the test person cannot identify which of their conversational partners is human and which is the machine, the test is successfully passed. According to Turing, passing the test allows the assumption that the intellectual abilities of a computer are at the same level as the human brain.

The Turing test primarily addresses the natural language processing abilities of a machine. Therefore, it has often been criticized as being too focused on functionality and not on consciousness. One early attempt to pass the Turing test was made by Joseph Weizenbaum, who developed a computer program to simulate a conversation with a psychotherapist (Weizenbaum, 1966). His computer program ELIZA was one of the first conversational AIs. To process the sentence entered by the user, ELIZA utilizes rule-based pattern matching combined with a thesaurus. The publication received some remarkable feedback from the community. Nevertheless, the simplicity of this approach was soon recognized and, in line with the expectations of the community, ELIZA did not pass the Turing test.

In 2014, the chatbot "Eugene Goostman" was the first AI that seemed to have passed the Turing test. The chatbot pretended to be a 13-year-old boy from Ukraine who was not a native English speaker.
This trick was used to explain why the bot did not know everything and sometimes made mistakes with the language. However, this trick was also the reason why the validity of the experiment was later questioned (Masnick, 2014).

Application Areas of NLP

Now we will briefly describe the major application areas of NLP.

Topic identification

As the name indicates, topic identification deals with the challenge of automatically finding the topics of a given text (May et al., 2015). This can be done in either a supervised or an unsupervised way. In supervised topic identification, a model can, for instance, be trained on newspaper articles that have been labeled with topics, such as politics, sports, or culture. In an unsupervised setting, the topics are not known in advance. In this case, the algorithm has to deal with topic modeling or topic discovery to find clusters with similar topics.

Popular use cases for topic identification are, for instance, social media and brand monitoring, customer support, and market research. Topic identification can help find out what people think about a brand or a product. Social media provides a tremendous amount of text data that can be analyzed for these use cases. Customers can be grouped according to their interests, and reactions to certain advertisements or marketing campaigns can be easily analyzed. When it comes to market research, topic identification can help when analyzing open answers in questionnaires. If those answers are pre-classified, the effort to analyze open answers can be reduced.

Moreover, in customer support, topic identification can be beneficial by categorizing customers' requests by topic. Automatically forwarding requests to specialized employees can not only reduce costs, but also increase customer satisfaction.

Text summarization

Text summarization deals with methods to automatically generate summaries of a given text that contain the most relevant information from the source.
Algorithms for text summarization are based on extractive and abstractive techniques. Extractive algorithms produce a summary of a given text by extracting the most important word sequences. Abstractive techniques, conversely, generate summaries by creating a new text that rewrites the content of the original document.

A common text summarization technique that works in an unsupervised, extractive way is TextRank (Mihalcea & Tarau, 2004). This algorithm compares every sentence of a given text with all other sentences by computing a similarity score for every pair of sentences. A score closer to one indicates a higher similarity between two sentences, i.e., a sentence similar to many others represents the content well. For each sentence, the scores are summed up to obtain a sentence rank. After sorting the sentences according to their rank, it is easy to evaluate the importance of each one and create a summary from a predefined number of sentences with the highest ranks.

There are two major challenges when dealing with supervised extractive text summarization, as training requires a lot of hand-annotated text data:

1. It is necessary that the annotations contain the words that have to be in the summary. When humans summarize texts, they tend to do this in an abstract way. Therefore, it is hard to find training data in the required format.
2. The decision about what information should be included in the summary is subjective and depends on the focus of a task. While a product description would focus more on the technical aspects of a text, a summary of the business value of a product will put the emphasis on completely different aspects.

A typical use case for text summarization is presenting a user a preview of the content of search results or articles. This makes it easier to quickly analyze a huge amount of information.
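The ranking idea behind TextRank can be sketched in a strongly simplified form. The sketch below keeps only the core ingredient described above — pairwise similarity scores summed per sentence — and uses plain word overlap (Jaccard similarity) as a stand-in for TextRank's actual similarity measure and graph-based ranking; the example sentences are invented:

```python
# Simplified extractive ranking in the spirit of TextRank:
# score each sentence by its summed similarity to all other sentences.
def similarity(s1, s2):
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)  # Jaccard word overlap, in [0, 1]

def rank_sentences(sentences):
    ranks = []
    for i, s in enumerate(sentences):
        score = sum(similarity(s, t) for j, t in enumerate(sentences) if i != j)
        ranks.append((score, s))
    return sorted(ranks, reverse=True)  # highest-ranked sentence first

sentences = [
    "The new phone has a great camera",
    "The camera of the phone is great",
    "Bananas are yellow",
]
summary = rank_sentences(sentences)[0][1]  # take the top-ranked sentence
```

Sentences that share vocabulary with many others rank highest, so the off-topic banana sentence ends up last and is excluded from a one-sentence summary.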
Moreover, in question answering, text summarization techniques can be used to help a user find answers to certain questions in a document.

Sentiment analysis

Sentiment analysis captures subjective aspects of texts (Nasukawa & Yi, 2003), such as the author's mood in a tweet on Twitter. Like topic identification, sentiment analysis deals with text classification. The major difference between the two is that topic identification focuses on objective aspects of the text, while sentiment analysis centers on subjective characteristics like moods and emotions.

The application areas for sentiment analysis are manifold. Customer sentiment analysis has gained much traction as a research field lately. The ability to track customers' sentiments over time can, for instance, give important insights into how customers react to changes in a product or service, or how external factors like global crises influence customers' perceptions. Social networks, such as Facebook, Instagram, and Twitter, provide huge amounts of data about how customers feel about a product. Having a better understanding of customers' needs can help modify and optimize business processes accordingly.

Detecting emotions from user-generated content comes with some big challenges when dealing with irony/sarcasm, negation, and multipolarity.

There is much sarcasm in user-generated content, especially in social media. Even for humans, it can sometimes be hard to detect sarcasm, which makes it even more difficult for a machine. Let us, for instance, look at the sentence "Wow, your phone has an internal storage of 1 Gigabyte?" Only a few years back, this would have been a straightforward question. Now, if said about a modern smartphone, it is easy for a human to tell that this statement is sarcastic. While there has been some recent success in sarcasm detection using methods from deep learning (Ghosh & Veale, 2016), dealing with sarcasm remains a challenging task.
Negation is another challenge when trying to detect a statement's sentiment. Negation can be explicit or implicit, and it can also come with the morphology of a word, denoted by prefixes, such as "dis-" and "non-," or suffixes, such as "-less". Double negation is another language construct that can easily be misunderstood. While most of the time double negatives cancel each other out, in some contexts they can also intensify the negation. Considering negation in the model used for sentiment analysis can help to significantly increase the accuracy (Sharif et al., 2016).

An additional challenge in sentiment analysis can be multipolarity, meaning that some parts of the text can be positive while others are negative. Given the sentence "The display of my new phone is awesome, but the audio quality is really poor", the sentiment for the display is positive while it is negative for the speakers. Simply calculating the average of the sentiment might lead to information loss. Therefore, a better approach to tackle this issue would be to split the sentence into two parts: one for the positive review of the display and one for the negative feedback about the speakers.

Named entity recognition

Named entity recognition (NER) deals with the challenge of locating and classifying named entities in unstructured text. Those entities can then be assigned to categories such as names, locations, time and date expressions, organizations, quantities, and many more. NER plays an important role in understanding the content of a text. Especially for text analysis and data organization, NER is a good starting point for further analysis. The following figure shows an example of how entities can be identified in a sentence.

Figure 11: Example for Named Entity Recognition

Source: Created on behalf of IU (2022).

NER can be used in all domains where categorizing text can be advantageous. For instance, tickets in customer support can be categorized according to their topics.
Tickets can then automatically be forwarded to a specialist. Also, if data has to be anonymized due to privacy regulations, NER can help to save costs: it can identify personal data and automatically remove it. Depending on the quality of the underlying data, manual cleanup is then no longer necessary. Another use case is extracting information from candidate resumes in the application process. This can significantly decrease the workload of the HR department, especially when there are many applicants (Zimmermann et al., 2016).

The biggest challenge in NER is that training a model requires a large amount of annotated data. The model will later always focus on the specific tasks/the specific subset of entities on which it has been trained.

Translation

Machine translation (MT) is a subfield of NLP that combines several disciplines. Using methods from artificial intelligence, computer science, information theory, and statistics, in MT text or sp
