Fundamentals of Artificial Intelligence - Lecture Notes PDF
Indian Institute of Technology Guwahati
Shyamanta M Hazarika
Summary
This document is a lecture on artificial intelligence. It introduces the field of artificial intelligence, and discusses its history, philosophy, and different dimensions. The document covers topics like problem-solving, knowledge representation, and machine learning. It also describes the work of important figures like Aristotle, Leonardo da Vinci, Thomas Hobbes, and others, who were influential in the development of artificial intelligence.
INDEX

Week 1: 1. Introduction to Artificial Intelligence; 2. Problem Solving as State Space Search; 3. Uninformed Search
Week 2: 4. Heuristic Search; 5. Informed Search; 6. Constraint Satisfaction Problems
Week 3: 7. Searching AND/OR Graphs; 8. Game Playing; 9. Minimax + Alpha-Beta
Week 4: 10. Introduction to Knowledge Representation; 11. Propositional Logic
Week 5: 12. First Order Logic - I; 13. First Order Logic - II; 14. Inference in First Order Logic - I
Week 6: 15. Inference in FOL - II; 16. Answer Extraction; 17. Procedural Control of Reasoning
Week 7: 18. Reasoning under Uncertainty; 19. Bayesian Network; 20. Decision Network
Week 8: 21. Introduction to Planning; 22. Plan Space Planning; 23. Planning Graph and GraphPlan
Week 9: 24. Practical Planning and Acting; 25. Sequential Decision Problems; 26. Making Complex Decisions
Week 10: 27. Introduction to Machine Learning; 28. Learning Decision Trees
Week 11: 29. Linear Regression; 30. Support Vector Machines; 31. Unsupervised Learning
Week 12: 32. Reinforcement Learning; 33. Learning in Neural Networks; 34. Deep Learning: A Brief Overview

Fundamentals of Artificial Intelligence
Prof. Shyamanta M Hazarika
Department of Mechanical Engineering
Indian Institute of Technology - Guwahati

Module - 1, Lecture - 1: Introduction to AI

Welcome to the course Fundamentals of Artificial Intelligence. This is the first lecture, An Introduction to AI. In this lecture, we will look at the quest for AI: the thoughts, ideas and work that have influenced the growth of artificial intelligence. Thereafter, we will look at the definition of AI and the different dimensions of artificial intelligence. (Refer Slide Time: 01:02) We will look at what is involved in an AI system and try to understand the difference between weak and strong AI. We will also very quickly go through different developments in the history of AI.
The quest for artificial intelligence is a story of dreams, and it began with dreams, as all quests do. (Refer Slide Time: 01:33)

People have long imagined machines or devices that could reason and have abilities like humans; automatons that could move. In fact, human-like automatons are described in many stories and are pictured in sculptures, paintings and drawings. Perhaps the greatest of these thoughts was described in `The Politics' by Aristotle. (Refer Slide Time: 02:08)

He thought of tools that could perform tasks at our bidding, or of their own accord; shuttles in a loom that could fly to and fro. In effect, he had anticipated the idea of having automated machines. (Refer Slide Time: 02:26)

It was Leonardo da Vinci who sketched designs for a humanoid robot in the form of a medieval knight, around 1495. It is very difficult to be sure, and no one knows whether Leonardo da Vinci or his contemporaries tried to build his design. Leonardo's knight, nevertheless, was supposed to be able to sit up, move its arms and head, and open its jaw. It could do, in mechanical form, everything that a human did. (Refer Slide Time: 03:02)

In fact, in 1651, Thomas Hobbes published his book Leviathan. In that book, Hobbes seems to say that it might be possible to build an artificial animal. Because of that, George Dyson refers to Hobbes as the patriarch of artificial intelligence. (Refer Slide Time: 03:21)

In Leviathan, Hobbes writes that life is but a motion of limbs. Why may we not say that all automatons have an artificial life? And he continues: what is the heart, but a spring; and the nerves, but so many strings; and the joints, but so many wheels, giving motion to the whole body. It seems Thomas Hobbes, in Leviathan, actually described the possibility of artificial life. (Refer Slide Time: 03:55)

Several people constructed actual automata that moved in startlingly lifelike ways.
Perhaps the most sophisticated of these was a mechanical duck, which could quack, flap its wings, paddle, drink water, eat and digest grain. (Refer Slide Time: 04:13)

This is what the duck looked like. In 1738, the French inventor and engineer Jacques de Vaucanson displayed the duck, his masterpiece. (Refer Slide Time: 04:26)

Very shortly after that, in 1801, the French silk weaver and inventor Joseph Marie Jacquard invented the automated loom. Starting from Aristotle's idea in The Politics, we now had a real loom controlled by punch cards. It revolutionized mass production and was in great use across Europe. (Refer Slide Time: 04:52)

L. Frank Baum in 1900 gave the literary world one of its most beloved robots in The Wonderful Wizard of Oz: a mechanical man in search of a heart. (Refer Slide Time: 05:07)

Not only in fiction, but in real life, these things were soon supposed to become reality. Some 17 years after Baum, Joseph Capek wrote the short story Opilec, describing automatons. And Joseph's brother Karel introduced the term robot in the play Rossum's Universal Robots. This is when we started having humanoids with brains, albeit in fiction. (Refer Slide Time: 05:41)

Rossum's Universal Robots centres on a mad scientist who tries to take on the powers of God, for man has acquired the technology and intelligence to create life. (Refer Slide Time: 05:54)

Not in fiction but in reality, two important tortoise-like robots, Elmer and Elsie, revolutionized the growth of AI. In 1948, Dr. W. Grey Walter was interested in whether robots could model brain functions. He built two small robots, which he called tortoises, and named them Elmer and Elsie. Elmer and Elsie were a marvel of the day. (Refer Slide Time: 06:31)

They were the most revolutionary things because they did not have any pre-programming.
And the most interesting thing was that, in spite of having only basic analog circuits, they could go and recharge their own batteries when they sensed that the batteries were running down. Perhaps this was a turning point in the history of the quest for artificial intelligence. What, then, is artificial intelligence? (Refer Slide Time: 06:58)

We say artificial intelligence is demonstrated when a task that is performed by a human, and thought of as requiring the ability to learn, to reason and to solve problems, can be done by a machine. This is itself a deep philosophical question. (Refer Slide Time: 07:16)

Attempts to systematically answer it fall within the foundations of artificial intelligence as a rich topic for analysis and debate. That is not the aim of this lecture. So we will, for the time being, have a provisional answer. A provisional answer could be the following. (Refer Slide Time: 07:38)

Artificial intelligence is the field devoted to building artifacts capable of displaying, in controlled, well-understood environments and over sustained periods of time, behaviors that could be considered to be intelligent; or, more generally, behaviors that are at the heart of what it is to have a mind. Two things are very important in this working definition of AI that we go forward with. We need to look at behaviors that we consider to be intelligent; there may not be anything of a human brain there. But we also look at behaviors for which we believe that, at the heart of it, we need to have a mind or a brain. This is itself a deep philosophical question. (Refer Slide Time: 08:41)

This gives rise to further questions. 1. What exactly constitutes intelligent behavior? 2. What is it to have a mind? 3. How do humans actually manage to behave intelligently? Let us look at each of these questions one by one. (Refer Slide Time: 09:03)

How do humans actually manage to behave intelligently? This question is empirical. It is predominantly for psychology and cognitive science to answer.
However, this is a very pertinent question for AI. This is because any insight into how the human thought process works may help build machines that work similarly. In fact, it is very difficult to separate out the growth of cognitive science and the growth of AI. Each has lent help to the other, and their histories are intertwined. (Refer Slide Time: 09:42)

The next question, on what it is to have a mind, is the question of what is the mark of the mental; it is a philosophical question. The thrust of artificial intelligence has lent significant urgency to it. In fact, very careful philosophical contemplation of that question has influenced the course of artificial intelligence itself. The third question, what exactly constitutes intelligent behavior, is: (Refer Slide Time: 10:12)

the question of specifying precisely what it is to count as intelligent. This has traditionally been met by proposing particular behavioral tests whose successful passing would signify the presence of intelligence. (Refer Slide Time: 10:31)

Let us now look at the dimensions of artificial intelligence vis-a-vis these three questions. We could have four different combinations: whether something is thinking or acting, and whether it exhibits rational behavior or human behavior. An artificial system will fall into one of these four ways of looking at things. (Refer Slide Time: 11:02)

And therefore, we could have systems which would be thinking humanly, thinking rationally, acting rationally or acting humanly. (Refer Slide Time: 11:12)

Think like a human: by this we mean we need to model human cognition. Think rationally would involve formalizing the inference process. To act rationally would mean always doing the right thing. And the fourth, act like a human, would involve exhibiting human behavior. (Refer Slide Time: 11:35)

Let us look at each of these dimensions one after another.
The first one, thinking like a human, involves modeling human cognition. It is an area that has been looked at very closely by the information processing community within psychology, and it got a huge push from the cognitive revolution. It requires scientific theories of the internal activities of the brain. In fact, Newell and Simon's General Problem Solver is an example of this kind of intelligence. (Refer Slide Time: 12:18)

The General Problem Solver, developed in 1957 by Allen Newell and Herbert Simon, embodied a grand vision: a single computer program that could solve any problem, given a suitable description of the problem. It did cause quite a stir when it was introduced, and many people felt it would usher in a grand new era of intelligent machines. It is a different story that it did not make its contribution the way it was hoped to. (Refer Slide Time: 12:55)

The next dimension of artificial intelligence is about thinking rationally, which involves formalizing the inference process. In fact, several Greek schools developed various forms of logic: notation and rules for the derivation of thoughts. They may or may not have proceeded to the idea of mechanization of these processes. But the very idea of formalizing the inference process has a direct line through mathematics and philosophy to modern AI. Here, Aristotle has a huge contribution. (Refer Slide Time: 13:34)

Aristotle considered rationality to be an essential characteristic of the human mind. Perhaps the deepest contribution of Aristotle to AI was the idea of formalism, a notion that remains at the heart of the contemporary computational theory of mind and of what we will later see to be strong AI. The idea of an intelligent machine was often thought of as a machine that can perform logical inference. (Refer Slide Time: 14:07)

Aristotle was one of the first to attempt to codify thinking.
His syllogisms provided patterns of argument structure that always give the correct conclusion, given correct premises. Let us take an example and see what this means. The example says that all computers use energy. We could then say that using energy always generates heat. And from these two sentences, we can draw the conclusion that all computers generate heat. Aristotle told us how to do this through a pattern of argument structure. (Refer Slide Time: 14:51)

But there are many obstacles to the logicist approach of building programs to create intelligence. They are: Number 1, not all intelligent behavior is mediated by logical deliberation. Number 2, if everything is logical deduction, the question is: what is the purpose of thinking? What thoughts should I have? Number 3, informal knowledge is not precise, and therefore, under a logical premise, it is difficult to model uncertainty. Number 4, the theory and practice of putting everything within a logical system are hard to bring together for real-life problems. Let us now focus attention on the third dimension of artificial intelligence, which is act rationally. (Refer Slide Time: 15:48)

That is, doing the right thing. Rational behavior, or doing the right thing, is where one is expected to act so as to maximize goal achievement, given the available information. Now, many of our rational acts do not necessarily involve thinking. Consider the basic reflex actions that we perform: if your glass is slipping out of your hand, you just re-grasp it with more pressure. We do it without thinking. So rational acts do not necessarily involve thinking. (Refer Slide Time: 16:24)

This approach is definitely more general than the logical approach, and more amenable to scientific development than approaches based on human behavior or human thought. However, achieving perfect rationality in complex environments is not possible, because of the high computational demands that such systems make.
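The syllogism pattern discussed earlier (all computers use energy; using energy always generates heat; therefore all computers generate heat) can be sketched in a few lines of code. This is a minimal illustration of the "all A are B, all B are C, hence all A are C" pattern; the tuple encoding of a premise and the function name are assumptions of this sketch, not anything from the lecture slides.

```python
def syllogism(all_a_are_b, all_b_are_c):
    """Chain two universal premises: from ('A', 'B') meaning
    'All A are B' and ('B', 'C') meaning 'All B are C',
    conclude ('A', 'C'), i.e. 'All A are C'."""
    a, b1 = all_a_are_b
    b2, c = all_b_are_c
    if b1 == b2:          # the middle term must match
        return (a, c)     # valid conclusion
    return None           # premises do not chain

premise1 = ("computers", "things that use energy")
premise2 = ("things that use energy", "things that generate heat")
print(syllogism(premise1, premise2))
# -> ('computers', 'things that generate heat')
```

The point of the pattern is exactly what the lecture stresses: the conclusion follows from the *form* of the argument, regardless of what the terms mean.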
(Refer Slide Time: 16:50)

We now focus attention on our fourth dimension, which is act like a human; that is, exhibiting human behavior: creating machines that perform functions that require intelligence when the same functions are performed by people. (Refer Slide Time: 17:08)

There are a number of capabilities to be incorporated in systems that need to act like a human. They include natural language processing, knowledge representation, automated reasoning, machine learning, computer vision and, of course, robotics. (Refer Slide Time: 17:27)

It was Alan Turing who published his landmark paper in which he speculated about the possibility of creating machines with true intelligence. He noted that intelligence is difficult to define, and therefore he devised the famous Turing Test. In fact, the Turing Test was the first serious proposal in the philosophy of artificial intelligence. (Refer Slide Time: 18:00)

In his paper Computing Machinery and Intelligence, Alan Turing starts with the words: I propose to consider the question, can machines think? And he laid the ground for what later came to be known as artificial intelligence. (Refer Slide Time: 18:14)

The Turing Test, or the Imitation Game as it was called in the paper, was put forth as a simple test that could be used to show that machines could think. The Turing Test involves the following. There is a human interrogator talking to a human or to a machine, unaware of whom he gets responses from. If the interrogator is not able to distinguish the responses received from the human and from the system, the system is said to be artificially intelligent. Turing has many other contributions to the history of AI apart from this. (Refer Slide Time: 18:58)

Turing and his fellow code-breakers were pitted against the Nazi code machine Enigma, and Turing put up electromechanical machines to read thousands of German radio intercepts.
In fact, these machines employed something which we now have as a central idea of AI, called heuristic search. They found the right answer often enough, and fast enough, for the intercepts to be read in real time. In fact, without such machines, German U-boats would have decimated the North Atlantic convoys. (Refer Slide Time: 19:34)

Having looked at the dimensions and the general growth of AI, we are now able to talk of the early days of AI. In late 1955, Allen Newell and Herbert Simon developed the Logic Theorist. This can be considered the first AI program. The program, representing each problem as a tree model, attempted to solve it by selecting the branch that would most likely result in the correct conclusion. (Refer Slide Time: 20:10)

However, it was in 1956 that John McCarthy organized the Dartmouth Summer Research Project on artificial intelligence. In fact, from that point on, the field came to be known as artificial intelligence. The Dartmouth conference served to lay the groundwork for the future of AI research. (Refer Slide Time: 20:36)

The term artificial intelligence was first used in their proposal document. They say it, and I read it here for you: We propose that a two month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. - McCarthy, Minsky, Rochester and Shannon; August 31, 1955. In fact, John McCarthy is one of the founding fathers of artificial intelligence, together with Marvin Minsky, Allen Newell and Herbert Simon. (Refer Slide Time: 21:34)

We have now looked at the dimensions of AI and have realized that we could be either thinking humanly, thinking rationally, acting rationally or acting humanly. This leads us to two distinct concepts of AI: weak versus strong AI.
Weak AI is aimed at building machines that act intelligently, without taking a position on whether or not the machines actually are intelligent. Strong AI, in contrast, is the field devoted to building persons. As Charniak and McDermott concede in their classic introduction to AI, we are very far from achieving strong AI; however, the ultimate goal of AI, which we are very far from achieving, is to build a person or, more humbly, an animal. (Refer Slide Time: 22:33)

Charniak and McDermott do not say that the ultimate goal is to build something that appears to be a person or acts like a person. Their brand of AI is so-called strong AI, an ambitious form of the field aptly summed up by Haugeland. (Refer Slide Time: 22:54)

The fundamental goal, according to Haugeland, is not merely to mimic intelligence or produce some clever fake; not at all. AI wants only the genuine article: machines with minds, in the full and literal sense. This is not science fiction but real science, based on a theoretical conception as deep as it is daring: namely, we are, at root, computers ourselves. (Refer Slide Time: 23:29)

And this is where Nilsson's definition of AI is interesting to note. The goal of work in AI, according to Nilsson, is to build machines that perform tasks normally requiring human intelligence. Whether the machine copies the way human intelligence works is not clearly specified; we would simply love to have it perform tasks that require intelligence. So, what is involved if an AI system has to do work that requires intelligence when a human being does the same work? (Refer Slide Time: 24:13)

We would need interaction with the real world; that is, the system should be able to perceive, understand and act. This involves things such as speech recognition and image understanding. The next thing would be reasoning and planning: modeling the external world, planning and decision making, and dealing with unexpected problems and uncertainties.
Thereafter, any intelligent system should be able to learn and adapt; that is, it should have internal models which are always being updated. For example, a baby learning to categorize and recognize animals. Such work would therefore involve a huge number of disciplines contributing and interacting with each other. One of the disciplines that has made a large contribution to the growth of AI is philosophy, (Refer Slide Time: 25:15)

which has given us the ideas of logic, methods of reasoning, the mind as a physical system, and the foundations of learning, language and rationality. The next area that has contributed hugely to the growth of AI is mathematics, involving formal representation and proof theory; algorithms; and computational decidability and tractability. Another area that has contributed to the growth of AI is statistics and probability, which involves modeling uncertainty and learning from data. Of course, economics has had a huge impact on AI, through theories such as utility theory and decision theory. In this course, we will definitely look at some formal representation; we will look at modeling uncertainty and learning; and we will of course look at decision theory. (Refer Slide Time: 26:21)

Another area that has influenced AI in a huge way is the growth of neuroscience, with neurons seen as information processing units. As we will see later, developments in this area have in fact moved the course of AI in a huge way. Psychology and cognitive science, as I told you before, have made a huge impact in the area of AI. We look at how people behave, perceive, process cognitive information and represent knowledge. Our understanding of these processes allows us to develop artificial intelligence systems that can do the same things. In fact, the growth of cognitive science has also been pushed by the growth of AI, particularly machine learning.
The next area of study that has made AI possible and pushed its frontiers is computer engineering: having fast computers has made a large impact. Then, control theory, which talks of designing systems that maximize an objective function over time, has pushed the growth of AI. One area which has made a huge impact on the way artificial intelligence systems have developed over the years is linguistics, particularly knowledge representation and grammars. In this course, we will look at knowledge representation; we will look at neurons as information processing units; and we will definitely cover representation of knowledge in cognitive science and psychology. (Refer Slide Time: 28:18)

Let us now look at a brief history of AI and try to understand how the area has evolved over the years. As I told you in the previous slide, the earliest beginning of AI was in 1943, with McCulloch and Pitts' Boolean circuit model of the brain, which is a neural model. In the 1950s, we had the initial promise, with, of course, Turing's interesting and seminal paper Computing Machinery and Intelligence. We have Samuel's checkers program, which we will talk of in our module on machine learning; it was the first machine learning program. In fact, Samuel's checkers program, running on an IBM machine and playing checkers against a human, could defeat its opponent and was very popular at that point of time. We had very early AI programs like Newell and Simon's Logic Theorist. These were the first promising enterprises in the area of AI. The enthusiasm continued for the decade 1955 to 1965. (Refer Slide Time: 29:40)

With the Dartmouth meeting came the term artificial intelligence. Very shortly thereafter, Newell and Simon came up with the General Problem Solver, which we have discussed in one of the previous slides. Thereafter, there was the Geometry Theorem Prover, and John McCarthy came up with the programming language called LISP.
However, very soon reality dawned. There was a realization that many AI problems are intractable. People soon realized the limitations of existing neural network methods; in fact, people abandoned neural networks as something that would not make any great impact. (Refer Slide Time: 30:34)

However, between 1969 and 1985, people started to realize that domain knowledge is something that could tilt the balance. The development of knowledge-based systems was growing, and there were a lot of success stories: the success of rule-based expert systems such as DENDRAL and MYCIN. But these were too brittle and definitely did not scale well in practice. However, starting with 1986, we had the rise and rise of machine learning. Neural networks returned to popularity, albeit with a lot of modifications, which we will look at in our module on machine learning. There were mature advances in machine learning, both in terms of algorithms and applications. (Refer Slide Time: 31:36)

Beginning in 1990, people started to talk of uncertainty: the role of uncertainty in machine learning and in AI in general. They looked at Bayesian networks for knowledge representation. 1995 is the turning point when artificial intelligence started to be accepted as science. During this period, there was integration of learning, reasoning and knowledge representation. AI methods started to be used in vision, language, data mining and a lot of other real-life problems. If we now look at the history of AI, let us focus on this slide. (Refer Slide Time: 32:15)

Up till 1980-85, we would see that AI research was occupied with its initial euphoric concepts. Beyond 1985, machine learning roughly took over the area of AI. In this period, AI was no longer concerned with the general AI problem, and it can be categorized as narrow AI: AI with very focused attention, on problems which are really narrowly scoped and do not address a general AI concept.
It is expected that as we make more progress in machine learning, we would be able to apply intelligence to any problem, approach near human-level intelligence, and go back to our general AI problem. But it is very uncertain when that would happen. (Refer Slide Time: 33:11)

Here is a very important book that you need to pursue for this first module of the course: Artificial Intelligence: A Philosophical Introduction, by Jack Copeland. The book reviews the progress made in AI since the inception of the field in 1956. Copeland goes on to analyze what those working in AI must achieve before they can claim to have built a thinking machine. (Refer Slide Time: 33:47)

The next book is The Quest for Artificial Intelligence by Nils J. Nilsson. The most important thing in this book is the end-of-chapter notes, which have citations to important materials; these are of great use to AI scholars and researchers. In fact, the book traces the history of a field that has captivated the imagination of scientists, philosophers and writers for centuries. A third book that I would love to refer to for this module of the course is: (Refer Slide Time: 34:21)

Machine Learning: The New AI, a concise overview of machine learning - computer programs that learn from data - in order to understand what lies in the new challenges for artificial intelligence. So, this was all about this lecture of the first module of the course, Introduction to AI. Thank you very much.

Fundamentals of Artificial Intelligence
Prof. Shyamanta M Hazarika
Department of Mechanical Engineering
Indian Institute of Technology - Guwahati

Module - 1, Lecture - 2: Problem Solving by Search

Welcome back to the course on Fundamentals of Artificial Intelligence. We will do the second module from today. The module is called Problem Solving by Search. (Refer Slide Time: 00:49) First, we will look at what it is that makes AI programming different from general computational paradigms.
We will look at what is commonly called an AI technique, and then introduce production systems: a system of programming which captures the essence of AI systems. We will focus on the different components of a production system and then look at some specialized production systems. This is what the overview of today's lecture looks like. (Refer Slide Time: 01:35)

We will first talk on: What is an AI technique? How is AI different from general traditional programming? What makes AI programming different? At the end of today's lecture, you should be able to identify what production systems are and what the components of a production system are. We will introduce a basic procedure and look at different control strategies. A production system is what separates the facts, the way reasoning happens, and the control strategy into three distinct units. We shall look at each of them very closely and then focus our attention on how control and control strategies are designed. We will look at what are called irrevocable control strategies, backtracking and graph search. A very interesting way of looking at things as a production system involves not only looking at these three units, but also being able to do what is called problem reduction. We will look at a very specialized production system called the decomposable production system, which allows us to reduce the problem. And we will also look at something called the commutative production system. But first, let us focus our attention on what an AI technique is. (Refer Slide Time: 03:22)

Artificial intelligence, as you may remember from the first lecture of module 1, involves problems that span a very broad spectrum. There are problems like natural language processing and navigation for a robot in a cluttered environment. It could be autonomous driving or automated reasoning. They appear to have very little in common, except that they are very hard.
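The separation described in the overview above - facts, rules and control strategy as three distinct units - can be previewed with a minimal sketch. The facts and rule names here (a robot noticing a low battery, echoing Grey Walter's tortoises) are illustrative inventions, not the lecture's own example, and the simple first-match control loop is just one possible strategy of the irrevocable kind mentioned above.

```python
# Facts: the global database (working memory) - illustrative.
facts = {"battery_low"}

# Rules: each pairs a condition on the facts with a fact to add.
rules = [
    (lambda f: "battery_low" in f, "seek_charger"),
    (lambda f: "seek_charger" in f, "recharging"),
]

def run(facts, rules):
    """Control strategy: repeatedly fire the first applicable rule
    whose conclusion is not yet in working memory. Irrevocable:
    a fired rule is never reconsidered or undone."""
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition(facts) and conclusion not in facts:
                facts.add(conclusion)
                changed = True
                break
    return facts

print(run(set(facts), rules))
# prints the three facts, e.g. {'battery_low', 'seek_charger', 'recharging'}
```

Because the three units are independent, adding a new rule or a new fact changes behavior without restructuring the program - exactly the property claimed for AI programming later in this lecture.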
The focus, when we ask the question "what is an AI technique", is to look for an answer to whether there are any techniques that are appropriate for the solution of a variety of these problems. Is there something common to all these problems? (Refer Slide Time: 04:29)

One of the few hard and fast results to come out of the first three decades of AI research is the fact that any such intelligent program that one needs to write requires knowledge. This is a very interesting idea: any program that is to give rise to intelligence requires knowledge. (Refer Slide Time: 04:57)

An AI technique, therefore, is a method that exploits knowledge, which should be represented in such a way that it captures generalization. Here it is very important for us to note what we mean by generalization. Whenever we are talking of a technique that exploits knowledge, we are more interested in capturing general information about the problem, rather than capturing very specific things about the problem. The next important thing about an AI technique is that the knowledge it exploits should be understood by the people who provide it. Another important characteristic is that the knowledge provided needs to be easily modifiable. When I say knowledge needs to be easily modified, what I mean is that the knowledge needs to be worked on by some system to create new knowledge, and this new knowledge, created out of existing knowledge, should be obtainable by very simple operations. This, I think, is what drives an AI technique. An AI technique therefore uses these three properties to help overcome the sheer bulk of knowledge that is required. (Refer Slide Time: 06:33)

What is an AI technique? In order to answer this question, we need to understand that the aim here is to build a system, definitely for a particular problem. But the solution that we expect is as generic as possible.
So, we are looking for a generic solution to, definitely, a particular problem. And when we want this, we need to do four things. First, we need to define the problem precisely. Problem definition is as important as what knowledge we capture for that problem. Second, we need a careful analysis of the problem. One needs to understand what this problem that we are trying to tackle is about. Third, we need to isolate and represent the task knowledge. What task knowledge is, and how we isolate and represent it, is something that we will look at in the next few examples. The fourth interesting thing is that, given these three important steps - we define the problem precisely, we analyze the problem, and we represent the task knowledge - the onus is now to choose the best problem-solving technique. (Refer Slide Time: 08:08)

It is therefore important for us to realize the basic difference between traditional programming and programming with artificial intelligence, that is, programming that tries to create AI systems. Traditional programming is when we program and look for an answer only to the specific question it is meant to solve. Let us take an example: I am interested in finding the roots of a quadratic equation. My focus is only on getting the roots of that equation. Whereas, if I am doing programming with AI, the idea is to get answers to generic questions, rather than only the specific questions that I am asking. Here, one could take the example of playing a game of checkers or a game of chess, or, more interestingly, what we will exploit: the game we call the 8-puzzle, where I have 9 tile positions and one of the tile positions is free, so that I can move tiles around. And given a start configuration of the tiles, I want to reach a goal configuration of the tiles.
And when I am looking for a solution to this, I am not interested in a specific solution from one particular start state to one particular goal state. I am interested in a way of arriving at any given goal state from any given start state. The second important difference is that in a traditional programming paradigm, modification of the program leads to a change in its structure. Whereas, in the AI programming paradigm that we will look at today, modifications to the program do not change its structure. This is because (as we will see with the paradigm of programming called production systems, which captures the essence of AI systems) the program is made up of independent pieces of information. The rules, the facts, the reasoning and the control are completely distinct elements. Therefore, modifications to one of them do not change the others and do not change the structure of the program. The third interesting difference flows from the second. Modification in traditional programming may adversely affect the program. Whereas here, when I am talking of a production system, or what I loosely refer to as programming with AI, modifications are quick and easy. (Refer Slide Time: 11:31) So, we are now in a position to state what we started our lecture with: we want to do problem solving as search. But before that, we need to understand what we mean by problems as state space search. A problem-solving task can be formulated as search in a state space. A state space consists of all states of the domain and a set of operators that change one state into another. The states are best thought of as nodes in a connected graph and the operators as edges. Problem solving can then be formulated as a search in this state space. Let us look at an example to understand what we mean by this.
So, let us say we have taken up the problem of solving the 8-puzzle game. The 8-puzzle has a 3 x 3 matrix where the numbers 1 through 8 are written on tiles, with one position unoccupied, so that you can move a tile into it. So basically, in the 8-puzzle I can have such configurations, and such configurations can change from one to another by a set of operators, which act through movement of the empty tile. For instance, I could move the empty position to the right, so 5 comes into its place, and I have 1, 2, 3, 5, 4, 7, 6, 8. So, from one state to the next I can move by a simple operator, something like "take the empty tile to the right". When I do this, I generate a number of newer states, and all of these states together create what is called the state space. So, we have the states of the domain and the operators. We could have four operators: the blank goes to the right; the blank is pushed to the left; the blank is pushed up; or the blank is pushed down. These together form what is called the set of operators. Once the states of the domain and the set of operators are defined, I can think of moving from one state to another: given a state, there is some way to move to new states, and from those to further new states. And when I am looking for a solution to this problem, given a start state s and some goal state g, the solution is about finding a path from s to g. If a path from an initial state to a goal state is found, the problem is said to be solved. This is essentially what it means to solve problems as state space search.
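The state space and operators just described can be sketched in code. This is a minimal illustration, not from the lecture: a state is assumed to be a tuple of 9 entries read row by row, with 0 marking the blank, and each operator checks its left-side condition (the move is legal) before producing a new state.

```python
# A minimal sketch of the 8-puzzle operators (assumed encoding:
# a state is a tuple of 9 entries read row by row, 0 marks the blank).
def successors(state):
    """Apply every operator (blank moves up/down/left/right) that is
    legal in this state; return the resulting states."""
    i = state.index(0)                 # position of the blank
    row, col = divmod(i, 3)
    moves = []
    if row > 0: moves.append(i - 3)    # blank moves up
    if row < 2: moves.append(i + 3)    # blank moves down
    if col > 0: moves.append(i - 1)    # blank moves left
    if col < 2: moves.append(i + 1)    # blank moves right
    result = []
    for j in moves:
        s = list(state)
        s[i], s[j] = s[j], s[i]        # slide the neighbouring tile into the blank
        result.append(tuple(s))
    return result
```

Each call returns only the states reachable by legal blank moves, mirroring the idea that an operator applies only when its left side matches the current state.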
(Refer Slide Time: 15:37) The set of all possible configurations of the relevant objects is the state space of the problem. This is also referred to as the problem space. I showed you the example of the 8-puzzle. Each tile configuration that I have drawn in the previous slide, this 3 x 3 matrix with the numbers 1 through 8 and one position empty, is a problem state. The 8-puzzle actually has a relatively small problem space of some 9 factorial different configurations. So, when we are looking for a solution from a given start to some given goal position g, which could be any other configuration in the system, say arranging the tiles as 1, 2, 3, 4, 5, 6, 7 and 8, I am looking for a way to come from the one to the other. And since the total number of configurations is only 9 factorial, such a search is possible. (Refer Slide Time: 17:26) We can think of another example: the game of chess. The initial state for a game of chess can be thought of as an 8 x 8 array. The legal moves are a set of rules, each rule consisting of two parts: a left side that serves as a pattern to be matched, and a right side that describes the change that reflects the move. The goal state could be any board position in which the opponent does not have any legal move left, and which is therefore a win. (Refer Slide Time: 18:02) When we play chess by starting at an initial state and using a set of rules to move from one state to another, we end up in one of a set of final states. This state space representation seems natural for games such as chess and the 8-puzzle, because the set of states corresponds to the set of board positions. This kind of representation may be less obvious, but it is also useful for less structured problems, with the use of more complex structures than a matrix.
I would like to emphasize here that for the 8-puzzle, the matrix representation came naturally to mind because we were talking of a 3 x 3 arrangement of tiles. For chess, an 8 x 8 matrix immediately comes to mind. But this kind of state space representation is also useful for other, less structured problems. The only point I want to make here is that for less structured problems, if we want to use such a state space representation, we would need more complex structures than a matrix. Even so, such a representation, such a mode of problem solving, can be useful. Now, here is the formal description of a problem. (Refer Slide Time: 19:42) So, what we have is: Number 1, the state space. The state space defines all possible configurations of the relevant objects. Number 2, the initial state. The initial state specifies one or more states within that space as the possible situations from which problem solving can start. Thereafter, we have the goal state. The goal state specifies one or more states that would be acceptable as solutions to the problem. Of course, so far this is only about the different configurations of states. Together with this, we need a set of rules that describes how we move from one state to another. These are called operators. So, here is the formal description of problem solving as state space search. We define the state space, which is all the possible configurations. We then identify one or more states within the space as possible situations from which we can start; that is the initial state. One or more states from the problem space that are acceptable as solutions are referred to as the goal state. And then we have a set of rules that describe the actions available. Now, let me take a minute to again emphasize what we mean by operators. Operators have two interesting portions.
One on the left and one on the right. The left side states the information that must be true: everything on the left must match before I can take that particular operation, after which the information on the right becomes true. Remember our 8-puzzle game, where we had the 9 tiles, and we had a rule which said the empty tile can move to the right. This rule is applicable in the state above because, say, we have 5 beside the empty position; the empty moves to the right, and I get a configuration with everything else remaining the same, 5 in the old empty position and the empty where 5 was. Whereas, if from this group of operators I take another which says "move the empty to the left", it is not applicable in this state, because here the empty cannot move to the left. So, when I specify the set of rules that describe actions, that is, the operators, the information that must be true for the action to take place needs to be checked first. Then the action can take place and generate new information, as we have seen here. (Refer Slide Time: 23:07) So, let us look at playing the 8-puzzle with problem solving as state space search. Here is our start state, where I have the 9 tiles with the empty position at the bottom, and this is my goal state, where I have arranged everything as 1, 2, 3, 4, 5, 6, 7, 8 with the empty position in the middle. Now, remember the four operators that we were talking of; let us quickly recall them. The empty position can move to the right; it can move to the left; it can move up; or it can move down. Given the start state s, the goal state g, and these four operators, let us try to understand problem solving as state space search. (Refer Slide Time: 24:26) At this point, of the four operators that I highlighted in the previous slide, three moves are possible: the empty could go up, the empty could go left, or the empty could go right.
So, these are the 3 configurations that can be arrived at from the given start position. I then look at these new configurations, take the 4 operators that I highlighted in the previous slide, and try to apply them here, taking each configuration one by one. (Refer Slide Time: 25:06) Now, we generate new states from here. One tile could come to the center; another position I could generate is where 8 comes down; and here is the third position, where 4 comes to the center. To keep the discussion of the 8-puzzle problem short, I have taken a route that leads towards the goal, but that in no way stops us from exploring the other paths as well. Just to highlight that we can arrive at the goal very quickly, I have looked only at this path. Next, I generate 2 positions: the empty can go to the left, and the empty can go to the right. I now mark one position and see what happens when the operators are applied to it. (Refer Slide Time: 26:08) From there, 1 can only go up; if the empty goes to the left, I generate something which has already been generated, so I do not do that. And what I get after a couple of steps is my goal position. What I want to emphasize here is that, starting from a start position, which was part of a problem space that we have seen has 9 factorial states, we can look at different configurations on the way, generating a lot of them, and arrive at the goal position g from a start position s. Working in this paradigm is problem solving as state space search. (Refer Slide Time: 27:19) So, now we are in a position to define production systems, which work in exactly such a way. A production system consists of 3 important things, the first being a database.
Here I would emphasize that one should not confuse this with the database of database systems. What we mean here is a unit that consists of rules and facts. Together with operations and control strategies, these are the building blocks for constructing lucid descriptions of AI systems. So, a production system consists of these 3 elements: database, operations and control components. (Refer Slide Time: 28:08) Most of the AI systems that we will talk of display a more or less rigid separation between the computational components of data, operations and control. Various generalizations of this computational formalism, production systems, involve a clean separation of these components, and they seem to capture the essence of operation of many AI systems. (Refer Slide Time: 28:39) Selecting rules and keeping track of the sequences of rules already tried constitute what we call the control strategy for production systems. The operation of AI production systems can thus be characterized as a search process in which rules are tried until some sequence of them is found that finally yields a database, a set of facts, satisfying the termination condition. (Refer Slide Time: 29:19) To solve a problem using a production system, we must therefore specify the global database, the rules and the control strategy; these 3 things together form the production system. Transforming a problem statement into these 3 components of a production system is the representation problem in AI. Once this is successfully done, problem solving is just about searching in that problem space for the given goal state, given the initial state. (Refer Slide Time: 30:03) A production system consists of productions, that is, the rules, and a working memory of facts that is referred to as the database. The control strategy is an algorithm for producing new facts from old ones.
Now, as we have seen in the 8-puzzle example, a rule becomes eligible to fire when its conditions match some of the elements currently in the working memory. When its left side matches, the rule is fired and we get new facts, namely the facts generated on the right side of the rule. A control strategy determines which of the several eligible rules fires next. (Refer Slide Time: 30:54) So, it is important for us to understand the difference between production systems and conventional computation. There are several differences between them. In a conventional computational system, it is not the case that all rules or sections of the program can access all units of the database. Here, the global database can be accessed by all of the rules; no part of it is local to any of them in particular. Rules do not call other rules; communication between rules occurs only through the global database. More importantly, the production system structure is modular: changes to any of the components can be made independently. Using conventional computation in AI applications is difficult, for if I change the knowledge somehow, my facts change, and that would require extensive changes to the program. So, production systems capture the essence of programming for AI systems. Here is the procedure for a production system. (Refer Slide Time: 32:11) We have an initial database that initializes the data. We then keep on working with the operators until the termination condition is satisfied: we select some rule in the set of rules that can be applied to the data, we apply that rule to generate new data, and we keep on doing this until the termination condition is satisfied, in which case we have actually arrived at the goal state, and we say that the problem has been solved. (Refer Slide Time: 32:44) Let us now focus on the different control strategies.
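Before moving on, the procedure for a production system described above can be sketched as a small loop. This is an illustrative sketch under assumed interfaces, not the lecture's formal algorithm: rules are (condition, action) pairs, and the hypothetical `select` function plays the role of the control strategy.

```python
# A minimal sketch of the generic production-system procedure.
# Hypothetical interface: each rule is a (condition, action) pair;
# `select` is the control strategy; `is_terminal` is the termination test.
def production_system(database, rules, select, is_terminal):
    """Repeatedly select an applicable rule and apply it to the
    database until the termination condition is satisfied."""
    while not is_terminal(database):
        applicable = [r for r in rules if r[0](database)]
        if not applicable:
            return None                      # no rule fires: failure
        condition, action = select(applicable, database)
        database = action(database)          # apply the rule: new facts
    return database

# Usage: a toy system that doubles a number until it exceeds 100.
rules = [(lambda d: d <= 100, lambda d: d * 2)]
result = production_system(3, rules,
                           lambda rs, d: rs[0],    # trivial control strategy
                           lambda d: d > 100)      # termination condition
```

The control strategy is deliberately kept separate from the rules and the database, which is exactly the modularity discussed above.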
Selecting rules and keeping track of the sequences of rules already tried, and the databases they produce, constitute what we call the control strategy for production systems. The operation of AI production systems can therefore be characterized as a search process in which rules are tried until some sequence of them is found that produces a database satisfying the termination condition. (Refer Slide Time: 33:19) It is very important to understand the computational cost involved in an AI production system. That comes from understanding how much information or knowledge about the problem is being used. An important characteristic for selecting rules is the amount of information or knowledge about the problem. At the uninformed extreme, when we do not know anything about the problem, the rule selection can be made completely arbitrarily: it does not matter which rule I select, because I know nothing about the problem at all. At the other extreme, when I know almost everything about the problem, I have the informed extreme: the control strategy is guided by the problem knowledge, and this guidance is great enough to select a correct rule every time. (Refer Slide Time: 34:16) The overall computational cost of an AI production system can be thought of in two major categories: one that comes from rule application and the other that comes from the control strategy. In an uninformed control system, we have to try a large number of rules to find a solution, and therefore we have a high rule application cost. Whereas, in terms of control strategy, in an uninformed control system arbitrary rule selection does not depend on costly computation, so we have a very low control strategy cost. In short, in an uninformed control system the rule application cost is very high, for we need to try a large number of rules.
Whereas, for the same uninformed control system, the control strategy cost is very low, for rule selection does not depend on costly computation. (Refer Slide Time: 35:24) Let us look at what happens in an informed control system. In an informed control system, the production system is directly guided to the solution, and therefore we have a very low rule application cost. However, holding and using information about the problem domain carries a huge cost in terms of storage and computation, and therefore we have a very high control strategy cost. So, the rule application cost of an informed control system is minimal, whereas its control strategy cost is high. (Refer Slide Time: 36:15) Here is a comparison of the rule application cost and the control strategy cost. On one axis we have the computational cost, and on the other we have the informedness, ending where we have complete information about the problem. As we have discussed, as informedness increases, the rule application cost decreases, while the control strategy cost increases. The overall cost of the AI production system combines the two, and it is important to realize that whenever we are building one, the amount of information we really want to exploit is the amount that keeps our overall cost to the minimum. Too much information in a production system can also spoil the way the AI system is built. (Refer Slide Time: 37:22) The overall computational cost of an AI production system is therefore the combined rule application cost and control strategy cost. Part of the art of designing efficient AI programs is deciding how to balance these two costs.
An important aspect of AI system design therefore involves techniques that allow the control strategy to use a large amount of problem information without incurring excessive control cost. (Refer Slide Time: 37:54) Whenever we are talking of control strategies, we have 2 distinct types of control: one called irrevocable control and the other called tentative control. Under tentative control strategies, we have backtracking and graph-search control. An irrevocable control strategy is one where, once a rule is applied, its application cannot be retracted. A tentative control strategy, on the other hand, applies a rule with a provision to retract the application of the rule later. Let us see what we mean by each of them. (Refer Slide Time: 38:31) In an irrevocable control strategy, an applicable rule is applied without provision for reconsideration later. In a tentative strategy, an applicable rule is applied, but provision is made to return later to this point of the computation, where we could apply some other rule. Under tentative control strategies we have backtracking, which is about establishing a point to which the state of the computation can revert if required, and the general graph search, where provision is made for keeping track of the effects of several sequences of rules simultaneously, like the one that I showed you for the 8-puzzle game. (Refer Slide Time: 39:24) The production system for solving the 8-puzzle that we have looked at today worked from the initial state to a goal state. So, we had an initial state; we applied operators to the initial state to arrive at a set of states, on which operators were applied again; and we did this continuously until we arrived at the goal state. Such a production system is called a forward production system. There is also a production system that works by starting at the goal state and applying inverse moves so that I reach a subgoal.
I apply inverse rules and reach subgoals again and again; in this case, I am going from the goal to the start. Such a production system is called a backward production system. (Refer Slide Time: 40:25) So, in terms of forward rules and backward rules, we have 2 distinct types of production systems. We need to have clear states and goals. In the first, the state descriptions form my global database and I have forward-looking rules, as in the 8-puzzle example that I highlighted; these are referred to as F-rules, and I have a forward production system. On the other hand, I may have goal descriptions as my global database: I have the final goal, and the goals just before it, the penultimate goals, are my subgoals. The whole database, the facts that I generate, are then goal descriptions, and I have a group of backward rules, so that I can start from the final goal, reach the subgoals on the way, and finally arrive at the initial state. What I have then is a backward production system. (Refer Slide Time: 41:36) Now, let us focus our attention on 2 specialized production systems: the commutative production system and the decomposable production system. A commutative production system is one in which the order in which a set of applicable rules is applied to the database is not important: I could apply any rule, and any other rule is still applicable to the database. A decomposable production system is interesting because it leads to problem reduction: here, the initial database can be decomposed, or split, into separate components, and each of them can be processed independently so as to arrive at the final goal. Let us focus our attention on the first one, the commutative production system. (Refer Slide Time: 42:31) A production system is commutative if it has the following properties with respect to a database D. Number 1.
Each member of a set of rules applicable to D is also applicable to any database produced by applying an applicable rule to D. Let us look at the example highlighted at the bottom of the slide. I have a starting state S0 to which a rule R1 is applicable. I apply R1 to S0 and arrive at S1. Now, R1 was a member of the set of rules applicable to S0, and S1 is the database produced from it. It is interesting to note that R1 is still applicable to S1: I can still apply R1 to this state if the system I am talking of is a commutative production system. (Refer Slide Time: 43:39) Number 2. If the goal condition is satisfied by some database D, then it is also satisfied by any database produced by applying any applicable rule to D. This holds because of the very fact that I can still apply all the rules to the database that is produced from D. So, let us say SG satisfies my goal condition. We know that all of the rules applied up till now could also be applied here: R1 or R2 or R3, and SG would still be reachable. Therefore, if the goal condition is satisfied by D, it is satisfied by any database produced by applying any rule applicable to D. Finally, the database that results from applying to D any sequence composed of rules applicable to D is invariant under permutation of the sequence. And that is very important. Here, I move from S0 to SG by applying, let us say, R1, then R3, then R2. I should be able to go from S0 to SG not only in this sequence, but also in another sequence, say R3, then R1, then R2. If this is possible, then the production system is said to be commutative. (Refer Slide Time: 45:31) So, here I go from S0 to S1, where R1 is still applicable. Then I can apply some rule and reach S3.
And then I can still apply R3 to S3, and by application of R2 I reach SG. Now, here I have gone from S0 to SG using R1, R3 and R2. But any other combination of these 3 can also take me to SG if the system is a commutative production system. So, I could arrive at S2 by applying R3; from there I could still apply R1 and arrive at S3; and from S3 I could apply R2 and arrive at SG again, which is my goal state. So, given this start state and goal state, any rule applicable to D is still applicable to the database resulting from D; that is property 1. And for any sequence that takes me to the goal, say R1, R3, R2, every permutation of it, such as R3, R1, R2, also takes me to the goal. That is what a commutative production system is. (Refer Slide Time: 46:46) Commutative production systems are an important subclass and have the following properties. Number 1. An irrevocable control regime can always be used, because even if we have done something not on the way towards the goal, some other rule is still applicable to the system. In a commutative system, the application of a rule never needs to be taken back or undone; so an irrevocable regime can always be used. Number 2. I do not need a mechanism for applying alternative sequences of rules. A rule that is applicable to an earlier database is applicable to the current one as well, and therefore I do not have to think of a mechanism for applying alternative sequences of rules. One needs to understand that in a commutative production system I may apply inappropriate rules, but an inappropriate rule will only delay, never prevent, termination. And this is very important for commutative production systems. (Refer Slide Time: 48:15) Let us now look at the other form of production system that we highlighted in the beginning: the decomposable production system.
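Before that, the permutation-invariance property of a commutative system can be illustrated with a toy sketch. The rules here are hypothetical, not from the lecture: each one simply adds a fact to a set-valued database, so applying one rule never disables another, and the final database does not depend on the order of application.

```python
from itertools import permutations

# Toy commutative production system: the database is a set of facts,
# and each (hypothetical) rule only adds its fact, so every rule
# remains applicable after any other rule fires.
rules = ["R1_fact", "R2_fact", "R3_fact"]
start = frozenset({"S0"})

results = set()
for order in permutations(rules):
    db = start
    for fact in order:
        db = db | {fact}          # applying a rule: add its fact
    results.add(db)

# Property 3: the resulting database is invariant under
# permutation of the rule sequence.
assert len(results) == 1
```

Any rule set with this "facts are only added" character behaves commutatively, which is why an irrevocable control regime is safe for such systems.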
In order to understand a decomposable production system, let us consider an initial database consisting of C, B and Z, with productions based on the following rewriting rules: C can be rewritten as D L; C can be rewritten as B M; B can be rewritten as M M; and Z can be rewritten as B B M. The termination condition is that the database contains only M's. (Refer Slide Time: 48:57) If I start with the complete C B Z, I have rewriting rules for C, for B and for Z, so my search would follow 3 distinct paths: I could rewrite C or B or Z, and each of them could take different paths. What is important to note is that some paths may lead to termination, but other paths may not. Instead of looking at C B Z as a single entity, the decomposable production system allows me to look at C B Z as 3 different units, C, B and Z. This is because of the problems I get when I look at it as one single unit, as highlighted here. (Refer Slide Time: 49:59) A graph-search control explores many equivalent paths in producing a database containing only M's. Redundant paths can lead to inefficiencies, because the control strategy might attempt to explore all of them, and it might also explore paths that do not terminate. One way to avoid this exploration is to recognize that the initial database can be decomposed, or split, into separate components that can be processed independently; as I highlighted in the previous slide, C B Z can be looked at as the 3 distinct elements C, B and Z. This is what we do here. (Refer Slide Time: 50:44) We break C B Z into C, B and Z and then apply the rewriting rules highlighted initially: C becomes D L or B M, B becomes M M, and Z becomes B B M. So, we had an initial C B Z, and we split it up into C, B and Z.
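The rewriting rules and termination condition just stated can be sketched as follows. This is an illustrative reconstruction, not the lecture's formulation: the AND/OR structure is expressed as a recursive check, where a rewriting choice is an OR (one alternative must succeed) and the symbols a rule produces form an AND (all of them must succeed).

```python
# A minimal sketch of the decomposable production system above: the
# database C B Z is split into components, and each component is
# rewritten independently until it contains only M's.
RULES = {"C": [["D", "L"], ["B", "M"]],   # C -> DL  or  C -> BM  (OR choices)
         "B": [["M", "M"]],               # B -> MM
         "Z": [["B", "B", "M"]]}          # Z -> BBM

def solvable(symbol):
    """Can this single component be rewritten to M's only?"""
    if symbol == "M":
        return True                       # already terminal
    # OR node: some rewriting choice must succeed;
    # AND node: every symbol that choice produces must succeed.
    return any(all(solvable(s) for s in body)
               for body in RULES.get(symbol, []))

# The compound database terminates iff every component does (AND).
assert all(solvable(s) for s in ["C", "B", "Z"])
assert not solvable("D")                  # D can never become M
```

Note how rewriting C via D L fails (D and L have no rules), while C via B M succeeds, which is exactly the OR choice discussed below.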
We applied the rewriting rules to get D L, B M, M M and B B M, and then split these again: D L into D and L, B M into B and M, M M into M and M, and B B M into B, B and M. Then we continue applying the rewriting rules, now only to the B's, and we get a database in which the termination condition of containing only M's is satisfied. Now, there is a very important property to note here. At the top, all of C, B and Z need to be brought to termination; all of these parts are important. But when I think of rewriting C as either D L or B M, I need to rewrite only one way, so only one of them is important for me, not both. Again, at the next lower level, both components need to be looked at, and all three in the case of Z. But wherever there is a choice of rules I look at only one alternative path, and where there is no alternative there is just the one path; so it becomes simpler here. (Refer Slide Time: 52:51) It is important to note that nodes labeled by compound databases have sets of successors, each labeled by one of the components. These nodes are referred to as AND nodes. They are called AND nodes because, if you want to process the compound database to termination, all of the component databases must be processed to termination. So, for this node here, in order to process C B Z to termination, C, B and Z all need to be processed to termination; in order to process B M to termination, both B and M need to be processed to termination; and so on. (Refer Slide Time: 53:33) Another important thing to realize concerns the successor nodes labeled by the results of rule application, like D and L, or B and M, obtained when I apply a rule to C. These are referred to as OR nodes, because in order to process a component database to termination, only the database resulting from one rule or the other must be processed to termination.
(Refer Slide Time: 54:11) Structures called AND-OR graphs are therefore useful for depicting the activity of production systems: these are the AND nodes, and these are the OR nodes. (Refer Slide Time: 54:25) In decomposable production systems, the nodes that I have marked as AND are important because all of the component databases need to be brought to termination. That is an AND node here; that is an AND node; and this is an AND node: both of its components need to be brought to termination. (Refer Slide Time: 54:53) Finally, we can say that the notion of decomposable production systems encompasses a technique which is often called problem reduction in AI. We have reduced the problem of taking C B Z to termination to the problem of taking C, B and Z individually to termination. This is something like what we did in high school, breaking up a bigger integration to do it by integration by parts. The problem reduction idea usually involves replacing a problem goal by a set of subgoals, such that if the subgoals are solved, the main goal is also solved. Explaining problems in terms of decomposable production systems allows us to be indefinite about whether we are decomposing problem goals or problem states. (Refer Slide Time: 55:47) To finish our lecture today, we would like to highlight that efficient AI systems require knowledge of the problem domain. This knowledge can be divided into 3 broad categories: a. declarative knowledge, b. procedural knowledge and c. control knowledge. Declarative knowledge is knowledge about a problem that is represented in the global database; it includes specific facts. Procedural knowledge about a problem is represented in the rules: general information that allows us to manipulate the declarative knowledge.
Control knowledge includes knowledge about a variety of processes, strategies and structures used to coordinate the entire problem-solving process. And this is what we will look at in the next lecture of this module, on graph search techniques, to see what the problem-solving processes are. Thank you very much. Fundamentals of Artificial Intelligence Prof. Shyamanta M Hazarika Department of Mechanical Engineering Indian Institute of Technology - Guwahati Module - 1 Lecture - 3 Uninformed Search Welcome to Fundamentals of Artificial Intelligence. Today we are going to look at uninformed search. Most of the AI techniques that we will explore later use domain information, or what is more commonly called heuristic knowledge, and fall under the category of informed search. However, as an introduction to search in graphs, we will look at uninformed search first. So, what we will cover today is: (Refer Slide Time: 01:06) We will very quickly review problem solving as search and look at what we mean by state spaces. We will then formally introduce graph searching and a generic searching algorithm. Thereafter, we will look at a number of uninformed search strategies: first breadth-first search, then depth-first search, iterative deepening, uniform-cost search; and then we will look at something called bidirectional search. (Refer Slide Time: 01:40) Let us look at what we mean by problem solving as search. We define a state space that contains all the possible configurations of the relevant objects; I have shown you one or two such configurations. We then specify the initial state: one or more states within that state space as situations from which we start. And then we define the goal state: one or more states that would be acceptable as solutions to the problem.
And of course, we need to specify the operators: a set of rules that describes the actions or operations available. (Refer Slide Time: 02:26) The state space is what is vital for formulating a problem-solving domain as state space search. The state space literally consists of all possible configurations; it is also referred to as the problem space. For the 8-puzzle game that I showed you a little while ago, each tile configuration is a problem state. The 8-puzzle, we should note, has a relatively small state space. There could be problems with huge state spaces, and we will look at how to search them in order to get to a solution. (Refer Slide Time: 03:10) So, let us look at problem solving as state space search. This is the start node, which I want all of you to take note of: a board with 9 positions, one of them blank, and you move the blank around to arrive at this goal position. In the last class, we looked at how these expansions can take place given the 4 operators. (Refer Slide Time: 03:38) We will now look at the complete solution tree. Given this state, you have 3 possibilities for moving the blank: you could move the empty square to the left, move it up to the centre, or move it to the right. So, given the start S, these configurations of tiles are its successor nodes. And we can keep generating successors: from here, we could get more successors by moving the blank to a new configuration. For instance, the blank could go up and a tile could come down, giving 2, blank, 1, 8, 6, 7 and 3, 4, 5. And we could have another one, with 2, 8, 3, 1, 6, 4, where 7 goes to the side, the blank comes here, and then 5. So, we have generated all the successors of one node.
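The successor generation just described can be sketched directly. The code below is a small, self-contained sketch; encoding the state as a tuple of 9 entries read row by row, with 0 standing for the blank, is my assumption rather than the encoding used on the slides.

```python
# A sketch of successor generation for the 8-puzzle. The state is a
# tuple of 9 entries read row by row, with 0 standing for the blank.
def successors(state):
    moves = []
    i = state.index(0)                 # position of the blank
    row, col = divmod(i, 3)
    for dr, dc, name in [(-1, 0, "up"), (1, 0, "down"),
                         (0, -1, "left"), (0, 1, "right")]:
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:  # only legal blank moves
            j = r * 3 + c
            s = list(state)
            s[i], s[j] = s[j], s[i]    # slide the neighbouring tile
            moves.append((name, tuple(s)))
    return moves

start = (2, 8, 3, 1, 6, 4, 7, 0, 5)   # blank in the middle of the bottom row
for op, nxt in successors(start):
    print(op, nxt)
```

With the blank in the middle of the bottom row, exactly three operators apply (up, left, right), matching the three successors of the start node discussed above.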
But one thing to note here is that this node is a repetition of what I started with as my start node, so I would avoid expanding successors of this type. Having said that, I want to list all the successor nodes; the idea then is to look for a path through this space of successors to the configuration that I was looking for. That line of expansions is the solution to the problem. Now, if you look at this closely, we can draw an analogy: this can be thought of as a graph. (Refer Slide Time: 05:48) Basically, what we are looking for is: given the start node s of a graph and a goal node g, find some path that takes me from s to g. So, problem solving as state space search can be abstracted to the mathematical problem of finding a path from the start node to the goal node in a directed graph. (Refer Slide Time: 06:22) Let us take a look at what we mean by graphs and directed graphs before we proceed further into our discussion of problem solving as state space search. You must have looked at graphs before, but for completeness, let me put the definition here. A graph G comprises a set V of vertices and a set E of edges. The vertices can be anything but are most commonly represented by a collection of letters or numbers. The set of edges is a set of two-element subsets of V; that is, E consists of subsets {a, b} where a and b are vertices and a is not equal to b. Usually, the graph is denoted as a 2-tuple G = (V, E). Given a graph and its edges, we say vertices a and b are adjacent if there is an edge {a, b} that joins them in the graph, and we call a and b the end points of the edge. Two edges that share a vertex are also said to be adjacent. For instance, suppose I have a graph with a vertex a here and a vertex b here, joined by an edge; that is one edge.
Suppose I have another edge between a and c; that is another edge. Since the edges {a, b} and {a, c} share the common vertex a, they are said to be adjacent to each other. (Refer Slide Time: 08:03) I am more interested here in finding a path in a graph. If you recall, we had the 8-puzzle tile configurations, where I explored and obtained this sort of structure as I kept expanding nodes at every level; given the 4 operators of the 8-puzzle game and a given start s, I could arrive at a given goal g. I am interested in knowing how I came from s to g, that is, in the path. The notion of a path in a graph is intuitively very clear, but it takes some care to pin down formally. Suppose I have a graph G = (V, E), a sequence of k + 1 vertices v0, v1, ..., vk (not necessarily distinct), and a sequence of edges e1, e2, ..., ek (not necessarily distinct), where each edge ei joins vi-1 and vi. The alternating sequence of vertices and edges from v0 to vk is a path from v0 to vk of length k. The length is the number of edges, not the number of vertices. So, for the path I have highlighted here, the length of the path is 4: the path is made up of edges e1, e2, e3, e4, with intermediate vertices which we may call va, vb, vc. The path from the start to the goal in the given graph is then s, e1, va, e2, vb, e3, vc, e4, g. (Refer Slide Time: 10:43) What we are more interested in, when looking at graph search for solutions to given problems, are what are called directed graphs, or digraphs for short. Here, again, there is a set V of vertices.
But the set of edges now consists of directed edges. What are directed edges? Directed edges are ordered pairs of elements of V. Put another way, if I have a 2-tuple G = (V, E), then E is a subset of V x V, and G is a digraph if these pairs are ordered pairs. The ordering of the pairs in E gives each edge a direction, namely the edge (a, b) goes from a to b. (Refer Slide Time: 11:32) So, here is an example of a digraph. I have a set of vertices a, b, c, d and the directed edges (a, b), (a, c), (b, a), (c, c), (d, c). If these pairs are directed arcs, rather than just edges as in the general graph I was talking of, then what we have is a digraph. (Refer Slide Time: 11:57) A tree is a connected graph with no cycle. If you go back and look at the directed graph, there we have an arc that starts at c and comes back to c again; or we go from a to b and then there is a path from b back to a. Such a thing is called a cycle. In a tree, we have a graph which is connected but has no cycles. This is something that we will explore in the remainder of this lecture. (Refer Slide Time: 12:33) Graph searching for problem solving as state space search means finding a sequence of actions to achieve a goal by searching for paths in a directed graph. To solve the problem, we define the underlying search space and then apply a search algorithm. Searching in graphs therefore provides an appropriate abstract model of problem solving, independent of the particular domain. And the type of graph we are searching here, one needs to remember, is usually a directed graph. (Refer Slide Time: 13:11) We are now in a position to formalize search in a state space. A state space for us is a graph (V, E), where V is a set of nodes and E is a set of arcs. When I say nodes, a node may be a simple node, or it may be a data structure that contains a state description.
It may also contain other information, such as the parent of the node, the operator that generated the node from that parent, and other bookkeeping data. Each arc corresponds to an instance of one of the operators. Basically, when we formalize search in a state space, we convert the whole state space into nodes, with each operator giving instances that take me to other nodes. So, finally, we end up with such a directed graph, and searching this directed graph for a path that brings me from s to g is what I mean by getting to the solution. (Refer Slide Time: 14:33) Each arc in such a directed graph has a fixed positive cost associated with it, corresponding to the cost of the operator. And each node has a set of successor nodes, corresponding to all of the legal operators that can be applied to the node's state. The process of expanding a node means generating all of its successor nodes and adding them, with their associated arcs. One or more nodes are designated as start nodes, and there is a goal test predicate which is applied to a state to determine whether its associated node is a goal node. (Refer Slide Time: 15:17) A solution in such a formal framework is a sequence of operators associated with a path in the state space from a start node to a goal node. And the cost of each solution is the sum of the arc costs on the solution path. If all the arcs have the same unit cost, then the solution cost is just the length of the solution: the number of steps, or state transitions, involved. (Refer Slide Time: 15:51) State space search, under such a scenario, is the process of searching through a state space for a solution by making explicit a sufficient portion of an implicit state space graph.
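The node bookkeeping and solution cost described above can be sketched as a small data structure. This is an illustrative sketch, with field names of my own choosing: each node records its state, its parent, the operator that generated it, and the arc cost; walking the parent pointers back recovers the operator sequence and sums the arc costs.

```python
# A sketch of the node bookkeeping described above: state description
# plus parent, generating operator and arc cost. Field names are
# illustrative, not prescribed by the lecture.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    state: object
    parent: Optional["Node"] = None
    operator: Optional[str] = None
    arc_cost: float = 0.0

def solution(node):
    """Walk parent pointers back to the start node, returning the
    operator sequence and the sum of arc costs on the path."""
    ops, cost = [], 0.0
    while node.parent is not None:
        ops.append(node.operator)
        cost += node.arc_cost
        node = node.parent
    return list(reversed(ops)), cost

s = Node("s")
a = Node("a", parent=s, operator="op1", arc_cost=1.0)
g = Node("g", parent=a, operator="op2", arc_cost=1.0)
print(solution(g))   # (['op1', 'op2'], 2.0)
```

With unit arc costs, as here, the returned cost is just the length of the solution, matching the remark above about step counts.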
Now, one needs to remember that we do not generate the complete graph when we are looking for a solution under this idea of problem solving. We keep on generating the graph as we keep on exploring for the solution. So, it is the process by which you search to a space, state space by making explicit only a sufficient portion of it; of an implicit state space graph to find a goal node. For large state spaces, it is important to realize that it is not practical to represent the whole space. So, you initially start with just the start node. And then expand the start node S. Its successors are generated and those nodes are added to the set V and the associated arcs are added to the set E. This process continues until a goal node is found. So, instead of having the whole state space graph as it is explicitly generated and then looking for a solution, the idea is to make explicit a sufficient portion of this as I go on expanding for nodes. And every node that I expand answering a question whether that is my goal node; if not, getting its successor nodes and keeping the process containing. 78 (Refer Slide Time: 17:50) So, state space search is when each node implicitly or explicitly represents a partial solution path. In general, from this node, there are many possible paths that have its partial path as a prefix. So, basically what it means is that when I am doing such a state space as exploration in a directed graph I have come up to this point. And then I keep on expanding beyond. But one needs to remember that whatever I have expanded up to this point, this still forms a part of the path. So, it is still, the partial path is still there as a prefix for me. Beyond this, there would be many alternate paths, but this one up to here; if somewhere the goal lies below this node, then up to here, this is the partial path that is included already. (Refer Slide Time: 18:49) 79 So, how do we evaluate such start search strategies? 
It is important to realize that we have many such searching techniques that we will discuss as which are uninformed. But, how do we evaluate such search strategies? There are 4 this thing matrixes. 1. We talk of completeness. Completeness is a guarantee of finding a solution whenever one exists. So, if a solution exists and if your searching technique can find that solution and the technique is said to be complete. Next, we talk of time complexity. That is how long does it take to find a solution. This is usually measured in terms of the number of nodes that the searching technique expands. Next, we talk of space complexity. How much space is used by the algorithm. This is also usually measured in terms of the maximum size of the nodes list during the search. And then, there is a question of whether this search technique is optimal or admissible. If a solution is found and is it guaranteed to be an optimal one? And that is, is it the one with the minimum cost? If that guarantees there for the given searching technique, then it is said to be optimal. We will now look at these searching techniques and then look at the evaluation of these techniques along these 4 matrices. (Refer Slide Time: 20:33) So, what are the important parameters that one needs to really look at before we go into looking at the search techniques and evaluation of them is as these 3 interesting things that needs to be remembered. We talk of a number of successor of any state; the maximum number. That is called the branching factor. Then we talk of the length of a path in the state space as the minimal length which is the depth of the shallowest goal that is d. 80 So, we need to understand and something about d and something about b. So, if I have a node that is expanded every time into 2 successors; so, here my branching factor is 2. If my goal lies here, that is the depth of the minimum length of the path to the shallowest goal, then my d is 3. 
And there is the maximum depth of the search state space, m, which could even be infinite. (Refer Slide Time: 21:42) Given these parameters, and given the idea that we can search a directed graph to get from a given start to a given goal, let us look at a graph search procedure. Even before writing the algorithm down, let me highlight what we literally do. We have the start node s and its successors; let us call the successors a and b. We create two lists, one called OPEN and the other called CLOSED. First, we create the OPEN list consisting solely of the start node, so we put s on OPEN. Then we pick up s for expansion, and what we want to check is: is s the goal node itself? If not, we generate its successors; if s is the goal node, we have nothing to do and we exit. That is the check in the procedure. We also create a list called CLOSED that is initially empty, and if there is nothing on OPEN, we exit with failure. Otherwise, we select the first node on OPEN, remove it from OPEN and put that node on CLOSED. Why do we do that? It is a way of remembering that s has been expanded. After that, we generate the successors of s, which here are a and b, and put them on OPEN. The next time we look for an expansion, we take a node from OPEN, so a comes up. Now, one needs to remember that how a and b are added to the OPEN list depends on the criterion used to arrange it. Later on, we will see that in breadth-first search, everything is added to the end of the list.
In depth-first search, it is added to the beginning of the list. So basically: you create a list called CLOSED that is initially empty; you take the first node n on OPEN, remove it from OPEN and put it on CLOSED. Then, if n is a goal node, we have nothing more to do and exit immediately. Else, we do the following. (Refer Slide Time: 24:45) We expand the node n, generating the set M of its successors, and install them as successors of n on the OPEN list; that is, we add the members of M to OPEN for further expansion. If a member of M is already in the graph G, we can decide whether we want to redirect its pointers; if it is already on CLOSED, meaning that successor has already been expanded, we can decide whether to redirect the pointers of its descendants; otherwise we ignore that node for further expansion. Now, how we reorder the list OPEN is very important to understand, because it determines the type of technique we get in graph search: the ordering is either according to some arbitrary scheme, if it is uninformed search, or according to some heuristic merit, if it is informed search. Today, our concentration will be only on arbitrary schemes for arranging the successor nodes on the OPEN list; so we look at uninformed search today. (Refer Slide Time: 26:04) It is important that we really understand what we mean by uninformed here. Uninformed search is also called blind search, for the very reason that it does not use any information about the likely direction of the goal nodes; we have no knowledge of the problem domain. Informed search techniques, on the other hand, more popularly called heuristic search, use information about the domain to try to head in the general direction of the goal.
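The OPEN/CLOSED procedure above can be sketched as follows. The toy graph is illustrative; the single `to_front` flag captures the ordering choice just discussed, with successors added at the end of OPEN giving breadth-first behaviour and successors added at the front giving depth-first behaviour.

```python
# A sketch of the generic graph-search procedure: an OPEN list of
# nodes awaiting expansion, a CLOSED set of nodes already expanded,
# and an insertion rule for new successors. The graph is illustrative.
GRAPH = {"s": ["a", "b"], "a": ["c"], "b": ["g"], "c": [], "g": []}

def graph_search(start, goal, to_front):
    open_list, closed = [start], set()
    while open_list:
        n = open_list.pop(0)           # select the first node on OPEN
        if n == goal:
            return n                   # goal test on selection
        closed.add(n)                  # remember that n has been expanded
        new = [m for m in GRAPH[n]     # successors not already seen
               if m not in closed and m not in open_list]
        # Front insertion -> depth-first; end insertion -> breadth-first.
        open_list = new + open_list if to_front else open_list + new
    return None                        # OPEN exhausted: exit with failure

print(graph_search("s", "g", to_front=False))  # breadth-first order
print(graph_search("s", "g", to_front=True))   # depth-first order
```

Both orderings find the goal on this small graph; the difference, explored below, is in which nodes get expanded first and hence in completeness, complexity and optimality.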
Informed and uninformed search methods are distinctly different only in their use, or non-use, of information about the domain. A few of the uninformed search methods are breadth-first search, depth-first search, depth-limited search, uniform-cost search, depth-first iterative deepening and bidirectional search; we will look at a few of them today. Among informed search methods we have hill climbing, best-first search, greedy search, beam search, A and A*, which we will look at in the next lecture. (Refer Slide Time: 27:26) So, uninformed search strategies: we have no information from the problem domain, and that is why we order the nodes on OPEN using some arbitrary scheme. Today we will look at these 5 uninformed search techniques: breadth-first, depth-first, iterative deepening, uniform-cost search and bidirectional search. So, what is breadth-first search? Let us look at it more closely using an actual graph. (Refer Slide Time: 27:57) I have a graph with nodes A, B, C, D, E, F, G. We expand the shallowest unexpanded node. The fringe, as you can see by now, is a first-in-first-out queue: new successors all go to the end. If you recall, we said we create two lists when doing graph search, an OPEN list and a CLOSED list, and we keep the start node on OPEN. As we expand, we move A to the CLOSED list and generate the successors of A, which are B and C. Then, the next time, I take B for expansion, move it to CLOSED, and get the successors of B, which are D and E. Those successors go to the end of the list, so D and E come after C, and the next node taken for expansion is C rather than D. This is how breadth-first search works: expand the shallowest unexpanded node; the fringe is a first-in-first-out queue, that is, new successors go at the end.
So, first I check whether A is a goal state. (Refer Slide Time: 29:23) If A is not a goal state, I get its successors, B and C; the fringe for me is now B, C. Next, I check whether B is a goal node. (Refer Slide Time: 29:34) If not, I get its expansion, D and E. But then the next node to be expanded is C, not D; that is something one really needs to understand when doing breadth-first search. (Refer Slide Time: 29:45) And then the next node to be expanded becomes D, and so on and so forth. Now, a few questions I want to answer about the properties of breadth-first search. (Refer Slide Time: 29:53) The first question is: is breadth-first search complete? We can see that breadth-first search always guarantees that it will reach the goal if the branching is finite. If I have a state space search to be done, starting at a given s towards some goal g within the state space containing all the configurations, and the branching is finite, then at some point I will definitely arrive at g. Intuitively, this happens because I expand everything at every level, which guarantees that no state is missed for a finite b; therefore, breadth-first search is complete. What about the number of nodes generated? At the root level it is 1; with branching factor b, the next level has b nodes, then b squared, then b cubed; so the total is of the order of b to the power d. What about space complexity? Because breadth-first search generates every node at every level, it has to keep every node it has expanded in memory, either on the fringe or on a path to the fringe; therefore, the space complexity is also of the order of b to the power d.
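The order-of-b-to-the-d count above can be worked out numerically. A quick sketch:

```python
# Node count for breadth-first search: to depth d with branching
# factor b, the levels hold 1, b, b^2, ..., b^d nodes, so the total
# is 1 + b + b^2 + ... + b^d, which is O(b^d).
def bfs_nodes(b, d):
    return sum(b ** k for k in range(d + 1))

print(bfs_nodes(2, 3))    # 1 + 2 + 4 + 8 = 15
print(bfs_nodes(10, 6))   # 1111111: already over a million nodes
```

The second example shows why the exponential space requirement, and not just the time, is what makes breadth-first search impractical for deep goals.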
Is breadth-first search optimal? This depends on whether deeper solutions are guaranteed to be less optimal, that is, on the step cost being 1. If we assume that every step costs exactly 1 and there is no other cost involved, then if there is a solution deep down and a solution at a shallower node, I will reach the shallower one first; so I am always guaranteed to get the minimum-cost solution, and the search is optimal. (Refer Slide Time: 32:15) Next, we look at depth-first search. Here you expand the deepest unexpanded node: that is the clue for depth-first search. And as you may have realized by now, the fringe here is a last-in-first-out queue, that is, a stack: you put successors at the front, and those are what get expanded. So, you first check if A is a goal node. (Refer Slide Time: 32:40) If not, you get the successors B and C and check if B is the goal. If not, you expand it and get D and E. The point I want to make here is that D and E are put at the front of the OPEN list, rather than at the end. (Refer Slide Time: 32:58) So I have D; I check if D is a goal state, and then I get its successors H and I. (Refer Slide Time: 33:08) I put the successors H and I at the front; here is the queue. If you go back one slide, I had D, which I expanded, and D gave me two successors, H and I; H and I come to the front of the list rather than the back. So I check and continue the search. (Refer Slide Time: 33:25) I go on to I, (Refer Slide Time: 33:27) and keep going in the same way, expanding down the tree level by level. (Refer Slide Time: 33:32) Now, the questions about the properties of depth-first search. Is depth-first search complete? You can see from this exploration that depth-first search is not complete.
The reason is that in an infinite-depth space I could get stuck going down one branch, expanding along a line on which no solution exists. So, the first important realization: depth-first search is not complete. What about the time complexity of depth-first search? If m is the maximum depth and b the branching factor, the number of nodes is of the order of b to the power m, which is terrible if m is much larger than d. But if solutions are dense, depth-first search may be much faster than breadth-first search. In terms of space, we only need to remember a single path together with the unexplored nodes along it, so the space complexity is of the order of b times m, which is linear. And it may find a non-optimal goal first: intuitively, there could be shallower goals elsewhere that a depth-first technique reaches only after a deeper one, so it is not an optimal search technique. (Refer Slide Time: 35:08) Next, we come to a modification of depth-first search, which is depth-limited search. To avoid the infinite-depth problem that I showed you on the previous slide, we can decide to search only until a depth L; that is, we do not expand beyond depth L. But then, if we keep that limit L fixed, a goal may lie just beyond L, and we will never be able to find it. A better idea is to keep changing L, iteratively increasing it. This is the idea of the next search technique we will discuss, called iterative deepening search. (Refer Slide Time: 36:09) So, in iterative deepening search, I look at level 0 first.
(Refer Slide Time: 36:17) So, the idea of iterative deepening is, let me repeat it. So, I start with it level 1, then I go to level 2, level 3, level 4. So, the idea is that I take this level. I did not find things here. So, I thought of doing a little bit more deeper level. I did not find things there. I may think of going even down. I did not find things there. I may think of going even