
16 MAKING SIMPLE DECISIONS

In which we see how an agent should make decisions so that it gets what it wants—on average, at least.

In this chapter, we fill in the details of how utility theory combines with probability theory to yield a decision-theoretic agent—an agent that can make rational decisions based on what it believes and what it wants. Such an agent can make decisions in contexts in which uncertainty and conflicting goals leave a logical agent with no way to decide: a goal-based agent has a binary distinction between good (goal) and bad (non-goal) states, while a decision-theoretic agent has a continuous measure of outcome quality.

Section 16.1 introduces the basic principle of decision theory: the maximization of expected utility. Section 16.2 shows that the behavior of any rational agent can be captured by supposing a utility function that is being maximized. Section 16.3 discusses the nature of utility functions in more detail, and in particular their relation to individual quantities such as money. Section 16.4 shows how to handle utility functions that depend on several quantities. In Section 16.5, we describe the implementation of decision-making systems. In particular, we introduce a formalism called a decision network (also known as an influence diagram) that extends Bayesian networks by incorporating actions and utilities. The remainder of the chapter discusses issues that arise in applications of decision theory to expert systems.

16.1 Combining Beliefs and Desires under Uncertainty

Decision theory, in its simplest form, deals with choosing among actions based on the desirability of their immediate outcomes; that is, the environment is assumed to be episodic in the sense defined on page 43. (This assumption is relaxed in Chapter 17.) In Chapter 3 we used the notation Result(s0, a) for the state that is the deterministic outcome of taking action a in state s0. In this chapter we deal with nondeterministic partially observable environments. Since the agent may not know the current state, we omit it and define Result(a) as a random variable whose values are the possible outcome states. The probability of outcome s', given evidence observations e, is written

    P(Result(a) = s' | a, e),

where the a on the right-hand side of the conditioning bar stands for the event that action a is executed.[1]

[1] Classical decision theory leaves the current state S0 implicit, but we could make it explicit by writing P(Result(a) = s' | a, e) = Σ_s P(Result(s, a) = s' | a) P(S0 = s | e).

The agent's preferences are captured by a utility function, U(s), which assigns a single number to express the desirability of a state. The expected utility of an action given the evidence, EU(a|e), is just the average utility value of the outcomes, weighted by the probability that the outcome occurs:

    EU(a|e) = Σ_{s'} P(Result(a) = s' | a, e) U(s').        (16.1)

The principle of maximum expected utility (MEU) says that a rational agent should choose the action that maximizes the agent's expected utility:

    action = argmax_a EU(a|e).
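Equation (16.1) and the MEU rule translate directly into a few lines of code. The sketch below assumes the outcome model P(Result(a) = s' | a, e) is already available as a table of probabilities; the actions, states, and numbers are made up for illustration.

```python
# Sketch of Equation (16.1) and the MEU rule. The outcome model is assumed to
# be given as a dictionary mapping each action to a distribution over outcome
# states s'; everything below is illustrative, not from the text.

def expected_utility(action, outcome_dist, U):
    """EU(a|e) = sum over s' of P(Result(a) = s' | a, e) * U(s')."""
    return sum(p * U[s] for s, p in outcome_dist[action].items())

def meu_action(actions, outcome_dist, U):
    """The MEU principle: pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, outcome_dist, U))

if __name__ == "__main__":
    U = {"good": 10.0, "ok": 5.0, "bad": 0.0}          # utilities of outcome states
    outcome_dist = {                                    # P(Result(a) = s' | a, e)
        "cautious": {"ok": 0.9, "bad": 0.1},
        "bold":     {"good": 0.5, "bad": 0.5},
    }
    print(meu_action(["cautious", "bold"], outcome_dist, U))  # -> 'bold' (EU 5.0 vs 4.5)
```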
In a sense, the MEU principle could be seen as defining all of AI. All an intelligent agent has to do is calculate the various quantities, maximize utility over its actions, and away it goes. But this does not mean that the AI problem is solved by the definition!

The MEU principle formalizes the general notion that the agent should "do the right thing," but goes only a small distance toward a full operationalization of that advice. Estimating the state of the world requires perception, learning, knowledge representation, and inference. Computing P(Result(a) | a, e) requires a complete causal model of the world and, as we saw in Chapter 14, NP-hard inference in (very large) Bayesian networks. Computing the outcome utilities U(s') often requires searching or planning, because an agent may not know how good a state is until it knows where it can get to from that state. So, decision theory is not a panacea that solves the AI problem—but it does provide a useful framework.

The MEU principle has a clear relation to the idea of performance measures introduced in Chapter 2. The basic idea is simple. Consider the environments that could lead to an agent having a given percept history, and consider the different agents that we could design. If an agent acts so as to maximize a utility function that correctly reflects the performance measure, then the agent will achieve the highest possible performance score (averaged over all the possible environments). This is the central justification for the MEU principle itself. While the claim may seem tautological, it does in fact embody a very important transition from a global, external criterion of rationality—the performance measure over environment histories—to a local, internal criterion involving the maximization of a utility function applied to the next state.

16.2 The Basis of Utility Theory

Intuitively, the principle of maximum expected utility (MEU) seems like a reasonable way to make decisions, but it is by no means obvious that it is the only rational way. After all, why should maximizing the average utility be so special? What's wrong with an agent that maximizes the weighted sum of the cubes of the possible utilities, or tries to minimize the worst possible loss? Could an agent act rationally just by expressing preferences between states, without giving them numeric values? Finally, why should a utility function with the required properties exist at all? We shall see.

16.2.1 Constraints on rational preferences

These questions can be answered by writing down some constraints on the preferences that a rational agent should have and then showing that the MEU principle can be derived from the constraints. We use the following notation to describe an agent's preferences:

    A ≻ B    the agent prefers A over B.
    A ∼ B    the agent is indifferent between A and B.
    A ≿ B    the agent prefers A over B or is indifferent between them.

Now the obvious question is, what sorts of things are A and B? They could be states of the world, but more often than not there is uncertainty about what is really being offered. For example, an airline passenger who is offered "the pasta dish or the chicken" does not know what lurks beneath the tinfoil cover.[2] The pasta could be delicious or congealed, the chicken juicy or overcooked beyond recognition. We can think of the set of outcomes for each action as a lottery—think of each action as a ticket. A lottery L with possible outcomes S1, ..., Sn that occur with probabilities p1, ..., pn is written

    L = [p1, S1; p2, S2; ...; pn, Sn].

[2] We apologize to readers whose local airlines no longer offer food on long flights.

In general, each outcome Si of a lottery can be either an atomic state or another lottery. The primary issue for utility theory is to understand how preferences between complex lotteries are related to preferences between the underlying states in those lotteries.
To address this issue, we list six constraints that we require any reasonable preference relation to obey:

Orderability: Given any two lotteries, a rational agent must either prefer one to the other or else rate the two as equally preferable. That is, the agent cannot avoid deciding. As we said on page 490, refusing to bet is like refusing to allow time to pass.
    Exactly one of (A ≻ B), (B ≻ A), or (A ∼ B) holds.

Transitivity: Given any three lotteries, if an agent prefers A to B and prefers B to C, then the agent must prefer A to C.
    (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C).

Continuity: If some lottery B is between A and C in preference, then there is some probability p for which the rational agent will be indifferent between getting B for sure and the lottery that yields A with probability p and C with probability 1 − p.
    A ≻ B ≻ C ⇒ ∃p  [p, A; 1 − p, C] ∼ B.

Substitutability: If an agent is indifferent between two lotteries A and B, then the agent is indifferent between two more complex lotteries that are the same except that B is substituted for A in one of them. This holds regardless of the probabilities and the other outcome(s) in the lotteries.
    A ∼ B ⇒ [p, A; 1 − p, C] ∼ [p, B; 1 − p, C].
This also holds if we substitute ≻ for ∼ in this axiom.

Monotonicity: Suppose two lotteries have the same two possible outcomes, A and B. If an agent prefers A to B, then the agent must prefer the lottery that has a higher probability for A (and vice versa).
    A ≻ B ⇒ (p > q ⇔ [p, A; 1 − p, B] ≻ [q, A; 1 − q, B]).

Decomposability: Compound lotteries can be reduced to simpler ones using the laws of probability. This has been called the "no fun in gambling" rule because it says that two consecutive lotteries can be compressed into a single equivalent lottery, as shown in Figure 16.1(b).[3]
    [p, A; 1 − p, [q, B; 1 − q, C]] ∼ [p, A; (1 − p)q, B; (1 − p)(1 − q), C].

[3] We can account for the enjoyment of gambling by encoding gambling events into the state description; for example, "Have $10 and gambled" could be preferred to "Have $10 and didn't gamble."

These constraints are known as the axioms of utility theory. Each axiom can be motivated by showing that an agent that violates it will exhibit patently irrational behavior in some situations. For example, we can motivate transitivity by making an agent with nontransitive preferences give us all its money. Suppose that the agent has the nontransitive preferences A ≻ B ≻ C ≻ A, where A, B, and C are goods that can be freely exchanged. If the agent currently has A, then we could offer to trade C for A plus one cent. The agent prefers C, and so would be willing to make this trade. We could then offer to trade B for C, extracting another cent, and finally trade A for B. This brings us back where we started from, except that the agent has given us three cents (Figure 16.1(a)). We can keep going around the cycle until the agent has no money at all. Clearly, the agent has acted irrationally in this case.
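The money-pump argument can be played out mechanically. The sketch below simulates an agent with the cyclic preferences A ≻ B ≻ C ≻ A being led through the one-cent trades described above; the goods and trading protocol follow the text, everything else is illustrative.

```python
# Simulation of the money-pump argument: an agent with cyclic preferences
# (C preferred to A, B preferred to C, A preferred to B) pays one cent for
# every upgrade it accepts and can be led around the cycle indefinitely.

def money_pump(rounds=10):
    prefers = {("C", "A"), ("B", "C"), ("A", "B")}   # (offered, held): offered is preferred
    holding, cents_paid = "A", 0
    for _ in range(rounds):
        for offered in ("C", "B", "A"):
            if (offered, holding) in prefers:        # the agent prefers what is offered
                holding, cents_paid = offered, cents_paid + 1
    return holding, cents_paid

print(money_pump(10))   # ('A', 30): back where it started, thirty cents poorer
```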
16.2.2 Preferences lead to utility

Notice that the axioms of utility theory are really axioms about preferences—they say nothing about a utility function. But in fact from the axioms of utility we can derive the following consequences (for the proof, see von Neumann and Morgenstern, 1944):

Existence of Utility Function: If an agent's preferences obey the axioms of utility, then there exists a function U such that U(A) > U(B) if and only if A is preferred to B, and U(A) = U(B) if and only if the agent is indifferent between A and B:

    U(A) > U(B)  ⇔  A ≻ B
    U(A) = U(B)  ⇔  A ∼ B.

Expected Utility of a Lottery: The utility of a lottery is the sum of the probability of each outcome times the utility of that outcome:

    U([p1, S1; ...; pn, Sn]) = Σ_i pi U(Si).

[Figure 16.1: (a) A cycle of exchanges showing that the nontransitive preferences A ≻ B ≻ C ≻ A result in irrational behavior. (b) The decomposability axiom.]

In other words, once the probabilities and utilities of the possible outcome states are specified, the utility of a compound lottery involving those states is completely determined. Because the outcome of a nondeterministic action is a lottery, it follows that an agent can act rationally—that is, consistently with its preferences—only by choosing an action that maximizes expected utility according to Equation (16.1).

The preceding theorems establish that a utility function exists for any rational agent, but they do not establish that it is unique. It is easy to see, in fact, that an agent's behavior would not change if its utility function U(S) were transformed according to

    U′(S) = aU(S) + b,        (16.2)

where a and b are constants and a > 0; an affine transformation.[4] This fact was noted in Chapter 5 for two-player games of chance; here, we see that it is completely general.

[4] In this sense, utilities resemble temperatures: a temperature in Fahrenheit is 1.8 times the Celsius temperature plus 32. You get the same results in either measurement system.

As in game playing, in a deterministic environment an agent just needs a preference ranking on states—the numbers don't matter. This is called a value function or ordinal utility function.

It is important to remember that the existence of a utility function that describes an agent's preference behavior does not necessarily mean that the agent is explicitly maximizing that utility function in its own deliberations. As we showed in Chapter 2, rational behavior can be generated in any number of ways. By observing a rational agent's preferences, however, an observer can construct the utility function that represents what the agent is actually trying to achieve (even if the agent doesn't know it).

16.3 Utility Functions

Utility is a function that maps from lotteries to real numbers. We know there are some axioms on utilities that all rational agents must obey. Is that all we can say about utility functions? Strictly speaking, that is it: an agent can have any preferences it likes. For example, an agent might prefer to have a prime number of dollars in its bank account; in which case, if it had $16 it would give away $3. This might be unusual, but we can't call it irrational. An agent might prefer a dented 1973 Ford Pinto to a shiny new Mercedes. Preferences can also interact: for example, the agent might prefer prime numbers of dollars only when it owns the Pinto, but when it owns the Mercedes, it might prefer more dollars to fewer. Fortunately, the preferences of real agents are usually more systematic, and thus easier to deal with.
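Before turning to how utilities are assessed in practice, here is a small sketch tying together two of the results above: the utility of a lottery is Σ_i pi U(Si), and an affine transformation U′(S) = aU(S) + b with a > 0 leaves every comparison between lotteries (and hence every decision) unchanged. The states, lotteries, and numbers are invented for illustration.

```python
# Lottery utilities and invariance under a positive affine transformation.

def lottery_utility(lottery, U):
    """A lottery is a list of (probability, outcome) pairs."""
    return sum(p * U[s] for p, s in lottery)

U = {"A": 1.0, "B": 0.4, "C": 0.0}
L1 = [(0.8, "A"), (0.2, "C")]        # utility 0.8 under U
L2 = [(1.0, "B")]                    # utility 0.4 under U

U_affine = {s: 100 * u + 7 for s, u in U.items()}   # U'(S) = 100*U(S) + 7

same_order = (lottery_utility(L1, U) > lottery_utility(L2, U)) == \
             (lottery_utility(L1, U_affine) > lottery_utility(L2, U_affine))
print(lottery_utility(L1, U_affine), lottery_utility(L2, U_affine), same_order)
# 87.0 47.0 True: the numbers change, but the preferred lottery does not.
```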
16.3.1 Utility assessment and utility scales If we want to build a decision-theoretic system that helps the agent make decisions or acts on his or her behalf, we must first work out what the agent’s utility function is. This process, PREFERENCE ELICITATION often called preference elicitation, involves presenting choices to the agent and using the observed preferences to pin down the underlying utility function. Equation (16.2) says that there is no absolute scale for utilities, but it is helpful, nonethe- less, to establish some scale on which utilities can be recorded and compared for any particu- lar problem. A scale can be established by fixing the utilities of any two particular outcomes, just as we fix a temperature scale by fixing the freezing point and boiling point of water. Typically, we fix the utility of a “best possible prize” at U (S) = u and a “worst possible NORMALIZED UTILITIES catastrophe” at U (S) = u⊥. Normalized utilities use a scale with u⊥ = 0 and u = 1. Given a utility scale between u and u⊥ , we can assess the utility of any particular STANDARD LOTTERY prize S by asking the agent to choose between S and a standard lottery [p, u ; (1 − p), u⊥ ]. The probability p is adjusted until the agent is indifferent between S and the standard lottery. Assuming normalized utilities, the utility of S is given by p. Once this is done for each prize, the utilities for all lotteries involving those prizes are determined. In medical, transportation, and environmental decision problems, among others, peo- ple’s lives are at stake. In such cases, u⊥ is the value assigned to immediate death (or perhaps many deaths). Although nobody feels comfortable with putting a value on human life, it is a fact that tradeoffs are made all the time. Aircraft are given a complete overhaul at intervals determined by trips and miles flown, rather than after every trip. Cars are manufactured in a way that trades off costs against accident survival rates. Paradoxically, a refusal to “put a monetary value on life” means that life is often undervalued. Ross Shachter relates an ex- perience with a government agency that commissioned a study on removing asbestos from schools. The decision analysts performing the study assumed a particular dollar value for the life of a school-age child, and argued that the rational choice under that assumption was to remove the asbestos. The agency, morally outraged at the idea of setting the value of a life, rejected the report out of hand. It then decided against asbestos removal—implicitly asserting a lower value for the life of a child than that assigned by the analysts. 616 Chapter 16. Making Simple Decisions Some attempts have been made to find out the value that people place on their own MICROMORT lives. One common “currency” used in medical and safety analysis is the micromort, a one in a million chance of death. If you ask people how much they would pay to avoid a risk—for example, to avoid playing Russian roulette with a million-barreled revolver—they will respond with very large numbers, perhaps tens of thousands of dollars, but their actual behavior reflects a much lower monetary value for a micromort. For example, driving in a car for 230 miles incurs a risk of one micromort; over the life of your car—say, 92,000 miles— that’s 400 micromorts. People appear to be willing to pay about $10,000 (at 2009 prices) more for a safer car that halves the risk of death, or about $50 per micromort. 
A number of studies have confirmed a figure in this range across many individuals and risk types. Of course, this argument holds only for small risks. Most people won't agree to kill themselves for $50 million.

Another measure is the QALY, or quality-adjusted life year. Patients with a disability are willing to accept a shorter life expectancy to be restored to full health. For example, kidney patients on average are indifferent between living two years on a dialysis machine and one year at full health.

16.3.2 The utility of money

Utility theory has its roots in economics, and economics provides one obvious candidate for a utility measure: money (or, more specifically, an agent's total net assets). The almost universal exchangeability of money for all kinds of goods and services suggests that money plays a significant role in human utility functions.

It will usually be the case that an agent prefers more money to less, all other things being equal. We say that the agent exhibits a monotonic preference for more money. This does not mean that money behaves as a utility function, because it says nothing about preferences between lotteries involving money.

Suppose you have triumphed over the other competitors in a television game show. The host now offers you a choice: either you can take the $1,000,000 prize or you can gamble it on the flip of a coin. If the coin comes up heads, you end up with nothing, but if it comes up tails, you get $2,500,000. If you're like most people, you would decline the gamble and pocket the million. Are you being irrational?

Assuming the coin is fair, the expected monetary value (EMV) of the gamble is ½($0) + ½($2,500,000) = $1,250,000, which is more than the original $1,000,000. But that does not necessarily mean that accepting the gamble is a better decision. Suppose we use Sn to denote the state of possessing total wealth $n, and that your current wealth is $k. Then the expected utilities of the two actions of accepting and declining the gamble are

    EU(Accept) = ½ U(Sk) + ½ U(Sk+2,500,000),
    EU(Decline) = U(Sk+1,000,000).

To determine what to do, we need to assign utilities to the outcome states. Utility is not directly proportional to monetary value, because the utility for your first million is very high (or so they say), whereas the utility for an additional million is smaller. Suppose you assign a utility of 5 to your current financial status (Sk), a 9 to the state Sk+2,500,000, and an 8 to the state Sk+1,000,000. Then the rational action would be to decline, because the expected utility of accepting is only 7 (less than the 8 for declining). On the other hand, a billionaire would most likely have a utility function that is locally linear over the range of a few million more, and thus would accept the gamble.

[Figure 16.2: The utility of money. (a) Empirical data for Mr. Beard over a limited range. (b) A typical curve for the full range.]

In a pioneering study of actual utility functions, Grayson (1960) found that the utility of money was almost exactly proportional to the logarithm of the amount. (This idea was first suggested by Bernoulli (1738); see Exercise 16.3.) One particular utility curve, for a certain Mr. Beard, is shown in Figure 16.2(a). The data obtained for Mr.
Beard’s preferences are consistent with a utility function U (Sk+n ) = −263.31 + 22.09 log(n + 150, 000) for the range between n = −$150, 000 and n = $800, 000. We should not assume that this is the definitive utility function for monetary value, but it is likely that most people have a utility function that is concave for positive wealth. Going into debt is bad, but preferences between different levels of debt can display a reversal of the concavity associated with positive wealth. For example, someone already $10,000,000 in debt might well accept a gamble on a fair coin with a gain of $10,000,000 for heads and a loss of $20,000,000 for tails.5 This yields the S-shaped curve shown in Figure 16.2(b). If we restrict our attention to the positive part of the curves, where the slope is decreas- ing, then for any lottery L, the utility of being faced with that lottery is less than the utility of being handed the expected monetary value of the lottery as a sure thing: U (L) < U (SEMV (L) ). RISK-AVERSE That is, agents with curves of this shape are risk-averse: they prefer a sure thing with a payoff that is less than the expected monetary value of a gamble. On the other hand, in the RISK-SEEKING “desperate” region at large negative wealth in Figure 16.2(b), the behavior is risk-seeking. 5 Such behavior might be called desperate, but it is rational if one is already in a desperate situation. 618 Chapter 16. Making Simple Decisions CERTAINTY EQUIVALENT The value an agent will accept in lieu of a lottery is called the certainty equivalent of the lottery. Studies have shown that most people will accept about $400 in lieu of a gamble that gives $1000 half the time and $0 the other half—that is, the certainty equivalent of the lottery is $400, while the EMV is $500. The difference between the EMV of a lottery and its certainty INSURANCE PREMIUM equivalent is called the insurance premium. Risk aversion is the basis for the insurance industry, because it means that insurance premiums are positive. People would rather pay a small insurance premium than gamble the price of their house against the chance of a fire. From the insurance company’s point of view, the price of the house is very small compared with the firm’s total reserves. This means that the insurer’s utility curve is approximately linear over such a small region, and the gamble costs the company almost nothing. Notice that for small changes in wealth relative to the current wealth, almost any curve RISK-NEUTRAL will be approximately linear. An agent that has a linear curve is said to be risk-neutral. For gambles with small sums, therefore, we expect risk neutrality. In a sense, this justifies the simplified procedure that proposed small gambles to assess probabilities and to justify the axioms of probability in Section 13.2.3. 16.3.3 Expected utility and post-decision disappointment The rational way to choose the best action, a∗ , is to maximize expected utility: a∗ = argmax EU (a|e). a If we have calculated the expected utility correctly according to our probability model, and if the probability model correctly reflects the underlying stochastic processes that generate the outcomes, then, on average, we will get the utility we expect if the whole process is repeated many times. 
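The claim in the preceding paragraph is easy to check numerically: if the outcome model is correct, the average utility actually received over many repetitions of the same decision converges to EU(a|e). The toy outcome distribution below is invented for illustration.

```python
# Monte Carlo check: with a correct outcome model, average realized utility
# converges to the expected utility computed from the model.

import random

outcome_model = [(0.7, 10.0), (0.2, 4.0), (0.1, -5.0)]   # (P(s' | a, e), U(s')) pairs
eu = sum(p * u for p, u in outcome_model)                 # expected utility = 7.3

def sample_utility():
    r, cumulative = random.random(), 0.0
    for p, u in outcome_model:
        cumulative += p
        if r < cumulative:
            return u
    return outcome_model[-1][1]

trials = 100_000
average = sum(sample_utility() for _ in range(trials)) / trials
print(eu, round(average, 2))   # the two values agree closely for large `trials`
```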
In reality, however, our model usually oversimplifies the real situation, either because we don't know enough (e.g., when making a complex investment decision) or because the computation of the true expected utility is too difficult (e.g., when estimating the utility of successor states of the root node in backgammon). In that case, we are really working with estimates ÊU(a|e) of the true expected utility. We will assume, kindly perhaps, that the estimates are unbiased, that is, the expected value of the error, E(ÊU(a|e) − EU(a|e)), is zero. In that case, it still seems reasonable to choose the action with the highest estimated utility and to expect to receive that utility, on average, when the action is executed.

Unfortunately, the real outcome will usually be significantly worse than we estimated, even though the estimate was unbiased! To see why, consider a decision problem in which there are k choices, each of which has a true utility of 0. Suppose that the error in each utility estimate has zero mean and standard deviation of 1, shown as the bold curve in Figure 16.3. Now, as we actually start to generate the estimates, some of the errors will be negative (pessimistic) and some will be positive (optimistic). Because we select the action with the highest utility estimate, we are obviously favoring the overly optimistic estimates, and that is the source of the bias. It is a straightforward matter to calculate the distribution of the maximum of the k estimates (see Exercise 16.11) and hence quantify the extent of our disappointment. The curve in Figure 16.3 for k = 3 has a mean around 0.85, so the average disappointment will be about 85% of the standard deviation in the utility estimates. With more choices, extremely optimistic estimates are more likely to arise: for k = 30, the disappointment will be around twice the standard deviation in the estimates.

[Figure 16.3: Plot of the error in each of k utility estimates and of the distribution of the maximum of k estimates for k = 3, 10, and 30.]

This tendency for the estimated expected utility of the best choice to be too high is called the optimizer's curse (Smith and Winkler, 2006). It afflicts even the most seasoned decision analysts and statisticians. Serious manifestations include believing that an exciting new drug that has cured 80% of patients in a trial will cure 80% of patients (it's been chosen from k = thousands of candidate drugs) or that a mutual fund advertised as having above-average returns will continue to have them (it's been chosen to appear in the advertisement out of k = dozens of funds in the company's overall portfolio). It can even be the case that what appears to be the best choice may not be, if the variance in the utility estimate is high: a drug, selected from thousands tried, that has cured 9 of 10 patients is probably worse than one that has cured 800 of 1000.

The optimizer's curse crops up everywhere because of the ubiquity of utility-maximizing selection processes, so taking the utility estimates at face value is a bad idea. We can avoid the curse by using an explicit probability model P(ÊU | EU) of the error in the utility estimates.
Given this model and a prior P(EU ) on what we might reasonably expect the utilities to be, we treat the utility estimate, once obtained, as evidence and compute the posterior distribution for the true utility using Bayes’ rule. 16.3.4 Human judgment and irrationality NORMATIVE THEORY Decision theory is a normative theory: it describes how a rational agent should act. A DESCRIPTIVE THEORY descriptive theory, on the other hand, describes how actual agents—for example, humans— really do act. The application of economic theory would be greatly enhanced if the two coincided, but there appears to be some experimental evidence to the contrary. The evidence suggests that humans are “predictably irrational” (Ariely, 2009). 620 Chapter 16. Making Simple Decisions The best-known problem is the Allais paradox (Allais, 1953). People are given a choice between lotteries A and B and then between C and D, which have the following prizes: A : 80% chance of $4000 C : 20% chance of $4000 B : 100% chance of $3000 D : 25% chance of $3000 Most people consistently prefer B over A (taking the sure thing), and C over D (taking the higher EMV). The normative analysis disagrees! We can see this most easily if we use the freedom implied by Equation (16.2) to set U ($0) = 0. In that case, then B ' A implies that U ($3000) > 0.8 U ($4000), whereas C ' D implies exactly the reverse. In other words, there is no utility function that is consistent with these choices. One explanation for CERTAINTY EFFECT the apparently irrational preferences is the certainty effect (Kahneman and Tversky, 1979): people are strongly attracted to gains that are certain. There are several reasons why this may be so. First, people may prefer to reduce their computational burden; by choosing certain outcomes, they don’t have to compute with probabilities. But the effect persists even when the computations involved are very easy ones. Second, people may distrust the legitimacy of the stated probabilities. I trust that a coin flip is roughly 50/50 if I have control over the coin and the flip, but I may distrust the result if the flip is done by someone with a vested interest in the outcome.6 In the presence of distrust, it might be better to go for the sure thing.7 Third, people may be accounting for their emotional state as well as their financial state. People REGRET know they would experience regret if they gave up a certain reward (B) for an 80% chance at a higher reward and then lost. In other words, if A is chosen, there is a 20% chance of getting no money and feeling like a complete idiot, which is worse than just getting no money. So perhaps people who choose B over A and C over D are not being irrational; they are just saying that they are willing to give up $200 of EMV to avoid a 20% chance of feeling like an idiot. A related problem is the Ellsberg paradox. Here the prizes are fixed, but the probabilities are underconstrained. Your payoff will depend on the color of a ball chosen from an urn. You are told that the urn contains 1/3 red balls, and 2/3 either black or yellow balls, but you don’t know how many black and how many yellow. Again, you are asked whether you prefer lottery A or B; and then C or D: A : $100 for a red ball C : $100 for a red or yellow ball B : $100 for a black ball D : $100 for a black or yellow ball. It should be clear that if you think there are more red than black balls then you should prefer A over B and C over D; if you think there are fewer red than black you should prefer the opposite. 
But it turns out that most people prefer A over B and also prefer D over C, even though there is no state of the world for which this is rational. It seems that people have AMBIGUITY AVERSION ambiguity aversion: A gives you a 1/3 chance of winning, while B could be anywhere between 0 and 2/3. Similarly, D gives you a 2/3 chance, while C could be anywhere between 1/3 and 3/3. Most people elect the known probability rather than the unknown unknowns. 6 For example, the mathematician/magician Persi Diaconis can make a coin flip come out the way he wants every time (Landhuis, 2004). 7 Even the sure thing may not be certain. Despite cast-iron promises, we have not yet received that $27,000,000 from the Nigerian bank account of a previously unknown deceased relative. Section 16.3. Utility Functions 621 Yet another problem is that the exact wording of a decision problem can have a big FRAMING EFFECT impact on the agent’s choices; this is called the framing effect. Experiments show that people like a medical procedure that it is described as having a “90% survival rate” about twice as much as one described as having a “10% death rate,” even though these two statements mean exactly the same thing. This discrepancy in judgment has been found in multiple experiments and is about the same whether the subjects were patients in a clinic, statistically sophisticated business school students, or experienced doctors. People feel more comfortable making relative utility judgments rather than absolute ones. I may have little idea how much I might enjoy the various wines offered by a restaurant. The restaurant takes advantage of this by offering a $200 bottle that it knows nobody will buy, but which serves to skew upward the customer’s estimate of the value of all wines and make ANCHORING EFFECT the $55 bottle seem like a bargain. This is called the anchoring effect. If human informants insist on contradictory preference judgments, there is nothing that automated agents can do to be consistent with them. Fortunately, preference judgments made by humans are often open to revision in the light of further consideration. Paradoxes like the Allais paradox are greatly reduced (but not eliminated) if the choices are explained bet- ter. In work at the Harvard Business School on assessing the utility of money, Keeney and Raiffa (1976, p. 210) found the following: Subjects tend to be too risk-averse in the small and therefore... the fitted utility functions exhibit unacceptably large risk premiums for lotteries with a large spread.... Most of the subjects, however, can reconcile their inconsistencies and feel that they have learned an important lesson about how they want to behave. As a consequence, some subjects cancel their automobile collision insurance and take out more term insurance on their lives. The evidence for human irrationality is also questioned by researchers in the field of evo- EVOLUTIONARY PSYCHOLOGY lutionary psychology, who point to the fact that our brain’s decision-making mechanisms did not evolve to solve word problems with probabilities and prizes stated as decimal num- bers. Let us grant, for the sake of argument, that the brain has built-in neural mechanism for computing with probabilities and utilities, or something functionally equivalent; if so, the required inputs would be obtained through accumulated experience of outcomes and rewards rather than through linguistic presentations of numerical values. 
It is far from obvious that we can directly access the brain’s built-in neural mechanisms by presenting decision problems in linguistic/numerical form. The very fact that different wordings of the same decision prob- lem elicit different choices suggests that the decision problem itself is not getting through. Spurred by this observation, psychologists have tried presenting problems in uncertain rea- soning and decision making in “evolutionarily appropriate” forms; for example, instead of saying “90% survival rate,” the experimenter might show 100 stick-figure animations of the operation, where the patient dies in 10 of them and survives in 90. (Boredom is a complicat- ing factor in these experiments!) With decision problems posed in this way, people seem to be much closer to rational behavior than previously suspected. 622 Chapter 16. Making Simple Decisions 16.4 M ULTIATTRIBUTE U TILITY F UNCTIONS Decision making in the field of public policy involves high stakes, in both money and lives. For example, in deciding what levels of harmful emissions to allow from a power plant, pol- icy makers must weigh the prevention of death and disability against the benefit of the power and the economic burden of mitigating the emissions. Siting a new airport requires consid- eration of the disruption caused by construction; the cost of land; the distance from centers of population; the noise of flight operations; safety issues arising from local topography and weather conditions; and so on. Problems like these, in which outcomes are characterized by MULTIATTRIBUTE UTILITY THEORY two or more attributes, are handled by multiattribute utility theory. We will call the attributes X = X1 ,... , Xn ; a complete vector of assignments will be x = x1 ,... , xn , where each xi is either a numeric value or a discrete value with an assumed ordering on values. We will assume that higher values of an attribute correspond to higher utilities, all other things being equal. For example, if we choose AbsenceOfNoise as an attribute in the airport problem, then the greater its value, the better the solution.8 We begin by examining cases in which decisions can be made without combining the attribute values into a single utility value. Then we look at cases in which the utilities of attribute combinations can be specified very concisely. 16.4.1 Dominance Suppose that airport site S1 costs less, generates less noise pollution, and is safer than site S2. STRICT DOMINANCE One would not hesitate to reject S2. We then say that there is strict dominance of S1 over S2. In general, if an option is of lower value on all attributes than some other option, it need not be considered further. Strict dominance is often very useful in narrowing down the field of choices to the real contenders, although it seldom yields a unique choice. Figure 16.4(a) shows a schematic diagram for the two-attribute case. That is fine for the deterministic case, in which the attribute values are known for sure. What about the general case, where the outcomes are uncertain? A direct analog of strict dominance can be constructed, where, despite the uncertainty, all possible concrete outcomes for S1 strictly dominate all possible outcomes for S2. (See Figure 16.4(b).) Of course, this will probably occur even less often than in the deterministic case. STOCHASTIC DOMINANCE Fortunately, there is a more useful generalization called stochastic dominance, which occurs very frequently in real problems. 
Stochastic dominance is easiest to understand in the context of a single attribute. Suppose we believe that the cost of siting the airport at S1 is uniformly distributed between $2.8 billion and $4.8 billion and that the cost at S2 is uniformly distributed between $3 billion and $5.2 billion. Figure 16.5(a) shows these distributions, with cost plotted as a negative value. Then, given only the information that utility decreases with cost, we can say that S1 stochastically dominates S2 (i.e., S2 can be discarded). It is important to note that this does not follow from comparing the expected costs. For example, if we knew the cost of S1 to be exactly $3.8 billion, then we would be unable to make a decision without additional information on the utility of money. (It might seem odd that more information on the cost of S1 could make the agent less able to decide. The paradox is resolved by noting that in the absence of exact cost information, the decision is easier to make but is more likely to be wrong.)

[8] In some cases, it may be necessary to subdivide the range of values so that utility varies monotonically within each range. For example, if the RoomTemperature attribute has a utility peak at 70°F, we would split it into two attributes measuring the difference from the ideal, one colder and one hotter. Utility would then be monotonically increasing in each attribute.

[Figure 16.4: Strict dominance. (a) Deterministic: Option A is strictly dominated by B but not by C or D. (b) Uncertain: A is strictly dominated by B but not by C.]

[Figure 16.5: Stochastic dominance. (a) S1 stochastically dominates S2 on cost. (b) Cumulative distributions for the negative cost of S1 and S2.]

The exact relationship between the attribute distributions needed to establish stochastic dominance is best seen by examining the cumulative distributions, shown in Figure 16.5(b). (See also Appendix A.) The cumulative distribution measures the probability that the cost is less than or equal to any given amount—that is, it integrates the original distribution. If the cumulative distribution for S1 is always to the right of the cumulative distribution for S2, then, stochastically speaking, S1 is cheaper than S2. Formally, if two actions A1 and A2 lead to probability distributions p1(x) and p2(x) on attribute X, then A1 stochastically dominates A2 on X if

    ∀x  ∫_{−∞}^{x} p1(x′) dx′  ≤  ∫_{−∞}^{x} p2(x′) dx′.

The relevance of this definition to the selection of optimal decisions comes from the following property: if A1 stochastically dominates A2, then for any monotonically nondecreasing utility function U(x), the expected utility of A1 is at least as high as the expected utility of A2. Hence, if an action is stochastically dominated by another action on all attributes, then it can be discarded.

The stochastic dominance condition might seem rather technical and perhaps not so easy to evaluate without extensive probability calculations. In fact, it can be decided very easily in many cases. Suppose, for example, that the construction transportation cost depends on the distance to the supplier. The cost itself is uncertain, but the greater the distance, the greater the cost.
If S1 is closer than S2 , then S1 will dominate S2 on cost. Although we will not present them here, there exist algorithms for propagating this kind of qualitative QUALITATIVE PROBABILISTIC information among uncertain variables in qualitative probabilistic networks, enabling a NETWORKS system to make rational decisions based on stochastic dominance, without using any numeric values. 16.4.2 Preference structure and multiattribute utility Suppose we have n attributes, each of which has d distinct possible values. To specify the complete utility function U (x1 ,... , xn ), we need dn values in the worst case. Now, the worst case corresponds to a situation in which the agent’s preferences have no regularity at all. Mul- tiattribute utility theory is based on the supposition that the preferences of typical agents have much more structure than that. The basic approach is to identify regularities in the preference REPRESENTATION THEOREM behavior we would expect to see and to use what are called representation theorems to show that an agent with a certain kind of preference structure has a utility function U (x1 ,... , xn ) = F [f1 (x1 ),... , fn (xn )] , where F is, we hope, a simple function such as addition. Notice the similarity to the use of Bayesian networks to decompose the joint probability of several random variables. Preferences without uncertainty Let us begin with the deterministic case. Remember that for deterministic environments the agent has a value function V (x1 ,... , xn ); the aim is to represent this function concisely. The basic regularity that arises in deterministic preference structures is called preference PREFERENCE INDEPENDENCE independence. Two attributes X1 and X2 are preferentially independent of a third attribute X3 if the preference between outcomes x1 , x2 , x3  and x1 , x2 , x3  does not depend on the particular value x3 for attribute X3. Going back to the airport example, where we have (among other attributes) Noise, Cost , and Deaths to consider, one may propose that Noise and Cost are preferentially inde- Section 16.4. Multiattribute Utility Functions 625 pendent of Deaths. For example, if we prefer a state with 20,000 people residing in the flight path and a construction cost of $4 billion over a state with 70,000 people residing in the flight path and a cost of $3.7 billion when the safety level is 0.06 deaths per million passenger miles in both cases, then we would have the same preference when the safety level is 0.12 or 0.03; and the same independence would hold for preferences between any other pair of values for Noise and Cost. It is also apparent that Cost and Deaths are preferentially independent of Noise and that Noise and Deaths are preferentially independent of Cost. We say that the MUTUAL PREFERENTIAL set of attributes {Noise, Cost , Deaths} exhibits mutual preferential independence (MPI). INDEPENDENCE MPI says that, whereas each attribute may be important, it does not affect the way in which one trades off the other attributes against each other. Mutual preferential independence is something of a mouthful, but thanks to a remark- able theorem due to the economist Gérard Debreu (1960), we can derive from it a very simple form for the agent’s value function: If attributes X1 ,... , Xn are mutually preferentially in- dependent, then the agent’s preference behavior can be described as maximizing the function V (x1 ,... , xn ) = Vi (xi ) , i where each Vi is a value function referring only to the attribute Xi. 
For example, it might well be the case that the airport decision can be made using a value function V (noise, cost , deaths ) = −noise × 104 − cost − deaths × 1012. ADDITIVE VALUE FUNCTION A value function of this type is called an additive value function. Additive functions are an extremely natural way to describe an agent’s preferences and are valid in many real-world situations. For n attributes, assessing an additive value function requires assessing n separate one-dimensional value functions rather than one n-dimensional function; typically, this repre- sents an exponential reduction in the number of preference experiments that are needed. Even when MPI does not strictly hold, as might be the case at extreme values of the attributes, an additive value function might still provide a good approximation to the agent’s preferences. This is especially true when the violations of MPI occur in portions of the attribute ranges that are unlikely to occur in practice. To understand MPI better, it helps to look at cases where it doesn’t hold. Suppose you are at a medieval market, considering the purchase of some hunting dogs, some chickens, and some wicker cages for the chickens. The hunting dogs are very valuable, but if you don’t have enough cages for the chickens, the dogs will eat the chickens; hence, the tradeoff between dogs and chickens depends strongly on the number of cages, and MPI is violated. The existence of these kinds of interactions among various attributes makes it much harder to assess the overall value function. Preferences with uncertainty When uncertainty is present in the domain, we also need to consider the structure of prefer- ences between lotteries and to understand the resulting properties of utility functions, rather than just value functions. The mathematics of this problem can become quite complicated, so we present just one of the main results to give a flavor of what can be done. The reader is referred to Keeney and Raiffa (1976) for a thorough survey of the field. 626 Chapter 16. Making Simple Decisions UTILITY INDEPENDENCE The basic notion of utility independence extends preference independence to cover lotteries: a set of attributes X is utility independent of a set of attributes Y if preferences be- tween lotteries on the attributes in X are independent of the particular values of the attributes MUTUALLY UTILITY INDEPENDENT in Y. A set of attributes is mutually utility independent (MUI) if each of its subsets is utility-independent of the remaining attributes. Again, it seems reasonable to propose that the airport attributes are MUI. MUI implies that the agent’s behavior can be described using a multiplicative utility MULTIPLICATIVE UTILITY FUNCTION function (Keeney, 1974). The general form of a multiplicative utility function is best seen by looking at the case for three attributes. For conciseness, we use Ui to mean Ui (xi ): U = k1 U1 + k2 U2 + k3 U3 + k1 k2 U1 U2 + k2 k3 U2 U3 + k3 k1 U3 U1 + k1 k2 k3 U1 U2 U3. Although this does not look very simple, it contains just three single-attribute utility functions and three constants. In general, an n-attribute problem exhibiting MUI can be modeled using n single-attribute utilities and n constants. Each of the single-attribute utility functions can be developed independently of the other attributes, and this combination will be guaranteed to generate the correct overall preferences. Additional assumptions are required to obtain a purely additive utility function. 
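As a concrete illustration of these two preference structures, the sketch below evaluates the additive value function proposed earlier for the airport problem and a three-attribute multiplicative utility function. The single-attribute utilities and the constants k1, k2, k3 are invented placeholders, not values given in the text.

```python
# Additive value function and three-attribute multiplicative utility function.

def additive_value(noise, cost, deaths):
    """V(noise, cost, deaths) = -noise*10^4 - cost - deaths*10^12 (from the text)."""
    return -noise * 1e4 - cost - deaths * 1e12

def multiplicative_utility(u1, u2, u3, k1, k2, k3):
    """Three single-attribute utilities and three constants, combined as above."""
    return (k1 * u1 + k2 * u2 + k3 * u3
            + k1 * k2 * u1 * u2 + k2 * k3 * u2 * u3 + k3 * k1 * u3 * u1
            + k1 * k2 * k3 * u1 * u2 * u3)

# Example: normalized single-attribute utilities for one candidate site
# (20,000 people under the flight path, $4 billion cost, 0.06 deaths per
# million passenger miles); the normalizations are illustrative.
u_noise  = 1 - 20_000 / 100_000
u_cost   = 1 - 4.0e9 / 5.0e9
u_deaths = 1 - 0.06 / 1.0

print(additive_value(20_000, 4.0e9, 0.06))
print(multiplicative_utility(u_noise, u_cost, u_deaths, 0.5, 0.3, 0.2))
```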
16.5 D ECISION N ETWORKS In this section, we look at a general mechanism for making rational decisions. The notation INFLUENCE DIAGRAM is often called an influence diagram (Howard and Matheson, 1984), but we will use the DECISION NETWORK more descriptive term decision network. Decision networks combine Bayesian networks with additional node types for actions and utilities. We use airport siting as an example. 16.5.1 Representing a decision problem with a decision network In its most general form, a decision network represents information about the agent’s current state, its possible actions, the state that will result from the agent’s action, and the utility of that state. It therefore provides a substrate for implementing utility-based agents of the type first introduced in Section 2.4.5. Figure 16.6 shows a decision network for the airport siting problem. It illustrates the three types of nodes used: CHANCE NODES Chance nodes (ovals) represent random variables, just as they do in Bayesian networks. The agent could be uncertain about the construction cost, the level of air traffic and the potential for litigation, and the Deaths, Noise, and total Cost variables, each of which also depends on the site chosen. Each chance node has associated with it a conditional distribution that is indexed by the state of the parent nodes. In decision networks, the parent nodes can include decision nodes as well as chance nodes. Note that each of the current-state chance nodes could be part of a large Bayesian network for assessing construction costs, air traffic levels, or litigation potentials. DECISION NODES Decision nodes (rectangles) represent points where the decision maker has a choice of Section 16.5. Decision Networks 627 Airport Site Air Traffic Deaths Litigation Noise U Construction Cost Figure 16.6 A simple decision network for the airport-siting problem. actions. In this case, the AirportSite action can take on a different value for each site under consideration. The choice influences the cost, safety, and noise that will result. In this chapter, we assume that we are dealing with a single decision node. Chapter 17 deals with cases in which more than one decision must be made. UTILITY NODES Utility nodes (diamonds) represent the agent’s utility function.9 The utility node has as parents all variables describing the outcome that directly affect utility. Associated with the utility node is a description of the agent’s utility as a function of the parent attributes. The description could be just a tabulation of the function, or it might be a parameterized additive or linear function of the attribute values. A simplified form is also used in many cases. The notation remains identical, but the chance nodes describing the outcome state are omitted. Instead, the utility node is connected directly to the current-state nodes and the decision node. In this case, rather than representing a utility function on outcome states, the utility node represents the expected utility associated with each action, as defined in Equation (16.1) on page 611; that is, the node is associated ACTION-UTILITY FUNCTION with an action-utility function (also known as a Q-function in reinforcement learning, as described in Chapter 21). Figure 16.7 shows the action-utility representation of the airport siting problem. Notice that, because the Noise, Deaths, and Cost chance nodes in Figure 16.6 refer to future states, they can never have their values set as evidence variables. 
Thus, the simplified version that omits these nodes can be used whenever the more general form can be used. Although the simplified form contains fewer nodes, the omission of an explicit description of the outcome of the siting decision means that it is less flexible with respect to changes in circumstances. For example, in Figure 16.6, a change in aircraft noise levels can be reflected by a change in the conditional probability table associated with the Noise node, whereas a change in the weight accorded to noise pollution in the utility function can be reflected by a change in the utility table. In the action-utility diagram, Figure 16.7, on the other hand, all such changes have to be reflected by changes to the action-utility table. Essentially, the action-utility formulation is a compiled version of the original formulation.

[9] These nodes are also called value nodes in the literature.

[Figure 16.7: A simplified representation of the airport-siting problem. Chance nodes corresponding to outcome states have been factored out.]

16.5.2 Evaluating decision networks

Actions are selected by evaluating the decision network for each possible setting of the decision node. Once the decision node is set, it behaves exactly like a chance node that has been set as an evidence variable. The algorithm for evaluating decision networks is the following:

1. Set the evidence variables for the current state.
2. For each possible value of the decision node:
   (a) Set the decision node to that value.
   (b) Calculate the posterior probabilities for the parent nodes of the utility node, using a standard probabilistic inference algorithm.
   (c) Calculate the resulting utility for the action.
3. Return the action with the highest utility.

This is a straightforward extension of the Bayesian network algorithm and can be incorporated directly into the agent design given in Figure 13.1 on page 484. We will see in Chapter 17 that the possibility of executing several actions in sequence makes the problem much more interesting.
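The evaluation procedure above is a thin wrapper around ordinary Bayesian network inference. The sketch below assumes such an inference routine is supplied as a function infer(query_vars, evidence, net) returning a posterior over joint assignments to the utility node's parents; that function, the network object, and all node names are placeholders rather than a specific library's API.

```python
# Sketch of the decision-network evaluation loop described above.

def evaluate_decision_network(net, decision_node, decision_values,
                              utility_parents, utility_fn, evidence, infer):
    best_action, best_eu = None, float("-inf")
    # Step 1: the evidence variables for the current state are already in `evidence`.
    for value in decision_values:                         # step 2
        ev = dict(evidence, **{decision_node: value})     # 2(a) set the decision node
        posterior = infer(utility_parents, ev, net)       # 2(b) posterior over utility parents
        eu = sum(p * utility_fn(outcome)                  # 2(c) resulting expected utility
                 for outcome, p in posterior.items())
        if eu > best_eu:
            best_action, best_eu = value, eu
    return best_action                                    # step 3: highest-utility action
```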
The value of any particular observation must derive from the potential to affect the agent’s eventual physical action; and this potential can be estimated directly from the decision model itself. 16.6.1 A simple example Suppose an oil company is hoping to buy one of n indistinguishable blocks of ocean-drilling rights. Let us assume further that exactly one of the blocks contains oil worth C dollars, while the others are worthless. The asking price of each block is C/n dollars. If the company is risk-neutral, then it will be indifferent between buying a block and not buying one. Now suppose that a seismologist offers the company the results of a survey of block number 3, which indicates definitively whether the block contains oil. How much should the company be willing to pay for the information? The way to answer this question is to examine what the company would do if it had the information: With probability 1/n, the survey will indicate oil in block 3. In this case, the company will buy block 3 for C/n dollars and make a profit of C − C/n = (n − 1)C/n dollars. With probability (n−1)/n, the survey will show that the block contains no oil, in which case the company will buy a different block. Now the probability of finding oil in one of the other blocks changes from 1/n to 1/(n − 1), so the company makes an expected profit of C/(n − 1) − C/n = C/n(n − 1) dollars. Now we can calculate the expected profit, given the survey information: 1 (n − 1)C n−1 C × + × = C/n. n n n n(n − 1) Therefore, the company should be willing to pay the seismologist up to C/n dollars for the information: the information is worth as much as the block itself. The value of information derives from the fact that with the information, one’s course of action can be changed to suit the actual situation. One can discriminate according to the situation, whereas without the information, one has to do what’s best on average over the possible situations. In general, the value of a given piece of information is defined to be the difference in expected value between best actions before and after information is obtained. 10 In the United States, the only question that is always asked beforehand is whether the patient has insurance. 630 Chapter 16. Making Simple Decisions 16.6.2 A general formula for perfect information It is simple to derive a general mathematical formula for the value of information. We assume that exact evidence can be obtained about the value of some random variable Ej (that is, we VALUE OF PERFECT INFORMATION learn Ej = ej ), so the phrase value of perfect information (VPI) is used.11 Let the agent’s initial evidence be e. Then the value of the current best action α is defined by EU (α|e) = max P (R ESULT (a) = s | a, e) U (s ) , a s and the value of the new best action (after the new evidence Ej = ej is obtained) will be EU (αej |e, ej ) = max P (R ESULT (a) = s | a, e, ej ) U (s ). a s But Ej is a random variable whose value is currently unknown, so to determine the value of discovering Ej , given current information e we must average over all possible values ejk that we might discover for Ej , using our current beliefs about its value:  VPI e (Ej ) = P (Ej = ejk |e) EU (αejk |e, Ej = ejk ) − EU (α|e). k To get some intuition for this formula, consider the simple case where there are only two actions, a1 and a2 , from which to choose. Their current expected utilities are U1 and U2. 
The information Ej = ejk will yield some new expected utilities U1′ and U2′ for the actions, but before we obtain Ej, we will have some probability distributions over the possible values of U1′ and U2′ (which we assume are independent).

Suppose that a1 and a2 represent two different routes through a mountain range in winter. a1 is a nice, straight highway through a low pass, and a2 is a winding dirt road over the top. Just given this information, a1 is clearly preferable, because it is quite possible that a2 is blocked by avalanches, whereas it is unlikely that anything blocks a1. U1 is therefore clearly higher than U2. It is possible to obtain satellite reports Ej on the actual state of each road that would give new expectations, U1′ and U2′, for the two crossings. The distributions for these expectations are shown in Figure 16.8(a). Obviously, in this case, it is not worth the expense of obtaining satellite reports, because it is unlikely that the information derived from them will change the plan. With no change, information has no value.

Figure 16.8 Three generic cases for the value of information. In (a), a1 will almost certainly remain superior to a2, so the information is not needed. In (b), the choice is unclear and the information is crucial. In (c), the choice is unclear, but because it makes little difference, the information is less valuable. (Note: The fact that U2 has a high peak in (c) means that its expected value is known with higher certainty than U1.)

Now suppose that we are choosing between two different winding dirt roads of slightly different lengths and we are carrying a seriously injured passenger. Then, even when U1 and U2 are quite close, the distributions of U1′ and U2′ are very broad. There is a significant possibility that the second route will turn out to be clear while the first is blocked, and in this case the difference in utilities will be very high. The VPI formula indicates that it might be worthwhile getting the satellite reports. Such a situation is shown in Figure 16.8(b).

11 There is no loss of expressiveness in requiring perfect information. Suppose we wanted to model the case in which we become somewhat more certain about a variable. We can do that by introducing another variable about which we learn perfect information. For example, suppose we initially have broad uncertainty about the variable Temperature. Then we gain the perfect knowledge Thermometer = 37; this gives us imperfect information about the true Temperature, and the uncertainty due to measurement error is encoded in the sensor model P(Thermometer | Temperature). See Exercise 16.17 for another example.

Finally, suppose that we are choosing between the two dirt roads in summertime, when blockage by avalanches is unlikely. In this case, satellite reports might show one route to be more scenic than the other because of flowering alpine meadows, or perhaps wetter because of errant streams. It is therefore quite likely that we would change our plan if we had the information. In this case, however, the difference in value between the two routes is still likely to be very small, so we will not bother to obtain the reports. This situation is shown in Figure 16.8(c).

In sum, information has value to the extent that it is likely to cause a change of plan and to the extent that the new plan will be significantly better than the old plan.
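To make the general formula concrete, here is a minimal Python sketch of the VPI computation for a finite set of actions and a single discrete evidence variable, applied to an invented version of the two-route scenario. All of the outcome names, probabilities, and utilities below are illustrative assumptions, not values given in the text.

# A minimal sketch of the VPI formula for discrete outcomes and evidence.
# All inputs are passed in explicitly as dictionaries.

def expected_utility(action, P_outcome, U):
    # EU(a | e) = sum over outcomes s' of P(Result(a) = s' | a, e) * U(s')
    return sum(p * U[s] for s, p in P_outcome[action].items())

def vpi(actions, P_Ej, P_outcome_given_ej, P_outcome, U):
    # VPI_e(Ej) = sum_k P(Ej = ejk | e) * EU(alpha_ejk | e, ejk)  -  EU(alpha | e)
    eu_now = max(expected_utility(a, P_outcome, U) for a in actions)
    eu_after = sum(
        p_ejk * max(expected_utility(a, P_outcome_given_ej[ejk], U) for a in actions)
        for ejk, p_ejk in P_Ej.items())
    return eu_after - eu_now

# Illustrative use: a1 = highway, a2 = dirt road; Ej = satellite report on the dirt road.
U = {"arrive_fast": 10, "arrive_slow": 5, "turned_back": 0}
P_outcome = {"a1": {"arrive_slow": 1.0},
             "a2": {"arrive_fast": 0.4, "turned_back": 0.6}}
P_Ej = {"clear": 0.4, "blocked": 0.6}
P_outcome_given_ej = {
    "clear":   {"a1": {"arrive_slow": 1.0}, "a2": {"arrive_fast": 1.0}},
    "blocked": {"a1": {"arrive_slow": 1.0}, "a2": {"turned_back": 1.0}},
}
print(vpi(["a1", "a2"], P_Ej, P_outcome_given_ej, P_outcome, U))   # 2.0

With these invented numbers the agent's current best choice is the highway (expected utility 5 versus 4), but a clear report would switch the choice to the dirt road, so the report is worth 2 utility units in expectation.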
16.6.3 Properties of the value of information

One might ask whether it is possible for information to be deleterious: can it actually have negative expected value? Intuitively, one should expect this to be impossible. After all, one could in the worst case just ignore the information and pretend that one has never received it. This is confirmed by the following theorem, which applies to any decision-theoretic agent. The expected value of information is nonnegative:

    \forall\, e, E_j \quad VPI_e(E_j) \ge 0 .

The theorem follows directly from the definition of VPI, and we leave the proof as an exercise (Exercise 16.18). It is, of course, a theorem about expected value, not actual value. Additional information can easily lead to a plan that turns out to be worse than the original plan if the information happens to be misleading. For example, a medical test that gives a false positive result may lead to unnecessary surgery; but that does not mean that the test shouldn't be done.

It is important to remember that VPI depends on the current state of information, which is why it is subscripted. It can change as more information is acquired. For any given piece of evidence Ej, the value of acquiring it can go down (e.g., if another variable strongly constrains the posterior for Ej) or up (e.g., if another variable provides a clue on which Ej builds, enabling a new and better plan to be devised). Thus, VPI is not additive. That is,

    VPI_e(E_j, E_k) \ne VPI_e(E_j) + VPI_e(E_k) \quad \text{(in general)} .

VPI is, however, order independent. That is,

    VPI_e(E_j, E_k) = VPI_e(E_j) + VPI_{e,e_j}(E_k) = VPI_e(E_k) + VPI_{e,e_k}(E_j) .

Order independence distinguishes sensing actions from ordinary actions and simplifies the problem of calculating the value of a sequence of sensing actions.

16.6.4 Implementation of an information-gathering agent

A sensible agent should ask questions in a reasonable order, should avoid asking questions that are irrelevant, should take into account the importance of each piece of information in relation to its cost, and should stop asking questions when that is appropriate. All of these capabilities can be achieved by using the value of information as a guide.

Figure 16.9 shows the overall design of an agent that can gather information intelligently before acting. For now, we assume that with each observable evidence variable Ej, there is an associated cost, Cost(Ej), which reflects the cost of obtaining the evidence through tests, consultants, questions, or whatever. The agent requests what appears to be the most efficient observation in terms of utility gain per unit cost. We assume that the result of the action Request(Ej) is that the next percept provides the value of Ej. If no observation is worth its cost, the agent selects a "real" action.

The agent algorithm we have described implements a form of information gathering that is called myopic. This is because it uses the VPI formula shortsightedly, calculating the value of information as if only a single evidence variable will be acquired. Myopic control is based on the same heuristic idea as greedy search and often works well in practice. (For example, it has been shown to outperform expert physicians in selecting diagnostic tests.)
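As a concrete companion to the pseudocode in Figure 16.9 (shown next), here is a minimal Python sketch of the same myopic selection rule. The candidate names and the VPI and cost numbers are hypothetical stand-ins for quantities that would actually be computed from the decision network.

# A minimal sketch of the myopic selection rule used by the agent in Figure 16.9.
# Each candidate observation is given a (vpi, cost) pair; in a real agent these
# would be computed from the decision network rather than listed by hand.

def choose(candidates, best_real_action):
    """candidates: dict mapping evidence-variable name -> (vpi, cost)."""
    # Pick the observation with the best utility gain per unit cost...
    name, (vpi, cost) = max(candidates.items(), key=lambda kv: kv[1][0] / kv[1][1])
    # ...and request it only if its expected benefit exceeds its cost;
    # otherwise commit to the best "real" action.
    if vpi > cost:
        return ("request", name)
    return ("act", best_real_action)

candidates = {"BloodTest": (8.0, 2.0), "MRI": (9.0, 15.0)}      # hypothetical numbers
print(choose(candidates, "treat_with_medication"))              # ('request', 'BloodTest')

The ratio test picks the most cost-effective observation first, mirroring the greedy heuristic described above.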
function INFORMATION-GATHERING-AGENT(percept) returns an action
  persistent: D, a decision network

  integrate percept into D
  j ← the value that maximizes VPI(Ej) / Cost(Ej)
  if VPI(Ej) > Cost(Ej)
    then return REQUEST(Ej)
    else return the best action from D

Figure 16.9 Design of a simple information-gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit.

However, if there is no single evidence variable that will help a lot, a myopic agent might hastily take an action when it would have been better to request two or more variables first and then take action. A better approach in this situation would be to construct a conditional plan (as described in Section 11.3.2) that asks for variable values and takes different next steps depending on the answer.

One final consideration is the effect a series of questions will have on a human respondent. People may respond better to a series of questions if they "make sense," so some expert systems are built to take this into account, asking questions in an order that maximizes the total utility of the system and human rather than an order that maximizes value of information.

16.7 Decision-Theoretic Expert Systems

The field of decision analysis, which evolved in the 1950s and 1960s, studies the application of decision theory to actual decision problems. It is used to help make rational decisions in important domains where the stakes are high, such as business, government, law, military strategy, medical diagnosis and public health, engineering design, and resource management. The process involves a careful study of the possible actions and outcomes, as well as the preferences placed on each outcome. It is traditional in decision analysis to talk about two roles: the decision maker states preferences between outcomes, and the decision analyst enumerates the possible actions and outcomes and elicits preferences from the decision maker to determine the best course of action. Until the early 1980s, the main purpose of decision analysis was to help humans make decisions that actually reflect their own preferences. As more and more decision processes become automated, decision analysis is increasingly used to ensure that the automated processes are behaving as desired.

Early expert system research concentrated on answering questions, rather than on making decisions. Those systems that did recommend actions rather than providing opinions on matters of fact generally did so using condition-action rules, rather than with explicit representations of outcomes and preferences. The emergence of Bayesian networks in the late 1980s made it possible to build large-scale systems that generated sound probabilistic inferences from evidence. The addition of decision networks means that expert systems can be developed that recommend optimal decisions, reflecting the preferences of the agent as well as the available evidence.

A system that incorporates utilities can avoid one of the most common pitfalls associated with the consultation process: confusing likelihood and importance. A common strategy in early medical expert systems, for example, was to rank possible diagnoses in order of likelihood and report the most likely. Unfortunately, this can be disastrous!
For the majority of patients in general practice, the two most likely diagnoses are usually "There's nothing wrong with you" and "You have a bad cold," but if the third most likely diagnosis for a given patient is lung cancer, that's a serious matter. Obviously, a testing or treatment plan should depend both on probabilities and utilities. Current medical expert systems can take into account the value of information to recommend tests, and then describe a differential diagnosis.

We now describe the knowledge engineering process for decision-theoretic expert systems. As an example we consider the problem of selecting a medical treatment for a kind of congenital heart disease in children (see Lucas, 1996).

About 0.8% of children are born with a heart anomaly, the most common being aortic coarctation (a constriction of the aorta). It can be treated with surgery, angioplasty (expanding the aorta with a balloon placed inside the artery), or medication. The problem is to decide what treatment to use and when to do it: the younger the infant, the greater the risks of certain treatments, but one mustn't wait too long. A decision-theoretic expert system for this problem can be created by a team consisting of at least one domain expert (a pediatric cardiologist) and one knowledge engineer. The process can be broken down into the following steps:

Create a causal model. Determine the possible symptoms, disorders, treatments, and outcomes. Then draw arcs between them, indicating what disorders cause what symptoms, and what treatments alleviate what disorders. Some of this will be well known to the domain expert, and some will come from the literature. Often the model will match well with the informal graphical descriptions given in medical textbooks.

Simplify to a qualitative decision model. Since we are using the model to make treatment decisions and not for other purposes (such as determining the joint probability of certain symptom/disorder combinations), we can often simplify by removing variables that are not involved in treatment decisions. Sometimes variables will have to be split or joined to match the expert's intuitions. For example, the original aortic coarctation model had a Treatment variable with values surgery, angioplasty, and medication, and a separate variable for Timing of the treatment. But the expert had a hard time thinking of these separately, so they were combined, with Treatment taking on values such as surgery in 1 month. This gives us the model of Figure 16.10.

Assign probabilities. Probabilities can come from patient databases, literature studies, or the expert's subjective assessments. Note that a diagnostic system will reason from symptoms and other observations to the disease or other cause of the problems. Thus, in the early years of building these systems, experts were asked for the probability of a cause given an effect. In general they found this difficult to do, and were better able to assess the probability of an effect given a cause. So modern systems usually assess causal knowledge and encode it directly in the Bayesian network structure of the model, leaving the diagnostic reasoning to the Bayesian network inference algorithms (Shachter and Heckerman, 1987).

Assign utilities. When there are a small number of possible outcomes, they can be enumerated and evaluated individually using the methods of Section 16.3.1.
We would create a scale from best to worst outcome and give each a numeric value, for example 0 for death and 1 for complete recovery. We would then place the other outcomes on this scale. This can be done by the expert, but it is better if the patient (or in the case of infants, the patient's parents) can be involved, because different people have different preferences. If there are exponentially many outcomes, we need some way to combine them using multiattribute utility functions. For example, we may say that the costs of various complications are additive; a short illustrative sketch of this idea appears below.

Verify and refine the model. To evaluate the system we need a set of correct (input, output) pairs: a so-called gold standard to compare against. For medical expert systems this usually means assembling the best available doctors, presenting them with a few cases, ...

Figure 16.10 Decision model for aortic coarctation treatment (node labels visible in the original figure include Sex, Tachypnea, Tachycardia, Paradoxical Hypertension, Failure To Thrive, Postcoarctectomy Syndrome, and Aortic ...).
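As a small illustration of the additive-cost idea mentioned under Assign utilities, here is a minimal Python sketch; the complication names and penalty values are hypothetical placeholders, not values from the aortic coarctation model.

# A minimal sketch of additive complication costs on a 0-1 utility scale,
# where 0 is death and 1 is complete recovery. All names and numbers are
# hypothetical, chosen only to illustrate the additive combination.

COMPLICATION_COST = {
    "paradoxical_hypertension": 0.05,
    "postcoarctectomy_syndrome": 0.10,
    "recoarctation": 0.15,
}

def utility(outcome):
    """outcome: a set of complication names; 'death' dominates everything else."""
    if "death" in outcome:
        return 0.0
    return max(0.0, 1.0 - sum(COMPLICATION_COST[c] for c in outcome))

print(utility(set()))                                           # 1.0, complete recovery
print(utility({"recoarctation", "paradoxical_hypertension"}))   # 0.8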
