The Turing Test: The First 50 Years

Summary

Robert M. French's article reviews the Turing Test and its ongoing relevance to artificial intelligence and cognitive science. It traces how perception of the Test has evolved over its first 50 years, surveys the main criticisms it has received, and examines the challenges of building truly intelligent machines.

Full Transcript

The Turing Test: the first 50 years

Robert M. French

The Turing Test, originally proposed as a simple operational definition of intelligence, has now been with us for exactly half a century. It is safe to say that no other single article in computer science, and few other articles in science in general, have generated so much discussion. The present article chronicles the comments and controversy surrounding Turing's classic article from its publication to the present. The changing perception of the Turing Test over the last 50 years has paralleled the changing attitudes in the scientific community towards artificial intelligence: from the unbridled optimism of the 1960s to the current realization of the immense difficulties that still lie ahead. I conclude with the prediction that the Turing Test will remain important, not only as a landmark in the history of the development of intelligent machines, but also with real relevance to future generations of people living in a world in which the cognitive capacities of machines will be vastly greater than they are now.

R.M. French is at Quantitative Psychology and Cognitive Science, Department of Psychology, University of Liège, Belgium. tel: +32 4 221 05 42; fax: +32 4 366 28 59; e-mail: rfrench@ulg.ac.be

The invention and development of the computer will undoubtedly rank as one of the twentieth century's most far-reaching achievements, one that will ultimately rival or even surpass that of the printing press. At the very heart of that development were three seminal contributions by Alan Mathison Turing. The first was theoretical in nature: in order to solve a major outstanding problem in mathematics, he developed a simple mathematical model for a universal computing machine (today referred to as a Turing Machine). The second was practical: he was actively involved in building one of the very first electronic, programmable, digital computers. Finally, his third contribution was philosophical: he provided an elegant operational definition of thinking that, in many ways, set the entire field of artificial intelligence (AI) in motion. In this article, I will focus only on this final contribution, the Imitation Game, proposed in his classic article in Mind in 1950 (Ref. 1).
The Imitation Game

Before reviewing the various comments on Turing's article, I will briefly describe what Turing called the Imitation Game (called the Turing Test today). He began by describing a parlour game. Imagine, he says, that a man and a woman are in two separate rooms and communicate with an interrogator only by means of a teletype – the 1950s equivalent of today's electronic 'chat'. The interrogator must correctly identify the man and the woman and, in order to do so, he may ask any question capable of being transmitted by teletype. The man tries to convince the interrogator that he is the woman, while the woman tries to communicate her real identity. At some point during the game the man is replaced by a machine. If the interrogator remains incapable of distinguishing the machine from the woman, the machine will be said to have passed the Test and we will say that the machine is intelligent. (We see here why Turing chose communication by teletype – namely, so that the lack of physical features, which Turing felt were not essential for cognition, would not count against the machine.)

The Turing Test, as it rapidly came to be described in the literature and as it is generally described today, replaces the woman with a person of either gender. It is also frequently described in terms of a single room containing either a person or a machine, and the interrogator must determine whether he is communicating with a real person or a machine. These variations do, indeed, differ somewhat from Turing's original formulation of his Imitation Game. In the original test the man playing against the woman, as well as the computer that replaces him, are both 'playing out of character' (i.e. they are both relying on a theory of what women are like). The modern description of the Test simply pits a machine in one room against a person in another. It is generally agreed that this variation does not change the essence of Turing's operational definition of intelligence, although it almost certainly makes the Test more difficult for the machine to pass2. One significant point about the Turing Test that is often misunderstood is that failing it proves nothing. Many people would undoubtedly fail it if they were put in the role of the computer, but this certainly does not prove that they are not intelligent! The Turing Test was intended only to provide a sufficient condition for intelligence.

To reiterate, Turing's central claim is that there would be no reason to deny intelligence to a machine that could flawlessly imitate a human's unrestricted conversation. Turing's article has unquestionably generated more commentary and controversy than any other article in the field of artificial intelligence, and few papers in any field have created such an enduring reaction. Only 13 years after Turing's article appeared, Anderson had already counted over 1000 published papers on whether machines could think3. For half a century, references to the Turing Test have appeared regularly in artificial intelligence journals, philosophy journals, technical treatises, novels and the popular press. Type 'Turing Test' into any Web browser and you will get thousands of hits. Perhaps the reason for this high profile is partly our drive to build mechanical devices that imitate what humans do. However, there seems to be a particular fascination with mechanizing our ability to think. The idea of mechanized thinking goes back at least to the 17th century with the Characteristica Universalis of Leibniz and extends through the work of La Mettrie to the writings of Hobbes, Pascal, Boole, Babbage and others. The advent of the computer meant that, for the first time, there was a realistic chance of actually achieving the goal of mechanized thought. It is this on-going fascination with mechanized thought that has kept the Turing Test in the forefront of discussions about AI for the past half century.
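Before turning to the Test's reception, the modern formulation just described can be made concrete with a small sketch. Everything in it is invented scaffolding for illustration (the Candidate and Interrogator interfaces, the session length, the A/B labels), not anything Turing specified; it shows only the structure of the game: typed questions cross the room boundary, and the judge must then name the machine.

```python
import random

class Candidate:
    """Anything that answers typed questions: a person at a teletype
    or a program. (Hypothetical interface, for illustration only.)"""
    def reply(self, question: str) -> str:
        raise NotImplementedError

class Interrogator:
    """Poses questions and, at the end, names the machine."""
    def next_question(self, transcript: dict) -> str:
        raise NotImplementedError
    def guess_machine(self, transcript: dict) -> str:  # returns 'A' or 'B'
        raise NotImplementedError

def run_session(judge: Interrogator, human: Candidate, machine: Candidate,
                n_questions: int = 20) -> bool:
    """One session of the modern two-candidate formulation. Returns True
    if the machine 'passes', i.e. the judge fails to identify it. Only
    strings cross the room boundaries, modelling the teletype link."""
    labels = {'A': human, 'B': machine}
    if random.random() < 0.5:                 # hide who sits in which room
        labels = {'A': machine, 'B': human}
    transcript = {'A': [], 'B': []}
    for _ in range(n_questions):
        q = judge.next_question(transcript)
        for label, candidate in labels.items():
            transcript[label].append((q, candidate.reply(q)))
    return labels[judge.guess_machine(transcript)] is not machine
```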
The value and the validity of the Turing Test

Opinions on the validity and, especially, the value of the Turing Test as a real guide for research vary widely. Some authors have maintained that it was precisely the operational definition of intelligence that was needed to sidestep the philosophical quagmire of attempting to define rigorously what was meant by 'thinking' and 'intelligence' (see Refs 4–7). At the other extreme, there are authors who believe that the Turing Test is, at best, passé8 and, at worst, a real impediment to progress in the field of artificial intelligence9,10. Hayes and Ford9 claim that abandoning the Turing Test as an ultimate goal is 'almost a requirement for any rational research program which declares itself interested in any particular part of cognition or mental activity'. Their not unreasonable view is that research time is better spent developing what they call 'a general science of cognition' that would focus on more restricted areas of cognition, such as analogy-making, vision, generalization and categorization abilities. They add, 'From a practical perspective, why would anyone want to build machines that could pass the Turing Test? Human cognition, even high-quality human cognition, is not in short supply. What extra functionality would such a machine provide?'

Taking a historical view, Whitby8 describes four phases in the evolving interest in the Turing Test:

1950–1966: a source of inspiration for all concerned with AI
1966–1973: a distraction from some more promising avenues of AI research
1973–1990: by now a source of distraction mainly to philosophers, rather than AI workers
1990 onwards: consigned to history

I am not sure exactly what Whitby means by 'consigned to history', but if he means 'forgotten', I personally doubt that this will be the case. I believe that in 300 years' time people will still be discussing the arguments raised by Turing in his paper. It could even be argued that the Turing Test will take on an even greater significance several centuries in the future, when it might serve as a moral yardstick in a world where machines will move around much as we do, will use natural language, and will interact with humans in ways that are almost inconceivable today. In short, one of the questions facing future generations may well be, 'To what extent do machines have to act like humans before it becomes immoral to damage or destroy them?' And the very essence of the Turing Test is our judgment of how well machines act like humans.

Shift in perception of the Turing Test

It is easy to forget just how high the optimism once ran for the rapid achievement of artificial intelligence. In 1958, a mere eight years after the appearance of Turing's article, when computers were still in their infancy and even high-level programming languages had only just been invented, Simon and Newell11, two of the founders of the field of artificial intelligence, wrote, '…there are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until – in a visible future – the range of problems they can handle will be co-extensive with the range to which the human mind has been applied'. Minsky, head of the MIT AI Laboratory, wrote in 1967, 'Within a generation the problem of creating "artificial intelligence" will be substantially solved'12.

During this period of initial optimism, most of the authors writing about the Turing Test shared with the founders of AI the belief that a machine could actually be built that would be able to pass the Test in the foreseeable future. The debate, therefore, centered almost exclusively around Turing's operational definition of disembodied intelligence – namely, did passing the Turing Test constitute a sufficient condition for intelligence or not? As it gradually dawned on AI researchers just how difficult it was going to be to produce artificial intelligence, the focus of the debate on the Turing Test shifted. By 1982, Minsky's position regarding artificial intelligence had undergone a radical shift from one of unbounded optimism 15 years earlier to a far more sober assessment of the situation: 'The AI problem is one of the hardest ever undertaken by science'13. The perception of the Turing Test underwent a parallel shift. At least in part because of the great difficulties being experienced by AI, there was a growing realization of just how hard it would be for a machine to pass the Turing Test. Thus, instead of discussing whether or not a machine that had passed the Turing Test was really intelligent, the discussion shifted to whether it would even be possible for any machine to pass such a test.

Turing's comments on the Imitation Game

The first set of comments on the Imitation Game was voiced by Turing himself. I will briefly consider three of the most important. The first is the 'mathematical objection', based on Gödel's Theorem14, which proves that there are truths that can be expressed in any sufficiently powerful formal system, that we humans can recognize as truths, but that cannot be proved within that system (i.e. a computer could not recognize them as truths, because it would have to prove them in order to recognize them as such). This would then provide a limitation for the computer, but not for humans. This argument was taken up and developed in detail a decade later in a well-known paper by Lucas15. Turing replied that humans are not perfect formal systems and, indeed, may also have a limit to the truths they can recognize.

The second objection is the 'argument from consciousness' or the 'problem of other minds': the only way to know if anything is thinking is to be that thing, so we cannot know if anything else really thinks. Turing's reply was that if we adopt this solipsistic position for a machine, we must also adopt it for other people, and few people would be willing to do that.

Finally, the most important objection that Turing raised is what he calls 'Lady Lovelace's objection'. The name comes from a remark by Lady Lovelace concerning Charles Babbage's 'Analytical Engine', paraphrased by Turing as 'the machine can only do what we know how to order it to do'1. In other words, machines, unlike humans, are incapable of creative acts because they are only following the programmer's instructions. His answer is, in essence, that although we may program the basics, a computer, especially a computer capable of autonomous learning (see section 7 of Turing's article1, 'Learning Machines'), may well do things that could not have been anticipated by its programmer.
A brief chronicle of early comments on the Turing Test

Mays wrote one of the earliest replies to Turing, questioning whether a machine designed to perform logical operations could actually capture 'our intuitive, often vague and imprecise, thought processes'16. Importantly, this paper contained a first reference to a problem that would take center stage in the artificial intelligence community three decades later: 'Defenders of the computing machine analogy seem implicitly to assume that the whole of intelligence and thought can be built up summatively from the warp and woof of atomic propositions'16. This objection, in modified form, would re-appear in the 1980s as one of the fundamental criticisms of traditional artificial intelligence.

In Scriven's first article17, he arrived at the conclusion that merely imitating human behaviour was certainly not enough for consciousness. Then, a decade later, apparently seduced by the claims of the new AI movement, he changed his mind completely, saying, 'I now believe that it is possible so to construct a supercomputer as to make it wholly unreasonable to deny that it had feelings'3.

Gunderson clearly believed that passing the Turing Test would not necessarily be a proof of real machine intelligence18,19. Gunderson's objection was that the Test is based on a behaviouristic construal of thinking, which he felt must be rejected. He suggested that thinking is a very broad concept and that a machine passing the Imitation Game is merely exhibiting a single skill (which we might dub 'imitation-game playing'), rather than the all-purpose abilities defined by thinking. Further, he claimed that playing the Imitation Game successfully could well be achieved in ways other than by thinking, without saying precisely what these other ways might be. Stevenson, writing a decade later when the difficulties with AI research had become clearer, criticized Gunderson's single-skill objection, insisting that to play the game would require 'a very large range of other properties'20.

In articles written in the early 1970s we see the first shift away from the acceptance that it might be possible for a machine to pass the Turing Test. Even though Purtill's basic objection21 to the Turing Test was essentially the Lady Lovelace objection (i.e. that any output is determined by what the programmer explicitly put into the machine, and therefore can be explained in this manner), he concluded his paper in a particularly profound manner, thus: '…if a computer could play the complete, "any question" imitation game it might indeed cause us to consider that perhaps that computer was capable of thought. But that any computer might be able to play such a game in the foreseeable future is so immensely improbable as to make the whole question academic'. Sampson replied that low-level determinism (i.e. the program and its inputs) does not imply predictable high-level behaviour22. Two years later, Millar presented the first explicit discussion of the Turing Test's anthropocentrism: 'Turing's test forces us to ascribe typical human objectives and human cultural background to the machine, but if we are to be serious in contemplating the use of such a term [intelligence] we should be open-minded enough to allow computing machinery or Martians to display their intelligence by means of behaviour which is well-adapted for achieving their own specific aims'23.

Moor agreed that passing the test would constitute a sufficient proof of intelligence24. He viewed the Test as 'a potential source of good inductive evidence for the hypothesis that machines can think', rather than as a purely operational definition of intelligence. However, he suggested that it is of little value in guiding real research on artificial intelligence. Stalker replied that an explanation of how a computer passes the Turing Test would require an appeal to mental, not purely mechanistic, notions25. Moor then countered that these two explanations are not necessarily competitors26.

Box 1. The Human Subcognitive Profile

Let us designate as 'subcognitive' any question capable of providing a window on low-level (i.e. unconscious) cognitive or physical structure. By 'low-level cognitive structure', we mean the subconscious associative network in human minds that consists of highly overlapping, activatable representations of experience (Refs a–c).

The Turing Test interrogator prepares a long list of subcognitive questions (the Subcognitive Question List) and produces a profile of answers to these questions from a representative sample of the general population; for example:

'On a scale of 0 (completely implausible) to 10 (completely plausible):
Rate Flugblogs as the name of a start-up computer company
Rate Flugblogs as the name of air-filled bags that you tie on your feet and use to cross swamps
Rate banana splits as medicine
Rate purses as weapons.'

Other questions might include:

'Someone calls you a trubhead. Is this a compliment or an insult?
Which word do you find prettier: blutch or farfaletta?
Does holding a gulp of Coca-Cola in your mouth feel more like having pins and needles in your foot or having cold water poured on your head?'

We can imagine many more questions that would be designed to test not only for subcognitive associations, but for internal physical structure. These would include questions whose answers would be, for example, a product of the spacing of the candidate's eyes, would involve visual aftereffects, or would be the results of little self-experiments involving tactile sensations on their bodies or sensations after running in place, and so on.

The interrogator would then come to the Turing Test and ask both candidates the questions on her Subcognitive Question List. The candidate most closely matching the average answer profile from the human population will be the human.

The essential idea here is that the 'symbols-in/symbols-out' level specified in Turing's original article (Harnad's level T2; see Ref. d and Box 2) can indirectly, but reliably, probe much deeper subcognitive and even physical levels of the two candidates. The clear boundary between the symbolic level and the physical level that Turing had hoped to achieve with his teletype link to the candidates all but disappears (Refs b,e). People's answers to subcognitive questions are produced by our lifetime of experiencing the world with our human bodies, our human behaviors (whether culturally or genetically engendered), our human desires and needs, etc. (See Harnad for a discussion of the closely related 'symbol grounding problem', Ref. f.) It does not matter if we are confronted with made-up words or conceptual juxtapositions that never normally occur (e.g. banana splits and medicine); we can still respond and, moreover, these responses will show statistical regularities over the population. Thus, by surveying the population at large with an extensive set of these questions, we draw up a Human Subcognitive Profile for the population. It is precisely this profile that could not be reproduced by a machine that had not experienced the world as the members of the sampled human population had. The Subcognitive Question List that was used to produce the Human Subcognitive Profile gives the interrogator a tool for eliminating machines from a Turing Test in which humans are also participating.

References
a French, R.M. (1988) Subcognitive probing: hard questions for the Turing Test. In Proc. Tenth Annu. Cognit. Sci. Soc. Conf., pp. 361–367, Erlbaum
b French, R.M. (1990) Subcognition and the limits of the Turing Test. Mind 99, 53–65
c French, R.M. (1996) The Inverted Turing Test: how a simple (mindless) program could pass it. Psycoloquy 7 (39), turing-test.6.french
d Harnad, S. (1994) Levels of functional equivalence in reverse bioengineering: the Darwinian Turing Test for artificial life. Artif. Life 1, 293–301
e Davidson, D. (1990) Turing's test. In Modelling the Mind (Said, K.A. et al., eds), pp. 1–11, Oxford University Press
f Harnad, S. (1990) The symbol grounding problem. Physica D 42, 335–346
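To make the mechanics of Box 1 concrete, here is a minimal sketch of how a Subcognitive Question List could be turned into a Profile and used to score candidates. The particular questions, the 0 to 10 rating scale and the z-score distance are illustrative choices of mine; French specifies only that answers be gathered from a population sample and compared statistically.

```python
import statistics

# A (tiny) Subcognitive Question List, answered on a 0-10 plausibility scale.
QUESTIONS = [
    "Rate Flugblogs as the name of a start-up computer company",
    "Rate Flugblogs as the name of air-filled bags for crossing swamps",
    "Rate banana splits as medicine",
    "Rate purses as weapons",
]

def build_profile(survey):
    """survey: one answer vector per surveyed person. The Human
    Subcognitive Profile is the population's answer statistics,
    computed question by question."""
    columns = list(zip(*survey))
    return ([statistics.mean(c) for c in columns],
            [statistics.stdev(c) for c in columns])

def deviation(candidate_answers, profile):
    """Sum of squared z-scores: how far a candidate's answers sit from
    the population average. A human should land within ordinary
    population variation; a machine that has not experienced the world
    with a human body has no principled way to match the profile."""
    means, stdevs = profile
    return sum(((a - m) / s) ** 2
               for a, m, s in zip(candidate_answers, means, stdevs))

# The interrogator keeps the candidate with the smaller deviation as
# the probable human:
#   probable_human = min(candidates, key=lambda ans: deviation(ans, profile))
```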
Comments from the 1980s

Numerous papers on the Turing Test appeared at the beginning of the 1980s, among them one by Hofstadter27. This paper covers a wide range of issues and includes a particularly interesting discussion of the ways in which a computer simulation of a hurricane differs or does not differ from a real hurricane. (For a further discussion of this point, see Ref. 28.) The two most often cited papers from this period were by Block29 and Searle30. Instead of following up the lines of inquiry opened by Purtill21 and Millar23, these authors continued the standard line of attack on the Turing Test, arguing that even if a machine passed the Turing Test, it still might not be intelligent. The explicit assumption was, in both cases, that it was, in principle, possible for machines to pass the Test.

Block claimed that the Test is testing merely for behaviour, not the underlying mechanisms of intelligence29. He suggested that a mindless machine could pass the Turing Test in the following way: the Test will be defined to last an hour; the machine will then memorize all possible conversational exchanges that could occur during an hour. Thus, wherever the questions of the interrogator lead, the machine will be ready with a perfect conversation. But for a mere hour's worth of conversation such a machine would have to store at least 10^1500 20-word strings, which is far, far greater than the number of particles in the universe. Block drops all pretence that he is talking about real computers in his response to this objection: 'My argument requires only that the machine be logically possible, not that it be feasible or even nomologically possible'.
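Block's storage figure is easy to sanity-check with rough arithmetic. The numbers below (a 10^5-word working vocabulary, roughly 10^80 particles in the observable universe) are my illustrative assumptions, not Block's; any plausible values lead to the same conclusion.

```python
# Distinct 20-word strings over a 100,000-word vocabulary:
vocabulary_size = 10 ** 5
words_per_string = 20
distinct_strings = vocabulary_size ** words_per_string   # 10^100

# Rough common estimate of the particle count of the observable universe:
particles_in_universe = 10 ** 80

# A single 20-word exchange already overwhelms the particle count, and an
# hour of chained exchanges is how Block's figure reaches at least 10^1500.
print(distinct_strings > particles_in_universe)          # True
print(10 ** 1500 > particles_in_universe ** 18)          # True, comfortably
```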
Unfortunately, Block is no longer talking about the Turing Test because, clearly, Turing was talking about real computers (cf. sections 3 and 4 of Turing's article). In addition, a real interrogator might throw in questions with invented words in them, like, 'Does the word splugpud sound very pretty to you?' A perfectly legitimate question, but impossible for the Block machine to answer. Combinatorial explosion brings the walls down around Block's argument.

Searle replaced the Turing Test with his now-famous 'Chinese Room' thought experiment30. Instead of the Imitation Game, we are asked to imagine a closed room in which there is an English-speaker who knows not a word of Chinese. A native Chinese person writes a question in Chinese on a piece of paper and sends it into the room. The room is full of symbolic rules specifying inputs and outputs. The English-speaker then matches the symbols in the question with symbols in the rule-base. This does not have to be a direct table matching of the string of symbols in the question with symbols in the rule base, but can include any type of look-up program, regardless of its structural complexity. The English-speaker is blindly led through the maze of rules to a string of symbols that constitutes an answer to the question. He copies this answer on a piece of paper and sends it out of the room. The Chinese person on the outside of the room would see a perfect response, even though the English-speaker understood no Chinese whatsoever. The Chinese person would therefore believe that the person inside the room understands Chinese. Many replies have been made to this argument31 and I will not include them here. One simple refutation would be to ask how the room could possibly contain answers to questions that contained caricaturally distorted characters. So, for example, assume the last character in a question had been distorted in a very phallic manner (but the character is still clearly recognizable to a native Chinese person). The question sent into the room is: 'Would the last character in this question be likely to embarrass a very shy young woman?' Now, to answer this question, all possible inputs, including all possible distortions of those inputs, would have to be contained in the rules in the room. Combinatorial explosion, once again, brings down this line of argument.

Could any machine ever pass the Turing Test?

In the mid-1980s, Dennett emphasized the sheer difficulty of a machine's passing the Turing Test32. He accepted it as a sufficient condition for intelligence, but wrote that, 'A failure to think imaginatively about the test actually proposed by Turing has led many to underestimate its severity…' He suggests that the Turing Test, when we think of just how hard it would be to pass, also shows why AI has turned out to be so hard.

As the 1980s ended, a new type of discussion about the Turing Test appeared, one that reflected not only the difficulties of traditional, symbolic AI but also the surge of interest in sub-symbolic AI fuelled by the ideas of connectionism33,34. These new ideas were the basis of work by French35,36 that sought to show, by means of a technique based on 'subcognitive' questions (see Box 1), that 'only a computer that had acquired adult human intelligence by experiencing the world as we have could pass the Turing Test'36. Further, he argued that any attempt to fix the Turing Test 'so that it could test for intelligence in general and not just human intelligence is doomed to failure because of the completely interwoven and interdependent nature of the human physical, subcognitive, and cognitive levels'36. French also emphasized the fact that the Turing Test, when rigorously administered, probes deep levels of the associative concept networks of the candidates and that these 'networks are the product of a lifetime of interaction with the world which necessarily involves human sense organs, their location on the body, their sensitivity to various stimuli, etc'36. A similar conclusion was reached by Davidson, who wrote, 'Turing wanted his Test to draw "a fairly sharp line between the physical and the intellectual capacities of man." There is no such line'37.

In the past decade, Harnad has been one of the most prolific writers on the Turing Test38–42. Most importantly, he has proposed a 'Total Turing Test' (TTT) in which the screen provided by the teletype link between the candidates and the interrogator is removed38. This is an explicit recognition of the importance of bodies in an entity's interaction with the environment. The heart of Harnad's argument is that mental semantics must be 'grounded'; in other words, the meanings of internal symbols must derive, at least partly, from interactions with the external environment43. Shanon also recognized the necessity of an interaction with the environment44. However, Hauser argued that the switch from the normal Turing Test to the TTT is unwarranted45. In later papers, Harnad extended this notion by defining a hierarchy of Turing Tests (see Box 2), of which the second (T2: the symbols-in/symbols-out Turing Test) corresponds to the standard Turing Test.
T3 (the Total Turing Test) is the Robotic Turing Test, in which the interrogator directly, visually and tactilely addresses the two candidates; the teletype 'screening' mechanism is eliminated. But we might still be able to detect some internal differences, even if the machine passed T3. Therefore, Harnad proposes T4: Internal Microfunctional Indistinguishability. And finally, T5: Grand Unified Theories of Everything, where the two candidates would be microfunctionally equivalent by every test relevant to a neurologist, neurophysiologist, and neurobiophysicist (for example, both fully obey the Hodgkin–Huxley equations governing neuronal firing) but would nonetheless be distinguishable to a physical chemist.

Harnad clearly recognizes the extreme difficulty of achieving even T2 and stresses the impossibility of implementing disembodied cognition. Schweizer wishes to improve the Robotic Turing Test (T3) by proposing a Truly Total Turing Test, in which a long-term temporal dimension is added to the Test46. He wants the historical record of our achievements (in inventing chess, in developing languages, etc.) also to match those of the machine.

Box 2. The Turing Test hierarchy

Stevan Harnad has proposed a five-level Turing Test (TT) hierarchy (Refs a–c). This hierarchy attempts to encompass various levels of difficulty in playing an Imitation Game. The levels are t1, T2, T3, T4, and T5. The Harnad hierarchy works as follows:

Level t1
The 'toy-model' level. These are models ('toys', hence the lower-case 't') that only handle a fragment of our cognitive capacity. So, for example, Colby's program designed to imitate a paranoid schizophrenic would fall into this category, because 'the TT is predicated on total functional indistinguishability, and toys are most decidedly distinguishable from the real thing.' Harnad designates this level as 't1', essentially the level of current AI research, and adds that 'research has not even entered the TT hierarchy yet'.

Level T2
This is the level described in Turing's original article. Harnad refers to it as the 'pen-pal version' of the Turing Test, because all exchanges are guaranteed by the teletype link to occur in a symbols-in/symbols-out manner. Thus, T2 calls for a system that is indistinguishable from us in its symbolic (i.e. linguistic) capacities. This is also the level for which Searle's Chinese Room experiment is written. One central question is to what extent questions at this level can be used successfully, but indirectly, to probe the deep levels of cognitive, or even physical, structure of the candidates.

Level T3: the 'Total Turing Test' (or the Robotic Turing Test)
At this level the teletype 'screen' is removed. T3 calls for a system that is not only indistinguishable from us in its symbolic capacities, but further requires indistinguishability in all of our robotic capacities: in other words, total indistinguishability in external (i.e. behavioral) function. At this level, physical appearance and directly observable behaviour matter.

Level T4: 'Microfunctional Indistinguishability'
This level would call for internal indistinguishability, right down to the last neuron and neurotransmitter. These could be synthetic neurons, of course, but they would have to be functionally indistinguishable from real ones.

Level T5: 'Grand Unified Theories of Everything (GUTE)'
At this level the candidates are 'empirically identical in kind, right down to the last electron', but there remain unobservable-in-principle differences at the level of their designers' GUTEs.

Harnad feels that T3 is the right level for true cognitive modeling. He writes, 'My own guess is that if ungrounded T2 systems are underdetermined and hence open to overinterpretation, T4 systems are overdetermined and hence include physical and functional properties that may be irrelevant to cognition. I think T3 is just the right empirical filter for mind-modeling.'

References
a Harnad, S. (1991) Other bodies, other minds: a machine incarnation of an old philosophical problem. Minds and Machines 1, 43–54
b Harnad, S. (1994) Levels of functional equivalence in reverse bioengineering: the Darwinian Turing Test for Artificial Life. Artif. Life 1, 293–301
c Harnad, S. Turing on reverse-engineering the mind. J. Logic Lang. Inf. (in press)

One important question is: to what extent is the level specified by Turing in 1950 (i.e. Harnad's T2, symbols-in/symbols-out) sufficient to probe adequately the deeper subcognitive and even physical levels of the candidates? If we ask enough carefully worded questions (Box 1), even low-level physical differences between the human and machine candidates can be revealed. Questions such as, 'Rate on a scale of 1 to 10 how much keeping a gulp of Coca-Cola in your mouth feels like having pins-and-needles in your feet', indirectly test for physical attributes and past experiences; in this case, the presence of a mouth and limbs that fall asleep from time to time, and the experience of having held a soft drink in one's mouth47. And while it might be possible for the computer to guess correctly on one or two questions of this sort, it would have no way of achieving the same overall profile of answers that humans will effortlessly produce. The machine can guess (or lie), to be sure, but it must guess (or lie) convincingly and not just once or twice, but over and over again. In this case, guessing convincingly and systematically would mean that the machine's answer profile for these questions would be very similar overall to the human answer profile in the possession of the interrogator. But how could the machine achieve this for a broad range of questions of this type if it had not experienced the world as we have?

Many of these objections concerning the difficulty of making an actual machine that could pass the Turing Test are also voiced by Crockett in his discussion of the relationship of the Turing Test to the famous frame problem in AI (i.e. the problem of determining exactly what information must remain unchanged at a representational level within a system after the system has performed some action that affects its environment)48. In essence, Crockett claims that passing the Turing Test is essentially equivalent to solving the frame problem (see also Ref. 49).
Crockett arrives at how much keeping a gulp of Coca-Cola in your mouth feels essentially the same conclusion as French: ‘I think it is un- like having pins-and-needles in your feet’, indirectly test likely that a computer will pass the test…because I am for physical attributes and past experiences; in this case, the particularly impressed with the test’s difficulty [which is] presence of a mouth and limbs that fall asleep from time to more difficult and anthropocentric than even Turing fully time and the experience of having held a soft drink in one’s appreciated’48. mouth47. And while it might be possible for the computer to Mitchie introduced the notion of ‘superarticulacy’ into guess correctly on one or two questions of this sort, it would the debate50. He claims that for certain types of phenomena have no way of achieving the same overall profile of answers that we view as purely intuitive, there are, in fact, rules that that humans will effortlessly produce. The machine can guess can explain our behaviour, even if we are not consciously (or lie), to be sure, but it must guess (or lie) convincingly and, aware of them. We could unmask the computer in a Turing not just once or twice, but over and over again. In this case, Test because, if we gave the machine rules to answer certain guessing convincingly and systematically would mean that types of subcognitive questions – for example, ‘how do you the machine’s answer profile for these questions would be very pronounce the plurals of the imaginary English words ‘platch’, similar overall to the human answer profile in the possession ‘snorp’ and ‘brell’?’ (Answer: ‘platchez’, ‘snorps’ and ‘brellz’) of the interrogator. But how could the machine be able to – the machine would be able to explain how it gave these achieve this for a broad range of questions of this type if it answers, but we humans could not, or at least our explanation had not experienced the world as we had? would not be the one given by the computer. In this way we 120 Trends in Cognitive Sciences – Vol. 4, No. 3, March 2000 French – The Turing Test Opinion could catch the computer out and it would fail the Turing example, if the domain were International Politics, a ques- Test. The notion of superarticulacy is particularly relevant tion like, ‘Did Ronald Reagan wear a shirt when he met to current cognitive science research. Our human ability to with Mikhail Gorbachev?’ would seem to qualify as a ‘trick know something without being able to articulate that question’, being pretty obviously outside of the specified knowledge, or to learn something (as demonstrated by an domain. But now change the question to, ‘Did Mahatma ability to perform a particular task) without being aware that Ghandi wear a shirt when he met with Winston Churchill?’ we have learned it, is at present a very active line of research Unlike the first, the latter question is squarely within the in cognitive science. domain of international politics because it was Ghandi’s In a recent and significant comment on the Turing Test, practice, in order to make a political/cultural statement, to Watt proposed the Inverted Turing Test (ITT) based on con- be shirtless when meeting with British statesmen. But how siderations from ‘naive psychology’51 – our human tendency can we differentiate these two questions a priori, accepting and ability to ascribe mental states to others and to themselves. 
In a recent and significant comment on the Turing Test, Watt proposed the Inverted Turing Test (ITT), based on considerations from 'naive psychology'51 – our human tendency and ability to ascribe mental states to others and to ourselves. In the ITT, the machine must show that its tendency to ascribe mental states is indistinguishable from that of a real human. A machine will be said to pass the ITT if it is 'unable to distinguish between two humans, or between a human and a machine that can pass the normal TT, but which can discriminate between a human and a machine that can be told apart by a normal TT with a human observer'51. There are numerous replies to this proposal52–55. It can be shown, however, that the ITT can be simulated by the standard Turing Test52,55. French used the technique of a 'Human Subcognitive Profile' (i.e. a list of subcognitive questions whose answers have been gathered from people in the larger population; see Box 1) to show that a mindless program using the Profile could pass this variant of the Turing Test55. Ford and Hayes54 renewed their appeal to reject this type of test as any kind of meaningful yardstick for AI. Collins suggested his own type of test, the Editing Test53, based on 'the skillful way in which humans "repair" deficiencies in speech, written texts, handwriting, etc., and the failure of computers to achieve the same interpretative competence'53.

Loebner Prize

An overview of the Turing Test would not be complete without briefly mentioning the Loebner Prize56,57, which originated in 1991. The competition stipulates that the first program to pass an unrestricted Turing Test will receive $100,000. For the Loebner Prize, both humans and machines answer questions posed by the judges. The competition, however, is among the various machines, each of which attempts to fool the judges into believing that it is a human. The machine that best plays the role of a human wins the competition. Initially, restrictions were placed on the form and content of the questions that could be asked. For example, questions were restricted to specific topics, judges who were computer scientists were disallowed, and 'trick questions' were not permitted.

There have been numerous attempts at 'restricted' simulations of human behaviour over the years, the best known probably being Colby's PARRY58,59, a program that simulates a paranoid schizophrenic by means of a large number of canned routines, and Weizenbaum's ELIZA60, which simulates a psychiatrist's discussion with patients.
Aside from the fact that restricting the domain of allowable questions violates the spirit of Turing's original 'anything-goes' Imitation Game, there are at least two major problems with domain restrictions in a Turing Test. First, there is the virtual impossibility of clearly defining what does and does not count as being part of a particular real-world domain. For example, if the domain were International Politics, a question like, 'Did Ronald Reagan wear a shirt when he met with Mikhail Gorbachev?' would seem to qualify as a 'trick question', being pretty obviously outside of the specified domain. But now change the question to, 'Did Mahatma Gandhi wear a shirt when he met with Winston Churchill?' Unlike the first, the latter question is squarely within the domain of international politics, because it was Gandhi's practice, in order to make a political/cultural statement, to be shirtless when meeting with British statesmen. But how can we differentiate these two questions a priori, accepting one as within the domain of international politics while rejecting the other as outside of it? Further, even if it were somehow possible to clearly delineate domains of allowable questions, what would determine whether a domain were too restricted? In a tongue-in-cheek response to Colby's claims that PARRY had passed something that could rightfully be called a legitimate Turing Test, Weizenbaum claimed to have written a program for another restricted domain: infant autism61. His program, moreover, did not even require a computer to run on; it could be implemented on an electric typewriter. Regardless of the question typed into it, the typewriter would just sit there and hum. In terms of the domain-restricted Turing Test, the program was indistinguishable from a real autistic infant. The deep point of this example is precisely the problem with domain restrictions in a Turing Test.

To date, nothing has come remotely close to passing an unrestricted Turing Test and, as Dennett, who agreed to chair the Loebner Prize event for its first few years, said, '…passing the Turing Test is not a sensible research and development goal for serious AI'62. Few serious scholars of the Turing Test, myself included, take this competition seriously, and Minsky has even publicly offered $100 for anyone who can convince Loebner to put an end to the competition!63 (For those who wish to know more about the Loebner Competition, refer to Ref. 57.)

There are numerous other commentaries on the Turing Test. Two particularly interesting comments on actually building truly intelligent machines can be found in Dennett64 and Waltz65.

Conclusions

For 50 years the Turing Test has been the object of debate and controversy. From its inception, the Test has come under fire as being either too strong, too weak, too anthropocentric, too broad, too narrow, or too coarse. One thing, however, is certain: gradually, ineluctably, we are moving into a world where machines will participate in all of the activities that have heretofore been the sole province of humans. While it is unlikely that robots will ever perfectly simulate human beings, one day in the far future they might indeed have sufficient cognitive capacities to pose certain ethical dilemmas for us, especially regarding their destruction or exploitation. To resolve these issues, we will be called upon to consider the question: 'How much are these machines really like us?' I predict that the yardstick that will be used to measure this similarity will look very much like the test that Alan Turing invented at the dawn of the computer age.

Acknowledgements

The present paper was supported in part by research grant IUAP P4/19 from the Belgian government. I am grateful to Dan Dennett and Stevan Harnad for their particularly helpful comments on an earlier draft of this review.
References
1 Turing, A. (1950) Computing machinery and intelligence. Mind 59, 433–460
2 Saygin, P. et al. (1999) Turing Test: 50 Years Later, Technical Report No. BU-CEIS-9905, Department of Computer Engineering, Bilkent University, Ankara, Turkey
3 Anderson, A. (1964) Minds and Machines, Prentice-Hall
4 Dreyfus, H. (1992) What Computers Still Can't Do, MIT Press
5 Haugeland, J. (1985) Artificial Intelligence, the Very Idea, MIT Press
6 Hofstadter, D. (1979) Gödel, Escher, Bach, Basic Books
7 Ginsberg, M. (1993) Essentials of Artificial Intelligence, Morgan Kaufmann
8 Whitby, B. (1996) The Turing Test: AI's biggest blind alley? In Machines and Thought: The Legacy of Alan Turing (Millican, P. and Clark, A., eds), pp. 53–63, Oxford University Press
9 Hayes, P. and Ford, K. (1995) Turing Test considered harmful. In Proc. Fourteenth IJCAI-95, Montreal, Canada (Vol. 1), pp. 972–977, Morgan Kaufmann
10 Johnson, W. (1992) Needed: a new test of intelligence. Sigart Bull. 3, 7–9
11 Simon, H. and Newell, A. (1958) Heuristic problem solving: the next advance in operations research. Operations Res. 6
12 Minsky, M. (1967) Computation: Finite and Infinite Machines, p. 2, Prentice-Hall
13 Kolata, G. (1982) How can computers get common sense? Science 217, 1237
14 Gödel, K. (1931) Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik 38, 173–198
15 Lucas, J. (1961) Minds, machines and Gödel. Philosophy 36, 112–127
16 Mays, W. (1952) Can machines think? Philosophy 27, 148–162
17 Scriven, M. (1953) The mechanical concept of mind. Mind 62, 230–240
18 Gunderson, K. (1964) The imitation game. Mind 73, 234–245
19 Gunderson, K. (1967) Mentality and Machines, Doubleday
20 Stevenson, J. (1976) On the imitation game. Philosophia 6, 131–133
21 Purtill, R. (1971) Beating the imitation game. Mind 80, 290–294
22 Sampson, G. (1973) In defence of Turing. Mind 82, 592–594
23 Millar, P. (1973) On the point of the Imitation Game. Mind 82, 595–597
24 Moor, J. (1976) An analysis of the Turing Test. Philos. Stud. 30, 249–257
25 Stalker, D. (1978) Why machines can't think: a reply to James Moor. Philos. Stud. 34, 317–320
26 Moor, J. (1978) Explaining computer behaviour. Philos. Stud. 34, 325–327
27 Hofstadter, D. (1981) The Turing Test: a coffee-house conversation. In The Mind's I (Hofstadter, D. and Dennett, D., eds), pp. 69–95, Basic Books
28 Anderson, D. (1987) Is the Chinese room the real thing? Philosophy 62, 389–393
29 Block, N. (1981) Psychologism and behaviourism. Philos. Rev. 90, 5–43
30 Searle, J. (1980) Minds, brains and programs. Behav. Brain Sci. 3, 417–424
31 Hofstadter, D. and Dennett, D. (1981) Reflections on 'Minds, Brains, and Programs'. In The Mind's I (Hofstadter, D. and Dennett, D., eds), pp. 373–382, Basic Books
32 Dennett, D. (1985) Can machines think? In How We Know (Shafto, M., ed.), pp. 121–145, Harper & Row
33 Rumelhart, D., McClelland, J. and the PDP Research Group, eds (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Vols 1 and 2), MIT Press
34 Smolensky, P. (1988) On the proper treatment of connectionism. Behav. Brain Sci. 11, 1–74
35 French, R. (1988) Subcognitive probing: hard questions for the Turing Test. In Proc. Tenth Annu. Cognit. Sci. Soc. Conf., pp. 361–367, Erlbaum
36 French, R. (1990) Subcognition and the limits of the Turing Test. Mind 99, 53–65
37 Davidson, D. (1990) Turing's test. In Modelling the Mind (Said, K.A. et al., eds), pp. 1–11, Oxford University Press
38 Harnad, S. (1989) Minds, machines and Searle. J. Exp. Theor. Artif. Intell. 1, 5–25
39 Harnad, S. (1991) Other bodies, other minds: a machine incarnation of an old philosophical problem. Minds Machines 1, 43–54
40 Harnad, S. (1992) The Turing Test is not a trick: Turing indistinguishability is a scientific criterion. Sigart Bull. 3, 9–10
41 Harnad, S. (1994) Levels of functional equivalence in reverse bioengineering: the Darwinian Turing Test for artificial life. Artif. Life 1, 293–301
42 Harnad, S. Turing on reverse-engineering the mind. J. Logic Lang. Inf. (in press)
43 Harnad, S. (1990) The symbol grounding problem. Physica D 42, 335–346
44 Shanon, B. (1989) A simple comment regarding the Turing Test. J. Theory Soc. Behav. 19, 249–256
45 Hauser, L. (1993) Reaping the whirlwind: reply to Harnad's 'Other bodies, other minds'. Minds Machines 3, 219–237
46 Schweizer, P. (1998) The Truly Total Turing Test. Minds Machines 8, 263–272
47 French, R. Peeking behind the screen: the unsuspected power of the standard Turing Test. J. Exp. Theor. Artif. Intell. (in press)
48 Crockett, L. (1994) The Turing Test and the Frame Problem: AI's Mistaken Understanding of Intelligence, Ablex
49 Harnad, S. (1993) Problems, problems: the frame problem as a symptom of the symbol grounding problem. Psycoloquy 4 (34)
50 Michie, D. (1993) Turing's test and conscious thought. Artif. Intell. 60, 1–22
51 Watt, S. (1996) Naive psychology and the Inverted Turing Test. Psycoloquy 7 (14)
52 Bringsjord, S. (1996) The Inverted Turing Test is provably redundant. Psycoloquy 7 (29)
53 Collins, H. (1997) The Editing Test for the deep problem of AI. Psycoloquy 8 (1)
54 Ford, K. and Hayes, P. (1996) The Turing Test is just as bad when inverted. Psycoloquy 7 (43)
55 French, R. (1996) The Inverted Turing Test: a simple (mindless) program that could pass it. Psycoloquy 7 (39)
56 Epstein, R. (1992) Can machines think? AI Magazine 13, 81–95
57 Shieber, S. (1994) Lessons from a restricted Turing Test. Commun. ACM 37, 70–78
58 Colby, K. (1981) Modeling a paranoid mind. Behav. Brain Sci. 4, 515–560
59 Colby, K. et al. (1971) Artificial paranoia. Artif. Intell. 2, 1–25
60 Weizenbaum, J. (1966) ELIZA: a computer program for the study of natural language communication between men and machines. Commun. ACM 9, 36–45
61 Weizenbaum, J. (1974) Reply to Arbib: more on computer models of psychopathic behaviour. Commun. ACM 17, 543
62 Dennett, D. (1998) Brainchildren, p. 28, MIT Press
63 Minsky, M. (1995) Article 24971 of comp.ai.philosophy, March 3, 1995
64 Dennett, D. (1994) The practical requirements for making a conscious robot. Philos. Trans. R. Soc. London Ser. A 349, 133–146 (reprinted in Dennett, D., 1998, Brainchildren, MIT Press)
65 Waltz, D. (1988) The prospects for building truly intelligent machines. Daedalus 117, 191–212
