Summary

This chapter discusses perception, exploring how experiences result from sensory stimulation, and how various factors, including knowledge and action, influence perception. It also compares human perception to computer vision, highlighting the complexities involved in replicating human capabilities in machines.

Full Transcript

SOME QUESTIONS WE WILL CONSIDER
◗ Why can two people experience different perceptions in response to the same stimulus?
◗ How does perception depend on a person's knowledge about characteristics of the environment?
◗ How does the brain become tuned to respond best to things that are likely to appear in the environment?
◗ What is the connection between perception and action?

Crystal begins her run along the beach just as the sun is rising over the ocean. She loves this time of day, because it is cool and the mist rising from the sand creates a mystical effect. She looks down the beach and notices something about 100 yards away that wasn't there yesterday. "What an interesting piece of driftwood," she thinks, although it is difficult to see because of the mist and dim lighting (Figure 3.1a). As she approaches the object, she begins to doubt her initial perception, and just as she is wondering whether it might not be driftwood, she realizes that it is, in fact, the old beach umbrella that was lying under the lifeguard stand yesterday (Figure 3.1b). "Driftwood transformed into an umbrella, right before my eyes," she thinks. Continuing down the beach, she passes some coiled rope that appears to be abandoned (Figure 3.1c). She stops to check it out. Grabbing one end, she flips the rope and sees that, as she suspected, it is one continuous strand. But she needs to keep running, because she is supposed to meet a friend at Beach Java, a coffee shop far down the beach. Later, sitting in the coffeehouse, she tells her friend about the piece of magic driftwood that was transformed into an umbrella.

➤ Figure 3.1 (a) Initially Crystal thinks she sees a large piece of driftwood far down the beach. (b) Eventually she realizes she is looking at an umbrella. (c) On her way down the beach, she passes some coiled rope.

The Nature of Perception

We define perception as experiences resulting from stimulation of the senses. To appreciate how these experiences are created, let's return to Crystal on the beach.

Some Basic Characteristics of Perception

Crystal's experiences illustrate a number of things about perception. Her experience of seeing what she thought was driftwood turn into an umbrella illustrates how perceptions can change based on added information (Crystal's view became better as she got closer to the umbrella) and how perception can involve a process similar to reasoning or problem solving (Crystal figured out what the object was based partially on remembering having seen the umbrella the day before). (Another example of an initially erroneous perception followed by a correction is the famous pop culture line, "It's a bird. It's a plane. It's Superman!") Crystal's guess that the coiled rope was continuous illustrates how perception can be based on a perceptual rule (when objects overlap, the one underneath usually continues behind the one on top), which may be based on the person's past experiences. Crystal's experience also demonstrates how arriving at a perception can involve a process. It took some time for Crystal to realize that what she thought was driftwood was actually an umbrella, so it is possible to describe her perception as involving a "reasoning" process.
In most cases, perception occurs so rapidly and effortlessly that it appears to be automatic. But, as we will see in this chapter, perception is far from automatic. It involves complex, and usually invisible, processes that resemble reasoning, although they occur much more rapidly than Crystal's realization that the driftwood was actually an umbrella.

Finally, Crystal's experience also illustrates how perception occurs in conjunction with action. Crystal is running and perceiving at the same time; later, at the coffee shop, she easily reaches for her cup of coffee, a process that involves coordination of seeing the coffee cup, determining its location, physically reaching for it, and grasping its handle. This aspect of Crystal's experiences is what happens in everyday perception. We are usually moving, and even when we are just sitting in one place watching TV, a movie, or a sporting event, our eyes are constantly in motion as we shift our attention from one thing to another to perceive what is happening. We also grasp and pick up things many times a day, whether it is a cup of coffee, a phone, or this book. Perception, therefore, is more than just "seeing" or "hearing." It is central to our ability to organize the actions that occur as we interact with the environment.

It is important to recognize that while perception creates a picture of our environment and helps us take action within it, it also plays a central role in cognition in general. When we consider that perception is essential for creating memories, acquiring knowledge, solving problems, communicating with other people, recognizing someone you met last week, and answering questions on a cognitive psychology exam, it becomes clear that perception is the gateway to all the other cognitions we will be describing in this book.

The goal of this chapter is to explain the mechanisms responsible for perception. To begin, we move from Crystal's experience on the beach and in the coffee shop to what happens when perceiving a city scene: Pittsburgh as seen from the upper deck of PNC Park, home of the Pittsburgh Pirates.

A Human Perceives Objects and a Scene

Sitting in the upper deck of PNC Park, Roger looks out over the city (Figure 3.2). He sees a group of about 10 buildings on the left and can easily tell one building from another. Looking straight ahead, he sees a small building in front of a larger one, and has no trouble telling that they are two separate buildings. Looking down toward the river, he notices a horizontal yellow band above the right field bleachers. It is obvious to him that this is not part of the ballpark but is located across the river. All of Roger's perceptions come naturally to him and require little effort. But when we look closely at the scene, it becomes apparent that the scene poses many "puzzles." The following demonstration points out a few of them.

➤ Figure 3.2 It is easy to tell that there are a number of different buildings on the left and that straight ahead there is a low rectangular building in front of a taller building. It is also possible to tell that the horizontal yellow band above the bleachers is across the river. These perceptions are easy for humans but would be quite difficult for a computer vision system.
The letters on the left indicate areas referred to in the Demonstration.

DEMONSTRATION: Perceptual Puzzles in a Scene

The following questions refer to the areas labeled in Figure 3.2. Your task is to answer each question and indicate the reasoning behind each answer:
What is the dark area at A?
Are the surfaces at B and C facing in the same or different directions?
Are areas B and C on the same building or on different buildings?
Does the building at D extend behind the one at A?

Although it may have been easy to answer the questions, it was probably somewhat more challenging to indicate what your "reasoning" was. For example, how did you know the dark area at A is a shadow? It could be a dark-colored building that is in front of a light-colored building. On what basis might you have decided that building D extends behind building A? It could, after all, simply end right where A begins. We could ask similar questions about everything in this scene because, as we will see, a particular pattern of shapes can be created by a wide variety of objects.

One of the messages of this demonstration is that to determine what is "out there," it is necessary to go beyond the pattern of light and dark that a scene creates on the retina—the structure that lines the back of the eye and contains the receptors for seeing. One way to appreciate the importance of this "going beyond" process is to consider how difficult it has been to program even the most powerful computers to accomplish perceptual tasks that humans achieve with ease.

A Computer-Vision System Perceives Objects and a Scene

A computer that can perceive has been a dream that dates back to early science fiction and movies. Because movies can make up things, it was easy to show the droids R2-D2 and C3PO having a conversation on the desert planet Tatooine in the original Star Wars (1977). Although C3PO did most of the talking (R2-D2 mainly beeped), both could apparently navigate through their environment with ease, and recognize objects along the way. But designing a computer vision system that can actually perceive the environment and recognize objects and scenes is more complicated than making a Star Wars movie.

In the 1950s, when digital computers became available to researchers, it was thought that it would take perhaps a decade to design a machine-vision system that would rival human vision. But the early systems were primitive and took minutes of calculations to identify simple isolated objects that a young child could name in seconds. Perceiving objects and scenes was, the researchers realized, still the stuff of science fiction. It wasn't until 1987 that the International Journal of Computer Vision, the first journal devoted solely to computer vision, was founded. Papers from the first issues considered topics such as how to interpret line drawings of curved objects (Malik, 1987) and how to determine the three-dimensional layout of a scene based on a film of movement through the scene (Bolles et al., 1987). These papers and others in the journal had to resort to complex mathematical formulas to solve perceptual problems that are easy for humans.

Flash-forward to March 13, 2004.
Thirteen robotic vehicles were lined up in the Mojave Desert in California for the Defense Advanced Research Projects Agency's (DARPA) Grand Challenge. The task was to drive 150 miles from the starting point to Las Vegas, using only GPS coordinates to define the course and computer vision to avoid obstacles. The best performance was achieved by a vehicle entered by Carnegie Mellon University, which traversed only 7.3 miles before getting stuck. Progress continued through the next decade, however, with thousands of researchers and multi-million-dollar investments, until now, when driverless cars are no longer a novelty. As I write this, a fleet of driverless Uber vehicles is finding its way around the winding streets of Pittsburgh, San Francisco, and other cities (Figure 3.3).

➤ Figure 3.3 Driverless car on the streets of San Francisco.

One message of the preceding story is that although present accomplishments of computer-vision systems are impressive, it turned out to be extremely difficult to create the systems that made driverless cars possible. But as impressive as driverless cars are, computer-vision systems still make mistakes in naming objects. For example, Figure 3.4 shows three objects that a computer identified as a tennis ball.

➤ Figure 3.4 Even computer-vision programs that are able to recognize objects fairly accurately make mistakes, such as confusing objects that share features. In this example, the lens cover and the top of the teapot are erroneously classified as a "tennis ball." (Source: Based on K. Simonyan et al., 2012)

In another area of computer-vision research, programs have been created that can describe pictures of real scenes. For example, a computer accurately identified a scene similar to the one in Figure 3.5 as "a large plane sitting on a runway." But mistakes still occur, as when a picture similar to the one in Figure 3.6 was identified as "a young boy holding a baseball bat" (Fei-Fei, 2015). The computer's problem is that it doesn't have the huge storehouse of information about the world that humans begin accumulating as soon as they are born. If a computer has never seen a toothbrush, it identifies it as something with a similar shape. And, although the computer's response to the airplane picture is accurate, it is beyond the computer's capabilities to recognize that this is a picture of airplanes on display, perhaps at an air show, and that the people are not passengers but are visiting the air show. So on one hand, we have come a very long way from the first attempts in the 1950s to design computer-vision systems, but to date, humans still out-perceive computers. In the next section, we consider some of the reasons perception is so difficult for computers to master.

➤ Figure 3.5 Picture similar to one that a computer vision program identified as "a large plane sitting on a runway."

➤ Figure 3.6 Picture similar to one that a computer vision program identified as "a young boy holding a baseball bat."

Why Is It So Difficult to Design a Perceiving Machine?
We will now describe a few of the difficulties involved in designing a "perceiving machine." Remember that although the problems we describe pose difficulties for computers, humans solve them easily.

The Stimulus on the Receptors Is Ambiguous

When you look at the page of this book, the image cast by the borders of the page on your retina is ambiguous. It may seem strange to say that, because (1) the rectangular shape of the page is obvious, and (2) once we know the page's shape and its distance from the eye, determining its image on the retina is a simple geometry problem, which, as shown in Figure 3.7, can be solved by extending "rays" from the corners of the page (in red) into the eye.

But the perceptual system is not concerned with determining an object's image on the retina. It starts with the image on the retina, and its job is to determine what object "out there" created the image. The task of determining the object responsible for a particular image on the retina is called the inverse projection problem, because it involves starting with the retinal image and extending rays out from the eye. When we do this, as shown by extending the lines in Figure 3.7 out from the eye, we see that the retinal image created by the rectangular page could have also been created by a number of other objects, including a tilted trapezoid, a much larger rectangle, and an infinite number of other objects, located at different distances. When we consider that a particular image on the retina can be created by many different objects in the environment, it is easy to see why we say that the image on the retina is ambiguous. Nonetheless, humans typically solve the inverse projection problem easily, even though it still poses serious challenges to computer-vision systems.

➤ Figure 3.7 The projection of the book (red object) onto the retina can be determined by extending rays (solid lines) from the book into the eye. The principle behind the inverse projection problem is illustrated by extending rays out from the eye past the book (dashed lines). When we do this, we can see that the image created by the book can be created by an infinite number of objects, among them the tilted trapezoid and large rectangle shown here. This is why we say that the image on the retina is ambiguous.

Objects Can Be Hidden or Blurred

Sometimes objects are hidden or blurred. Look for the pencil and eyeglasses in Figure 3.8 before reading further. Although it might take a little searching, people can find the pencil in the foreground and the glasses frame sticking out from behind the computer next to the picture, even though only a small portion of these objects is visible. People also easily perceive the book, scissors, and paper as whole objects, even though they are partially hidden by other objects.

This problem of hidden objects occurs any time one object obscures part of another object. This occurs frequently in the environment, but people easily understand that the part of an object that is covered continues to exist, and they are able to use their knowledge of the environment to determine what is likely to be present. People are also able to recognize objects that are not in sharp focus, such as the faces in Figure 3.9.
See how many of these people you can identify, and then consult the answers on page 91. Despite the degraded nature of these images, people can often identify most of them, whereas computers perform poorly on this task (Sinha, 2002).

➤ Figure 3.8 A portion of the mess on the author's desk. Can you locate the hidden pencil (easy) and the author's glasses (hard)?

➤ Figure 3.9 Who are these people? See page 91 for the answers. (Source: Based on Sinha, 2002)

Objects Look Different from Different Viewpoints

Another problem facing any perceiving machine is that objects are often viewed from different angles, so their images are continually changing, as in Figure 3.10. People's ability to recognize an object even when it is seen from different viewpoints is called viewpoint invariance. Computer-vision systems can achieve viewpoint invariance only by a laborious process that involves complex calculations designed to determine which points on an object match in different views (Vedaldi, Ling, & Soatto, 2010).

➤ Figure 3.10 Your ability to recognize each of these views as being of the same chair is an example of viewpoint invariance.

Scenes Contain High-Level Information

Moving from objects to scenes adds another level of complexity. Not only are there often many objects in a scene, but they may be providing information about the scene that requires some reasoning to figure out. Consider, for example, the airplane picture in Figure 3.5. What is the basis for deciding the planes are probably on display at an air show? One answer is knowing that the plane on the right is an older-looking military plane that is most likely no longer in service. We also know that the people aren't passengers waiting to board, because they are walking on the grass and aren't carrying any luggage. Cues like this, although obvious to a person, would need to be programmed into a computer.

The difficulties facing any perceiving machine illustrate that the process of perception is more complex than it seems. Our task, therefore, in describing perception is to explain this process, focusing on how our human perceiving machine operates. We begin by considering two types of information used by the human perceptual system: (1) environmental energy stimulating the receptors and (2) knowledge and expectations that the observer brings to the situation.

Information for Human Perception

Perception is built on a foundation of information from the environment. Looking at something creates an image on the retina. This image generates electrical signals that are transmitted through the retina, and then to the visual receiving area of the brain. This sequence of events from eye to brain is called bottom-up processing, because it starts at the "bottom" or beginning of the system, when environmental energy stimulates the receptors. But perception involves information in addition to the foundation provided by activation of the receptors and bottom-up processing.
Perception also involves factors such as a person's knowledge of the environment, and the expectations people bring to the perceptual situation. For example, remember the experiment described in Chapter 1, which showed that people identify a rapidly flashed object in a kitchen scene more accurately when that object fits the scene (Figure 1.13)? This knowledge we have of the environment is the basis of top-down processing—processing that originates in the brain, at the "top" of the perceptual system. It is this knowledge that enables people to rapidly identify objects and scenes, and also to go beyond mere identification of objects to determining the story behind a scene. We will now consider two additional examples of top-down processing: perceiving objects and hearing words in a sentence.

Perceiving Objects

An example of top-down processing, illustrated in Figure 3.11, is called "the multiple personalities of a blob," because even though all of the blobs are identical, they are perceived as different objects depending on their orientation and the context within which they are seen (Oliva & Torralba, 2007). The blob appears to be an object on a table in (b), a shoe on a person bending down in (c), and a car and a person crossing the street in (d). We perceive the blob as different objects because of our knowledge of the kinds of objects that are likely to be found in different types of scenes. The human advantage over computers is therefore due, in part, to the additional top-down knowledge available to humans.

➤ Figure 3.11 "Multiple personalities of a blob." What we expect to see in different contexts influences our interpretation of the identity of the "blob" inside the circles. (Source: Adapted from A. Oliva & A. Torralba, 2007)

➤ Figure 3.12 Sound energy for the sentence "Mice eat oats and does eat oats and little lambs eat ivy." The italicized words just below the sound record indicate how this sentence was pronounced by the speaker. The vertical lines next to the words indicate where each word begins. Note that it is difficult or impossible to tell from the sound record where one word ends and the next one begins. (Source: Speech signal courtesy of Peter Howell)

Hearing Words in a Sentence

An example of how top-down processing influences speech perception occurs for me as I sit in a restaurant listening to people speaking Spanish at the next table. Unfortunately, I don't understand what they are saying because I don't understand Spanish. To me, the dialogue sounds like an unbroken string of sound, except for occasional pauses and when a familiar word like gracias pops out. My perception reflects the fact that the physical sound signal for speech is generally continuous, and when there are breaks in the sound, they do not necessarily occur between words. You can see this in Figure 3.12 by comparing the place where each word in the sentence begins with the pattern of the sound signal.

The ability to tell when one word in a conversation ends and the next one begins is a phenomenon called speech segmentation.
The fact that a listener familiar only with English and another listener familiar with Spanish can receive identical sound stimuli but experience different perceptions means that each listener's experience with language (or lack of it!) is influencing his or her perception. The continuous sound signal enters the ears and triggers signals that are sent toward the speech areas of the brain (bottom-up processing); if a listener understands the language, their knowledge of the language creates the perception of individual words (top-down processing).

While segmentation is aided by knowing the meanings of words, listeners also use other information to achieve segmentation. As we learn a language, we are learning more than the meaning of the words. Without even realizing it we are learning transitional probabilities—the likelihood that one sound will follow another within a word. For example, consider the words pretty baby. In English it is likely that pre and ty will be in the same word (pre-tty) but less likely that ty and ba will be in the same word (pretty baby).

Every language has transitional probabilities for different sounds, and the process of learning about transitional probabilities and about other characteristics of language is called statistical learning. Research has shown that infants as young as 8 months of age are capable of statistical learning.

Jennifer Saffran and coworkers (1996) carried out an early experiment that demonstrated statistical learning in young infants. Figure 3.13a shows the design of this experiment. During the learning phase of the experiment, the infants heard four nonsense "words" such as bidaku, padoti, golabu, and tupiro, which were combined in random order to create 2 minutes of continuous sound. An example of part of a string created by combining these words is bidakupadotigolabutupiropadotibidaku. . . . In this string, every other word is printed in boldface in order to help you pick out the words. However, when the infants heard these strings, all the words were pronounced with the same intonation, and there were no breaks between the words to indicate where one word ended and the next one began.

The transitional probabilities between two syllables that appeared within a word were always 1.0. For example, for the word bidaku, when /bi/ was presented, /da/ always followed it. Similarly, when /da/ was presented, /ku/ always followed it. In other words, these three sounds always occurred together and in the same order, to form the word bidaku. The transitional probabilities between the end of one word and the beginning of another were only 0.33. For example, there was a 33 percent chance that the last sound, /ku/ from bidaku, would be followed by the first sound, /pa/, from padoti, a 33 percent chance that it would be followed by /tu/ from tupiro, and a 33 percent chance it would be followed by /go/ from golabu.

➤ Figure 3.13 (a) Design of the experiment by Saffran and coworkers (1996), in which infants listened to a continuous string of nonsense syllables and were then tested to see which sounds they perceived as belonging together. (b) The results, indicating that infants listened longer to the "part-word" stimuli.
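To make the transitional-probability arithmetic concrete, here is a minimal sketch (not part of the original text) that builds a continuous syllable stream from the four nonsense words listed above and estimates how often each syllable is followed by each other syllable. The word list comes from the passage; the choice of Python, the function names, the stream length, and the rule that a word is never immediately repeated (which is what makes the across-boundary probability come out near .33) are illustrative assumptions.

```python
import random
from collections import defaultdict

# The four nonsense "words" from the Saffran et al. (1996) stimuli, as syllables.
WORDS = [["bi", "da", "ku"], ["pa", "do", "ti"],
         ["go", "la", "bu"], ["tu", "pi", "ro"]]

def make_stream(n_words=2000, seed=1):
    """Concatenate randomly chosen words into one continuous syllable stream,
    never repeating a word back-to-back and adding no breaks between words."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w is not prev])
        stream.extend(word)
        prev = word
    return stream

def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from the stream."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(stream, stream[1:]):
        counts[current][nxt] += 1
    return {cur: {nxt: n / sum(followers.values())
                  for nxt, n in followers.items()}
            for cur, followers in counts.items()}

if __name__ == "__main__":
    tp = transitional_probabilities(make_stream())
    print("P(da | bi) =", round(tp["bi"].get("da", 0.0), 2))  # within a word: ~1.0
    print("P(pa | ku) =", round(tp["ku"].get("pa", 0.0), 2))  # across a boundary: ~0.33
```

Running this sketch prints a within-word probability of about 1.0 and an across-boundary probability of about .33, matching the values described above.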
If Saffran's infants were sensitive to transitional probabilities, they would perceive stimuli like bidaku or padoti as words, because the three syllables in these words are linked by transitional probabilities of 1.0. In contrast, stimuli like tibida (the end of padoti plus the beginning of bidaku) would not be perceived as words, because the transitional probabilities were much smaller.

To determine whether the infants did, in fact, perceive stimuli like bidaku and padoti as words, the infants were tested by being presented with pairs of three-syllable stimuli. Some of the stimuli were "words" that had been presented before, such as padoti. These were the "whole-word" stimuli. The other stimuli were created from the end of one word and the beginning of another, such as tibida. These were the "part-word" stimuli.

The prediction was that the infants would choose to listen to the part-word stimuli longer than to the whole-word stimuli. This prediction was based on previous research that showed that infants tend to lose interest in stimuli that are repeated, and so become familiar, but pay more attention to novel stimuli that they haven't experienced before. Thus, if the infants perceived the whole-word stimuli as words that had been repeated over and over during the 2-minute learning session, they would pay less attention to these familiar stimuli than to the more novel part-word stimuli that they did not perceive as being words.

Saffran measured how long the infants listened to each sound by presenting a blinking light near the speaker where the sound was coming from. When the light attracted the infant's attention, the sound began, and it continued until the infant looked away. Thus, the infants controlled how long they heard each sound by how long they looked at the light. Figure 3.13b shows that the infants did, as predicted, listen longer to the part-word stimuli. From results such as these, we can conclude that the ability to use transitional probabilities to segment sounds into words begins at an early age.

The examples of how context affects our perception of the blob and how knowledge of the statistics of speech affects our ability to create words from a continuous speech stream illustrate that top-down processing based on knowledge we bring to a situation plays an important role in perception. We have seen that perception depends on two types of information: bottom-up (information stimulating the receptors) and top-down (information based on knowledge). Exactly how the perceptual system uses this information has been conceived of in different ways by different people. We will now describe four prominent approaches to perceiving objects, which will take us on a journey that begins in the 1800s and ends with modern conceptions of object perception.

TEST YOURSELF 3.1

1. What does Crystal's run down the beach illustrate about perception? List at least three different characteristics of perception. Why does the importance of perception extend beyond identifying objects?
2. Give some examples, based on the "perceptual puzzle" demonstration and computer vision, to show that determining what is out there requires going beyond the pattern of light and dark on the receptors.
3. What does our description of computer-vision capabilities beginning in the 1950s say about how difficult it has been to design computer-vision systems?
4. Describe four reasons why it is difficult to design a perceiving machine.
5. What is bottom-up processing? Top-down processing? Describe how the following indicate that perception involves more than bottom-up processing: (a) multiple personalities of a blob and (b) hearing individual words in a sentence.
6. Describe Saffran's experiment, which showed that infants as young as 8 months are sensitive to transitional probabilities.

Conceptions of Object Perception

An early idea about how people use information was proposed by 19th-century physicist and physiologist Hermann von Helmholtz (1866/1911).

Helmholtz's Theory of Unconscious Inference

Hermann von Helmholtz (1821–1894) was a physicist who made important contributions to fields as diverse as thermodynamics, nerve physiology, visual perception, and aesthetics. He also invented the ophthalmoscope, versions of which are still used today to enable physicians to examine the blood vessels inside the eye.

One of Helmholtz's contributions to perception was based on his realization that the image on the retina is ambiguous. We have seen that ambiguity means that a particular pattern of stimulation on the retina can be caused by a large number of objects in the environment (see Figure 3.7). For example, what does the pattern of stimulation in Figure 3.14a represent? For most people, this pattern on the retina results in the perception of a blue rectangle in front of a red rectangle, as shown in Figure 3.14b. But as Figure 3.14c indicates, this display could also have been caused by a six-sided red shape positioned behind or right next to the blue rectangle.

Helmholtz's question was, How does the perceptual system "decide" that this pattern on the retina was created by overlapping rectangles? His answer was the likelihood principle, which states that we perceive the object that is most likely to have caused the pattern of stimuli we have received. This judgment of what is most likely occurs, according to Helmholtz, by a process called unconscious inference, in which our perceptions are the result of unconscious assumptions, or inferences, that we make about the environment. Thus, we infer that it is likely that Figure 3.14a is a rectangle covering another rectangle because of experiences we have had with similar situations in the past.

➤ Figure 3.14 The display in (a) is usually interpreted as being (b) a blue rectangle in front of a red rectangle. It could, however, be (c) a blue rectangle and an appropriately positioned six-sided red figure.

Helmholtz's description of the process of perception resembles the process involved in solving a problem. For perception, the problem is to determine which object has caused a particular pattern of stimulation, and this problem is solved by a process in which the perceptual system applies the observer's knowledge of the environment in order to infer what the object might be. An important feature of Helmholtz's proposal is that this process of perceiving what is most likely to have caused the pattern on the retina happens rapidly and unconsciously.
These unconscious assumptions, which are based on the likelihood principle, result in perceptions that seem "instantaneous," even though they are the outcome of a rapid process. Thus, although you might have been able to solve the perceptual puzzles in the scene in Figure 3.2 without much effort, this ability, according to Helmholtz, is the outcome of processes of which we are unaware. (See Rock, 1983, for a more recent version of this idea.)

The Gestalt Principles of Organization

We will now consider an approach to perception proposed by a group called the Gestalt psychologists about 30 years after Helmholtz proposed his theory of unconscious inference. The goal of the Gestalt approach was the same as Helmholtz's—to explain how we perceive objects—but they approached the problem in a different way.

The Gestalt approach to perception originated, in part, as a reaction to Wilhelm Wundt's structuralism (see page 7). Remember from Chapter 1 that Wundt proposed that our overall experience could be understood by combining basic elements of experience called sensations. According to this idea, our perception of the face in Figure 3.15 is created by adding up many sensations, represented as dots in this figure.

The Gestalt psychologists rejected the idea that perceptions were formed by "adding up" sensations. One of the origins of the Gestalt idea that perceptions could not be explained by adding up small sensations has been attributed to the experience of psychologist Max Wertheimer, who while on vacation in 1911 took a train ride through Germany (Boring, 1942). When he got off the train to stretch his legs at Frankfurt, he bought a stroboscope from a toy vendor on the train platform. The stroboscope, a mechanical device that created an illusion of movement by rapidly alternating two slightly different pictures, caused Wertheimer to wonder how the structuralist idea that experience is created from sensations could explain the illusion of movement he observed.

Figure 3.16 diagrams the principle behind the illusion of movement created by the stroboscope, which is called apparent movement because, although movement is perceived, nothing is actually moving. There are three components to stimuli that create apparent movement: (1) One light flashes on and off (Figure 3.16a); (2) there is a period of darkness, lasting a fraction of a second (Figure 3.16b); and (3) the second light flashes on and off (Figure 3.16c). Physically, therefore, there are two lights flashing on and off separated by a period of darkness. But we don't see the darkness because our perceptual system adds something during the period of darkness—the perception of a light moving through the space between the flashing lights (Figure 3.16d).

Modern examples of apparent movement are electronic signs that display moving advertisements or news headlines, and movies. The perception of movement in these displays is so compelling that it is difficult to imagine that they are made up of stationary lights flashing on and off (for the news headlines) or still images flashed one after the other (for the movies).

Wertheimer drew two conclusions from the phenomenon of apparent movement. His first conclusion was that apparent movement cannot be explained by sensations, because there is nothing in the dark space between the flashing lights. His second conclusion became one of the basic principles of Gestalt psychology: The whole is different than the sum of its parts.
This conclusion follows from the fact that the perceptual system creates the perception of movement from stationary images. This idea that the whole is different than the sum of its parts led the Gestalt psychologists to propose a number of principles of perceptual organization to explain the way elements are grouped together to create larger objects. For example, in Figure 3.17, some of the black areas become grouped to form a Dalmatian and others are seen as shadows in the background. We will describe a few of the Gestalt principles, beginning with one that brings us back to Crystal's run along the beach.

➤ Figure 3.15 According to structuralism, a number of sensations (represented by the dots) add up to create our perception of the face.

➤ Figure 3.16 The conditions for creating apparent movement. (a) One light flashes, followed by (b) a short period of darkness, followed by (c) another light flashing at a different position. The resulting perception, symbolized in (d), is a light moving from left to right. Movement is seen between the two lights even though there is only darkness in the space between them.

➤ Figure 3.17 Some black and white shapes that become perceptually organized into a Dalmatian. (See page 91 for an outline of the Dalmatian.)

Good Continuation

The principle of good continuation states the following: Points that, when connected, result in straight or smoothly curving lines are seen as belonging together, and the lines tend to be seen in such a way as to follow the smoothest path. Also, objects that are overlapped by other objects are perceived as continuing behind the overlapping object. Thus, when Crystal saw the coiled rope in Figure 3.1c, she wasn't surprised that when she grabbed one end of the rope and flipped it, it turned out to be one continuous strand (Figure 3.18). The reason this didn't surprise her is that even though there were many places where one part of the rope overlapped another part, she didn't perceive the rope as consisting of a number of separate pieces; rather, she perceived the rope as continuous. (Also consider your shoelaces!)

➤ Figure 3.18 (a) Rope on the beach. (b) Good continuation helps us perceive the rope as a single strand.

Pragnanz

Pragnanz, roughly translated from the German, means "good figure." The law of pragnanz, also called the principle of good figure or the principle of simplicity, states: Every stimulus pattern is seen in such a way that the resulting structure is as simple as possible.

The familiar Olympic symbol in Figure 3.19a is an example of the law of simplicity at work. We see this display as five circles and not as a larger number of more complicated shapes such as the ones shown in the "exploded" view of the Olympic symbol in Figure 3.19b. (The law of good continuation also contributes to perceiving the five circles. Can you see why this is so?)
Similarity

Most people perceive Figure 3.20a as either horizontal rows of circles, vertical columns of circles, or both. But when we change the color of some of the columns, as in Figure 3.20b, most people perceive vertical columns of circles. This perception illustrates the principle of similarity: Similar things appear to be grouped together. A striking example of grouping by similarity of color is shown in Figure 3.21. Grouping can also occur because of similarity of size, shape, or orientation.

There are many other principles of organization, proposed by the original Gestalt psychologists (Helson, 1933) as well as by modern psychologists (Palmer, 1992; Palmer & Rock, 1994), but the main message, for our discussion, is that the Gestalt psychologists realized that perception is based on more than just the pattern of light and dark on the retina. In their conception, perception is determined by specific organizing principles. But where do these organizing principles come from? Max Wertheimer (1912) describes these principles as "intrinsic laws," which implies that they are built into the system. This idea that the principles are "built in" is consistent with the Gestalt psychologists' idea that although a person's experience can influence perception, the role of experience is minor compared to the perceptual principles (also see Koffka, 1935). This idea that experience plays only a minor role in perception differs from Helmholtz's likelihood principle, which proposes that our knowledge of the environment enables us to determine what is most likely to have created the pattern on the retina and also differs from modern approaches to object perception, which propose that our experience with the environment is a central component of the process of perception.

➤ Figure 3.19 The Olympic symbol is perceived as five circles (a), not as the nine shapes in (b).

➤ Figure 3.20 (a) This pattern of dots is perceived as horizontal rows, vertical columns, or both. (b) This pattern of dots is perceived as vertical columns.

➤ Figure 3.21 This photograph, Waves, by Wilma Hurskainen, was taken at the exact moment that the front of the white water aligned with the white area on the woman's clothing. Similarity of color causes grouping; differently colored areas of the dress are perceptually grouped with the same colors in the scene. Also notice how the front edge of the water creates grouping by good continuation across the woman's dress. (Source: Courtesy of Wilma Hurskainen)

Taking Regularities of the Environment into Account

Modern perceptual psychologists take experience into account by noting that certain characteristics of the environment occur frequently. For example, blue is associated with open sky, landscapes are often green and smooth, and verticals and horizontals are often associated with buildings. These frequently occurring characteristics are called regularities in the environment. There are two types of regularities: physical regularities and semantic regularities.

Physical Regularities

Physical regularities are regularly occurring physical properties of the environment. For example, there are more vertical and horizontal orientations in the environment than oblique (angled) orientations.
This occurs in human-made environments (for example, buildings contain lots of horizontals and verticals) and also in natural environments (trees and plants are more likely to be vertical or horizontal than slanted) (Coppola et al., 1998) (Figure 3.22). It is therefore no coincidence that people can perceive horizontals and verticals more easily than other orientations, an effect called the oblique effect (Appelle, 1972; Campbell et al., 1966; Orban et al., 1984). Another example of a physical regularity is that when one object partially covers another one, the contour of the partially covered object "comes out the other side," as occurs for the rope in Figure 3.18.

Another physical regularity is illustrated by Figure 3.23a, which shows indentations created by people walking in the sand. But turning this picture upside down, as in Figure 3.23b, transforms the indentations into rounded mounds. Our perception in these two situations has been explained by the light-from-above assumption: We usually assume that light is coming from above, because light in our environment, including the sun and most artificial light, usually comes from above (Kleffner & Ramachandran, 1992). Figure 3.23c shows how light coming from above and from the left illuminates an indentation, leaving a shadow on the left. Figure 3.23d shows how the same light illuminates a bump, leaving a shadow on the right. Our perception of illuminated shapes is influenced by how they are shaded, combined with the brain's assumption that light is coming from above.

One of the reasons humans are able to perceive and recognize objects and scenes so much better than computer-guided robots is that our system is adapted to respond to the physical characteristics of our environment, such as the orientations of objects and the direction of light. But this adaptation goes beyond physical characteristics. It also occurs because, as we saw when we considered the multiple personalities of a blob (page 67), we have learned about what types of objects typically occur in specific types of scenes.

➤ Figure 3.22 In these two scenes from nature, horizontal and vertical orientations are more common than oblique orientations. These scenes are special examples, picked because of the large proportion of verticals. However, randomly selected photos of natural scenes also contain more horizontal and vertical orientations than oblique orientations. This also occurs for human-made buildings and objects.

Semantic Regularities

In language, semantics refers to the meanings of words or sentences. Applied to perceiving scenes, semantics refers to the meaning of a scene. This meaning is often related to what happens within a scene. For example, food preparation, cooking, and perhaps eating occur in a kitchen; waiting around, buying tickets, checking luggage, and going through security checkpoints happen in airports. Semantic regularities are the characteristics associated with the functions carried out in different types of scenes.

➤ Figure 3.23 (a) Indentations made by people walking in the sand. (b) Turning the picture upside down turns indentations into rounded mounds. (c) How light from above and to the left illuminates an indentation, causing a shadow on the left.
(d) The same light illuminating a bump causes a shadow on the right.

One way to demonstrate that people are aware of semantic regularities is simply to ask them to imagine a particular type of scene or object, as in the following demonstration.

DEMONSTRATION: Visualizing Scenes and Objects

Your task in this demonstration is simple. Close your eyes and then visualize or simply think about the following scenes and objects:
1. An office
2. The clothing section of a department store
3. A microscope
4. A lion

Most people who have grown up in modern society have little trouble visualizing an office or the clothing section of a department store. What is important about this ability, for our purposes, is that part of this visualization involves details within these scenes. Most people see an office as having a desk with a computer on it, bookshelves, and a chair. The department store scene contains racks of clothes, a changing room, and perhaps a cash register. What did you see when you visualized the microscope or the lion? Many people report seeing not just a single object, but an object within a setting. Perhaps you perceived the microscope sitting on a lab bench or in a laboratory and the lion in a forest, on a savannah, or in a zoo.

The point of this demonstration is that our visualizations contain information based on our knowledge of different kinds of scenes. This knowledge of what a given scene typically contains is called a scene schema, and the expectations created by scene schemas contribute to our ability to perceive objects and scenes. For example, Palmer's (1975) experiment (Figure 1.13), in which people identified the bread, which fit the kitchen scene, faster than the mailbox, which didn't fit the scene, is an example of the operation of people's scene schemas for "kitchen." In connection with this, how do you think your scene schemas for "airport" might contribute to your interpretation of what is happening in the scene in Figure 3.5?

Although people make use of regularities in the environment to help them perceive, they are often unaware of the specific information they are using. This aspect of perception is similar to what occurs when we use language. Even though we aren't aware of transitional probabilities in language, we use them to help perceive words in a sentence. Even though we may not think about regularities in visual scenes, we use them to help perceive scenes and the objects within scenes.

Bayesian Inference

Two of the ideas we have described—(1) Helmholtz's idea that we resolve the ambiguity of the retinal image by inferring what is most likely, given the situation, and (2) the idea that regularities in the environment provide information we can use to resolve ambiguities—are the starting point for our last approach to object perception: Bayesian inference (Geisler, 2008, 2011; Kersten et al., 2004; Yuille & Kersten, 2006).

Bayesian inference was named after Thomas Bayes (1701–1761), who proposed that our estimate of the probability of an outcome is determined by two factors: (1) the prior probability, or simply the prior, which is our initial belief about the probability of an outcome, and (2) the extent to which the available evidence is consistent with the outcome. This second factor is called the likelihood of the outcome.
To illustrate Bayesian inference, let's first consider Figure 3.24a, which shows Mary's priors for three types of health problems. Mary believes that having a cold or heartburn is likely to occur, but having lung disease is unlikely. With these priors in her head (along with lots of other beliefs about health-related matters), Mary notices that her friend Charles has a bad cough. She guesses that three possible causes could be a cold, heartburn, or lung disease. Looking further into possible causes, she does some research and finds that coughing is often associated with having either a cold or lung disease, but isn't associated with heartburn (Figure 3.24b). This additional information, which is the likelihood, is combined with Mary's prior to produce the conclusion that Charles probably has a cold (Figure 3.24c) (Tenenbaum et al., 2011). In practice, Bayesian inference involves a mathematical procedure in which the prior is multiplied by the likelihood to determine the probability of the outcome. Thus, people start with a prior and then use additional evidence to update the prior and reach a conclusion (Körding & Wolpert, 2006).

➤ Figure 3.24 These graphs present hypothetical probabilities to illustrate the principle behind Bayesian inference. (a) Mary's beliefs about the relative frequency of having a cold, lung disease, and heartburn. These beliefs are her priors. (b) Further data indicate that colds and lung disease are associated with coughing, but heartburn is not. These data contribute to the likelihood. (c) Taking the priors and likelihood together results in the conclusion that Charles's cough is probably due to a cold.

Applying this idea to object perception, let's return to the inverse projection problem from Figure 3.7. Remember that the inverse projection problem occurs because a huge number of possible objects could be associated with a particular image on the retina. So, the problem is how to determine what is "out there" that is causing a particular retinal image. Luckily, we don't have to rely only on the retinal image, because we come to most perceptual situations with prior probabilities based on our past experiences.

One of the priors you have in your head is that books are rectangular. Thus, when you look at a book on your desk, your initial belief is that it is likely that the book is rectangular. The likelihood that the book is rectangular is provided by additional evidence such as the book's retinal image, combined with your perception of the book's distance and the angle at which you are viewing the book. If this additional evidence is consistent with your prior that the book is rectangular, the likelihood is high and the perception "rectangular" is strengthened. Additional testing by changing your viewing angle and distance can further strengthen the conclusion that the shape is a rectangle. Note that you aren't necessarily conscious of this testing process—it occurs automatically and rapidly.
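A minimal numerical sketch of this prior-times-likelihood updating may help; it is not from the text, and the candidate shapes, probability values, and variable names below are made-up illustrations. The prior favors "rectangle" because books are usually rectangular; the likelihoods are similar for all candidates because each of them could have produced the same retinal image, so the prior does most of the work.

```python
# Hypothetical numbers for the book-on-the-desk example; the shape categories,
# priors, and likelihoods are illustrative assumptions, not measured values.
prior = {"rectangle": 0.90, "tilted trapezoid": 0.07, "larger distant rectangle": 0.03}

# How well each candidate shape explains the trapezoid-shaped retinal image
# produced by viewing the page at an angle. All three could produce it,
# so the likelihoods are similar.
likelihood = {"rectangle": 0.70, "tilted trapezoid": 0.65, "larger distant rectangle": 0.60}

# Bayesian inference: the posterior is proportional to prior * likelihood.
unnormalized = {shape: prior[shape] * likelihood[shape] for shape in prior}
total = sum(unnormalized.values())
posterior = {shape: value / total for shape, value in unnormalized.items()}

for shape, p in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{shape}: {p:.2f}")
# Prints roughly: rectangle 0.91, tilted trapezoid 0.07, larger distant rectangle 0.03
```

With these made-up numbers, "rectangle" ends up with a posterior near .91, which is the sense in which prior beliefs narrow down the possible shapes.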
The important point about this process is that while the retinal image is still the starting point for perceiving the shape of the book, adding the person's prior beliefs reduces the possible shapes that could be causing that image.

What Bayesian inference does is to restate Helmholtz's idea—that we perceive what is most likely to have created the stimulation we have received—in terms of probabilities. It isn't always easy to specify these probabilities, particularly when considering complex perceptions. However, because Bayesian inference provides a specific procedure for determining what might be out there, researchers have used it to develop computer-vision systems that can apply knowledge about the environment to more accurately translate the pattern of stimulation on their sensors into conclusions about the environment. (Also see Goldreich & Tong, 2013, for an example of how Bayesian inference has been applied to tactile perception.)

Comparing the Four Approaches

Now that we have described four conceptions of object perception (Helmholtz's unconscious inference, the Gestalt laws of organization, regularities in the environment, and Bayesian inference), here's a question: Which one is different from the other three? After you've figured out your answer, look at the bottom of the page.

The approaches of Helmholtz, regularities, and Bayes all have in common the idea that we use data about the environment, gathered through our past experiences in perceiving, to determine what is out there. Top-down processing is therefore an important part of these approaches. The Gestalt psychologists, in contrast, emphasized the idea that the principles of organization are built in. They acknowledged that perception is affected by experience but argued that built-in principles can override experience, thereby assigning bottom-up processing a central role in perception. The Gestalt psychologist Max Wertheimer (1912) provided the following example to illustrate how built-in principles could override experience: Most people recognize Figure 3.25a as W and M based on their past experience with these letters. However, when the letters are arranged as in Figure 3.25b, most people see two uprights plus a pattern between them. The uprights, which are created by the principle of good continuation, are the dominant perception and override the effects of past experience we have had with Ws and Ms.

➤ Figure 3.25 (a) W on top of M. (b) When combined, a new pattern emerges, overriding the meaningful letters. (Source: From M. Wertheimer, 1912)

Answer: The Gestalt approach.

Although the Gestalt psychologists deemphasized experience, using arguments like the preceding one, modern psychologists have pointed out that the laws of organization could, in fact, have been created by experience. For example, it is possible that the principle of good continuation has been determined by experience with the environment. Consider the scene in Figure 3.26.

➤ Figure 3.26 A usual occurrence in the environment: Objects (the men's legs) are partially hidden by another object (the grey boards). In this example, the men's legs continue in a straight line and are the same color above and below the boards, so it is highly likely that they continue behind the boards.
From years of experience in seeing objects that are partially covered by other objects, we know that when two visible parts of an object (like the men's legs) have the same color (principle of similarity) and are lined up with each other (principle of good continuation), they most likely belong to the same object and continue behind the object that covers them.
