Lecture 2 (Object & Scene Perception) PDF
Summary
This document details concepts in object and scene perception, discussing different aspects like the binding problem, illusory conjunctions, and how computers struggle with the task. The document also covers competing solutions, principles of grouping and segregation, and gist perception.
Full Transcript
3/30/23

But First…
Before we start lecture 2, let’s quickly discuss the answers to the questions that I posted at the end of lecture 1. These were:
What is the binding problem?
What is an illusory conjunction?
Why does Feature Integration Theory predict conjunction searches to be slow?

What is the Binding Problem?
Different aspects of a stimulus are processed independently, often in separate brain areas. For example, motion is processed by the dorsal stream and form is processed by the ventral stream.
The issue of how an object’s individual features are combined (i.e. bound) to create a coherent percept is known as the binding problem.
Figure: the dorsal stream and the ventral stream.

What is an Illusory Conjunction?
A prediction of Feature Integration Theory (FIT) is that if attention is inhibited, features from different objects will be incorrectly bound together.
Treisman & Schmidt (1982) showed that such illusory conjunctions occur.
They presented character strings very briefly (95-168 ms), followed by a noise mask.
The primary task was to report the two numbers. Then the observers (O’s) were asked to report the coloured letters.
O’s often reported a colour with the wrong letter. Such incorrect bindings are known as illusory conjunctions.

Why Are Conjunction Searches Predicted To Be Slow?
Some forms of visual search require binding to occur. For example, binding is required if the target contains the same features as the distractors.
If the target differs from the distractors only by its particular conjunction of features, then that is a conjunction search.
FIT predicts that in conjunction searches attention needs to be applied to each object in turn (i.e. one at a time) to determine whether or not the attended object is the target.
Thus, these searches are predicted to be very slow.
Example target features: red, horizontal.
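The serial prediction above can be made concrete with a small sketch. This is only an illustration, not something taken from the lecture: it assumes the target is always present, that objects are attended in random order, and that the search stops as soon as the target is found. The timing constants `BASE_MS` and `PER_ITEM_MS` are hypothetical values chosen for the example. Under those assumptions, the predicted search time grows linearly with the number of objects in the display.

```python
import random

# Illustrative assumptions (not from the lecture): timing constants in ms.
BASE_MS = 400.0     # hypothetical time for everything except the search itself
PER_ITEM_MS = 50.0  # hypothetical time needed to attend to one object


def items_inspected(set_size, rng):
    """Serial, self-terminating search: attend to objects one at a time,
    in random order, and stop as soon as the target has been attended."""
    target_position = rng.randrange(set_size)  # target equally likely anywhere
    return target_position + 1                 # objects attended, target included


def mean_search_time(set_size, trials=20000, seed=0):
    """Average simulated search time over many target-present trials."""
    rng = random.Random(seed)
    total = sum(items_inspected(set_size, rng) for _ in range(trials))
    return BASE_MS + PER_ITEM_MS * total / trials


def expected_search_time(set_size):
    """Closed form: on average (set_size + 1) / 2 objects are attended."""
    return BASE_MS + PER_ITEM_MS * (set_size + 1) / 2


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, round(mean_search_time(n)), expected_search_time(n))
```

Doubling the number of objects roughly doubles the search component of the predicted time, which is the sense in which conjunction searches are predicted to be slow.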
In the conjunction-search example, the distractors are red or green, and horizontal or vertical.

Overview
In this lecture, I am going to discuss how we perceive objects and scenes.
I am going to cover the following topics:
The problem of object and scene perception
Competing solutions
Principles of grouping
Principles of segregation
Gist perception

The Problem
Perception seems effortless but it is much harder than it seems.
One way to appreciate the difficulties in perceiving objects and scenes is to try to get a computer to do it.
It turns out that computers are worse at recognising objects than humans, and they fail in very unpredictable ways.

State Of The Art
Currently, the state-of-the-art computer object recognition systems use artificial neural networks.
Athalye et al. (2018) investigated what sort of images these object recognition systems would misclassify.
Based on what they discovered, they then designed images that would fool these systems.

State Of The Art
Amazingly, TensorFlow’s InceptionV3 classifier thought that this was an image of a rifle!
Seemingly bizarre misclassifications such as this are unsettling and fairly common.

What Is This?
In fact, you don’t have to use specially-generated images to fool an image classifier.
Misclassifications commonly occur with natural images if they are presented at unexpected orientations (Alcon, 2019).

State Of The Art
Figure: example images from Alcon (2019).
In the previous example, common objects presented at unusual angles were often misclassified.
This shows how hard it is to build an effective image classifier, and demonstrates that scene and object perception is quite difficult!

Take Home Message
Object perception is very hard.
Our best computer algorithms are still quite bad at it.
What makes the task so hard? A number of factors, but the three most important ones are:
The stimulus on the retina is ambiguous
Objects can be hidden or blurred
Objects look different from different viewpoints and in different poses

Difficulty 1: The Stimulus On the Retina is Ambiguous
Figure: lines labelled a, b, c and d.
All these lines form the same retinal image. Thus, this 1D retinal image is ambiguous.
Similarly, 2D retinal images are also ambiguous in that multiple stimuli can give rise to the same 2D retinal image.

Difficulty 2: Objects Can Be Partially Occluded or Blurred
Figure: photo with the label “My glasses”.
In the above photo, can you see my glasses that are partially occluded by the book?
Most likely a machine would have difficulty recognising my glasses because they are partially occluded.

Difficulty 3: Objects look different in different poses and from different viewpoints
Machines find it hard to recognise objects when they appear in unexpected poses or are viewed from unexpected angles.

Summary
The problem
Competing solutions
Principles of grouping
Principles of segregation
Gist perception

How Do Humans Succeed?
How do humans solve these problems and successfully perceive objects and scenes?
Although a complete explanation of this is beyond the scope of this lecture, we can make some progress towards this goal.
We start by discussing two competing schools of thought:
Structuralism
Gestaltism

Structuralism
Structuralism was proposed by Edward Titchener, based on his studies under Wilhelm Wundt.
Structuralism distinguishes between sensations and perceptions:
Sensations: elementary processes that occur in response to stimulation
Perceptions: conscious awareness of objects and scenes
Structuralism claims that sensations combine to form perceptions.
In other words, according to Structuralism, conscious awareness is the sum of these elementary sensations, and contains nothing that was not already present in these elementary sensations.

Gestaltism
Gestaltism directly contradicts Structuralism.
The Gestaltists claim that conscious awareness is more than the sum of the elementary sensations.
In other words, conscious awareness can have characteristics not present in any of the elementary sensations.
What evidence is there for this claim?

Evidence for Gestaltism
There are two main pieces of evidence that support the claim that conscious awareness can be more than the sum of the elementary sensations.
These two pieces of evidence are:
Apparent motion
Illusory contours

Apparent Motion
In apparent motion an observer sees two stationary dots flashed in succession.
Although each of the dots is stationary, the observer perceives motion.
In other words, the conscious awareness has a characteristic (i.e. motion) not present in the elementary sensations (because the dots were both stationary).
The conscious percept of motion was constructed and was not present in the elementary sensations. The physical stimulus itself is not moving.

Illusory Contours
Illusory contours are a second example of where the conscious awareness has a characteristic not present in the elementary sensations.
Illusory contours are seen in locations where there are no physical contours.
The conscious awareness of the illusory contour is constructed; there is no physical contour at these locations.
Figure: an illusory contour.

Take Home Message
There is plenty of evidence that conscious awareness is constructed and can contain characteristics not physically present in the image. For example:
motion can be perceived when there is no motion in the image (e.g. apparent motion)
contours can be seen when there are no contours in the image (e.g. illusory contours)
This evidence argues against Structuralism but in favour of Gestaltism.
For the rest of this lecture, we will therefore confine our attention to Gestaltism.

Gestalt Principles of Grouping

Grouping and Segregation
According to Gestaltism, humans are able to perceive objects and scenes because of perceptual organisation.
In other words, humans are able to make sense of a visual image because they can perceptually organise it into the constituent objects. How do they do this?
Perceptual organisation is achieved by the processes of grouping and segregation.
Grouping is the process by which parts of an image are perceptually bound together to form a perceptual whole (e.g. the perception of an object).
Segregation is the process by which parts of a scene are perceptually separated to form separate wholes (e.g. the perception of separate objects).
Together, grouping and segregation allow a scene to be perceptually organised into its constituent objects, thereby allowing observers to make sense of the scene.

For Example
The next slide contains a scene containing two objects, with each object containing two components.
For each object, the two components are grouped together to form a single perceptual object.
The two objects are segregated to form two separate objects.
Thus, both grouping and segregation are needed to make sense of the scene.

A Simple Example
Figure: the scene with two objects described above.

A More Natural Example
Figure: a natural scene with two labelled regions, each marked “All this grouped together to form a single object”.
Additionally, the two objects are perceptually segregated so they can be perceived as separate objects.

Take Home Message
To make sense of scenes, both grouping and segregation are needed.
Otherwise, the scenes cannot be perceptually organised into meaningful units.

Gestalt Principles of Grouping
Grouping is governed by five key principles. The more of these principles that apply, the more likely components of an image will be grouped together to form a perceptual object.
Original Gestalt principles:
Good continuation
Pragnanz
Similarity
Proximity
Common fate
Two additional ones (added later):
Common region
Uniform connectedness

Good Continuation
Remember we mentioned that occlusions can make object recognition difficult. The principle of good continuation can help.
Aligned (or nearly aligned) contours are grouped together to form a single object.
This is why contour A is grouped with contour B, instead of with contours C or D.
Figure: contours A, B, C and D.

Pragnanz
Pragnanz (German: Prägnanz) is a German term, usually rendered as “good figure”.
Also known as the “principle of good figure” or the “principle of simplicity”.
Essentially, groupings occur to make the resultant figure as simple as possible.
In the figure on the slide you see a panda, not a collection of unconnected splotches.

Similarity
The more similar objects are, the more likely they will be grouped together.
In a), all the dots are the same colour so it is unclear whether things are organised vertically or horizontally.
In b), colour similarity groups the dots into columns.

Proximity
The closer the dots are, the more likely they are to be grouped together.
In b), grouping by proximity forms horizontal rows.

Common Fate
Things that are moving in the same way are grouped together.

Common Region
Elements that are within the same region of space tend to group together (Palmer, 1992).

Uniform Connectedness
Connected regions with the same visual characteristics (e.g. colour) tend to group together (Palmer & Rock, 1994).

Take Home Message
There are a number of principles that help people to group together parts of an image to form perceptual wholes. These principles include:
Good continuation
Pragnanz
Similarity
Proximity
Common fate
Common region
Uniform connectedness

Pop Quiz
Please write down your answers to the following questions:
What are the three main difficulties of object perception?
Describe two bits of evidence for Gestaltism.
How does Gestaltism claim that perceptual organisation is achieved?
Name four of the Gestalt principles of grouping.

What Are the Three Main Difficulties of Object Perception?
There are a number of difficulties, but the three most important ones are:
The stimulus on the retina is ambiguous
Objects can be hidden or blurred
Objects look different from different viewpoints and in different poses

What is the Evidence for Gestaltism?
There are two main pieces of evidence that support the claim that conscious awareness can be more than the sum of the elementary sensations.
These two pieces of evidence are:
Apparent motion
Illusory contours

How is Perceptual Organisation Achieved?
Perceptual organisation is achieved by the processes of grouping and segregation.
Grouping is the process by which parts of an image are perceptually bound together to form a perceptual whole (e.g. the perception of an object).
Segregation is the process by which parts of a scene are perceptually separated to form separate wholes (e.g. the perception of separate objects).
Together, grouping and segregation allow a scene to be perceptually organised into its constituent objects, thereby allowing observers to make sense of the scene.

What are the Gestalt Principles of Grouping?
These principles include:
Good continuation
Pragnanz
Similarity
Proximity
Common fate
Common region
Uniform connectedness

Segregation
It is not enough to group components of an image together to form an object. You also need to segregate the different objects in the scene from each other, and also segregate the objects from the background.
If you did not do this, you would perceive the entire image as just a single object, which would be very confusing.

Segregation
Much of the perceptual segregation literature has focused on figure-ground segregation.
The reason for this is that objects are normally perceived as “figures” and the background is typically perceived as the “ground”.
Consequently, if you can identify what the figure is, you can typically identify the objects.
But how does a person determine what is “figure” and what is “ground”?

Figural Properties
Regions of the image are more likely to be seen as figure if:
They are in front of the rest of the image
They are at the bottom of the image
They are convex
They are recognisable

Figural Properties
Figure: the Rubin vase.
The Rubin vase is ambiguous: it can be perceived as either a vase or two faces.
It is therefore not clear what the figure is, two faces or one vase.
If the vase is brought in front of the image, it is then seen as the figure.
If the two faces are brought in front of the image, they are then seen as the figure.
This shows that depth ordering affects figure perception.
Take home message: Regions of an image in front of the rest of the image tend to be seen as figures (i.e. they are seen as objects).

Figural Properties
Most people perceive image (a) as a red object in front of a green background.
This is because lower areas are more likely to be seen as figures (i.e. are more likely to be perceived as objects).
However, there is no left-right bias. Consequently, image (b) is ambiguous. It is not clear which side is the figure and which side is the ground.
Figure: images (a) and (b), with a bar chart (vertical axis from 0 to 100) comparing “Lower seen as figure” with “Left seen as figure”. Modified from Vecera et al. (2002).

Figural Properties - Convexity
Is the black shape figure or ground?
Figure: from Peterson & Salvagio (2008).

Figural Properties - Convexity
Figure: concave regions and convex regions. From Peterson & Salvagio (2008).

Figural Properties - Convexity
Are the white shapes figures or ground?
Figure: from Peterson & Salvagio (2008).

Figural Properties - Convexity
Peterson & Salvagio (2008) showed that if you see a single border, there is a slight tendency to perceive the convex region as figure.
However, if you see multiple convex regions, each with the same colour, you are more likely to perceive those regions as figure.
Take home message: Convex regions are assumed to be figures (i.e. objects).

Experience
People also use past experience to segregate overlapping objects.
What letters do you see below?
W M
You use your knowledge of letters to segregate these two letters into separate objects.

Experience
As a) is in a familiar orientation, it is easier to segregate it from the background than b).
Figure: images a) and b). From Gibson & Peterson (1994).

Experience
Figure: image from Life Magazine 58(7), 1965-02-19, p. 120.
Once you have seen the Dalmatian you cannot “unsee” it.
That knowledge even survives when the image is flipped left to right.

Gist Perception
When scenes are flashed rapidly in front of an observer, she may not be able to identify all the objects in the scene.
Nevertheless, she gets an overall impression of what the scene is about. For example, she might think that the image shows “a crowded cafe”.
That “overall impression” is what is known as the “gist” of the scene.

Gist Perception
Potter (1976) studied gist perception using the following paradigm.
In each trial, the observer was cued with a particular scene description (e.g. “Bridge?”).
Then she saw 16 randomly chosen scenes, each for 250 ms.
Then she was asked if any of the scenes fitted the description.
Observers’ accuracy was close to 100%.
This showed that observers can rapidly perceive a scene’s gist.
Figure: a sequence of scenes over time, each shown for 250 ms. Potter (1976).

Gist Perception
Fei-Fei et al. investigated the minimum scene exposure time needed to perceive a scene’s gist.
Observers were presented with just a single scene, followed by a mask.
Observers were then asked to describe what they had seen.
Figure: Li et al. (2007).

Gist Perception
Fei-Fei et al. reported that the longer the stimulus presentation time, the more detailed and accurate the description.
People could start to perceive aspects of the scene at about 27 ms, but the perceptions were not very detailed.

Example descriptions at different presentation times:
27 ms: Couldn’t see much; it was mostly dark w/ some square things, maybe furniture. (Subject: AM)
40 ms: This looked like an indoor shot. Saw what looked like a large framed object (a painting?) on a white background (i.e., the wall). (Subject: RW)
67 ms: I saw the interior of a room in a house. There was a picture to the right, that was black, and possibly a table in the center. It seemed like a formal dinning room. (Subject: JB)
500 ms: Some fancy 1800s living room with ornate single seaters and some portraits on the wall. (Subject: WC)

Take Home Message
Although observers can extract the gist of a scene very rapidly, the gist they extract is not very detailed.
The longer observers view a scene, the more detailed the gist they extract.
27 ms is enough time to extract some gist, and very accurate perception can be achieved in just 250 ms.

Summary
In this lecture, I have discussed object and scene perception. We have covered the following topics:
The problem
Competing solutions
Principles of grouping
Principles of segregation
Gist perception

Questions
Before the next lecture, please write down your answers to the following questions:
Please list four figural cues.
What is the gist of a scene?
How long do you need to get a rudimentary gist?

The End