Chapter 4: Perceiving and Recognizing Objects
Document Details
Uploaded by HandierCerberus
University of Saskatchewan
Summary
This document provides notes on Chapter 4, focusing on the process of perceiving and recognizing objects. It covers topics such as the roles of different brain regions in visual processing, the concept of visual agnosias, and Gestalt principles. The notes also include examples and discuss how our visual system moves from raw visual input to object recognition.
Full Transcript
CHAPTER 4
PERCEIVING AND RECOGNIZING OBJECTS

HOW DO WE GET FROM SPOTS TO OBJECTS?
Retinal ganglion cells and LGN – detect spots of light
Primary visual cortex – detects bars
How do we go from spots to bars to cats and dogs?

EXTRASTRIATE CORTEX
Region of cortex bordering the primary visual cortex and containing multiple areas involved in visual processing
Areas V2, V3, V4, V5/MT, inferotemporal cortex
Receptive fields begin to diversify to other properties important for object perception

EXAMPLE: BOUNDARY OWNERSHIP CELLS

DORSAL AND VENTRAL PATHWAYS
Dorsal pathway:
From occipital lobe into parietal lobe
“Where” (and “how”) pathway, or vision-for-action pathway
Extracts information about the location and shape of objects in the environment and the relationship to your body and its movements (how to use/interact with the object)
Ventral pathway:
From occipital lobe into temporal lobe
“What” pathway
Extracts information about the name and identity of objects in the environment and their functions

INFEROTEMPORAL CORTEX
Inferotemporal (IT) cortex: Part of the cerebral cortex in the lower portion of the temporal lobe, important for object recognition
Receptive field properties of IT neurons in macaque monkey cortex:
Very large – some cover half the visual field
Do not respond well to spots or lines
Do respond well to stimuli such as hands, faces, or objects

QUIROGA ET AL. (2005)
Recorded electrical activity in homologous regions in humans: hippocampus and IT cortex
Identified cells that respond to specific stimuli

QUIROGA ET AL. (2005)

GRANDMOTHER CELL
Individual cells seem to respond selectively to very specific stimuli – e.g., the face of your grandmother
Must be learned, not hardwired in genetics, as DNA cannot know what your grandmother’s face looks like
Is it possible that we have a single cell to identify each person or object we encounter?
SPECIALIZED REGIONS
Regions that are clearly specialized for specific types of identification
Fusiform Face Area (FFA) – specific activation to human faces
Parahippocampal Place Area (PPA) – specific activation to images of places
Extrastriate Body Area (EBA) – specific activation to non-face body images

VISUAL WORD FORM AREA (VWFA)
Area specifically activated by written word images
Left fusiform gyrus
Same region as face-processing areas, which are more lateralized to the right hemisphere
Evidence of evolutionary pressures modifying cortical organization

WHEN IT PROCESSING GOES WRONG…
Lesion: 1. (n.) A region of damaged brain. 2. (v.) To destroy a section of the brain.
IT lesions lead to visual agnosias (agnosias can also be tactile or auditory)
Agnosia: Failure to recognize objects in spite of the ability to sense them

VISUAL AGNOSIAS
Prosopagnosia: Difficulty recognizing faces
Akinetopsia: Difficulty perceiving motion
Astereognosis: Difficulty recognizing objects by their size, shape, weight, and texture
Alexia: Difficulty perceiving written words
Autotopagnosia: Difficulty identifying body parts (your own or others’)
Achromatopsia: Difficulty naming colours, but able to see them and tell them apart
Simultanagnosia: Difficulty seeing more than one object at a time
Social-emotional agnosia: Difficulty identifying nonverbal cues such as body language
Form agnosia: Can recognize parts of an object but cannot recognize the object itself
Environmental agnosia: Difficulty identifying where you are, describing a familiar location, or giving directions to a location

FAMOUS CASES OF AGNOSIA
Prosopagnosia:
Brad Pitt
Jane Goodall
Steve Wozniak
The man who mistook his wife for a hat

SPEED OF OBJECT RECOGNITION
Happens in as little as 150 ms
Not enough time for much feedback from higher brain areas
Feed-forward process: Process that carries out a computation (e.g., object recognition) one neural step after another, without the need for feedback from a later stage to an earlier stage
REVERSE-HIERARCHY THEORY
Feed-forward processes give initial, crude information about objects by activating high-level parts of visual cortex. E.g. “saw an animal”
More detailed information becomes available when activation flows back down the hierarchy to lower visual areas where the detailed information is preserved. E.g. “saw a sasquatch taking a selfie”
Re-entrant processing

THE PROBLEMS OF OBJECT RECOGNITION
The problem of object recognition
The pictures were just a bunch of pixels on a screen, but in each case you perceived a tiger
How did you recognize all four images as depicting a tiger?
How does your visual system move from points of light, like pixels, to whole entities in the world, like tigers?

FROM SPOTS TO ELEPHANTS
A – End-stopped cortical cell
B – Edge detector
C and D – Corners – how do we distinguish between corners within an object (C) and corners that create boundaries between objects (D)?
E – How do we tell that these are surface markings and not edges or boundaries?
F – How do we know where elephant ends and rocks begin?

WHAT EVEN IS AN OBJECT?
Cat?
Cat’s tail?
The smell of cookies baking?
The feel of something fuzzy under the bed in the dark?
Music?
A mug falling and breaking?

A TRICK OF THE LIGHT
Light interacts with surfaces in different ways – different surface types cause light to be reflected or scattered in different ways, changing how we see an object
Specular reflection: bright spots produced by light being reflected off a smooth surface – the spots are the same colour as the light source
Matte-surface colour: when some light penetrates a little way into the surface of an object – some light is absorbed, other light scatters
Translucency: more light is absorbed deeper into the surface – the light scatters more widely, giving the impression of a glow

SUBSURFACE SCATTERING
With so many elements that can change how we see things, how do we ever learn to identify objects?
STRUCTURALISM
School of thought that believed complex objects or perceptions could be understood by analysis of the components
Wilhelm Wundt and Edward Bradford Titchener
“Perceptions are the sum of atoms of sensation.” – Titchener
Colour, form, orientation, width, length, breadth, depth – all summed to make a single perceptual experience

MIDLEVEL VISION
Low-level vision: from the ganglion cells to the striate cortex, taking in small bites of the visual scene
Mid-level vision: stage of visual processing that comes after basic features have been extracted from the image and before object recognition and scene understanding
High-level vision: the point of object recognition and visual scene understanding
Mid-level vision is basically defined as NOT low- or high-level vision. Helpful.

MIDLEVEL VISION
Involves the perception of edges and surfaces
How do you find the edges of objects? Cells in primary visual cortex have small receptive fields.
How do you know which edges go together and which ones don’t?
Determines which regions of an image should be grouped together into objects

FINDING EDGES
Computer-based edge detectors are not as good as humans
Humans are very sensitive to even low-contrast changes that define boundaries
Computers… are not…

FINDING EDGES
The shading in this arrow makes some spots lighter than the background and some darker
Even though there are places where the shading is the same in the object and the background, we do not “see” the edge disappear

FINDING OBJECTS FROM LINES
Unlike computers, we are not bothered by small gaps in edges
We can infer edge information even when it is missing
Gaetano Kanizsa – Kanizsa figures

ILLUSORY CONTOURS
A contour that is perceived even though there is no change in the image from one side to the other
Why do the lines suddenly stop then start again?
Represents the visual system’s best guess as to what is happening in a sparse image – something must be occluding the lines

THE PROBLEM WITH STRUCTURALISM
You cannot make a whole out of pieces that don’t exist!
If the whole is solely a sum of its parts, all you should perceive are the real contours
Illusory contours demonstrate that the perceptual experience is more than just the sum of the basic elements

GESTALT
Gestalt: In German, “form” or “whole.”
Gestalt psychology: “The whole is greater than the sum of its parts.”

GESTALT GROUPING RULES
Proximity: Objects that are close together are perceived as more related than those that are far apart.
Continuity: Elements that are arranged on a line or curve are perceived as related, while those that are not are seen as separate.
Law of figure and ground: People instinctively recognize what is in the foreground and background, and understand that the foreground is more important.
Closure: The brain fills in missing parts of an image or design to create a whole.
Similarity: People naturally group items that have similarities between them, and perceive them as connected.
Law of common fate: When elements move together, people see them as a group.
Symmetry: The brain perceives ambiguous shapes as the simplest image possible.

CONTINUITY
Two elements will tend to group together if they seem to lie on the same contour
Which connects more strongly with 1? Which connects more strongly with 2?

CONTINUITY ISN’T EVERYTHING … (OR IS IT?)

CONTINUITY
A) Continuity overrides the occlusion cue to give the appearance of a transparent child
B) Continuity makes it plausible to see a very very VERY long cat curled around a cylinder

CONTINUITY AND TEXTURE
Some contours in an image will group because of continuity
We view lines of the same orientation as being part of the same contour

CONTINUITY
Texture Segmentation and Grouping
Texture segmentation: Carving an image into regions of common texture properties.
Texture grouping depends on the statistics of textures in one region versus another

CONTINUITY
Apply a statistical analysis of all figures in a region
If the statistics differ, the regions are separate
Not all region differences use the same statistics

SIMILARITY AND PROXIMITY
Similarity: Similar-looking items tend to group
Can be based on a limited number of factors: shape, size, colour, orientation
Proximity: Items that are near each other tend to group

CAMOUFLAGE
Though Gestalt rules usually help separate objects, they can also be used to hide them
Animals exploit Gestalt grouping principles to group into their surroundings.
Sometimes camouflage is used to confuse the observer

AMBIGUITY AND PERCEPTUAL “COMMITTEES”
A metaphor for how perception works
Committees must integrate conflicting opinions and reach a consensus.
Many different and sometimes competing principles are involved in perception.
Perception results from the consensus that emerges

AMBIGUITY AND PERCEPTUAL “COMMITTEES”
Committee rules: Honor physics and avoid accidents
Ambiguous figure: A visual stimulus that gives rise to two or more interpretations of its identity or structure
Perceptual committees tend to obey the laws of physics
Accidental viewpoint: A viewing position that produces some regularity in the visual image that is not present in the world.
Perceptual committees assume viewpoints are not accidental

ACCIDENTAL VIEWPOINT
Your visual system knows not to rely on accidental viewpoints
More likely that what you see is what is there, not that what you see is a strange trick of specific angles and viewpoint

ACCIDENTAL VIEWPOINT – WHEN PERCEPTUAL COMMITTEES FAIL…

ASSUMPTIONS ARE GOOD
Explicit knowledge: knowledge that we can easily verbalize or articulate
E.g. How do I get to Tim’s from here?
Implicit knowledge: knowledge that is hard to verbalize or articulate, but that we do not need to be able to verbalize in order to use
E.g. How do you balance when riding a bike?
We use implicit knowledge about the world to make assumptions about the things we see

ASSUMPTIONS ARE GOOD
We assume light comes from above (and slightly left of center)
That assumption tells us how to interpret shadows and highlights to identify objects

ASSUMPTIONS ARE GOOD
They generally help to resolve ambiguity
Occasionally they fail when expectations are violated

FIGURE AND GROUND
Figure-ground assignment: The process of determining that some regions of an image belong to a foreground object (figure) and other regions are part of the background (ground)

FIGURE AND GROUND
Most people view the dark pink heart as the figure on a light pink background
Unlikely to view it the other way around, because the first explanation is the simplest and most commonly encountered

FIGURE-GROUND ASSIGNMENT PRINCIPLES
Return to our heart example:
Surroundedness: The surrounding region is likely to be ground
Size: The smaller region is likely to be figure
Symmetry: A symmetrical region tends to be seen as figure

FIGURE-GROUND ASSIGNMENT PRINCIPLES
Rubin’s Vase
Surroundedness: The surrounding region is likely to be ground
Size: The smaller region is likely to be figure
Symmetry: A symmetrical region tends to be seen as figure

FIGURE-GROUND ASSIGNMENT PRINCIPLES
Parallelism: Regions with parallel contours tend to be seen as figure.
Relative motion: If one region moves in front of another, then the closer region is figure

LAW OF COMMON FATE
If one region moves in the same direction at the same time as another region, they must belong to a single object

LAWS OF CLOSURE AND SYMMETRY
The brain fills in the missing parts to make a whole/complete image
The brain resolves ambiguity with the simplest solution

THE PROBLEM OF OCCLUSION
Objects rarely cooperate and isolate themselves against a clear background
Back to the law of continuity for a moment…
Continuity creates the closure needed to complete this figure in a way that makes sense

THE PROBLEM OF OCCLUSION
Relatability: The degree to which two line segments appear to be part of the same contour

THE PROBLEM OF OCCLUSION
Nonaccidental feature: A feature of an object that is not dependent on the exact (or accidental) viewing position of the observer.
T junctions: Indicate occlusion. Top of T is in front and stem of T is in back.
Y junctions: Indicate corners facing the observer.
Arrow junctions: Indicate corners facing away from the observer.

IS IT THE WHOLE PICTURE…?
Images are composed of both global and local elements
Each local part is a whole object, but the overall image is also a whole object

GLOBAL SUPERIORITY EFFECT
We are faster to identify the whole (global) image than we are to identify the individual (local) parts that make up the whole image.
Consistent with the assumption that the goal of mid-level vision is to identify large-scale objects

NOT ALL PARTS AND WHOLES ARE AS SIMPLE…
When two blobs are pushed together, it creates a pair of concavities in the image
We have an implicit knowledge that valleys indicate boundaries more than bumps
We are thus more likely to perceive the black blob as two objects

FROM MIDLEVEL VISION TO OBJECT RECOGNITION
Moving from V1 to IT in the what pathway, neurons respond to more and more complex stimuli.
V1 – lines and edges in specific areas of the visual field
V2 – early steps from local features to objects
V3 – colour and motion information

FROM MIDLEVEL VISION TO OBJECT RECOGNITION
V4 – stimuli such as fans, spirals, and pinwheels. It is difficult to know exactly what V4 neurons like, but it is something more complicated than spots or bars of light

OCCLUDED SHAPES AND CELL RESPONSES
V4 cell recordings
The darker the circle, the more a specific neuron fires in response to the shape
The same shape created as an accident of occlusion does not cause the cell to fire
Firing patterns are also influenced by boundary ownership rules from V2

A PICTURE OF PROCESSING IN THE BRAIN
Functional imaging helps identify brain regions that respond best to certain stimuli
Subtraction method: Comparing brain activity measured in two conditions: one with the mental process of interest and one without the mental process of interest. The difference between the images may show the brain regions specifically activated by that mental process

SUBTRACTION METHOD
Mental process: Places
Control images:

SUBTRACTION METHOD
PLACES − CONTROL = PARAHIPPOCAMPAL PLACE AREA

DECODING METHOD
Take fMRI scans of a participant looking at many images from various known categories
Train a computer model to recognize brain activity from each category.
Then test the computer model to see if it can identify an untrained image based on what it has learned

PANDEMONIUM!
Oliver Selfridge (1959)
Model to explain how simple letters are recognized
Demons were metaphors for processes

PANDEMONIUM!
1. Early processing – something is detected
2. Feature processing – does it contain curves, horizontal lines, vertical lines, etc.
3. Cognitive processing – features were matched to templates for all letters with the found features to find the best fit
4. Decision – the cognitive matches are pooled and the best fit is chosen

PANDEMONIUM PROBLEMS
Not all As are created equal
Demons need to be able to recognize all variations of the letter A to succeed
Some have curves, some do not
How do demons cope with this problem?

TEMPLATES VS STRUCTURES
Templates:
Internal representation of a stimulus used to recognize the stimulus in the world
Works like a lock and key
One template to match every object ever encountered
Structures:
Description of an object in terms of the nature of its constituent parts and the relationships between them
More abstract
Does not rely on individual representations

BACK TO THE LETTERS…
Templates: One match for every different kind of letter A you have ever encountered
Structures: A capital A is made up of two flanking lines with a perpendicular center line

PROS AND CONS
Templates:
Exemplars are easy to generate – save a snapshot every time you encounter a new version
Hard to match – so many to remember and compare
Structures:
Exemplars are hard to generate – what is a “cat”?
Easy to match – abstract nature makes them generalizable and invulnerable to variation in viewpoint

RECOGNITION BY COMPONENTS
Geons – geometric icons that are used to construct perceptual objects
Limited:
Not viewpoint independent
No one set of geons can explain all objects

MULTIPLE RECOGNITION COMMITTEES?
Perhaps there are several object recognition processes, depending on the category level
Entry-level category: For an object, the label that comes to mind most quickly when we identify the object
Subordinate-level category: A more specific term for an object
Superordinate-level category: A more general term for an object

MULTIPLE RECOGNITION COMMITTEES?
Processing occurs at each level all at once
Allows for comparison across and within categories for faster identification
Prepares the system for more advanced processing if needed

THE SPECIAL CASE OF FACES
Face recognition seems to be special and different from object recognition
Holistic processing: Processing based on an analysis of the entire object or scene and not on adding together a set of smaller parts or features
Subordinate processing

FACE PROCESSING NEEDS TO BE FLEXIBLE
Some information is invariant – your partner’s face will likely look the same today as it did yesterday
Most information is dynamic:
Emotional expression
Facial hair
Face shape while talking, yawning, laughing, etc.
Evidence that faces are processed on a spatial grid where we compare points on horizontal and vertical axes for matches to identify faces
Facial processing of friends and family is importantly linked to emotional processing
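
APPENDIX SKETCH: PANDEMONIUM IN CODE
The four Pandemonium stages described earlier in these notes (early processing, feature processing, cognitive processing, decision) can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration and is not Selfridge's actual feature set: the four feature names, the per-letter feature counts, and the rule that a letter demon "shouts" louder the fewer mismatches it finds. The early-processing stage is skipped by assuming the stimulus already arrives as a tally of detected strokes.

```python
# Hypothetical feature vocabulary (an assumption, not Selfridge's list).
FEATURES = ("vertical", "horizontal", "oblique", "curve")

# Hypothetical templates: how many of each feature a letter "should" have.
LETTER_TEMPLATES = {
    "A": {"vertical": 0, "horizontal": 1, "oblique": 2, "curve": 0},
    "H": {"vertical": 2, "horizontal": 1, "oblique": 0, "curve": 0},
    "D": {"vertical": 1, "horizontal": 0, "oblique": 0, "curve": 1},
    "O": {"vertical": 0, "horizontal": 0, "oblique": 0, "curve": 1},
    "T": {"vertical": 1, "horizontal": 1, "oblique": 0, "curve": 0},
}


def feature_demons(stimulus):
    """Feature processing: report the count of every known feature."""
    return {name: stimulus.get(name, 0) for name in FEATURES}


def cognitive_demons(features):
    """Cognitive processing: each letter demon shouts in proportion to fit.

    Loudness is the negative number of mismatched strokes, so a perfect
    match shouts at 0 and worse matches shout at more negative values.
    """
    shouts = {}
    for letter, template in LETTER_TEMPLATES.items():
        mismatch = sum(abs(template[f] - features[f]) for f in FEATURES)
        shouts[letter] = -mismatch
    return shouts


def decision_demon(shouts):
    """Decision: pick the letter whose demon shouts loudest."""
    return max(shouts, key=shouts.get)


def recognize(stimulus):
    """Run feature, cognitive, and decision stages in order."""
    return decision_demon(cognitive_demons(feature_demons(stimulus)))


if __name__ == "__main__":
    # A well-formed capital A: two obliques and a crossbar.
    print(recognize({"oblique": 2, "horizontal": 1}))  # prints A
    # An A drawn without its crossbar still fits A better than any other letter.
    print(recognize({"oblique": 2}))  # prints A
```

The second call touches on the "not all As are created equal" problem: graded matching tolerates a missing stroke, but an A drawn with curves would need either extra templates or a more abstract structural description.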