Object Recognition PDF
Summary
This document covers various aspects of object recognition, including the process of recognizing objects in the visual system. It delves into concepts like representation, perception, and how the mind groups information from the environment. The text discusses the role of neural activity and various organizing principles in object recognition.
Full Transcript
Object Recognition

Inverse projection problem: the same object can project different images onto the retina, and different objects can project the same image onto the retina
Clutter can cause problems
◦ Scenes contain many objects, which can partially occlude one another
Viewpoint invariance: ability to recognize an object from any viewpoint
Object variety: there is an enormous variety of objects, so the visual system needs the flexibility to represent any object with no restrictions
Variable views: the different retinal images that can be projected by the same object or category of objects

Representation and Recognition
Representation: pattern of neural activity in the brain that contains information about a stimulus and gives rise to a subjective perceptual experience of that stimulus
Recognition: process of matching the representation of a stimulus to a representation stored in long-term memory

Fundamental steps
Perceptual organization
1. Represent edges
2. Represent uniform regions bounded by edges
3. Divide regions into figure and ground
4. Group together regions that have similar properties: groups (figures) = candidate objects; rest = background
5. Fill in missing edges and surfaces
Object recognition
1. Higher-level processes represent objects fully enough to recognize them

Middle vision
Middle vision: stage of visual processing that comes after basic feature extraction and before object recognition and scene understanding
◦ Identification of edges and surfaces
◦ Grouping of different regions of an image into objects
How do we find the edges?
◦ V1 neurons have small receptive fields
Illusory contours: contours that are perceived even though nothing in the image changes from one side of the contour to the other

Middle vision rules
Gestalt psychology: the whole is greater than the sum of its parts
Gestalt grouping rules: rules describing when elements in an image will appear to group together
Rule of good continuation: 2 elements will tend to group together if they lie on the same contour
◦ Can be detected by neurons with aligned receptive fields
◦ Common properties allow us to group parts of an image together
Similarity: elements that look similar tend to group together
Proximity: elements that are near each other tend to group together
Rule of parallelism: parallel contours are more likely to belong to the same figure
Rule of symmetry: symmetrical regions are more likely to be seen as the same figure
Rule of common region: features that are part of a common region are likely to be grouped together
Connectedness: 2 items that are connected tend to be grouped together

Dynamic grouping properties
◦ Elements that share a common fate (move together) tend to group
◦ Elements that are synchronized tend to group
◦ Elements that are meaningful or familiar tend to group

Neural basis of perceptual grouping
Synchronized neural oscillations: neurons producing clumps of spikes at the same time
Neural response = spikes/second = how rapidly a neuron produces action potentials
Clumps: several spikes close together, then a pause, then another clump of spikes

Committees
Perceptual committees
◦ Coming to a consensus decision
◦ Different and competing principles are involved, and our perception reflects the consensus that emerges
◦ Pandemonium model of letter recognition

Committee rules
Respect physics and avoid accidents
Ambiguous figure: visual stimulus that permits 2+ possible interpretations of its identity or structure
◦ Resolving ambiguity: preferred interpretations tend to obey the laws of physics
Accidental viewpoint: a viewing position that produces some regularity in the visual image that is not present in the world
◦ Assume that viewpoints are not accidental

Formal model
Formal mathematical models can be used
Bayesian approach: calculate the probability of a particular hypothesis given an observation
◦ P(H | O)
Bayesian decision making
How probable a particular object is to have caused the image on our retina
◦ H = hypothesis (what we are seeing)
◦ O = observation (the image)
Our experience = P(O | H)
◦ How likely a particular object is to cause the retinal image that we're seeing
◦ By Bayes' rule, P(H | O) is proportional to P(O | H) × P(H), where P(H) is the prior probability of the object
**Our perception is the most probable object to have caused a particular image on our retina

Figure and Ground
Figure-ground assignment: deciding which parts of an image belong to the object (figure) and which belong to the background (ground)
Gestalt principles
More likely to be seen as figure:
◦ Size: the smaller region
◦ Symmetry: symmetrical regions
◦ Parallelism: regions with parallel contours
◦ Meaningfulness: regions that form a recognizable shape
◦ Extremal edges: regions whose edges are shaded tend to be seen as figure
Figure or (back)ground?
◦ Relative motion (depth): if one region moves in front of the other, the closer region is figure
◦ Surroundedness: the surrounding region (border) is likely to be ground

Goals of middle vision
1. Bring together that which should be brought together
2. Split asunder that which should be split asunder
3. Use what you know
4. Avoid accidents
5. Seek consensus and avoid ambiguity

Templates and components
Naïve template theory: the visual system recognizes objects by matching the neural representation of the image with a stored representation of the same "shape" in the brain
Structural description: description of an object in terms of the nature of its constituent parts and the relationships between those parts

Problems with templates
◦ We would need a different template for every size, orientation, and style of the same "thing"

Recognition by components: recognition of objects by the identities and relationships of their component parts
◦ Finite set of geometric icons (geons)
◦ Perceive objects with viewpoint invariance

Problems with structural descriptions
◦ Can be too broad
◦ Geons aren't always the best way to describe things
◦ The ability to recognize objects isn't completely viewpoint invariant
◦ The farther an object is rotated away from the learned view, the longer it takes to recognize

Possible solution
Different object recognition processes that depend on the category level
◦ Entry-level category: the label that first comes to mind
◦ Subordinate-level category: a more specific term for the object
◦ Superordinate-level category: a more general term for the object

Impairments
Visual agnosia: impairment in object recognition
Prosopagnosia: type of visual agnosia in which the person is unable to recognize faces, with little or no loss of ability to recognize other types of objects
Topographic agnosia: type of visual agnosia in which the person is unable to recognize spatial layouts such as buildings, streets, landscapes, and so on
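
The problem with naïve template theory noted under "Templates and components" can be illustrated with a small sketch. Everything here is an assumption made for illustration, not a model from the notes: letters are 5×5 grids of characters, a "match" is the fraction of pixels that agree, and the names match_score, rotate90, recognize, and TEMPLATES are hypothetical.

```python
# Toy template matcher: '#' = dark pixel, '.' = light pixel (illustrative only).
T_UPRIGHT = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]
L_UPRIGHT = [
    "#....",
    "#....",
    "#....",
    "#....",
    "#####",
]
TEMPLATES = {"T": T_UPRIGHT, "L": L_UPRIGHT}


def match_score(image, template):
    """Fraction of pixels that agree between two equal-sized grids."""
    total = sum(len(row) for row in template)
    same = sum(p == q
               for img_row, tpl_row in zip(image, template)
               for p, q in zip(img_row, tpl_row))
    return same / total


def rotate90(grid):
    """Rotate a grid 90 degrees clockwise."""
    return ["".join(row[c] for row in reversed(grid))
            for c in range(len(grid[0]))]


def recognize(image, templates, threshold=0.9):
    """Return the best-matching template name, or None if nothing is close."""
    name, score = max(((n, match_score(image, t)) for n, t in templates.items()),
                      key=lambda pair: pair[1])
    return name if score >= threshold else None


rotated_t = rotate90(T_UPRIGHT)
# The stored upright "T" matches itself, but the rotated "T" matches nothing.
upright_result = recognize(T_UPRIGHT, TEMPLATES)
rotated_result = recognize(rotated_t, TEMPLATES)
# Only storing an extra template for the new orientation restores recognition.
extended = dict(TEMPLATES, T_rotated=rotated_t)
extended_result = recognize(rotated_t, extended)
```

In this sketch the rotated "T" agrees with the upright template on only 11 of 25 pixels, so recognition fails until a second template is stored, which is the "different template for every size, orientation, and style" problem in miniature.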

Perceptual interpolation
Perceptual interpolation: intelligently filling in edges and surfaces that aren't visible because they are
◦ Occluded
◦ Blended with the background

Edge completion
Edge completion: perception of a partially hidden edge as complete
◦ An operation involved in perceptual interpolation
Illusory contours: nonexistent but perceptually real edges perceived as a result of edge completion

Surface completion
Surface completion: perception of a partially hidden surface as complete
◦ An operation involved in perceptual interpolation

Neural basis of perceptual interpolation
◦ Single neurons in V2

Perceptual organization
Heuristics: rules of thumb based on evolved principles and on knowledge of physical regularities
Perceptual inference: interpretation of a retinal image using heuristics

Shape representation in V4
Neurons in V4 respond most strongly to edges that can be more complex than those in V1 in 3 ways
1. Edges can be straight or they can be curved
2. A contour with the preferred orientation will elicit a strong response only if the contour is at a particular angular position
3. The preferred location on the retina covers a larger region of the retinal image
Shape representation is richer in V4 than in V1 because V4 neurons have larger receptive fields and respond selectively to more complex characteristics

Shape representation beyond V4
Inferotemporal (IT) cortex
◦ Neurons have much larger receptive fields, covering almost the entire retinal image
◦ Each neuron is selective for much more complex shapes than V4 neurons are
◦ Neurons respond to specific combinations of contour fragments located almost anywhere in the visual field

Grandmother cell
Grandmother cell: neuron that responds to a particular object at a conceptual level, firing in response to the object itself
◦ Neuron with an invariant response not just with respect to viewpoint but also to whether the object is actually present

Modular and Distributed Representations: Faces, Places, and Other Categories of Objects
Ideas about how objects are processed
1. Modular coding: representation of an object by a module, a region of the brain that is specialized for representing a particular category of objects
◦ FFA (fusiform face area): region responding strongly to faces
◦ PPA (parahippocampal place area): region responding strongly to places
2. Distributed coding: representation of objects by patterns of activity across many regions of the brain

Top Down Information
Bottom up: flow of information from the retina to V1, V4, and beyond, from lower to higher regions of the visual hierarchy
Top down: flow of information from higher regions to lower regions of the visual hierarchy, carrying the perceiver's goals, attention, knowledge, and expectations about what objects are likely to occur

Applications: Automatic Face Recognition
Automatic face recognition systems are based on one of two different types of approaches to matching a digital image of a face to an image in a database of known faces
1. Feature based: matches the spatial relationships among anatomical features, in either two or three dimensions, to instances stored in the face database
2. Holistic: matches the whole image of a face to the images in the database, making use of eigenfaces; can account for nonspatial aspects of facial appearance
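
The holistic approach can be sketched roughly as follows. This is a toy under stated assumptions, not how any particular face recognition system works: the "faces" are made-up 6-pixel intensity vectors, the names database, top_components, weights, and recognize are hypothetical, and power iteration with deflation stands in for a library eigen-solver so the example needs only the standard library.

```python
import math

# Made-up "face images": 6 pixel intensities each (assumption for illustration).
database = {
    "alice": [9, 8, 1, 2, 9, 8],
    "bob":   [1, 2, 9, 8, 1, 2],
    "carol": [5, 5, 5, 5, 9, 9],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def top_components(centered, k, iters=200):
    """Top-k principal directions (eigenfaces) by power iteration with deflation."""
    d = len(centered[0])
    # Scatter matrix: sum of outer products of the mean-centered images.
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(d)])  # arbitrary fixed start
        for _ in range(iters):
            v = normalize([dot(row, v) for row in cov])
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflate: remove the direction just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


names = list(database)
n_pixels = len(database[names[0]])
mean_face = [sum(database[n][i] for n in names) / len(names)
             for i in range(n_pixels)]


def center(image):
    return [x - m for x, m in zip(image, mean_face)]


# Three images minus their mean span at most two dimensions, so k=2 suffices.
eigenfaces = top_components([center(database[n]) for n in names], k=2)


def weights(image):
    """Describe a whole image by its projection onto each eigenface."""
    c = center(image)
    return [dot(c, e) for e in eigenfaces]


stored = {n: weights(database[n]) for n in names}


def recognize(image):
    """Return the database name whose eigenface weights are nearest."""
    w = weights(image)
    return min(stored,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(w, stored[n])))


probe = [8, 9, 2, 1, 8, 9]  # a slightly altered version of "alice"
best_match = recognize(probe)
```

Each face is compared as a whole through a handful of weights rather than through measured distances between anatomical features, which is the contrast with the feature-based approach. With real images the vectors would have thousands of pixels and many more eigenfaces would be kept.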