Object Recognition: 2D and 3D Pattern Matching
Document Details

Uploaded by WarmPrehistoricArt4484
2025
Summary
This document provides an overview of object recognition, covering 2D pattern matching techniques like template, feature, and structural theories. It also discusses 3D object recognition models, viewpoint invariance, and different types of agnosia. Key concepts include GEON theory, non-accidental properties, and serial stage vs. cascade models of object naming.
Full Transcript
Object Recognition
13 April 2025 19:59

Aims
- Describe 2D pattern matching – template, feature & structural theories
- Describe 3D object recognition – Marr & Nishihara (1978) & Biederman (1987; 1989)
- Consider viewpoint invariance and viewpoint dependence
- Describe models of object recognition – stage model & cascade model
- Agnosia
- Evaluation

What is object recognition?
- Perception of objects is different for humans and for computers
- For humans: perception of familiar items
- For computers: perception of familiar patterns

Why is object recognition difficult?
- Environments contain hundreds of overlapping objects
- Yet perceptual experience is of structured, coherent objects which we can recognise, use and usually name
- Apparent size and shape of an object do not change despite large variation in the retinal image

Examples of variability
- Within-category discrimination (which chair?)

Variability in visual scenes
- Partial occlusion and presence of other objects
- Recognition when only part of an object is visible
- Recognition from unusual views

2D Pattern Matching
How do we recognise the letter 'A'?

1. Template theories
- Mini copy or template in LTM of all known patterns
- Normalisation? Numerous templates?
- Real-life examples: barcodes, fingerprints
- Multiple templates are held in memory
- Compare the stimulus to the templates in memory, looking for the one with the greatest overlap, until a match is found
Criticisms
- Problem of imperfect matches
- Cannot account for the flexibility of the pattern recognition system
- Comparison requires identical orientation, size and position of template and stimulus

2. Prototype theories
- Modification of template matching (flexible templates)
- The prototype possesses the average of each individual characteristic
- No match is perfect: a criterion for matching is needed
Supporting evidence: Franks & Bransford (1971)
- Presented objects based on prototypes
- Prototype not shown
- Yet ppts were confident they had seen the prototype
- Suggests evidence of prototypes

3. Feature theories
- Pattern consists of a set of features or attributes
- A = 2 straight lines and a connecting cross bar
- But also need to know the relationship between features
- The A example could also be seen as H (same features, different arrangement)

4. Structural descriptions
- “..describe the nature of the components of a configuration and the structural arrangement of these parts” (Bruce & Green, 1990)
- Capital letter T = 2 parts; 1 horizontal; 1 vertical; vertical supports horizontal; vertical bisects horizontal
- More in-depth description of pattern

3D Object Recognition
(Similar but more complex process)
- Must interpret input to the visual system as coherent structures, segregated from one another and from the background (early image processing)
- Must be processed to give a description, which can then be matched to descriptions of visual objects stored in memory

Marr's Computational Approach
Four questions
1. What features are used in the description? (primitives)
2. How is the relationship between these features specified?
3. How is the overall description invariant across views?
4. What about viewpoint dependence?

Marr & Nishihara (1978)
- Objects comprised of cylinders
- Specifying the relationship between cylinders = structural description
- They expressed structural relations by a hierarchical organisation of cylinders
- Each cylinder has an axis, and the way in which others are joined to it is expressed as coordinates
- Flexible & comprehensive system for describing objects
- Position of each cylinder described relative to its own axis, resulting in a description which is invariant across viewpoints
- Does not have to be straight cylinders
- Theory works well with biological objects such as humans and animals but not everything else: too simplistic

Biederman (1987; 1989)
(Provided an alternative model)
Biederman’s Recognition-by-components theory:
- Objects composed of basic shapes, aka GEONS = ‘geometrical ions’
- GEONS can include blocks, cylinders, arcs, wedges
- Approximately 36 different volumetric shapes
- Viewpoint invariant theory: can build structural description
Small number of structural relationships, for example:
- Relative size, verticality, centring, relative size of surfaces at join

Examples of GEONS

Relationship between GEONS
- Concave parts of an object’s contour are helpful in segmenting the visual image into parts
Geons specified in terms of ‘non-accidental’ properties:
- Curvature: points on a curve
- Parallel: set of points in parallel
- Co-termination: edges terminating in a common point
- Symmetry: versus asymmetry
- Co-linearity: points in a straight line
E.g. a cylinder possesses curved edges & two parallel edges connecting the curved edges
Regularities in the visual image are thought to reflect actual (non-accidental) regularities in the world
E.g. 2D symmetry in the visual image indicates symmetry in the 3D object

Processes involved in object recognition:
- Edge extraction: edges are important
- Parsing of regions of concavity: helps determine where one GEON stops and another starts
- Non-accidental properties: work out what the GEONS are
- Determination of components: work out what the constituent GEONS are
- Match components to object representations: work out the relationship between GEONS and match to stored memory about known objects
- If matched, then object feels familiar
If cannot see the edges (where one starts and one stops):
- It is difficult to pick GEONS
- Difficult to determine which GEONS are being looked at > won't create good structural description of object > difficult to match object in stored memory

Supporting evidence
Biederman (1987)
- Showed ppts line drawings of objects and asked them to recognise them
- He deleted edges at points of concavity
- Stimuli were presented for 100, 200 or 750 msec with 25%, 45% or 65% of contours removed
- Slow & inaccurate at ‘non-recognisable’ but relatively good at ‘recognisable’
- Deletion of a component affects the matching stage: reducing the number of components to match to
- Midsegment deletion makes it more difficult to determine components
- Removing one GEON or removing edge info makes the object less recognisable

Vogels, Biederman, Bar & Lorincz (2001)
- Found some cortical neurons in monkeys sensitive to GEONS
- Assessed response of individual neurons in the inferior temporal cortex to a change in GEON or a change in size of object
- Some neurons responded more to GEON changes (providing support for geons)
- But why 36 geons?
- Experimental results are consistent with the model but don't provide a critical test
- Does describe how the description is created but doesn't explain how descriptions are matched to those stored

Advantages
- Recognises the importance of the arrangement of the parts
- Parsimonious: small set of primitive shapes

Disadvantages
- Emphasises importance of edges too much; other info such as textural info is important as well, e.g. peach vs nectarine
- Which geons?
- De-emphasises the role played by context in object recognition (affects later stages of object recognition)
- Simplifies the contribution of viewpoint-dependence

Viewpoint dependent theory
Viewpoint invariance theories
- Biederman (1987)
- Object recognition is not affected by the observer’s viewpoint
Viewpoint-dependent theories
- Tarr (1995); Tarr & Bülthoff (1995; 1998)
- Assumes changes in viewpoint reduce the speed and / or accuracy of object recognition (Kallmeyer et al, 2022)
- ‘object representations are collections of views that depict the appearance of objects from specific viewpoints’ (Tarr & Bülthoff, 1995) (kind of like templates)
Evidence suggests that viewpoint invariant mechanisms are used sometimes in object recognition whereas viewpoint dependent mechanisms are used at other times
- Viewpoint dependent more important for within-category discriminations, e.g. different kinds of car
Vanrie et al. (2002)
- ‘The key question is no longer if object recognition is viewpoint-dependent or viewpoint independent, but rather when, i.e. under which circumstances.’
- Viewpoint dependent = complex within-category decisions
- Viewpoint invariant = easy categorical decisions
Tarr & Hayward (2018)
- ‘object representations are neither viewpoint-dependent nor viewpoint-invariant’

Beyond recognition
- Once the structural description of an object is formed, it must be matched to stored representations
- If there is a match then the object is ‘recognised’
- Several models specify stages in the ‘recognition’ process, e.g. Humphreys et al (1988)
Oversimplification
- ‘Later’ processes may start before earlier ones have been completed
- Stages may not occur in order
- General support for model from patients with object recognition difficulties
- Associative agnosia, e.g. Patient HJA; Patient JB
Humphreys et al (1988) propose alternative ‘Cascade’ model:
- Structural, semantic and name stages interact
- Both within and between stages
- Makes different predictions about how subjects will perform in an object naming task
- If there is an issue at a stage, rather than not progressing, there will be a knock-on effect on the following stages
- Interaction shown in diagram

Conclusion
- Anecdotal and empirical evidence for a separation of structural, semantic and naming processes in recognition
- Humphreys et al (1988) propose processing across these stages operates in cascade rather than independently
- E.g. Patient JB: naming visually confusable objects (birds, animals) had knock-on effects, making it more difficult to identify their category

Agnosia
Failure of knowledge or recognition = “agnosia” (visual agnosia)
Visual agnosia
- Feature processing and memory remain intact
- Recognition deficits are limited to the visual modality
- Alertness, attention, intelligence and language are unaffected
- Other sensory modalities (touch, smell) may substitute for vision in allowing objects to be recognised
Apperceptive agnosia
- Problems with early processing (shape extraction)
- Perceptual deficit; affects visual representations directly
- Components of the visual percept are picked up, but can’t be integrated
- Effects may be graded; often affected: unusual views of objects
Associative agnosia
- Problems with later processing (recognition)
- Visual representations are intact, but cannot be accessed or used in recognition
- Lack of information about the percept
- “Normal percepts stripped of their meaning” (Teuber)

Summary
- Agnosia useful for studying object recognition
- Different kinds of agnosia
- Agnosia may be restricted to specific categories
- Agnosia may be found alone or with problems with faces

Key terms
- Invariance
- Template, prototype and feature theories of 2D pattern matching
- Structural descriptions & primitives
- Recognition-by-components (Geon theory)
- Non-accidental properties
- Viewpoint-dependent and viewpoint-independent theories
- Serial stage model of object naming
- Cascade model of object naming
- Apperceptive and associative agnosia
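
Worked example: template matching
The template theory notes above (compare the stimulus with stored templates, pick the one with greatest overlap) can be sketched in a few lines of Python. This is only a toy illustration, not a model from the lecture: the 3x3 grids, the names overlap and best_match, and the 0.9 criterion are all assumptions made for the example.

```python
# Toy template matcher. Patterns are 3x3 binary grids (1 = ink, 0 = blank).
# Grids, function names and the 0.9 criterion are illustrative assumptions.

TEMPLATES = {
    "T": ["111",
          "010",
          "010"],
    "L": ["100",
          "100",
          "111"],
}


def overlap(stimulus, template):
    """Proportion of cells on which stimulus and template agree."""
    cells = [
        (s, t)
        for s_row, t_row in zip(stimulus, template)
        for s, t in zip(s_row, t_row)
    ]
    return sum(s == t for s, t in cells) / len(cells)


def best_match(stimulus, templates=TEMPLATES, criterion=0.9):
    """Return (name, score) for the template with greatest overlap.

    The name is None when even the best overlap falls below the criterion.
    """
    name, score = max(
        ((n, overlap(stimulus, t)) for n, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return (name if score >= criterion else None), score


upright_t = ["111", "010", "010"]
rotated_t = ["001", "111", "001"]  # the same T turned on its side

print(best_match(upright_t))
print(best_match(rotated_t))
```

The upright T overlaps its template perfectly, but the rotated T falls below the criterion for every stored template. That is the criticism listed in the notes: comparison needs identical orientation, size and position unless the stimulus is normalised first or many templates are stored.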
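
Worked example: prototype matching
The prototype theory notes above say the prototype holds the average of each characteristic and that a criterion is needed because no match is perfect. A minimal sketch, assuming patterns can be coded as lists of numbers (the two features, the city-block distance and the criterion value are all hypothetical choices for illustration, not taken from Franks & Bransford):

```python
from statistics import mean

# Each exemplar is a list of numeric characteristics (hypothetical coding,
# e.g. [width, height]). The prototype itself is never among the exemplars.
EXEMPLARS = [[2, 4], [4, 2], [3, 6], [3, 0]]


def prototype(exemplars):
    """Average each characteristic across the studied exemplars."""
    return [mean(values) for values in zip(*exemplars)]


def distance(a, b):
    """City-block distance between two patterns (an assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def matches(stimulus, proto, criterion=1):
    """A stimulus 'matches' if it is close enough to the prototype."""
    return distance(stimulus, proto) <= criterion


proto = prototype(EXEMPLARS)
print(proto, proto in EXEMPLARS)
print(matches([3, 3], proto), matches([6, 6], proto))
```

Here the averaged pattern was never presented, yet it is the best possible match to the stored prototype, which is one way to read the finding that participants were confident they had seen the prototype.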
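
Worked example: feature lists vs structural descriptions
The notes above point out that a feature list for A (2 straight lines and a cross bar) could also describe H, and that a structural description adds the arrangement of the parts. The sketch below makes that concrete. The part labels, relation labels and function names are invented for illustration; only the T description follows the wording in the notes.

```python
from collections import Counter

# Parts shared by A and H: a feature list alone cannot tell them apart.
A_H_PARTS = Counter({"long line": 2, "cross bar": 1})

# name -> (parts, relations). Relation labels are hypothetical wording.
DESCRIPTIONS = {
    "A": (A_H_PARTS, frozenset({"long lines meet at the top",
                                "cross bar joins the long lines"})),
    "H": (A_H_PARTS, frozenset({"long lines are parallel",
                                "cross bar joins the long lines"})),
    "T": (Counter({"horizontal": 1, "vertical": 1}),
          frozenset({"vertical supports horizontal",
                     "vertical bisects horizontal"})),
}


def match_by_features(parts):
    """Feature theory: compare only which parts are present."""
    return sorted(n for n, (p, _) in DESCRIPTIONS.items() if p == parts)


def match_by_structure(parts, relations):
    """Structural description: parts plus how they are arranged."""
    return sorted(
        n for n, (p, r) in DESCRIPTIONS.items()
        if p == parts and r == frozenset(relations)
    )


seen_parts = Counter({"long line": 2, "cross bar": 1})
seen_relations = {"long lines are parallel", "cross bar joins the long lines"}

print(match_by_features(seen_parts))
print(match_by_structure(seen_parts, seen_relations))
```

Matching on features alone leaves both A and H as candidates, while adding the relations narrows the answer to H.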