HCNN Object Recognition
Prior probability distributions in typical applications of the Bayesian strategy represent knowledge of the regularities governing object shapes, constituent materials, and illumination, and likelihood distributions represent knowledge of how images are formed through projection on the retina. Some examples of prior knowledge are that solids are more likely to be convex than concave and that the light source is above the viewer. The more ambiguous the image, the greater the influence of prior knowledge in yielding an unambiguous percept. Some perceptions may be more data-driven, others more prior-knowledge-driven.

A visual scene comprises many thousands of line segments and local surface patches. Intermediate-level visual processing is concerned with determining which boundaries and surfaces belong to specific objects and which are part of the background.

The concept of the receptive field (RF) was introduced in 1906 by Charles Sherrington. The receptive field is a characteristic of all neurons and, in vision, indicates the region of the visual scene where a stimulus must fall to excite or inhibit the neuron being studied.

Figure b: RF profiles of a set of five neurons with different RF locations. Horizontal dashed lines indicate the responses of these five example neurons to two stimuli at nearby locations (vertical green and purple lines). Both stimuli fall into the same RF (middle grey curve), but they stimulate neurons with neighbouring RFs differently, so that the population can resolve the two locations even though a single neuron cannot. In addition, the size of the RF determines the neuron's spatial-frequency tuning: the smaller the RF, the higher the spatial frequency it can resolve.

The amount of cortex devoted to one degree of viewing angle changes with eccentricity. Accordingly, more cortical space is dedicated to the central part of the visual field, where the receptive fields are smaller and more densely packed and the visual system has its highest spatial resolution.

ON-center ganglion cells are excited by a light stimulus in the center of the receptive field; OFF-center ganglion cells are excited by a dark stimulus in the center of the receptive field. Note that the firing rate of ON-center ganglion cells increases soon after the dark stimulus disappears; similarly, the discharge rate of OFF-center ganglion cells increases soon after the disappearance of the light stimulus.

Retinal ganglion cells have receptive fields organized into two concentric circular areas with opposite, antagonistic responses. In ON-center cells, illumination of the central part of the receptive field causes an excitatory response, i.e. an increase in the cell's discharge, while illumination of the surrounding part of the receptive field causes an inhibitory response (lateral inhibition). OFF-center cells are organized in the opposite way: illumination of the surround causes an excitatory response, while illumination of the central part of the receptive field causes an inhibitory response. Simultaneous illumination (or darkening) of the center and surround does not evoke a change in discharge frequency.

In humans, if sinusoidal gratings are used, sensitivity is greatest for spatial frequencies around 5-8 cycles/degree and is attenuated both at higher frequencies (up to an acuity limit of around 30-50 cycles/degree) and at frequencies below 1 cycle/degree.
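This band-pass sensitivity can be made concrete with a small numerical sketch (the Gaussian widths and probe frequencies are illustrative assumptions, not values from the text): a one-dimensional difference-of-Gaussians center-surround receptive field, probed by multiplying sinusoidal gratings with its sensitivity profile and integrating over space, responds weakly at very low and very high spatial frequencies and best at intermediate ones.

```python
# Minimal sketch (assumed parameters): a difference-of-Gaussians (DoG)
# center-surround receptive field probed with sinusoidal gratings.
import numpy as np

x = np.linspace(-2.0, 2.0, 2001)              # visual-field position (degrees)
dx = x[1] - x[0]

def gaussian(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# ON-center RF: narrow excitatory center minus a broader inhibitory surround
center_sigma, surround_sigma = 0.02, 0.10     # degrees (assumed values)
rf = gaussian(x, center_sigma) - gaussian(x, surround_sigma)

def response_to_grating(freq_cpd, rf, x, dx):
    """Stimulus strength: grating profile times RF sensitivity, integrated over space."""
    grating = np.cos(2 * np.pi * freq_cpd * x)
    return abs(np.sum(grating * rf) * dx)

for f in [0.2, 1, 5, 20, 50]:                 # spatial frequency, cycles/degree
    print(f"{f:5.1f} c/deg -> relative response {response_to_grating(f, rf, x, dx):.3f}")
# The response is small at very low frequencies (the surround cancels the
# center), peaks at intermediate frequencies, and vanishes at high
# frequencies: band-pass behavior.
```

The weak low-frequency response arises because the surround cancels the center, which is the antagonistic mechanism discussed next.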
Multiplying the profile of the grating stimulus (intensity versus position) by the profile of the receptive field (sensitivity versus position) and integrating over all space gives the stimulus strength delivered by a particular grating. In daylight, contrast sensitivity declines sharply at high spatial frequencies, with an absolute threshold at approximately 50 cycles per degree. Interestingly, sensitivity also declines at low spatial frequencies. The attenuation at low frequencies reflects the inhibitory, antagonistic action of the surround of receptive fields in the retina, geniculate and cortex. Patterns with a frequency of approximately 5 cycles per degree are the most visible. The visual system is said to have band-pass behavior because it rejects all but a band of spatial frequencies.

Humans are thus most sensitive to an intermediate range of spatial frequencies (about 4-6 cycles/degree) and less sensitive to both lower and higher spatial frequencies. In the figure above, stimulus contrast increases from top to bottom, while spatial frequency increases from left to right; the central bars (medium spatial frequency) are visible even at low contrast, while the wide and narrow bars are visible only at high contrast.

Neurons in area V1 are classically divided into two types: simple and complex (Hubel and Wiesel, 1959). These neurons have elongated RFs and respond to a narrow range of orientations, and different neurons respond optimally to distinct orientations (orientation tuning curve). An example is a neuron in area V1 that responds selectively to lines matching the orientation of its receptive field; this selectivity is the first step in the brain's analysis of the shape of an object. The orientation of the receptive field is thought to result from the alignment of the circular center-surround receptive fields of different LGN cells. In the monkey, LGN neurons have non-oriented circular receptive fields, but the projections of adjacent LGN cells onto a simple cell create a receptive field with a specific orientation. Simple cells respond well to sinusoidal gratings (Gabor patches) of specific spatial frequencies and phases.

Complex cells are less selective for the position of the stimulus within the receptive field. The receptive field has no defined ON and OFF regions and responds similarly to light (on a dark background) or dark (on a light background) stimuli at all positions of the receptive field. Complex cells are activated as an oriented linear stimulus crosses the receptive field in one direction.

End-stopped cells respond better to linear stimuli of a certain length, or to stimuli whose end does not extend beyond a specific portion of the cell's receptive field. End-stopped cells may serve to detect angles ("angle detectors") or curved lines in visual images. The ON and OFF regions of the RF have the same preferred orientation (vertical, in the neuron illustrated in the figure); therefore, the inhibitory effect is greater if the same oriented contour is presented in both the ON and OFF regions. A short linear segment (A) or a long curved line (C) will be effective in activating the neuron, because excitation will be greater than inhibition. On the contrary, a long straight line (B) will not be effective, because excitation will be canceled by the inhibitory effect.
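A minimal sketch of the simple/complex distinction just described (the filter parameters are assumptions for illustration, and the energy-model pooling used for the complex cell is one standard modelling choice, not necessarily the one in the source figure): a simple cell modelled as a Gabor filter plus rectification loses its response when a preferred grating is shifted in phase, whereas a complex cell built from a quadrature pair keeps responding, while both remain orientation tuned.

```python
# Minimal sketch (illustrative parameters): a V1-like simple cell as a linear
# Gabor filter with half-wave rectification, and a complex cell as the energy
# of a quadrature pair of Gabors (phase/position tolerant, still orientation tuned).
import numpy as np

size = 65                                        # image patch size (pixels)
ys, xs = np.mgrid[-size // 2 + 1: size // 2 + 1, -size // 2 + 1: size // 2 + 1]

def gabor(theta, phase, freq=0.1, sigma=8.0):
    """Oriented Gabor: sinusoidal carrier under a Gaussian envelope."""
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr + phase)

def grating(theta, phase, freq=0.1):
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    return np.cos(2 * np.pi * freq * xr + phase)

def simple_cell(img, theta):
    # linear filtering followed by half-wave rectification
    return max(np.sum(img * gabor(theta, 0.0)), 0.0)

def complex_cell(img, theta):
    # quadrature pair (0 and 90 deg phase), squared and summed ("energy model")
    e = np.sum(img * gabor(theta, 0.0)) ** 2 + np.sum(img * gabor(theta, np.pi / 2)) ** 2
    return np.sqrt(e)

pref = 0.0                                       # preferred orientation (radians)
for phase in [0.0, np.pi / 2, np.pi]:            # shift the grating across the RF
    s = simple_cell(grating(pref, phase), pref)
    c = complex_cell(grating(pref, phase), pref)
    print(f"phase {phase:4.2f}  simple {s:8.1f}  complex {c:8.1f}")
# The simple cell's response collapses when the grating is shifted out of phase
# with its RF; the complex cell's response is roughly unchanged.

for theta_deg in [0, 30, 60, 90]:                # orientation tuning of the complex cell
    c = complex_cell(grating(np.deg2rad(theta_deg), 0.0), pref)
    print(f"grating at {theta_deg:2d} deg -> complex response {c:8.1f}")
# Responses fall off as the grating orientation deviates from the preferred one.
```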
According to the hierarchical model (Hubel and Wiesel, 1962), simple-cell receptive fields are constructed from the convergence of geniculate inputs with receptive fields aligned in visual space. In turn, complex receptive fields arise from the convergence of simple cells with similar orientation preferences.

However, it remains unclear what advantage, if any, is conveyed by this form of columnar segregation. One candidate function for cortical columns is the minimization of connection lengths and processing time, which could be evolutionarily important.

The dorsal and ventral pathways are highly interconnected, so that information is shared. For example, information about stimulus movement in the dorsal pathway (area V5) can contribute to object recognition through kinematic cues. Information about movement in space derived from dorsal-pathway areas is therefore important for the perception of object shape and is fed into the ventral pathway. Note that all connections between areas in the ventral and dorsal pathways are reciprocal: each area sends information to the areas from which it receives input. Reciprocity is an important feature of connectivity between cortical areas.

In the macaque monkey, V4 is located on the prelunate gyrus and in the depths of the lunate and superior temporal sulci, and extends to the surface of the temporal-occipital gyrus.

For computer vision techniques it is quite simple to identify (rather than categorize) objects. For human vision, on the contrary, identification is more difficult than categorization.

Categorization: a category exists whenever two or more distinct objects or events are treated equivalently, for example when distinct objects or events are labeled with the same name, or when the same action is performed on different objects. Although the stimuli are distinct, organisms do not treat them uniquely; they respond on the basis of past experience and categorization. In this sense, categorization can be considered one of the most basic functions of living beings (Mervis and Rosch, 1981).

We effortlessly and rapidly (100-200 ms) detect and classify objects from among tens of thousands of possibilities, despite the tremendous variation in appearance that each object produces on our eyes. Our daily activities (e.g., finding food, social interaction, selecting tools, reading), and thus our survival, depend on accurate and rapid extraction of object identity from the patterns of photons on our retinae. The fact that half of the nonhuman primate neocortex is devoted to visual processing speaks to the computational complexity of object recognition. Object recognition involves integration of visual features extracted at earlier stages in the visual pathways. This integration requires generalization across different retinal images of an object, as well as generalization across different members of an object category. The representation also incorporates information from other sensory modalities, attaches emotional valence, and associates the object with the memory of other objects or events.
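This integration of features extracted at earlier stages is often modelled by repeating the hierarchical motif described above: center-surround units converging onto oriented simple cells, which converge onto position-tolerant complex cells. That filter-threshold-pool motif is what hierarchical convolutional neural networks (HCNNs) stack. Below is a minimal sketch of one such stack; the kernel shapes, layer sizes and use of max pooling are illustrative assumptions, not details taken from the text.

```python
# Minimal sketch (assumed shapes and parameters) of the hierarchical motif
# that HCNNs stack: linear filtering -> pointwise nonlinearity -> pooling.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

def dog_kernel(size=9, sigma_c=1.0, sigma_s=2.5):
    """LGN-like center-surround (difference-of-Gaussians) kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = lambda s: np.exp(-(xx**2 + yy**2) / (2 * s**2)) / (2 * np.pi * s**2)
    return g(sigma_c) - g(sigma_s)

def oriented_kernel(theta, size=9, freq=0.15, sigma=3.0):
    """Simple-cell-like oriented Gabor kernel (aligned subunits)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def max_pool(a, k=2):
    h, w = (a.shape[0] // k) * k, (a.shape[1] // k) * k
    return a[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def ln_pool_layer(image, kernels, pool=2):
    """One hierarchical stage: filter bank -> rectification -> spatial pooling."""
    maps = [convolve2d(image, k, mode="valid") for k in kernels]
    maps = [np.maximum(m, 0.0) for m in maps]        # half-wave rectification
    return [max_pool(m, pool) for m in maps]

image = rng.standard_normal((64, 64))                # stand-in retinal image

# "LGN" stage: center-surround filtering
lgn = ln_pool_layer(image, [dog_kernel()], pool=1)[0]

# "V1" stage: oriented filters (simple cells) + pooling (complex-cell-like tolerance)
thetas = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
v1 = ln_pool_layer(lgn, [oriented_kernel(t) for t in thetas], pool=2)

# Stacking further identical stages on top of v1 is the HCNN recipe; in the
# models discussed below the filter weights are not hand-designed but chosen
# by optimizing categorization performance.
print([m.shape for m in v1])
```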
Areas V1, V2 and V4 are located in the occipital lobe; areas TEO (temporal-occipital) and IT (inferotemporal) are located in the temporal lobe.

The results of single-neuron studies in the temporal lobe are in agreement with theories of a distributed code for object recognition. Although it is surprising that some cells are selective for complex objects, the selectivity is almost always relative, not absolute.

IT neuron selectivity often appears somewhat arbitrary. A single IT neuron could, for example, respond vigorously to a crescent of a particular color and texture. Cells with such selectivity likely provide inputs to higher-order neurons that respond to specific objects.

For neurons with small receptive fields that are activated by simple light patterns, such as retinal ganglion cells and V1 neurons, each object manifold will be highly curved. Moreover, the manifolds corresponding to different objects will be "tangled" together.

Objects could be reliably categorized and identified (with less than a 10% reduction in performance) even when transformed (spatially shifted or scaled), although the classifier saw each object at only one particular scale and position during training.

Although the overall natural statistics of the screening images were roughly similar to those of the testing set, the specific content (semantic categories) was quite different. Moreover, different camera, lighting and noise conditions, and a different rendering software package, were used.

Performance was significantly correlated with neural predictivity in all cases: models that performed better on the categorization task were also more likely to produce outputs more closely aligned with IT neural responses. Thus, although the hierarchical linear-nonlinear (HLN) hypothesis (i.e., that higher-level neurons, such as those in IT, output a linear weighting of inputs from intermediate-level neurons, e.g., in V4, followed by simple additional nonlinearities) is consistent with a broad spectrum of particular neural network architectures, specific parameter choices have a large effect on a given model's recognition performance and neural predictivity.

The x axis in each plot shows 1,600 test images, sorted first by category identity (8 stimulus categories) and then by variation amount, with more drastic image transformations toward the right within each category block. The y axis represents the prediction/response magnitude of the neural site for each test image (images not used to train the model). In B, distributions of the percentage of model explained variance (r²) over the population of all measured IT sites (n = 168). In C, comparison of the percentage of IT neural explained variance for various models; bar height shows the median explained variance, taken over all predicted IT units.
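As a purely illustrative rendering of the neural-predictivity analysis summarized above (synthetic data and arbitrary sizes; this is not the authors' pipeline), the sketch below fits a regularized linear readout from intermediate model features to each simulated "IT site" and reports the held-out explained variance (r²) per site, the quantity plotted in panels B and C.

```python
# Minimal sketch (synthetic data) of an HLN-style linearity test: fit a linear
# readout from intermediate model features to each "IT site", then report the
# percentage of response variance explained (r^2) on held-out images.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_features, n_sites = 1200, 400, 256, 168

# Stand-ins for model features (e.g., a top hidden layer) and IT responses.
X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))
true_w = rng.standard_normal((n_features, n_sites)) * 0.2
noise = 0.5
Y_train = X_train @ true_w + noise * rng.standard_normal((n_train, n_sites))
Y_test = X_test @ true_w + noise * rng.standard_normal((n_test, n_sites))

# Regularized least-squares linear mapping (ridge), fit on training images only.
lam = 10.0
A = X_train.T @ X_train + lam * np.eye(n_features)
W = np.linalg.solve(A, X_train.T @ Y_train)

pred = X_test @ W
ss_res = ((Y_test - pred) ** 2).sum(axis=0)
ss_tot = ((Y_test - Y_test.mean(axis=0)) ** 2).sum(axis=0)
r2 = 1.0 - ss_res / ss_tot                       # explained variance per site

print(f"median explained variance across {n_sites} sites: {100 * np.median(r2):.1f}%")
# In the studies summarized above, models with higher categorization performance
# also tended to yield higher values of this quantity ("neural predictivity")
# for real IT recordings.
```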