R. G. de Almeida | What’s in your mind | Part II, Chapter 5 | 1

PART II: CORE AREAS

CHAPTER 5
Perceiving the Visual World

“…from time to time I have found that the senses deceive, and it is prudent never to trust completely those who have deceived us even once.”
-René Descartes (1641), First Meditation, in Meditations on First Philosophy.

“The most lively thought is still inferior to the dullest sensation.”
-David Hume (1748), An Enquiry Concerning Human Understanding

Chapter sections:
5.1 On Dull Sensations and Deceptions
5.1.1 Visual Computations vs. “Ecological Perception”
5.1.2 Perceptual Inferences
5.1.3 Dumb But Also Smart Perception
5.2 The Visual System of the Beholder
5.2.1 Issues on the Implementation of Vision
5.2.2 Beyond the Retina, Behind the Eyes [abbreviated]
5.3 The Architecture of the Visual System [abbreviated]

—X—

5.1 On Dull Sensations and Deceptions

Our experience of the visual world begins with arrays of light intensities reaching our eyes. These are the “dull sensations” that Hume refers to. But these sensations, as we will see in the present chapter, quickly turn into “lively thoughts”—the thoughts we have about what we see. Our first goal here is to show that these sensations aren't really “dull”. They are rich in information, enabled by the natural kinds that our visual system is hardwired to detect. Indeed, much of our visual experience is in line with what Descartes wrote, as quoted above: the senses deceive us, and they do so in at least two ways. First, the visual input seems to underdetermine the rich representations that our computations reconstruct of the world out there. And, second, the visual input system performs its computations largely independently of what we know to be true about the world: illusions persist even when we know they are illusions. These "deceptions" are in fact suggestive of the elaborate machinery of perception.
But before discussing these ideas more thoroughly, let me start by demonstrating rather simply the ways in which our senses deceive us. Close your eyes (after you finish reading this paragraph, of course), and keep them closed while slowly turning your head all the way to the left or to the right; then, after a few seconds, open your eyes while fixating on something at random. What were your first impressions? What did you see as soon as light came into your eyes? Was it a kaleidoscope of colors, lines, shapes, textures, shades, perhaps fuzzy boundaries? Did you, instead, see real objects, your own belongings in a well-known scene? Most likely, what you experienced were not fragmented pieces of reality, but familiar objects. That’s what we consciously experience when we look at the world. However, just before you were able to make sense of the scene and its constituent objects, the very first things your eyes grabbed at a glance were indeed the likes of meaningless lines and colors. Granted, that is not easy to uncover. And that is one reason why the senses deceive us: they do not appear to be working on fragments of reality, simply because we cannot access their actual routines. But they also deceive us in the way that Descartes suggested: the workings of the senses are to a large extent autonomous and, thus, often produce erroneous perceptual inferences about what the real world actually is. In summary, the experience you have when you open your eyes in a familiar scene is that of perceiving objects you understand as being of a certain kind and as serving a certain function. But perception begins much earlier than the recognition of objects of interest; it begins with the “dull sensations” that Hume talked about.
Before you get to the objects of interest, you need to decode the visible space; to put it simply, you need to see red and a shape before you understand it as being a rose. The study of visual perception focuses, to a large extent, on the kinds of elements that constitute the very earliest moments of our interactions with the visible world. We could say that what’s really important about perception occurs within a window of time much shorter than 1 second, counting from the moment your eyes gaze at some piece of the world. It is within this short window of time that you come to experience objects, faces, and scenes given the “kaleidoscope” of colors, lines, and shapes that your eyes get. Perception, then, is about the organization of this kaleidoscope into the objects that form our understanding of the world. As such, much of the focus here will be on what the mind does with the early visual input and how this input gets organized into meaningful objects of interest. Although much of our concern is with this approach to perception—from the early input to its products via computations—there are other views on the nature of perception. So, I want to start by contrasting the present approach with the so-called "ecological" approach. After we sort this out, we will focus on the properties of the visual system and, true to the assumptions guiding the work in this book, we will discuss the principles underlying our perceptual capacities.

5.1.1 Visual Computations vs. "Ecological Perception"

I said that the visual system operates on input information—the fragments of the visual world. This process is, by hypothesis, entirely bottom-up: it is bottom-up in the sense that it is driven by the low-level features of the visual world, the features which we are hardwired to detect, together with internal mechanisms for building the representations of those features into higher-level objects, faces, and scenes.
This bottom-up approach, however, is far from a consensus view. There are those who believe that perception is almost entirely purpose-driven. Neisser (1976), a pioneer of information-processing cognitive psychology (see Chapter 2), had a change of heart after flowcharts and computer models began to dominate the study of cognition. He thought that perception should not be seen as the grasping of information via the sense organs and its transformation through successive stages of processing following computer-like algorithms. For Neisser, perception is somewhat different from that: it occurs over time to serve the goals of the perceiver, who anticipates much of what is to be perceived, integrating recent and incoming streams of information. As he put it,

“At each moment the perceiver is constructing anticipations of certain kinds of information, that enable him to accept it as it becomes available.” (Neisser, 1976, p. 20)

Neisser’s colleague J. J. Gibson (1979) held a similar position, in fact developing his theory of vision on the assumption that what vision does is to connect you to objects to serve affordances: it is as if your perceptual processes are determined by what the objects “mean”. In his words,

The perceiving of an affordance is not a process of perceiving a value-free physical object to which meaning is somehow added in a way no one has been able to agree upon; it is a process of perceiving a value-rich ecological object. Any substance, any surface, any layout has some affordance for benefit or injury to someone. Physics may be value free, but ecology is not. (Gibson, 1979, p. 140)

This might make sense intuitively. The perception of a “value-rich ecological object” does match our intuitions on how we see the world: it seems that we focus on what is important to us while neglecting much of what is not.
Going back to our little demonstration at the beginning of the present chapter, we saw that, when you open your eyes randomly at a particular scene, most likely what you see are the objects of your interest, perhaps the "value-rich" ones. Attention, as we will see in Chapter 7, plays an important role in helping us sort out the relevant from the irrelevant information. In addition, the ecological approach makes intuitive sense because we seem to connect to objects of interest almost the moment we lay our eyes on them: we see a coffee mug and it seems hard to dissociate it from the very idea of having a coffee. But it is not because I need or want coffee that I see a coffee mug in a special way. This is how Gibson (1979, p. 139) put this issue:

An affordance is not bestowed upon an object by a need of an observer and his act of perceiving it. The object offers what it does because it is what it is.

Gibson wanted to draw a line between the theory of affordances and the idea of value or meaning in an object, which Gestalt psychologists such as Koffka assumed to be perceived with the object. As Koffka (1935, p. 7) wrote,1

Each thing says what it is… a fruit says 'Eat me'; water says 'Drink me'; thunder says 'Fear me'; and woman says 'Love me'.

The line that Gibson wanted to draw seems to be a thin one, if in fact it can be drawn at all. For it does appear that affordances need to rely on what objects are “used for”, or what they "say" to us. Thus Gibson's view is pretty much in line with Koffka's: objects "say" what they are for, and our looking at them serves our desires and our expectations. Gibson's and Koffka's views meet somewhere in the middle. The key difference is that Gibson also allowed for misperception: a thing says what it is, but a thing can also deceive us. But Gibson’s “ecological” approach to perception faces some serious challenges.
Perhaps most importantly, it attempts to deny the role played by internal mechanisms in the process of decoding incoming information. It does so by assuming that information about objects and scenes is picked up “directly”, through what they "say". Surely, you might be certain about the very products of your perception, what you understand about the world around you. Moreover, you might even know what you are going to do with the very objects you perceive. But the mechanisms that underlie that kind of certainty play a fundamental role in the process of understanding what you see and, ultimately, in the “benefit or injury” it yields. The object only has a meaning (or affordance) once it is decoded. It is not the case that your expectations and beliefs affect what you see: it would be wonderful to see what you want—to make a very rough caricature. Before you get to “see what you want,” you need to decode what your retina (in fact, your brain) gets from the environment.2 And, as we will see, there are lots of principles that operate on the input, lots of “algorithms,” to use the term from Chapter 3, that serve the simple purpose of providing you with a representation of the objects and scenes. Though tempting, perhaps because it matches our common-sense view of what perception does, Gibson’s view lacks a solution to what has been called the “frame problem”: among all the indefinitely many things you know, among all your beliefs, which ones are supposed to constitute your “anticipations” of incoming information? How do you “frame” your anticipations?3 In other words, anticipating incoming information requires internal representations of what is to be expected, which in turn requires some criterion for deciding what should be represented as expected, and so on. To put it simply, one cannot determine the to-be-processed information except by postulating particular states compatible with it, which in turn requires a selection criterion.
But there can’t be a criterion that selects your anticipations, except for what the incoming information—and its successive transformations upstream—proposes. It is certainly true that much of what you believe about the world will eventually meet what you perceive about the world—but your beliefs cannot determine what you see. I wanted to begin this chapter by contrasting these views—perception as computational inference and the “ecological” view—to introduce you to what perception does, which might be different from what you think perception does. This contrast helps us make a crucial point about the study of perception: we need to understand the internal mechanisms that allow for the representation, updating, transformation and, ultimately, the interpretation of the external world.

1 Quoted in Gibson (1979, p. 138). It should be noted that Gibson’s quote from Koffka is somewhat misleading. In his original work, Koffka was referring to “primitive man” (p. 7), at a “prescientific stage”, who gradually “developed a new activity which he called thinking” (p. 7). In passing, it is also difficult to dissociate this view from that of Skinner. See Chapter 2.
2 Usually, vision scientists take the retina to be part of the brain. As Gregory (1966, p. 61) states, the retina is an “outgrowth of the brain”, adding that, evolutionarily, it “budded out and became sensitive to light”.
3 The term “frame problem” was coined in the context of designing AI systems that could reason about action. The problem is that there is no principled way of determining which elements of the system’s supposedly vast knowledge should change and which ones should not change as a consequence of the system’s action (see Pylyshyn, 1997).
And while nobody denies that “affordances” might play a role in how we eventually act upon objects or how we select objects to be further processed, there is a lot to be said about how we get to do that in the first place. We will see in Chapter 6 how objects are perceived from a computational, “non-ecological” perspective. But first, let us look at certain mechanisms that yield the computation of objects. We will start off with some basic notions on what is at stake. Then, in Section 5.2, we will examine the neuroanatomical properties of the visual system.

5.1.2 Perceptual Inferences

There is, it seems, a rough consensus in cognitive science (apart from the "ecological" approach) that what you ultimately get is the product of complex perceptual computations—some call them perceptual inferences. These perceptual inferences begin early on with your "dull" sensations, with the imprint of light contrasts on your retinas, and end when you understand what you see as an object (or scene) of a certain kind. We take this distinction—between decoding the incoming information via perceptual inferences and understanding or cognizing what is seen—to be key to cognitive science: much of what we do in trying to understand how the mind works is to focus on the principles (the algorithms) that are fixed and hardwired, rather than on what is contingent on beliefs and expectations. And to a large extent, perception typifies a domain in which mechanisms ought to be fixed across the species, as we shall see. This distinction between perception as an input-driven mechanism and other domains of analysis of the visual world, including what is eventually contingent on beliefs and expectations, is deeply rooted in the common histories of philosophy and psychology. Among the British empiricists, it was perhaps Hume who expressed most persuasively the idea that perceiving and understanding are very distinct processes.
In his Enquiry, Hume (1748/1912) articulated this in his beautiful prose:

“Every one will readily allow, that there is a considerable difference between the perceptions of the mind, when a man feels the pain of excessive heat, or the pleasure of moderate warmth, and when he afterwards recalls to his memory this sensation, or anticipates it by his imagination. These faculties may mimic or copy the perceptions of the senses; but they never can entirely reach the force and vivacity of the original sentiment. The utmost we say of them, even when they operate with greatest vigour, is, that they represent their object in so lively a manner, that we could almost say we feel or see it: But, except the mind be disordered by disease or madness, they never can arrive at such a pitch of vivacity, as to render these perceptions altogether undistinguishable.” (Enquiry, Section II, Of the Origin of Ideas)

Surely, one small problem with Hume’s view is that he wanted all our “Ideas” (cognition) to be “copies”—even if faint ones—of our “impressions” (perception). Descartes, a rationalist writing about a century before Hume, had already put forth the notion that “perceiving” and “cognizing” are different faculties of the mind, but he doubted that they could be separated from each other as an arm or a leg could be severed from the rest of the body. When he suggested that there was a difference between “perceiving” and “cognizing”, Descartes’ main concern was with the apparent (to him) separation between body and soul: his struggle was with justifying his being in virtue of his uncertainty about the physical existence of things out there. You might remember from Chapter 2: “I am a thinking thing”, he wrote. And that was all he was certain about, for he could not show that the things he was thinking about really existed outside his own mind.
We do not need to go that far, to the point of denying the existence of things external to us. We can now leave that doubt to the rhetoric of romantic songs like Rodgers and Hart’s With a Song in My Heart (“When the music swells / I'm touching your hand / It tells me you're standing near”). We know that we have come a long way in understanding that what we get from the visual world is formed first in our retinas—the densely neuron-populated tissue covering the back of our eyes (see below). And we also know that there is in fact a world out there (I am sort of certain about that), and that this world out there is decoded and encoded into particular forms of representation that allow the brain to understand the world and guide our actions. The idea that a line should be drawn between perception and cognition has motivated much of the research from the second half of the 20th century up to today. The general research program has been mainly to understand properties of the functional architecture of vision, and in particular to sort out the types of mechanisms that are independent of what we know explicitly about the world. A theory of vision proper requires an understanding that the 3-D world is projected onto our retinas as a 2-D array of light intensities, which needs to be transformed—or “interpreted”—as a 3-D world again. Figure 5.1 is a simplified version of the problem of vision: to transform the world out there into a representation of that world based on the coding of light intensities (and color, motion, and other properties of the visual world). We can say that these light intensities are "projected" onto our retinas by light bouncing off the surfaces of objects. But this projection alone won't work.
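Why the projection alone cannot do the job can be made concrete with a minimal sketch of the geometry. It assumes an idealized pinhole eye (the `project` function is a hypothetical helper written for this illustration) and ignores lens optics, the curvature of the retina, and the inversion of the image; it illustrates the 3-D to 2-D mapping, not the actual eye.

```python
from typing import Tuple


def project(point: Tuple[float, float, float],
            focal_length: float = 1.0) -> Tuple[float, float]:
    """Idealized pinhole projection of a 3-D point (X, Y, Z) onto a 2-D image plane.

    Z is the distance in front of the eye. Dividing by Z discards depth,
    so many different 3-D points land on the same 2-D location.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must be in front of the eye (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# A small, near object and an object five times larger and five times farther away.
near_small = (0.5, 1.0, 2.0)
far_large = (2.5, 5.0, 10.0)

print(project(near_small))
print(project(far_large))
```

Both calls return the same image coordinates, (0.25, 0.5): from the 2-D projection alone the two situations cannot be told apart, so any 3-D interpretation has to come from further computation on the input.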
The array of light intensities needs to be transformed into how the world appears to be not only from our perspective, our viewpoint, but also from what we make of it, including surfaces or parts that are occluded to us and, thus, parts of objects that are not “projected” onto the retina.

Figure 5.1: The basic task of visual perception involves encoding the real 3-D world (left) into a 2-D array of light intensities in the retina (middle), and transforming it into a 3-D representation. Notice that the 2-D retinal image is a relatively poor encoding of the 3-D world, for its periphery has few color-sensitive cells (cones; see text), and the image it delivers to the primary visual areas (V1) is “upside-down” and distorted. (© The author)

Here is where it comes in handy to bring back Descartes once again. Separating “perceiving” from “cognizing” will lead us to determine the kinds of mental processes (and, thus, what kinds of implicit knowledge; see Chapter 1) that occur independently of our explicit knowledge of the world; this will also lead us to understand which processes (and representations) are fixed, hardwired in the brain. That is, if we can in fact determine which mental processes and representations cannot be changed by our expectations and beliefs (our explicit knowledge), then we can postulate which properties of the human architecture are fixed and by hypothesis hardwired in the neuronal machinery—or, as Marr (1982) put it, implemented in the brain. Knowing which properties are hardwired is a major step in understanding how our minds/brains have been shaped by evolution. Determining the hardwired properties of the mind/brain is not a bad motivation, but it poses a great challenge, which is to establish more precisely the computing algorithms that make vision and other perceptual systems capable of dealing with a vast array of tasks.
The recognition of objects and scenes and our acting upon them are just the final products of those visual computations. Of course, determining the hardwired properties of the mind/brain is not a task confined to the study of visual perception, but, as we will see in the present chapter, it is in visual perception that it becomes most obvious. It was in perception—our impressions—that Hume grounded his firmest understanding of how the mind works.

5.1.3 Dumb But Also Smart Perception

The modern view of this research program has been made more explicit by Jerry Fodor (1983), who says that perception is like a reflex, the kind of patellar reflex you get when the patellar tendon, just below your kneecap, is tapped, causing your lower leg to kick. This reflex is independent of your background knowledge or desires: the leg just does its thing when the tendon is hit. Perception looks smart but it is dumb, Fodor says. You might think that this goes contra Hume, who thought that sensations are richer than thoughts, but it is quite along the same lines. In Fodor’s view, perception is smart because it “knows” a lot about the properties of the world: it knows, for example, that the image of an “expanding” object on your retina is, most of the time, a sign of a fast-approaching object. You may remember the example, from Chapter 1, of the bus coming towards you. As I mentioned there, questioning the nature of the expanding object is not something that your visual system is there to do. In this regard, perception is very smart: it “knows” what it is; it has a pretty good hypothesis about what is happening out there in the world, it computes the right kinds of inferences (most of the time), and its conclusion is quick, helping you determine your action of jumping out of the way.
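The "smart but dumb" rule described here (an expanding retinal image signals an approaching object) can be sketched in a few lines. This is a hypothetical illustration: the function names and the bus's size, distance and speed are invented for the example. It also adds one standard piece of optical geometry that the chapter does not discuss: for small angles the visual angle is roughly size/distance, so the angle divided by its rate of growth approximates distance/speed, the time left before contact, without any knowledge of the object's actual size, distance or speed.

```python
import math
from typing import List


def visual_angle(size_m: float, distance_m: float) -> float:
    """Angle (in radians) that an object of a given size subtends at the eye."""
    return 2.0 * math.atan(size_m / (2.0 * distance_m))


def approach(size_m: float, start_m: float, speed_m_s: float,
             dt_s: float, steps: int) -> List[float]:
    """Visual angles sampled every dt_s seconds while an object approaches at constant speed."""
    return [visual_angle(size_m, start_m - speed_m_s * dt_s * k) for k in range(steps)]


def is_looming(angles: List[float]) -> bool:
    """The 'dumb' rule: a steadily expanding retinal image is treated as an approaching object."""
    return all(later > earlier for earlier, later in zip(angles, angles[1:]))


def time_to_contact(angles: List[float], dt_s: float) -> float:
    """Estimate seconds to contact as theta / (d theta / dt), using the last two samples.

    Only the retinal angles are used: no object size, distance or speed is needed.
    """
    rate = (angles[-1] - angles[-2]) / dt_s
    return angles[-1] / rate


# Hypothetical bus: 3 m wide, starting 30 m away, approaching at 10 m/s, sampled every 0.1 s.
angles = approach(size_m=3.0, start_m=30.0, speed_m_s=10.0, dt_s=0.1, steps=10)
print(is_looming(angles))
print(round(time_to_contact(angles, dt_s=0.1), 2))
```

The first print gives True, and the second an estimate of roughly 2.2 s, close to the true 2.1 s left when the bus is 21 m away (the estimate runs a little high because it uses a backward difference over a coarse 0.1 s step). Note that `is_looming` sees nothing but angles: an image expanding on a movie screen would trigger it just as well.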
It’s not the case that the early visual system knows anything about buses, to be clear: it knows (or assumes) that expanding retinotopic projections should be interpreted as fast-approaching objects, and that's all. You may also remember the Müller-Lyer illusion (Fig. 1.1) from Chapter 1. Knowing that the two straight lines have the same length does not change the nature of the representation that you get from your perceptual system. Perception is dumb because in the course of doing its thing it deceives us, as Descartes said. A more dramatic example of how “dumb” perception may be comes from more mundane experiences, such as watching 3-D movies or watching movies in an IMAX theater (even better if it’s a 3-D movie in IMAX!). The wide screen and the vivid images can really “fool” your visual system: if the images displayed on the IMAX screen are those of an airplane flying over some tall peaks, diving into valleys and making sharp turns, as in the movie Grand Canyon Adventure (directed by G. MacGillivray), your body will move with the flow of the motions produced by the images. You know you are not diving into the Canyon, but the experience persists. Several years ago, Walt Disney World’s Magic Kingdom had a movie theater called Circle-Vision 360°. The spectators would stand in the middle of the round room, holding on to bars attached to the floor—which obviously increased the suspense of what was to come in the ingenious show. Then, a well-coordinated series of nine image projections around the room made the theater turn into a virtual airplane, with spectators making sharp body movements as the images projected onto the screens signaled sharp “airplane” turns through valleys and mountains, narrowly escaping fast-approaching peaks. Spectators would often feel dizzy with all the “motion” provoked by the fast turns of the “airplane”.
Everybody there (except, perhaps, small children) knew that the images were projected onto screens, that the room was not moving—certainly not flying anywhere—but their perceptual systems couldn’t care less about what their cognitive systems knew about the real world. If you haven’t had the chance to go to that movie theater, you can experience a similar effect with 3-D animated movies, which appear to bring characters flying over spectators’ heads: almost invariably, people try to “touch” these characters as they seem to come close. In all these cases, people have a pretty good understanding of what is going on, but still they cannot help but feel the illusion. The perceptual system is just doing its job: providing information to higher cognitive systems about what is really happening in its domain—that there is sudden “motion” in the environment signaling the body to move, or that there is a character fast approaching, floating over everybody’s heads. That’s why perceptual systems are dumb, in Fodor’s sense. But of course, they know something about the world, just enough for them to do what they have evolved to do, but not enough for them to disregard the illusions as such. As Descartes wrote (see quote above): the senses deceive; but most of the time they deceive for the right reason: to provide us with a quick interpretation of what’s out there, even if what’s there is not really what it appears to be.

5.2 The Visual System of the Beholder

There is certainly more to visual perception than what meets the eyes, much more than an irresistible old pun. In order to understand visual computations proper, we need to look at what the visual sensory organ, the eye, delivers to the brain. In fact, we need to understand how light passing through the pupil (the opening in the iris; see below) gets to be encoded and transformed by the retinas and what sort of information the retinas send to different areas of the brain.
In this section, we will briefly look at how vision is implemented and what its physical design suggests regarding the kinds of computations that need to be performed to account for what is seen and what is not seen by the eyes.

5.2.1 Issues on the Implementation of Vision

As we saw in Chapter 3, one way to understand what a system does is to look at how it is physically implemented (following Marr, 1982). And, in the case of vision, physical implementation comprises a large network of different types of neurons with connections going from the retinas in our eyes all the way to the visual centers of the brain located mostly in the occipital lobe (the “back” or posterior part of the brain) and the temporal lobes (mostly the “lateral lower sides”). Figure 5.2 shows the basic anatomy of the eye.

Figure 5.2: Anatomy of the human eye (courtesy of the National Eye Institute).

Visual perception begins with light, with the transformation of electromagnetic energy into electrical impulses generated by the neurons in the retina—a process we usually call transduction. It is from the retinal image—i.e., the pattern of activation and inhibition of millions of photoreceptors—that we start forming our representations of the visible world. This, of course, by no means denies the existence of knowledge already represented in the brain—innate knowledge, or knowledge that unfolds due to our experiences in the world. You cannot possibly know that chairs exist unless you experience them. But it is quite possible that your visual system knows a lot about how to build a 3-D representation of a chair, or, more basically, that light contrasts in the visual field mark object boundaries and signal the orientation of object surfaces, along with other basic regularities bearing on our first “impressions”. In visual perception, then, it all begins with operations on light intensities that occur at the retina.
Let us look at some details of this process. As can be seen in Figure 5.3, light passes through the pupil and the lens and, after traveling through the vitreous gel (the gel that fills the center of the “globe” of the eye), it reaches the retina. The retina is a layer of neural tissue about 0.5 mm thick whose light-receptor cells are of two types, cones and rods.4 There are about 125 million rods distributed over the peripheral regions of the retina, and about 6 million cones mostly concentrated in the central area, the fovea—an area with a diameter of about 1.5 mm where most “focused” vision begins.

Figure 5.3: Schematic cross-section of the retina showing the main types of cells and their organization. This cross-section is about 0.5 mm thick. The photoreceptors (cones and rods) are shown in the back of the retina, with light crossing the vitreous gel and several layers of cells before being transformed into electrical signals (© Pearson Education/not.cleared)

4 There are also four other cell types, including: (a) horizontal cells, which connect cones and rods, providing inhibitory connections that allow the retina to adjust to different lighting conditions; and (b) ganglion cells, which also respond to light but whose primary function is to distribute information from cones and rods into the geniculate system (see below).

The cross-section of the retina shows that the light receptors are at the back of the retina itself (the “outer layer”5), which means that light passes through intricate networks of ganglion cells and bipolar cells, among others, as well as blood vessels (not shown), until it reaches cones and rods.6 The cells that constitute the retina work in concert to produce a two-dimensional representation of what is seen (that is, seen from your point of view), helping mark boundaries between objects as a function of changes in light intensities and color contrasts.
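How a row of cells, each constrained by its neighbours, can mark boundaries where light intensity changes is easiest to see in a one-dimensional toy. The sketch assumes a hypothetical row of on-center/off-surround units, each excited by the light at its own position and inhibited by the light at its two immediate neighbours, with center and surround exactly balanced; real receptive fields are two-dimensional, roughly circular and vary in size, and both function names are invented for this example.

```python
from typing import List


def center_surround_response(luminance: List[float],
                             surround_weight: float = 0.5) -> List[float]:
    """Responses of a row of toy on-center/off-surround units.

    Each unit adds the light at its own position (center, "+") and subtracts
    surround_weight times the light at each of its two neighbours (surround, "-").
    With surround_weight = 0.5, center and surround cancel for uniform light.
    Units at the ends reuse their own value for the missing neighbour.
    """
    n = len(luminance)
    responses = []
    for i, center in enumerate(luminance):
        left = luminance[max(i - 1, 0)]
        right = luminance[min(i + 1, n - 1)]
        responses.append(center - surround_weight * (left + right))
    return responses


def boundaries(responses: List[float], threshold: float = 0.1) -> List[int]:
    """Positions where the response clearly departs from zero, i.e. where contrast is high."""
    return [i for i, r in enumerate(responses) if abs(r) > threshold]


uniform = [1.0] * 8
edge = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]  # a dark region, then a bright region

print(center_surround_response(uniform))
print(center_surround_response(edge))
print(boundaries(center_surround_response(edge)))
```

Uniform light produces no response anywhere (all zeros), while the step from dark to bright produces -2.0 just on the dark side and 2.0 just on the bright side, so `boundaries` reports positions 3 and 4: only the edge is "tagged", which is the sense in which such cells mark points of highest contrast.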
Each cell contributes its own value to the whole retinotopic array, but each cell is also constrained by what happens in neighboring cells. That is, receptor cells (cones, rods) work together to send information to higher cells in the visual pathway. In particular, they are interconnected in such a way that the ganglion cells respond to particular patterns of activation and inhibition of whole patches of cones and rods. For instance, Figure 5.4 shows how a particular group of cells—we call these receptive fields—work together in response to light intensities in the environment.

5 Cones and rods pick up light through light-sensitive pigment molecules embedded in the outer layer. Why we evolved this way—with receptor cells (cones, rods) located at the outer layer—is not clear. We can assume that the retina got more complex as we evolved, starting from simple light-sensitive molecules in the “pit” of the eyes (similar to some mollusks) and arriving at a more complex organization in terms of receptive fields (see Figure 5.4) connected to the occipital lobes of the brain.
6 The details of how photoreceptors (cones and rods) actually transform photons into electrical discharges—a biochemical process—are beyond the scope of the present chapter. See Sekuler & Blake, 2002, for an in-depth exposition.

Figure 5.4: Schematic representation of a receptive field of a ganglion cell in the retina. The cell has a “centre-on” receptive field, that is, it responds to the activation (light) of cones in the centre of the patch, together with the inhibition (dark) of cones in the surrounding periphery. The “on-center/off-surround” ganglion cell responds to the pattern of activation of the cones, as transmitted via bipolar cells.
(© Pearson Education/not.cleared)

The important operations here are the activations (marked “+”) and inhibitions (marked “-”) of a group of receptor cells, and how their combination activates or inhibits bipolar and ganglion cells. There are an estimated 1.25 million ganglion cells in each retina, which means that each one responds to the joint work of many photoreceptors (given the figures above, roughly 100 photoreceptors per ganglion cell on average).[7] And each eye “sees” through the constant activation and inhibition of these 1.25 million receptive fields, all working in concert to determine the points of highest contrast in the image.[8] They are, in essence, tagging the features present in the distal stimulus in terms of bright and dark spots while also responding to color. The actual joint work of these cells would require a much more in-depth discussion than we need to make the main point: the arrays of light intensities yield the tagging of features and constitute the primary data upon which we build our representations of the world. This early retinotopic representation is certainly a poor one, requiring a lot of computation by higher areas of the brain (or higher perceptual and cognitive systems) to yield a seemingly picture-perfect representation of the seen world.

For instance, the image that each retina delivers to higher systems in the brain is incomplete. This is so because the axons of all the ganglion cells leave the retina bundled together in a “pipe” that constitutes the optic nerve (see Figure 5.3). At the optic disk, where the nerve leaves the eye, there are no photoreceptors and thus no image; yet our experience is that nothing is missing, even when we see the world with just one eye, like a cyclops. But the “hole” in the image is there. You can find where your optic nerve leaves the eye by fixating on the cross in Figure 5.5 with your left eye, adjusting for distance. At a certain point, the bee will disappear. That is because its (would-be) input is falling on the “blind spot” of your retina.

Figure 5.5: Bees are disappearing!
Demonstration of the blind spot. The bee will disappear if you close your right eye and fixate on the cross with your left eye from a distance of about 30 cm. If you fixate on the cross with your right eye from about the same distance, you won’t notice the “hole” in the beehive to the right of the cross. If you move your left eye back and forth between the cross and the bee, the bee will keep appearing and disappearing. (© The author)

[7] It should be noted that this is not exactly a 100-to-1 system (i.e., one ganglion cell for every 100 cones or rods). Receptive fields vary enormously. In some cases, such as the cones located mostly in the fovea that respond to particular wavelengths (e.g., “red” cones), there is only one cone per ganglion cell. See, e.g., Sekuler & Blake, 2002.

[8] I say that they are constantly activating and inhibiting each other, but cones and rods produce graded responses, unlike most other neurons in the brain: rather than firing all-or-none spikes, they continuously change their activity, with more or less intensity, in response to changes in the pigments in the outer layer.

We can think of the retinotopic representation, and its corresponding imprint in the primary visual cortex in the occipital lobes of the brain, as akin to an image on your computer monitor or other device, which is formed by pixels. For instance, the computer monitor I am using now has 4 million pixels. Each pixel is more or less constantly changing color and brightness, so that all 4 million together can respond dynamically to the images that my computer is forming at every moment. I say “more or less” because the analogy is a bit distant from reality: monitors have a refresh rate (the constant on and off of pixels), somewhat similar to what brain neurons do as they fire, going on and off at different intervals, with an “active” duration of approximately 1 millisecond.
In the case of photoreceptors, however, their “firings” are in fact graded changes in voltage, unlike action potentials, in response to the different wavelengths of light reflected from different kinds of surfaces. Also, while our computer screens appear to display an equally sharp image across the whole screen, the “resolution” of the retina changes as we move from the center (the fovea) to the periphery. Much of what falls on the periphery of the retina is poorly encoded: just a rough sketch, lacking in color and detail. You can see this by looking at Figure 5.5 again. As you shift your gaze from the bee toward the cross, the bee becomes black and white and its details are mostly lost, turning it into a darkish blob. This is so because the distribution of cones (the cells responsible for fine detail and colour) and rods (the cells responsible for motion detection and for vision in poor lighting conditions) varies across the retina. While cones are concentrated mostly in the fovea, with a scattered distribution throughout the rest of the retina, rods are distributed throughout the retina but are absent from the fovea. As Gregory (1966, p. 63) observed, as we move from the center of the retina to its periphery, we travel back in evolutionary history, “from the most highly organized structure to a primitive eye, which does little more than detect movements of shadows.” This distribution has implications for how we perceive objects, attend to scenes, read, and carry out many other behaviors involving action. In particular, it is because the fovea is where detailed information about the world is gathered that we constantly move our eyes when scanning scenes, reading, and focusing attention, and even when we avoid looking directly at someone while keeping him or her in the “periphery”.
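Both the blind-spot demonstration and the fovea/periphery contrast depend on visual angle: how far from the fixation point, in degrees, an image falls on the retina. The sketch below converts a point's offset on the page and the viewing distance into eccentricity and then classifies the point as foveal, in the blind spot, or peripheral. Only the 1.5 mm fovea comes from the text; the conversion of roughly 0.29 mm of retina per degree, the blind spot's position (roughly 15 degrees from fixation, on the temporal side of the viewing eye), and its half-width are approximate values assumed here for illustration.

```python
import math

# Only FOVEA_DIAMETER_MM comes from the text; the other values are
# rough, assumed figures used for illustration.
FOVEA_DIAMETER_MM = 1.5
MM_PER_DEGREE = 0.29              # retinal distance per degree of visual angle (approx.)
BLIND_SPOT_CENTER_DEG = 15.0      # centre of the blind spot, temporal to fixation (approx.)
BLIND_SPOT_HALF_WIDTH_DEG = 3.0   # rough half-width of the blind spot (assumed)


def eccentricity_deg(offset_cm: float, distance_cm: float) -> float:
    """Angle between the fixated point and a point offset_cm away on a flat page."""
    return math.degrees(math.atan2(offset_cm, distance_cm))


def retinal_region(ecc_deg: float, temporal_side: bool = True) -> str:
    """Classify where an image at this eccentricity falls on the retina.

    temporal_side is True when the point lies in the temporal visual
    field of the viewing eye (e.g., to the left of the cross when only
    the left eye is open), which is where the blind spot appears.
    """
    fovea_radius_deg = (FOVEA_DIAMETER_MM / 2) / MM_PER_DEGREE  # about 2.6 degrees
    if ecc_deg <= fovea_radius_deg:
        return "fovea"
    if temporal_side and abs(ecc_deg - BLIND_SPOT_CENTER_DEG) <= BLIND_SPOT_HALF_WIDTH_DEG:
        return "blind spot"
    return "periphery"


if __name__ == "__main__":
    # A hypothetical bee 8 cm from the cross, viewed from 30 cm.
    ecc = eccentricity_deg(8, 30)
    print(round(ecc, 1), retinal_region(ecc))         # 14.9 blind spot
    print(retinal_region(eccentricity_deg(8, 50)))    # periphery: too far away
    print(retinal_region(eccentricity_deg(0.5, 30)))  # fovea
```

The 8 cm offset is hypothetical, but it shows why the demonstration asks you to adjust the viewing distance. The same offset falls inside the blind spot at about 30 cm and outside it at 50 cm.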
5.2.2 Beyond the Retina, Behind the Eyes

Thus far we have discussed the very early moments of visual perception, with special attention to the biological design of the system involved in picking up light intensities (in fact, transducing different wavelengths within the visible spectrum of electromagnetic radiation) from the external world. We did not go into the details of transduction, that is, how the photoreceptors translate visible light into neuronal impulses and ultimately into the first “imprint” of the world.

As we discussed in previous chapters, the processes by which the physical properties of the world (light, sound) are transformed into a code that the brain is able to compute do not explain the nature of perceptual and cognitive processes, although they certainly help us grasp the nature of the task that perceptual algorithms face. A thorough account of the neuroanatomy and neurophysiology of vision certainly helps us understand what sorts of information are picked up by the brain, that is, what sorts of information we are hardwired to detect. This in turn helps us explain what kinds of building blocks of information we have at our disposal to build objects and scenes. Many phenomena in vision can be explained by appealing to those biological constraints and the features they yield. But many others cannot, and require much more sophisticated computational theorizing. In sum, while we can explain many perceptual phenomena through biology (say, the disappearance of the bee in Figure 5.5), biology clearly does not suffice to explain what the visual system does to “fill in” the bee's location with the texture of the beehive (the hexagons). This can be explained by appealing to the representation of the whole texture.
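As a loose illustration of what "appealing to the representation of the whole texture" could mean computationally, here is a toy sketch. It treats the beehive as a one-dimensional repeating pattern with a gap where the blind spot falls, estimates the pattern's period from the visible samples only, and fills the gap by copying from a whole number of periods away. The function names and the period-matching strategy are assumptions made for illustration, not a claim about how the visual system actually fills in. The point is only that the missing values are recovered from the regularity of the whole texture, not from any local input.

```python
from typing import List, Optional, Sequence

Sample = Optional[str]  # None marks a position with no input (the blind spot)


def estimate_period(signal: Sequence[Sample], max_period: int) -> Optional[int]:
    """Smallest period p such that every visible pair (x[i], x[i + p]) agrees."""
    for p in range(1, max_period + 1):
        pairs = [(a, b) for a, b in zip(signal, signal[p:])
                 if a is not None and b is not None]
        if pairs and all(a == b for a, b in pairs):
            return p
    return None  # no regular texture found


def fill_in(signal: Sequence[Sample], max_period: int = 10) -> List[Sample]:
    """Fill gaps with the value found a whole number of periods away."""
    filled = list(signal)
    p = estimate_period(signal, max_period)
    if p is None:
        return filled  # without a texture representation, the gap stays empty
    for i, value in enumerate(filled):
        if value is None:
            # Look for any visible sample at the same phase of the pattern.
            for k in range(i % p, len(signal), p):
                if signal[k] is not None:
                    filled[i] = signal[k]
                    break
    return filled


if __name__ == "__main__":
    hive = list("ABCABCABCABC")   # a repeating "hexagon" pattern
    seen = hive[:]
    seen[4:7] = [None] * 3        # the part falling on the blind spot
    print(fill_in(seen) == hive)  # True
```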
I also want to emphasize here that there is a major difference between the idea that the visual system picks up information (in the form of features; see below) and the “ecological” approach that we discussed at the beginning of this chapter. The retina cannot tell us anything about coffee mugs or grandma’s favorite chair. The retina can only signal the contrast between those objects and their backgrounds, and their internal features, which are in fact value-free. The functions proper (how you recognize an object or a face from the patterns of light) require a computational theory, together with the rules and representations that the system employs. The images are just the beginning: one still needs the “interpreter” of those images; that is, one needs to organize the “kaleidoscope” into objects, scenes, faces, etc. The computational theory and its algorithms play a central role in how we transform the information that the visual areas of the brain receive from the retina.

As we saw above, the ganglion cells send information about light intensities to the brain, but these light intensities do not form well-organized images. As Figure 5.6 shows, the images are inverted (upside-down), and they are also “split” in half in each of the two retinas: each half of each retina has access to the contralateral visual hemifield, with the right visual field (RVF) detected by the left half of each retina and the left visual field (LVF) detected by the right half of each retina. Figure 5.6 also shows that the “nasal” visual pathways (i.e., those from the halves of the retinas closer to the nose) cross at the optic chiasma on their way to the lateral geniculate nucleus (LGN) and the occipital cortex, while the outer retinal pathways (also called temporal pathways) reach the LGN and the occipital cortex in their own hemisphere, that is, without crossing.

Figure 5.6. Visual pathways from the retina to the V1 area of the cortex. This is a schematic depiction of how the image from the world (the girl) travels through the optic nerves, from the retinas of both eyes up to its “projection” in the V1 area in the occipital lobes of both hemispheres, after the “relay station” of the lateral geniculate nucleus in the thalamus. Notice also the projections to each superior colliculus, a structure involved in eye-movement control, head movement, and visual attention. The distortions in the projections show how the images are represented in the striate cortex (V1 and outer areas). The numbers mark positions in the world image and their corresponding projections in the cortex. Notice that what would correspond to 5 (the girl’s nose) is roughly split and projected onto the two outer regions of V1. See text for details. (Adapted from Frisby & Stone, 2010. Not.cleared).

The superior colliculus plays a key role in visual attention and action. It contains a map of the visual field, which helps guide the eyes (and head) to particular locations so that the area of interest in the field can be foveated, that is, brought onto the fovea.

—X—to be continued—X—

5.2.3 The Architecture of the Visual System

Figure 5.7. A simple bottom-up flowchart for visual processing. (a) The early modules are responsible for processing the “natural kinds” that the visual system is hard-wired to detect in the visual field. The sample modules include, from left to right in the foreground: simple features such as lines that form the edges and contours of objects (e.g., Hubel & Wiesel, 1962; Treisman, 1986); texture segregation, determined by preattentive processing of differences between “textons” (elements that are constituents of isotropic textures; see Julesz & Bergen, 1983); motion; and color.
Other input modules, represented by the gray input boxes in the background, include stereopsis (the difference between the images of the two eyes), form, and dynamic form (the shapes of objects in motion; see Zeki, 1992). (b) The low-level visual perception subsystem integrates information from the lower input modules and produces a “viewer-centered” representation of the object/scene. By hypothesis, visual perception is modular and does not take input back (“top-down”) from visual cognition. (c) The higher visual cognition system matches viewer-centered information with stored representations of objects and scenes, or interprets them anew, with inferences about the nature of occluded surfaces. Visual cognition produces an output that is “object-centered.” This subsystem is, to a large extent, penetrable, that is, subject to influence from a higher conceptual system. (d) The conceptual system is where information about the nature of the object is stored. Interpreting the object/scene, categorizing it, and other high-level processes such as reasoning and planning actions are within the domain of the conceptual system. The processing times on the right are estimates, with the first 150 milliseconds corresponding roughly to the early visual processes and the full process, up to categorization, taking approximately 250 ms. These estimates are largely based on data from neuronal recordings in monkeys and behavioral experiments with humans (e.g., Potter, 1975; Kirchner & Thorpe, 2006; Wu et al., 2014). See text for further details. (© The author)
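To make the strictly bottom-up flow of Figure 5.7 concrete, here is a toy sketch of its four stages as a chain of functions. All function names, feature labels, and the lookup-table "stored representations" are hypothetical. What the sketch is meant to show is only the shape of the architecture: each stage receives input solely from the stage below, and visual perception (b) has no parameter through which visual cognition or the conceptual system could feed back into it. The penetrability of visual cognition (c) by the conceptual system, and the timing estimates, are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping

Stimulus = Mapping[str, str]
InputModule = Callable[[Stimulus], str]


@dataclass(frozen=True)
class ViewerCentered:
    """Output of low-level visual perception: features as seen from one viewpoint."""
    features: Dict[str, str]


@dataclass(frozen=True)
class ObjectCentered:
    """Output of visual cognition: a viewpoint-independent description."""
    shape: str


def input_modules(stimulus: Stimulus, modules: Mapping[str, InputModule]) -> Dict[str, str]:
    # (a) Each module computes its own feature from the stimulus alone.
    return {name: detect(stimulus) for name, detect in modules.items()}


def visual_perception(features: Mapping[str, str]) -> ViewerCentered:
    # (b) Integrates module outputs. Modular by hypothesis: it takes no
    # argument from visual cognition or the conceptual system.
    return ViewerCentered(features=dict(features))


def visual_cognition(view: ViewerCentered, stored: Mapping[frozenset, str]) -> ObjectCentered:
    # (c) Matches the viewer-centered description against stored
    # representations; unmatched input is treated as a novel object.
    key = frozenset(view.features.items())
    return ObjectCentered(shape=stored.get(key, "novel object"))


def conceptual_system(obj: ObjectCentered, categories: Mapping[str, str]) -> str:
    # (d) Categorizes the object using conceptual knowledge.
    return categories.get(obj.shape, "uncategorized")


def perceive(stimulus: Stimulus, modules: Mapping[str, InputModule],
             stored: Mapping[frozenset, str], categories: Mapping[str, str]) -> str:
    """Information flows only upward, from (a) to (d)."""
    view = visual_perception(input_modules(stimulus, modules))
    return conceptual_system(visual_cognition(view, stored), categories)


if __name__ == "__main__":
    modules = {"contour": lambda s: s["contour"], "color": lambda s: s["color"]}
    stored = {frozenset({("contour", "cylinder with handle"), ("color", "white")}): "mug shape"}
    categories = {"mug shape": "coffee mug"}
    print(perceive({"contour": "cylinder with handle", "color": "white"},
                   modules, stored, categories))  # coffee mug
```

A design note: modeling the stages as plain functions, rather than objects that can call one another, makes the modularity hypothesis visible in the signatures themselves. A top-down effect on (b) would require adding a new parameter to `visual_perception`.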