Week 5 Lecture Notes PDF - Basic Cognitive Processes

Indian Institute of Technology Kanpur

Dr. Ark Verma

Summary

These lecture notes cover theories of visual object recognition, including template matching, feature analysis and recognition by components, as well as face perception, the link between perception and action, and auditory and speech perception.

Indian Institute of Technology Kanpur In Collaboration with National Program on Technology Enhanced Learning (NPTEL) Presents Course Title: Basic Cognitive Processes By: Dr. Ark Verma, Assistant Professor of Psychology, Department of Humanities & Social Sciences, IIT Kanpur

Lecture 21: Theories of Object Recognition

Theories of Visual Object Recognition
A variety of theories have been proposed to explain how visual object recognition is achieved. These theories differ in the theoretical stance they take on bottom-up versus top-down processing. All in all, they attempt to account for our excellent object-recognition performance with both viewer-centred and object-centred representations.

Template Matching Theory:
According to the template matching theory, we compare a stimulus with a set of templates, i.e. specific patterns that we have already stored in memory. After comparing the stimulus to a number of templates, we note the template that matches the stimulus. In the template matching account, we are looking for an exact match between the stored template and the input representation.
Image: Matlin M. W. (2008). Cognition. Wiley. 7th Ed. (Fig. 2.4; p. 39).
Several machine recognition systems are based on templates, e.g. the systems that read the numbers printed on bank cheques.
One problem with the template matching theory is that it is extremely inflexible. If a letter differs even slightly from the appropriate template, the pattern cannot be recognised. Furthermore, template models work only for isolated letters, numbers and other simple two-dimensional objects presented in their complete form (Palmer, 2003).

Feature Analysis Theory:
Several feature analysis theories propose a more flexible approach, in which a visual stimulus is composed of a small number of characteristics or components (Gordon, 2004). Each characteristic is called a distinctive feature.
o Consider, for example, the letter R. It has three distinctive features, i.e. a curved component, a vertical line and a diagonal line.
o When we look at a new letter, the visual system notes the presence or absence of various features and compares this list with the features stored in memory for each letter of the alphabet. Even though people's handwriting may differ, the letter R will always have these three features.
Image: Matlin M. W. (2008). Cognition. Wiley. 7th Ed. (Demo. 2.2; p. 40).
Feature analysis theories propose that the distinctive features of each letter of the alphabet remain constant, whether the letter is handwritten, printed or typed. These models can also explain how we perceive a wide variety of two-dimensional patterns, such as figures in a painting, a design or a fabric.
Feature analysis theories are consistent with both psychological and neuroscience research. For example, Gibson (1969) demonstrated that people require a relatively long time to decide whether two letters are different if the letters share a number of critical features. Similarly, Larsen & Bundesen (1996) designed a model based on feature analysis that correctly recognised an impressive 95% of the numbers written in street addresses and zip codes. Neuroscience research also seems to support feature analysis (recall Hubel & Wiesel, 1969, 1975, 2005).
However, feature analysis also has several problems.
o First, a theory of object recognition should not simply list the features contained in a stimulus; it must also describe the physical relationships between those features (Groome, 1999). E.g. in the letter T, the vertical line supports the horizontal line, whereas in the letter L the vertical line rests at the side of the horizontal line.
o Second, feature analysis theories were constructed to explain the relatively simple recognition of letters. In contrast, the shapes that occur in nature, e.g. a horse or a lion, are much more complex (Kersten et al., 2004).
o Finally, these theories also need to take into account distortions in features caused by movement.
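A toy sketch makes the contrast between template matching and feature analysis concrete. It also illustrates the relational problem Groome (1999) raises. The 5×5 letter grids, the feature names and the crude line detector below are hypothetical simplifications chosen for illustration, not a model from the cited literature.

```python
# Toy letters on a hypothetical 5x5 grid: "1" = ink, "0" = background.
T_TEMPLATE = ("11111",
              "00100",
              "00100",
              "00100",
              "00100")
L_TEMPLATE = ("10000",
              "10000",
              "10000",
              "10000",
              "11111")
# The same T shifted one column to the right (its right edge is clipped).
SHIFTED_T = ("01111",
             "00010",
             "00010",
             "00010",
             "00010")

TEMPLATES = {"T": T_TEMPLATE, "L": L_TEMPLATE}

# Hypothetical distinctive-feature lists, following the examples in the text.
FEATURES = {
    "R": {"curve", "vertical", "diagonal"},
    "T": {"vertical", "horizontal"},
    "L": {"vertical", "horizontal"},
}


def template_match(stimulus):
    """Template matching: return the letter whose stored pattern matches exactly."""
    for letter, template in TEMPLATES.items():
        if stimulus == template:
            return letter
    return None


def extract_features(grid, min_len=4):
    """Crude detector: any row or column with >= min_len ink cells counts as a line."""
    features = set()
    if any(row.count("1") >= min_len for row in grid):
        features.add("horizontal")
    columns = ["".join(row[c] for row in grid) for c in range(len(grid[0]))]
    if any(col.count("1") >= min_len for col in columns):
        features.add("vertical")
    return features


def feature_match(observed):
    """Feature analysis: every letter whose feature list equals the observed list."""
    return sorted(letter for letter, feats in FEATURES.items() if feats == observed)


print(template_match(T_TEMPLATE))                  # T
print(template_match(SHIFTED_T))                   # None: a one-column shift defeats the template
print(feature_match(extract_features(SHIFTED_T)))  # ['L', 'T']: features survive, relations are lost
```

The template matcher fails on a trivially shifted T. The feature list survives the shift but cannot tell T from L, because both letters contain one vertical and one horizontal line. Adding a relation such as "the vertical line meets the middle of the horizontal line" to each feature list would separate them, which is the kind of structural description Groome (1999) asks for.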
The Recognition by Components Theory:
o Irving Biederman & colleagues developed a theory to explain how we recognise three-dimensional shapes (Biederman, 1990; 1995).
o The basic assumption of their recognition-by-components theory is that a specific view of an object can be represented as an arrangement of simple 3-D shapes called geons.
o These geons can be combined to form a variety of meaningful objects.
Image: Matlin M. W. (2008). Cognition. Wiley. 7th Ed. (Fig. 2.5; p. 42).
In general, an arrangement of three geons gives people enough information to classify an object.
o In that sense, Biederman's recognition-by-components theory is essentially a feature analysis theory for 3-D objects.
Biederman & colleagues have conducted fMRI research with humans and single-cell recording with monkeys. Their findings show that areas of the cortex beyond the primary visual cortex respond to geons like those described above.
However, the recognition-by-components theory requires an important modification, because people recognise objects more quickly when those objects are seen from a standard viewpoint rather than from a very different viewpoint (Friedman et al., 2005).
A modification of this approach, the viewer-centred approach, proposes that we store a small number of views of 3-D objects rather than just one view (Mather, 2006). When we come across an object, we must sometimes mentally rotate its image until it matches one of the views stored in memory (Dickinson, 1999).

Top-Down Influences on Object Recognition:
o This approach emphasises how a person's concepts and higher-level mental processes influence object recognition; more specifically, how a person's expectations and memory help in identifying objects.
o We expect certain shapes to be found in certain locations, and we expect to encounter these shapes because of past experience. These expectations can help us recognise objects very rapidly.
o The same expectations also help us fill in gaps in the sensory input.
Image: Matlin M. W. (2008). Cognition. Wiley. 7th Ed. (Demo. 2.3; p. 45).

Face Perception: A Special Case of Object Recognition
o According to psychologists, most people perceive faces differently from other stimuli; face perception is somehow special (Farah, 2004). E.g. young infants track the movements of a photographed human face more than those of other, similar stimuli (Bruce et al., 2003).
o Similarly, Tanaka & Farah (1993) found that people were significantly more accurate in recognising facial features when the features appeared within the context of a whole face rather than in isolation. For example, they recognised a nose more accurately within a whole face than on its own. In contrast, when they judged houses, they were just as accurate in recognising an isolated house feature (e.g. a window) as in recognising that feature within a whole house.
This shows that we recognise faces on a holistic basis, i.e. in terms of the gestalt, or overall quality, that transcends the individual elements. It thus makes sense that face perception has a special status, given the importance of our social interactions (Farah, 2004; Fox, 2005).

Neuroscience Research on Face Recognition
o McNeil & Warrington (1993) studied a professional who had lost his ability to recognise human faces after experiencing several strokes.
o This patient changed his career and started to raise sheep. Surprisingly, he could recognise many of his sheep's faces, even though he could not recognise human faces.
o This man was diagnosed with prosopagnosia, a condition in which people cannot recognise human faces visually, although they perceive other objects relatively normally.
The location most responsible for face recognition is the temporal cortex, at the side of the brain (Bentin et al., 2002); specifically, the inferotemporal cortex, in the lower portion of the temporal cortex.
It has been shown that certain cells in the inferotemporal cortex respond especially vigorously when presented with faces (Farah, 2004). fMRI studies have also reported that the brain responds much more quickly to faces presented upright than to faces presented inverted.
Image: [http://ww2.hdnux.com/photos/15/67/57/3636189/7/920x920.jpg]

To Sum Up
We studied various approaches to object recognition. We saw that object recognition can be achieved through the cooperation of bottom-up and top-down mechanisms. We also saw that face perception is a special case of object recognition, because faces carry much more information and value than many other objects.

References
Matlin, M. W. (2008). Cognition. Wiley. 7th Ed.

Lecture 22: Perception & Action

Linking Perception to Action
Over the last few lectures we have often wondered about the utility of perception. Certainly, humans do not perceive the environment only passively. We interact with the environment, and it is through this interaction that perception and action are linked.

Theoretical Background
J. J. Gibson's Ecological Approach to Perception
o The idea that perception should be studied as people move through the environment and interact with it.
o The ecological approach to perception therefore focused on studying moving observers and on determining how their movement creates perceptual input that enables better navigation in the environment.
As we have already studied in detail, optic flow is informative about our movement in the environment, e.g. its direction and speed and the relative distance of objects.
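Two properties of optic flow matter for the ecological approach. Flow is faster for nearby points, and it vanishes at the point the observer is heading toward. Both follow from standard pinhole-camera geometry. The sketch below assumes an eye with focal length 1 translating without rotation. The speeds, depths and the `Heading` type are hypothetical, and the equations are generic projection geometry rather than a model taken from Gibson or Goldstein (2013).

```python
from dataclasses import dataclass


@dataclass
class Heading:
    """Observer's translation per second in eye-centred coordinates (no rotation)."""
    tx: float  # sideways (positive = to the right)
    ty: float  # vertical
    tz: float  # forward speed; must be non-zero for an FOE to exist


def optic_flow(x, y, depth, h):
    """Image-plane velocity (u, v) of a point seen at image position (x, y).

    Pinhole eye with focal length 1 and pure translation:
    u = (x * tz - tx) / Z,  v = (y * tz - ty) / Z.
    """
    u = (x * h.tz - h.tx) / depth
    v = (y * h.tz - h.ty) / depth
    return u, v


def focus_of_expansion(h):
    """Image point where the flow vanishes, i.e. where the observer is heading."""
    return h.tx / h.tz, h.ty / h.tz


walking_ahead = Heading(tx=0.0, ty=0.0, tz=1.5)   # hypothetical 1.5 m/s straight ahead
print(focus_of_expansion(walking_ahead))          # FOE straight ahead, at the image centre
print(optic_flow(0.0, 0.0, 10.0, walking_ahead))  # no flow at the FOE

near = optic_flow(0.2, 0.0, 2.0, walking_ahead)   # point 2 m away
far = optic_flow(0.2, 0.0, 20.0, walking_ahead)   # same direction, 20 m away
print(near, far)                                  # nearer point flows 10x faster

veering = Heading(tx=0.3, ty=0.0, tz=1.5)         # drifting to the right
print(focus_of_expansion(veering))                # FOE shifts when heading changes
```

Because flow magnitude is divided by depth, nearby surfaces stream past quickly and distant ones barely move. Points flow outward from the FOE, and the FOE itself moves as soon as the direction of travel changes.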
o Optic flow is fast near the observer and slower farther away from the observer; this is referred to as the gradient of flow.
o There is no flow at the point toward which the observer is moving; this point is called the focus of expansion (FOE).
Goldstein (2013). Sensation and Perception. Cengage Learning. (p. 154)
Another important aspect of the ecological approach is invariant information, i.e. information that does not change as the observer moves.
o As soon as the observer stops moving through the environment, the flow information is no longer there.
o The FOE shifts as soon as the observer changes direction of movement.
How does this work?
Self-produced information: information that is produced when a person makes some movement, and that is in turn used to guide further movement. E.g. when a person drives a car along a street, the car's movement provides flow information that can be used to keep the car heading in the correct direction.
Goldstein (2013). Sensation and Perception. Cengage Learning. (p. 155)

The Senses:
o Gibson proposed that the five senses, i.e. vision, hearing, touch, smell and taste, work together to produce information that facilitates moving around and interacting with the environment.
o E.g. our ability to stand upright and keep our balance while standing still, walking or running depends on systems such as the vestibular canals (in the inner ear) and receptors in the joints and muscles working together.
o Lee and Aronson (1974), through their "swinging room" experiments, demonstrated that vision is a powerful determinant of balance and can override the traditional sources of balance information provided by the inner ear and by receptors in the muscles and joints.

Navigating Through the Environment
Driving a Car
o Land & Lee (1994) wanted to study the information that people generally use while driving a car.
o So, they fitted a car with devices to record the angle of the steering wheel and the speed, and measured where the driver was looking with a video eye-tracker.
o They found that although drivers look straight ahead while driving, they look at a spot in front of the car rather than directly at the FOE.
Goldstein (2013). Sensation and Perception. Cengage Learning. (p. 159)
Land & Lee also studied where drivers look while navigating a curve.
o This is important because the FOE keeps changing as the car's direction of travel changes while it moves around the curve.
o Land & Lee found that when going around a curve, drivers do not look directly at the road ahead but at the tangent point of the curve on the side of the road.
o As drivers are not looking directly at the FOE, Land & Lee suggested that drivers use information in addition to optic flow to determine their direction of movement, e.g. the position of the car relative to the lines at the centre of the road.
Goldstein (2013). Sensation and Perception. Cengage Learning. (p. 159)

Walking
o It has been argued that people may not use optic flow information while walking. E.g. they might follow a visual direction strategy, i.e. keep their bodies pointed towards a target. If they go off course, the target shifts to the left or right, and walkers can use this shift for course correction.
Loomis & colleagues (Loomis et al., 1992; Philbeck, Loomis & Beall, 1997) had participants walk "blind" (with their eyes closed) towards a target they had just seen. They showed that people are able to walk directly towards a target and stop very close to it.
Goldstein (2013). Sensation and Perception. Cengage Learning. (p. 160)

Wayfinding
o refers to navigating over long distances towards objects that are not in sight.
o is a complex process that involves perceiving objects in the environment, remembering objects and their place in the overall scene, and judging when and in what direction to turn.
o An important aspect of such navigation is landmarks, i.e. objects on the route that serve as cues indicating where to turn.
Sahar Hamid & colleagues (2010) studied participants' use of landmarks as they negotiated a maze-like environment presented on a computer screen, in which pictures of common objects served as landmarks. Participants were first trained to go through the maze and were then told to travel from one point in the maze to another. Eye movements were measured using a head-mounted eye-tracker. The eye-tracking measures indicated that participants spent more time looking at informative landmarks than at uninformative ones.
In a similar study (Schinazi & Epstein, 2010), after participants had learned a particular route, they were more likely to recognise pictures of buildings at decision points than pictures of buildings in the middle of a block. Also, when participants were in an fMRI scanner, the response of navigational areas of the brain (such as the parahippocampal gyrus, hippocampus and retrosplenial cortex) to decision-point buildings was larger than the response to non-decision-point buildings.
Goldstein (2013). Sensation and Perception. Cengage Learning. (p. 161)

Interacting with Objects
We have now seen how movement within an environment can facilitate or influence its perception. One of the most salient movements we perform within our environment is reaching out and grasping objects, e.g. reaching out and holding a cup.
An important concept related to reaching and grasping is that of affordances. Gibson, in his ecological approach to perception, specified the idea of affordances:
o "The affordances of the environment are what it offers the animal, what it provides or furnishes" (Gibson, 1979).
o A chair, or anything that is sit-on-able, affords sitting; an object of the right size and shape to be grabbed by a person's hand affords grasping; and so on (Goldstein, 2013).
So, this could imply that the perception of an object includes not only physical properties, such as shape, size, colour and orientation, that enable a person to recognise the object, but also information about how the object is to be used.
One way affordances have been studied is by investigating patients with brain damage.
o Humphreys & Riddoch (2001) studied a patient, M.P., whose temporal-lobe damage impaired his ability to name objects.
M.P. was given one of two cues: (a) the name of an object, like "cup", or (b) an indication of how the object worked ("an object that you could drink from"). He was then shown 10 different objects and was told to press a key as soon as he found the target object. M.P. identified the object more accurately and rapidly when the cue referred to the object's function. Humphreys & Riddoch concluded that M.P. was using information about the object's affordances to find it.

The Physiology of Perception and Action
o The link between perception and action was formalised with the discovery of the ventral and dorsal pathways of the brain.
o Ungerleider & Mishkin (1982) used the technique of brain ablation to study monkeys' ability to identify an object and to determine an object's location.
o Ungerleider and Mishkin presented monkeys with two tasks: (1) an object discrimination problem and (2) a landmark discrimination problem.
In the object discrimination problem, the monkey was first shown one object, such as a rectangular solid. It was then presented with a two-choice task (like the one shown in the figure) in which one object was the "target object" and the other was a different stimulus. If the monkey pushed aside the target object, it received the food reward hidden under that object.
Image: Goldstein (2010). Cognitive Psychology: Connecting Mind, Research and Everyday Experience. Wadsworth Publishing. 3rd Ed. Fig. 3.34 (p. 72)
In the landmark discrimination problem, the monkey's task was to remove the cover of the food well that was closer to a tall cylinder.
In the ablation phase, part of the temporal lobe was removed in some monkeys, while in others part of the parietal lobe was removed.
Image: Goldstein (2010). Cognitive Psychology: Connecting Mind, Research and Everyday Experience. Wadsworth Publishing. 3rd Ed. Fig. 3.34 (p. 72)
Behavioural experiments showed that the object discrimination problem was very difficult for monkeys whose temporal lobes had been removed.
o This was taken to imply that the pathway reaching the temporal lobe is responsible for object identification. Ungerleider & Mishkin called this pathway the what pathway.
Monkeys whose parietal lobes had been removed had difficulty solving the landmark discrimination problem.
o This indicated that the pathway leading to the parietal lobe is responsible for determining an object's location.
o Ungerleider & Mishkin called the pathway leading from the striate cortex to the parietal lobe the where pathway.
Image: Goldstein (2010). Cognitive Psychology: Connecting Mind, Research and Everyday Experience. Wadsworth Publishing. 3rd Ed. Fig. 3.34 (p. 72)
Consider the simple task of reaching for and grasping a cup. One could assume that the what pathway would be involved in the initial perception of the cup. The where pathway would then be involved in determining the cup's location so that it can be picked up.

Putting Things in Perspective
We have seen that Gibson's approach argued for 'perception for action', while Marr's theory was more about 'perception for recognition'. In some ways, the idea of ventral and dorsal pathways echoes this distinction. However, although the two streams may appear to function independently, we clearly need both of them working properly in order to recognise objects and to act on the environment.
For example, Gibson's notion of affordance emphasises that we might need to detect what things are 'for' rather than what they actually 'are'. Affordances are linked to actions, and the dorsal stream appears to be ideally suited to providing the sort of information we need to act in the environment. Earlier, we saw that Gibson saw no role for memory in perception. The fact that the dorsal stream seems to have very little storage is consistent with the dorsal stream working as Gibson proposed.
In contrast, the ventral stream appears to be ideally suited to recognising objects. It is specialised for analysing the sort of fine detail that Marr saw as essential for discriminating between objects. It also seems to draw on our existing knowledge to help identify objects. Finally, it is slower than the dorsal stream, which is consistent with the fact that recognition usually requires no immediate action.
To address these and other concerns, Norman (2002) and Neisser (1994) suggested the dual processing approach:
o There appears to be evidence that the ventral stream is primarily concerned with recognition, while the dorsal stream drives visual behaviour (pointing, grasping etc.).
o The ventral system is generally better at processing fine detail, while the dorsal system is better at processing motion.
o The ventral system is knowledge based and uses stored representations to recognise objects, while the dorsal system appears to have only very short-term storage.
o The dorsal system receives information faster than the ventral system.
o We are much more conscious of the ventral stream than of the dorsal stream.
o It has been suggested that the ventral system recognises objects and is object-centred, whereas the dorsal stream, being action-oriented, uses a viewer-centred frame of reference (more on this later).
Norman (2002) describes the two streams as synergistic and interconnected rather than independent.
Busted & Carlton (2002) illustrate the interaction between the ventral and dorsal streams using the example of skill acquisition. Previous work (Fitts, 1964) suggests that the early stages of learning a skill (e.g. driving) are characterised by cognitive processes of the kind associated with the ventral stream. Once the skill is highly practised, it is characterised instead by learned motor actions of the sort associated with the dorsal stream.

References
Goldstein, E. B. (2010). Cognitive Psychology: Connecting Mind, Research and Everyday Experience. Wadsworth Publishing. 3rd Ed.
Goldstein, E. B. (2013). Sensation and Perception. Cengage Learning. 9th Ed.

Lecture 23: Auditory Perception

Hearing: The Preliminaries
Hearing, too, begins with transduction.
o Sound waves are collected by our ears and converted into neural impulses. These impulses are sent to the brain, where they are integrated with past experience and interpreted as the sounds we experience.
The human ear is sensitive to a wide range of sounds, from the faint click of a clock to the roar of a rock band. It is particularly sensitive to sounds in the same frequency range as the human voice.
The Ear: detects sound waves.
o Vibrating objects (such as the human vocal cords or guitar strings) cause air molecules to bump into each other and produce sound waves. These waves travel from their source as peaks and valleys, much like the ripples that spread outward when a stone is tossed into a pond.
o Sound waves are carried through a medium such as air, water or metal, and it is the changes in pressure within the medium that the ear detects.
Physical Characteristics of Sound
o We detect both the wavelength (and hence the frequency) and the amplitude of sound waves.
The frequency of a sound wave is the number of waves that arrive per second, measured in hertz (Hz); it is inversely related to the wavelength. Frequency determines our perception of pitch, i.e. the perceived frequency of the sound. Longer sound waves have a lower frequency and produce a lower pitch, whereas shorter sound waves have a higher frequency and a higher pitch.
The amplitude, or height, of the sound wave determines how much energy the wave contains and is perceived as loudness (the degree of sound volume); larger waves are perceived as louder. Sound intensity is measured in decibels (dB), a logarithmic unit of relative intensity.
o Zero decibels represents the approximate absolute threshold for human hearing, below which we cannot hear a sound. Each increase of 10 decibels represents a tenfold increase in sound intensity.
o The sound of a typical conversation (about 60 decibels) is therefore 1,000 times more intense than the sound of a whisper (30 decibels).

The Structure of the Ear:
o Audition begins in the pinna, the external, visible part of the ear, which is shaped like a funnel to draw in sound waves and guide them into the auditory canal.
o At the end of the canal, the sound waves strike a tightly stretched, highly sensitive membrane known as the tympanic membrane (or eardrum), which vibrates with the waves.
o The resulting vibrations are relayed through the middle ear to the cochlea, a snail-shaped, liquid-filled tube in the inner ear. They travel via three tiny bones known as the ossicles: the hammer (malleus), the anvil (incus) and the stirrup (stapes).
The vibrations cause the oval window, the membrane covering the opening of the cochlea, to vibrate, disturbing the fluid inside the cochlea. The movements of the fluid bend the hair cells of the inner ear. The movement of the hair cells triggers nerve impulses in the attached neurons, and these impulses travel along the auditory nerve to the auditory cortex in the brain.
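Two quantitative relations appear in Physical Characteristics of Sound: frequency versus wavelength, and decibels versus intensity. Both can be checked with a few lines of arithmetic. The speed of sound (343 m/s in air) is an assumed round value. The ratios computed are of physical intensity, not of perceived loudness, which grows more slowly than intensity.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s, approximate value in air at room temperature (assumed)


def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND_AIR):
    """Wavelength = speed / frequency, so a higher frequency means a shorter wave."""
    return speed / frequency_hz


def intensity_ratio(db_a, db_b):
    """How many times more intense sound A is than sound B.

    Decibels are logarithmic: every 10 dB step multiplies intensity by 10.
    """
    return 10 ** ((db_a - db_b) / 10)


print(wavelength_m(100), wavelength_m(1000))  # the low tone has the longer wave
print(intensity_ratio(60, 30))                # conversation vs. whisper: 1000.0
print(intensity_ratio(40, 30))                # a single 10 dB step: 10.0
```

The 30 dB gap between a whisper and a conversation is three 10 dB steps, so the intensity ratio is 10 × 10 × 10 = 1,000.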
The cochlea contains about 16,000 hair cells, each of which holds a bundle of fibres known as cilia at its tip. The cilia are so sensitive that they can detect a movement that pushes them the width of a single atom, which is comparable to shifting the Eiffel Tower by half an inch (Corey et al., 2004).
Image: Stangor (2010). Introduction to Psychology. Flat World Knowledge. Creative Commons license. (p. 203).
The loudness of a sound is directly determined by the number of hair cells that are vibrating.
Two different mechanisms are used to detect pitch.
o The frequency theory of hearing proposes that whatever the pitch of a sound wave, nerve impulses of a corresponding frequency will be sent to the auditory nerve. E.g. a 600 Hz tone will be transduced into 600 nerve impulses per second. However, this theory cannot explain high-pitched sounds, because neurons cannot fire fast enough to match higher frequencies.
o A possible solution is that neurons work together in a sort of volley system to reach the necessary speed. Different neurons fire in sequence, allowing us to detect sounds up to about 4,000 Hz.
The place theory of hearing proposes that different areas of the cochlea respond to different frequencies.
o Higher tones excite areas closest to the opening of the cochlea (near the oval window), whereas lower tones excite areas near the narrow tip of the cochlea, at the opposite end.
Pitch is therefore determined in part by the area of the cochlea that is firing most frequently.
The fact that the ears are placed on either side of the head enables us to benefit from stereophonic, or three-dimensional, hearing. If a sound occurs on your left side, the left ear will receive it slightly sooner than the right ear, and the sound there will be more intense. These differences allow you to quickly determine the location of the sound.
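How small is the "slightly sooner" arrival at the nearer ear? A rough estimate can be sketched with a simple path-difference model. The ear separation of 0.15 m (about 6 inches) and the speed of sound of 343 m/s are assumed round values. The model deliberately ignores the shadowing and diffraction caused by the head itself, so it is only an approximation.

```python
import math

EAR_SEPARATION_M = 0.15  # about 6 inches (assumed round value)
SPEED_OF_SOUND = 343.0   # m/s in air, approximate (assumed round value)


def interaural_time_difference_ms(azimuth_deg):
    """Approximate arrival-time difference between the two ears, in milliseconds.

    Simple path-difference model: extra path = separation * sin(azimuth),
    where azimuth is 0 degrees straight ahead and 90 degrees directly to one side.
    """
    extra_path_m = EAR_SEPARATION_M * math.sin(math.radians(azimuth_deg))
    return 1000.0 * extra_path_m / SPEED_OF_SOUND


for azimuth in (0, 30, 90):
    print(azimuth, round(interaural_time_difference_ms(azimuth), 3))
```

Even at its maximum (a sound directly to one side), the difference comes out under half a millisecond. For a sound straight ahead it drops to zero, because the sound reaches both ears at the same moment.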
The two ears are barely 6 inches apart, and sound travels at roughly 750 miles an hour. Even so, these time and intensity differences are easily detected (Middlebrooks & Green, 1991). When a sound is equidistant from both ears (such as when it is directly in front, behind, beneath or overhead), we have more difficulty pinpointing its exact location and may shift position to aid localisation.

Speech Perception
The most important class of stimuli that we perceive through audition is speech, so speech perception deserves special mention. During speech perception, the auditory system needs to analyse the sound vibrations generated by someone's speech.
Characteristics of Speech Perception (Matlin, 2008)
o When describing speech sounds, psychologists and linguists use the term phoneme. A phoneme is the basic unit of spoken language, including basic sounds such as a, k and th. The English language uses about 45 phonemes, including both consonants and vowels.
o Listeners can impose boundaries between words, even when these words are not separated by silence. #speech segmentation
o Phoneme pronunciation varies tremendously. #phoneme variation
o Context allows listeners to fill in missing sounds. #role of context
o Visual cues from the speaker's mouth help us interpret ambiguous sounds. #multimodal perception

Theories of Speech Perception
The special mechanism approach proposes that speech perception is accomplished by a naturally selected module (Fodor, 1983).
o This speech perception module monitors incoming acoustic stimulation and reacts strongly when the signal contains the characteristic complex patterns that make up speech.
o When the speech module recognises an incoming stimulus as speech, it preempts other auditory processing systems, preventing their output from entering consciousness.
Non-speech sounds are analysed according to the basic properties of frequency, amplitude and timbre, and we are able to perceive those characteristics of non-speech sounds accurately. But when the speech module latches onto an acoustic stimulus, it prevents the kind of spectral analysis that general auditory processing mechanisms normally carry out for non-speech stimuli.
Under special, controlled laboratory conditions, this preemption of normal auditory processing for speech stimuli can lead to duplex perception (Liberman & Mattingly, 1989).
o To create their experimental stimuli, researchers constructed artificial speech stimuli that sounded like /da/ or /ga/. The percept depended on whether the second formant transition decreased (/da/) or increased (/ga/) in frequency over time.
o Next, they edited the stimuli to create separate signals for the transition and for the rest of the syllable, which they called the base.
o They played the two parts of the stimulus over headphones, with the transition going to one ear and the base going to the other ear. The question was: how would people perceive the stimulus?
o It turned out that people perceived two different things at the same time. In the ear into which the transition was played, people perceived a high-pitched chirp or whistle. But at the same time they perceived the original syllable, just as if the entire, intact stimulus had been presented.
Liberman & colleagues argued that this simultaneous perception reflected the simultaneous operation of the speech module and of general-purpose auditory processing mechanisms. The transition was heard in two ways at once, as a chirp and as a phoneme.
Duplex perception happened because the auditory system could not treat the transition and the base as coming from the same source (as they were played into two different ears). Having recognised two different sources, the auditory system had to do something with the transition that it would not normally do, i.e. analyse it for the frequencies it contained, and the result was hearing it as a chirp. At the same time, the speech processing module recognised a familiar pattern of transitions and formants. As a result, the auditory system reflexively integrated the transition and the base, producing the experience of hearing a unified syllable.

The Motor Theory of Speech Perception:
o This theory proposes that gestures, rather than sounds, are the fundamental unit of mental representation in speech (Liberman & Whalen, 2000; Fowler, 2008).
When we speak, we attempt to move our articulators to particular places in specific ways. Each of these movements constitutes a gesture. The motor part of the speech production system takes the sequence of words we want to say and comes up with a gestural score that tells our articulators how to move.
According to the theory, if you can figure out which gestures created a speech signal, you can figure out what the gestural plan was. That plan takes you back to the sequence of syllables or words that went into it in the first place. So, by knowing what the gestures are, you can tell which set of words produced that set of gestures.
o E.g. the "core" part of the gesture to produce either "di" or "du" is tapping the tip of your tongue against the back of your teeth (or your alveolar ridge). Other parts of the gesture, like lip position, are affected by coarticulation, but the core component of the gesture is the same regardless of the phonological context.
Thus, rather than trying to map acoustic signals directly onto phonemes, Alvin Liberman & his colleagues proposed that we map acoustic signals onto the gestures that produced them. The reason is that gestures are more closely related to phonemes than acoustic signals are.
In their words, “The relation between perception & articulation will be considerably simpler than the relation between perception and the acoustic stimulus.” Further, “perceived similarities and differences will correspond more closely to the articulatory than the acoustic similarities among the sounds.” So, differences between two acoustic signals will not cause you to perceive two different phonemes as long as the gestures that created those two acoustic signals are the same.
Another aspect of the motor theory is the proposal that categorical perception is another product of the speech perception module. Categorical perception happens when a wide variety of physically distinct stimuli are perceived as belonging to one of a fixed set of categories.
o For example, every vocal tract is different from every other vocal tract, & as a result the sound waves that come out of your mouth when you say pink are very different from the sound waves that come out of my mouth when I say pink, and so on. Nonetheless, your phonological perception is blind to the physical differences and perceives all of those signals as containing an instance of the category /p/.
Further, it may be noted that all of our voices have different qualities from each other, but we categorise the speech sounds from each of us in much the same way. This is because all of those different noises map onto the same set of 40 phonemes (in English). In addition, although the acoustic properties of speech stimuli can vary across a wide range, our perception does not change in little bitty steps with each little bitty change in the acoustic signal. We are insensitive to some kinds of variation in the speech signal, but if the speech signal changes enough, we perceive that change as the difference between one phoneme and another (Liberman et al., 1957).
An example:
o The difference between /b/ & /p/ is that /b/ is voiced while /p/ is not.
o Other than voicing, the two phonemes are essentially identical: both are labial plosives, meaning that we make these sounds by closing our lips, allowing air pressure to build up behind our lip dam, and then releasing the pressure suddenly, creating a burst of air that rushes out of the mouth.
o The difference between the two phonemes has to do with the timing of the burst and the vocal fold vibrations that create voicing. For the /b/ sound, the vocal folds begin vibrating while your lips are closed or just after the burst; but for the /p/ sound, there is a delay between the burst and the point in time when the vocal folds begin to vibrate. This gap is the voice onset time (VOT). VOT can take any value whatsoever, so it is called a continuous variable. But even though VOT can vary continuously in this way, we do not perceive much of that variation.
o For example, we cannot really hear the difference between a VOT of 2 ms and one of 7 ms, or between 7 ms & 15 ms. Instead, we map a range of VOTs onto the same percept.
o Those different acoustic signals are called allophones: different signals that are perceived as being the same phoneme.
o So a range of short VOTs is experienced as /b/ & a range of long VOTs as /p/, the boundary being 20 ms.
References
Traxler, M. J. (2013). Introduction to Psycholinguistics: Understanding Language Science. Wiley-Blackwell.

Lecture 24: Auditory Perception - II
The McGurk Effect
Acc. to the motor theory of speech perception, understanding speech requires you to figure out which gestures created a given acoustic signal. The system therefore uses any sort of information that could help identify gestures.
While acoustic stimuli offer cues to what those gestures are, other perceptual systems could possibly help out, and if they can, motor theory says that the speech perception system will take advantage of them. In fact, two non-auditory perceptual systems, vision & touch, have been shown to affect speech perception.
The most famous demonstration of multimodal perception is the McGurk effect (McGurk & MacDonald, 1976). The McGurk effect happens when people watch a video of a person talking, but the audio portion of the tape has been altered. For example, the video might show a person saying /ga/ while the audio signal is of a person saying /ba/. What people actually perceive is someone saying /da/. If the visual information is removed (when the observer shuts his/her eyes), the auditory information is accurately perceived and the person hears /ba/.
The McGurk effect is incredibly robust: it happens even when people are fully warned that the auditory & visual information do not match, and it happens even if one tries to pay close attention to the auditory information and ignore the visual.
The McGurk effect happens because our speech perception system combines visual and auditory information when perceiving speech, rather than relying on auditory information alone. Of course, the auditory information by itself is sufficient for perception to occur, but the McGurk effect shows that visual information influences speech perception when it is available. The McGurk effect is an example of multimodal perception because two sensory modalities, hearing & vision, contribute to the subjective experience of the stimulus.
Another way to create a variant of the McGurk effect is by combining haptic information with auditory information to change the way people perceive a spoken syllable (Fowler & Dekle, 1991). This kind of speech perception occurs outside the laboratory from time to time in a specialised mode called Tadoma.
Helen Keller & other hearing- & vision-impaired individuals have learned to speak by using their sense of touch to feel the articulatory information in speech.
Acc. to the motor theory, information about speech gestures should be useful regardless of its source, auditory or otherwise. That being the case, information about articulatory gestures gathered via the perceiver’s sense of touch should affect speech perception.
To test this, Carol Fowler had experimental participants feel her lips while they listened to a recording of a female speaker speaking a variety of syllables. Blindfolded and gloved, participants heard the syllable /ga/ over a speaker while Fowler simultaneously mouthed the syllable /ba/. As a result, each participant felt the articulatory gestures appropriate to one syllable but heard the acoustic signal appropriate to a different syllable. As in the visual version of the McGurk effect, what participants actually perceived was a compromise between the auditory signal & the haptic signal: instead of perceiving the spoken syllable /ga/ or the mouthed syllable /ba/, they perceived the hybrid syllable /da/.
The motor theory explains both versions of the McGurk effect, the visual one & the haptic one, as stemming from the same basic processes. The goal of the speech perception system is not a spectral analysis of the auditory input; rather, it is figuring out what set of gestures created the auditory signal in the first place. Motor theory handles the visual & haptic effects on speech perception by arguing that both modalities can contribute information that helps the perceiver figure out what gesture the speaker made. Under natural conditions, the visual, touch & auditory information will all line up perfectly, meaning that all secondary sources will be perfectly valid cues; in the experimental conditions described above, that was not the case.
Mirror Neurons:
The motor theory has been enjoying a renaissance recently, sparked off by new evidence about monkey neurons (Gallese et al., 1996; Gentilucci & Corballis, 2006). Researchers working on macaque monkeys discovered neurons in a part of the monkey’s frontal lobes that responded when a monkey performed a particular action, when the monkey watched someone else perform that action, or when the monkey heard a sound associated with that action. These neurons were called mirror neurons.
The existence of mirror neurons in monkeys was established by invasive single-cell recording techniques; similar experiments in humans are not feasible, so the existence of mirror neurons in humans remains a hypothesis rather than an established fact. However, the part of the macaque brain that has the mirror neurons (area F5) is similar to Broca’s area in the human brain. Neuroimaging and research involving direct recording from neurons in Broca’s area both show that it participates in speech perception (Sahin et al., 2009).
Researchers who discovered mirror neurons propose that mirror neurons could be the neurological mechanism that the motor theory of speech perception requires, i.e. mirror neurons in Broca’s area could fire both when an individual produces a particular set of phonemes and when he or she hears the same set of phonemes, providing the bridge between speaking & listening.
Experiments have been conducted to find non-invasive evidence for the participation of the motor cortex in speech perception. The motor theory says that accessing representations of specific speech gestures underlies speech perception. Those representations of speech gestures must be stored in the parts of the brain that control articulatory movements. The parts of the brain that control articulation are the motor cortex in the frontal lobes & the adjacent premotor cortex, so these areas should be active when we perceive speech.
Proponents of mirror neurons argue that mirror neurons are the neural mechanism that establishes the link between heard speech & the motor representations that underlie speech production. Mirror neurons have recently been found in the monkey equivalent of the motor cortex, and so proponents view evidence that the motor cortex responds to speech as supporting their view of speech perception. Some mirror neuron theorists argue further that mirror neurons play a role in modern humans because our speech production and perception processes evolved from an older manual gesture system (Gentilucci & Corballis, 2006).
Evidence for mirror neurons in humans:
o In a study by Pulvermüller & colleagues, participants listened to syllables that resulted from bilabial stops (/pa/, /ba/) or alveolar stops (/ta/, /da/) on listening trials.
o On silent production trials, participants imagined themselves making those sounds.
o Measurements of their brain activity were gathered using fMRI.
Listening to speech caused substantial activity in the superior parts of the temporal lobes on both sides of the participants’ brains, but it also caused a lot of activity in the motor cortex in the participants’ frontal lobes. Further, brain activity in the motor cortex depended upon what kind of speech sound the participants were listening to:
o whether the sound was a bilabial stop or an alveolar stop.
Motor theory explains these results by arguing that the same brain areas that produce speech are involved in perceiving it.
In another study, when transcranial magnetic stimulation (TMS) was applied to a participant’s motor cortex, participants were less able to tell the difference between two similar phonemes. Further, when people listen to speech sounds that involve tongue movements & have TMS applied to the parts of the motor cortex that control the tongue, increased motor evoked potentials (MEPs) are observed in the participants’ tongue muscles.
All of these experiments show that the motor cortex generates neural activity in response to speech, consistent with the motor theory of speech perception.
Challenges to the Motor Theory of Speech Perception
o Some challenges to motor theory are rooted in the strong connection it makes between perception & production.
o Infants, for example, are fully capable of perceiving the differences between many speech sounds, despite the fact that they are thoroughly incapable of producing those speech sounds (Eimas et al., 1971).
o To account for this result, we either have to conclude that infants are born with an innate set of speech-motor representations or that having speech-motor representations is not necessary to perceive phonemes.
Additional experiments have also cast doubt on whether speech-motor representations are necessary for speech perception.
o No one would suggest, for example, that non-human animals have a supply of speech-motor representations, especially if those animals are incapable of producing anything that sounds like human speech. Two such animals are Japanese quail & chinchillas.
o Once they are trained to respond to one class of speech sounds & refrain from responding to another class, they demonstrate aspects of speech perception that resemble human performance, i.e. categorical perception & compensation for coarticulation. Because these animals lack the human articulatory apparatus, they cannot have speech-motor representations; but as they respond to aspects of speech very much like humans do, motor theory’s claim that speech-motor representations are necessary for speech perception is threatened.
Further, research with aphasic patients casts further doubt on the motor theory.
o Broca & Wernicke showed that some brain-damaged patients could not produce speech but could understand it, & vice versa.
o The existence of clear dissociations between speech perception & speech production provides strong evidence that intact motor representations are not necessary for perceiving speech.
Also, if speech perception requires access to intact motor representations, then brain damage that impairs spoken language output should also impair spoken language comprehension; but this pattern does not appear much of the time.
Another problem for either account is that there is a many-to-one mapping between gestures and phonemes.
o I.e. the same speech sound can be produced by different articulatory gestures (MacNeilage, 1970).
o More specifically, different people can produce the same phoneme by using different configurations of the vocal tract. Because the vocal tract offers a number of locations where the air flow can be restricted, & because different combinations of air-flow restrictions have the same physical effect, they wind up producing acoustic signals that are indistinguishable to the perceiver. This means that there is no single gesture for a syllable like /ga/.
Studies involving the production of bite-block vowels also show that very different gestures can lead to the same, or nearly the same, set of phonemes.
The motor theory can account for this set of findings in one of two ways:
o either by proposing that more than one speech-motor representation goes with a given phoneme, or that there is a single set of “prototype” speech-motor representations & that an acoustic analysis of speech signals determines which of these ideal gestures most closely matches the acoustic input. Both violate the spirit of the theory!
Other Theories of Speech Perception
The General Auditory Approach to Speech Perception
o It starts with the assumption that speech perception is not special (Diehl & Kluender, 1989; Pardo & Remez, 2006); instead, “speech sounds are perceived using the same mechanisms of audition and perceptual learning that have evolved in humans… to handle other classes of environmental sounds” (Diehl et al., 2004).
Researchers in this tradition look for consistent patterns in the acoustic signal for speech that appear whenever particular speech properties are present. Further, they seek to explain commonalities in the way different people and even different species react to aspects of speech.
o For example, some studies have looked at the way people and animals respond to voicing contrasts (the difference between unvoiced consonants like /p/ and voiced consonants like /b/).
o These studies have suggested that our ability to perceive voicing is related to fundamental properties of the auditory system, i.e. we can tell that two sounds did not occur simultaneously only if they begin more than 20 ms apart.
o If two sounds are presented within 20 ms of each other, we perceive them as being simultaneous in time. If one starts more than 20 ms before the other, we perceive them as occurring in a sequence, one before the other.
o The voicing boundary for people & quail sits right at the same point.
o If vocal fold vibration starts within 20 ms of the burst, we perceive the phoneme as voiced; but if there is more than a 20 ms gap between the burst & the vocal fold vibration, we perceive an unvoiced stop.
Thus, this aspect of phonological perception could be based on a fundamental property of auditory perception, rather than on the peculiarities of the gestures that go into voiced & unvoiced consonants. The general auditory approach does not offer an explanation of the full range of human (or animal) speech perception abilities.
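The 20 ms rule described above can be sketched as a single threshold that does double duty: it decides whether two onsets are heard as simultaneous or in sequence, and therefore whether a stop is heard as voiced /b/ or unvoiced /p/. This is a toy sketch, assuming one sharp 20 ms cut-off taken from the notes; the constant and function names are hypothetical, and real identification curves are steep but not perfectly step-like.

```python
# Toy sketch: one sharp 20 ms threshold (the figure given in the notes) is used
# both for temporal-order judgements and for the /b/-/p/ voicing boundary.
TEMPORAL_ORDER_THRESHOLD_MS = 20.0  # hypothetical constant name; value from the notes


def heard_as_sequence(onset_gap_ms: float,
                      threshold_ms: float = TEMPORAL_ORDER_THRESHOLD_MS) -> bool:
    """True if two onsets this far apart are heard one after the other,
    False if they are heard as simultaneous."""
    return abs(onset_gap_ms) > threshold_ms


def classify_labial_stop(vot_ms: float) -> str:
    """Map a continuous voice onset time (burst -> vocal-fold vibration, in ms)
    onto one of two categories.

    Voicing that starts before or together with the burst counts as voiced.
    Burst and voicing heard as simultaneous -> voiced /b/;
    heard as a sequence (gap longer than the threshold) -> unvoiced /p/.
    """
    if vot_ms <= 0:
        return "/b/"
    return "/p/" if heard_as_sequence(vot_ms) else "/b/"


if __name__ == "__main__":
    # Many physically different VOTs collapse onto just two percepts.
    for vot in (2, 7, 15, 25, 40, 60):
        print(f"VOT {vot:>2} ms -> {classify_labial_stop(vot)}")
```

Every VOT up to 20 ms comes out as /b/ and every longer one as /p/, which mirrors the many-to-two mapping (allophones collapsing onto one phoneme) that categorical perception describes.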
Its chief advantage lies in its ability to explain common characteristics of human & non-human speech perception & production. Since the theory is not committed to gestures as the fundamental unit of phonological representation, it is not vulnerable to many of the criticisms levelled at the motor theory.
The Fuzzy Logic Model of Speech Perception (FLMP)
One of the better-known approaches within the general auditory tradition, the FLMP incorporates the idea that there is a single set of “ideal” or “prototype” representations of speech sounds, as determined by their acoustic characteristics (Massaro & Chen, 2008).
Acc. to the FLMP, speech perception reflects the outcomes of two kinds of processes:
o bottom-up & top-down. The bottom-up processes are those mental operations that analyse the acoustic properties of a given speech stimulus. These processes activate a set of potentially matching phonological representations: stored representations of phonemes are activated to the degree that they are similar to the acoustic properties of the speech stimulus; more similar phonemes attain higher degrees of activation, less similar phonemes attain lower degrees of activation. Top-down processes are those mental operations that use information in long-term memory to try & select the best possible candidate from among the set of candidates activated by the bottom-up processes.
o This may be especially important if the incoming information is ambiguous or degraded. For example, when the /n/ phoneme precedes the /b/ sound (as in lean bacon), coarticulation often makes the /n/ phoneme come out sounding more like /m/. So, when someone listens to lean bacon, bottom-up processes will activate both the prototype /n/ & /m/ phonemes, because the actual part of the signal will be intermediate between the two types. Acc. to the FLMP, our knowledge that lean bacon is a likely phrase in English should cause us to favour the /n/ interpretation.
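The combination of bottom-up and top-down support in the lean bacon example can be sketched numerically. In Massaro's formulation, the degrees of support from each source are multiplied for every candidate and the products are then normalised (the relative goodness rule). All support values below are invented for illustration, and the function name is hypothetical.

```python
from typing import Dict


def flmp_response_probs(bottom_up: Dict[str, float],
                        top_down: Dict[str, float]) -> Dict[str, float]:
    """Integrate two sources of support per candidate, FLMP-style.

    Each dict maps a candidate phoneme to a degree of support in [0, 1].
    Supports are multiplied for each candidate, and the products are then
    normalised so they sum to 1 (the relative goodness rule).
    """
    combined = {ph: bottom_up[ph] * top_down.get(ph, 0.0) for ph in bottom_up}
    total = sum(combined.values())
    if total == 0:
        raise ValueError("no candidate received any support")
    return {ph: s / total for ph, s in combined.items()}


# Hypothetical values for the nasal in "lean bacon": coarticulation with the
# following /b/ pushes the acoustics slightly toward /m/.
acoustic = {"/n/": 0.45, "/m/": 0.55}
# Hypothetical top-down support: only the /n/ reading yields the familiar phrase.
lexical = {"/n/": 0.9, "/m/": 0.1}
# No lexical bias at all, for comparison.
neutral = {"/n/": 0.5, "/m/": 0.5}

if __name__ == "__main__":
    print("with lexical support:", flmp_response_probs(acoustic, lexical))
    print("without lexical bias:", flmp_response_probs(acoustic, neutral))
```

With these made-up numbers, lexical support lifts the /n/ reading to about 0.88, even though the acoustics alone slightly favour /m/ (0.45 vs 0.55 when the top-down source is neutral).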
However, if the /n/ sound were in a non-word, such as pleat bacon, a listener would be more likely to favour the /m/ interpretation, because the opening sound would not receive any support from top-down processes. This tendency to perceive ambiguous speech stimuli as real words if possible is known as the Ganong effect, after William Ganong (1980).
The FLMP also offers a mechanism that can produce phonemic restoration effects (Sivonen et al., 2006).
o Phonemic restoration happens when speech stimuli are edited to create gaps; for example, remember the legi(cough)lators experiment.
o These phonemic restoration effects are stronger for longer than for shorter words, and they are stronger for sentences that are grammatical and make sense than for sentences that are ungrammatical & don’t make sense.
o Further, the specific phoneme that is restored can depend on the meaning of the sentence that the edited word appears in. For example, if you hear The wagon lost its (cough)eel, you will most likely hear the phoneme /w/ in place of the cough. But if you hear The circus has a trained (cough)eel, you will more likely hear the phoneme /s/.
Research involving event-related potentials (ERPs) shows that the nervous system does register the presence of the cough noise very soon after it appears in the stimulus (about 200 ms).
All of these findings suggest that a variety of possible sources of top-down information affect the way the acoustic signal is perceived. Further, they suggest that the perception of speech involves analysing the signal itself as well as biasing the results of this analysis based on how well different candidate representations fit in with other aspects of the message. These other aspects could include whether the phonological representation results in a real word or not, whether the semantic interpretation of the sentence makes sense, or how intact the top-down information is.
To Sum Up
References
Traxler, M. J. (2013). Introduction to Psycholinguistics: Understanding Language Science. Wiley-Blackwell.
Lecture 24: Attention - I
Some Key Questions…
Is it possible to selectively focus on one object/event while many others are simultaneously going on? If yes, then under what conditions? What does research on attention tell us about multitasking? Is it that we are not attending to all the other information that we are not focusing on?
Preliminary Definitions…
Attention: the ability to focus on specific stimuli or spatial locations.
Selective Attention: focusing attention on a specific object, event or location.
Overt Attention: the process of shifting attention from one place to another by moving the eyes to those specific objects or locations.
Covert Attention: shifting attention without actually moving the eyes.
Divided Attention: the ability to attend to two objects at the same time.
Image Source: Goldstein (2011). Cognitive Psychology: Connecting Mind, Research & Everyday Experience. Cengage Learning.
Attentional Processes: Visual Search
Search refers to our behaviour of scanning the environment for particular features, i.e. actively looking for something when one is not aware of where it will appear. Search is made more difficult by distracters, i.e. non-target stimuli that divert our attention away from the target stimulus.
o False alarms usually arise when we encounter such distracters while looking for the target stimulus, e.g. counterfeits.
The number of targets & distracters affects the difficulty of the task.
o E.g. try to find the T in the two figures, Panels A & B.
An interesting finding is the display-size effect (display size being the number of items in a given visual array), i.e. the degree to which the number of items in a display hinders the search process.
Image Source: Sternberg & Sternberg (2011). Cognitive Psychology. Wadsworth Publishing. 6th Ed. (p. 143).
Distracters cause more trouble under some conditions than under others.
o We conduct a feature search when we simply scan the environment for a specific feature (Treisman, 1993). Distracters play little role in slowing our search in this case. For example, when finding the O in Panel C, the O pops out because it has a distinctive form compared to the rest of the items in the display.
Feature singletons, i.e. items with distinctive features, stand out in the display (Yantis, 1993); feature singletons seem to grab our attention not only when they are targets but even when they are distracting.
Image Source: Sternberg & Sternberg (2011). Cognitive Psychology. Wadsworth Publishing. 6th Ed. (p. 144).
On the other hand, sometimes the target stimulus has no unique or even distinctive features. In these situations, the only way we can find such items is by conjunction search, i.e. we look for a particular combination (conjunction) of features. For example, the only difference between a T & an L is the particular integration of line segments: both letters comprise a horizontal line and a vertical line. The dorsolateral prefrontal cortex, as well as both frontal eye fields & the posterior parietal cortex, play a role only in conjunction searches, not in feature searches (Kalla et al., 2009).
Theories of Visual Search
o Feature-Integration Theory explains the relative ease of conducting feature searches and the relative difficulty of conducting conjunction searches.
o Going by Treisman's (1986) model of visual search, for each possible feature of a stimulus, each of us has a mental map for representing the given feature across the visual field. Say, there is a map for every colour, size, shape or orientation. There is no added time required for additional cognitive processing.
Thus, during feature searches, we monitor the relevant feature map for the presence of any activation anywhere in the visual field.
o This monitoring process can be done in parallel (all at once). Feature searches will therefore show no display-size effects.
However, during conjunction searches an additional stage of processing is needed. During this stage, we must use our attentional resources as a sort of mental glue, wherein the two or more features are conjoined into an object representation at a particular location. In this stage, we can conjoin the feature representations of only one object at a time, so this stage must be carried out sequentially, conjoining each object one by one. Effects of display size (i.e. of a larger number of objects with features to be conjoined) therefore appear.
Such a model of visual search is supported by the work of Hubel & Wiesel (1979), who identified specific neural feature detectors.
o These are cortical neurons that respond differentially to visual stimuli of particular orientations (e.g. vertical, horizontal or diagonal).
More recent research has indicated that the best search strategy is not for the brain to increase the activity of neurons that respond to the particular target stimuli; in fact the brain seems to use the more nearly optimal strategy of activating neurons that best distinguish the targets from the distracters, while at the same time ignoring the neurons that are tuned best to the target (Navalpakkam & Itti, 2007).
Similarity Theory:
According to similarity theory, Treisman’s data can be reinterpreted as resulting from the fact that as the similarity between target & distracter stimuli increases, so does the difficulty of detecting the target stimuli (Duncan & Humphreys, 1992). Thus, targets that are highly similar to distracters are relatively hard to detect, while targets that are highly disparate from distracters are relatively easy to detect (e.g. finding the black circle in Panel E).
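The feature-search versus conjunction-search contrast drawn by feature-integration theory can be illustrated with a toy step-count model. This is a sketch, not a model of real reaction times: it assumes feature search costs one step (the feature map is read in parallel) whatever the display size, while conjunction search costs one step per item whose features are bound before the target is found. The (colour, shape) encoding and all function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple

Item = Tuple[str, str]  # (colour, shape): a hypothetical stimulus encoding
TARGET: Item = ("black", "circle")
# Each distracter shares exactly one feature with the target (conjunction display).
DISTRACTERS: List[Item] = [("black", "square"), ("white", "circle")]


def make_conjunction_display(n: int, rng: random.Random) -> List[Item]:
    """Place one target at a random position among n - 1 distracters."""
    display = [rng.choice(DISTRACTERS) for _ in range(n - 1)]
    display.insert(rng.randrange(n), TARGET)
    return display


def feature_search(display: List[Item], feature_index: int, value: str) -> Tuple[bool, int]:
    """Feature search: the relevant feature map is monitored in parallel, so the
    modelled cost is a single step whatever the display size."""
    present = any(item[feature_index] == value for item in display)
    return present, 1


def conjunction_search(display: List[Item], target: Item) -> Tuple[bool, int]:
    """Conjunction search: attention binds colour and shape for one item at a
    time; the cost is the number of items bound before the target is found
    (all items, if the target is absent)."""
    for steps, item in enumerate(display, start=1):
        if item == target:
            return True, steps
    return False, len(display)


def mean_conjunction_steps(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Average serial steps over random target-present displays of size n."""
    rng = random.Random(seed)
    return mean(conjunction_search(make_conjunction_display(n, rng), TARGET)[1]
                for _ in range(trials))


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        # Feature display: a single red item pops out among black squares.
        pop_out = [("black", "square")] * (n - 1) + [("red", "square")]
        print(n, feature_search(pop_out, 0, "red")[1], round(mean_conjunction_steps(n), 1))
```

For target-present displays the mean number of conjunction steps grows roughly as (n + 1) / 2, while the feature-search cost stays flat: this is the display-size effect the theory predicts for conjunction but not for feature searches.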
Image Source: Sternberg & Sternberg (2011). Cognitive Psychology. Wadsworth Publishing. 6th Ed. (p. 146).
The target is highly similar to the distracters (black squares or white circles); therefore it is very difficult to find.
Further, the difficulty of search tasks depends upon the degree of disparity among the distracters, but it does not depend on the number of features to be integrated. For instance, one reason it is easier to read long strings of text written in lower-case letters than text written in capital letters is that capital letters tend to be more similar to one another in appearance. Lower-case letters, in contrast, have more distinguishing features. E.g. try to find the R in Panels F & G.
Image Source: Sternberg & Sternberg (2011). Cognitive Psychology. Wadsworth Publishing. 6th Ed. (p. 146).
Guided Search Theory:
An alternative to Treisman’s model is offered by guided search theory (Cave & Wolfe, 1990; Wolfe, 2007). The guided search model suggests that all searches, whether feature searches or conjunction searches, involve two consecutive stages. The first is a parallel stage: the individual simultaneously activates a mental representation of all the potential targets. The representation is based on the simultaneous activation of each of the features of the target. In a subsequent serial stage, the individual sequentially evaluates each of the activated elements according to its degree of activation. After that, the person chooses the true targets from the activated elements. Acc. to this model, the activation process of the initial parallel stage helps to guide the evaluation and selection process of the second, serial stage of the search.
o For example, try to find the black circle in Panel H. The parallel stage will activate a mental map that contains all the features of the target (circle, black). Thus black circles, white circles & black squares will be activated.
During the serial stage, one will first evaluate the black circle, which was highly activated. One will then evaluate the black squares & white circles, which are less activated, & dismiss them as distracters.
Image Source: Sternberg & Sternberg (2011). Cognitive Psychology. Wadsworth Publishing. 6th Ed. (p. 147).
To Sum Up
References
Sternberg & Sternberg (2011). Cognitive Psychology. Wadsworth Publishing. 6th Ed.
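The two-stage guided search account described in this lecture (parallel activation of everything sharing target features, then serial evaluation in order of activation) can be sketched as follows. This is a deliberately simplified sketch, assuming that an item's activation is just the number of target features it shares; the published model is considerably richer, and the stimulus encoding and function names here are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (colour, shape): a hypothetical stimulus encoding


def parallel_stage(display: List[Item], target: Item) -> List[int]:
    """Stage 1 (parallel): every item is activated at once by the number of
    target features it shares (a deliberately simple activation rule)."""
    return [sum(a == b for a, b in zip(item, target)) for item in display]


def guided_search(display: List[Item], target: Item) -> Tuple[List[Item], List[Item]]:
    """Stage 2 (serial): evaluate activated items in order of decreasing
    activation, then keep those that carry every target feature.

    Returns (evaluation order, items accepted as true targets).
    """
    activations = parallel_stage(display, target)
    # Items sharing no target feature are never activated, so never inspected.
    candidates = [(act, pos) for pos, act in enumerate(activations) if act > 0]
    # Highest activation first; ties broken by display position.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    order = [display[pos] for _, pos in candidates]
    targets = [display[pos] for act, pos in candidates if act == len(target)]
    return order, targets


if __name__ == "__main__":
    # Hypothetical version of the Panel H display, plus one white square.
    panel = [("black", "square"), ("white", "circle"), ("black", "circle"), ("white", "square")]
    order, found = guided_search(panel, ("black", "circle"))
    print("evaluation order:", order)
    print("accepted as target:", found)
```

The black circle is evaluated first because it shares both target features; the black square and white circle follow and are dismissed, and the white square, sharing neither feature, is never inspected at all.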
