Object Recognition Theories (Lecture 5)
Summary
This lecture discusses two theories of object recognition: template theory and feature analysis. Template theory suggests objects are recognized by matching them to stored templates, while feature analysis suggests objects are recognized by combining their constituent features. The lecture also briefly touches on repetition suppression and the concept of geons.
Full Transcript
LEC 5: THE CHALLENGE WITH OBJECT RECOGNITION
• Object recognition goes from one of the 5 senses, to memory, to recognition.
• But how do we bridge the gap between the perceptual stimulus of something (a coffee cup) and the knowledge representation in the brain? i.e., how does perception of coffee lead to recognition of coffee?

THEORIES THAT EXPLAIN HOW WE MATCH WHAT WE SENSE TO WHAT WE KNOW
1) TEMPLATE THEORY
• Template theorists argue that in our minds we have templates, or object representations, of things we have perceived before. They further postulate that we have a different template for everything we've ever sensed, in every way we've ever experienced it. Ex: you have templates of chocolate in different shades of brown, with different tastes (sweet to bitter), different smells, and different shapes at different angles.
• A real-life application is barcodes, where a unique barcode always represents the same thing. As kids, when we know less and have been exposed to less, we are busy creating templates.
• While this theory is intuitive and interesting, it suffers from a lack of economy: the brain would have to store trillions of templates because of the incredible variability in how things can appear. From a biological-plausibility standpoint, storing trillions of templates is not an efficient way to govern object recognition.
• The bot-prevention program CAPTCHA, which asks you to recognize letters in a distorted image, is used to stop bots from buying out all the tickets, because bots can be programmed with template recognition. The weird way the letters are written messes with the bot's template matching but not with our recognition, preventing bots from acting like humans. However, isolated distorted letters were hard for us to perceive at first too, so CAPTCHAs switched to words; words are easier for us because of top-down processing (we're better at reading).
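The template idea can be illustrated with a toy sketch: store a few tiny letter "templates" and recognize a new stimulus by whichever stored template it correlates with best. The 5x5 letter arrays and the correlation score below are illustrative assumptions, not part of the lecture; a real template matcher (as in a simple CAPTCHA-reading bot) would work on pixel images the same way.

```python
import numpy as np

# Hypothetical stored "templates": tiny 5x5 binary images of the letters T and L.
TEMPLATES = {
    "T": np.array([[1, 1, 1, 1, 1],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0]], dtype=float),
    "L": np.array([[1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [1, 1, 1, 1, 1]], dtype=float),
}

def recognize(stimulus):
    """Return the label of the stored template that best matches the stimulus."""
    best_label, best_score = None, -np.inf
    for label, tmpl in TEMPLATES.items():
        # Normalized dot product (cosine similarity) as the match score.
        score = (stimulus * tmpl).sum() / (np.linalg.norm(stimulus) * np.linalg.norm(tmpl))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# A slightly corrupted "T" (one extra pixel) still matches the T template best;
# this is also why CAPTCHA-style distortion has to be heavy to defeat matching.
noisy_t = TEMPLATES["T"].copy()
noisy_t[4, 4] = 1.0
print(recognize(noisy_t))  # T
```

Note the economy problem the lecture raises: this only works because we stored one template per letter; covering every font, angle, and size would blow up the template store.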
• Template theory also assumes we recognize objects as wholes, but there are other ways we can recognize objects, i.e., by analyzing their parts.

2) FEATURE ANALYSIS THEORY
• This theory is one of many that argue that instead of recognizing objects as wholes (template theory), we identify a combination of features that is used for perceptual recognition. For example, when identifying the letter T, certain features map onto what we're seeing (the ones shown in orange on the slide), and the collection of those features most excites the mental representation of T.
• So this theory says we don't just go from perceiving to representation; we first deconstruct the object into the features that make it up, then put them back together to recognize it.
• There are many feature analysis theories. One focused on object recognition is the RBC (Recognition by Components) model developed by Biederman. He said that we recognize objects by the 3D components that make them up, called geons. He argued that in our minds we have a collection of these geons that can be combined to form objects, and what separates one object from another is which geons are used and how those geons come together.

This study source was downloaded by 100000845920345 from CourseHero.com on 12-08-2023 07:59:08 GMT -06:00 https://www.coursehero.com/file/63879694/psyb55-lec-5pdf/

• Example: in the figure, (a) is a briefcase and (b) is a drawer. They have the exact same geons, but the way the geons come together is different, which helps us differentiate them. Adding colour and texture makes it much easier to distinguish them.
• Some geons are more important than others. Ex: a chair doesn't necessarily need a back; as long as it has a base and 3-4 legs to stand on, it can be recognized as a chair.
• He added 2 key stipulations: If you can't see all the geons, it is much harder to recognize the object.
As long as you can see the geons, you will be able to recognize the object at any angle or orientation.
• This theory, however, has limitations when it comes to recognizing particular objects. Ex: picking out your own bag from other similar bags; the theory has trouble accounting for two objects of the same kind that differ only in small ways (one of them being yours).
• So, do we recognize objects as parts or as wholes? It's an ongoing debate with merit on both sides. A more important question: does the perspective of an object matter for being able to perceive it?

PERSPECTIVE OF OBJECTS - DOES IT MATTER?
• A little background info: brain imagers discovered a phenomenon called RS, or Repetition Suppression. When a person is exposed to an object and perceives it, there is brain activation; when they are exposed to that same object again a few seconds later, the activation is lower, and with each further exposure the activation lessens more. (In the figure: N = Novel, R = Repeated, F = Fixation.)
• The researchers hypothesized that the first time (novel) we perceive an object we activate many neural connections and areas in the brain, but the second time (repeated) we trim away a lot of unnecessary activation, and with each additional exposure we activate even less of the unnecessary areas. So the brain's mental representation becomes tighter and more efficient with practice.
• Coming back to whether the angle or orientation of an object matters: we can examine this by showing a person an image of a Coke bottle; when you show the same image again, you get repetition suppression.
• But what happens when you show the person an image of a Coke bottle and then another image of the same Coke bottle at a different angle? If RS is observed, then the perspective of the object doesn't matter for perceiving it; if RS is not observed, then perspective does matter.
• What was actually observed when this experiment was done:
o In the left fusiform area: there IS RS when viewing the same object at a different angle (view invariant).
o In the right fusiform area: there is NO RS when viewing the same object at a different angle (view dependent).
• So, in conclusion, the left hemisphere of the brain is view invariant and the right hemisphere is view dependent.

DORSAL AND VENTRAL STREAM
• These are two parallel streams of processing through which you perceive information.
• The Ventral Stream is a pathway going from the occipital cortex (vision) to the temporal cortex (memory). It lets us link what we are seeing (vision) with what we know about it (memory), and this link is crucial for visual perception.
• The Dorsal Stream is a pathway going from the occipital cortex to the parietal cortex. It helps us understand where things are and how to use them.

FORM AND OBJECT PROCESSING IN THE VENTRAL STREAM
• One of the first experiments on the ventral stream was by Alex Martin at the National Institutes of Health. He showed subjects images of things that looked like objects but weren't real objects, and he also showed them visual noise (like a static screen on a TV). He showed the visual noise because:
a) With the subtraction method you have to have something to compare to, i.e., a control condition.
b) The visual noise had the same brightness and the same low-level visual features as the images of imaginary objects. The only difference between the two image types was that one contained objects and one didn't; it was the best comparison possible.
• Object vs. not-an-object: a control for visual stimulation.
• Doing this, he observed brain activation across the ventral stream: information marching down that highway in both hemispheres.
• He then showed the subjects images of real objects vs. images of fake objects, and the ventral stream lit up again.
• Researchers then explored this in more detail: how does the brain recognize different kinds of objects, like faces vs. tools, if we use the same processing highway?
• Kalanit Grill-Spector made an image of the brain flattened out like a piece of paper. White areas are the bulges (gyri) and grey areas are the grooves (sulci). The figure shows the occipital lobe/cortex; going ventrally (down to the right of the image) follows the ventral stream, and going up follows the dorsal stream. They used the flattened image because, when looking at brain activity, it is sometimes unclear whether a gyrus or a sulcus is involved, and the flat image makes this clear.
• Mapping active areas on an image like this while showing subjects object-recognition photos, here's what they found:
o Photos of faces vs. visual noise = ventral stream activation.
o Animal photos vs. visual noise = ventral stream activation.
o Photos of places activated the ventral stream as well as spatial areas of the brain (dorsal stream), because we identify where a place is in addition to what it is. Ex: valleys, your house, etc.
• What you learn from these observations: when you see an object, you will always activate the ventral stream to identify what it is.

FACE RECOGNITION
• People hypothesized that there was a specific place in the brain specialized for quickly and easily distinguishing faces.
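The subtraction method used throughout these experiments can be sketched as a voxel-wise difference of mean activation maps between two conditions. Everything below is a toy simulation: the random "voxel" vectors, the 40-trial counts, and the hypothetical ventral-stream region at voxels 20-29 are all illustrative assumptions, not real data from these studies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated per-voxel activation maps: 40 trials per condition, 100 voxels.
N_VOXELS = 100
object_trials = rng.normal(loc=0.0, scale=1.0, size=(40, N_VOXELS))
noise_trials = rng.normal(loc=0.0, scale=1.0, size=(40, N_VOXELS))

# Pretend voxels 20-29 form a ventral-stream region that responds more
# strongly to object images than to brightness-matched visual noise.
object_trials[:, 20:30] += 2.0

def subtraction_map(cond_a, cond_b):
    """Mean activation in condition A minus condition B, per voxel."""
    return cond_a.mean(axis=0) - cond_b.mean(axis=0)

# The contrast is near zero wherever both conditions drive the voxel equally
# (shared low-level visual stimulation subtracts out), and large only where
# the "object" condition adds something extra.
contrast = subtraction_map(object_trials, noise_trials)
active = np.where(contrast > 1.0)[0]  # simple threshold on the contrast
print(active)  # roughly voxels 20-29
```

This is why the matched visual noise mattered as a control: whatever both image types share cancels in the subtraction, leaving only object-specific activation.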
• Nancy Kanwisher showed 10 subjects images of faces alone (no hair or neck) vs. images of objects. Using the subtraction method, she found a region on the fusiform gyrus that activated only for faces, which she called the FFA (Fusiform Face Area). The FFA is a part of the ventral stream specific to recognizing faces. This experiment also identified the parahippocampal place area, which plays a role in spatial processing while identifying places.
• In the spirit of good science, another researcher, Michael Tarr, set out to test whether the FFA was responsible solely for the recognition of faces. He had subjects memorize structures called Greebles, fictional creatures designed with unique structures and traits. Subjects memorized each greeble by name and family group until they could identify greebles as well as human faces. Brain scans taken before memorization showed subjects did not use their FFA for identifying greebles; scans taken after memorization showed the FFA now activated for things that aren't human faces. Based on these findings he proposed that the FFA should be called the Fusiform Expertise Area: an area for expert visual discrimination (telling the difference between two things when the differences are really small). Ex: ornithologists telling apart thousands of similar birds, or car fanatics doing the same with cars.
• A follow-up study took brain scans of car experts and ornithologists (image below; the squares depict FFA activation). The FFA was activated both for faces and for the categories the subjects were expert in. But how much expertise is enough to activate the FFA? Answer: thousands of hours of intentional, focused practice.
• Damaging the FFA results in Prosopagnosia, an inability to distinguish between faces. If you also had expertise in recognizing minute differences in other things (being a car fanatic or ornithologist), you would lose that ability as well.
• This is a type of visual agnosia: someone with prosopagnosia knows they are looking at a face, but they can't tell whose face it is. They can perceive the hair, neck, clothes, etc., but not the face identity. One way they try to identify people is by skin tone, but that is not reliable. They can't even recognize their own face in the mirror, and they don't see faces in dreams.
• They can still distinguish between two different types of noses or lips, but they can't identify faces, because doing so requires consolidating the nose, lips, and other features into a whole. The problem is the inability to consolidate all of the facial features to identify a person.
• Prosopagnosia can be acquired via brain damage (Acquired Prosopagnosia) or present from birth (Developmental Prosopagnosia). The most common sign of developmental prosopagnosia in children is difficulty making friends because they can't tell kids apart. But since many kids without prosopagnosia face the same social difficulty, and the brains of people with developmental prosopagnosia look normal on scans, it is hard to diagnose.
• Rehabilitation involves relearning what a specific person's face looks like from various angles and under different lighting. However, this learning doesn't extrapolate to other faces; each new person's face must be learned from scratch.
• If you are born with sight and later lose your vision, the FFA will still be engaged when you imagine a face or feel somebody's face by touch.
• Sometimes when a person forgets other people's names, it's because they didn't pay much attention to the person's face, not because they are bad at remembering names. Consistent with this, greater FFA activation predicts that you are more likely to remember the name.
• There is also Associative Visual Agnosia. Ex: you are shown a key; you can see what it looks like, but you can't identify it visually. If the keys are jingled, you can identify them by sound. In this condition your visual perception is fine (you can clearly see all components of the object and can draw it when asked) and your memory of the object is fine (you can identify it through other senses), but you can't associate visual perception with memory. Similarly, there is Auditory Agnosia (sound) and Astereognosis (touch), where the bridges between the respective sensory regions of the brain and memory are impaired.
• Apperceptive Visual Agnosia is like associative visual agnosia except vision itself is not fine: your memory of what things are is intact, but the agnosia stems from being unable to see objects properly, in addition to the damaged bridge between the visual and memory regions of the brain.
• Does your memory organization depend on vision? It turns out our long-term memory is represented the same way whether or not you have had sight from birth. So our brains seem to have a memory organizer plastic enough to accommodate information as long as it comes in from one sensory process or another. This means our studies on memory can be applied to people who have never had sight.
Ask about this slide.

READING THE MIND - PATTERN CLASSIFICATION / MULTI-VOXEL PATTERN ANALYSIS (MVPA)
• We expect the brain to have different neural representations when thinking about different things. People were asked to think about different things while the experimenter tracked their brain activity; thinking about different objects elicited different patterns of brain activity.
• We can train software to "read minds" and tell what we are thinking about. We take 50+ scans of the brain to see what our average brain activity looks like when thinking of, say, a water bottle, then do the same for other objects. We then give these patterns to the software and ask it to detect which object we are thinking of from new scans of our brain. The software makes an educated guess based on what it has been taught, e.g., it's 67% sure you are thinking of a water bottle.
• RS doesn't cause a problem here; when you think of the object repeatedly you use only the most necessary brain areas, so RS actually benefits this method of reading the brain.
• The accuracy of this method is mediocre. Decoding accuracy across different brain areas (multi-voxel), for software trying to guess whether the person is looking at a beach or a city, is ~30%, which is not that great.
• This method is not good at recognizing abstract thought, because it can only recognize what you teach it, and something as complex as emotions or interpersonal relations is hard to break down and teach to software. Ex: teaching software to detect a lie; its accuracy would depend on how well you can teach it what a lie looks like across the many different forms of lying.
• The quality of this method is also limited by the fact that different people may activate their brains differently while thinking of the same thing, i.e., brain activation is relative among people.
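The MVPA procedure described above can be sketched as a simple correlation classifier: average many "scans" per object to get a stored pattern, then label a new scan by which stored pattern it correlates with best. The simulated voxel vectors, the noise level, and the three object labels are assumptions for illustration, not real fMRI data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each "scan" is a vector of voxel activations, and each
# object class has a characteristic underlying pattern plus trial noise.
N_VOXELS = 50
true_patterns = {
    "water bottle": rng.normal(size=N_VOXELS),
    "face": rng.normal(size=N_VOXELS),
    "house": rng.normal(size=N_VOXELS),
}

def simulate_scan(label, noise=1.0):
    """One noisy simulated 'scan' of a subject thinking about `label`."""
    return true_patterns[label] + rng.normal(scale=noise, size=N_VOXELS)

# Training: average 50 scans per object to estimate its mean pattern,
# mirroring the 50+ scans per object described in the lecture.
centroids = {
    label: np.mean([simulate_scan(label) for _ in range(50)], axis=0)
    for label in true_patterns
}

def decode(scan):
    """Classify a new scan by correlating it with each stored mean pattern."""
    scores = {label: np.corrcoef(scan, c)[0, 1] for label, c in centroids.items()}
    return max(scores, key=scores.get)

# Decode a fresh scan; at this noise level the guess is usually correct.
print(decode(simulate_scan("water bottle")))
```

The lecture's caveats map directly onto this sketch: the decoder can only output labels it was trained on (no abstract thought), and because `true_patterns` would differ across people, a classifier trained on one person's scans would not transfer to another's.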