Cognitive Psychology Notes PDF
Document Details
Uploaded by WellBredTurtle345
Summary
These notes cover cognitive psychology concepts and categorization, exploring how selective attention influences our perception and decision-making. Historical examples and research studies are used to illustrate these ideas.
Full Transcript
**[COGNITIVE PSYCHOLOGY NOTES]**

**[Week 2 -- Concepts & Categorisation]**

**HMAS Sydney/HSK Kormoran**

*The mysterious disappearance of the HMAS Sydney*

*All of the Kormoran's armament was brought to bear on Sydney, concentrating on her bridge, torpedo tubes, and anti-aircraft batteries. The Sydney returned fire and hit the Kormoran's funnel and engine room whilst further artillery went over the ship. The Kormoran fired two torpedoes, one striking under Sydney's turrets and the other passing close ahead of the ship.*

*The Sydney, crippled and on fire, steamed slowly to the south returning sporadic fire, still receiving steady hits from the Kormoran. Until approximately 2300 hrs, all that was seen was a distant glare then occasional flickering until midnight, when all trace of the Sydney disappeared.*

**Survivors of the HSK Kormoran**
- Survivors (Germans) were asked what happened and where the HMAS Sydney was located
- Interrogated 7-21 days after the attack
- The accounts of the Kormoran survivors didn't match up and HMAS Sydney couldn't be found

**Were the Germans lying?**
- For several decades, most Australians concluded that the Germans must be lying
- Conflicting accounts were part of a ploy to mislead the Australians
- Ship hunters typically ignored the German accounts

**Folk Devil Theory**
- Folk Devil Theory introduced by sociologist Stanley Cohen in 1972
- Three stages in the media's reporting on folk devils:
- Symbolisation: the folk devil is portrayed in an oversimplified, easily recognisable, stereotyped fashion (e.g. you are explicitly told what important attributes you should pay attention to)
- Exaggeration: the facts of the controversy surrounding the folk devil are distorted or simply made up (e.g. those attributes are linked to something negative)
- Prediction: further immoral actions on the part of the folk devil are anticipated (e.g.
change in representation affects decision making: prediction of future negative actions based on selective attention to negative attributes)
- Real world examples: history of witch hunts; minorities and immigrants, who have often been seen as folk devils; anti-semitism (which frequently targets Jews with allegations of dark, murderous practices such as blood libel); vaxxers or anti-vaxxers.
- The West Memphis 3 were convicted of murder in 1994 largely on the basis that they listened to heavy metal music and dressed "weirdly".
- Cognitive psychology can explain why we are so quick to focus in on one unimportant factor (heavy metal music, being German) while other relevant factors (e.g. more valid forensic evidence) are ignored.
- Selective attention filters out unattended information, changes our representations, changes how we perceive similarity, and impacts our decision-making.

**Categorisation determines the answer to the question 'why don't their reports match?'**
- Sailors/experts, skilled in navigation, information about their location was critical to their survival; OR
- Enemy, untrustworthy, lying

**Selective attention and categorisation**
- When we categorise, our attention is drawn to salient (attention-grabbing) or relevant attributes.
- In turn, this influences how we generalise to new examples.
- How do you know attention is important in categorisation?
- Shepard, Hovland & Jenkins (1961)
- Garner (1974), Lin & Little (2024)
- Nosofsky (1986)

**Shepard, Hovland & Jenkins (1961)**
- The purpose of this study was to examine how people learn categories that differ in the number of features that define the category.
- Over time, people learn that only candle/lightbulb is relevant for sorting into categories

![](media/image2.tiff)

**Category Learning Difficulty**
- Paying attention to one dimension accentuates differences on that dimension e.g.
makes the objects which differ in colour appear even more different.
- Because attention has limited capacity, attention decreases for other dimensions e.g. makes objects that differ in unattended dimensions appear more alike on that dimension.
- Paying more attention to colour is helpful
- Paying attention to shape is not helpful
- Because only a single dimension is relevant, learning results in selective attention to only that dimension (colour).
- This has the effect of making all the black objects seem more similar to each other (likewise for the white objects), and all of the black objects very dissimilar to the white objects (and vice versa)
- Results from the category learning experiments show that Type I categories were learned the fastest, followed by Type II, then Types III to V, and then Type VI.

![](media/image4.png)

- In contrast to the Type I problem, for the Type VI problem, where equal attention to all dimensions is required, sometimes objects which belong to one category are equally similar to objects from the other category.
- That means that sometimes objects from one category might be confused with objects of another category. (Higher similarity means higher confusability.)

**Garner (1974)**

![](media/image6.png)

- Sorting along one dimension is equal regardless of which dimension is used.
- Like Shepard, Garner showed that the difference between integral and separable stimuli comes down to attention.
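
The attention effects described above for the Shepard, Hovland & Jenkins task can be pictured with a small sketch. It assumes (none of this is stated in the notes) that stimuli are coded as 0/1 values on three dimensions, that psychological distance is an attention-weighted city-block distance with weights summing to 1 to reflect limited capacity, and that similarity decays exponentially with distance. The stimulus coding, the weights, and the function names are all hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Attention-weighted city-block distance between two stimuli."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def similarity(a, b, weights, c=1.0):
    """Similarity decays exponentially with distance (an assumed form)."""
    return math.exp(-c * weighted_distance(a, b, weights))

# Hypothetical stimuli coded as (colour, shape, size), 0 or 1 on each dimension.
black_square_small = (0, 0, 0)
black_circle_large = (0, 1, 1)
white_square_small = (1, 0, 0)

equal_attention = (1 / 3, 1 / 3, 1 / 3)
colour_attention = (1.0, 0.0, 0.0)  # all attention on the relevant dimension

for label, w in [("equal", equal_attention), ("colour only", colour_attention)]:
    same = similarity(black_square_small, black_circle_large, w)
    diff = similarity(black_square_small, white_square_small, w)
    print(f"{label}: same-colour pair = {same:.3f}, different-colour pair = {diff:.3f}")
```

Under these assumptions, shifting all attention to colour makes the two black objects maximally similar (similarity 1.0) even though they differ in shape and size, while the black and white objects, which differ only in colour, become less similar than they were under equal attention.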
**Lin & Little (2023): Increasing the number of stimuli in each category**
- Attending to the relevant dimension makes the categories effectively all the same
- Collapses the stimuli along the irrelevant dimension

**Nosofsky (1986)**
- Transfer tests provide a clue as to what people represent about the category

![](media/image8.png)

**If categorisation is harmful, why do we do it?**
- Benefits of categorisation:
- We think in categories
- Categorisation forms equivalence classes that allow us to decide appropriate action
- Categories allow us to infer ambiguous or missing features
- Categories reduce the complexity of the environment (structure our knowledge)
- Allow for generalisation to new examples

**The pulsating heartbeat of thought**
- At every moment, we are faced with an indefinite number of overlapping and intermingling situations
- Understanding the world involves the automatic and effortless evocation of categories
- Examples:
- Language and communication
- Speech is not just a series of sounds
- We group sounds into words and words into parts of speech (nouns, verbs, adjectives), which we combine into sentences that allow us to convey ideas.
- Visual scene perception
- Objects in our perceptual field are not just wavelengths of reflected light
- We use edge detection, contour and texture perception, colour, and movement to identify objects by their categories, allowing us to make sense of visual scenes
- Medical diagnosis
- Patients are not just a collection of symptoms
- Symptoms are grouped into diseases and conditions that allow doctors to make diagnoses and choose treatments
- Education
- Subjects are not just a hodgepodge of ideas
- Disciplines are grouped into categories like sciences, humanities, and arts that allow for the organisation and prioritisation of learning, which helps understanding of relationships between fields of study

**Concepts vs Categories**
- A concept refers to a mentally possessed idea or notion, whereas a category refers to a set of entities that are grouped together.
- A set of sounds /ba/, /pa/, /ta/, /ga/ can be grouped together into different phoneme categories.
- The concept of the phoneme groups might include an understanding that voice onset timing differentiates /ba/ from /pa/ and that place of articulation differentiates /ta/ from /da/.
- The concept dog is whatever psychological state signifies thoughts of dogs. The category dog consists of all the entities in the real world that are appropriately categorised as dogs.
- A concept refers to the mental representation and a category refers to the actual collection of objects.
- Different medical conditions like diabetes, hypertension or asthma can be thought of as categories. The concept of diabetes would include knowledge of important relations between blood sugar, insulin resistance, and other symptoms.
- The category of sphynx cats would include all individual sphynx cats. The concept of sphynx cats would include knowledge of specific characteristics: hairless skin, cheekbones, bat ears.

**Grounding by similarity**
- We can generally think of categories in terms of how much they are grounded in similarity.
Most natural kinds and many artifacts are characterised by members that share many features and are tightly linked to perception.
- Natural categories: dogs, birds, apples
- Man-made artifacts: chairs, clocks, bicycles, cars
- Ad hoc categories (share a common goal): things that you take camping, things that could be stood on to reach a lightbulb
- Abstract schema or metaphors (metaphorical qualities): events in which a kind action is repaid with cruelty, metaphorical prisons, problems that are solved by breaking a large force into parts that converge on a target.
- For abstract categories, members need not have very much in common at all perceptually -- an unrewarding job and a relationship that cannot be ended may both be metaphorical prisons, but the situations may share little other than this.

**Stimuli have features, attributes, and dimensions**
- Features and attributes are properties that are either on or off (have or don't have)
- Dimensions are continuous properties like size

**Which features do you include?**
- The attribute 'doesn't own a raincoat' is true of almost all cats
- Usual focus on positive attributes (things which the category has rather than things which it does not)

**Features vary in salience and validity**
- Salience
- What attracts our attention? Cuteness, funny features, playfulness etc.
- Not all salient features are valid for defining the category
- Validity
- What is important for defining the category?
- Carnivorous teeth, retractable claws, flexible spine, sensory abilities, whiskers, posture, body temperature regulation, vocalisation

**Benefits of categorisation**
- Reduces the complexity of the environment
- Crayola Colour Chart example (7 million discriminable colours, separated into categories of colour)
- When we classify different objects as belonging to the same category, we treat them as equivalent.
- We can respond based on category membership rather than treating each item as unique.
- The World Color Survey is a massive research project which attempts to understand how colors are categorized in different languages. The researchers studied 110 different languages, none of which had a written component, which ensured that only spoken word categories would be used to describe the colors. In the World Color Survey, a total of six to ten color names were identified as accounting for most of the colors: black, white, red, green, yellow and blue (plus orange, pink, purple & brown)
- Provides basis of deciding what constitutes appropriate action
- A recently inseminated female mouse sniffs urine near her nest. If she categorizes it as from an unfamiliar male mouse, implantation and pregnancy are prevented (Bruce, 1959; Parkes & Bruce, 1962).
- All organisms divide objects and events in the environment into separate classes or categories. If they did not, they would die and their species would become extinct. Therefore, categorization is among the most important decision tasks performed by organisms.
- Once the category is known, we can interpret ambiguous features of the object, allowing for identification. We feel as though we can recognise a pattern when we can classify it into a familiar category. E.g. the letter A.
- Provides a means for identifying ambiguous or missing attributes.
- Once you know the category it belongs to you can identify what it is meant to be.
- Concepts allow for generalisation

![](media/image10.png)

- Enables the organisation and relation of classes of objects and events

**Importance of basic level categories**

![](media/image12.png)

- Only a few attributes were listed for superordinate categories; significantly more were listed for both basic and subordinate categories. The number of attributes listed for subordinate categories was only slightly more than the number listed at the basic level.
- Subjects listed different kinds of attributes at the different levels.
The most frequent kind of attribute listed for superordinate categories was functional (e.g. keeps you warm, you wear it). Subjects listed noun and adjective properties at the basic level (e.g. legs, buttons, belt loops, cloth) and additional properties listed at the subordinate level were generally adjectives (e.g. blue)

![](media/image14.png)

**Correlated features**
- Basic level categories have correlated features. This allows the prediction of missing features.
- For example, in the animal kingdom, flying is correlated with laying eggs and possessing a beak. There are "clumps" of features that tend to occur together. Some categories do not conform to these clumps (e.g. ad hoc categories), but many of our most natural-seeming categories do.
- If we know something belongs to the category dog, then we know that it probably has four legs and two eyes, eats dog food, is somebody's pet, pants, barks, is bigger than a breadbox, and so on. Basic level category members share similarity across many perceptual dimensions/features.

**Shape similarity**
- In this experiment, Rosch traced the shapes of pictures of objects and then had subjects try to identify the novel shapes created by averaging the outlines. Basic level categories had shapes that were more similar to each other than objects within superordinate categories - subjects could easily identify averages from the same basic level concepts but often could not identify averages of superordinate concepts.
- Rosch et al. also tested the hypothesis that similar shapes should help identification by presenting subjects with a category name and then asking them to identify a briefly presented picture of an object in that category.
Subjects were faster at verifying that a picture of an object was a member of a basic-level category than they were at verifying category membership for either subordinate or superordinate categories.
- For example, after hearing a category name and seeing a picture of a kitchen table, subjects were faster at verifying that the picture was a table than they were at verifying that the picture was "furniture" or a "kitchen table"

**Naming**
- When shown a picture, people tend to use the basic level name
- E.g. when shown a dog they call it a dog rather than an animal or a bulldog.
- Basic level names are learned first

**Summary**
- Basic level categories
- Provide more information about the attributes of members
- Are identified by shape more accurately and faster
- Are retrieved more often as the category name
- Are learned first
- Why?

![](media/image16.png)

**Levels of categorisation**
- According to Rosch: basic level categories are the level at which cue validities and category resemblance are maximised.
- Cue validity
- How much does property X define the category?
- (Probability that an object belongs to category Y given that it has property X)
- Cue validity is high if a feature only appears in members of one category and not in members of another category; and low if it appears in many objects from many different categories.
- E.g. the category 'lion' and the property 'four legs' co-occur frequently; however, the property 'four legs' also occurs a lot across all animals, so the cue validity of 'four legs' is rather low. On the other hand, 'lion' and 'mane' co-occur frequently but 'mane' occurs much less often than 'four legs', so the cue validity of 'mane' for the 'lion' category is higher.
- 'Gills' might be a perfect cue for fish. 'Has leaves' might be a perfect cue for plant.
- Cue validity is not maximised at the basic level; it is maximised at the superordinate level.
- Consider the feature 'has wings' and the hierarchy of categories bird, animal. P(bird \| wings) is lower than P(animal \| wings), because there are many animals with wings that aren't birds, like bats and insects. Since the animal category includes all birds, as well as many other things with wings, if something has wings, you can be surer that it is an animal than a bird. For categories that are nested, cue validity will never be lower for a more general category than for one of the categories it includes.
- Category validity -- the probability of having a feature given you are a member of the category, e.g. P(wings \| bird) -- is basically the opposite of cue validity, so it is maximised for subordinate categories
- Category resemblance
- How much do the members of a category go together?
- (The sum of all features that members of a category have in common minus the sum of the distinctive features that are idiosyncratic to category members)

![](media/image18.png)

**Exploring category resemblance further**
- What makes a category a category?
- Classical view: there are rules that define categories
- Fails to account for family resemblance and typicality effects

**Classical view of categorisation**
- Categories are comprised of a list of necessary and sufficient conditions that define whether an object belongs to the category or not. Categories have well-defined boundaries and all members of a category are in a sense equivalent.

**Challenges to the classical view**

![](media/image20.png)

- There may be some scientific and taxonomic concepts for which the category rules can be written down but even these are debated: is Pluto a planet or celestial body?
As our knowledge changes, our concepts change as well. Category boundaries, to the extent they exist, appear to be fuzzy.

![](media/image22.png)

![](media/image24.png)

- In the 1970s, Eleanor Rosch identified a number of empirical phenomena that the classical view could not explain. In a series of experiments, she demonstrated the empirical phenomenon of typicality. Natural categories (and some artificial categories) contain graded structures in which certain exemplars are considered more representative (or typical) of the category and other exemplars are less representative (or atypical) of the category.
- In contrast to the classical view, objects are not simply in or out of a category based on whether they have some attribute or meet some definition. Hampton (1979) asked subjects to rate a number of items on whether they were category members for different categories. He did not find that items were segregated into clear members and nonmembers. For example, he found that sinks were considered as just barely members of the kitchen utensil category but sponges were just barely excluded. Seaweed was just barely included as a vegetable, as were tomatoes and gourds.
- Categories have a graded structure -- some category members are better than others
- The figure above shows the average frequency of an attribute plotted against the number of items for which that attribute was listed. Most of the attributes that were listed only applied to a single object. For instance, look at the left hand side of the graph: the frequency of attributes which only apply to one item is much higher than any of the other points on the graph. There were very few attributes which were shared by several objects.
On the far right side of the graph, we see that the mean frequency of attributes which applied to all 20 items is very low.
- The items were also rated on their **[typicality.]** The correlations below complete the picture. For each category, there was a very high correlation between the number of shared attributes and the item's typicality. The conclusion is that the more typical an item was, the higher the number of attributes that it shared with other objects. So highly typical items contained the features shared by lots of different objects, but there were many objects made up of idiosyncratic features that were not typical of the category. Yet nonetheless, these atypical items were also category members.

![](media/image26.png)

- The figures show similarity maps for the BIRD category and the MAMMAL category.
- You can see that the bulk of the examples tend to cluster together. You can confirm for yourself that these examples are typical of the bird category.
- Two smaller clusters of atypical birds are also evident. Loosely, these represent subordinate categories of birds of prey and fowl.

![](media/image28.png)

- Typicality can affect generalisation. For example, certain information that is learned about a typical exemplar (e.g. a horse) can lead to the conclusion that the same information applies to other members from the same category.
- By contrast, information learned about an atypical exemplar (a bat) is often considered specific to that instance or others similar to it, and therefore doesn't transfer to other mammals.

**Returning to the HMAS Sydney**
- In 1991, John Dunn & Kim Kirsner set out to determine whether they could use the interrogation reports to deduce the location of the HMAS Sydney.
- What are the typical features of true stories?
- True stories are collections of words and text
- Natural language has particular statistical structure
- The most frequent word appears twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word and so on
- Zipf's law: the frequency of any word is inversely proportional to its rank.

![](media/image30.png)

![](media/image32.png)

- The results are consistent with the assumption that the survivors were telling the truth.
- The reports differ because of memory errors, not because the survivors were lying.
- More evidence, if more evidence was required, came from the extraordinary number of survivors who pointed to 26° South 111° East. Many of the crew were 'in a position to know', and this knowledge was critical to survival for the crew in the lifeboats.
- Here we see the failing of not taking into account all of the information and relying instead on a single 'trusted' statement.

**Category induction**
- The more typical the examples are of the category in the premises, the more the conclusion is supported by the premises. Robins are more typical than penguins so the conclusion at the top is supported more than the conclusion at the bottom.

**Category induction: Effect of typical conclusion examples**
- Generalisation is greater to more typical category members. Conclusions are supported more by premises that are similar to them. Bluejays and robins are more similar to sparrows than bluejays and robins are to geese.

![](media/image34.png)

**Category induction: Effect of conclusion category size**
- Generalisation is greater to more specific/smaller categories.
More specific conclusions are supported more than less specific conclusions. 'All birds' is more specific than 'all animals'.

**Category induction: Effect of premise example variability**
- Generalisation is greater when the examples are more variable. The less similar the premises are among themselves, the more they confirm the general conclusion. Hippos and hamsters are not very similar, so again the top conclusion is supported more than the conclusion at the bottom (hippos and rhinos).

![](media/image36.png)

**Summary**
- Members of natural categories share differing levels of family resemblance.
- Typical instances are verified more rapidly, learned faster, primed more easily, and generalised more readily.
- Generalisation is affected by typicality of instances, typicality of category, category size, category variability.

**[Week 3: Learning]**

**What is learning for?**
- To make **[predictions]** about events in an environment and to **[control]** them.
- Learning exists to allow an organism to exploit and benefit from **[regularities]** in the environment.

**Will it rain tomorrow?**
- Which cues should an observer attend to? People can shift their attention to the cues (e.g. those that predict the weather), and people can learn to associate those cues with specific outcomes (such as rain or shine).
- E.g. Moon halos are caused by the refraction and reflection of moonlight through tiny ice crystals suspended in high, thin cirrus clouds.
- The presence of a moon halo is a sign that it will soon rain. First Nations peoples have learned to narrow down this prediction by looking for other predictive cues.
- There are many examples of how First Nations people have learned to predict changes in their environment from careful observations of their environment and cues ranging from sunspots, to the retrograde motion of Venus, to the scintillation (or twinkling) of stars, to changes in the colour of variable stars.
- Causal learning

**Learning is about regularity and invariance**
- Honey bee example:
- After locating a food source, honey bees will return to the hive and perform a 'waggle dance'
- The length of the waggle corresponds to distance outside of the colony, while the angle of the waggle corresponds to the angle from the sun that the bee must fly.

![File:Waggle dance.png](media/image38.png)

- The behaviour is not completely innate.
- Young adult bees start by performing tasks inside the hive but end their lives as foragers. Before they start leaving the hive, they spend around 4 days observing waggle dances.

![](media/image39.png)

- In the experiment above, Group 2 does show a decrease in directional errors and improves many aspects of their dance. However, their run durations, which indicate distance to the food source, remain overly long across their foraging life, as do their flight times returning from the food source.
- Bees also prefer to watch experienced dancers, but after 20 days there were more followers for Group 1 than Group 2

**How do we learn?**
- Simple language learning (understanding what 'Gavagai' means)
- Words and sentences can have multiple meanings
- Understanding meaning often requires context that may not be explicitly available.
- There isn't a one-to-one mapping between words in different languages.
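
One way to picture how the word-learning problem above might be narrowed down is to count how often an unfamiliar word co-occurs with each candidate referent across several situations. The sketch below is only an illustration under that assumption; the scenes, the objects, and the function name are invented for the example and do not come from the notes.

```python
from collections import Counter, defaultdict

def count_cooccurrences(situations):
    """Count how often each word is heard while each object is in view."""
    counts = defaultdict(Counter)
    for words, objects in situations:
        for word in words:
            for obj in objects:
                counts[word][obj] += 1
    return counts

# Hypothetical learning episodes: (words heard, objects in view).
situations = [
    ({"gavagai"}, {"rabbit", "tree"}),
    ({"gavagai"}, {"rabbit", "river"}),
    ({"gavagai"}, {"rabbit", "hunter"}),
]

counts = count_cooccurrences(situations)
best_referent, n = counts["gavagai"].most_common(1)[0]
print(best_referent, n)  # rabbit 3
```

In any single situation the word is ambiguous between two objects, but only one object is present every time the word is heard, so its count pulls ahead with repetition.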
**Simple learning example using an alien language -- Zylian (AI made it up)**
- With repetition, an association develops between the cue (the words) and the stimulus (the possible referent).
- Similar to classical conditioning (the pairing of referent and cue is reinforced through repetition).

![](media/image41.png)

- Co-occurrences lead to a strengthening between cues and outcomes

**Categorisation Demo**

**What determines how stimuli are divided into categories?**
- Condition 1: Filtration (horizontal/vertical)

![](media/image43.png)![](media/image45.png)

- Condition 2: Condensation (diagonal)

![](media/image47.png)

**Data and predictions**
- A simple model which just learns associations between stimuli and responses predicts no differences between the filtration and condensation conditions.
- Why? In both conditions, each stimulus is paired with each category the same number of times.

![](media/image49.png)

- If we also allow attention to shift to the dimensions of the stimuli that are most relevant or diagnostic, then this model predicts that the filtration condition will be learned faster.
- Why? When we shift attention to diagnostic features, we strengthen the association with the category label only for those features.

**Learning involves attention**

**Blocking paradigm**
- In the blocking paradigm, the mouse first learns that the red light predicts food reward.
- This is based on classical conditioning
- Food release (US) -- approach left food tray (UR)
- Red light (cue) + food release (US) -- go left (UR)
- Red light (CS) -- go left (CR)
- Then the learned cues are paired with new cues, and responses are examined.

![](media/image50.png)

- Learning is not just a reflection of co-occurrence.
- The bell is paired with the food equally often as the blue light is paired with the juice. If co-occurrence were the sole factor, then we would expect 50:50 responses between the food and juice in the test phase.
- The early pairing of the red light and the food is important.
- By the late training phase, the red light is already paired with the food. The data suggest that little is learned about the bell because of this. We say that the 'bell' was 'blocked' by the prior learning about the red light.
- This causes the mouse to go right for the juice more than left for the food, as it has learned that the blue light provides the juice reward and the bell is blocked by the red light.
- An attention account of blocking explains that because attention is shifted to the red light, there is not much attention left over for the bell.

**Highlight effect**

![](media/image52.png)

- An attention account of highlighting explains that because cue A (red light) is already associated with outcome X (food), attention is shifted to cue D (alarm), which drives the final response.
- Attention is shifted to D because it alone predicts the unusual event Y (Juice) **Other evidence for the role of attention** - Experts vs novices - Insight problem solving **Expertise** - Represents an extreme form of learning or learning to a high level of performance - Experts have learned highly specific things about specific objects, situations, events - What an expert 'sees' in a situation will be different from what a novice 'sees'. ![A diagram of different goals Description automatically generated](media/image54.png) - Your judgements of similarity depend on how much you know about this game - Novices base their similarity judgements on superficial surface features, experts base their judgements on deep features like how many moves between states. **Different experts categorise things differently** - Landscaping experts categorise trees according to their specific goals - Shade trees, fast-growing trees, surface covering etc. - Taxonomists sort trees into biological kinds - Naïve subjects sort trees by their surface appearance **How are experts different from novices?** - Different experience and knowledge lead to a difference in how attention is allocated - Attention to deep structure versus surface structure - Attention to contextually-relevant features - E.g. video of chess grandmaster recalling grid -- was able to better recall the position of chess pieces when it made sense and pieces were positioned correctly in the middle of a game. However, when these pieces were randomly scattered around the board, their recall accuracy decreased significantly. - This is referred to as attention to relations rather than individual parts (i.e. chunking) **Problem solving** - Opposite end of the learning spectrum from expertise - E.g.
radiation problem and defense problem ![A diagram of a person\'s body Description automatically generated](media/image56.png) **Stages of problem solving** - Preparation - Search for a solution using logic & reasoning - If a solution is found, stop here - Incubation - Attention not devoted to the problem - Illumination, insight, AHA!! - 'Spontaneous' manifestation of the problem solution into consciousness - Verification - Use of logic and reasoning to confirm the solution - If you invalidate the solution, return to the preparation stage **Incubation** - Is it real? Out of 39 experiments, incubation improved problem solving about 75% of the time - Different factors influence whether incubation is successful or not - Longer periods of incubation are positively correlated with success - For incubation periods less than 1 hour, 30 minutes is optimal - For incubation periods over 1 hour, the longer the better - Better preparation increases the effectiveness of incubation - Silveira (1971) had people solve a difficult problem in 3 different conditions - Work continuously for 35 minutes - Work for 3 minutes, incubation, work for 32 minutes - Work for 13 minutes, incubation, work for 22 minutes - More people solved the problem in the last condition than in the first two **Theories of incubation** - Better preparation and hints help insight problem solving - Why?
- Recovery from fatigue: preparation is cognitively draining, incubation allows for recuperation of cognitive abilities - Forgetting of mental sets: false assumptions can block the path to the solution **Mental sets** - **[Einstellung Effect -- when an idea that comes to mind immediately in a familiar context prevents alternatives being considered ]** - Related to the idea of negative transfer and 'strong-but-wrong' errors - 'strong and right but not the best' **Einstellung effect** - Not all learning is beneficial (can lock in patterns of thinking) - **[Einstellung effect: existing knowledge or habitual ways of thinking influence our problem solving. ]** - After solving problems in which Solution 1 worked, participants were less likely to use the easier Solution 2 compared to a control condition - And more likely to get stuck if Solution 1 no longer worked Chess example: ![A screenshot of a game Description automatically generated](media/image58.png) - The second solution is less well-known and takes fewer moves to achieve checkmate. However, expert chess players waste a lot of time looking for a familiar strategy that won't work, before finding another solution. **Insight** - Not just another problem solving step - Sudden jump or transition in understanding -- from 'not knowing' to 'knowing'. **Theories of incubation and insight** - Unconscious work - Solution to the problem is developed unconsciously and 'delivered' to consciousness once a goal is reached - Difficult to assess experimentally.
- Conscious work - Work on the problem takes place while attending to non-taxing activities (in the shower, driving) - Then attention is shifted quickly to the problem and any activity is forgotten, only the end state is remembered leading to a feeling of 'suddenness' **Attention, learning and problem solving** - Knowing what aspects of a problem to attend to is critical to problem solving - Learning and expertise illustrate the role of attention in the development of concepts ![A group of people with different facial expressions Description automatically generated with medium confidence](media/image60.png) **Summary** - Learning and insight benefit from attention - Learning also draws on prior knowledge to guide current learning and inference - Prior knowledge can be useful when that knowledge aligns with the current problem - Can also be unhelpful, as in the case of the Einstellung effect. **[Week 4: ANOVA (Analysis of variance)]** **Why statistics?** 1. Inference: making conclusions about a population based on sample data. 2. Comparison: assessing whether there are significant differences between groups. 3. Decision-making: Using statistical tests to inform decisions in various fields (e.g. education, medicine, marketing) a. Standardised tests are developed to measure how well students have mastered the content outlined in different curricula. b. Clinical trials are research studies conducted to evaluate the safety, efficacy, and effectiveness of new medical treatments, drugs or interventions in humans. c. A/B testing is a method of comparing two versions of a webpage, app or other digital content to determine which one performs better in terms of a specific metric, such as click-through rate, conversion rate, or user engagement. 4. Generalisation: applying findings from a sample to the broader population. 5. Quantifying uncertainty: providing a measure of confidence in the results and understanding the role of variability in data.
Uncertainty, Error and Confidence in Data **Definition of ANOVA** - ANOVA stands for Analysis of Variance - One-way ANOVA = statistical technique to determine whether two or more groups are statistically different from one another. - Works by testing the probability of the data under the null hypothesis that all group means are equal. **Purpose** - Helps determine if at least one group has a different mean from the other groups. - Assesses the effect of a single categorical independent variable on a single continuous dependent variable - Categorical = variables with distinct groups but no inherent ordering - Continuous = variables that can take on any value within a given range - A limitation of ANOVA is that it **[does not tell you which group is different]** **Key concepts** - **Null hypothesis:** all group means are equal - **Alternative hypothesis:** at least one group mean is different - **Between-group variance:** variability attributed to differences between groups - **Within-group variance:** variability within each group - **ANOVA works by comparing between-group variance (which comes from your experimental manipulation) to within-group variance (which comes from noise or randomness alone)** - **F-statistic:** ratio of between-group variance to within-group variance - The value that comes out of the ANOVA analysis is an F-statistic, which in turn allows you to determine a p-value - **P-value:** probability of observing an F-statistic at least as large as the one obtained, assuming that the null hypothesis is true **Why do we need statistical tests?** - In any dataset, there is variability due to: - Measurement error - Experimental or observational conditions - Individual differences - Random fluctuations - Statistical tests help quantify variability and distinguish what is due to randomness (within-group variance) and what is due to effects we might be interested in (between-group variance) **Application of one-way ANOVA** - Research question: are there differences in final exam scores between
students taught using traditional lectures, online courses, or blended learning - DV: final exam scores - IV: learning type - Research question: a medical facility is trialling three different medications to examine the reduction in blood pressure - DV: blood pressure - IV: medications **Experiment** - Each of these applications involves an experiment - Some independent variable is manipulated across several categorical conditions - Outcomes are recorded on one continuous dependent variable - The outcome of the experiment is data **Goals of statistical inference** - From the data, we want to: - Infer something about the population -- is our sample representative of the population? - Determine whether multiple samples come from the same or different populations -- is there a significant difference between groups? - Predict what the population will look like if we apply our manipulation -- how large is the effect size? - The effect size describes the magnitude of the observed differences between groups. **Did our manipulation work?** - Data are noisy and variable - Reflect the process we're interested in (our experiment) but also random variation - How can you determine whether the data were generated by the same underlying process or by different processes? - If data in different conditions are generated from the same process then we would assume that the data should 'look the same' in both conditions **How should we compare groups?** - In psychology, the easiest way to compare data from different groups is to compare the mean or average score within each group. - If we do find that different conditions have different means, how can we be certain that these averages are reliable and not due to randomness?
![](media/image62.png) ![](media/image64.png) **Why would our data be affected by random variation?** - Because we are not dealing directly with populations but only samples from those populations - Populations: the entire collection of data (all possible measurements or outcomes) from the group that we are interested in - Samples: a random subset of a population which is representative of that population ![](media/image66.png) - The distribution that we're sampling from tells us the probability with which we are likely to observe samples with certain values. - The distribution is the mathematical representation of the population -- it captures all of the statistically important aspects of the population. - But when we have samples, we don't have access to the population. - We have to use the data to work out what the population looks like -- this is one form of statistical inference. **Determine whether two samples were generated from the same population** ![A graph of a function Description automatically generated](media/image68.png) - The primary question is whether our samples come from the same population or different populations. - E.g. in your lab experiment, we could test whether recognition or categorization instructions lead to different results -- if yes, then the data of participants in these conditions should look different. If no, then the data should look the same. The question we need to answer is how we determine whether samples look the same or look different.
- In summary: - Data are noisy - We must infer from the sample what we want to know about the population - For one-way ANOVA, we have more than two samples and we want to know if they're different - To understand one-way ANOVA, we can generalise from thinking about the probability of a single data point **How to decide if a datum comes from a distribution or not** **The normal distribution** - By integrating 'under the curve' we can compute the probabilities in different regions of the normal distribution ![](media/image70.png) - Extreme cases are rare (low probability) - Extreme cases are unlikely to have been generated from this distribution - 'Unusual' data results in a low p-value (when we examine the probability of being generated by some distribution) - Unusual cases occur with a probability less than 5% (p \< .05) -- rule of thumb for psychology **What the p-value means: Part 1 (dealing with a single sample)** - If our data really were a sample from that population distribution, a value this extreme would occur less than 1 time in 20 - When we make inferences we have to allow for the possibility that we might be wrong - Being wrong 1 out of 20 times is the level of risk that we (as a discipline) are willing to take - .05 is the alpha level or the Type I error rate (false alarm rate) - Random sampling means that we will sometimes observe data with unlikely values (unusual data) by chance alone - We can use the p-value to determine whether or not we think a data point comes from a particular population **What the p-value means: Part 2 (dealing with two samples)** - If our experimental manipulations have no effect then the observed variation is due to chance and our groups will not be different enough to say that they are truly different.
- Chance variation is also known as within-group variation - If our experimental manipulation does have an effect, then it will push the groups apart more than just chance. - In your lab report, we are not trying to infer the population mean and the variance, we are trying to infer whether there is a difference between our experimental conditions -- do our experimental conditions come from the same population or different populations? ![](media/image72.png) - In a one-way ANOVA, the p-value tells us the probability of observing a difference between our samples (i.e. our data) at least as large as the one we found, given that the null hypothesis is true. - p \< .05 means that if the Null hypothesis is true, then we have a less than 5% chance of observing the difference we found between our samples. When this occurs, we conclude that the Null Hypothesis is unlikely to be true. **Relation to t-tests** - The t-test looks at the distribution of differences between the scores - What types of differences are expected under the Null hypothesis that there is no difference? - The null hypothesis predicts the difference should be 0 - But due to random variation there will always be some variation around 0 (the distribution describing the variation is called the t-distribution) - If we observe a difference which is not very probable under the t-distribution, then we say that we have observed a significant difference. - We can compare pairs of typical scores (means) using t-tests - A t-test compares means for two groups - You can use a one-way ANOVA to compare two groups, but one-way ANOVA can also be used to compare 3 or more groups **ANOVA: The F-statistic** - Like the t-test provides a statistic (the t-value) and an associated probability of that t-value (i.e.
the p-value) - The ANOVA provides a statistic called the F-statistic that has its own associated p-value - The F-statistic is also called the F-value or F-ratio **A conceptual explanation of ANOVA** - The F-statistic: - Compares variation between and within groups - A large F-statistic suggests that the samples come from different populations - Which means that the null hypothesis is less likely to be true (Data below is for groups with a very similar mean, demonstrating very little difference between each group) ![A group of graphs and numbers Description automatically generated](media/image74.png) - SS(within) = one-way ANOVA works by calculating the sum of squares within each group and adding these values together. - SS(total) = we then compute the total sum of squares after combining all of the data - SS(between) = we find the sum of squares between groups by subtracting SS(within) from SS(total). (Data below is for groups with significantly different means, demonstrating a large difference between each group) ![A math equations and numbers Description automatically generated with medium confidence](media/image76.png) - Dataset 1: - Variation between groups is much less than variation within groups - No substantial difference among means - Dataset 2: - Variation between groups is much greater than variation within groups - Big difference between means of each group **F-statistic** - The F-statistic is a ratio of the variation between groups to the variation within groups - SS(within): due to chance alone - SS(between): chance plus any experimental effect - The sum of squares is a measure of variation; however, the sum of squares is sensitive to the sample size -- SS grows larger as more scores are added, so increasing the sample size will increase the SS - We need to correct the sum of squares for the number of groups and the number of subjects **Degrees of freedom (df)** - Is the number of independent scores -
There are two df for a one-way ANOVA - You can think of df as a way to account for both the number of groups and the size of the sample in the calculation of the F-statistic ![A screenshot of a number of participants Description automatically generated](media/image78.png) - N = the total number of participants in the entire experiment - For the example above: - Df(between) = 3 -- 1 = 2 (number of groups/conditions minus 1) - Df(within) = 3000 -- 3 = 2997 (N minus the number of groups/conditions) **What does the F-ratio mean?** ![A graph with a line and a line pointing to the top Description automatically generated with medium confidence](media/image80.png) - The F-statistic is assessed against the F-distribution to determine whether it is extreme or not (i.e. is its p-value less than .05?) - There are 2 degrees of freedom required - One for the within and one for the between groups variance - Df determines the shape of the distribution **Summary** - We can draw conclusions about comparisons of means from sampled data, using null hypothesis significance testing. - The basic idea is that if a sample statistic is extreme and unusual, assuming a null hypothesis, then there is evidence that the null hypothesis may not hold. - More formally, if the probability of the sample statistic (the F-statistic) is less than .05, reject the null hypothesis.
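The calculation described above can be sketched in a few lines. This is a minimal illustration only, assuming made-up scores for three hypothetical groups; the names `one_way_anova`, `groups` and `result` are not from the lecture. It computes SS(within) and SS(total), obtains SS(between) by subtraction, corrects each sum of squares by its degrees of freedom, and forms the F-ratio. The p-value is not computed here; it would be read from the F-distribution with the two df values.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]

    # SS(within): squared deviations of each score from its own group mean,
    # added together across groups (variation due to chance alone).
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    # SS(total): squared deviations of every score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # SS(between): what is left after removing the within-group variation.
    ss_between = ss_total - ss_within

    # Degrees of freedom correct for number of groups and sample size.
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    # F is the ratio of the two df-corrected sums of squares (mean squares).
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_within": ss_within,
        "ss_total": ss_total,
        "ss_between": ss_between,
        "df_between": df_between,
        "df_within": df_within,
        "f": f_stat,
    }


# Hypothetical scores for three conditions (illustrative values only).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
result = one_way_anova(groups)
print(f"F({result['df_between']}, {result['df_within']}) = {result['f']:.2f}")
# prints "F(2, 6) = 12.00"
```

Here the group means (5, 7, 9) are far apart relative to the spread inside each group, so the between-group variation dominates and F is large, as in Dataset 2 above.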
**P-values and decisions** - Type I Error (false positive) - Concluding there is an effect when there is none - Alpha or significance level - Type II Error (false negative) - Failing to detect an effect when there is one - Statistical power **Assumption I: Independence of observations** - The value of one observation does not influence and is not influenced by another observation - Violating this assumption can lead to increased Type I and Type II error - Typically ensured by using careful data collection and random assignment to conditions **Assumption II: Normality** - Data within each group should be approximately normally distributed - ANOVA relies on this assumption to ensure that the sampling distribution of the mean is normally distributed - This is important for small sample sizes - With larger sample sizes, ANOVA is robust to violations of normality - Variable transformations can be used to improve normality **Assumption III: Homogeneity of variance** - Each group should have approximately the same variance - Unequal variances can affect the validity of the F-statistic and lead to incorrect conclusions - Levene's test or Bartlett's test can be used to assess whether variances are equal across groups - If violated, non-parametric tests that don't rely on this assumption can be used **NOTE LAB REPORT DATA MEETS ALL ASSUMPTIONS (SO NO ISSUES)** **Structure of the ANOVA table** ![A screenshot of a computer Description automatically generated](media/image82.png) ![A screenshot of a graph Description automatically generated](media/image84.png) - **Re the note below the table:** there are different methods of partitioning between and within group variance. Type III is the most general and is used as the default in most stats packages.
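As a rough illustration of how Assumption III (homogeneity of variance) above can be checked, the sketch below computes the statistic for the mean-based form of Levene's test, which is a one-way ANOVA F-ratio applied to the absolute deviations of each score from its own group mean. The data and the names `f_ratio`, `levene_w`, `equal_spread` and `unequal_spread` are hypothetical, and stats packages may default to a median-based variant, so treat this as a sketch under those assumptions and not as the procedure used for the lab report. As with the F-statistic, the p-value would come from the F-distribution and is not computed here.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def levene_w(groups):
    """Levene's W (mean-based form): F-ratio on absolute deviations."""
    # Replace each score by its distance from its own group mean, so that
    # groups with larger variance end up with larger typical values.
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return f_ratio(deviations)


# Hypothetical data: the same means in both sets, but the middle group of
# the second set is much more spread out.
equal_spread = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
unequal_spread = [[4, 5, 6], [2, 7, 12], [8, 9, 10]]

w_equal = levene_w(equal_spread)
w_unequal = levene_w(unequal_spread)
print(f"W (equal spread)   = {w_equal:.3f}")
print(f"W (unequal spread) = {w_unequal:.3f}")
```

A W near zero means the groups have similar spread; a larger W points to unequal variances.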
**Post-hoc tests** - ANOVA tells us that there is a significant difference, but it doesn't tell us where - Common post-hoc tests - Tukey's HSD - Bonferroni correction - Both methods compare all of the groups against every other group - They differ in how they compute the significance between groups **Summary** - **For the lab experiment, we will use a one-way ANOVA to compare the 3 conditions from the experiment** - **We will additionally calculate the post-hoc Bonferroni Correction test** **[Week 5: Attention (Historical Origins) ]** **The meaning of 'Attention'** - Brain's ability to self-regulate input from the environment - Used in two senses in psychology - Sustained attention (alertness) - Related to psychological arousal (continuum from drowsy, inattentive to alert, attentive) - Problem of vigilance: performance declines over a long watch (radar operators, quality control inspectors etc.) - Selective attention - Limited in the number of stimuli we can process - Attend to one stimulus at the expense of others - People as limited capacity systems: don't treat all stimuli equally **The Cocktail Party Problem** - Cherry (1953) - How do we follow a conversation in a crowded environment? - Can 'pick out' one conversation from background - 'picking out': processes take sound energy at ear, translate to understanding - Translation is selective (stimuli not all treated equally) - Cherry: what happens to unattended messages?
![](media/image86.png) **Cherry's findings** - Shadow message 1, then ask about contents of message 2 - Unattended channel: - No memory for unattended message - Switch from English to German: not noticed - Switch from male to female: noticed - Reversed speech: 'something queer' - Switch from voice to 400 cps pure tone: noticed **Fate of unattended message** - Conclusion: - Only superficial (physical) features perceived (things distinguishing voice, non-voice, or male, female) - Semantic content not analysed (language, meaning) - Preattentive processes vs focal attention (Neisser, 1967) - Sensory (physical) features processed preattentively - Meaning requires focal attention - Plausible: aware of unattended stimuli only superficially **A criticism of Cherry** - Interested in what's perceived, Cherry looked at what's remembered (was measuring the wrong construct, measuring memory, not attention) - Confounds perception and memory - May be perceived then forgotten? ![](media/image88.png) - Attention acts as a filter to select stimuli for further processing - Meaning extracted in limited capacity channel - Filter precedes channel, protects it from overload - All stimuli stored briefly in short term store (STS) - Raw acoustic trace, decays quickly if not selected ![](media/image90.png) **Conclusion** - Attentional selection based on simple physical features (location in space, voice, etc.) - Extracted preattentively (don't require access to limited capacity channel) - Meaning requires access to limited capacity channel, only extracted if stimulus is attended **[Attention II: The Early vs. 
Late Selection Debate (1960s)]** **The Failure of Filter Theory** - 'Dear Aunt Jane' experiment (Gray & Wedderburn, 1960) - Split-span experiment with meaningful material - Preferred recall follows semantic context, not presentation ear **Moray (1959)** ![](media/image92.png) - Person's own name often detected on unattended channel - Selection based on meaning not consistent with idea that meaning only extracted on the attended channel **The Early vs. Late Selection Debate** - Disagreement about location and properties of filter **Attenuation Model (Treisman, 1961)** ![](media/image94.png) - Broadbent's filter completely blocks unattended stimuli, Treisman's partly blocks (attenuates) it - Like "turning down the volume" - Filter biased by context, message salience - Highly salient stimuli (name), semantically related material (Dear Aunt Jane) gets through filter, shifts attention **Evidence for Early Selection** - \% correct detections higher on shadowed channel, but not zero on unattended channel - Consistent with filter that attenuates stimuli instead of blocking them **Criticism of Early Selection** - Complexity of filter: Needs to respond to semantic context, distinguish related from unrelated stimuli -- simpler alternative? 
- Late selection: Differs in where filter is located, after LTM instead of before LTM **Late selection** - ES and LS theories agree recognition needs (a) encoding, (b) access to LTM - LS theory: All stimuli access LTM, not sufficient for awareness - ES theory: LTM activation = conscious awareness - LS theory: Need to pass filter for awareness ![](media/image96.png) **Late Selection (Norman, 1968)** - Bottom-up and top-down selection mechanisms - Bottom-up = stimulus driven - Top-down = selection by 'pertinence' (relevance to task) - Need both kinds of activation to get through filter, otherwise decays **Evidence for Late Selection** - Semantic processing on unattended channel - MacKay (1973) ![](media/image98.png) - Recognition biased by previous shadowing task **Von Wright, Anderson & Stenman (1975)** - Semantic activation in the absence of attention - Generalised to other words in category **Conclusion** - Filter theory explains simple findings, but can't explain semantic processing of unattended stimuli - Early selection: explains by attenuating filter - Consistent with partial, but reduced processing of semantic targets ('tap') - Late selection: all stimuli activate semantic representations in LTM, but need to be selected by pertinence to get into consciousness - Consistent with indirect measures of semantic processing on the unattended channel **[Week 6: Attention III -- Structural and Capacity Theories]** **Early vs Late Selection Debate (1960s)** - Developments from Filter Theory (Broadbent, 1958) - Attention viewed as a selective filter to protect a limited-capacity system from overload - Argument about what gets through the filter and where the filter is located **Summary of evidence** - Early selection theory - Filter before LTM - Evidence: Reduced detection accuracy on unattended channel (Treisman & Geffen, 1967) - Late selection theory - Filter after entry to LTM - Evidence: semantic activation on unattended channel (MacKay, 1973; Von Wright, Anderson &
Stenman, 1975) **Early Selection Reply** - Late selection - Semantic activation on unattended channel shown by indirect means - Early selection - Doesn't deny weak activation of semantic material on unattended channel. Indirect measures don't show it occurs to the same degree. - ES: Weak semantic activation on unattended channel; LS: brief semantic activation on unattended channel. - Shadowing tasks investigate attentional filtering (try to exclude distracting material). Can study divided attention instead. How well can we distribute attention across multiple channels? **Cost of Divided Attention (Moray, 1970)** ![](media/image100.png) **Implications of Moray Study** ![](media/image102.png) **Structural and Capacity Theories (the 70's view)** - Two ways in which attention can limit performance: - Structural (bottleneck) theories - Some neural structures can only deal with one stimulus at a time - Competition produces processing 'bottleneck' (filter theory) - (ES: bottleneck getting into LTM, LS: bottleneck getting out) - Capacity (Resource) Theories: - Information processing is mental work - Work requires activation of neural structure - Limited capacity to activate structure (glucose) **Capacity Theory (Kahneman, 1973)** ![](media/image104.png) - Reduction of capacity produces deficit in divided attention tasks - Differs from structural theories because capacity can be allocated *Flexibly* to simultaneous tasks **Inferring effects of divided attention** - Strayer and Johnston (2001): Talking on mobile phones interferes with driving (sharing capacity reduces accuracy and increases RT) **Dual Task Performance (Li et al., 2002)** ![](media/image106.png) - Attention demanding 'central' task (letters same or different?) - Easy or hard peripheral task (animal present? 'Phase' of disk?) - Difficult task much more affected by central load **Capacity Theory Explains "Inattentional Blindness"** - Cartwright-Finch & Lavie (2007) -- which arm of flashed cross is longer? 
- Clearly visible square not detected - Demanding central task uses all available capacity **Study Capacity by Dual Task Trade-Offs** - Attention operating characteristics (AOC) - Vary proportion of attention allocated to two tasks in dual task paradigm - 'Graceful degradation' of performance as available capacity is reduced - Shape of trade-off curve tells us about capacity demands of tasks ![](media/image108.png) **Auditory and visual dual tasks (Bonnel & Hafter, 1998)** - Easy auditory and visual tasks (**[detecting]** spot of light or tone) -- triangles - Difficult auditory and visual tasks (**[discriminating]** increases from decreases in intensity of spot or tone) - Difficult task trades off, easy one doesn't - Different capacity demands **Pros and cons of capacity theory (with Hindsight)** - Value of capacity theory is new experiments it led to - Emphasises divided attention, flexibility of attentional control - Shortcoming is its vagueness (can always come up with a capacity explanation) - Can make capacity theories mathematically precise using decision-making theories **[Attention IV: The control of visual attention]** **Attentional orienting (1980s)** - First work on attention looked at auditory system (stereo tape recorders developed before visual displays) - Also problem of eye movements: - Natural environment: movement in peripheral vision produces saccadic eye movement, greater visual acuity in foveal vision - 'Covert' attention -- movement independent of eye movements - Attention shifts precede eye movements and can occur without them - Shifts of attention are called attentional orienting **Posner: The 'spotlight of attention'** - Shifts of attention likened to moving spotlight - Selective enhancement for stimuli 'illuminated by the beam' - Expresses selective, limited-capacity idea in spatial terms ![](media/image110.png) - 'Spatial Cuing Paradigm' (Posner) - Attract attention to A, present stimulus at A or B, compare performance **Spatial Cuing
Paradigm** ![](media/image112.png) ![](media/image114.png) ![](media/image116.png) **Attentional Costs and Benefits (Posner, Nissen, & Ogden, 1978)** - Benefits: Faster RT with valid cue - Costs: Slower RT with invalid cue - Very flexible: can be used with RT or accuracy, and to compare all kinds of stimuli **Causes of Cuing Effects** - Costs and benefits can be due to: - Switching time - Time to move the spotlight - Costs of disengaging from wrong location, benefit from engaging at correct location before stimulus - Unequal capacity allocation - RT depends on capacity allocated to location - Neutral: capacity spread across locations; focused: capacity concentrated on one location - Hard to test between these alternatives **Attentional Orienting** - Natural environment: shifts in attention can be top-down (decide to shift attention) or bottom-up (something captures attention) - Need both kinds of systems to function - Clinical patients show deficits of both kinds: failure to focus attention, failure to disengage attention **How many orienting systems?** ![](media/image118.png) - Endogenous = central (symbolic) cue -- cognitive (need to interpret) - Exogenous = peripheral (spatial) cue -- spatial (no need to interpret) **Evidence for Separate Orienting Systems (I)** - Different time course of central and peripheral cuing - Peripheral effect peaks rapidly, central effect peaks slowly **Evidence for Separate Orienting Systems (II)** - Different effects of load (Jonides, 1981) - Voluntary orienting slowed by memory load; reflexive orienting is not - Consistent with different capacity demands of two systems ![](media/image120.png) **Evidence for Separate Orienting Systems (III)** - Inhibition of return: - Found only with peripheral cues, not with central cues **What's the purpose of inhibition of return?** - Ecological argument: allows efficient search of complex environment - Prevents repeated search of same location.
Don't need to maintain a 'mental map' of locations that have been searched **Attentional Orienting Experiments** - Multiple sources of evidence for two systems: - Effects of SOA and cue type: Reflexive system is faster, more transient; voluntary system is slower, more sustained - Affected differently by load: suggests voluntary system is under more cognitive control - Reflexive shows inhibition of return, voluntary doesn't. Suggests reflexive controlled by different processes - Combination of bottom-up and top-down control - Need to be able to focus attention, exclude irrelevant stimuli; also to respond to unexpected threats **[Week 7 -- Attention V:] Attention in space and time (visual search and the effects of distractors)** **The psychological function of spatial attention** - To assign limited-capacity processing resources to relevant stimuli in environment - Must locate stimuli among distractors and process (identify) them **Visual search** - Control complexity of search by varying the number of items ![](media/image122.png) - Measure mean response time (RT) as a function of display size - Early experiments used stimuli like letters because they were easy to program - Easy to quantify similarity as number of features in common - Some search tasks are easy (cheetahs in a green field) - Some search tasks are hard (bird camouflaged into its surroundings, with a baby chick) - Some search targets seem to 'pop out' from the background, others require attention **Pop-out effects in the laboratory** - Pop-out targets show little or no change in search times (RT) with set size - Non pop-out targets show large changes in search times with set size **Pop-out effects with simple features** - Unique colours and unique orientations both pop out ![](media/image124.png) **Parallel search for feature targets** - Mean RT doesn't increase 
with display size - Yes (target present) and no (target absent) trials take the same time - Compare contents of each display location with mental representation of target at the same time -- reject distractors and locate target - Parallel search **Conjunction targets do not pop out** - Target defined by combination of colour and orientation - RT increases linearly with display size - Slope twice as steep for target absent as target present trials ![](media/image126.png) **Evidence for serial search** - Seem to need to focus attention on target to detect it -- focus attention on each item in turn - Constant scanning rate predicts linear RT/display size function **Self-terminating serial search** - Stop when target is found - On average, search half the display on target-present trials, all of the display on target absent trials - Constant scanning rate predicts 2:1 slope ratio **Feature integration theory (Treisman & Gelade, 1980)** - Role of attention is to bind features into perceptual compounds - Each feature (lines, colours, etc.) 
registered in its own feature map - Without attention features are free-floating, may lead to illusory conjunctions - Conjunction targets require feature binding, so need focused attention -- leads to serial search - Feature targets don't require feature binding, don't need focused attention -- leads to parallel search - Pop out sometimes depends on complex object properties, not just simple features - High-level, not low-level, properties predict pop out - Inconsistent with idea that pop out only occurs at level of simple features ![](media/image129.png) **There are other patterns of search RTs** - Many tasks show intermediate pattern, no clear evidence of either serial or parallel search ![](media/image130.png) **Wolfe (1998) -- what can 1 million trials tell us about visual search?** - Search slopes from 2,500 experimental sessions - Slopes are very variable - No evidence of dichotomous population of search slopes, parallel and serial functions look like ends of continuum - Wolfe: better described as inefficient or efficient search ![](media/image132.jpeg) **Guided search theory (Wolfe, Cave, & Franzel, 1989)** - Two stage theory - Initial parallel stage provides a candidate list of possible targets - Second serial stage checks candidate list for targets - Search efficiency depends on similarity of target and distractors - Similar targets and distractors lead to large candidate list and inefficient search - Dissimilar targets and distractors lead to small candidate list and efficient search - Predicts a range of search slopes **Guided search 2.0 (Wolfe, 1994)** - In guided search 
(1.0) search is controlled by similarity between targets and distractors - In guided search 2.0, search is controlled by a priority map -- depends on the salience of stimuli (bright, distinctive, unique) ![](media/image134.png) **Guided search 6.0 (Wolfe, 2021)** - Latest versions of guided search investigated rules and mechanisms of attention guidance - Scene guidance: real world images rather than isolated elements -- "meaning maps", not just salience (toothbrush in bathroom scene etc.) **[Attention VI.] Automaticity and failures of focused attention** **Failures of focused attention** - Visual search looks at costs of divided (distributed) attention: performance decline with increasing display size is evidence of capacity limitations - Some situations where there is a benefit not to divide attention: avoid processing distractor stimuli - Limitations of focused attention and involuntary processing of irrelevant stimuli **The Stroop Effect (Stroop, 1935) -- The first demonstration of a focused attention failure** - Name the colour of ink in which the word is written, measure RT - Fast with compatible (top), intermediate with neutral (middle), slow with incompatible (bottom) **Why does focused attention fail? Involuntary (automatic) processing of irrelevant attributes** - Parallel processing of colour naming and word reading - Word reading is fast and involuntary -- you can't not read the word - Word name available before colour name, creates output interference - Asymmetrical: no interference of ink colour on word naming ![](media/image136.jpeg) **Automaticity** - Stroop effect: - Word reading: fast and automatic - Colour naming: slow and controlled - What makes a process automatic? 
Learned S-R associations - Criteria for automaticity: fast, parallel, effortless, doesn't require capacity - Automaticity is the basis for skill acquisition (reading, driving, playing a musical instrument) - Allport, Antonis, & Reynolds (1972) -- skilled pianists could perform dichotic-listening/shadowing task while sight-reading music **Controlled and automatic processing (Shiffrin & Schneider, 1977)** - Search for digit targets in arrays of distractor letters in rapid sequences (or vice versa) - Vary size of target (memory) set: 1-4 items - Vary size of stimulus displays: 1-4 items - Consistent mapping (CM): target and distractor sets were distinct - Varied mapping (VM): items were targets on some trials and distractors on others - Performance under CM became automatic with practice (\>90%) - Became independent of memory set and display size - Subjectively effortless, spontaneous pop-out of targets from text - Under VM, never became automatic - Requires consistency of target set membership - Consistent with capacity-free, effortless encoding account - Consistent with structured practice approach to skill development **Failures of focused attention in spatial attention -- The Eriksen Flanker task** - Is the central character a \> or a \<? - Compared to neutral baseline, compatible has shorter RT, higher accuracy; incompatible has longer RT, lower accuracy - Involuntary processing of flankers even when attempting to ignore them - Also found with letters: E flanked by F's, or F flanked by E's ![](media/image138.png) **Zoom Lens Model (Eriksen & St James, 1986)** - Use large spotlight to locate central target (zoom out) - Use a narrow spotlight to identify target (zoom in) - Does the identification process start before zooming in occurs? 
- Gradual narrowing of spotlight while target is being identified - Flankers have large effect early on, decrease as time progresses **Attention in time: The attentional blink** - Rapid serial visual presentation (RSVP) task - 100ms exposure per item, each item masked by following item - Two targets: report the white letter, detect whether there was an X present ![](media/image140.png) **The attentional blink** - Plot second target (T2) performance as function of the time (lag) since the first target (T1) - T2 performance declines and then recovers -- 'attentional blink' - Only find it if T1 is processed, if T1 ignored no AB -- depends on T1 processing - Worst performance not immediately after T1 but some time later (Lag 1 sparing) - Effect takes time to build up, but relatively long lasting (up to 600 ms) - Why? 
Nontarget rejection is fast, decision process that identifies targets is slow -- closes the 'attentional gate', making you unaware of other targets - Delayed start to decision process (opening the gate) gives Lag 1 sparing **Where is the attentional bottleneck then?** - Wolfe (2021) -- guided search 6 -- attentional bottleneck created by selection into visual working memory - Visual working memory has capacity of 4-5 items, but items enter it serially at a rate of 1 item per 50-60 ms - Decision process ("asynchronous diffuser") able to make decisions about 4-5 items at a time, then have to shift attention ![](media/image142.png) **Your Zen Koan** - (question by which you attain enlightenment or go mad trying) - Guided Search 6 says you can make decisions about 4-5 stimuli in parallel if they are all in visual working memory - Other experiments suggest you can only make one decision at a time -- attentional blink - Can these be reconciled? **[Week 8: Object-based attention and the cognitive neuropsychology of attention]** **What does attention act upon?** - Spotlight theory, FIT etc. assume attention acts on a region of *SPACE* -- enhances processing in that region - Alternative: attention acts on objects in space, not space itself: *OBJECT-BASED THEORIES* **Rock and Gutman (1981)** - Overlapping figures: attend to one and rate aesthetic appeal; ignore other - Memory test: good memory for attended figure, none for unattended figure (cf. Cherry, 1953) - Objects occupy same region of space - Maybe the object of attention is the object, not the space it occupies? **What happens to the unattended shape?** - Maybe it's not perceived or not fully perceived? - Maybe people quickly forget the stimulus they're not attending to? -- inattentional amnesia - (cf. 
early vs late selection) - Pairs of red-green figures: trumpet-kite, anchor-trumpet ![](media/image144.jpeg) **Negative priming** - Ignore green, name red (e.g. ignore trumpet, name kite) - What happens when trumpet must be named? - RT to name trumpet is slower if ignored on previous trial - 'Negative priming' (regular priming produces speed up) - Means ignored shape must have been perceived to produce effect on subsequent trial (cf. late selection) **Implications of Rock and Gutman, Negative Priming** - Possible to attend to one object and ignore another when both occupy same region of space -- How? - Maybe attention operates on the object, not the space... **Evidence for object-based attention** - Duncan (1984): stimuli differing on four attributes: box size, gap side, line slant, dotted or dashed line - Flash briefly, ask to report two of the attributes (e.g. line slant, gap side) - More accurate if the two attributes belonged to the same object than different objects - Same: box size and gap side or line slant and line style (dotted/dashed) - Different: box size and line slant etc. - Stimuli occupy same region of space - Evidence that attention operates on whole objects? 
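The same-object versus different-object comparison in Duncan (1984) can be made concrete by enumerating every pair of attributes a participant could be asked to report. The sketch below is only an illustration of the design as summarised in these notes: the attribute-to-object mapping follows the notes (box size and gap side belong to the box; slant and style belong to the line), while the names `ATTRIBUTE_OBJECT` and `classify_pairs` are hypothetical and not taken from the original study.

```python
from itertools import combinations

# Assumed mapping, following the notes: two attributes per object.
ATTRIBUTE_OBJECT = {
    "box size": "box",
    "gap side": "box",
    "line slant": "line",
    "line style": "line",
}


def classify_pairs(attribute_object):
    """Label every pair of attributes as 'same' or 'different' object."""
    labels = {}
    for a, b in combinations(sorted(attribute_object), 2):
        same = attribute_object[a] == attribute_object[b]
        labels[(a, b)] = "same" if same else "different"
    return labels


pairs = classify_pairs(ATTRIBUTE_OBJECT)
for pair, label in pairs.items():
    print(pair, "->", label)
```

Under this mapping only two of the six possible pairs are same-object pairs, and those are the conditions in which the notes report higher accuracy.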
**Cuing object-based attention (Egly, Driver, and Rafal, 1994)** - Miscued locations in the same object or different object -- same distance from cued location - Space-based theories say miscuing costs should be the same ![](media/image146.png) - Same object advantage: Mean RTs faster to miscued stimuli if in same object - Evidence that cuing effect spreads to encompass cued objects **Effects of an Occluding Bar (Moore, Yantis & Vaughan, 1998)** - Occluding bar in stereo space: still find same object advantage - Not related to crossing edges or boundaries, agrees with percept of continuous object **Neuroimaging evidence of Object-based attention** - Selective fMRI activation when viewing houses and faces - Fusiform face area (FFA) -- active when viewing faces - Parahippocampal place area (PPA) -- active when viewing houses - Superimpose: attend to face or house - When attending to face: FFA up, PPA down; when attending to house: PPA up, FFA down **Conclusion** - Evidence that attention selects objects in space -- possibly by enhancing representation of selected object, suppression of other object - Attention to part of an object benefits other parts **[The Cognitive Neuropsychology of attention]** **Visual neglect** - Control of attention involves balanced top-down and bottom-up systems - Reflexive system orients to new stimuli, voluntary system provides sustained attentional focus - Failure to focus and failure to disengage and reorient both found in clinical cases - Damage to right parietal lobe **Attention and visual pathways** - Two pathways for processing visual information - Ventral pathway, temporal lobe: form, colour -- the what pathway - Dorsal pathway, parietal lobe: direction of motion, spatial location -- the where pathway - Parietal lobe damage disrupts the 'where' pathway 
![](media/image148.jpeg) **Neuropsychology of neglect** - Deficit in processing spatial information - Not blind, but difficulty in making left side of space accessible to conscious awareness - Right parietal lobe damage leads to left visual field neglect - Cancellation test - Behavioural manifestation: failure to dress left side of body, shave left side of face etc. ![](media/image150.jpeg) **Cuing deficits with right parietal damage (Posner)** - Compare intact and damaged hemispheres, use intact hemisphere as control - Posner: normal attention involves engagement, disengagement, and shift (reorienting) of attention - Ability to voluntarily engage attention not impaired; difficulties in disengaging and shifting in response to new information **Symptoms of neglect: Extinction** - Simultaneous identification of two stimuli - Unimpaired with only one stimulus - Left visual field deficit with two simultaneous stimuli - Perceptual response to one stimulus 'extinguishes' response to the other ![](media/image152.jpeg) **Why does extinction occur?** - Cf. 
Moray (1970): Normal participants are poor at identifying two weak, simultaneous signals - Late selection theory: Only one signal can get through filter to consciousness at a time - Extinction: Two competing perceptual representations can't co-exist - Recognition, identification require activation of neural structures - Damaged hemisphere chronically underactive, stimuli don't provide activation they should - Effects strongest with activity in other hemisphere (invalid cue, competing stimulus) **Balint's Syndrome (Patient RM)** - Bilateral lesions in parietal and/or occipital cortex - Inability to focus on individual objects and to see more than one object at a time (simultanagnosia) -- prone to illusory conjunctions - Occurs even when objects overlap (object based) **Space-based and object-based attention** - Attention seems mainly associated with 'where' pathway - Spotlight view: movement of attention through space, neglect associated with left side of perceptual space - Object-based view: attention keeps track of objects ("can ignore", "shouldn't ignore") - Inhibition of return: cued spatial location tagged as uninteresting, so slower RT there - Tags associated with objects, not just the *space* they occupy **Object-based inhibition of return (Tipper, 1991)** - Standard IOR: peripheral cue, wait long SOA, flash target, slower RT at cued location - Object-based IOR: peripheral cue, then rotate display ![](media/image154.png) - Markers move to new locations (visible on screen) - Measure RT to target in previously cued or miscued marker circle - Find slower RT at previously cued marker - Inhibition of return tracks cued marker to new location - IOR follows the cued object, not confined to one region of space ![](media/image156.png) **Object-based neglect (Behrmann & Tipper, 1994)** - Neglect: left visual field deficit with 
right parietal damage - Neglect of space, or neglect of left side of object? - Barbell stimulus: two location markers + connector, combine into one perceptual object - Longer detection RT on left - Display: present barbell, rotate 180 degrees, present target to be detected - Longer RTs on right - Neglect tracks marker to opposite visual field - Neglect of left side of objects, not just left side of space - Allows space-based and object-based effects to be distinguished ![](media/image158.png) **Attention: The Take-Home Message** - Haven't answered all the questions about attention - Shown how cognitive psychology thinks about attention - Main trends in attention from 1950s to the present - Important interplay between theory and experiments: experimental findings suggest new theories to explain them, theories suggest new experiments to test them **[Week 9: Basics of Memory: Causes of forgetting]** **In the beginning** - Ebbinghaus's experimental findings - Spacing effects: repetitions more effective if you space them out over time rather than mass them consecutively - Presenting A-B-C-A-B-C-A-B-C is more effective for learning than A-A-A-B-B-B-C-C-C - Practicing piano one hour each day is better than seven hours on one day - List length effects: worse memory when you study longer lists versus shorter lists - Forgetting curve: the shape of how memory declines over time **Ebbinghaus's contributions** - Ebbinghaus established the experimental tradition of memory research - Presenting randomised lists of 'non-meaningful' stimuli to participants under controlled conditions - Control for exposure duration, retention interval -- can more precisely understand what leads to better/worse memory - Various reasons why something is not remembered -- lack of attention to the stimulus when learning, inability to locate the memory at retrieval - Many phenomena he discovered have replicated and are still a focus of research today **[The laws of 
memory]** **What is a law?** - Invariance or regularity in data - They do exist in psychology - The law of practice -- performance improves as a logarithmic function of practice - The Yerkes-Dodson law -- performance is an inverted-U-shaped function of arousal - Theories and laws are not the same thing -- a theory provides an explanatory framework for phenomena in the natural world, a law states a regularity in data or the natural world **The laws of memory** - Law of recency - Recent information is almost always better remembered than older information - The recency function is non-linear: older information decays at a slower rate than newer information ![](media/image160.png) - Law of primacy - Better memory for items at the start of a sequence - Most commonly observed in lists, but can also be observed during event changes - The law of repetition - Memories improve with repetition, but some repetitions help more than others. - Memory improves the most with spaced repetition - Massed repetition is when repetitions occur consecutively (e.g. AAABBBCCC) - Spacing the repetitions out over items almost always results in better memory (e.g. ABC ABC ABC) - An implication of the spacing effect - Better to practice piano one hour a day than 7 hours on one day - Better to study a bit each day than cram before the test - Then why do I do well on the test when I cram? Peterson paradox: spaced repetitions perform WORSE than massed repetitions after a short delay, but perform better after a long delay ![](media/image162.png) - Testing improves memory above and beyond re-learning - E.g. 
if you study and are then tested on a set of items, you show better memory than if you had studied the same material twice - Similar to spacing effects, these effects are the strongest after a delay - Sometimes there is a benefit to repeated study at very short delays - The science of how repetitions benefit memory has been found to replicate with classroom materials -- the benefits of spacing and testing have been used as recommendations to both educators and students to improve memory **Does a \*negative\* occurrence of any of these phenomena refute the law?** - Even in physics, no law is consistently observed in each and every case - Newton's laws (classical mechanics) break down at the quantum level - Some laws are not observed because of some other influences -- the fact that planes or leaves float in the air doesn't violate the law of gravity **[Forgetting]** **How can forgetting occur?** - Forgetting can be due to either encoding failure or retrieval failure - Encoding failure: you never learned what you need to remember in the first place -- e.g. someone tells you that you need to pick up groceries while you are texting someone else - Retrieval failure: you initially learned the material, but you were unable to retrieve it at a later time -- e.g. 
you might have trouble remembering an old password or an old phone number, but you definitely learned it initially **Four possible theories of retrieval failure** - Decay theory - Interference theory - Consolidation theory - Inhibition theory **Decay theory** - Stored memories fade over time - Brown-Peterson paradigm: subjects learn trigram, count backwards by sevens for a period of time - Counting backwards by sevens was to prevent participants from rehearsing the material -- rehearsal can keep memories active and overcome the influence of decay - Almost no memory after \~20 seconds ![](media/image164.png) **Interference theory** - Started with John McGeoch (1932) -- claimed that decay theory is conceptually flawed - Iron rusts over time, yet it is not time itself that degrades it - With memory, forgetting occurs because over time there is more interfering mental activity - Specifically, forgetting occurs because there is competition between things we are attempting to remember -- there is greater competition as we learn more - Why is there competition? - When we learn, we associate things together -- e.g. associating a name with a face - When presented with one of those things in isolation (the cue), we can recall the associate - The more associations there are, the harder it is to remember things - We are most likely to remember the movie that has the strongest association (e.g. X-Men: First Class) - When there are many more associations, it becomes harder to remember a particular associate -- e.g. a list length effect (remembering a particular Meryl Streep movie) **Interference and lockdown** - Interference theory can even explain why lockdowns impair memory - We are associating what we learn to the same cues e.g. 
our house, our surroundings - This makes it harder to remember particular things **What is interference?** - The more associations to a cue, the harder it is to retrieve the correct memory - Inability to retrieve new associates can be because of interference from older ones -- this is proactive interference - Inability to retrieve old associates can be because of interference from newer ones -- this is retroactive interference - Ideas about interference were directly lifted from animal learning (e.g. classical and operant conditioning) **Interference in lab studies** - AB-CD paradigm - Two lists with no overlapping associations - During the memory test, subjects are presented with a cue (e.g. 'dog' or 'hope') and asked to retrieve the associate ![](media/image166.png) - AB-AC paradigm - Second list shares the same cues as the first - Worse performance on List 2 items compared to the AB-CD condition (this is proactive interference) - List 2 cues can produce either list 2 targets or list 1 targets - Decay theory has difficulty explaining why performance in AB-CD lists is better than AB-AC lists - Interference theory: - Performance on AB-AC lists is worse due to response competition - When the A cue is presented, both B and C compete to be retrieved in AB-AC lists - No competition is present in AB-CD lists **What about decay from short-term memory?** - Re-analysis of all Brown-Peterson results by Underwood (1957) - Huge variation in how much material was forgotten over time - Big predictor of forgetting: number of previous experimental trials - More trials = poorer performance - Underwood's interpretation: subjects suffer from proactive interference from previous trials - Almost no forgetting over 20 seconds in the first trial (Keppel & Underwood, 1962) -- memory quickly gets worse after that - So what's happening in these experiments? 
- Previously learned stimuli cause interference at retrieval -- learning more consonants makes it harder to retrieve what you just learned - So what if the stimulus type is switched? - Interference theory predicts