Cognitive Psychology Notes
Summary
This document provides an overview of cognitive psychology concepts, including levels of processing, computational models, and the role of mental imagery. It explores different theoretical approaches to understanding how the mind works. The notes also discuss experiments on memory and behaviour and introduce theories.
Week 1 – Cognition:

Cognition:
® Cognition is an activity of the mind.
® Involves the acquisition and use of knowledge.
® Includes mental processes such as perception, attention, memory, decision-making, reasoning, problem-solving, imagining, planning & executing actions.
® “The mind is a system that creates representations of the world so that we can act within it to achieve our goals” – Goldstein.

Levels of processing:
® Computational level: what is the computational problem that the brain is trying to solve?
® Algorithmic level: what are the algorithms that the brain or mind uses to solve it?
® Implementation level: how are these computations realised physically?
® Computational approach: the computational account is based on the classical computational model of mind, which states that thinking and reasoning are computations carried out with abstract symbols, according to symbolically represented rules. The symbols represent concepts and relations in the world – they are abstract tokens that bear no necessary resemblance to the thing they represent (e.g., 0s and 1s in a computer programme). The rules specify an input–output relation, as in “If X, then Y”.
® Analogue representations are the opposite of symbolic representations – analogue representations share properties in common with the thing they represent (e.g., mental images).
® Donders’ choice reaction-time experiment (1868) showed that mental responses (in this case, perceiving the light & deciding which button to push) cannot be measured directly but must be inferred from behaviour.
® Structuralism (founded by Wilhelm Wundt) suggests that our overall experience is determined by combining basic elements of experience called sensations.

Newell’s Scale of Human Behaviour:

The digital computer metaphor:
® The digital computer is a metaphor for the mind, which we use for thinking, reasoning & problem-solving.
® The mind is the software, which runs programs based on algorithms. Uses the sequential processing of ‘if x, then y’.
® The brain is the biological hardware.
® This approach does permit AI, with the brain being simply one physical system capable of running the program.
® Good for explaining some kinds of human reasoning in well-defined problem spaces, but ignores how symbols acquire meaning.

Propositional representations/propositional logic system:
® A symbolic code to express the meaning of concepts, and the relationships between concepts.
® For example, the image of a cat on a rug can be expressed in natural English as “the cat is on the rug”. The underlying propositional representation of the relationship between concepts can be expressed as a proposition – on(cat, rug) – where ‘on’ is the predicate stating the relationship between two semantic arguments.
® A predicate expresses the relationship between elements. An argument expresses the subject/object elements: P(x, y).
® Drawbacks of this system include the fact that it doesn’t specify how the abstract symbols used (e.g. ‘C’ for Canary) came to be associated with their meanings; the fact that it doesn’t specify how operations came to reflect their specific functions; and the fact that it relies on prior knowledge for inference & generalisation.

Connectionism/Connectionist system:
® Assumes a system of nodes that represent categories/concepts/abstract representations that we want to ascribe features to.
® Uses a system of fully crossed weights connecting the input to the output.
® Input features are multiplied through their various weights and activate each of the outputs according to the strength of those (initially random) connection weights.
® Weights are updated using the error signal (i.e. feedback), such that over the course of learning the features of the input (e.g. the Canary being described) gradually become more and more associated exclusively with the output (e.g. a ‘C’ symbol for Canary). In this way, the connectionist model learns the representation for each collection of features.
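A minimal sketch of the error-driven weight-update scheme described above, written as a simple perceptron/delta-rule learner. The feature names, category labels and parameter values are invented for illustration and are not from the notes.

```python
import numpy as np

# Hypothetical input features and output category labels (names are illustrative only).
features = ["yellow", "small", "sings", "grey", "large", "trumpets"]
labels = ["Canary", "Elephant"]

# Training examples: a feature vector paired with the index of its correct label.
canary = np.array([1, 1, 1, 0, 0, 0], dtype=float)
elephant = np.array([0, 0, 0, 1, 1, 1], dtype=float)
examples = [(canary, 0), (elephant, 1)]

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(len(labels), len(features)))  # fully crossed, initially random weights
rate = 0.2  # learning rate

for epoch in range(50):
    for x, target_idx in examples:
        target = np.zeros(len(labels))
        target[target_idx] = 1.0
        output = W @ x                   # inputs multiplied through the weights
        error = target - output         # feedback: expected output minus actual output
        W += rate * np.outer(error, x)   # strengthen or weaken each connection

# After learning, the 'Canary' output responds most strongly to the canary features.
print(labels[int(np.argmax(W @ canary))])    # -> Canary
print(labels[int(np.argmax(W @ elephant))])  # -> Elephant
```

The same error-driven idea reappears later in these notes in the Rescorla-Wagner model and in backpropagation.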
Perceptrons:
® The idea that there is a series of input layers (representing stimulation in the environment) which are connected to output layers through a series of weights. Whenever the input layers are turned on, they are multiplied by the weights and are then integrated to form an output representation.
® This approach is different from propositional logic, because knowledge is built from the bottom up – it is contained in the connections & associations between inputs.
® However, perceptrons cannot solve non-linearly separable problems, indicating that they are not a very accurate model of human thought.

Neural networks:
® Can learn some problems where outputs cross over in a non-linearly separable way.
® The problem of learning a non-linearly separable problem was largely solved by adding more layers (e.g. a hidden layer) to the neural network.
® However, models of this type still had problems demonstrating human-like performance.

Deep networks:
® Deep learning models/neural network models can easily classify images and enable language translation in many respects.
® The primary difference between perceptrons and deep learning networks is that a deep learning network has many, many layers (often hidden) through which information is fed, and is trained on very large data sets (and then tested on smaller data sets).
® Models like this have been able to learn how to play video games, and have had success performing tasks once considered to be solely within the domain of humans. Modern AI is the application of deep learning networks like this which have many layers.
® These models no longer rely on a specification of features (e.g. colour, size). Instead, these models rely on the ability to be presented with a pixelated image as input, which is fed directly into the network and passes through a series of layers which bear some similarity to the human visual cortex. This decomposes the image into features, and combinations of features, which can then be directly associated with outputs (e.g. the term ‘Canary’).
® Representations made using deep networks can be embedded into an even larger network (e.g. ‘The canary next to the doctor is singing’).

Analogue representations – mental imagery & mental representation:

Shepard & Metzler (1971):
® Experiment required participants to indicate whether pairs of objects depicted were identical (except for rotation) or mirror images.
® It was hypothesised that people would perform this task by forming a three-dimensional mental image of one of the depicted objects, and then mentally rotating this image using their imagination to see whether it could be brought into correspondence with the other picture.
® The results of this experiment supported this hypothesis, because it was found that the time taken to confirm that the two objects were identical was a linear function of how much the objects were actually rotated.
The more rotation the object in the experiment required, the longer it took to confirm the objects were rotations of one another rather than mirror images. This was true regardless whether participants were looking at picture plane pairs or depth pairs. ® This suggests that the mind simulates the physical rotation of objects to solve problems, and the time it takes to do this mental rotation exactly reflects the time taken to make a decision. ® The data provides compelling evidence that at least some of our cognitive processes are carried out using analogue representations, rather than abstract symbols. ® Mental images are analogous to what they represent. We manipulate mental images in a manner analogous to the way in which we might physically manipulate a real object. ® This refutes the idea that the mind is a ‘digital computer’ because the mind is acting on a simulated analogue representation (not a digital, symbolic or abstract representation). Carpenter (1976, 1978) (Larsen, 2014): ® Also tracked subjects’ eye movements whilst they did the mental rotation task, and argued that the linear increase in reaction time arose NOT from a simulated mental rotation of the image, but from needing to make more eye movements between the two pictures in order to compare their features. ® The more the images were rotated relative to one another (angular difference in orientation), the more eye movements were needed to be made in order to match the features between the pictures. Therefore, this suggests that we shouldn’t be too hasty in drawing conclusions about symbolic or abstract cognitive representations. Production Systems Model: ACT-R model (Anderson, 1990): ® Stands for Adaptive Control of Thought – Rational model ® Built on previous models (ACT, ACT*) ® A productions systems framework that acts like its own functional programming language ® A complete architecture model of cognition ® Theories of specific tasks can be built into the ACT-R framework ® Using the productions system model, mental rotation takes place by decomposing the figure into overlapping parts, and then rotating these parts via production rules to confirm or disconfirm alignment. ® Because the complexity of an object is correlated with the number of sub-parts (more complex objects require more sub-parts to consider), this model can explain why eye-tracking data also correlates with response time, because of the fixation on the number of parts. ® According to this model, people encode the rotated image, then store it in their working memory, then encode the target image, and then while maintaining visual attention on the target image execute a process which incrementally rotates the image towards the target image by a constant amount. After each rotation step, the amount of disparity between the two images is reviewed to determine whether they are sufficiently close enough to stop the process. ® The implementation of the model also predicts this increase in response time as a function of the difference in rotation between the two objects. Examples of approaches to cognitive questions: Herman Ebbinghaus & his nonsense (1885): ® Ebbinghaus was interested in determining the nature of memory & forgetting – specifically how information that was learnt was lost over time. ® He presented nonsense syllables to himself (e.g. DAX), and attempted to learn these syllables whilst noting the number of trials taken to do so. He then repeated this procedure using time delays of up to 31 days. 
® He calculated a ‘percent savings’, indicating the percentage of information retained from the initial learning process.
® Ebbinghaus found that the % savings was greater for short intervals compared to longer intervals.
® The results from his experiment indicate that memory drops off fairly rapidly for the first two days after initial learning, but then tends to plateau.
® This curve demonstrated that something as ineffable as memory can be quantified.

Behaviourism (1930s–1950s):
® Interested in behaviour itself, and not what goes on in the mind underlying that behaviour.
® Best exemplified by Skinner’s operant conditioning experiments, which found that variable reinforcement schedules lead to robust learned responses.

Challenges to Behaviourism:
® Edward Tolman (1930s): demonstrated that rats use a ‘cognitive map’ (a mental representation to orient themselves) when searching for a food reward within a maze – as opposed to merely learning a behaviour or a set of motor commands.
® Noam Chomsky (1959) published a critique of Skinner’s Verbal Behavior – arguing that language cannot be acquired solely by reinforcement, and that there must be some innate structure or genetic component that helps humans to acquire language. Chomsky said that the speed with which children acquire language is too rapid and too remarkable for it to be explained solely on the basis of reinforcement. This criticism kickstarted the cognitive revolution, with an understanding that internal mental representations lie between the world and the behaviour that we observe.

Computer metaphors (1950s):
® Researchers were interested in describing how well the mind could deal with incoming information. When a number of auditory messages are presented at once (e.g. at a noisy party), can a person focus on just one of these messages?
® In an experiment by Colin Cherry (1953), participants were presented with two messages simultaneously, and were told to focus their attention on one of the messages (the attended message) and to ignore the other message (the unattended message). The results of this experiment showed that people could focus their attention on the message presented to one ear whilst still picking up on some of the ‘unattended’ message.
® This led Donald Broadbent to design the first flow diagram of the mind, showing what happens in a person’s mind when they focus their attention on a single stimulus in their surroundings. The flow diagram inspired the idea that the attentional system acts as an information-processing device. The ‘input’ is the sounds entering the person’s ears, the ‘filter’ lets through the part of the input to which the person is attending, and the ‘detector’ records the information that gets let through the filter.

Attention:
Research in cognitive psychology has identified many different types of attention:
® Covert attention – attending out of the side of the eye, without moving the eyes.
® Overt attention – moving your eyes to look at something.
® Divided attention – attending to more than one thing at once.
® Selective attention – focusing on a single object. Selective attention not only highlights whatever is being attended, but also keeps us from perceiving what isn’t being attended (filtering out other things in the same manner as indicated in Broadbent’s filtering theory).

Categorisation:
® Category types vary in the complexity of the rule required to learn them (type I categories being the easiest to learn, through to type VI being the hardest).
® However, categorisation cannot be predicted from identification – performance in categorisation is much better than can be explained just by looking at the confusability of the items. This could be explained by assuming that people change how they attend to the features of the stimulus – learning the name of an object so that you don’t get it confused with another object requires attending to all of its features, whereas learning the category of an object may only require attending to some of its features.
® Modern models of categorisation build in a mechanism that allows for selective attention to different dimensions depending on what is relevant and what is important for making your categorisation decision.
® Modern models of categorisation link together the actual process of making the category decision, the process of comparing objects to objects in memory, and selective attention, all within the same framework.
® The mapping hypothesis states that you should be able to predict categorisation errors by summing up the confusions from individual stimuli.

Week 2 – Similarity & Analogy:

Similarity:
® Similarity describes a sense of sameness.
® Similarity can be measured using confusability. Items that are similar are more likely to be confused with one another when making an identification decision.
® We can examine values in the confusion matrix to learn about what participants were attending to.
® Shared perceptual features evoke a sense of similarity.
® Shared goals also evoke a sense of similarity (even amongst dissimilar objects).
® Similarity can be based solely on concepts (e.g. the word ‘love’ and a heart).
® Similarity can be based on common structure.

Why study similarity?
® Learning – the transfer of learning depends on the similarity of the new situation to the original learning situation (Osgood, 1949).
® Gestalt perception – similar things tend to be perceptually grouped.
® Memory – the likelihood of remembering depends on the similarity of the retrieval context to the original encoding context (Roediger, 1990).
® Generalisation – e.g. beliefs about ostriches, given the properties of robins, depend on the similarity between ostriches and robins (Osherson et al., 1990).
® Categorisation – the likelihood of assigning a category label to some instance depends on its similarity to other members of the category.
® Eyewitness identification – the similarity of foils to suspects in a police line-up can affect the proportion of correct identifications & false alarms (Lindsay & Wells, 1980).

Analogical transfer (e.g. Duncker’s radiation problem):
This process involves three steps:
® Noticing – noticing that there is an analogous relationship between the source story and the target problem (e.g. noticing the relationship between the fortress story & the radiation problem). Thought to be the most difficult of the 3 steps to achieve. Research has shown that the most effective stories are those that are most similar to the target problem in the surface details.
® Mapping – finding the correspondence between the source story and the target problem (e.g. mapping the corresponding parts from the fortress story onto the radiation problem – recognising that the fortress is the tumour and that the soldiers are the rays which are to be directed at the tumour).
® Applying – applying the mapping to generate a parallel solution to the target problem (e.g. generalising from the many small groups of soldiers approaching the fortress to the idea of using many weaker rays to approach the tumour from different directions).
How do we measure similarity?
® Rating scales (Likert scales): e.g. on a scale from 1–8, how similar is face 1 to face 2?
® Confusability: how often do we confuse object A with object B?
® Response time: how quickly can you tell the difference between objects? More similar = slower response time; less similar = faster response time.
® Forced choice: presenting multiple pairs (e.g. asking is ‘X’ more like A or B?).
® Stimulus arrangement: where you give people images and ask them to arrange them so that the distance between the images reflects the degree of similarity between them.
§ Only stimulus arrangement is efficient when dealing with large numbers of objects.

Theories of Similarity:
® Geometric models: similarity is captured by the distance between objects. Similar things are placed close together; dissimilar things are placed far apart. Objects are represented in a ‘psychological space’ in which perceived similarity is represented by distance. The psychological space is imbued with a set of meaningful dimensions (e.g. age and adiposity) that organise the stimuli. Examples include the visible light spectrum, and mapping emotions onto quadrants in terms of potency and hedonic valence. Geometric models must obey 3 axioms:
Minimality – the distance between an item and itself should equal 0.
Symmetry – the distance from X to Y should equal the distance from Y to X.
The triangle inequality – letting d(a, b) equal the distance from point a to point b, the triangle inequality refers to the fact that d(x, y) ≤ d(x, z) + d(z, y) for any three points.

… > Man-made artefacts > Ad hoc categories > Abstract schema
§ Rosch also showed that there is good agreement between people about whether category members are typical or atypical.
® The more typical an item is, the higher the number of attributes that it shares with other objects.
® A highly typical item contains the features shared by many different objects within that category.
® Typicality affects how quickly examples are endorsed as belonging to a category. Typical category members are verified more rapidly.
® Typical instances are learned faster. Typical instances achieve a lower number of errors and a faster response time after a fixed amount of learning than non-typical members.
® Priming results in faster reaction times for more typical category members. For example, when hearing the term ‘green’ and then having to verify whether a colour is a good example of that, you are faster to respond to more prototypical examples of green than to less typical examples of green.
® Typicality also affects generalisation (Dunsmoor & Murphy, 2015). If you learn information about a typical exemplar (e.g. a horse), that can lead you to conclude that this information also applies to other members of the same category (e.g. dolphins).
® Learning generalises more readily when the instances that are learned are typical of the category.
® If you learn information about an atypical category member (e.g. a bat), then that information does not transfer very readily to other category members. The reason for this is that an atypical category member is likely to contain idiosyncratic properties that don’t generalise to the rest of the category.

Rule-learning studies:
§ Bruner, Goodnow & Austin (1956) sought to examine how people test hypotheses in a non-scientific context in order to form conceptual representations.
§ In experiments of this type, the number of attributes comprising the target category can be varied, and the way in which attributes are combined can also be manipulated (AND, NOT, OR).

Learning logical concepts:
® As the complexity of the category increases, the proportion correct (after spending a fixed period of time examining the different category allocations) decreases.

Hypothesis testing strategies:
® Simultaneous scanning: where the participant considers all of the attributes at once; the hypothesised category never contains attributes that do not belong to the target (e.g. ‘the target is three green shaded squiggles’).
® Successive scanning: where the participant considers only one attribute, until that attribute gets disconfirmed. This is not demanding, but is inefficient (because it takes a lot longer to work your way through all of the different attributes).
® Conservative focusing: where the participant changes only one attribute on each trial. If the feedback is ‘no’, then it can be assumed that the attribute is not part of the category. For example: (Trial 1) 3 red shaded diamonds, (Trial 2) 3 green shaded diamonds.
® Focus gambling: where the participant changes all but one attribute. If the feedback is ‘yes’, then that attribute can be identified as the target-defining attribute. However, the probability of receiving affirmative feedback using this strategy is very low.

Typicality & generalisation:
® Category induction tasks illustrate how typicality affects generalisation.
® Generalisation is much greater for typical category members (Osherson et al., 1990).
® Generalisability depends on the conclusion category size. A smaller conclusion category leads to greater generalisability, presumably because a smaller generalisation is required.
® Generalisability is greater when premise examples are more variable. The less similar the premises are amongst themselves (i.e. the more variable they are), the more they support the general conclusion. For example, generalisability for a hippopotamus/rhinoceros premise combination is much lower, because the examples fall in the same region of category space and therefore don’t allow such a strong conclusion to be drawn about the entire category.

Theories of categorisation:

Prototype model:
® Assumes that we represent categories through abstractions of members from that category.
® This allows us to explain typicality effects by allowing individual category members to be compared to a ‘prototypical’ category member.
® The prototype is similar to members of its category, and is highly dissimilar to members of other categories.
® The prototype of a category is often termed the ‘central tendency’ of the category, to represent the fact that it is capturing this average. The prototype itself has similarity = 1; low-distortion items have higher similarity to the prototype, and high-distortion items lower similarity (1 > Sim(low distortion) > Sim(high distortion)).
® In prototype theory, we don’t have access to individual category members – we only have access to some average or idealised representation. From that idealised representation, we can compute the distance between the probe and each of the category prototypes, convert that distance to similarity, and then make a category decision based on whichever category is most similar to the probe (see the sketch below).
® On dot pattern tasks, typicality falls out because the low-distortion pattern is more similar to the prototype than the high-distortion pattern.
® The prototype model can explain effects including the fact that more typical instances are retrieved quicker, are learned faster, and are primed more strongly than less typical instances.
Criticisms of this model include:
® Prototype models cannot solve categorisation problems with non-linearly separable rules.
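A minimal sketch of the prototype-model decision pipeline just described (distance to each prototype, converted to similarity, then a choice). The two-dimensional stimulus values are made up, and the exponential distance-to-similarity function is a common modelling assumption rather than something specified in the notes.

```python
import numpy as np

# Hypothetical members of two categories on two made-up dimensions.
category_A = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]])
category_B = np.array([[3.0, 3.1], [2.8, 3.3], [3.2, 2.9]])

# The prototype is the central tendency (mean) of each category's members.
prototypes = {"A": category_A.mean(axis=0), "B": category_B.mean(axis=0)}

def similarity(x, y, c=1.0):
    """Convert distance in psychological space to similarity (exponential decay assumed)."""
    return np.exp(-c * np.linalg.norm(x - y))

def classify(probe):
    # Compare the probe to each prototype and pick the most similar category.
    sims = {label: similarity(probe, proto) for label, proto in prototypes.items()}
    return max(sims, key=sims.get), sims

probe = np.array([1.0, 1.1])   # a low-distortion version of A's prototype
print(classify(probe))         # -> ('A', {...}) with Sim(A) close to 1
```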
Exemplar model:
® Suggests that we store categories by storing all of the members of those categories in our memory, and basing category decisions on memory retrieval.
® People are more likely to assign an object to the category whose members are retrieved most strongly from memory, based on their similarity to the object.
® For example, in an exemplar model concerning birds there is no single prototypical bird. Instead, each of the birds we have encountered before is represented in geometric similarity space, along with the category label ‘bird’.
® Because we have access to all of the members of the categories, instead of computing the distance to some prototype (as per the prototype model), we instead compute the distance between the probe item and all of the members of the respective categories. We then choose the category with the smallest overall distance (the most similar).
® Because categorisation is based on similarity, exemplar models can predict the effects of typicality in graded structures, as shown by the prototype model.
® An exemplar model can also predict performance on a non-linearly separable structure (Nosofsky, 1986). This includes category learning tasks which result in rule-like performance.
® Incorporates a mechanism for selective attention – allowing more important dimensions to be given higher weight. This is conceptualised as a stretching of the underlying geometric space (see the sketch at the end of this section).
® In this model, attention is optimised to increase accuracy. Nosofsky found that the estimated attention weights from the exemplar model correlated strongly with the attention weights that maximise task accuracy, suggesting that the exemplar model fits performance on a variety of tasks requiring selective attention by optimally allocating attention in those tasks. For example, if only one dimension is important (e.g. in the dimensional condition), then the attentional weight is shifted to only that dimension. When two dimensions are relevant, attention is split between both of those dimensions.
® This model can also predict the prototype enhancement effect, because the prototype is very similar to members of its category.
Criticisms of this model include:
® Exemplar models cannot store every instance of a real-world category in a separate trace. They either store only part of the information associated with an exemplar, or store only some of the exemplars.
® Separate events might be lumped together into single, individual stores if those representations are similar enough (e.g. two similar-looking birds might be inaccurately combined into a single representation).

Causal models:
® Involve categorisation as schema.
® Murphy & Medin (1985) argued that categorisation should be seen as a process of explaining why something belongs with the rest of a category.
® Knowing the cause underlying the category provides an additional, deeper dimension or feature that can be used to understand the category.
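To make the exemplar model's selective-attention mechanism concrete, here is a minimal GCM-style sketch (my own illustration; the stimuli, attention weights and similarity parameter are made up). Attention weights stretch or shrink each dimension before distances to the stored exemplars are computed, and summed similarity to each category's exemplars drives the choice.

```python
import numpy as np

# Stored exemplars for two hypothetical categories (dimension values are invented).
exemplars = {
    "A": np.array([[1.0, 0.2], [1.2, 0.8], [0.9, 0.5]]),
    "B": np.array([[3.0, 0.3], [3.1, 0.7], [2.8, 0.4]]),
}

def summed_similarity(probe, members, attention, c=1.0):
    """Sum of similarities between the probe and every stored exemplar.
    Attention weights (summing to 1) stretch the relevant dimensions of the space."""
    diffs = np.abs(members - probe)           # per-dimension differences
    dists = (attention * diffs).sum(axis=1)   # attention-weighted city-block distance
    return np.exp(-c * dists).sum()           # exponential distance-to-similarity

def choose_category(probe, attention):
    sims = {lab: summed_similarity(probe, mem, attention) for lab, mem in exemplars.items()}
    total = sum(sims.values())
    return {lab: s / total for lab, s in sims.items()}   # choice probabilities

probe = np.array([1.1, 0.6])
# Only the first dimension distinguishes the categories here, so weighting it heavily
# sharpens the decision relative to spreading attention evenly across both dimensions.
print(choose_category(probe, attention=np.array([0.5, 0.5])))
print(choose_category(probe, attention=np.array([0.9, 0.1])))
```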
Week 4 – Learning:

What is learning for?
® To make predictions about events in an environment and to control them. Learning exists to allow an organism to exploit and benefit from regularities in the environment.
® We know whether events are related from the degree of contingency or correlation.

Behaviourist approaches to learning:

Classical conditioning:
Þ Classical conditioning is where one learns an association between a stimulus that reliably predicts another stimulus that is naturally associated with a defensive or appetitive reflex response. Put simply, it is learning to produce a reflex response to a stimulus that would not naturally cause it. In classical conditioning, an initially neutral stimulus comes to elicit a conditioned response.
Þ Before conditioning: unconditioned stimulus (UCS) (a stimulus that produces the response in an unconditioned reflex, without learning) + unconditioned response (UCR) (a response to a UCS that does not have to be learned) = reflex (e.g. a dog salivating from a biscuit).
Þ A stimulus that does not produce the reflex is called a neutral stimulus (NS).
Þ The stimulus that you are trying to train a response to must occur immediately before the presentation of the food (or the other stimulus that produces a response).

Operant conditioning:
® A behaviourist technique in which behaviour is paired with either reinforcement or punishment to shape the behaviour.
® Positive reinforcement is where you present a stimulus and that increases a given behaviour.
® Negative reinforcement is where you remove an aversive stimulus, which increases a given behaviour.
® Positive punishment is when a stimulus is presented which decreases a behaviour.
® Negative punishment is when you remove a desirable stimulus, which causes a behaviour to decrease.

Blocking paradigm:
® First, the association between the red light and food is established using classical conditioning procedures.
® In a later training phase, after learning that the red light predicts the release of food, the red light is paired together with a bell. When the red light & the bell are paired together, food is administered on the left-hand side and the mouse learns to move to the LHS to receive the food.
® Then, intermixed with these trials are other trials in which a blue light is paired with an alarm sound and the mouse learns to move to the right to receive a juice reward.
® Finally, in the test phase, the blue light is paired with the bell to investigate what reward the mouse is anticipating upon combination of these two stimuli.
® Surprisingly, when the blue light & the bell are paired together, the mouse tends to go right for the juice more than left for the food.
® This finding tells us that learning is not just a reflection of co-occurrence – otherwise we would expect 50-50 responses between the food and the juice in the test phase.
® The second thing this tells us is that the early pairing of the red light with the food is important. Little is learned about the bell when it is paired with the red light because the red light already predicts the food reward. Any learning about the bell was blocked by prior learning about the red light.

Cognitive approaches to learning:

Non-associative learning:
® Refers to processes including habituation, priming and perceptual learning.
® Habituation – learning to ignore a stimulus because it is trivial (e.g. screening out background noises).
® Priming – demonstrated by a change in the ability to identify a stimulus as the result of prior exposure to that stimulus, or a related stimulus. There are two forms of priming – repetition priming (for example, prior exposure to a word in a lexical decision task will make that word easier to respond to next time it is encountered) and associative/semantic priming (for example, the prior presentation of the word ‘nurse’ facilitates subsequent identification of the word ‘doctor’).
® Perceptual learning – occurs when repeated exposure enhances the ability to discriminate between two (or more) otherwise confusable stimuli.

Associative learning:
® Learning involves developing associations between stimuli and actions/behaviours.
® Includes classical and operant conditioning.
® Also includes other types of reinforcement learning and training algorithms like backpropagation (where feedback is provided and the difference between the expectation and the feedback is used to update associations between inputs and outputs).
® Learning can occur even without a change in behaviour – e.g. Solomon & Turner (1962), which showed that even when the drug curare was used to block all skeletal responses during associative learning, conditioned skeletal responses to the CS are still present when the drug has worn off.
® Learning is more than just specific behaviour – organisms learn more than just overt behaviour. For example, Macfarlane (1930) showed that rats learned to navigate through a maze even after it was filled with water in order to find food, implying that learning involves acquiring knowledge – a mental representation of the learning environment, in this case the spatial layout of the maze.
® Modern studies of learning use training & transfer methods to uncover complex mental representations.

Learning tasks:

Learning contingencies:
® Delta rule: the principle that the change in strength of an association during learning is a function of the difference between the maximal strength possible and the current strength of the association.
® ΔP = P(outcome | cue present) – P(outcome | cue absent). ΔP tells us that if there is no causal relationship between two events, then ΔP = 0 and the outcome is just as likely when the cue is present as when the cue is absent. ΔP will approach 1 if the presence of the cue increases the likelihood of the outcome, and will approach -1 if the presence of the cue decreases the likelihood of the outcome (see the worked example below).
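A small worked sketch of the ΔP contingency measure just defined. The trial counts are invented for illustration; ΔP is simply estimated from a 2×2 table of cue/outcome co-occurrences.

```python
# Hypothetical trial counts for one cue-outcome pairing (numbers are made up).
counts = {
    ("cue", "outcome"): 30,        # cue present, outcome occurred
    ("cue", "no_outcome"): 10,     # cue present, outcome absent
    ("no_cue", "outcome"): 8,      # cue absent, outcome occurred
    ("no_cue", "no_outcome"): 32,  # cue absent, outcome absent
}

def delta_p(counts):
    """Delta-P = P(outcome | cue present) - P(outcome | cue absent)."""
    p_given_cue = counts[("cue", "outcome")] / (
        counts[("cue", "outcome")] + counts[("cue", "no_outcome")])
    p_given_no_cue = counts[("no_cue", "outcome")] / (
        counts[("no_cue", "outcome")] + counts[("no_cue", "no_outcome")])
    return p_given_cue - p_given_no_cue

# 0 means no contingency; values near +1 or -1 mean the cue strongly raises or
# lowers the probability of the outcome.
print(delta_p(counts))   # 0.75 - 0.20 = 0.55
```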
Probabilistic contrast model:
® Tells us that when the background is variable, we need to conditionalise on the background to understand the strength of the contingencies in the environment.
® This tells us that people are sensitive to the context in which information is presented.

Mechanisms of learning:

The Standard Associative model:
Assumptions of the standard model:
® The environment affords the existence of directional connections between pairs of elements (cause & effect).
® The elements are the mental representations of events or features of stimuli that are activated by the presence (or suggestion) of stimuli.
® The presence of an element modifies the state of activation of another element.
® Learning involves the strengthening of connections (associations) between elements.

The Rescorla-Wagner model:
® A model of how associations are formed and strengthened.
® Assumes that there is some existing connection strength (Vx) between the conditioned stimulus X and the reward (unconditioned stimulus).
® ΣV: the sum of connection strengths across all stimuli present on the trial – what you expect to happen.
® λ: the maximum connection strength the unconditioned stimulus can support – what actually happens.
® λ – ΣV: the difference between what actually happens (λ) and what you expect to happen (ΣV, the current summed strength).
® ΔVx = λ – ΣV: the change in the strength of X is the difference between what actually happens and what you expect to happen.
® If you expect a reward but don’t receive one, then λ = 0 and the sum of the strengths will be higher than λ, so the change will be negative. If you don’t expect a reward but receive one, then the change will be positive.
® ΔVx = αβ(λ – ΣV); where β is a learning rate parameter that depends on the unconditioned stimulus, and α is a learning rate parameter that depends on the salience of the conditioned stimulus.
® These learning parameters are often grouped into a single learning rate (γ).
® ΔVx = γ(λ – ΣV): the change in the associative strength for a cue is the difference between the actual reward and the expected outcome, multiplied by the learning rate (γ). The learning rate determines how quickly learning proceeds.
® The difference between the actual reward and the expected reward is called the reward prediction error.
® The Rescorla-Wagner model is an error-driven learning model. If the learning rate is positive, then you will learn more to the extent that there is a large difference between the sum of the strengths across all of the cues and the actual reward that you receive.
® The R-W model can be thought of as a connectionist neural network, and used to solve machine learning problems using backpropagation. The backpropagation algorithm works by comparing the activation of the output to what we expect to happen to provide us with an error signal, which then flows backwards through the weights.
® However, this model misses the effects of selection (selective attention).

The R-W model explanation of blocking:
® The association strength of the first stimulus (the red light) increases until it reaches an asymptote. The association strength of the second stimulus (the bell) only grows by a small amount, because of the initial learning about the red light. The model predicts that we should learn more about a first-presented stimulus than a later-presented stimulus. What’s important for learning in the R-W model is surprise. For example, there is no more surprise when the novel bell stimulus is introduced, because the reward is already predicted by the red light. We learn a lot, rapidly, when we are surprised. We don’t learn a lot when not surprised. (The simulation below illustrates this.)
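A minimal simulation of the Rescorla-Wagner update, written to illustrate the blocking account above. The cue names and parameter values are my own choices, not from the notes: cue A is trained alone first, then A and B are trained in compound, and B ends up with almost no associative strength.

```python
# Rescorla-Wagner update: dV = rate * (lam - summed strength of cues present)
rate = 0.3   # combined learning rate (alpha * beta)
lam = 1.0    # asymptote supported by the reward when the US is present

V = {"A": 0.0, "B": 0.0}   # associative strengths (e.g. A = red light, B = bell)

def trial(cues_present, reward):
    """One conditioning trial: all cues present share the same prediction error."""
    prediction = sum(V[c] for c in cues_present)
    error = (lam if reward else 0.0) - prediction   # reward prediction error
    for c in cues_present:
        V[c] += rate * error

# Phase 1: A alone predicts the reward, so V[A] climbs towards the asymptote.
for _ in range(20):
    trial(["A"], reward=True)

# Phase 2: A and B in compound. A already predicts the reward, so the prediction
# error is tiny and almost nothing is learned about B (blocking).
for _ in range(20):
    trial(["A", "B"], reward=True)

print(round(V["A"], 3), round(V["B"], 3))   # V[A] near 1.0, V[B] near 0.0
```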
Selective attention:

Evidence for the role of attention in learning:
® Blocking – in blocking, dimensional attention is directed away from a novel cue. For example, early in training attention is shifted to the A cue because it is a perfect predictor of the outcome. Later in training, no attention is left for cue B when A & B are paired together, meaning little is learned about the B cue (because all of the attention has been tuned to cue A).
® Highlighting – involves directing dimensional attention towards a novel cue. Here, A & B are already paired with X from the early training phase. During the second, late training phase A & B are repeated, but A is also paired with D, which predicts a different outcome. In this explanation, attention is shifted to D because D alone predicts the unusual event Y (the element of surprise). We already know what A does, so attention is shifted to D, and D then develops a strong association with its outcome.
® Categories with more demanding attentional requirements are harder to learn – e.g. Shepard, Hovland & Jenkins (1961): type 2 categories, which have higher attentional requirements, are learned more slowly & with higher errors than type 1 categories.
Likewise, the rule plus exception categories are more difficult to learn than the type 1 or 2 categories, because you have to pay attention to all of the dimensions. Category learning difficulty depends on how many dimensions are used to define the category. ® Unidimensional category boundaries are easier than diagonal boundaries – Kruschke (1993): subjects are shown 1 of 8 different stimuli to categorize on each trial. Subjects were trained over 2 conditions – filtration (where the irrelevant dimension can be filtered out by attention and ignored) & condensation (where more than one stimulus dimension has to be condensed into a single category decision). Results showed that people weren’t as good at learning a diagonal boundary (i.e. in the condensation condition) as when only a single dimension is relevant (in the filtration condition). How does prior knowledge interact with learning?: ® Learning relies on error-driven updating (integration effects) and selective attention (selective weighting effects). ® Adjustments during learning are informed by prior knowledge which facilitates (feature interpretation effects) or hinders (facilitation effects) learning by biasing the interpretation or weighting of information. Integration effects: ® Initial representations are based on existing knowledge. ® Representation is gradually updated (based on integration of information). e.g. Selective weighting effects: ® Prior knowledge leads us to selectively attend to specific features. ® This leads to changes in the underlying representation, influencing how readily you learn about different objects. ® For example, when examining the similarities between trees, taxonomists give almost all of their weight to the taxonomic properties of the tree whilst landscapers also consider features such as their size, aesthetic and landscape utility. Feature interpretation effects: ® Prior knowledge helps people to interpret ambiguous features of stimuli that they are learning about. ® For example, expert radiologists are better able to distinguish the appearance of disease tissue from artefacts on X-ray film. ® Experts are more likely to use a different type of representation (e.g. a 3D representation rather than focusing on 2D cues) to interpret the features in ambiguous images. Facilitation effects: ® Prior biases make some types of concepts easier to learn. ® If you want people to learn a relationship between things, it is better to make that relationship consistent with their prior expectations. They learn faster, and their biases don’t try to overcome that relationship. However, if you get them to learn something counter-intuitive, it is often that their biases will overcome what they are trying to learn. ® For example, designs which conform to people’s expectations are easier to use. Week 5 – Attention – Historical origins, early & late selection: Attention: ® The brain’s ability to self-regulate input from the environment. ® Used in two senses in psychology: 1. Sustained attention (alertness): related to psychological arousal (a continuum from drowsy & inattentive to alert & attentive) or a problem of vigilance (performance declines over a long watch – e.g. in radar operators, quality control inspectors). 2. Selective attention: we are limited in the number of stimuli we can process. We attend to one stimulus at the expense of others. People have limited capacity systems – we don’t treat all stimuli equally. 
The cocktail party problem (Cherry, 1953): ® In a crowed environment, we can ‘pick out’ one conversation from background noise. ® ‘Picking out’ processes take sound energy at the ear, and translate that sound into intelligible speech which then gets translated into understanding/meaning. Translation is selective – stimuli are not all treated equally. ® Cherry was interested in what happens to unattended messages in this scenario. ® Cherry asked participants to listen to two simultaneous passages of speech (known as dichotic listening), asking participants to attend to one passage (the attended channel) and ignore the other (the unattended channel). ® Cherry found that participants had no memory of the unattended message. He then tried switching the language on the unattended channel from English to German, which they still did not notice. Then, he switched the speaker from male to female, which was noticed. He then played the speech backwards, which people noticed was ‘something queer’ but they still couldn’t pinpoint what it was. He then switched the voice to a 400-cps pure tone, which they reliably noticed. ® Only superficial (physical) features of the unattended message were perceived (i.e. things distinguishing voice & gender). The semantic content of the message was not analysed (its language & meaning). ® This suggests that sensory (physical or acoustic) features of stimuli are processed preattentively (automatically) - regardless of whether we are attending to the associated stimulus, whereas meaning requires focal attention (where one focusses their attention on a particular stimulus). Criticisms of Cherry: ® His experimental method looked at what people remembered rather than what people perceived. This method confounds perception and memory. Source localisation in space: How do we select the attended message? ® Our ability to localise the source of the sound in space is key. This is done via phase differences in the arrival times of sounds in our left & right ears. We resolve these phase shifts cognitively in order to subjectively & psychologically localise sound in space. Binaural presentation: ® Cherry presented a version of the task involving binaural listening, in which both messages were presented in the same voice to both ears. People were instructed to shadow one of the messages on the basis of the content, which people found next to impossible to do (because they were unable to localize the source of the sound). Filter theory (Broadbent, 1958): ® Attention acts as a filter to select stimuli for further processing. ® Attentional selection is based on simple physical features (location in space, voice etc.). Those simple features/properties are extracted (processed) pre-attentively – not requiring access to the limited capacity channel. ® Meaning is extracted in a limited capacity channel, which translates sensory information (sound) into conceptual understanding. This can only be done on the contents of one sensory channel at a time. ® Meaning requires access to the limited capacity channel, and can only be extracted if the stimulus is attended to. ® The selective filter precedes the limited capacity channel, protecting it from overload. There is only one arrow going from the filter to the channel – suggesting that we can only process one thing at a time. ® All stimuli are stored briefly in the short-term store (STS) – which stores unanalysed sensory material. This is known as iconic (visual) or echoic (auditory) memory. 
® Sensory information decays quickly if not selected. Evidence for filter theory: ® Interaction of short-term store (STS) and filter – Dichotic digit stream: when people were instructed to recall digits in temporal order, they only got 3-4 digits correct. Whereas, when instructed to do ear- by-ear recall, they got 6 digits correct. This is because ear-by-ear recall only needs 1 filter switch, whereas 5 switches are needed to follow temporal order. Switches take time, which decays the STS trace. The failure of filter theory: Gray & Wedderburn (1960): ® Gray & Wedderburn conducted a split span experiment (‘Dear Aunt Jane’) with meaningful material. ® Broadbent’s theory predicted that people would recall words in a way that minimized the number of filter switches. ® However, this is not what Gray & Wedderburn found. Instead, they found that preferred recall order followed the semantic context, not the presentation ear. ® This showed that people change the way they filter material based on its semantic context. Moray (1959): ® Had participants complete a dichotic listening task, where they listened to two passages of continuous speech (one to the left ear and one to the right). Participants were asked to shadow only one of these passages. ® However, he embedded the person’s name in the unattended channel. ® He found that people often detected the occurrence of their own name on the unattended channel. ® This selection was based on meaning, and is not consistent with the idea that meaning is only extracted on the attended channel. ® This presents the same problem for filter theory as Gray & Wedderburn’s study, because if the filter completely blocks the semantic content (preventing anything from getting through to the limited capacity channel that is being unattended to), then it shouldn’t be possible to do this. The early vs. late selection debate: Early selection theory (Treisman, 1961): ® Sensory analysis takes raw wave form and converts it into sound, allowing you to distinguish one sound from another. ® Understanding sound involves semantic analysis, where sounds are made meaningful by activating stored knowledge. It is intimately connected with our stored knowledge of the meanings of language in long-term memory. Attenuation model – a feature of early selection (Treisman, 1961): ® Suggests that the filter partially blocks (attenuates) unattended stimuli – akin to ‘turning down the volume’ on the unattended channel. ® This is in contrast to Broadbent’s model, which suggests that the filter completely blocks unattended stimuli. ® Argued that the filter is biased by context & message salience. Highly salient stimuli (e.g. one’s name) & semantically related material (e.g. Dear Aunt Jane) get through the filter, shifting attention. Evidence for early selection (Treisman & Geffen, 1967): ® Got people to perform a dual task (doing two things at once) – a shadowing task and a detection task. The word tap was embedded in unpredictable places in both the shadowed and ignored passage. ® This method got people to indicate what they perceived as they perceived it (rather than at the end), avoiding the methodological issues in Cherry’s procedures which confounded perception & memory. ® Found that the percentage of correct detections was higher on the shadowed channel, but was not zero on the unattended channel. This is consistent with the idea of a filter that attenuates unattended stimuli instead of blocking it altogether. 
Criticisms of early selection: ® The complexity of the filter that the theory seems to require – the filter requires an enormous amount of knowledge to respond to semantic context and distinguish related from unrelated stimuli. ® In late selection, the filter is located after the long-term memory (LTM) instead of before LTM. Late selection theory (Deutsch & Deutsch, 1963; Norman, 1968): ® In this view, the filter is located later in the sequence of operations - between the semantic analysis stage & conscious awareness. ® Both early selection and late selection theories agree that recognition requires encoding and access to long-term memory. ® Late selection theory suggests that all stimuli access long-term memory, but this is not sufficient for conscious awareness. This stimuli must pass through an additional filtering stage for conscious awareness (activating stored knowledge in LTM is insufficient for conscious awareness as in early selection theory). ® Norman (1968) suggested bottom-up (stimulus driven) and top-down (conceptually driven, selection by ‘pertinence’ [relevance to task]) selection mechanisms that activate the LTM system. Need both kinds of activation for information to get through the filter into conscious awareness, otherwise it decays. Evidence for late selection: McKay (1973): ® Used indirect methods to demonstrate semantic processing on an unattended channel independent of awareness. ® A sentence with an ambiguous word was given on the attended channel, and on the unattended channel a word was given which was related to the meaning of the ambiguous word on the attended channel. ® McKay found that the semantic content of a word on an unattended channel can bias people’s recognition performance on the attended channel (the unattended word was used to resolve ambiguity on the attended channel). This is only possible if the unattended channel was processed up to the level of meaning. Von Wright, Anderson & Stenman (1975): ® Classically conditioned a galvanic skin response (GSR) to target words. Following this conditioning phase, participants were asked to shadow a passage and to ignore a second unattended passage. Target words has a classically conditioned GSR were embedded in this passage. ® Found that there was semantic activation (GSR) in response to these words, even in the absence of attention. This could only be possible if these words were processed to the level that their semantic category could be identified. Week 6 – Attention – Structure, Capacity & Control: Cost of divided attention: Moray (1970): ® Had participants perform a dichotic listening task using pure tone stimuli (beeps). Participants were told to monitor for target tones. ® Participants performed this task under one of three conditions: o Selective attention condition: participants were presented with two sequences of tones and told to attend to one sequence/channel only, ignoring what is presented on the other channel. Here, participants detected 67% of tones correctly. o Exclusive OR (XOR) condition: participants were asked to monitor BOTH channels at the same time, with tones occurring on either channel (with no simultaneous targets). Here, accuracy dropped to 54% of tones detected, showing that there was a cost to dividing attention compared to the selective attention condition. o Inclusive OR (IOR) condition: participants were required to monitor both channels, with simultaneous targets possible. Simultaneous (AND trials) and non-simultaneous (OR trials) were compared. 
There was found to be a moderate cost of divided attention (52% correct on OR trials) and a large cost of simultaneous detection (31% correct on AND trials). ® Early selection doesn’t predict the AND < OR finding (because the degree of the attenuation shouldn’t depend on the identity of the stimulus), whereas late selection does (because two simultaneous selection targets will both be selected by ‘pertinence’ and compete to get through the filter, with this competition giving an AND deficit). ® The finding that OR < SEL was predicted by early selection (because there is attenuation with divided attention) but not by late selection (because if there aren’t two targets, it would expect no competition and therefore predict no OR deficit). ® Overall, Moray found that this pattern is not consistent with either early or late selection, suggesting that a new theory may be needed. Structural and capacity theories: There are two ways in which attention can limit performance: Structural (Bottleneck) theories: ® Some neural structures can only deal with one stimulus at a time. ® Competition produces a processing ‘bottleneck’ (as per filter theory). ® Early selection suggests a bottleneck going into LTM, late selection a bottleneck getting out of LTM. Capacity (Resource) theories: ® Information processing is mental work. ® Work requires activation of neural structures. ® We have a limited overall capacity to activate these structures (due to large metabolic costs etc). Capacity theory (Kahneman, 1973): ® According to this theory, a reduction of capacity produces a deficit in divided attention tasks. ® Differs from structural theories, because under this theory capacity can be allocated flexibly to simultaneous tasks Phenomena explained by capacity theory: Interfering effects of divided attention: ® Strayer & Johnston (2001) showed that talking on a mobile phone interferes with driving – sharing capacity reduces accuracy and increases RT. Dual task performance: ® The costs (in terms of errors etc.) of dividing attention (doing two things at once) will depend on the capacity demands of each of the tasks. ® Li et al. (2002) had participants complete an attention demanding central task, and an easy or hard peripheral task. The difficult task was much more affected by central load. Inattentional blindness: ® Cartwright-Finch & Lavie (2007) had participants examine which arm of a flashed cross was longer. During this experiment, a clearly visible square was not detected by participants. ® This suggests that the demanding central task used up all available attentional capacity, leading to the inattentional blindness in the form of missing the square. Study capacity by dual task trade-offs: ® We can vary the proportion of attention allocated to two tasks in a dual task paradigm. ® The shape of the trade-off curve (attention operating characteristic (AOC)) tells us about the capacity demands of tasks. ® There is a ‘graceful degradation’ of performance as available capacity is reduced from one task and allocated to another. Auditory & Visual Dual Tasks: ® Bonnel & Hafter (1998) had participants complete easy auditory & visual tasks (detecting a spot of light or tone); and difficult auditory and visual tasks (discriminating increases from decreases in intensity of a spot or tone). ® Found that there was an attentional trade-off for the difficult tasks, but not for the easy detection ones. 
This is consistent with the idea that simple tasks such as detection can be done without much attentional capacity, but more difficult tasks such as discrimination between two similar stimuli requires far greater attentional capacity. Pros & Cons of Capacity Theory: ® Emphasizes divided attention, flexibility of attentional control. ® Can make capacity theories mathematically precise using decision-making theories. ® Shortcoming is its vagueness – can always come up with a capacity explanation. Attentional orienting (1980s): ® Shifts of attention are called attentional orienting. ® Attention shifts precede eye movements, and can occur without them. ® ‘Covert attention’ is movement independent of eye movements. ® These shifts in attention can be top-down (where you decide to shift your attention) or bottom-up (where something captures your attention). Clinical patients show deficits of both kinds – failure to focus attention, failure to disengage attention. ® There are two attentional orienting systems, engaged by different kinds of cues – endogenous (voluntary) or exogenous (reflexive). The endogenous system requires cognition – it needs to interpret information (e.g. central (symbolic) cues). The exogenous system is direct & spatial (doesn’t need to interpret information – e.g. peripheral (spatial) cues). Evidence for two separate orienting systems: 1. The different time course of central & peripheral cueing: ® The attentional effect of a peripheral cue is fast and transient (peaks rapidly), whereas the attentional effect of a central cue is slow and sustained. ® These different time courses are consistent with the idea that processing a central cue requires some intervening cognition/interpretation (isn’t automatic - takes some time to reach its maximum), whereas the peripheral cue (because it is direct & spatial) doesn’t require that kind of intervening cognition and therefore automatically and reflexively produces a fast & transient shift in attention. 2. The different effects of load: ® Jonides (1981) had participants perform a capacity demanding primary task (memory task) at the same time they did an attention cueing secondary task (orienting task). ® Found that voluntary orienting was slowed by concurrent memory load, but reflexive orienting was not. This is consistent with the different capacity demands of two systems. 3. Inhibition of return: ® Found only with peripheral cues, not with central cues. ® This is because once attention wanders, it takes more time to get it back on the cued location (resulting in longer reaction times). ® This suggests that the reflexive system is controlled by different processes. ® Inhibition of return may allow for efficient search of a complex environment (ecological argument), and prevents repeated search of the same location (don’t need to maintain a ‘mental map’ of locations that have been searched). The ‘spotlight of attention’ (Posner): ® Likened shifts of attention to a moving spotlight. There is selective enhancement for stimuli ‘illuminated by the beam’. ® This expresses the selective, limited capacity idea in spatial terms. Spatial cuing paradigm (Posner): ® Posner also proposed the spatial cuing paradigm. In this paradigm, attention is attracted to location A, then a stimulus is presented to either location A or B. Performance is compared, to determine whether A is processed faster, more accurately than B. 
Cued trials (stimulus occurs at the attended location), miscued trials (stimulus occurs at the other location, with low probability), and neutral trials (with an uninformative cue – 50% left and 50% right – used as a baseline to compare against valid & invalid trials) were used.
® Found that there are attentional costs and benefits of cueing: there is a faster reaction time with a valid cue, but a slower reaction time with an invalid cue. This paradigm is very flexible – it can be used with either reaction time or accuracy as the measure, and to compare all kinds of stimuli.
® These costs and benefits may be due to switching time (the time needed to move/orient the attentional spotlight) – the cost of disengaging from the wrong location, and the benefit of engaging the correct location before the stimulus appears. However, they may also be due to unequal capacity allocation – reaction time depends on the attentional capacity allocated to each location. For a neutral cue, capacity is spread across locations; for a focused cue, capacity is concentrated on one location. It is hard to test between these two alternatives.
Distraction: Lavie (2010):
® High perceptual load reduces susceptibility to distraction.
® High cognitive (working memory) load increases susceptibility to distraction.
Week 7 – Attention in Space & Time:
The psychological function of spatial attention:
Þ To assign limited-capacity processing resources to relevant stimuli in the environment. To do this, we must locate stimuli among distractors and process (identify) them.
® We can measure visual search in terms of mean RT as a function of display size, i.e. a larger display with more distractors requires more searching and therefore produces slower RTs.
Pop-Out Effects:
® Unique colours (e.g. a red line amongst green lines) and unique orientations (a vertical line amongst horizontal lines) both ‘pop out’ from their background.
Parallel search for feature targets:
® Mean RT doesn’t increase with display size. This suggests that we are doing this search in parallel – for all items in the display at once.
® We compare the contents of each display location with our mental representation of the target at the same time.
Conjunction targets:
® Conjunction targets are those defined by a conjunction of features – a combination of colour and orientation.
® For these targets, RT increases linearly with display size.
® The slope is twice as steep for target-absent as for target-present trials.
® This is evidence for serial search – we seem to need to focus our attention on each item in turn in order to detect the target.
® A constant scanning rate predicts a linear RT/display-size function.
Self-terminating serial search:
® On trials where the target is present, we carry out the search via a self-terminating serial search (we stop when the target is found).
® On average, we search half the display on target-present trials, but all of the display on target-absent trials (an exhaustive search).
® A constant scanning rate predicts the 2:1 slope ratio (a numerical sketch of these slope predictions follows the pop-out examples below).
Pop-out effects with letter stimuli:
® Pop-out targets can be identified by a single feature (e.g. straight lines amongst curves, or vice versa).
® There is no pop-out when targets can’t be identified by a single feature (e.g. straight lines amongst straight lines, or curves among curves).
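A minimal simulation of the serial-search slope predictions sketched above. The base time and per-item scanning time are arbitrary illustrative values, not estimates from any experiment; the point is simply that self-terminating search (target present) and exhaustive search (target absent) both produce mean RTs that grow linearly with display size, with roughly a 2:1 ratio of slopes.

```python
import random

# Minimal simulation of serial visual search with a constant scanning rate.
# Target-present trials use self-terminating search (stop when the target is found);
# target-absent trials use exhaustive search (check every item).

BASE_MS = 400   # non-search components of RT (encoding, response execution, etc.)
SCAN_MS = 50    # time to inspect one item

def serial_search_rt(display_size: int, target_present: bool) -> int:
    if target_present:
        # The target is equally likely to be at any position, so on average
        # about half the display is inspected before the search terminates.
        items_inspected = random.randint(1, display_size)
    else:
        # No target: every item must be checked before responding 'absent'.
        items_inspected = display_size
    return BASE_MS + SCAN_MS * items_inspected

if __name__ == "__main__":
    random.seed(1)
    trials = 10_000
    for n in (4, 8, 16, 32):
        present = sum(serial_search_rt(n, True) for _ in range(trials)) / trials
        absent = sum(serial_search_rt(n, False) for _ in range(trials)) / trials
        print(f"display size {n:2d}: mean RT present ~ {present:6.0f} ms, absent = {absent:6.0f} ms")
    # Expected slopes: present ~ SCAN_MS * (n + 1) / 2 (about 25 ms/item),
    # absent = SCAN_MS * n (50 ms/item) - the 2:1 slope ratio, both linear in n.
```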
Feature integration theory: Treisman & Gelade, 1980:
® Suggests that the role of attention is to bind an object’s individual features into perceptual compounds.
® Each individual feature (lines, colours etc.) is registered in its own feature map.
® Without attention, features are free-floating and may lead to illusory conjunctions (perceiving that two features are connected when they are not).
® Conjunction targets require feature binding, so they need focused attention – which leads to serial search.
® Feature targets don’t require feature binding, and therefore don’t need focused attention – which leads to parallel search.
Problems with feature integration theory:
® Pop-out sometimes depends on complex object properties, not just simple features (Enns & Rensink, 1990).
® High-level, not low-level, properties predict pop-out. This is inconsistent with the idea that pop-out only occurs at the level of simple features.
Efficient vs inefficient search:
® Many tasks show an intermediate pattern – they don’t provide clear evidence of either serial or parallel search. Wolfe suggested that this is better described as a continuum from efficient to inefficient search.
® There is no evidence of a dichotomous population of search slopes – parallel and serial functions look like the ends of a continuum ranging from efficient (parallel) to inefficient (serial) search. It is unclear how feature integration theory would account for this.
Guided search theory (Wolfe, 1989):
® Suggests that search is carried out neither purely in parallel nor purely serially, but in a two-stage process.
® The initial parallel stage provides a candidate list of potential targets. The second, serial stage checks the candidate list for targets.
® Search efficiency depends on the similarity of the target and distractors. Similar targets and distractors lead to a large candidate list & inefficient search; dissimilar targets and distractors lead to a small candidate list & efficient search.
® There is an initial parallel processing of sensory features, which guides the allocation of a limited-capacity channel.
Failures of focused attention:
® Visual search looks at the costs of divided (distributed) attention – performance declines with increasing display size, which is evidence of capacity limitations.
® There are some situations where there is a benefit in not dividing attention, i.e. avoiding processing distractor stimuli.
® There are limitations of focused attention, and there is some involuntary processing of irrelevant stimuli (e.g. the Stroop effect, which shows that we are unable not to process distractor stimuli).
The Stroop effect (Stroop, 1935):
® Participants were required to indicate, as fast as possible, the ink colour of a presented word.
® Three different types of stimuli were shown: compatible (e.g. ‘red’ written in red ink), neutral (e.g. ‘rain’ written in blue ink) and incompatible (e.g. ‘blue’ written in yellow ink).
® RT was fastest with compatible stimuli, intermediate with neutral stimuli and slowest with incompatible stimuli.
® This tells us that there is parallel processing of colour naming and word reading.
® Word reading is fast and involuntary, with the name of the word being available before the name of the colour (colour naming is slow and controlled). This creates output interference (known as the Stroop effect).
® These processes are asymmetrical – there is no interference of ink colour on word reading (because word reading is the faster process).
® Automaticity: word reading is a fast and automatic process, whilst colour naming is slow and controlled. Learned stimulus-response associations make a process automatic.
For a process to be automatic, it must be fast, carried out in parallel, effortless, and must not require capacity/allocation of attention. Automaticity forms the basis for skill acquisition (e.g. reading, driving, playing a musical instrument).
Controlled and automatic processing (Shiffrin & Schneider, 1977):
® Investigated whether practice produces automaticity.
® Involved searching for digit targets in arrays of distractor letters presented in rapid sequences (or vice versa). In the consistent mapping (CM) condition, target and distractor sets were distinct. In the varied mapping (VM) condition, targets on some trials were distractors on others.
® Results showed that performance under the CM condition became automatic with practice (>90% correct). Performance also became independent of memory set size and display size, consistent with people being able to perform this task in parallel. Over time, the task became subjectively effortless, and targets spontaneously popped out from text (consistent with automaticity).
® However, performance never became automatic under the VM condition. It always remained effortful – automaticity requires consistency of target-set membership. This was consistent with the effortless encoding account.
The Eriksen Flanker Task:
® Shows a failure of focused attention – the involuntary processing of flankers even when attempting to ignore them.
® The task asks whether the central character is an ‘E’ or an ‘F’, and measures RT. RT on compatible trials was fastest, RT on neutral trials was intermediate, and RT on incompatible trials was slowest.
® This implies some parallel processing of the target and flanker stimuli.
® Parallel processing decreases with spatial separation, and disappears at 1-1.5 degrees. This provides an estimate of the size of the focus of attention (the ‘spotlight’). Stimuli falling within the spotlight are processed automatically.
Attention in Time: The Attentional Blink:
® A cognitive phenomenon suggesting that attention can ‘blink’ in a similar way to our eyes, during which time we are not aware of stimuli.
® Demonstrated using a rapid serial visual presentation (RSVP) task, where you present a sequence of items at the same location in consecutive frames of a visual display. Each item is masked by the item which follows it.
® The attentional blink arises as a result of attentional resources being directed towards the first target, which makes people temporarily insensitive to (unable to see) the second target. Target 2 (T2) performance declines and then recovers (the ‘attentional blink’).
® We only find the attentional blink if Target 1 (T1) is processed – if T1 is ignored there is no attentional blink. The blink depends on T1 processing. Worst performance occurs not immediately after T1, but some time later; if the two targets are immediately back-to-back, there is no evidence of an attentional blink (Lag 1 sparing).
® The effect of the attentional blink takes time to build up, but is relatively long-lasting (up to 600 ms).
Week 8 – Object-based Attention & the Cognitive Neuropsychology of Attention:
What does attention act upon?
® Spotlight theory & feature integration theory assume that attention acts on a region of space – enhancing processing in that region.
® Object-based theories suggest that attention acts on objects in space, not the space itself.
Attention can be directed selectively to one of two objects occupying the same region of space:
Rock & Gutman (1981):
® Used two overlapping figures as stimuli, and had participants attend to one and rate its aesthetic appeal whilst ignoring the other. The two objects occupied the same region of perceptual space.
® Then used a memory test – found that people had good memory for the attended figure, and no memory for the unattended figure.
® Suggests that the object of attention is the object itself, as opposed to the space it occupies.
® It is suggested that the unattended shape may not be perceived, or may not be fully perceived. Alternatively, maybe people quickly forget the stimulus they’re not attending to (inattentional amnesia).
Tipper (1985):
® Asked people to name one figure as quickly as possible and to ignore the other one.
® Found that the RT to name an item (e.g. the trumpet) is slower if that object had been ignored on the previous trial. This is evidence of negative priming – where having ignored something previously makes you slower to respond to it than if you hadn’t seen it at all.
® This also means that the ignored shape must have been perceived in order to produce an effect on the subsequent trial (consistent with late selection).
® This is evidence that people can attend selectively to one of two objects which occupy the same region of space.
Evidence for Object-Based Attention: Duncan (1984):
® Presented stimuli differing on 4 attributes – box size, gap side, line slant, dotted/dashed line.
® Flashed the stimuli briefly, and asked participants to report two of the attributes (e.g. line slant and gap side).
® Found that participants were more accurate if the two attributes belonged to the same object than to different objects (same – box size and gap side, or line slant and line style; different – box size and line slant, etc.). If both attributes belong to one object, you only have to form a representation of that object and direct your capacity towards it; if they belong to different objects, you must form representations of both and divide your capacity between them – which incurs a cost of divided attention.
® These costs were found even though the stimuli occupied the same region of space, which is evidence that attention operates on whole objects (rather than regions of space).
Cueing Object-Based Attention: Egly, Driver & Rafal, 1994:
® Adapted the standard peripheral cueing paradigm to an object-based cueing paradigm.
® Miscued locations were in the same object or a different object, at the same distance from the cued location.
® Space-based theories would say that miscuing costs (the magnitude of the slowing) should be the same for the same-object and different-object conditions.
® Found an advantage for the same object – mean RTs to miscued stimuli were faster if they appeared in the same object as the cue.
® This is evidence that the cueing effect spreads to encompass the entirety of a cued object – consistent with the proposition that attention is being directed NOT to the region of space in which the object resides but to the perceptual object itself.
Effects of an Occluding Bar: Moore, Yantis & Vaughan, 1999:
® Added an occluding bar in stereo space to Egly et al.’s object-based cueing paradigm.
® Still found the same-object advantage found by Egly et al. The advantage did not depend on uninterrupted edges or boundaries; it agreed with the percept of continuous objects behind the occluder.
Visual neglect:
® Control of attention involves a balance of top-down and bottom-up systems.
Reflexive systems orient to new stimuli, whilst voluntary systems provide sustained attentional focus.
® A failure to focus and a failure to disengage & reorient are both found in clinical cases of visual neglect caused by damage to the right parietal lobe.
Attention and visual pathways:
There are two pathways in the brain responsible for processing visual information:
® The ventral pathway: found in the temporal lobe, responsible for detecting form & colour. The ‘what’ pathway.
® The dorsal pathway: found in the parietal lobe, responsible for detecting direction of motion & spatial location. The ‘where’ pathway. Damage to the parietal lobe disrupts this ‘where’ pathway.
Neuropsychology of neglect:
® Neglect refers to a deficit in processing spatial information (e.g. a patient asked to reproduce a picture of a clock will only draw the numbers on one side of it).
® Patients are not blind, but have difficulty in making the left side of space accessible to their conscious awareness.
® Neglect can also be demonstrated using the cancellation test, where people are asked to cross out line segments on a page. Neglect patients will omit the line segments on one side of the page.
® Damage to the right parietal lobe leads to left visual field neglect.
® Behavioural manifestations of neglect can include a failure to dress the left side of the body, shave the left side of the face, etc.
Cueing Deficits with Right Parietal Damage (Posner):
® Compared intact and damaged hemispheres, using the intact hemisphere as a control.
® Posner argued that right parietal damage does not impair the ability to voluntarily engage attention, but causes difficulties in disengaging and shifting in response to new information (as measured by mean RT to invalid cues directing attention to the left visual field).
Symptoms of Neglect – Extinction:
® Extinction occurs when a clinician holds up a pair of objects at once (e.g. an apple and a comb), and the perceptual response to one stimulus ‘extinguishes’ the response to the other.
® Remember that Moray (1970) found that when two weak, simultaneous signals were presented, individuals had difficulties identifying both signals. This is consistent with late selection theory, which suggests that only one signal can get through the filter to consciousness at a time.
® Extinction shows that two competing perceptual representations cannot co-exist in the conscious awareness of patients with this kind of brain damage. This occurs because recognition & identification of objects requires activation of neural structures. The damaged hemisphere is chronically underactive, meaning stimuli there do not produce the activation they should. These effects are strongest when there is competing activity in the other hemisphere (e.g. an invalid cue or a competing stimulus).
Balint’s syndrome (Patient RM):
® Bilateral lesions in the parietal and/or occipital cortex, resulting in an inability to focus on individual objects and to see more than one object at a time (simultanagnosia) – such patients are prone to illusory conjunctions.
® This occurs even when objects overlap (suggesting that the deficit is object-based and not space-based).
Space-Based and Object-Based Attention:
® Attention seems mainly associated with the ‘where’ pathway.
® Under the spotlight view (associated with the movement of attention through space), neglect is associated with the left side of perceptual space (the ‘where’ pathway).
® Under the object-based view, attention keeps track of objects (‘can ignore’, ‘shouldn’t ignore’) (the ‘what’ pathway).
® Under inhibition of return, it is suggested that we tag a cued spatial location as uninteresting, so RTs are slower there. This tagging is associated with objects, not just the space they occupy (Tipper, 1991).
Object-based inhibition of return (Tipper, 1991):
® Tipper rotated cueing markers to new locations, and then presented a target to be detected in one of these markers. He found a slower RT at a previously cued marker, with inhibition of return tracking the cued marker to its new location. This shows that inhibition of return follows the cued object rather than being confined to one region of space – what is inhibited is not just a location but the perceptual object at that location (IOR is at least in part an object-based phenomenon).
Object-based neglect (Behrmann & Tipper, 1994):
® Neglect manifests as a left visual field deficit in those with right parietal damage. Behrmann & Tipper wanted to determine whether this was neglect of the left side of space or neglect of the left side of the object.
® Used a barbell stimulus with two location markers and a connector, combining them into one unified perceptual object.
® Found that there was a longer RT on the left (the neglected side), but then rotated the barbell 180 degrees and presented the target to be detected again.
® Here, they found longer RTs on the right side, showing that neglect had tracked the marker from its previous location to the opposite visual field. This showed that neglect is object-based – the neglect acted on the left side of objects, not just the left side of space.
Week 9 – Memory (Learning & Forgetting):
Ebbinghaus:
® Established the experimental tradition of memory research – presented randomized lists of ‘non-meaningful’ stimuli to participants under controlled conditions (exposure duration, retention interval etc.).
® Spacing effects: repetitions are more effective for memory if you space them out over time, rather than massing them consecutively.
® List length effects: we have worse memory when studying longer lists of items versus shorter lists.
® Forgetting curve: the shape of how memory declines over time (a rapid drop-off initially after learning, which slows over time).
Decay theory:
® The theory that stored memories fade/degrade over time.
® Evidence for this theory came from the Brown-Peterson paradigm – where subjects learned a trigram and then counted backwards by sevens (to prevent them from rehearsing the material) for a period of time. Memory degraded over time, with almost no memory after ~20 seconds.
Interference theory:
® Suggests that forgetting occurs because over time there is more interfering mental activity.
® Specifically, forgetting occurs because there is competition between the things we are attempting to remember. There is greater competition when we learn more.
® There is competition because when we learn we associate things together, and the more associations we make the harder it is to remember everything (the more associations to a cue, the harder it is to retrieve the correct memory).
® Inability to retrieve new associates can be because of interference from older ones (proactive interference).
® Inability to retrieve old associates can be because of interference from newer ones (retroactive interference).
® Interference theory can even explain why lockdowns impair memory – we are associating what we learn with the same cues (e.g. our house, our surroundings). This makes it harder to remember particular things.
When you travel to different environments, there is less interference because you experience unique cues.
® Interference theory suggests that performance on AB-AC lists is worse due to response competition. When the A cue is presented, both B & C compete to be retrieved in AB-AC lists. However, no such competition is present in AB-CD lists.
® Decay from short-term memory: Underwood (1957) found that the biggest predictor of forgetting was the number of previous experimental trials. More trials equalled poorer performance. Underwood’s interpretation was that subjects suffered proactive interference from previous trials.
® Keppel & Underwood (1962) found almost no forgetting over 20 seconds if memory was restricted to just one trial.
® Release from proactive interference (Wickens, Born & Allen, 1963): switching the stimulus type (e.g. digits to letters) causes a big improvement in performance. This is because the new category hasn’t been learned, so it should suffer less interference. Many manipulations cause release from proactive interference – changing modality (auditory to visual), changing semantic category (e.g. fruits to occupations), changing part of speech, etc. There is no obvious decay explanation for these results.
Consolidation theory:
® Suggests that memories are in one of two states – a perseveration period (where memories are vulnerable) or a consolidation phase (where memories that survive the perseveration period are permanently stored and won’t be forgotten). If consolidation does not occur, the memory will be forgotten. Consolidation involves exporting memories from the hippocampus to the cerebral cortex.
® There is better memory when there is a rest between two lists – supporting a consolidation explanation.
This theory makes 3 predictions:
1. Protecting memories during the perseveration period should enhance consolidation & prevent forgetting. This is true – mental inactivity following learning leads to better memory (as it provides time for memories to be consolidated). Going to sleep after learning leads to better memory than staying awake (Jenkins & Dallenbach, 1924). Mental activity whilst still awake prevents memory consolidation.
2. Interrupted perseveration should prevent consolidation, and prevent the memory from being stored. Interruption after consolidation has occurred won’t impair memory, but interruption before consolidation should. This is confirmed by studies of retrograde amnesia – where the most recent memories formed before the trauma are lost because they were not consolidated. Studies involving induced retrograde amnesia show similar effects – e.g. removal of the hippocampus, electroconvulsive shock. The longer the delay between learning and electroconvulsive shock (ECS), the better the memory (Duncan, 1949). When ECS was given one hour after learning, rats performed as well as the controls (the same applies to removal of the hippocampus). This is because these memories are consolidated in that time, and no longer require the hippocampus. The longer the delay between learning and lesion, the better the memory.
3. If consolidation is prevented, the memory is not stored and remembering should be impossible. If you re-test memories forgotten after ECS, they should not be remembered. Quartermaine, McEwen & Azmitia, 1972: when tested soon after ECS, rats showed no memory of learned responses. However, when tested 72 hours later, the rats exhibited memory. This does NOT fit with the predictions of consolidation theory (it is an example of spontaneous recovery).
Some memories never seem to be forgotten:
® Bahrick et al. found virtually NO forgetting of foreign-language course material from 3 to 50 years after learning.
® There is 80-90% recognition of high school classmates 50 years later.
Hybrid theories:
® Consolidation + interference: the consolidation process protects older memories from interference.
Shape of the forgetting curve:
® Exponential function: implies that the loss is a fixed proportion of what remains for each unit of time (e.g. retention of the form R(t) = a·e^(-bt)). Implies that old and new memories are just as likely as one another to be forgotten.
® Power function: forgetting is described by the amount of time raised to a power (e.g. R(t) = a·t^(-b)). Power functions imply that the proportional loss decreases with time; in other words, older memories are more likely to survive than newer ones.
® The data overwhelmingly support a power function for forgetting.
What causes forgetting to slow with time?
® Wixted (2004) argues that forgetting is driven by interference, but as time progresses memories are more likely to be consolidated – which slows forgetting.
Temporal context:
® Ecker, Tay & Lewandowsky (2015) found that there is similarly better memory for a given list when there is greater rest BEFORE the list. This supports a modified interference theory based on temporal context.
® Temporal context suggests that interference between items depends on how closely the items were studied in time. There is more interference from items that are studied closer in time to each other.
® Pre-study and post-study rest produce greater separation between lists, and this leads to less interference between them.
Inhibition theory:
® Emerged with the discovery of retrieval-induced forgetting – where remembering something causes the forgetting of other, associated material.
The retrieval-induced forgetting (RIF) paradigm:
® Study phase: learning pairs of categories and exemplars (e.g. fruit & apple as a pair).
® Retrieval practice: some of the studied exemplars from some of the categories are tested.
® Test phase: you are tested on all of the initial category-exemplar pairings.
o Rp+ items: practiced items from practiced categories.
o Rp- items: non-practiced items from practiced categories.
o Nrp items: non-practiced items from non-practiced categories. These function as a control condition.
® RIF is evident if Rp- items are recalled worse than Nrp items (an impairment of recall due to retrieval practice on Rp+ items).
® Anderson, Bjork & Bjork (1994) argued that RIF provides evidence that forgetting is due to inhibition. When attempting to remember an item, other items from the same category are inhibited or suppressed (e.g. when cued with ‘fruit’ and ‘orange’ is retrieved, other exemplars such as banana & pomegranate are inhibited and become more difficult to retrieve).
Strength dependence of inhibition theory:
® Inhibition theory makes a unique prediction – that strong competitors during retrieval practice are suppressed (to make it easier to remember what we need to), and therefore should exhibit large RIF effects (strength dependence of RIF). Weak competitors do not need to be suppressed.
® Anderson et al. (1994) tested between the predictions of interference theory & inhibition theory, using categories with strong and weak associations to the cues. Strong Rp- exemplars should show the strongest RIF effects, as these should come to mind during retrieval practice and therefore require suppression.
Weak Rp- exemplars should show almost no RIF effects, as they do not intrude during retrieval practice & therefore do not need to be suppressed.
® Anderson found that only STRONG Rp- items suffered from retrieval-induced forgetting.
Cue independence of inhibition theory:
® Inhibition theory makes a cue-independence assumption about RIF. When an item is suppressed, it is harder to remember regardless of the cue. This is in contrast to interference theory – in which forgetting is cue-dependent.
® Retrieval-induced forgetting should therefore also be evident with novel cues.
® Anderson & Spellman (1995) tested and found confirmation of the cue-independence assumption.
Retrieval dependence assumption of inhibition theory:
® In inhibition theory, items are suppressed due to the competition that emerges at retrieval.
® If there is no retrieval competition, there should be no RIF (the retrieval dependence assumption).
® Anderson et al. (2000) found that having exemplars cue their own category does not cause RIF. Restudying also does not cause RIF.
® Interference theory does not make this same assumption – it says that any time you strengthen associations (whether via learning or via retrieval), you should get more interference.
Þ However, the strength-dependence assumption has not generally replicated, and did not generalize to other kinds of strengthening operations (such as repetition).
Þ Jakab & Raaijmakers (2009) found that RIF was observed for both strong and weak items, consistent with interference theory.
Week 10 – False Memory:
Memory errors:
® We are more likely to confuse things that were studied nearby in time. Drewnowski & Murdock (1980) found that most intrusion errors come from the immediately preceding list – intrusions from distant lists are much less likely.
® Learned content that is similar leads to greater memory errors. For example, Underwood (1965) found higher levels of false alarms (erroneously saying ‘yes’ to novel items) if the novel items were associates of studied items. For example, if ‘dog’ was a studied word, there are higher false alarm rates to words such as ‘canine’ or ‘cat’, and lower false alarm rates to words such as ‘couch’ or ‘groom’.
o Underwood argued that these results were due to implicit associative responses – when you learn a word, you might think of other related words.
o For example, when studying ‘boy’, you might think of the word ‘girl’. ‘Girl’ then gets associated with the current situation, which makes you more likely to false alarm to it when presented with it later. Even events that we imagine or are reminded of can be a source of false memories.
® Even when participants were given a forewarning about false memories (Gallo, Roberts & Seamon, 1997), false memories were reduced but still NOT eliminated.
® Robinson & Roediger (1997) found that false recognition increases rapidly as the number of studied associates is increased. In other words, the more similar the content we learn, the easier it is to falsely recognize or remember something that is similar to what we experienced.
Why do false memories occur in the DRM (Deese-Roediger-McDermott) paradigm?
® Generation of the critical lure during learning. For example, when you see words like ‘sheets’, ‘dream’ and ‘slumber’ you might be reminded of the word ‘sleep’. This makes it more likely that you will falsely recall or recognize that word later.
This is referred to as a source monitoring error – false memory occurs because we are unable to distinguish between a real event (a presented word on a list) and an imagined one (generating the word ourselves in our own minds).
® Generation of the critical lure during learning can be tested using:
o The overt rehearsal procedure: having participants rehearse the words out loud as they study them.
® One other possibility is that false memory occurs because we use the ‘gist’ of a learning episode in conjunction with real memories to reconstruct the event. This is referred to as fuzzy trace theory (Brainerd & Reyna, 2002). This theory suggests that we form two types of memory traces when we learn: ‘verbatim’ traces – real memories that are incomplete or contain encoding errors (e.g. memory records of the specific pieces of dog artwork that you saw) – and ‘gist’ traces – an overall sense of what was learned in an event or study list (e.g. knowing that you just looked at some artworks of dogs). At retrieval, both verbatim and gist memory traces are used to reconstruct memories. According to fuzzy trace theory:
o Forewarning doesn’t eliminate false memory because gist is still a useful or necessary cue to reconstruct an event.
o Increasing the number of associates increases false memory because you get stronger extraction of gist traces (it becomes easier to tell what the gist is).
o False memories still occur even if they are not generated at study, because gist traces alone can support them at retrieval.