Language 3-4 Transcript PDF

Summary

This document discusses visual word recognition, language exposure, and semantics. It covers the historical context of the field and the use of early devices to measure response times to stimuli.

Full Transcript

SPEAKER 0 Good morning, everyone. I would like to start. I hope you can all hear me and see the screen; if there are any issues, just let me know. Welcome to the second lecture I am giving you on visual word recognition, as part of Language 3 and 4. In the first part of today's lecture I will focus on word frequency, one of the most important factors in word recognition. I will talk a bit about language exposure and how we can measure it, and also about print exposure, exposure to the written word. Then I will discuss factors that influence word recognition, such as word frequency, but also contextual and semantic diversity and lexical similarity. I mentioned yesterday that words can be very similar in terms of their written form, so we will look at orthographic neighbours, and then return to some of the models I discussed yesterday and see how they can account for, for example, word frequency and lexical similarity effects. After the break I will talk about semantics: how can we represent the meaning of words, and how might meaning be represented in the brain? Okay. Word frequency is very interesting because we can go back a long way in history: the effect was already observed and noted by a researcher called James McKeen Cattell. Cattell is really interesting because he was a PhD student in the 19th century. He was from America and did his PhD in Leipzig under Wilhelm Wundt, whom you have probably heard of: a very famous psychologist, one of the fathers of experimental psychology, and one of the people who tried to quantify things and apply a truly scientific approach to psychology.
In the 19th century, many people considered psychology a kind of pseudoscience, and people like Cattell and his supervisor Wundt were really trying to apply the scientific method to what we would now call cognitive psychology. It was not called that at the time, but it covered things like visual perception and language. Cattell used a device called the chronoscope to measure response times precisely. This is the device they used. It enabled them to measure with millisecond accuracy how quickly people could respond to stimuli, and this was the very first time anyone had measured how long it takes humans to respond to particular stimuli. They used extremely simple stimuli, just lights of different colours, but they could also present, for example, letters, letter strings, and pictures. How did they do that? At that time, of course, they had no monitors. They used printed cards that were made visible to the participant at a precise moment. You can see that a little bit here: the device made the card, a printed piece of paper, visible, and at that moment a clock started. The participant then pressed a button, so the experimenters could measure, with millisecond accuracy, how long it took to respond once something was presented. For that time it was extremely interesting and remarkable that they could measure responses with millisecond accuracy. And they did not only look at button responses: the device also had this part here, which you can see is like a funnel.
Participants were asked to name a word or a letter aloud. So how do you measure that response? They used this funnel together with a kind of membrane that starts vibrating when someone begins to speak; that vibration could be picked up and used, via an electrical circuit, to stop the clock. Extremely sophisticated for that time. Cattell wrote a series of papers in which they tried various things. It was extremely exciting: no one had ever measured response times before, and they were the very first to try. For example, they presented a letter and recorded how quickly people responded to it, say responding to the letter B while being presented with different letters. They also presented words. And what they found, which is very remarkable, is that people were just as fast at responding to a particular word as to a particular letter. You would think it takes longer to respond to a word, because a word consists of multiple letters, but no: they found it takes about the same time. So they already argued that words may be stored as a whole, as a single perceptual unit. That was an extremely important insight that they had already started looking into. Cattell examined many different things; for example, he also posed the question of whether there is a difference between monolinguals and bilinguals in responding to words in a second versus a first language. Very fundamental questions that many other researchers have picked up on since. After completing his PhD in Leipzig under Wundt, Cattell went to Cambridge to teach, and then became the first professor of psychology in the United States.
He was an extremely influential figure in the development of psychology in the United States. He was, for example, one of the early editors of the famous academic journal Science, and he was also one of the founders of the American Psychological Association. A really interesting person, and it is fascinating to go back to the papers written at the end of the 19th century. One of the other things Cattell noticed was that words differ in how common they are in the language. He essentially argued that the frequency with which words are used in a language influences the time needed to recognise them in print. He found that words that are common in a language, for example the word car, are recognised faster than uncommon words. He was the very first person to observe this particular phenomenon, and of course many researchers after him studied this word frequency effect. It is now a well-established finding that the process of identifying a single word is very sensitive to word frequency. The question, then, is whether this holds only in one particular task or whether you observe it across multiple tasks. This is something Monsell started looking at. He examined word frequency effects in lexical decision and also in semantic categorisation. Semantic categorisation is another commonly used task: you ask people to decide which of two categories a stimulus belongs to; here, whether it denotes a person or an inanimate thing. They also used a word naming task, something Cattell used as well. So what kind of materials did they use? Here you see some examples.
There were high-frequency, medium-frequency, and low-frequency words in the different categories, for example student and desk at high frequency, window and furnace in the middle, and tyrant at low frequency. So what did they find? They found very clear effects: if you look at persons versus things, you can clearly see a frequency effect. People are slower at making the categorisation, especially when it is about things. But overall you see a nice effect of frequency: high-frequency words are responded to faster than medium-frequency words, and those faster than low-frequency words. So you have a nice linear pattern, and the effect is especially strong going from medium-frequency to low-frequency words. And there was no interaction between task and frequency. Now this is an extremely well-known effect. In any experiment that manipulates frequency you find these very strong effects, particularly in the lower range. That also means that in any experiment involving words, you need to be aware that word frequency will likely play a role. If you are not interested in word frequency, you have to make sure that the words you use are matched on it. The question, then, is how we determine what the frequency of a word is. How do you estimate how common something is? This is a really critical question, because how do we decide that something is low or medium frequency? People generally have a good intuition about what is high frequency, but medium versus low is more difficult. One way to do it is to use a large corpus of text and calculate how often each word occurs in it. One of the first teams to do this was Kučera and Francis.
Kučera and Francis (1967) compiled a corpus from a variety of material, magazines and books, amounting to one million words, which at the time was of course massive. That allowed them to obtain counts of how often words occur in this one-million-word corpus. Some words might occur only once, others 100 or 200 times. Later on, other corpora were developed. Famous ones are the CELEX corpus, developed in the Netherlands, which covers English, Dutch, and German, and, in Britain, the British National Corpus, a large collection of texts running to hundreds of millions of tokens, that is, individual words. More recently there is also the Google Books corpus. What Google did is scan all the books they could get hold of, not only from the last ten or twenty years but from the very first prints. So this is an extremely massive corpus of books, and it is really interesting because you can look at how often words occurred at a particular point in time: for example in books printed in 1920, the 1930s, or the 1960s. You can go to the website, type in a word, and see how its frequency changes over time. Of course, certain words were used often in books in the early 20th century, while other words only became common later. Something like mobile phone was not really used in the 19th century, because it did not exist. So you can see the arrival of new things in our environment reflected in the usage of those words. Really interesting.
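To make the idea of corpus-based frequency counts concrete, here is a minimal Python sketch of how a per-million frequency can be computed from a token list. The function name and the toy ten-token corpus are invented for illustration; real corpora contain millions of tokens.

```python
from collections import Counter

def per_million(tokens):
    """Return each word's frequency per million tokens of the corpus."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n * 1_000_000 / total for word, n in counts.items()}

# Toy 10-token "corpus"; Kucera and Francis worked with one million tokens.
corpus = "the car drove past the house and the car stopped".split()
freqs = per_million(corpus)
# 'the' occurs 3 times in 10 tokens -> 300,000 per million
```

The same scaling is what lets counts from corpora of different sizes be compared directly.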
Now, one important point is that, because we know word frequency plays such an important role, all studies using and investigating words have relied on these counts, and most often they used the Kučera and Francis count based on one million words to get an idea of how frequent a word is. But this corpus was developed in the 1960s, and it was actually based on even older material from the 1930s and 40s, so it is relatively old. What Brysbaert and New did in 2009 is take this one-million-word corpus of Kučera and Francis and compare it to something more recent: a corpus based on subtitles from television series and films. They developed the SUBTLEX-US database, which provides word frequencies based on film and television subtitles. What they found is that the estimates from Kučera and Francis were actually very poor compared to this subtitle-based corpus. They looked, for example, at how well each could predict response times in lexical decision and naming tasks, and the Kučera and Francis counts did poorly: the subtitle databases are much, much better at estimating word frequencies and at predicting how fast responses will be in an experiment. This is quite crucial, because ten or twenty years ago the Kučera and Francis norms were used in almost all studies out there, and they are simply not very good frequency estimates for experiments with people. Nowadays we really should not be using them. Unfortunately we still see researchers in the literature using these old Kučera and Francis norms, which is not good. So basically this study showed that we should be using more modern corpora to get good estimates.
Of course, that work was based on American English, but we now have many of these subtitle-based corpora in many different languages, which allows us to get a very good estimate of how frequent a particular word is in a language. Okay, the next question, then: if we think about books and the Google Books corpus, you might think it would give a very good estimate. The problem with the Google Books corpus is that it contains all the books published from the early 19th century until now, and of course no one has read all those books, so it is not a good estimate of our actual exposure. So the question is: how do we estimate how much language we are actually exposed to? That is an important question. How much language do we read, how much do we hear, and how much do we speak? This question was addressed by Brysbaert, who tried to estimate how much language people are actually exposed to, because that gives us an idea of how much language there is and what the effects of this exposure might be. How did he estimate that? He made use of a paper published in 2007 that asked a very interesting question: are women more talkative than men? If you ask a random person in the street, what would they say? They would likely give the common answer: women talk more than men. Okay, but can we actually test that? That is what Mehl and colleagues did. They ran a very large study with almost 400 participants. How did they do it? They gave people a recording device that they wore all the time, and at regular intervals of a few minutes, 30 seconds of ambient audio was recorded.
So it captured whether the person was saying anything or listening to anything, over a long period of about 2 to 10 days. They used these samples to transcribe the speech and get an estimate. And what they found in this study is that there was absolutely no difference between men and women: both speak about 16,000 word tokens each day. Now, if you speak 16,000 tokens a day, it is likely that you hear a similar amount, because someone talks back to you. So in terms of speaking plus listening it is roughly double, about 32,000 tokens a day. That allows us to get an idea of the exposure. One caveat is that the participants in this study were students, and you might think students talk more, so perhaps this is not typical of other people. Another study a couple of years later looked at language exposure in the elderly, with people aged 64 to 91. Using a similar technique, a device that recorded and sampled their environment, they found a very similar number: again about 32,000 words. The nature of the exposure was somewhat different: rather than actively speaking, they were more often passively listening, for example to radio or television. But although the manner was different, they were still exposed to, still hearing, language. Now if you take those numbers and do a simple calculation: 32,000 words a day times the number of days in a year means we are exposed to about 11.69 million words each year. So a 20-year-old has been exposed to roughly 234 million words, and a 60-year-old to around 700 million words. That is massive: our exposure to language is enormous.
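The exposure estimate just described is simple arithmetic; here is the calculation written out, using 365 days per year and the 32,000-tokens-per-day figure from the studies above:

```python
WORDS_PER_DAY = 32_000      # speaking plus listening, per the studies above
DAYS_PER_YEAR = 365

words_per_year = WORDS_PER_DAY * DAYS_PER_YEAR      # about 11.7 million
exposure_at_20 = 20 * words_per_year                # about 234 million
exposure_at_60 = 60 * words_per_year                # about 700 million
```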
Of course, this exposure is likely to shape our language system and to influence how we process and use language, and it is likely what drives the word frequency effect: certain words are common, others are rare, and this is driven by the language exposure we have. It probably affects many other aspects of our language as well. So word frequency is considered a very important factor. However, it has been questioned whether it really is frequency that drives the differences between common and rare words. Adelman, in 2006, looked at the effect of what he called contextual diversity. Contextual diversity is basically the number of different documents in which a word occurs, where different documents can be seen as different contexts. A word might appear in a factual book about nature, or in a storybook: different types of documents in that sense. So he examined whether this new measure, contextual diversity, whether a word occurs in many different documents or only a few, can explain more of the variance in response times in lexical decision. Lexical decision, to repeat, is simply deciding whether a letter string is a word or not, a very simple task in which you measure the response time; alternatively you ask people to name the word aloud and measure that response time. So which measure best explains the variance? They analysed different word frequency corpora against lexical decision and naming data, using large sets of reaction times from participants responding to words.
What they found in regression analyses is this: factors like word length explain some variation, and we know length also influences response times, something Cattell had already found in the 19th century. If you then add frequency, you can explain more variance, and if you add contextual diversity, more variance still. Crucially, contextual diversity was more predictive of reaction times than word frequency. That potentially raises the question of whether it is really word frequency at work, or contextual diversity. This has led many other researchers to investigate the issue; the two factors are very difficult to disentangle and are also highly correlated, so it is still debated. Then there is a further suggestion, investigated by Jones in 2012: maybe what matters is not really contextual diversity but semantic diversity. What is semantic diversity? It relates to the fact that a word like bank can occur in many similar documents, for example about mortgages, or in very different documents, about mortgages and about the bank of a river. Although the word may have a massive document count, because there are lots of documents about mortgages, it might always appear in the same kind of semantic environment rather than in genuinely different environments. So the raw document count might not be the best measure, and he started looking into this semantic diversity. Again he used reaction times, lexical decision and naming latencies obtained in a large project called the English Lexicon Project, in which many participants performed lexical decision and naming, yielding response times for tens of thousands of English words.
They then looked at document count and semantic diversity, and they found interesting effects of both. Particularly for words with a high document count, you see that semantic diversity plays a really important role; semantic diversity seems to be a crucial factor as well: in what kinds of context does the word occur? When they looked specifically at the lexical decision and naming times, comparing the semantic diversity count, how many semantically diverse contexts the word occurs in, with the document count, which is basically Adelman's contextual diversity measure, they found that the amount of extra variance explained was much higher for the semantic diversity count. This suggests that semantic diversity is really important in explaining variance in reaction times, an additional factor on top of word frequency. So clearly our language system is also shaped by whether something occurs in different semantic contexts or always in the same context. Okay. Next I am going to talk a little more about print exposure, because this is also quite interesting. How much influence does it have if, for example, you do a lot of reading and have a lot of exposure to print? This is something Chateau and Jared investigated: the impact of print exposure on word recognition. If you have read more books and had more exposure to the written form, does that influence your word recognition system? They tested 64 participants, quite a large group, across different tasks, including lexical decision.
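One simple way to think about semantic diversity computationally is to represent each document a word occurs in as a topic vector and take the mean pairwise dissimilarity between those vectors. This is a much-simplified sketch of the idea; the actual measure was built on distributional semantic models, and the two-dimensional vectors below are invented.

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def semantic_diversity(context_vectors):
    """Mean pairwise dissimilarity (1 - cosine) among the contexts a word
    occurs in: high values mean semantically varied contexts."""
    pairs = list(combinations(context_vectors, 2))
    return sum(1 - cosine(u, v) for u, v in pairs) / len(pairs)

# 'bank': three mortgage-like contexts plus one river-like context.
bank = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.1, 0.9)]
# 'desk': four office-like contexts -- same document count, less diversity.
desk = [(0.9, 0.1), (0.88, 0.12), (0.92, 0.08), (0.85, 0.15)]
```

Note that both words have the same document count of four, yet bank comes out as far more semantically diverse, which is exactly the distinction the document-count measure misses.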
They also looked at form priming, the masked priming I talked about yesterday, and at word naming. They measured print exposure using an author recognition task: they gave people a list of names and asked them which ones they recognised as authors of books. Of course, some foils are included, so participants simply have to detect which are the real authors in the list. That gives an estimate of print exposure. It is, of course, not a direct measure; it is very difficult to measure exactly how much exposure to books someone has had, so it serves as a useful quick estimate. What they found in the lexical decision task is that people who have read many books differ hugely in response times from people who have not: high-exposure readers are much faster in lexical decision. You can also see that their frequency effect is smaller compared to the low-exposure group, and the difference lies particularly in the low-frequency words, which makes sense: low-frequency words do not occur very often, and if you read more, you are more likely to have encountered them. They remain low frequency, though, because there is still a significant difference between high- and low-frequency words, both for people who have read a lot of books and for those who have read only a few. So this shows there is an impact of print exposure. Here is some other data on naming and priming. For word naming, the difference between one-syllable and two-syllable words, we again see that people with high exposure to books are much faster than people with little exposure.
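Author recognition checklists of this kind are typically scored as the proportion of real authors checked minus the proportion of foils checked, to correct for guessing. A sketch of that scoring, with obviously fake foil names and an illustrative set of real authors:

```python
def art_score(checked, real_authors, foils):
    """Proportion of real authors checked minus proportion of foils checked."""
    hits = len(checked & real_authors)
    false_alarms = len(checked & foils)
    return hits / len(real_authors) - false_alarms / len(foils)

real = {"Margaret Atwood", "Ian McEwan", "Toni Morrison", "Stephen King"}
foils = {"Fake Author A", "Fake Author B", "Fake Author C", "Fake Author D"}

# A reader who checks two real authors and one foil: 2/4 - 1/4 = 0.25
score = art_score({"Margaret Atwood", "Stephen King", "Fake Author A"}, real, foils)
```

Subtracting false alarms means a participant who simply checks every name scores zero, so the measure rewards genuine recognition rather than liberal guessing.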
Then they found something really interesting in the form priming. Think back to my lecture yesterday, where I talked about form priming. They used items like this example: the target word is couch, and the prime is either touch or an unrelated word. In one case the prime is completely unrelated; in the other it is related in orthography, touch and couch differing by only one letter. But critically, the pronunciation of the letters o-u-c-h is different in the two words: touch versus couch. This letter group has a particular name: it is called the body of the word. So the pronunciation differs, but in terms of orthography the words are very similar. Now they ran this masked priming experiment with high-exposure and low-exposure readers. Interestingly, for the people with low print exposure they found no significant effects at all: numerically they seemed slightly sensitive, but nothing was significant. However, for people with high print exposure, who have read a lot of books, they found a really interesting pattern: people were indeed faster for touch-couch than for the unrelated pairing when the prime was presented for just 30 milliseconds, while with a slightly longer prime of 60 milliseconds the difference was not significant. Remember from yesterday that studies of orthographic priming found facilitation with short prime durations. This is basically a replication of that finding, but it shows the effect can only be replicated in people with high print exposure; it does not seem to be significant with low print exposure.
Furthermore, the interesting finding was that for low-frequency words you do not see an orthographic priming effect at 30 milliseconds; what you actually see is an inhibitory effect that becomes significant at 60 milliseconds. Why does it become inhibitory? Think back to yesterday's lecture on orthographic and phonological priming: orthographic priming effects occur early, and if you present the prime slightly longer, phonological effects emerge. That seems to be exactly what is happening here. And why is it negative? Well, think about the example pair: the two words are pronounced differently even though they share the same letters, so the shared letter group sounds different in each word. That is why there is an inhibitory effect: responses actually get slower if you present the prime for a bit longer with a low-frequency target. The low-frequency example is the pair clown and blown for the related condition, together with an unrelated control prime. Again the letters are the same, but the pronunciation of that letter combination, o-w-n, differs between the words, and that is what drives the inhibitory effect. The interesting thing is that these effects, which are very consistent with what I talked about yesterday, only occur for people with high print exposure. Now, print exposure is also interesting in another respect: print exposure in children. There is a lot of research on this that I cannot fully cover here, but it is well established that print exposure affects not only vocabulary but also general knowledge in children, and we now know that it enhances word recognition processes, as we have seen in adults.
In fact, there is a large meta-analysis of studies investigating the impact of print exposure from infancy to early adulthood, and it concluded that print exposure in readers from age 3 to 5 was associated with all language skills. That means that reading, for example reading routines for children in school, provides really substantial advantages for oral language growth. And it is not only about reading as a single skill: print exposure, reading for pleasure, also improves academic success, for example among college and university students. So print exposure has a massive influence, not only on language but on academic performance generally, based on this large analysis of many studies. Okay. So far I have talked about word frequency, contextual diversity, semantic diversity, and print exposure, all really interesting, and mostly from the perspective of single words. A really interesting question is whether these frequency effects extend beyond single words: are there frequency effects for longer units? This was investigated in 2010 by Arnon and Snider, who looked for frequency effects of multi-word phrases, that is, successions of multiple words, in particular four-word sequences, which they called four-grams. They obtained the frequencies of those phrases from a large telephone conversation corpus, using the counts converted into frequency per million. Here are examples of the kind of material they used: a group of relatively high-frequency phrases compared to a control condition, for example "don't have to worry" versus "don't have to wait".
So "don't have to wait" is less frequent than "don't have to worry"; or "I don't know why", where one variant is more frequent than the other; or "I don't have any money" versus "I don't have any place". According to the corpus, these phrases differ in frequency. Now, the question is what happens if you give people such four-word phrases, including not only valid phrases but also sequences that make no sense and are not correct English, and ask them to perform a phrase decision task: is this a possible sequence in English? They tested 26 students, and what they found is that the frequency of the phrase indeed affects the phrase decision task: high-frequency phrases were responded to faster than low-frequency phrases, which took around 1,100 milliseconds. They also found this in the low-frequency bin, so in both cases phrase frequency influences the phrase decision task. This seems to suggest that frequency effects extend beyond the single word, which is a very important finding. This was investigated further, using different types of material, by Siyanova-Chanturia and colleagues; she was actually a PhD student of mine. She looked at what we call binomial expressions: three-word phrases consisting of two content words of the same lexical class joined by a conjunction. Here are some examples: bride and groom, or king and queen. Critically, they looked not only at these phrases but also at the reversed forms, and not only in native speakers but also in non-native speakers, using very naturalistic material: sentences containing those binomials or their reversed forms.
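Four-gram frequencies of the kind just described can be obtained by sliding a four-word window over the corpus tokens and counting. A minimal sketch with a toy token list; the actual counts came from a large telephone-conversation corpus.

```python
from collections import Counter

def fourgram_per_million(tokens):
    """Count every four-word sequence and scale to occurrences per million."""
    grams = [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]
    counts = Counter(grams)
    return {g: n * 1_000_000 / len(grams) for g, n in counts.items()}

tokens = "you do not have to worry you do not have to wait".split()
freqs = fourgram_per_million(tokens)
# ('do', 'not', 'have', 'to') occurs twice among the 9 four-grams here
```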
So the reversed form is basically, rather than "bride and groom", "groom and bride", or "king and queen" becomes "queen and king". Now the important thing is that the meaning is identical; it's exactly the same. "King and queen" or "queen and king": it's exactly the same thing. However, if you ask people, would you say "king and queen" or "queen and king", likely people would say it's "king and queen", or it's "bride and groom". So this particular order seems to be more common; it's the common way in which this phrase is used. And that means that the reversed form is the perfect control condition, because you could use a different phrase, like they did in the previous study, but then there might be other aspects that differ. The frequency of the individual words, "bride" and "groom", is exactly the same in one order and in the reversed order; it's the frequency of the unit as a whole that is different. So that kind of material is actually the best-controlled material to use. The other thing they did is they used an eye tracker, a device to measure where people are looking. Now, a bit more about eye tracking will be explained by Ruth Phillips in her lectures next week, because she will talk a lot about eye-tracking data. All you need to know for now is that it's a device that enables us to see where people are looking when they are reading a sentence. For example, we can then look at where they are looking in a word and how long they are looking at the word. Where they're looking is the fixation; how long is the fixation duration. And if it's a sequence like "bride and groom", they are likely looking maybe two or three times at those words, so we can look at the total time they look at that phrase, "bride and groom", and we can look at the first time they look at it.
And when they continue with the sentence, they might go back to the same sequence and look at it again. So you can look at the total time, at the first time they look at it, and at how many times they look at it. That's all you need to know for now. It's a very naturalistic way to look at something all of us do a lot, reading, and a very sensitive measure. So what are the results of this study? They looked not only at native speakers but also at non-native speakers. And what they found is that there is clearly an effect of the binomials versus the reversed form. We see here a significant difference of about 13 and 14 milliseconds on two eye-movement measures: the first-pass reading time, which is the first time you look at "bride and groom" and how much time you spend there looking at it, and the total reading time, which is not only the first time but also when you look back at it. Because when you are reading, you tend to read from left to right, of course, but sometimes you go back in the sentence and read through some of the words again. On both of these measures there was a significant difference. It was small, but the looking times themselves are also very short, and it was significant, and the same pattern was found in both groups. So both native speakers and even non-native speakers are sensitive to these combinations of words, one being more frequent than the other. Readers are sensitive to the frequency of multi-word units, which is a really important finding. Okay, so how can we explain word frequency? How can we account for it in models? I mentioned it a little already in my lecture yesterday: you can account for it, for example, by assuming that in the interactive activation model each word at the word level has a baseline, resting-level activation. More frequent words have a resting level closer to zero than less frequent words.
And this allows the model to make a distinction between high-frequency words and low-frequency words. This has been used not only in the interactive activation model but in other models as well; basically all models use this mechanism to account for word frequency effects. Okay, so that's word frequency. Now I'm going to spend a few slides before the break on lexical similarity. This is also something I talked about a little yesterday. We know that some words are very similar to each other because there is only one letter difference between them: words where, when you change one letter, you get a different word. These kinds of similar words are called orthographic neighbours. So for example, the word "mine" has orthographic neighbours like "pine", "line", "mind", "mint", etc. Actually there are about 29 of them, so quite a lot. And we are looking here only at the spelling; we are not interested in the phonology, the sound of those words, because some of them sound quite different from "mine". The only thing we are focusing on with orthographic neighbours is the spelling. Now, there have been lots of studies looking at whether a word having only a few of those orthographic neighbours, or a lot of them, has an impact on responding to words in, for example, a lexical decision task. One of the early studies was by Sally Andrews. She used a lexical decision task and manipulated the number of similar words, orthographic neighbours, in words that are low frequency or high frequency. And what she found was a significant interaction between frequency and the number of orthographic neighbours, which is also referred to as neighbourhood density, so either low density or high density.
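The one-letter-substitution definition of an orthographic neighbour is easy to make concrete. A minimal sketch, assuming the classic same-length, one-letter-different definition; the mini-lexicon below is illustrative only (a real count, like the roughly 29 neighbours of "mine", requires a full lexicon).

```python
# Sketch: orthographic neighbours under the classic definition
# (same length, exactly one letter different). Toy lexicon only.

def is_neighbour(w1: str, w2: str) -> bool:
    """True if the words have the same length and differ in exactly one letter."""
    if len(w1) != len(w2):
        return False
    return sum(a != b for a, b in zip(w1, w2)) == 1

def neighbours(word: str, lexicon: list[str]) -> list[str]:
    """All orthographic neighbours of `word` found in `lexicon`."""
    return [w for w in lexicon if is_neighbour(word, w)]

lexicon = ["mine", "pine", "line", "mind", "mint", "mane", "wine", "time"]
print(neighbours("mine", lexicon))  # pine, line, mind, mint, mane, wine
```

Note that "time" is excluded: it differs from "mine" in two letter positions, so it is not a neighbour under this strict definition, which is exactly the limitation the OLD20 measure discussed later was designed to address.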
And the critical finding was that for low-frequency words, she found that a larger number of orthographic neighbours sped up the response compared to a small number of orthographic neighbours: there was a 43-millisecond facilitation effect of a dense neighbourhood. However, this was not found for the high-frequency words; for high frequency there was actually a tendency to be slower. So for low frequency, faster; for high frequency, a tendency to get slower, but that was not significant. Now, this was then investigated by many, many studies, and there has been quite a bit of discussion about the impact of neighbourhood density. Some studies have found facilitation, like Andrews, particularly for low-frequency words; other studies have found no effects at all; and some studies have found inhibition effects. It also seems that the language might play an important role: in some languages you find more inhibition effects than facilitation effects. So this is still somewhat debated. The other thing is that some people looked at the frequency of those orthographic neighbours, because they could be high frequency or low frequency, more frequent than the target word or less frequent. And this seemed to be very important too: a number of studies have found that when a word has higher-frequency orthographic neighbours, you tend to see inhibitory effects. Also, the task seems to matter: in word naming, for example, you see more facilitation effects of orthographic neighbours. Now, how can we account for orthographic neighbour effects in the interactive activation model? Let's go back to the interactive activation model. The crucial thing is to look at the word level, and what we know about the word level in this model is that there is lateral inhibition: all words inhibit all other words.
So there is always competition between words. That means that when there are lots of other similar words, it is likely to take more time for the model to recognise a word: it takes more time to reach the activation threshold for words with a large number of orthographic neighbours relative to words with a small number of orthographic neighbours. And this lateral inhibition is also present in many other models, so they all basically predict that things should get slower. But we see from the data that sometimes people actually get faster. So the interactive activation model predicts inhibitory effects; how can we reconcile, within an interactive activation model, the fact that sometimes people find facilitation and sometimes inhibition? This was investigated by Grainger and Jacobs. They proposed a variation of the interactive activation model with a more sophisticated decision component: the moment the model decides this is a word or it is not a word. The model makes use of three criteria; the main ones I'm going to talk about are the single-word threshold, the M criterion, and the sigma criterion, which refers to the overall activity at the word level. What Grainger and Jacobs showed is that if you base your decision not only on whether a single word reaches its threshold, but also assume that when there is a lot of activity at the word level, the letter string you are seeing is likely to be a word, then you can make a decision based on that information alone, which means you can respond very quickly. The sigma criterion is the summed activity of all active words, and what it does in the model is this: when there is a lot of activity, it lowers the activation threshold, and so the word is recognised more quickly.
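The core of that sigma idea can be sketched very simply. This is a toy illustration, not the actual equations or parameter values of Grainger and Jacobs's model: the activation states, the linear threshold rule, and the constants here are all made up to show the direction of the effect (more summed activity at the word level lowers the decision threshold).

```python
# Toy sketch of the sigma criterion: summed word-level activity
# lowers the "yes, it's a word" threshold. Parameters are invented.

BASE_THRESHOLD = 0.7

def decision_threshold(activities: dict[str, float],
                       base: float = BASE_THRESHOLD, k: float = 0.05) -> float:
    """More total activity at the word level -> lower decision threshold."""
    sigma = sum(activities.values())
    return max(0.4, base - k * sigma)  # floor keeps the threshold sensible

# Hypothetical activation states after a few processing cycles:
few_neighbours  = {"jazz": 0.55, "jack": 0.05}
many_neighbours = {"name": 0.50, "game": 0.20, "fame": 0.20,
                   "nave": 0.15, "dame": 0.15}

print("few neighbours :", round(decision_threshold(few_neighbours), 3))
print("many neighbours:", round(decision_threshold(many_neighbours), 3))
```

The word with many active neighbours ends up with the lower threshold, so it can be accepted as a word sooner, which is how the model turns what lateral inhibition alone would predict to be a cost into a potential benefit in lexical decision.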
Now, how would that work in the model? Here we see some simulations with the interactive activation model, with the word "jazz", the word "type", and the word "name" as examples. We can see that there are differences in the activation curves because of orthographic neighbours: here we see a few orthographic neighbours, there we see a lot of orthographic neighbours. So the standard interactive activation model would predict that if the frequency is equal between those words, this one would be faster than this one, and this one would be the slowest. However, the data have shown that sometimes this one is actually fast. How could that work? Well, if you take into account the total activity: here there is much more activity than there. So when there is a lot of activity, the model can lower this red line, the threshold, downwards, and in that way the response would be much faster for this word than for that word. That is the sigma criterion, the summed activity: if you take that into account, you can explain why people might be faster for a word like "name" than a word like "jazz", just because it generates a lot of activity. The system can in effect say, well, this is likely to be a word, and press the yes button in a lexical decision task. Okay. Then finally, two things about further orthographic similarity, because so far I've talked about orthographic similarity only in terms of one letter being different. Of course, you can look at similarity in different ways, and that is what Yarkoni et al. did. The standard definition calculates similarity just by looking at a one-letter difference. What Yarkoni et al. did is look at what is called the Levenshtein distance between letter strings. This is a well-known concept in computer science: it's the number of edit operations needed to go from one string to another string.
So if we take the letter strings "chance" and "strength", we need a number of editing operations, deletions, insertions, substitutions: you replace some letters, insert a letter, delete a letter, and step by step you go from "chance" to "strength". What they were interested in is this more sophisticated measure of similarity in terms of orthography, the written form. They looked at the average number of operations needed to get from a word to its 20 closest words in terms of these operations; they calculated that measure, which they call OLD20, and then analysed the impact of this orthographic similarity. If you have a lot of words that are only one letter different, the neighbours each require just one operation, a substitution, so all 20 words have the value one, and if you average that, you get one. An OLD20 value of one therefore means all 20 closest words are orthographic neighbours. If you have words that are similar but require two operations, the average will be larger. So small OLD20 values mean the closest words are very similar, while larger OLD20 values mean they are more distantly similar. Now the crucial finding is that when they analysed the reaction times of a large behavioural study of lexical decision and naming times, they found that the effect of OLD20 was facilitatory. That adds another dimension to the effects of orthographic neighbours: maybe some of it is driven by this more sophisticated measure of orthographic similarity, which looks beyond just a one-letter difference; it could also be a two- or three-letter difference. Okay, so that's orthographic similarity, word frequency, and print exposure.
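Both the Levenshtein distance and an OLD20-style average are easy to compute. A minimal sketch: the Levenshtein function below is the standard dynamic-programming algorithm, but the lexicon is a toy one, so instead of the 20 closest words we average over the 3 closest to show the idea.

```python
# Sketch: Levenshtein distance and an OLD20-style measure (mean edit
# distance to the N closest words). Toy lexicon; N=3 instead of 20.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest lexicon entries."""
    dists = sorted(levenshtein(word, w) for w in lexicon if w != word)
    return sum(dists[:n]) / min(n, len(dists))

print(levenshtein("chance", "strength"))  # 7 edit operations
lexicon = ["mine", "pine", "line", "mint", "miner", "mound"]
print(old_n("mine", lexicon, n=3))  # mean distance to the 3 closest words: 1.0
```

An OLD value of 1.0, as here, means the closest words are all one edit away, i.e. classic orthographic neighbours; a sparser region of the lexicon would push the average up.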
For this part, we have a short break now; let's start again at 10:00. You can register your attendance using this QR code. So we start at 10:00 for the second part, in which we'll talk about word meaning. Okay, let's continue. In this second part of my lecture I'm going to talk about word meaning. So far I focused mainly on orthography and spelling, and on phonology; I mentioned word semantics only a little. This part is really about semantics. I will talk about representations and processing of semantics, and then about some models of how the meaning of words could be represented. I will talk about feature and network models and then also about distributional semantic models. Okay. So, word meaning. An important question is: how is knowledge about word meaning stored in our minds? How is that really stored there? And not only how it's stored, but how do we actually access that information? How do we retrieve that information when, for example, we're reading, listening, or speaking? That information needs to be accessed. The first question is really about representation: how is it represented in our brains? And the second is really about processing. When I say represented in our brains, I don't mean represented at a neural level, but at a cognitive, psychological level, a more abstract level. That's what I'm talking about: not an actual implementation in terms of neurones, but a more abstract level, in terms of the representations we might have in our brain for processing and storing meaning. Okay, so the general idea is that we have an internal store of knowledge about words. That means we have something called a mental lexicon.
Now, in this mental lexicon, the idea is that you have stored, for example, information about how a word is written and also how it is pronounced. There might be other information as well, for example syntactic information: is it a noun or a verb? That is also very important information when you start producing a sentence, for example, or when you parse and understand a sentence; you need to know the syntactic category. So that information is stored in our mental lexicon. The idea is that you also have some kind of semantic memory for word meaning, some kind of store of those meanings. Now, what should a theory of how semantics are represented try to explain? For example, how words relate to things in the world. It should also explain lexical ambiguity: words can have multiple meanings; "bank", for example, has multiple meanings. Also the relationships between meanings: a dog is an animal, and "long" is the opposite of "short". And that you can do a kind of reasoning: you can say Fido is a dog, and that means Fido is an animal. So the relationships between "animal", "dog", "mammal", things like that, should somehow be represented. And when you say something is long, that implies it is not short. All that kind of information should be explained. A theory of meaning should be compatible not only with these facts that I explained here, but of course also with psychological data, for example response time data in experiments that you run with participants. And it's not only about accuracy; it's about the time it takes. That's the most important thing, actually, because that tells us something about the processes underlying it. So what do we know about semantic processing if we look at the literature?
So I already talked a little about masked priming where the prime was semantically related to the target, and we saw that this speeds up processing. That is related to the general empirical finding that processing a word involves activation of related words, or words that have a similar meaning. This is a well-established finding in the literature. So let's have a look at some data consistent with this idea. Balota and Lorch, for example, looked at prime-target pairs using the kind of priming technique I have mentioned many times. You have, for example, pairs like lion-tiger, which they call directly related, or tiger-stripes: a tiger has stripes, so there is some kind of semantic relationship between them. What they were interested in was not only these pairs, which involve direct semantic relations, but also pairs like lion-stripes. Now, lion and stripes: you would think, do they have any relationship? Well, they do, because lion is related to tiger, and tiger is related to stripes. So if the semantic system involves spreading activation, such that lion activates tiger and tiger activates stripes, then you would expect the response to "stripes" to be faster. The response is always to the target, and the prime is presented for a brief amount of time. What they found in this study was facilitation effects, relative to an unrelated word, for the directly related pairs, and this was found in the lexical decision task and in the naming task. But in naming they also found effects of these mediated primes: they found an effect of lion on stripes. So that is a quite interesting and quite important finding as well.
It was somewhat surprising that they did not find it in lexical decision, and later studies argued that maybe there were some issues with the material they used. Those studies then used better-controlled material and showed effects of mediated primes in lexical decision as well. This is really important when we look later at some models, to see whether those models are consistent with this kind of data; that's really important to think about. But these are some of the empirical findings. Now, this involved priming, so you might argue: maybe the priming itself is the reason why you get the effects of semantics. A really interesting question, then, is whether you see an effect of semantics, in terms of activation of other semantically related words, in other tasks, when you're not using a priming task. This was investigated by Jennifer Rodd, who looked at semantic activation using an interesting task and a really interesting manipulation. It involves words like "leotard", "cellar", and a word like "hamster". The task was very simple: is the word an animal? Now, in the case of "hamster", yes, it's an animal, so it's a yes response. "Leotard" is not an animal; it's a piece of clothing, so it's a no response. The control, "cellar", is also not an animal, so it requires a no response. Now, the critical difference between the control and the experimental word is whether there is an orthographic neighbour that is semantically relevant for the task because it's an animal. "Leotard" has an orthographic neighbour, "leopard", which is of course an animal, so it has a semantic animal neighbour, while the control word "cellar" also has an orthographic neighbour, "collar", but that's not an animal. Now, if the semantics of these orthographic neighbours, "leopard" and "collar", are activated, then maybe there is a difference between the experimental item "leotard" and "cellar".
And that would give us evidence that when you see a word, orthographic neighbours are activated, and not only the orthographic neighbours themselves: the meanings of those orthographic neighbours are activated too. And this is not a priming task; single words are presented here. In experiment one they used an animal task, and in the second experiment they used a task asking, is it a plant or not? Now, the findings were very clear. They found a massive effect for these experimental items like "leotard": responses to those words were much, much slower than responses to words like "cellar". The black line is the response to "leotard", and the white line is the response to a control word like "cellar". These are all no responses, of course: it's not an animal and it's also not a plant. But the critical thing is that when the task is about animals, you see a huge, significant difference, but not when it's about plants. So that's a really important finding, showing that, particularly when the meaning is relevant for the task, and it is really relevant for the task here, the meanings of those orthographic neighbours are activated and influence our process of saying no. So any theory of word meaning should be able to account for these kinds of findings. Let's go back, then, to representation: how can we represent meaning? There are many approaches, and I can only explain a few of them in a little detail. The more classical approach is a feature-based approach, and I will talk more in the next couple of slides about what that means: feature-based models. The next option is a kind of network representing how words are related to each other. And then another approach is what they call distributional semantic models, which are much more recent models.
Some of them are at the foundation of the latest AI models, for example. There are two types, count models and predict models, which I will briefly discuss as well; these are very sophisticated representations of the meanings of words. So let's start with the more classical approach. This was introduced in the 1960s and 70s, and the basic idea is that word meaning can be described in terms of a set of features, features that are bivalent: they are either there or they are not there. So we have, for example, the word "father", which has certain features: it's human, it's older, it's not female. Then we have "mother", which is of course human, older, and female. Then we have something like "daughter"; in that case it's not older, and so on. So you can define these words in terms of the different features they have. These are called feature theories, and the key claim of feature theories is that meaning can be decomposed into a finite set of primitive features. The classical idea is also that these should be universal across languages; they should not be language dependent. Then Smith, Shoben, and Rips argued that there are different types of features, and they made the distinction between defining and characteristic features. Defining features are the features that a particular object must have to be a member of the category: a bird needs to have a beak and wings to be part of the category bird. Then you have characteristic features: features possessed by most members of the category. For example, most birds fly and build nests, but not all birds fly and not all birds build nests. So these are characteristic features of particular birds, but not of all birds. Now, how can we determine what those features are?
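The bivalent-feature idea can be made concrete with the father/mother/daughter example from the slide. A minimal sketch: the feature inventory is the illustrative one above, and the overlap count used as a crude similarity measure is my addition, not part of the original feature theories.

```python
# Sketch of bivalent feature representations: each word meaning is a
# dictionary of binary features (1 = present, 0 = absent). Feature
# overlap gives a crude similarity measure (an illustrative addition).

FEATURES = {
    "father":   {"human": 1, "older": 1, "female": 0},
    "mother":   {"human": 1, "older": 1, "female": 1},
    "daughter": {"human": 1, "older": 0, "female": 1},
}

def shared_features(w1: str, w2: str) -> int:
    """Number of features on which the two words agree."""
    f1, f2 = FEATURES[w1], FEATURES[w2]
    return sum(f1[k] == f2[k] for k in f1)

print(shared_features("father", "mother"))    # agree on 2 of 3 features
print(shared_features("father", "daughter"))  # agree on 1 of 3 features
```

Father and mother come out as more similar than father and daughter under this toy inventory, which is the kind of graded-similarity prediction feature models are meant to deliver.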
The features can also be classified, and later were classified, in terms of intercorrelated features: for example, having a beak and flying. These intercorrelated features tend to occur together. Living things, for example, tend to be represented by many intercorrelated features, meaning that many members of a natural-kind category will share those intercorrelated features: flying and having a beak are features of most birds, and they are highly intercorrelated. Then, they argued, there are also distinguishing or distinctive features: for example, a leopard has spots, which is very distinctive. These distinctive features allow us to distinguish among things. For example, among predators, a leopard or a tiger each has particular features that allow us to distinguish between them, and these tend to be exclusive to single kinds of items within a category. Artefacts, human-made objects, are generally represented by many distinctive features. Studies then looked at the impact of these distinctive and intercorrelated features. For example, it has been argued that in early dementia, because living things are represented by strongly intercorrelated features, the dementia initially has little impact on the meaning, the knowledge, of those words, while that is not the case for things that have many distinctive features; therefore distinctive features hold a kind of privileged status in semantic memory. So these are the feature theories. Now, early theories simply posited a set of features, and of course it's quite difficult to determine what kinds of features word meanings have. Later studies took a different approach.
For example, they asked participants what the key features are of, say, a table or a chair or a plane, for lots of different objects, and they collected all these features from many, many participants. Now, another approach to word meaning is to represent it in terms of a network. The idea is that there is a network in which concepts are represented as nodes, and nodes are linked together by links representing particular relationships: for example set membership, Fido is a dog; set inclusion, dogs are animals; part-whole, a seat is part of a chair; and property attribution, canaries are yellow. So you have properties, and links between concepts that can be of different types. The meaning of a word is then basically determined by where it is in this network. The set-inclusion links are called "is a" links, which means that networks have a kind of hierarchical organisation, and particular properties can be stored at particular levels to enable what is called cognitive economy: you could store properties at each of the instance levels, but you can also store them at higher levels. I will explain that in the next slide with this example of a semantic network. Here we see the hierarchical organisation of the network, and the key thing is that, for example, a number of properties, "can move around", "has skin", are stored at the level of animal. They are not stored at the instances of animals, canary, ostrich, shark, or salmon, because in terms of cognitive economy you would store them at a higher level rather than at each instance. That means you don't need as much storage: you store the property higher up in the hierarchy, and because you have the links, a bird is an animal, a canary is a bird, you have this hierarchical structure.
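The hierarchy-with-inheritance idea can be sketched as a tiny program. This is a minimal illustration of a Collins-and-Quillian-style network, assuming the canary/bird/animal fragment described above; the property sets are illustrative, and the step count is just a stand-in for the traversal-time idea, not a model of actual reaction times.

```python
# Sketch: a toy hierarchical semantic network with cognitive economy.
# Properties are stored once, at the highest appropriate level, and a
# query walks up the "is a" links, counting the steps taken.

IS_A = {"canary": "bird", "ostrich": "bird", "shark": "fish",
        "salmon": "fish", "bird": "animal", "fish": "animal"}

PROPERTIES = {"animal": {"has skin", "can move around", "breathes"},
              "bird": {"has wings", "can fly", "has feathers"},
              "canary": {"is yellow", "can sing"}}

def verify(concept: str, prop: str):
    """Return the number of is-a links traversed before the property
    is found, or None if it is not found anywhere up the hierarchy."""
    steps, node = 0, concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return steps
        node = IS_A.get(node)  # move one level up the hierarchy
        steps += 1
    return None

print(verify("canary", "is yellow"))  # 0 steps: stored on canary itself
print(verify("canary", "can fly"))    # 1 step up, found at bird
print(verify("canary", "breathes"))   # 2 steps up, found at animal
```

The increasing step counts mirror the prediction tested in the sentence verification studies discussed next: the further up the hierarchy a property is stored, the longer verification should take.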
Now, one thing you might notice here, if you look at these semantic networks and their properties: that looks very much like features, doesn't it? A bird has certain features. So people have argued that these semantic networks are really like feature networks, and in fact a study by Hollan showed that simple semantic networks based on this kind of idea of "is a" links and properties are formally equivalent to feature models. So maybe there's no distinction between the two. But it's really important that this is not only about the representation of meaning; it's also about how we access that information. The point made by Rips, Smith, and Shoben in 1975 is that the formal equivalence of feature and network models does not imply psychological equivalence, because there might be different processing aspects: how we process features compared to how we process the information stored in this hierarchical network. One of the two might be better at explaining performance in tasks when we present particular words. It's really about whether we can explain, for example, how fast the response is, based on a word having a particular semantic meaning or being related to other semantically related words. That is really the key question. So let's have a look at some of the early studies of semantic representations. One of the tasks they used was a simple sentence verification task: subjects were asked to respond true or false to a sentence, for example "a robin is a bird" or "a whale is a fruit". They had to say whether it is true or false, a very simple decision. And they looked at both set inclusion and property attribution. The rationale is basically that you present the words in a sentence context.
It's more like normal comprehension, a more realistic situation than presenting single words, and you can use a sentence frame that is similar across conditions, which is useful. And of course the critical thing is that we can get reaction times: we can measure how long it takes for people to verify that the sentence is true or false. So let's have a look at some studies. The first thing studies looked at is the principle of cognitive economy, which, as I explained, is that you store information about concepts at the highest appropriate level in the hierarchy: "has feathers" is stored with bird rather than with the different exemplars of bird, such as robin or canary. Early studies confirmed the prediction that sentence verification times increase as the distance in the hierarchy increases: "a canary is an animal" takes longer than "a canary is a bird", and "a canary eats" takes longer than "a canary is yellow". That is illustrated here. If the example is "a canary is a bird", how can we verify that? We start at canary and find out it's a bird by looking here. For "a canary is an animal", it would take more time to traverse the network, and likewise something like "a canary breathes" would take longer, because we cannot find that property here; we need to look here, the property isn't there, and then we go up there to see if the property is there: yes, it breathes. So that's the idea: you traverse the network. The assumptions of this hierarchical model are basically that it takes time to move each step through the network, that one step depends on the completion of the previous step, and that the steps are additive: each takes additional, but equal, time.
And the retrieval process goes from one node in all directions at once, in parallel. And of course, the average time for any step is independent of which level is involved: looking up a property takes just as much time at the lower levels as at the higher levels. Here you can see some of the materials they used, with different property levels and different inclusion levels: either the same level, one level of difference, or two levels of difference. Now, the results were very straightforward. We see that it takes a bit more time to verify properties, but we see a nice linear pattern where distance determines the response time: the longer the distance, the longer the response takes. And it takes longer to respond to a property statement than to a set inclusion statement. There is, of course, a bit of an odd one out here: "a canary is a canary". There is repetition, it's a bit of an odd sentence, and that likely explains the extra fast time here. Now, people have argued that this set inclusion effect might actually be due to something else, namely the size of the categories. For example, the category of words is much larger than the category of nouns, which is much larger than the category of living things, which is much larger than the category of animals, and then of dogs. In both positive and negative instances, recognising category membership for large categories took longer than for small categories. So the effect could be explained just in terms of category size: there are many more words than nouns, and that aspect alone could explain it. You potentially don't need the hierarchical structure at all.
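Before turning to those alternative explanations, the traversal idea behind the hierarchical model can be made concrete in code. This is a minimal sketch, not the original 1969 implementation: the network contents and the function names are invented for illustration, and the number of links traversed stands in for verification time.

```python
# A toy Collins & Quillian-style hierarchy: each concept stores its
# superordinate ("isa") and only the properties NOT inherited from
# higher levels (the cognitive economy principle).
network = {
    "animal": {"isa": None,     "props": {"breathes", "eats"}},
    "bird":   {"isa": "animal", "props": {"has feathers", "can fly"}},
    "canary": {"isa": "bird",   "props": {"is yellow", "can sing"}},
}

def isa_steps(concept, category):
    """Number of 'isa' links traversed from concept up to category."""
    steps = 0
    while concept is not None:
        if concept == category:
            return steps
        concept = network[concept]["isa"]
        steps += 1
    return None  # category not reachable

def property_steps(concept, prop):
    """Links traversed upward before the property is found."""
    steps = 0
    while concept is not None:
        if prop in network[concept]["props"]:
            return steps
        concept = network[concept]["isa"]
        steps += 1
    return None  # property not found anywhere

# More links to traverse predicts longer verification times:
print(isa_steps("canary", "bird"))            # 1
print(isa_steps("canary", "animal"))          # 2
print(property_steps("canary", "is yellow"))  # 0 (stored locally)
print(property_steps("canary", "breathes"))   # 2 (inherited from animal)
```

The additivity assumption from the lecture corresponds to each loop iteration costing the same fixed amount of time.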
Other studies have pointed out that the cognitive economy results can be explained in a different way: in terms of association strength, how strongly things are associated, rather than cognitive economy. Now, this is a powerful concept, and we will see it later in other models where association is the crucial thing in the network. What Conrad actually did was construct hierarchies where everything was nicely matched, ask participants to rate how strongly things were associated, and found that association strength could indeed explain the results. There are other problems for these kinds of network models. Robin and ostrich, for example, are both one level away from bird, but it's much easier to verify that a robin is a bird than that an ostrich is a bird, even though both are just one level away. Also, it is easier to decide that a dog is an animal than that a dog is a mammal. These findings are not compatible with the idea of a strict hierarchical structure. You could say that the hierarchy can be bypassed with additional links to solve some of these problems. But the other idea is that semantic judgements have to do with the semantic distance between the concepts. Now, how can you measure semantic distance between words? You can simply ask participants how related two words are. That's what they did in this study: they asked people to rate the semantic similarity of word pairs. When you do that for lots of word pairs and combinations, you can plot the similarity of words in a two-dimensional space, and you get something like this.
You can see some interesting clustering: certain farm animals cluster together. We can also see typicality effects: robin is closer to bird than chicken is to bird, and that could explain why it's easier to say that a robin is a bird than that a chicken is a bird. You can also see that these semantic distances could explain why it's easier to say that a dog is an animal than that a dog is a mammal, again because the distances differ. So from the perspective of semantic distance, maybe that is the driving factor here. Now, one interesting variation of a network model is a model introduced in the 1970s called the spreading activation network. This is a non-hierarchical model. Again, it's similar in that you have nodes that are interconnected, but it is not hierarchically organised; it could, for example, be organised in terms of semantic similarity, that's one of the ideas, with links between the different concepts. The critical idea of spreading activation is in the name: if one node in the network is active, it sends activation to all the nodes it is linked to, and those nodes then send activation on to other nodes. This is quite an interesting concept. And the idea is that this spreading activation is automatic: it is not under your control, you are not consciously controlling it. Now, this idea is very compatible with some of the data I talked about before. The priming effect of a related word like "stripes" would be explained by spreading activation in the semantic network, and it would also explain the other kinds of semantic priming effects.
The classic example is semantic priming: doctor primes nurse, so responses to nurse are faster after doctor than after table. This can be explained by spreading activation. It can also explain the category-level effects, because you move from a hierarchical organisation to one based on associative strength, and that explains why "a dog is an animal" is actually faster than "a dog is a mammal". So that earlier problem for the hierarchical model does not really exist here. Now, how does spreading activation work? What does it look like? It looks something like this. The idea is that the word fire engine could be more strongly primed by a vehicle, for example car, than by something like house. The reason is that car is closely associated with lots of other vehicles, which in turn can be related to fire engine. So you get a lot of activity that can speed up the processing of fire engine, which is not the case for house. So that is an interesting idea. Spreading activation models can account for a number of findings. They can explain within-category typicality or prototype effects in terms of associative strength: ostrich and bird are not strongly associated, but robin and bird are, and this kind of network can capture that. Another finding is the variability of negative judgements. That can be explained by assuming that there are additional "is not a" links, which can also have different associative strengths. That could explain the difference between "a canary is an ostrich" and "a canary is a salmon".
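The fire engine example can be sketched in a few lines of code. This is a toy illustration in the spirit of Collins and Loftus (1975), not their actual model: the link weights and the decay parameter are invented for illustration.

```python
# Toy spreading activation: activating one node sends a fraction of
# its activation to its neighbours, which pass it on in turn.
# Link weights below are illustrative, not from any real dataset.
links = {
    "fire engine": {"red": 0.6, "truck": 0.7},
    "truck":       {"car": 0.8, "fire engine": 0.7},
    "car":         {"truck": 0.8, "vehicle": 0.7},
    "vehicle":     {"car": 0.7, "house": 0.1},
    "red":         {"fire engine": 0.6, "roses": 0.5},
    "roses":       {"red": 0.5},
    "house":       {"vehicle": 0.1},
}

def spread(source, steps=2, decay=0.5):
    """Activation reaching each node within `steps` link traversals."""
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(steps):
        new_frontier = {}
        for node, act in frontier.items():
            for neighbour, weight in links.get(node, {}).items():
                gain = act * weight * decay  # activation decays per link
                new_frontier[neighbour] = new_frontier.get(neighbour, 0.0) + gain
                activation[neighbour] = activation.get(neighbour, 0.0) + gain
        frontier = new_frontier
    return activation

act = spread("car")
# "fire engine" receives far more activation from "car" than "house"
# does, which is the pattern used to explain priming of related words.
print(act.get("fire engine", 0.0) > act.get("house", 0.0))  # True
```

The automaticity the lecture mentions corresponds to the fact that the spread happens for every activated node, with no selection step.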
One is easier to reject than the other if there are specific "is not a" links. But of course, you now have a different assumption: there are different kinds of links, "is a" links and "is not a" links. This is potentially a problem. You get a model with lots of different link types, and that is not a nice theory in the sense that it gets more and more complicated. Another question about models like this, where associative strength seems quite important, is whether that strength is really something in the representation itself, or whether it has more to do with how we access it, with retrieval-based processes. This is still an ongoing discussion. But spreading activation as an idea is very powerful, and you see it in a lot of other models and in lots of other areas of cognitive psychology. In a sense, it's also part of the interactive activation model, where there is a kind of spreading activation from one layer to another, though there it's not about semantics but about the relationship between letters and words. Okay, so those were some classical models. Now let's talk a little bit about more recent models and more recent assumptions about how semantics could be represented. This all builds on the idea that words with similar meanings are used in similar contexts, which was already pointed out by Harris in the 1950s. How can we measure that? If words with similar meanings are used in similar contexts, we can look at how the words are actually used. That is done in models which we call count models, and there are two approaches, which I will discuss in more detail on the next slide: latent semantic analysis (LSA) and the Hyperspace Analogue to Language (HAL).
These are two models that count how often words co-occur across lots of text; they do just counting. A more recent idea is what we call predict models. These models use a prediction-based approach to create semantic representations from large amounts of text. Again, I will explain this in the next couple of slides, but the interesting thing is that one of these predict models, called continuous bag of words, was developed by Mikolov and colleagues at Google about ten years ago, and these ideas have led to the incredible development that we now see in AI models. That this work was done at Google is not surprising. Google was very interested in search: when we type in a search query, Google tries to come up with related searches or related words, so they needed a way to find related searches. That's where this research comes from, and it has led to some really interesting insights, some of which I will talk about right now. So let's talk a bit more about this distributional semantics. How do they do it? They take a large corpus of text and look at how often words co-occur, and what they create for each word is a vector of numbers. A vector is basically an array, a row of numbers, and each word has such a vector associated with it. Let's look at the first method, latent semantic analysis. What it does is count, in text passages that can be paragraphs, how often two words co-occur. So they look at one paragraph and see, for example, the word table, and in the same paragraph the word chair is mentioned; in that case, those two words count as co-occurring.
What it builds up is a matrix: a word-by-paragraph matrix, with the number of words along one dimension and the number of paragraphs along the other. Assume we have ten paragraphs and we're looking at whether a word occurs in them: you indicate a one if it occurs in a paragraph. If the word occurs in all ten paragraphs, it has ten ones; if another word occurs in only one paragraph, it has all zeros except a one for that paragraph. So each word has an array of numbers associated with it, and another word has another array. Now, if two words are very similar in meaning, it's likely that they occur in the same paragraphs, so their numbers should be very similar. That way you can get a similarity value. So that's latent semantic analysis. The other model, the HAL model, looks at the co-occurrence of word pairs within a small window, something like ten words, across a large amount of text. You go through a large book and look at whether two words occur together within that window. So you get a word-by-word matrix: all the words in the book along both dimensions, and if two words co-occur, you put a one there, otherwise a zero. A very simple approach. It means that for each word you get a series of numbers, and you can then compute semantic similarity by comparing the vectors of numbers with each other, with a technique called cosine similarity: basically, how closely the vectors are related to each other. So that's the counting approach: simply counting how often words co-occur, or how often they occur in particular paragraphs across a text. The more recent approach is what we call predict models. These models are very different.
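Before turning to the predict models, the counting idea can be sketched in code. This is a minimal illustration, not a faithful LSA or HAL implementation (real LSA also applies dimensionality reduction, which is skipped here); the toy "corpus" is invented.

```python
# An LSA-style count model in miniature: each word becomes a vector of
# paragraph occurrences, and semantic similarity is the cosine of the
# angle between two such vectors.
import math

paragraphs = [
    "the table stood next to the chair",
    "she pushed the chair under the table",
    "the dog chased the cat",
    "a cat sat on the chair",
]

def word_vector(word):
    """1 if the word occurs in a paragraph, 0 otherwise (one slot per paragraph)."""
    return [1 if word in p.split() else 0 for p in paragraphs]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Words used in the same paragraphs get similar vectors:
print(cosine(word_vector("table"), word_vector("chair")))  # high
print(cosine(word_vector("table"), word_vector("dog")))    # 0.0, no shared paragraphs
```

A HAL-style model would be the same idea with a word-by-word matrix built from a sliding window instead of a word-by-paragraph matrix.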
They are not counting, they are predicting, and basically they are learning: they use a neural network to learn when particular words co-occur. How does that work? It uses a technique called word2vec, again developed at Google, and it works a little bit like this. Here we see a neural network; you don't need to know all the details, just the general idea. On the left side is the input layer, a column containing all the words that occur in, say, a book. On the right side is the output layer, which again contains all the words in the book. Now, what you give the model to do is activate words. For example, take the phrase "black furry cat": you give the model the context, black and cat, as input, and the model has to learn that when black and cat are active, furry should be activated in the output layer. That's all it tries to do: learn that when you have black and cat, furry should come in between. And it tries to learn that not only for "black furry cat", but for all such triplets across all the text. You can imagine that's a massive process. Again, you don't need to know the details, but what you need to know is that it's a learning model: it learns to predict that, given black and cat, the middle word should be furry. The key thing is that after learning, after exposing it to, for example, all the books on the internet or the whole internet, it becomes very good at predicting, and then you can take the values of the connections between the nodes from the input towards the output.
And then you take, for each word, the values of its connections to what we call the hidden nodes, of which there are generally something like 100 to 300. So again, each word gets a vector of numbers, and you can then look at the similarity between the vectors. That's the key thing: both methods create a vector of numbers for each word, and we can then look at the similarity of words in terms of how similar the numbers are; that's a simple calculation. Now, what you can do with these vectors turns out to be very, very interesting, and that's what they discovered at Google: you can do all kinds of interesting things with these numbers. For example, take this analogy question: man is to king as woman is to what? The answer is queen. It turns out that if you take the vector of numbers for the word king, subtract the numbers for the word man, and then add the numbers for the word woman, you get a final set of numbers that is very close to the vector for the word queen. So these vector representations can do something like analogy. That is represented here in this three-dimensional space: you have the different vectors, the vector for king, the vector for man, the vector for woman; you do the calculation illustrated here, and you get a vector that is closest to the word queen. This is quite a remarkable discovery: just by creating series of numbers, subtracting and adding them, you can create something that does analogy. Extremely powerful. And it turns out that if you take those vectors and look at actual data.
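The king − man + woman arithmetic can be illustrated with toy vectors. Real word2vec vectors have 100 to 300 dimensions learned from huge corpora; the 3-dimensional vectors below are hand-made (dimensions loosely meaning "royal", "male", "female") purely so the mechanics are visible.

```python
# Toy illustration of the word2vec analogy result: king - man + woman
# lands near queen in vector space. Vectors are invented for the demo.
import math

vectors = {
    "king":  [0.9, 0.9, 0.1],   # royal + male
    "man":   [0.1, 0.9, 0.1],   # male
    "woman": [0.1, 0.1, 0.9],   # female
    "queen": [0.9, 0.1, 0.9],   # royal + female
    "apple": [0.0, 0.2, 0.2],   # unrelated distractor
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# king - man + woman, component by component:
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# The nearest remaining word to that point is "queen":
best = max((w for w in vectors if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vectors[w], target))
print(best)  # queen
```

With real learned vectors the match is never exact, so the nearest-neighbour search over the whole vocabulary (excluding the question words) is the standard way the analogy answer is read out.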
So there was a recent study by Mandera et al. that looked at semantic priming studies, all the kinds of studies I've talked about with semantically related words, for example lion priming tiger versus an unrelated word priming tiger. What they did is evaluate how well these count and predict models can account for the reaction times. And what they found is that the predict models are extremely good, and much better than the count models, at predicting reaction times in semantic priming studies. So it is very encouraging and very interesting that these predict models do so well. And you can imagine that these predict models are the reason why we now have these AI models: these prediction-based vectors are the basis for the current highly performant AI models that can generate sentences, answer questions, and so on. It all goes back to these vectors, and basically the same kind of approach, with some additional mechanisms, is used in current AI models. Of course, there are issues with these vector representations. Are they realistic? All they do is predict. The network models have relationships between words, but how do we get from relationships to the meaning of each individual word? The same with these vectors: we have relationships based on where a word occurs in text, and we can predict things, but we still don't really have a concept of a word. The other thing, and this is a particular issue for feature models and also network models, is how we deal with abstract words. That is quite tricky. And how do abstract things relate to our knowledge, our experiences, actions and perceptions?
So some people have argued that metaphors can be really useful for representing abstract words. An abstract word like time can be represented by things like "time flies" or "time is money"; these metaphors are quite useful for explaining what time is, and time is a very difficult concept. A possible solution for some of these problems, particularly where the meanings actually come from, is to ground them in modality-specific systems: to assume that meaning really has to do with things in the environment, things that you see, so visual information, sound information, tactile information. That information is really important for getting a good representation of what word meaning is. Okay, so that was my lecture on word meaning. Any questions? If you have any questions, you can post them on the forum or attend my office hours. I hope you enjoyed these lectures, and next week we will continue. Thank you.
