Preparation Questions: Corpus Linguistics
Document Details
Uploaded by IdyllicOctagon
Osnabrück University
Summary
This document presents theoretical questions and answers related to corpus linguistics. It covers definitions, methodology, and applications of studying language through corpora, providing a comprehensive overview of the subject and its relevance in language studies.
Full Transcript
Theoretical questions

QUESTION: What is the definition of “corpus”?
ANSWER: A large, principled collection of naturally occurring language, usually stored electronically.
- A “corpus” is a large collection or database of machine-readable texts involving natural discourse in diverse contexts (Bernardini 2000). Such discourses can be spoken, written, computer-mediated, spontaneous, or scripted and may represent a variety of genres (for example everyday conversations, lectures, seminars, meetings, radio and television programmes, and essays).
- There is one final criterion for something to be a corpus: the texts that make up the corpus must have been produced in a natural communicative setting. That means that the texts were spoken or written for some authentic communicative purpose, not for the purpose of putting them into a corpus. For example, many corpora consist to a large degree of newspaper articles. These are often included for convenience’s sake, but they also meet the criterion of having been produced in a natural setting: journalists write articles to be published in newspapers and magazines and to communicate something to their readers, not because they want to fill a linguist’s corpus. (On the other hand, journalese is often heavily edited.) Similarly, if I obtained permission to record all of a particular person’s conversations in one week, then, although the person and their interlocutors are usually aware of being recorded, I would hopefully obtain authentic conversations rather than conversations produced only for the sake of my corpus.

QUESTION: What is the definition of “corpus linguistics”?
ANSWER: The study of language in use through corpus analysis.
- Corpus linguistics is a whole system of methods and principles of how to apply corpora in language studies and teaching/learning.
- The study of language in use through corpora.

QUESTION: What is the definition of “general/generalized corpora”?
ANSWER: Very large corpora aiming to represent language as a whole.
The broadest type of corpus is a generalized corpus. Generalized corpora are often very large, more than 10 million words, and contain a variety of language so that findings from them may be somewhat generalized. Although no corpus will ever represent all possible uses of a language, generalized corpora seek to give users as much of a whole picture of a language as possible.
Examples:
- British National Corpus (BNC)
- American National Corpus (ANC)
- COCA
- Typical contents: written texts (newspaper and magazine articles), works of fiction and nonfiction, writing from scholarly journals, spoken transcripts (informal conversations, government proceedings, business meetings)

QUESTION: What is the definition of “specialized corpora”?
ANSWER: A particular part of a language is represented (e.g. the language of a particular subject field, a particular dialect, etc.).
A specialized corpus contains texts of a certain type and aims to be representative of the language of this type. Specialized corpora can be large or small and are often created to answer very specific questions.
Examples:
- Michigan Corpus of Academic Spoken English (MICASE)
- CHILDES Corpus (only language used by children)
- Michigan Corpus of Upper-level Student Papers
- Medical corpus containing language used by nurses and hospital staff

QUESTION: What is the definition of “learner corpora”?
ANSWER: A kind of specialized corpus that contains written texts and/or spoken transcripts of language used by students who are currently acquiring the language. It can be examined to see common errors students make.

QUESTION: What is the definition of “pedagogic corpora”?
ANSWER: A pedagogic corpus is a corpus that contains language used in classroom settings. Pedagogic corpora can include academic textbooks, transcripts of classroom interactions, or any other written text or spoken transcript that learners encounter in an educational setting. Pedagogic corpora can be used to ensure students are learning useful language, to examine teacher-student dynamics, or as a self-reflective tool for teacher development.

QUESTION: What is data-driven learning (DDL)?
ANSWER: Data-driven learning (DDL) is an approach to foreign language learning. Whereas most language learning is guided by teachers and textbooks, data-driven learning treats language as data and students as researchers undertaking guided discovery tasks.

QUESTION: What does “principled” mean in the context of corpus compilation/corpus linguistics?
ANSWER:
- A corpus follows principles/guidelines, which can differ depending on the corpus.
- The texts that go into the corpus must be planned: the language of which the corpus is composed cannot be random but must be chosen according to specific characteristics.
- Texts must be chosen so that they are useful for your research question/the aim of your corpus.
- Depending on what you want to do with your corpus, you have to choose your texts.

QUESTION: What are concrete examples of basic principles of corpus linguistics?
ANSWER:
- Context: Words need other words to convey meaning (isolated words do not carry meaning) → without context, words can be interpreted completely differently: “rose”, for example, can refer to the flower or to the past tense of “rise”.
- Language patterns: the patterned nature of language becomes apparent through corpus analysis → collocations (words that usually co-occur): without one part of the collocation, the whole collocation no longer makes sense: it is not “make homework” but “do homework”, and not “do a mistake” but “make a mistake”. Tense and aspect: some verbs tend to occur with a specific tense or aspect (“love” is hardly ever found in the continuous form).
- Language in use: corpus linguistics studies language in use and is not concerned with prescriptive “rules”. Some expressions/utterances may be, strictly speaking, incorrect, such as sentences in which the subject is missing (“Heading to the pharmacy”; “Looks good”), but those examples must be considered in the corpus as well → language in use might not be what we expect or what we learned about English.

QUESTION: What are corpus tools used for (generally)?
ANSWER:
- Frequency counts (single words and multiword units)
- Concordancing
- Pattern identification (e.g. collocation search)
- Corpus comparison (comparison of different corpora or of parts of different corpora/the same corpus) → compare language varieties (regional varieties, language register), diachronic language comparison, compare general and specialized corpora to identify corpus-specific terminology, translation (multi-language corpora, e.g. Linguee)

QUESTION: What are research fields and other fields of application for corpus linguistics?
ANSWER:
More theoretical:
- Phraseology (study of multi-word units)
- Lexicogrammar: looking at the connection between lexical and grammatical aspects
- Register, language change
More practical:
- (Foreign) language teaching
- Lexicography
- Translation (studies) (not important for the exam)
- Language for specific purposes
- Writing assistance/language reference (when you ask yourself whether you can use a word in a given context, corpora can help you find out; or compile your texts into a corpus and see if you overuse some words)

QUESTION: How would you explain “authentic language” and “naturally occurring language” in the context of corpus linguistics?
ANSWER:
- Every utterance (be it spoken, written or transcribed) that has been produced for communication and not for the purpose of being put into a corpus
- Language that has been produced in natural communicative settings
- For us teaching English: authenticity also implies that we want language material produced by native speakers → but not all corpora have to be like that (some corpora look at the language of English learners)

QUESTION: How can corpora be accessed?
ANSWER:
- Download corpus + download corpus software
- Download corpus + access corpus software online
- Access corpus + corpus software online

QUESTION: How can corpora be compiled?
ANSWER:
- Collect texts in accordance with the purpose of the corpus/research interests.
- Use texts from books, newspapers, journals; transcriptions; texts written by language learners; etc.
- Use texts from the web.
- Use online corpus tools like the Sketch Engine or corpus tools you can download to your computer like #LancsBox.

QUESTION: Which corpora are particularly useful in the context of language teaching and learning and why?
ANSWER:
- General corpora for working on lexical and grammar skills – authentic language in context (e.g., BNC, enTenTen, English Corpus for SkELL, Open American National Corpus)
- Specialized corpora for working on current topics or for bilingual subject teaching (e.g., Brexit corpus, ScienceBlogs, Environment corpus, e-flux, EcoLexicon English Corpus)
- Literary corpora for working on literary skills (e.g. Project Gutenberg English, English Drama Corpus, Shakespeare English Drama Corpus)

QUESTION: What are main differences, advantages and disadvantages between Sketch Engine and #LancsBox?

QUESTION: Explain the Sketch Engine tool “Concordance”.
ANSWER: A collection of the occurrences of a word-form, each in its own textual environment. In its simplest form it is an index. Each word-form is indexed and reference is given to the place of occurrence in a text.
This tool can find words, phrases, tags, documents, text types or corpus structures and displays the results in context in the form of a concordance. The concordance can be sorted, filtered, counted and processed further to obtain the desired result. Despite being the most powerful tool, the concordance used with large corpora may find so many results that it can be tedious to analyse and interpret them.
KWIC = key word in context → concordance lines

QUESTION: How are the query types in the “Concordance” tool different from each other?
ANSWER:
- Simple: If you type the lemma (base form), the simple search will automatically search for all forms of the word (typing “go” will find “goes”, “going”, “gone”, “went”). If you type one of the word forms, e.g. “goes”, it will only find that word form.
- Lemma: The lemma option will find all word forms of the lemma (base form).
- Phrase: The phrase option will find a phrase composed of several tokens (words) exactly as typed. The results will not include other word forms.
- Word: The word option will find a word form exactly as typed.
- Character: The character option will find tokens (words) which contain the character(s). E.g. it looks for the actual punctuation (with other query types, punctuation characters like “?” or “.” have a different meaning).
- CQL: The CQL option uses Corpus Query Language for complex criteria, making use of part-of-speech tags and regular expressions (= a collection of special symbols that can be used to search for patterns rather than specific characters, e.g. to find all words starting with, containing or ending in a specific sequence of characters; for example, “.*tion” will find all words ending in “tion” with any number of characters at the beginning).

QUESTION: How are the additional search functions in the “Concordance” tool useful in corpus analysis?
ANSWER:
- Depending on what you are searching for and how much data you want, the different query types give you the option that suits you best; if you want very specific data, CQL might be the best option.
- Subcorpus: The search can be limited to only certain parts into which the corpus is divided (e.g. for the enTenTen21 corpus only the genre legal/news/blog/fiction, or only the topic science/religion/sports/politics/tourism, or only the domain .uk/.us/.au).
- Macro: When you need to carry out more searches with the same criteria, you can save them as a macro, which enables you to reuse them quickly without the need to set them again (e.g. for all the searches that I want to carry out, always give me a sample of 2000 with a frequency list of the first word to the right).
- Filter context: Only keep lines fulfilling additional conditions.
- Text types: Text types help exclude or include specific documents or parts of the corpus.
- What are tags? A tag (also called part-of-speech tag, POS tag or morphological tag) is a label assigned to each token in an annotated corpus to indicate the part of speech and often also grammatical categories and morphological information.

QUESTION: Explain the Sketch Engine tool “Wordlist”.
ANSWER: The wordlist tool is used to generate frequency lists of all kinds: lists of words, lemmas, nouns, verbs, tags, words containing or not containing certain characters etc.

QUESTION: How can you compile word frequency counts in the Sketch Engine with the “Wordlist”?
ANSWER:
- Log in and select a corpus from which the list should be generated.
- Leave the settings at the default values to generate the list of the most frequent word forms.
- Or set the criteria you need, e.g. nouns beginning with “p”.
- Click GO.

QUESTION: What kind of word lists can you compile using the Sketch Engine tool “Wordlist”?
ANSWER:
- Nouns, verbs, adjectives and other parts of speech
- Words beginning with, ending in or containing certain characters
- Word forms, tags, lemmas and other attributes
- Or a combination of the three options above

QUESTION: Explain the Sketch Engine tool “Keywords”.
ANSWER:
- A keyword analysis shows which individual words (tokens) appear more frequently in the focus corpus than they would in general language. The general language is represented by the reference corpus.
- Any token can qualify as a keyword if it is used more frequently in the focus corpus than in the reference corpus.
- Sketch Engine combines statistics with linguistic criteria to extract keywords and terms (term = multi-word units/phrases that appear in the focus corpus more frequently than they would in the general language/reference corpus and have a structure allowed for terms in the language).
- Log in, build a corpus or select a corpus built previously. Click KEYWORDS. The procedure will start automatically.

QUESTION: How are keywords useful in language learning and/or in applied linguistics?
ANSWER:
- Can be used to define or understand the main topic of the corpus
- Extract words which are typical for the topic of the document or corpus, i.e. they appear in the corpus more frequently than they would in general language
- Easy to compare different corpora

QUESTION: What are n-gram lists?
ANSWER:
- Frequency lists of multiword expressions (MWEs) or lexical bundles
- N-gram = a sequence of items (bigram = 2 items, trigram = 3 items … n-gram = n items). An item can refer to anything (letter, digit, syllable, token, word or others). In the context of corpora and corpus linguistics, n-grams typically refer to tokens (or words). In linguistics, n-grams are sometimes referred to as MWEs, i.e. multiword expressions.
- Generating a list of the most frequent n-grams will help us see linguistic phenomena that might go unnoticed when using other tools
- N-grams can identify discourse markers or chunks of language which should be taught/learnt as fixed phrases in language teaching

QUESTION: What is the difference between n-gram lists and multi-word expression lists compiled with the keyword function?
ANSWER:
Feature | N-gram lists | Keyword + MWE
Focus | Pure frequency of word sequences | Statistical significance compared to a reference corpus
Output | Common fixed word sequences (bigrams, trigrams, etc.) (→ Toxicity corpus: I don’t, a lot of, one of the…) | Keywords and fixed expressions with semantic significance (→ Toxicity corpus: fake news, gary crum, trump supporter…)
Use Case | Finding common phrases and collocations | Identifying domain-specific or unusual words/phrases

QUESTION: How can n-gram lists be compiled in the Sketch Engine?
ANSWER: Log in, choose a corpus, click N-GRAMS and select the number of items.

QUESTION: How can lexico-grammatical features be analysed in the Sketch Engine?
ANSWER: With the help of the Word Sketch tool.

QUESTION: Explain the Sketch Engine tool “Word Sketch”.
ANSWER: The word sketch processes the word’s collocates and other words in its surroundings. It can be used as a one-page summary of the word’s grammatical and collocational behavior. The results are organized into categories, called grammatical relations, such as words that serve as an object of the verb, words that serve as a subject of the verb, words that modify the word etc.

QUESTION: Explain the Sketch Engine tool “Thesaurus”.
ANSWER:
- Words used in a similar context → over time it developed into synonyms
- Sketch Engine: Synonyms and similar words based on their collocations
- How does the Thesaurus work? Sketch Engine first identifies all collocations of this word and then all the collocations of all the other words from the same part of speech.
All those other word sketches are then compared to the word sketch of the searched word, and those words whose word sketches contain the highest proportion of identical collocations are also most similar in meaning → based on the theory of distributional semantics (words occurring in similar contexts are also similar in meaning)
- Beware: not all the results are actual synonyms; they appear because those words can occur in the same context
- May help with the lexical variability of students

QUESTION: Explain the Sketch Engine tool “Word Sketch Difference”.
ANSWER: The word sketch difference is used for making comparisons by contrasting collocations. Three options are available:
- lemma: compares the use of two different lemmas via their collocates
- word forms: compares the use of two different word forms of the same lemma via their collocates
- subcorpora: compares the use of the same lemma in two different subcorpora of the same corpus via their collocates

QUESTION: How is the Sketch Engine tool “Word Sketch Difference” useful in language teaching and learning?
ANSWER: It helps you understand the differences between (similar) words and shows in which collocations each of them appears and in which combinations both can appear.

QUESTION: What are the 6 aspects of how corpus analysis benefits foreign language teaching?
ANSWER: relevance, authenticity, response, autonomy, availability, sustainability

QUESTION: Explain the tool/method of “relevance” in connection with the question of how corpus analysis benefits foreign language teaching and learning.
ANSWER:
- Content has to be relevant for the students
- But more important: relevance in the sense that it appears frequently → items (words or multiword expressions) that appear often in the language should be taught → the more frequently an item appears, the more useful it is for the students → refers to lexical items but also to grammatical structures (e.g. if-clauses are not paid a lot of attention to in real life, at least not as much as in the classroom)
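
The frequency idea behind “relevance” (teach what occurs often, including multiword expressions) can be illustrated outside the Sketch Engine. The following is a minimal Python sketch, not the Sketch Engine’s own procedure: the sample sentences, the simple regular-expression tokenizer and the function names `tokenize`, `ngrams` and `frequency_list` are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(items, top=5):
    """Return the `top` most frequent items together with their counts."""
    return Counter(items).most_common(top)


# Invented toy "corpus"; a real corpus would be far larger.
sample = (
    "Do your homework before dinner. Did you do your homework? "
    "Everyone can make a mistake, but do not make a mistake twice."
)

tokens = tokenize(sample)

# Word frequency list (comparable in spirit to a wordlist of word forms).
word_freq = frequency_list(tokens)

# Bigram frequency list (2-grams); note that this simple version
# also counts bigrams that span a sentence boundary.
bigram_freq = frequency_list(ngrams(tokens, 2))

for word, count in word_freq:
    print(word, count)
for bigram, count in bigram_freq:
    print(" ".join(bigram), count)
```

Even on this tiny sample, the recurring bigrams (“do your”, “make a”) point towards the collocations “do homework” and “make a mistake”, which is the kind of pattern a frequency list of n-grams is meant to surface.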

QUESTION: Explain the tool/method of “authenticity” in connection with the question of how corpus analysis benefits foreign language teaching and learning.
ANSWER:
- Language produced by “native speakers”
- Language that appears in natural context
- Authentic language always appears in patterns
- Fluency is more important than accuracy → if you know a lot of authentic patterns used in that language, it enhances your fluency because it saves you time applying those patterns

QUESTION: Explain the tool/method of “response” in connection with the question of how corpus analysis benefits foreign language teaching and learning.
ANSWER:
- Respond directly to your learners’ needs (e.g. writing assistance)
- Whenever you have the impression that the students need more exercises (but there are no more in the textbooks) → you can look up material for more exercises with the help of corpora

QUESTION: Explain the tool/method of “autonomy” in connection with the question of how corpus analysis benefits foreign language teaching and learning.
ANSWER:
- Learners become aware of their own learning process and do not have to depend on a teacher or an institution
- You don’t have to know everything, but you have to know where to find it
- The more autonomous the learners become, the more aware of the language they become

QUESTION: Explain the tool/method of “availability” in connection with the question of how corpus analysis benefits foreign language teaching and learning.
ANSWER:
- Corpus = a native speaker who is available 24/7
- Data is always available and available to everyone → not only to linguists

QUESTION: Explain the tool/method of “sustainability” in connection with the question of how corpus analysis benefits foreign language teaching and learning.
ANSWER:
- The more relevant the topic is for the learners, the better they will memorize it
- As soon as you discover things for yourself, they stick longer in your head → autonomy helps with memorizing → if the teacher presents all the patterns to you without your having to work for them, you are less likely to memorize them than if you discover them for yourself

QUESTION: How can corpus linguistics be indirectly applied in language teaching?
ANSWER:
- Syllabus design
  o Which language patterns are particularly important
  o What to teach and when to teach it
- Teaching materials design, development of reference works
  o Find authentic language examples to use
  o High-quality reference works and teaching materials are always corpus-based → we want to teach the language that is actually spoken and not some kind of idea of a language
- Evaluation of existing pedagogical descriptions
  o How accurate and appropriate are the descriptions that are used in textbooks
  o Mostly based on literature and not on everyday use
- Selection of language phenomena
- Progression in the course: when to address certain language phenomena → students need them in order to do some of the tasks
- Presentation of selected items and structures
  o How is the item used and with which other items/structures
  o Find out how items interact with each other

QUESTION: How can the concept of data-driven learning be combined with the concept of direct corpus application in language learning?
ANSWER:
- Language learners as “linguistic researchers” or “language detectives”
  o Students are not taught about the language but learn about it themselves
- Learners interact with corpus data and tools
- Learners interact with corpus data pre-processed by the teacher
  o The teacher selects the concordance lines that they think show the pattern/item best, and students work with those

QUESTION: How can corpus linguistics be directly applied in language teaching? Give examples.
ANSWER:
- Lexical skills: near-synonyms (e.g., “speak” and “talk”; “random” and “arbitrary”; “see”, “look” and “watch”; “if” and “whether”; “big” and “large”); collocational patterns; meaning, use and register; etc.
- Interferences between learners’ L1 and L2 (e.g., “diet” → it is not “to make a diet” as in German) → identify this with learner corpora (compiling a corpus from what learners actually produced)
- Grammar structures in use (e.g., “some” and “any”, continuous aspect, future tense, linking words)
- Literature lessons (e.g., investigation of central concepts and motifs, development of characters)
- Responding to concrete learner needs (*go on a travel, overuse of “nice”, meaning of phrasal verbs) → students have specific questions that can be investigated with the help of corpora

QUESTION: What are reasons for/advantages of data-driven learning?
ANSWER:
- Learning to learn (the idea of actually engaging with the language)
- Personalized learning (issues/problems that your learners in particular have can be solved)
- Exploring topics perceived to be of relevance
- Active involvement in the learning process
- Communication and teamwork
- Naturally occurring language
- Lexico-grammatical approach (lexical and grammatical aspects cannot be taught separately as they relate closely)
⇒ Learner autonomy
⇒ Enhanced noticing skills
⇒ Extended cognitive abilities
⇒ Increased language awareness

QUESTION: What are reasons against/disadvantages of data-driven learning/direct corpus application in language teaching?
ANSWER:
- Huge effort to introduce it as it needs a lot of preparation time
- It needs special resources → access to technical resources is not always given, BUT it is not always necessary (you could print the concordance lines or use only one laptop for the whole class)
- Costly licenses for the corpus programmes, BUT there are cheaper school licenses
- The curricula are already very full, which makes it difficult to integrate topics like corpus analysis

QUESTION: What impact does corpus analysis have on lexical teaching material?
ANSWER:
- Prescriptive vs. descriptive approach to teaching grammar (focus on what people “should” say vs. what they actually “do” say) → corpus analysis is a descriptive approach
- Corpus linguistics makes it possible to describe actual language use and to verify/falsify assumptions/intuitions regarding what is “wrong” or “right” in language
- Data-driven commercial teaching materials are relatively rare because of the tradition of foreign language teaching → debate about rule-governed vs. grammar-in-context approach → underlying problem: focusing the teaching material only on norms (norms are necessary to a certain extent, but should not govern all language learning, especially because language learners will probably only speak English with non-native speakers)
- Despite the availability of corpus-driven reference grammars, teaching materials tend to feature findings from corpus analysis only to a very limited extent → additions rather than revisions.
- E.g., reported speech (reporting verbs, tense sequences and tense backshifting rule), style (nominal style), fewer syntactic structures (e.g., “that” and WH-subordinate clauses, causative subordination), progressive aspect.
- Simplifications? Lack of register variations? → even though the communicative approach focuses on authentic language in communication, textbooks often do not feature register variations → corpus-informed descriptions are mostly not simple enough/cover too many registers and are therefore not included in the textbooks
- Selection of lexical items deemed pedagogically relevant
- The role of phraseology and formulaic language (e.g. phrasal verbs, collocations, light-verb constructions)
- English for Specific Purposes
- Register → you can only add different registers; you don’t have to remove the established written register

QUESTION: What is the definition of “lexico-grammatical approach”?
ANSWER: Lexis and grammar cannot be studied separately – lexical items coincide with patterns, e.g., verbs associated with tenses (e.g., “know”, “matter”, “suppose” associated with present tense; “smile”, “reply”, “pause” associated with past tense); verbs associated with clauses (e.g., “know”, “think” associated with that-complement clauses; “like”, “want”, “seem” associated with to-complement clauses).

QUESTION: Do phraseological items count as lexicon or grammar items? Why?
ANSWER:
- Lexico-Grammatical Continuum: Language is better understood as a continuum where lexicon and grammar are integrated. Usage-based approaches, like Construction Grammar, support this view, emphasizing that phraseological items combine both lexical elements (words) and grammatical structures (patterns).
- Psycholinguistic Evidence: Language is seen as a network of “form-meaning pairings” (constructions) that range from small units like morphemes to larger ones like phrasal patterns. These constructions combine lexical items and grammatical functions, showing that the two are interdependent.
- Characteristics of Phraseological Items:
  o They involve changes in word meanings depending on context.
  o They sometimes “violate” standard grammatical rules, relying instead on conventional patterns.
  o They include multiword units that are often opaque, making them unique combinations of lexical and grammatical components, which is why lexicon and grammar cannot be treated as two different areas of a language.
⇒ From a pedagogical perspective, this means that separating lexicon and grammar in teaching or analysis fails to account for the real, integrated nature of language as it is used. Instead, phraseological items should be addressed as part of a lexico-grammatical approach.

QUESTION: What are examples for phraseological items that show that lexicon and grammar should not be taught separately?
ANSWER:
- In the BNC, the verb “smile” is mainly used in the past simple tense (64% past simple, 25% continuous aspect, roughly 11% present simple).
- The verb “love” is hardly ever used in a continuous aspect (about 2% of all occurrences).

QUESTION: What are aspects of usage-based approaches to describing and teaching languages?
ANSWER:
- Allow for integration of corpora into language pedagogy
- Focus on authentic texts
- Focus on communication
- Allow for focus on grammatical form without contradicting the communicative approach
- Continuum between lexicon and grammar – no exclusive focus on either the “lexical approach” or traditional “grammar rules”
- Acknowledge importance of language input
- From prototypical construction to abstraction (over time, through repeated exposure to various examples of similar constructions, learners start to identify patterns and relationships)
- Integrated teaching of lexis and grammar
- Emphasis on meaning

QUESTION: What are reasons for integrating corpus use into the EFL classroom?
ANSWER:
- Authentic language instead of (scripted) textbook language
- Empirical answers to language questions instead of relying on intuition or insufficient treatment in teaching materials
  o Near synonyms (which word is actually used in which context)
  o Grammar features (e.g., reported speech, use of “any”, conditionals, progressive aspect)
  o Contrastive aspects of L1 and L2 (e.g., word equivalents → direct translations do not always appear in the same environments; language structures)
  o Lexical features (e.g., collocations, discourse markers, text type-specific or subject-specific language)
- Data-driven learning: people will be more proficient in the language if they explore the language themselves
- Reference source: establishing what you can or can’t say today, staying up to date with actual language use

QUESTION: What are examples of direct corpus use in the EFL classroom?
ANSWER:
- Respond to learners’ information needs.
  o Learners come across unknown lexical items and ask for clarification (e.g., example from Frankenberg-Garcia: “aisle”).
  o Learners come across an unknown grammar structure and ask for clarification (e.g., example from Frankenberg-Garcia: “It’s the first time”).
- Have learners explore grammar structures (e.g., have the students look at example sentences with “some” and with “any”: What patterns can you see here? In which environment does “any” occur and in which “some”?).
- Have learners explore lexical patterns
  o Phrasal verbs (e.g., “get back”), collocations (e.g., “learn a lesson”), function verbs (e.g., “make”), etc. → students can find out what the phrase means/which words it collocates with by looking at the examples
- Have learners explore features of a literary text → have the students find out what the motifs/themes of the text are
- Show learners how to use corpora to improve language production → so that they can look up the collocations/phrases/words they want to use themselves

QUESTION: What are concrete examples of exercises and activities involving direct corpus use?
ANSWER:
Activity “False Friends”
- Level: Lower-intermediate
- Learning objective: Deduce the meaning of “map” and/or “card” and find out how the respective word is used in English.
- Corpus and corpus analysis tool: BNC or EnTenTen, Sketch Engine
- Description:
  o Students work with a handout of concordance lines or are asked to perform a concordance search in the Sketch Engine.
  o Students are asked to highlight examples which illustrate the meaning of the word(s).
  o With which syntactic patterns is the search item usually used? Students are asked to identify and categorize patterns and record them in their vocabulary books.
  o Follow-up activity?
Activity “Register Variation”
- Level: Advanced
- Learning objective: Identify register-related differences for near synonyms – such as “signify” and “mean” – or syntactic patterns – such as a reporting verb followed by either “that” or a pronoun.
- Corpus and corpus analysis tool: BNC or EnTenTen, Sketch Engine
- Description:
  o Students either work with printouts of the Word Sketch Difference for “signify” and “mean” (columns “and/or”, “objects”, “subjects”, “modifiers”) or are asked to perform a Word Sketch Difference search themselves.
  o Students are asked to identify and explain differences in usage and justify their conclusions.
- Description:
  o Students either work with printouts of concordance lines of “say that” and “say” + pronoun (Filter context → Part-of-speech context → pronoun → within 1 token to the right), or are asked to perform the searches themselves.
  o Students are asked to identify and explain differences in usage and justify their conclusions.

Practical questions

What could be language-related research questions that you can answer with the help of corpus analysis? And what kind of text would you collect to compile your corpus from?
1. Pragmatics of the Cork dialect word “langer”
   → Spoken language from the Cork region (e.g., transcriptions of audio recordings)
2. Diachronic investigation into usage and connotations of the word “gay”
   → Literary texts, newspaper articles, digital language, spoken language, etc. from different decades
3. Development of gender awareness as reflected in words referring to occupations, jobs, etc.
   → Literary texts, newspaper articles, digital language, spoken language, etc. from different decades
4. Syntactic structures that are typical of social media language such as “because” + explanatory individual word
   → Language used in social media communication
   → Comparison of a social media corpus and a web corpus indicates preferred usage of the structure in social media communication.
   → Searching for the same structure in other corpora yields no or only very few results.
   → Investigating parts of speech and concrete communicative contexts as well as specifying the research question as further steps to be taken.
5. Lexicogrammatical characteristics of words (e.g., preferences regarding prepositions, tense, aspect, etc.)
   → Current literary texts, newspaper articles, digital language, spoken language, etc.

The adverb "very" and the adjective "nice" are frequently overused by language learners. How could you use the Sketch Engine to help students improve their lexical variability?
- Use the Thesaurus tool and search for “very” or “nice” and look for similar words/synonyms
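The corpus comparison in research question 4 above relies on normalised frequencies, because a social media corpus and a web corpus will rarely have the same size. The following is only a minimal sketch of that idea: the function names (`tokenize`, `count_because_x`, `per_million`) and both toy texts are invented for illustration and are not taken from Sketch Engine or from any real corpus.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens and basic punctuation marks."""
    return re.findall(r"[a-z']+|[.!?,;]", text.lower())

def count_because_x(tokens):
    """Count 'because' + exactly one word + sentence-final punctuation."""
    hits = 0
    for i in range(len(tokens) - 2):
        if (tokens[i] == "because"
                and tokens[i + 1].isalpha()
                and tokens[i + 2] in {".", "!", "?"}):
            hits += 1
    return hits

def per_million(hits, corpus_size):
    """Normalise a raw count to hits per million tokens."""
    return hits * 1_000_000 / corpus_size

# Invented toy data, not real corpus extracts.
social_media = "I stayed home because reasons. Can't talk now because homework!"
web = "I stayed home because I was ill. We left early because the weather was bad."

for name, text in [("social media", social_media), ("web", web)]:
    tokens = tokenize(text)
    hits = count_because_x(tokens)
    print(name, hits, round(per_million(hits, len(tokens))))
```

With real corpora the raw counts would come from a concordance search, and only the per-million figures should be compared across corpora.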
How could you use Sketch Engine to find out how the words “random” and “arbitrary” are different from each other?
- Use the Word Sketch Difference tool, search for “random” and “arbitrary”, and see how they differ in the words they collocate with

Take a look at a worksheet and decide how you would re-structure/re-design it to make it more authentic and meaningful for language learners.
- Use the Concordance tool to find out if the rules and examples presented on the worksheet are actually used in that way → the rules may not be as limited/specific as proposed (e.g., signal words may be overemphasized if measured against the results of corpus analysis)
- Re-designing the worksheet:
  o identify actual communicative situations and lexical preferences
  o use authentic language material rather than invented example sentences
  o tasks would be more meaningful if the language structure was actually used for communication and not merely for the sake of practicing it

Imagine you want to investigate the typical usage of “bridge” in English. Is the word more frequently used in a figurative or in a literal sense? Is it more frequently used as a verb or as a noun? Which verbs are typically used with the plural noun “bridges”? What tools in Sketch Engine would you use, and how would they help?
- Typical usage (literal or figurative): use the Concordance and look at the sentences to see if usage is mostly literal or figurative
  OR
  use the Word Sketch simple search and take a look at the most frequent modifiers of “bridge” to identify if it is used more literally or figuratively
- More frequently used as verb or noun?
  The Word Sketch immediately shows the distinction between noun and verb and how many times each appears in the corpus
- Which verbs are used with the noun?
  Use the Concordance tool and use the advanced search → Query type: word → part of speech: noun → filter context: part of speech context → only keep lines with: verb → within 5 tokens left and right → then it automatically highlights the verbs it appears with
  OR
  Use the Word Sketch: simple search: bridges, choose verbs with “bridges” as object

How could you investigate how the 20 most frequent words in the COVID-19 corpus are different from the 20 most frequent words in the Brexit corpus? What are the three most frequent parts of speech in each of the two corpora? What are the most frequent nouns in each of the two corpora?
- Wordlist for both corpora, then compare
- Use the advanced search in Wordlist and look for tags
- Use the advanced search and look for nouns

How could you extract subject-specific terminology from the COVID-19 corpus?
- Use the Keywords function and look for single words and multi-word terms

How could you create a lexico-grammatical profile of the word “bridge” (What other words does it typically co-occur with? What other words are typically used in the same contexts? Etc.)
- With the help of the Word Sketch function

What are possible grammatical or lexical features dealt with in textbooks that you could investigate to see if they correspond to authentic language use?
- Conditional sentences (are the tenses really as structured as the textbooks say?)
- Reported speech (are the tenses that should be used for reported speech actually that fixed?)
- Going to-future (are the reasons for using this future as clear-cut as proposed?)
  - Textbook presentation:
    o Used for future plans and intentions.
    o Decision has already been made.
  - Corpus data (EnTenTen):
    o Also frequently used to outline schedules, processes, sequences of events, plans, etc.
      (also in case of forced actions), e.g.:
      § “These changes are going to force us to rethink our current assumptions about usage, as well as the economic and technological underpinnings of the web.”
      § “Below we are going to look at some very common mistakes that you might be doing.”
      § “However, what is going to happen next?”
    o Going to-future in the past as an important means to clarify statements/intentions, and to communicate agreement or objection in conversations (means of turn-taking, topic-shifting), e.g.:
      § “I was gonna say, the Europeans are far more tolerant of what we might perceive as transgressive opinions.”

What are suggestions for re-designing the presentation of linguistic features in textbooks?
- Use authentic language material to illustrate linguistic features.
- Focus on communicative use of features instead of schematic rules.
- Focus on more concrete speech acts and realistic communicative situations.
- Include lexico-grammatical information.
- Explain characteristics of aspects rather than explaining and contrasting isolated verb forms and tenses.

How would you create corpus-based teaching materials for the EFL classroom? Decide on form, proficiency level of learners, the educational objective, the topic/language feature, a corpus, exercises, corpus linguistic techniques.
- Tourism
  o Year and proficiency level: year 9 or year 10 (B1) // year 11 or year 12 (B2-C1)
  o Aims and language objectives: introduction to a certain topic, subject-specific vocabulary and language patterns, ability to reflect on topic-related issues, ability to participate in controversial debates
  o Topic: sustainable tourism
  o Corpus: create a subject-specific corpus to identify relevant language material through Keyword search as a basis for authentic teaching materials.
  o Scanning the lists for irrelevant material and eliminating it; consult concordances to make sure that unknown expressions or apparently irrelevant expressions are really not interesting or important for the topic.
  o Finding interesting subtopics worth addressing in class, e.g. overtourism or ecotourism (to add variation to the topic, to avoid repeating clichés, to increase general knowledge about the topic).
  o Creating exercises based on corpus findings.
  o Processing language material, e.g. in the form of a vocabulary notebook.
- Christmas
  o Year and proficiency level: year 5 to year 8 (A2-B1)
  o Aims and language objectives: consolidation and enlargement of subject-specific vocabulary and language patterns, ability to reflect on topic-related issues, ability to participate in culture-specific debates
  o Topic: Christmas traditions
  o Corpus: create a subject-specific corpus to identify relevant language material through Keyword search as a basis for authentic teaching materials.
  o Scanning the lists for irrelevant material and eliminating it; consult concordances to make sure that unknown expressions or apparently irrelevant expressions are really not interesting or important for the topic.
  o Finding interesting subtopics worth addressing in class, e.g. Christmas personalities and characters from around the world (to add variation to the topic, to avoid repeating clichés, to increase general knowledge about the topic).
  o Creating exercises based on corpus findings.
  o Processing language material, e.g. in the form of a vocabulary notebook.

Think of a concrete teaching scenario (institutional setting, learner group, etc.) and plan a lesson on a topic/language feature of your choice involving direct corpus use. (Task 9 + presentation 10)
- Relative clauses
  o Teaching scenario: private tuition, 2 learners
  o When to use “which”, “who”, “that”.
  o Provide learners with concordance lines and have them figure out common usage patterns.
  o Additional focus on punctuation/comma placement.
- A/An
  o Teaching scenario: school context, year 5 or 6 (A1-A2)
  o When to use “a” and “an”.
  o Provide learners with pre-selected concordance lines and have them figure out when to use which indefinite article.
  o Learners should have access to a computer/digital information resource to look up the meaning and pronunciation of (unknown) words.
- Shakespeare
  o Teaching scenario: school context, year 12 (B2-C1)
  o Get familiar with Shakespeare’s language.
  o Provide learners with access to a concordance analysis tool and a corpus of Shakespeare’s work.
  o Have learners compile frequency lists and guess the meaning of words and expressions (expectation guide).
  o Have learners consult the respective concordance lines to prepare their reading (which words are used and how, what do they mean, etc.).
- Social Media
  o Teaching scenario: school context, year 8 (A2-B1)
  o Motivate learners to interact with language (data).
  o Provide learners with access to a social media corpus and have them find out how certain words are typically used in the social media context (e.g., “raw”).
  o Alternatively, provide learners with pre-selected concordance lines or word sketches of a linguistic social media feature that you wish to address in class.

Opinion questions

In your opinion, how do scripted texts and radio/TV language fit into the concept of authentic and naturally occurring language?
I think scripted texts and radio/TV language fit well into this concept, as they have not been produced to be put into a corpus. In a way, scripted texts try to mimic actual real language, and sometimes the actors don’t adhere to the script word for word; in this way more authentic language can be incorporated as well. Obviously, when using scripted material, it should be noted that the material may be compromised. But there are also corpora consisting only of scripted material, for cases where an analysis of scripted language specifically is wanted. So, it depends on what you want to look at.
Which of the two tools (Sketch Engine and #LancsBox) would you prefer to use (in which situation) and why?
- If I wanted to use already available corpora, then I would prefer Sketch Engine, but if I wanted to create my own corpus to work on all the time, then I would probably prefer #LancsBox
- In general, however, I prefer Sketch Engine because of its user-friendliness and (mostly) self-explanatory interface
- But I guess once you have understood the workings and functions of #LancsBox, it would be more useful, especially when you are no longer a student and would otherwise have to pay for Sketch Engine

Exam:
- Translation not important
- I don’t have to list any concrete corpora or corpus software/pages -> she might give us a list and we have to decide which are most useful in the context of language teaching
- No questions about #LancsBox, but I have to know the advantages and disadvantages on a general level
- For the tasks: focus on one of the examples
- Example: photo of n-gram list and multi-word list and explaining the difference
- Opinion question: How can corpus analysis benefit foreign language teaching?
- We will have to form our opinion on the topic of session 6 -> data-driven learning
- Task 7: focus on one aspect/linguistic feature you want to explain -> “Describe how one linguistic feature is presented differently in textbooks and in corpus analysis.”
- Task 10 not relevant
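
For the n-gram example in the exam notes above, it may help to see how an n-gram list is produced. This is only a rough sketch with an invented sentence and a hypothetical helper called `ngrams`; it is not how Sketch Engine is implemented. An n-gram list simply counts every contiguous sequence of n tokens, whether or not it is a meaningful unit. A multi-word term list, as far as I understand the Keywords tool, instead keeps only term-like phrases that are typical of the focus corpus compared with a reference corpus, which this sketch does not attempt.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Invented example sentence, split on whitespace for simplicity.
tokens = "the cost of living and the cost of travel".split()

bigram_counts = Counter(ngrams(tokens, 2))
for gram, freq in bigram_counts.most_common(3):
    print(" ".join(gram), freq)
```

The most frequent bigrams here include "the cost" and "cost of", which shows why raw n-gram lists contain many sequences that a term list would filter out.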