Questions and Answers
In corpus linguistics, what is the primary significance of collocations?
- They identify words that frequently appear together, highlighting patterns of language use. (correct)
- They categorize single words based on their frequency of occurrence in a corpus.
- They isolate words that appear in specific tenses, ignoring aspectual considerations.
- They demonstrate the prescriptive rules that all speakers should follow to maintain grammatical correctness.
Why does corpus linguistics consider utterances with missing subjects, such as 'Heading to the pharmacy,' despite their grammatical incorrectness?
- To create new prescriptive rules accommodating common deviations from standard grammar.
- To promote a specific dialect of English where such constructions are considered standard.
- To enforce strict grammatical rules through observation and categorization of errors.
- To reflect actual language use, acknowledging that real-world communication often deviates from prescriptive norms. (correct)
Which of the following tools is NOT typically used in corpus linguistics?
- Syntactic parsing to generate abstract syntax trees. (correct)
- Pattern identification, such as collocation searches.
- Frequency counts of words and multi-word units.
- Concordancing to examine words in context.
What type of linguistic analysis is facilitated by comparing different parts of the same corpus or different corpora?
What are corpus tools NOT generally used for?
Which field of study directly examines multi-word units within the framework of corpus linguistics?
How does corpus linguistics contribute to lexicogrammar?
How does the subcorpus function enhance corpus analysis within the 'Concordance' tool?
What is a practical application of corpus comparison between general and specialized corpora?
What is the advantage of using 'macros' in the Concordance tool for corpus analysis?
How does the 'filter context' function contribute to a more refined search in corpus analysis?
In corpus linguistics, what primary information do 'tags' provide about a token within an annotated corpus?
What is the main function of the 'Wordlist' tool in Sketch Engine?
When using the 'Wordlist' tool in Sketch Engine, what happens if you leave the settings at their default values?
How could you use the Concordance tool to find instances of words starting with 'un'?
If a researcher wants to compare word frequency across different sections of a corpus (e.g., legal documents vs. news articles), which feature of the Concordance tool would be most useful?
Which application of corpus linguistics directly assists individuals in determining the appropriateness of specific word choices within a given context?
What is the key distinction between 'authentic language' and 'naturally occurring language' according to corpus linguistics?
A linguist aims to investigate regional variations in the use of specific idioms. Which approach to accessing and using corpora is most appropriate?
A researcher wants to create a corpus to study the evolution of political discourse in online forums over the past decade. Which method of corpus compilation is most suitable?
For a language teacher aiming to enhance students' understanding of vocabulary in authentic contexts, which type of corpus would be most beneficial?
A bilingual teacher wants to create a corpus to assess the impact of climate change on curricula. Which specialized corpus would be the best choice?
How might a literary corpus be used most effectively in a language learning context?
A linguist is comparing the frequency of a particular verb tense across different genres of writing. What is an effective method to achieve this using corpus linguistics tools?
In what way does corpus linguistics contribute to syllabus design indirectly?
How does corpus linguistics aid in the development of language teaching materials and reference works?
What role does corpus linguistics play in the evaluation of existing pedagogical descriptions in language teaching?
How can corpus linguistics inform the selection and sequencing of language phenomena in a language course?
In what way does corpus linguistics contribute to the presentation of selected items and structures in language teaching?
How does the 'data-driven learning' approach, combined with direct corpus application, change the role of language learners?
What is the key distinction between learners interacting with raw corpus data versus pre-processed corpus data in data-driven learning?
Which of the following is an example of how corpus linguistics can be directly applied to enhance lexical skills in language teaching?
Which approach best leverages authentic language material for teaching linguistic features?
How can 'going to'-future in the past be utilized in conversations, according to proposed suggestions?
In the context of EFL textbook redesign, what is the primary benefit of focusing on speech acts and realistic communicative situations?
What would be the best way to incorporate lexico-grammatical information into EFL materials?
For a tourism-based corpus designed for year 11 or year 12 students, what proficiency level should the materials target?
Which of the following describes a primary advantage of using corpus linguistics in language teaching?
What is a significant reason why data-driven commercial teaching materials are not more common in foreign language teaching?
Why might teaching materials avoid including numerous register variations, even within a communicative approach?
Teaching materials, even corpus-driven ones, tend to simplify certain aspects of English. Which of the following is an example of a common simplification?
How do corpus-driven reference grammars typically appear in teaching materials?
What underlying issue is highlighted by the debate between rule-governed and grammar-in-context approaches to language teaching?
What does corpus linguistics allow that traditional linguistic studies often do not?
Despite the communicative approach emphasizing authentic language, what aspect is often lacking in textbooks, according to the text?
Flashcards
Collocations
Words that frequently appear together. Reveal patterned nature of language.
Language in use
How language is actually used, even if it breaks 'rules'.
Frequency Counts
Counts of single words and multi-word units.
Concordancing
Pattern Identification
Corpus comparison
Phraseology
Lexicogrammar
Authentic Language
Naturally Occurring Language
Accessing Corpora
Compiling Corpora
General Corpora
Specialized Corpora
Literary Corpora
Corpus Software
Wildcard Use in Concordance
Subcorpus
Corpus Analysis Macros
Filter Context
Tags (POS Tags)
What is a Tag?
Sketch Engine’s “Wordlist”
Compiling Word Frequency Counts
Indirect Corpus Application
Corpus in Syllabus Design
Corpus in Material Design
Corpus for Evaluation
Data-Driven Learning (DDL)
Direct Corpus Application
Data-Driven Learning with Direct Corpus Application
Corpus for Lexical Skills
Corpus Analysis
Corpus Linguistics in Teaching
Rule-Governed Approach
Grammar-in-Context Approach
Language Norms
Corpus-Driven Materials
Register Variation
Formulaic Language/Phraseology
"Going to" Future
"Going to" Future in the Past
Authentic Language Material
Communicative Use of Features
Lexico-grammatical Information
Study Notes
Definition of Corpus
- A corpus is a large, principled collection of naturally occurring language.
- Corpora are usually stored electronically.
- Texts in a corpus must be produced in a natural communicative setting.
- The texts are spoken or written for an authentic communicative purpose.
- The purpose is not to put them into a corpus.
- A corpus may, for example, consist of newspaper articles that were collected because they are convenient to obtain.
- Journalists write articles to communicate with readers, not to fill a linguistics corpus.
Definition of Corpus Linguistics
- The study of language in use using corpus analysis.
- It involves methods and principles of applying corpora in language studies and teaching/learning.
Definition of General/Generalized Corpora
- Very large corpora aim to represent language as a whole.
- They often contain more than 10 million words.
- A generalized corpus contains a variety of language to generalize findings.
- Examples:
- British National Corpus (BNC)
- American National Corpus (ANC)
- Corpus of Contemporary American English (COCA)
- Typical contents:
- Written texts (newspaper and magazine articles).
- Works of fiction and nonfiction.
- Writing from scholarly journals.
- Spoken transcripts (informal conversations, government proceedings, business meetings).
Definition of Specialized Corpora
- Represents a particular part of a language.
- Examples include a particular subject field or dialect.
- It contains texts that represent the language of that type.
- They can be large or small.
- Examples:
- Michigan Corpus of Academic Spoken English (MICASE)
- CHILDES Corpus (language used by children)
- Michigan Corpus of Upper-Level Student Papers (MICUSP)
- Medical corpus (language used by nurses and hospital staff)
Definition of Learner Corpora
- Contains written texts and/or spoken transcripts of student language.
- The students are acquiring the language.
- These corpora can be examined to see common student errors.
Definition of Pedagogic Corpora
- Contains language used in classroom settings.
- Includes textbooks, transcripts of classroom interactions, or any text learners encounter.
- Helps to make sure that students are taught useful language.
- Can also be used to examine teacher-student dynamics, or as a self-reflective tool for teacher development.
Definition of Data-Driven Learning (DDL)
- An approach to foreign language learning.
- Language is treated as data.
- Students are researchers undertaking guided discovery tasks, rather than following teachers and textbooks.
"Principled" in Corpus Compilation/Linguistics
- A corpus follows guidelines, which can differ depending on the corpus.
- The texts must be planned.
- The language must be chosen according to specific characteristics.
- Texts must be chosen to be useful for your research question/aim.
- Text choice depends on the corpus' goal.
Concrete Examples of Basic Corpus Linguistics Principles
- Context:
- Words need context to convey meaning.
- Isolated words can be interpreted differently.
- Language patterns:
- Patterned nature is apparent through corpus analysis.
- Collocations are words that usually co-occur.
- Without one part of a collocation, the whole collocation does not make sense.
- Tense and aspect: some verbs only stand with specific tenses or aspects.
- Language in use:
- Corpus linguistics studies language in use.
- It is not concerned with prescriptive "rules".
- Some expressions/utterances may be strictly incorrect.
- Language in use might not be the one expected.
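The idea that collocations are words that usually co-occur can be made concrete with a simple co-occurrence count. The following is a minimal sketch, not the method of any particular corpus tool: the function name `collocates`, the toy sentences, and the window size are assumptions made for illustration.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts

# Toy data invented for illustration, already lower-cased and tokenized.
text = "she made a decision . he made a mistake . they made a decision quickly"
tokens = text.split()
made_collocates = collocates(tokens, "made", window=2)
print(made_collocates.most_common(3))
```

Raw counts like these favour very frequent words such as "a"; real tools usually add a statistical association measure on top of the counts.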
Corpus Tools (Generally)
- Frequency counts (single words and multiword units).
- Concordancing.
- Pattern identification (collocation search).
- Corpus comparison.
- Compare different language varieties (regional varieties, register).
- Diachronic language comparison.
- Compare general and specialized corpora to identify corpus-specific terminology.
- For translation (multi-language corpora, e.g. Linguee).
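Comparing corpora of different sizes only makes sense if raw counts are normalized first. A common way to do this is to express frequencies per million tokens. The sketch below assumes a hypothetical helper `per_million` and invented figures; it does not reproduce numbers from any real corpus.

```python
def per_million(count, corpus_size):
    """Relative frequency: occurrences per one million tokens."""
    return count * 1_000_000 / corpus_size

# Hypothetical figures, invented for illustration only.
general = {"size": 100_000_000, "hits": 1_500}
specialized = {"size": 2_000_000, "hits": 400}

general_fpm = per_million(general["hits"], general["size"])
specialized_fpm = per_million(specialized["hits"], specialized["size"])
print(general_fpm, specialized_fpm)
```

Although the general corpus has more raw hits in this invented example, the item is relatively far more frequent in the specialized corpus, which is the kind of contrast that points to corpus-specific terminology.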
Research/Application Fields for Corpus Linguistics
- More theoretical:
- Phraseology (multi-word units).
- Lexicogrammar: connection between lexical and grammatical aspects.
- Register, language change.
- More practical:
- (Foreign) language teaching.
- Lexicography.
- Translation (studies).
- Language for specific purposes.
- Writing assistance/language reference.
"Authentic Language" and "Naturally Occurring Language"
- Every utterance (spoken, written, or transcribed) has been produced for communication.
- The purpose is not to put it into a corpus.
- Language has been produced in natural communicative settings.
- Authenticity also implies wanting language material produced by native speakers.
- Some corpora look at the language of English learners.
Accessing Corpora
- Download a corpus and corpus software.
- Download a corpus and access the corpus software online.
- Access a corpus and corpus software online.
Compiling Corpora
- Collect texts in accordance with the research interests.
- Use texts from books, newspapers, journals, transcriptions, texts by language learners.
- Use texts from the web.
Useful Corpora for Language Teaching/Learning
- Use online tools like Sketch Engine or downloadable tools like #LancsBox.
- General corpora
- authentic language in context (BNC, enTenTen, English Corpus for SkELL, Open American National Corpus)
- for working on lexical and grammar skills
- Specialized corpora
- for working on current topics or bilingual subject teaching (Brexit, ScienceBlogs, Environment corpus, e-flux, EcoLexicon English Corpus)
- Literary corpora
- for working on literary skills (Project Gutenberg English, English Drama Corpus, Shakespeare English Drama Corpus)
Main #LancsBox and Sketch Engine Differences
- Accessibility:
- Sketch Engine is online.
- #LancsBox is local.
- Pre-Loaded Corpora:
- Sketch Engine: >700.
- #LancsBox: 7.
- Languages:
- Sketch Engine: >100.
- #LancsBox: 2.
- Pricing:
- Sketch Engine: Around 100€ per year.
- #LancsBox: Free.
- Corpus building:
- Sketch Engine: Up to 1 million words for your own corpora.
- #LancsBox: No word limit for your own corpora.
- Usability:
- Sketch Engine: Basic knowledge is sufficient.
- #LancsBox: Sound knowledge is helpful.
Sketch Engine's Concordance Tool
- A concordance is a collection of the occurrences of a word form, each shown in its textual environment.
- In its simplest form, it is an index that references the place of each occurrence in a text.
- The tool finds words, phrases, tags, and documents and displays the results in context.
- The concordance can be sorted, filtered, counted, and processed further.
- Large corpora may return so many results that working through them becomes tedious.
- KWIC = keyword in context, the format in which concordance lines are displayed.
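A keyword-in-context display can be sketched in a few lines. This is an illustrative sketch only, not how Sketch Engine works internally: the function name `kwic`, the context width, and the sample sentence are assumptions.

```python
def kwic(tokens, node, width=3):
    """Return concordance lines as (left context, node, right context) tuples."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Sample text invented for illustration.
tokens = "Heading to the pharmacy now . The pharmacy closes at six .".split()
lines = kwic(tokens, "pharmacy")
for left, node, right in lines:
    # Right-align the left context so the node words line up in one column.
    print(f"{left:>20}  {node}  {right}")
```

Aligning the node word in a central column is what lets a reader scan the contexts to its left and right for recurring patterns.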
Concordance Tool's Query Types
- Simple: typing go will find goes, going, gone, went.
- Lemma: finds all forms of the lemma (base form).
- Phrase: finds phrases exactly as typed.
- Word: finds a word form exactly as typed.
- Character: finds tokens containing the given character(s).
- CQL: Corpus Query Language for complex criteria, making use of part-of-speech tags and regular expressions.
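Pattern-based queries, such as finding every word that starts with "un", rely on the same idea as regular expressions. The sketch below uses Python's `re` module rather than the Concordance tool's own query syntax, and the sample sentence is invented.

```python
import re

# Python regex (not CQL): word boundary, "un", then one or more letters.
UN_WORDS = re.compile(r"\bun[a-z]+\b", re.IGNORECASE)

sentence = "Unhappy users found the undo button under an unusual menu."
matches = UN_WORDS.findall(sentence)
print(matches)
```

The pattern matches word forms, not meanings, so it also returns "under", where "un" is not a prefix. Results of such searches still need to be checked by a human reader.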
Concordance Tool's Use in Corpus Analysis
- Subcorpus: limits the search to a part of the corpus.
- Macro: saves searches for future use.
- Filter context: keeps only the lines that fulfil given conditions.
- Text types: excludes or includes specific documents or corpus parts.
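Filtering by context can be pictured as keeping only those concordance lines in which a required word occurs near the search word. The following sketch assumes that concordance lines are stored as (left, node, right) tuples; that representation, the function name `filter_context`, and the sample lines are all assumptions for illustration.

```python
def filter_context(lines, required, window=3):
    """Keep lines where `required` occurs within `window` tokens of the node (window >= 1)."""
    kept = []
    for left, node, right in lines:
        near = left.split()[-window:] + right.split()[:window]
        if required.lower() in (w.lower() for w in near):
            kept.append((left, node, right))
    return kept

# Invented concordance lines for the node word "bank".
lines = [
    ("went to the", "bank", "to open an account"),
    ("sat on the river", "bank", "and watched the water"),
    ("the", "bank", "raised its interest rates"),
]
river_lines = filter_context(lines, "river", window=3)
print(river_lines)
```

Here the filter separates the "river bank" sense from the financial one, which is a typical reason for refining a search that returns too many mixed results.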
Tags
- A tag is a label assigned to each token in an annotated corpus.
- Tags indicate the part of speech.
- They often also show grammatical categories and morphological information.
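In data terms, an annotated corpus can be thought of as a sequence of token and tag pairs. The sketch below uses a hand-tagged toy sentence; the tag names are illustrative and are not taken from the tagset of any specific corpus.

```python
from collections import Counter

# Hand-tagged toy sentence; tag names are illustrative, not a specific tagset.
tagged = [
    ("The", "DET"), ("teacher", "NOUN"), ("selects", "VERB"),
    ("useful", "ADJ"), ("concordance", "NOUN"), ("lines", "NOUN"),
]

# Searching by tag: retrieve every token labelled as a noun.
nouns = [word for word, tag in tagged if tag == "NOUN"]

# Counting tags gives a rough grammatical profile of the text.
tag_counts = Counter(tag for _, tag in tagged)
print(nouns, tag_counts)
```

Because each token carries its tag, searches can target grammatical categories (all nouns, all verbs) instead of specific word forms.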
Sketch Engine's "Wordlist" Tool
- Used to generate frequency lists of all kinds.
- Examples are words, lemmas, nouns, verbs, tags, words containing/not containing certain characters.
- To compile word frequency counts:
- Log in and select the corpus from which the list should be derived.
- Leave the default settings, or set criteria or use a custom search.
- Click GO.
- You can compile wordlists for...
- Nouns, verbs, adjectives and other parts of speech; words beginning, ending, containing certain characters; word forms, tags, lemmas and other attributes; or a combination of all of the above.
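What a frequency list does can be approximated with a counter over tokens. This is a minimal sketch under stated assumptions: the function name `wordlist`, the optional `prefix` filter, and the sample text are invented, and the code does not reflect Sketch Engine's implementation or default settings.

```python
from collections import Counter

def wordlist(tokens, prefix=""):
    """Frequency list of lower-cased alphabetic tokens, optionally restricted by prefix."""
    words = (t.lower() for t in tokens if t.isalpha())
    counts = Counter(w for w in words if w.startswith(prefix))
    return counts.most_common()

# Sample text invented for illustration.
tokens = "The corpus shows the patterns . The concordance shows context .".split()
full_list = wordlist(tokens)
c_list = wordlist(tokens, prefix="c")
print(full_list)
print(c_list)
```

The `prefix` argument mirrors the idea of restricting a list to words beginning with certain characters; filters for endings or tags would follow the same pattern.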
Sketch Engine tool “Keywords”
- Shows which words appear more frequently in the focus corpus than would be expected from general language use.
- General language is represented by the reference corpus.
- Any token can qualify if it is used more in the focus corpus than in the reference corpus.
How Keywords Can Be Helpful
- Sketch Engine combines statistics with linguistic criteria to extract keywords and terms.
- Useful in language learning or applied linguistics
- Can define or understand the main corpus topic.
- Easy to compare different corpora.
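One commonly described way of scoring keywords is to divide the normalized frequency in the focus corpus by the normalized frequency in the reference corpus, with a small smoothing constant added to both. The sketch below illustrates that idea; whether it matches the exact calculation used by any given tool is an assumption, and the function name `keyness` and all counts are invented.

```python
def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Ratio of smoothed per-million frequencies (focus corpus vs. reference corpus)."""
    focus_fpm = focus_count * 1_000_000 / focus_size
    ref_fpm = ref_count * 1_000_000 / ref_size
    return (focus_fpm + smoothing) / (ref_fpm + smoothing)

# Hypothetical corpus sizes and counts, invented for illustration.
FOCUS_SIZE, REF_SIZE = 1_000_000, 100_000_000
counts = {
    "emission": (300, 900),        # (focus count, reference count)
    "the": (60_000, 6_000_000),
}
scores = {w: keyness(f, FOCUS_SIZE, r, REF_SIZE) for w, (f, r) in counts.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this invented example "the" is far more frequent in absolute terms, but it is equally frequent in both corpora and scores 1.0, while "emission" stands out as typical of the focus corpus.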
N-Gram Lists
- Frequency lists of multiword expressions (MWEs) or lexical bundles.
- N-gram = a sequence of n items.
- An item can be any kind of unit, for example a word or a character.
- In linguistics, n-grams are sometimes called multiword expressions.
Key Differences Between N-Gram and MWE Lists
- N-gram lists
- Focus: Frequency of word sequences.
- Output: Common fixed word sequences (bigrams, trigrams, etc.)
- Use Case: Find common phrases and collocations.
- Keyword + MWE
- Focus: Significance compared to a reference corpus.
- Output: Keywords and fixed expressions with semantic significance.
- Use Case: Identify domain-specific or unusual words/phrases
N-Gram List Compilation
- Log in, choose a corpus, click N-GRAMS and select the number of items.
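Conceptually, an n-gram list is produced by sliding a window of n tokens across the text and counting each sequence. The following is a minimal sketch with an invented sample sentence; the function name `ngrams` is an assumption, and the code is not the tool's implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Sample text invented for illustration.
tokens = "i do not know what i do not know".split()
trigram_counts = Counter(ngrams(tokens, 3))
print(trigram_counts.most_common(2))
```

Selecting the number of items in the tool corresponds to the `n` argument here: 2 for bigrams, 3 for trigrams, and so on.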
Analyzing Lexico-Grammatical Features
- With the help of the Word Sketch tool.
Sketch Engine tool "Word Sketch"
- Shows a word's collocates, i.e. the other words that occur in its surroundings.
- Summary of the word's grammatical and collocational behavior.
- Results are organized into grammatical relations.
Sketch Engine tool "Thesaurus"
- Based on the idea that words used in similar contexts tend to have similar meanings.
- The tool identifies the collocations of the search word and of other words with the same part of speech, compares their word sketches, and lists the words whose sketches match most closely. A close match suggests a similar meaning.
- Beware: not all of the listed words are true synonyms.
- May help students to vary their vocabulary.
Sketch Engine tool “Word Sketch Difference”
- Used to make comparisons by contrasting collocations.
- You can compare:
- Lemma
- Word Forms
- Subcorpora
Use of Word Sketch Difference
- Helps understand the differences between similar words.
- Shows which collocates appear with which of the compared words.
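Contrasting two similar words by their collocates amounts to sorting the collocates into three groups: those found only with the first word, those shared by both, and those found only with the second. The sketch below shows this with invented collocate lists; the function name `collocate_difference` is hypothetical, and real lists would come from corpus queries.

```python
def collocate_difference(collocates_a, collocates_b):
    """Split two collocate lists into: only with A, shared, only with B."""
    a, b = set(collocates_a), set(collocates_b)
    return sorted(a - b), sorted(a & b), sorted(b - a)

# Invented collocate lists for illustration only.
big = ["mistake", "difference", "city", "problem"]
large = ["number", "amount", "city", "scale"]
only_big, shared, only_large = collocate_difference(big, large)
print(only_big, shared, only_large)
```

The two outer groups are the interesting ones for learners, because they show where near synonyms are not interchangeable.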
Corpus Analysis Benefits
- Relevance
- Content must be relevant.
- Important that it appears frequently.
- Authenticity
- Language produced by native speakers.
- Language that appears in natural context.
- Authentic language always appears in patterns.
- Response
- Respond directly to your learners' needs (e.g. writing assistance).
- Allows you to look up more exercises with the help of corpora.
- Autonomy
- Learners become aware of their learning process.
- The more aware they are, the more language they can learn.
- Availability
- A corpus is like a native speaker who is always available.
- Data is available for everyone.
- Sustainability
- Easily memorized due to the relevant topic.
- It sticks longer in your head.
Corpus Linguistics (Indirect)
- Syllabus design
- Which language patterns are important.
- What to teach and when to teach it.
- Teaching materials design, reference work design
- Always corpus-based.
- Teach language actually spoken.
- Evaluating existing pedagogical material
- Base material on literature.
- Selection of language phenomena.
- Presentation of selected items and structures.
- Items interact with one another.
Data-Driven Learning Combination
- Language learners are "linguistic researchers" or "language detectives".
- Learners interact with corpus data pre-processed by a teacher.
- Teachers select concordance lines for students to work with.
Corpus Linguistics (Direct)
- Lexical skills: skills of synonymity, collocation, register
- Interference between learners’ first and second language.
- Grammar structures in use
- Literature Lessons
- Responding to concrete learner needs
Benefits of Data-Driven Learning
- Learning how to learn.
- Personalized learning.
- Topics appear relevant.
- Communication and teamwork.
- Language is natural to learners.
- Lexico-grammatical Approach.
- Enhanced Noticing Skills
- Extended Cognitive Abilities
- Learner Autonomy
The Negative Aspects of Data-Driven Learning
- Requires great effort in the beginning.
- Preparing takes a lot of time.
- Special resources need to be in place.
- License to use corpus programmes must be purchased.
- The curriculum is full already.
Impact of Corpus Analysis
- Prescriptive vs. descriptive.
- Corpus analysis makes it possible to confirm actual use and to disprove traditional assumptions.
- Purely data-driven commercial teaching material is rare.
- Corpus findings do inform commercial teaching material such as textbooks.
- Data-driven material contrasts with textbook simplifications and a lack of register variation.
Definition of the Lexico-Grammatical Approach
- Lexis and grammar are integrated.
- Phraseology and formulaic language play a central role (e.g. phrasal verbs, collocations, light-verb constructions).
3 Categories to Phraseological Item Count
- Lexico-grammatical continuum: language
- Psycholinguistic view: language
- Characteristics of phraseological items: phrase meaning
Types of Phrases
In BNC ”smile” verb main past simple tense The phrase is “love“ aspect of the verb.
Aspects of Usage-Based
- Allow for integration of corpora into language pedagogy.
- Focus on authentic texts.
- Focus on communication.
Reasons for Integrating Corpus Use
- Focus on the language itself.
- Empirical answers to language questions (e.g. about near synonyms).
- Contrastive aspects (word equivalents).
- Lexical features (e.g., collocations, discourse markers, text ...).
- Data-driven learning and learners' own exploration of the language.
- Reference point and up-to-date usage
Examples of Direct Corpus Use
- Respond to learners' information needs.
- Learners come across items and ask questions.
- Learners come across grammar structures and ask questions.
- Have learners explore structures.
- Have learners explore patterns.
Description
Explore key aspects of corpus linguistics, including the significance of collocations, handling grammatically incorrect utterances, and comparative corpus analysis. Learn about the lexicogrammatical contributions and practical applications of corpus comparison. Understand how tools like 'Concordance' enhance corpus analysis through subcorpora and macros.