Corpus Linguistics Fundamentals

Questions and Answers

Which of the following is the MOST accurate description of how corpus linguistics approaches the study of language?

  • It relies on theoretical constructs without empirical validation.
  • It primarily focuses on establishing prescriptive rules for correct language usage.
  • It prioritizes subjective interpretations of language over objective analysis.
  • It analyzes language as it is naturally used, regardless of prescriptive norms. (correct)

In corpus linguistics, the continuous form of verbs like 'love' is frequently observed in various contexts.

False (B)

Explain how corpus comparison can be utilized to identify specialized terminology within particular fields.

By comparing a general corpus with a specialized corpus to identify terms that are significantly more frequent in the specialized corpus.

In corpus linguistics, the study of multi-word units is known as ______.

phraseology

Match each corpus linguistics tool with its primary function:

  • Frequency counts = Determining how often words appear
  • Concordancing = Displaying words in context
  • Pattern identification = Finding recurring sequences
  • Corpus comparison = Analyzing linguistic differences

Which of the following examples demonstrates a common deviation from prescriptive grammar that corpus linguistics would still consider?

Sentences with missing subjects in spoken language. (D)

What is the significance of collocations in corpus linguistics?

They are words that frequently occur together and contribute to meaning. (B)

Describe how corpus linguistics can be applied in the field of lexicogrammar.

By examining how lexical choices influence grammatical structures and vice versa, revealing patterns of usage.

Which 'Concordance' tool search function is best for finding words with a defined ending and unlimited preceding characters?

.*on (A)

Macros in corpus analysis are used to exclude specific documents or parts of a corpus.

False (B)

Which of the following statements accurately distinguishes Sketch Engine from #LancsBox regarding corpus analysis?

Sketch Engine focuses on user-friendly interfaces and interactive visualizations, unlike #LancsBox, which emphasizes command-line tools and scripting. (B)

What is the primary function of a POS tag in an annotated corpus?

To indicate the part of speech and grammatical categories.

The 'Wordlist' tool generates frequency lists of words, lemmas, nouns, verbs, and ______.

tags

When using the 'Lemma' query type in Sketch Engine's concordance tool, the search will only return the base form of the word, excluding any inflected forms.

False (B)

Match the following 'Concordance' tool functions with their descriptions:

  • Subcorpus = Limits search to specific parts of the corpus (e.g., genre, topic, domain)
  • Macro = Saves search criteria for quick reuse
  • Filter context = Keeps lines fulfilling additional conditions
  • Text types = Excludes or includes specific documents or parts of the corpus

Explain how the 'Character' query type in Sketch Engine's concordance tool differs from other query types in its handling of punctuation.

The 'Character' query type treats punctuation marks as literal characters to be searched for within tokens, while other query types often assign special meanings to punctuation or disregard them altogether.

The query type in Sketch Engine's concordance tool that allows for the use of regular expressions and part-of-speech tags for complex searches is called ______.

CQL

Which action configures the 'Wordlist' tool to generate a list showing the most frequent word forms in its default setting?

Leave the settings at the default values. (D)

Which of the following best describes the purpose of the 'Filter context' function in corpus analysis?

To remove lines that do not meet specific criteria beyond the initial search. (A)

Match each Sketch Engine concordance query type with its corresponding functionality.

  • Lemma = Finds all forms of the word.
  • Phrase = Finds a phrase composed of several tokens exactly as typed.
  • Word = Finds word forms exactly as typed.
  • Character = Finds tokens which contain the character(s).

The 'Wordlist' tool can only generate frequency lists of word forms, and cannot be used for lemmas or POS tags.

False (B)

A researcher wants to find all instances of the word "run" regardless of its form (e.g., "runs", "running", "ran"). Which concordance query type in Sketch Engine would be most suitable?

Lemma (C)

What is a primary limitation of using a concordance tool with very large corpora, despite its power?

The overwhelming number of results, which can make analysis and interpretation difficult and time-consuming. (C)

The 'Simple' query type in Sketch Engine will only find the exact word form entered by the user.

False (B)

Which criterion is NOT essential for a collection of texts to be considered a corpus?

The texts must be stored electronically. (D)

Texts written specifically for the purpose of being included in a corpus, such as eliciting specific grammatical structures, are considered valid components of a corpus.

False (B)

Briefly explain why authenticity is crucial to the definition of a 'corpus'.

Authenticity ensures the language represents real-world usage, avoiding biases introduced by artificial or contrived language samples.

Corpus linguistics involves a system of methods and principles for applying corpora in language studies and ______.

teaching/learning

Match each characteristic to its corresponding term:

  • Corpus = A large, principled collection of naturally occurring language.
  • Corpus Linguistics = The study of language in use through corpus analysis.
  • Authenticity = Texts produced in a natural communicative setting.

Which of the following statements best characterizes corpus linguistics?

The application of statistical methods to analyze large language datasets. (B)

Which is the LEAST representative genre for inclusion in a corpus designed to study natural language use?

Scripted dialogues from a stage play, based on natural conversations. (D)

Explain how the definition of a 'corpus' ensures that linguistic analyses based on corpora are ecologically valid.

By requiring texts to be naturally occurring and produced in authentic communicative settings, corpora reflect real-world language use, making analyses generalizable to natural language contexts.

Which of the following statements best describes the relationship between lexicon and grammar according to the presented perspective?

Lexicon and grammar are so intertwined that treating them as separate entities in teaching or analysis misrepresents the integrated nature of language use. (D)

The verb 'love' is commonly used in continuous aspect, reflecting its dynamic and ongoing nature.

False (B)

Briefly explain how a usage-based approach to language pedagogy integrates corpora.

Usage-based approaches integrate corpora by using authentic texts to illustrate real-world language use and frequency patterns, which then inform teaching and learning materials.

Phraseological items consist of multi-word units that are often __________, combining lexical and grammatical components uniquely.

opaque

Match the benefit of integrating corpus use into the EFL classroom with its description:

  • Authentic language = Using real-world texts instead of scripted material.
  • Empirical answers = Basing teaching on evidence rather than intuition.

What is a key characteristic of usage-based approaches to describing and teaching languages?

Emphasis on communication and authentic texts. (C)

Explain the process by which learners move from prototypical constructions to abstraction in language acquisition, according to the principles outlined.

Through repeated exposure to similar constructions, learners begin to identify patterns and relationships, gradually abstracting general principles.

What is the primary implication of the statistic that the verb 'smile' is predominantly used in the past simple tense (64% of occurrences in the BNC)?

There is a need to integrate corpus data into language teaching to show actual usage patterns, with less emphasis on the continuous aspect. (A)

Which tool in Sketch Engine is most effective for directly comparing the collocational patterns of 'random' and 'arbitrary'?

Word Sketch Difference tool (D)

Using authentic language material in worksheets, as opposed to invented sentences, is suggested to make language learning more meaningful.

True (A)

When redesigning language learning worksheets, what is a key consideration to enhance their effectiveness, besides using authentic materials?

Identifying actual communicative situations and lexical preferences.

To determine whether 'bridge' is used more often in a literal or figurative sense, one can use the Word Sketch to examine the most frequent ______ of 'bridge'.

modifiers

Match the Sketch Engine tool with its primary function for analyzing word usage:

  • Concordance tool = Examining sentences to understand typical word usage
  • Word Sketch Difference tool = Comparing collocational patterns between words
  • Word Sketch = Identifying distinctions between noun and verb usage of a word

What type of query in the Concordance tool allows you to identify verbs typically used with the plural noun 'bridges'?

Advanced search filtering by part of speech context (D)

The use of signal words should be emphasized when creating a worksheet.

False (B)

Besides identifying literal or figurative usage, what other distinction can the Word Sketch immediately show about a word’s usage?

Distinction between noun and verb usage and frequency.

Flashcards

Collocations

Words that commonly appear together.

Language in Use

Corpus linguistics examines language as it is used, ignoring strict prescriptive rules.

Frequency Counts

Counting word frequencies to see which words are most common.

Concordancing

Tool that shows the occurrences of a word in its context.

Pattern Identification

Identifying common phrases for a specific search.

Corpus

A large, principled collection of naturally occurring language, stored electronically.

Authenticity in Corpora

Texts within a corpus should be produced in a natural communicative setting.

Corpus Comparison

The comparison of different sets of texts.

Corpus Linguistics

The use of corpus data to study language.

Phraseology

The study of multi-word units.

Corpus Linguistics (expanded)

A system of methods and principles on how to apply corpora in language studies and teaching/learning.

Lexicogrammar

The connection between vocabulary and grammar.

Discourses in a Corpus

Spoken, written, computer-mediated, spontaneous, or scripted examples of language.

Corpus Definition

Collections or databases of machine-readable texts involving natural discourse in diverse contexts.

Authentic Conversations

Conversations that took place for a real communicative purpose and were not produced only for the sake of the corpus.

Communicative Setting

Texts spoken/written for an authentic communicative purpose.

Concordance

A collection of the occurrences of a word form, each in its own textual environment; an index with a reference to each place of occurrence in a text.

Sketch Engine 'Concordance' tool function

A tool to find words, phrases, tags, document types, or corpus structures, displaying results in context.

KWIC

Keywords in context; concordance lines that show the searched item within its surrounding text.

Simple Query (Concordance)

Automatically searches for all forms of a word (e.g., typing 'go' finds 'goes,' 'going,' 'gone,' 'went').

Lemma Query (Concordance)

Finds all word forms of the base form of a word.

Phrase Query (Concordance)

Finds a sequence of words exactly as typed. It will not include other word forms.

Word Query (Concordance)

Finds a word form exactly as typed.

Character Query (Concordance)

Finds tokens that contain the character(s) you enter; punctuation marks are matched literally.

Wildcard Use in Concordance

Wildcards find words with specific character sequences, e.g., .*on finds words ending in 'on'.

Subcorpus

Limits searches to specific corpus parts (e.g., genre, topic, domain).

Macro (in Corpus Analysis)

Saves search criteria for repeated use.

Filter Context

Keeps only lines that meet specific conditions.

Text Types (in Corpus)

Excludes or includes specific document types or corpus parts.

Tags (POS, Morphological)

Labels assigned to words indicating part of speech, grammatical categories, and morphological information.

Wordlist Tool

Generates frequency lists of words, lemmas, tags, etc.

Compiling Word Frequency

Log in, select a corpus, and generate the list; the default settings show the most frequent word forms.

Phraseological Items

Multi-word units that combine lexical (vocabulary) and grammatical elements.

Lexico-grammatical Approach

Teaching lexicon (vocabulary) and grammar together, reflecting how they function in real language use.

Corpus Examples: 'smile' & 'love'

The verb 'smile' is frequently used in the past simple tense; 'love' is rarely in continuous aspect.

Usage-Based Approaches

Using real-world texts and focusing on authentic communication to teach languages.

Integrating Corpus Use

Incorporating corpora (large language databases) to provide authentic language examples and data.

Benefits of Authentic Language

Authentic materials offer real-world language, helping avoid reliance on intuition or insufficient textbook content.

Prototypical Construction to Abstraction

Learners identify patterns and relationships in language through repeated exposure to similar constructions.

Continuum between lexicon and grammar

A shift in focus from the communicative approach toward integrating lexis and grammar.

Word Sketch Difference

A Sketch Engine tool to compare how words differ in their typical usage by showing which words they commonly appear with.

Concordance Tool

A Sketch Engine tool to find examples of how a word or phrase is used in context.

Authentic Material Adaptation

The process of revising learning materials to be more representative of real-world language use and communication.

Literal vs. Figurative Usage

Investigating whether a word is used literally (its exact definition) or figuratively (in a metaphorical sense).

Word Sketch Simple Search

A visual summary in Sketch Engine showing the most frequent words that modify a given word.

Part-of-Speech Distinction

The distribution of a word's occurrences as different parts of speech (noun, verb, adjective, etc.).

Lexical Variability

Using corpus data to identify communication preferences and typical lexical choices in communicative situations.

Advanced Search by POS

A search strategy in Sketch Engine that filters search results by part of speech.

Study Notes

Corpus Definition

  • It is a large, principled collection of naturally occurring language, usually stored electronically.
  • A large collection/database of machine-readable texts involving natural discourse in diverse contexts.
  • Discourses can be spoken, written, computer-mediated, spontaneous, or scripted.
  • It represents a variety of genres such as everyday conversations, lectures, seminars, meetings, radio/television programs, and essays.
  • Texts must have been produced in a natural communicative setting.
  • Texts are spoken/written for some authentic communicative purpose.

Corpus Linguistics Definition

  • It's the study of language in use through corpus analysis.
  • A whole system of methods and principles of how to apply corpora in language studies and teaching/learning.

General/Generalized Corpora Definition

  • These are very large corpora aiming to represent language as a whole.
  • They often contain more than 10 million words.
  • They encompass a variety of language.
  • Findings from them can be generalized to some extent.
  • Examples include:
    • British National Corpus (BNC)
    • American National Corpus (ANC)
    • Corpus of Contemporary American English (COCA)
  • Typical contents include:
    • Written texts (newspaper/magazine articles); works of fiction/nonfiction.
    • Writings from scholarly journals.
    • Spoken transcripts (informal conversations, government proceedings, business meetings).

Specialized Corpora Definition

  • A specialized corpus represents a particular part of a language, such as the language of a particular subject field or dialect.
  • It contains texts of a certain type.
  • It aims to be representative of the language of this type.
  • It can be large or small.
  • Often created to answer very specific questions.
  • Examples include:
    • Michigan Corpus of Academic Spoken English (MICASE)
    • CHILDES Corpus (only language used by children)
    • Michigan Corpus of Upper-level Student Papers (MICUSP)
    • Medical corpus (language used by nurses/hospital staff)

Learner Corpora Definition

  • A specialized corpus containing written texts and/or spoken transcripts of language used by students who are currently acquiring the language.
  • It can be examined to identify common errors that students make.

Pedagogic Corpora Definition

  • A corpus that contains language used in classroom settings.
  • Pedagogic corpora can include academic textbooks and transcripts of classroom interactions.
  • It also includes any other written text/spoken transcript that learners encounter in an educational setting.
  • These can be used to:
    • Ensure students are learning useful language.
    • Examine teacher-student dynamics.
    • Act as a self-reflective tool for teacher development.

Data-Driven Learning (DDL) Definition

  • An approach to foreign language learning.
  • Whereas most language learning is guided by teachers/textbooks, DDL treats language as data and students as researchers undertaking guided discovery tasks.

"Principled" in Corpus Compilation/Linguistics Definition

  • A corpus is compiled according to principles/guidelines, which can differ from corpus to corpus.
  • The texts that go into the corpus must be planned; the language included cannot be random.
  • Texts must be chosen according to specific characteristics.
  • Texts must be chosen so that they are useful for the research question or the aim of the corpus.

Basic Principles of Corpus Linguistics

  • Context: Without context, words can be completely misinterpreted.
    • Words need other words to convey meaning; isolated words do not carry meaning on their own.
    • "Rose" can refer to the flower or to the past tense of "rise."
  • Language patterns: Collocations are words that usually co-occur.
    • The patterned nature of language becomes apparent through corpus analysis.
    • Without one part of the collocation, the whole collocation no longer makes sense.
    • It is not "make homework" but "do homework."
    • It is also not "do a mistake" but "make a mistake."
  • Tense and aspect: Some verbs occur almost exclusively with a specific tense or aspect.
    • "Love" is hardly ever found in the continuous form.
  • Language in use: Corpus linguistics studies language in use and is not concerned with prescriptive "rules."
    • Some utterances are, strictly speaking, incorrect (for example, sentences with missing subjects).
    • Such examples must be considered in the corpus as well.
    • Language in use may differ from what is expected or from what individuals learned about English.
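
The collocation examples above ("do homework", "make a mistake") can be checked with a simple window-based co-occurrence count. The following is only a minimal sketch in Python: the sample sentences are invented for illustration, and the function name `collocates` and the window size are assumptions, not part of any corpus tool.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words occurring within `window` tokens left or right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts

# Invented sample sentences, for illustration only.
sample = (
    "I do my homework every evening. She made a mistake in the test. "
    "They do their homework together. He makes a mistake when he hurries."
)

print(collocates(sample, "homework").most_common(3))
print(collocates(sample, "mistake").most_common(3))
```

On a real corpus the raw counts would normally be replaced by an association measure, since very frequent words such as "a" or "the" co-occur with almost everything.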

Common Corpus Tools

  • Frequency counts (single words and multiword units)
  • Concordancing
  • Pattern identification (e.g., collocation search)
  • Corpus comparison (comparison of different corpora or parts of different corpora/the same corpus)
    • Compare language varieties (regional varieties, language register)
    • Compare general and specialized corpora to identify corpus specific terminology
    • Translation (multi-language corpora, e.g. Linguee)
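
Comparing a general and a specialized corpus to find corpus-specific terminology can be sketched as a ratio of relative frequencies. This is a simplified illustration, not the statistic any particular tool uses: the two toy texts are invented, and the function names and the smoothing value are assumptions made for the example.

```python
import re
from collections import Counter

def relative_freqs(text):
    """Return each word's frequency per 1,000 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: 1000 * c / total for w, c in counts.items()}

def key_terms(specialized, general, smoothing=1.0, top=3):
    """Rank words by how much more frequent they are in the specialized text."""
    spec = relative_freqs(specialized)
    gen = relative_freqs(general)
    # Smoothing avoids division by zero for words absent from the general text.
    scores = {
        w: (f + smoothing) / (gen.get(w, 0.0) + smoothing)
        for w, f in spec.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Invented toy texts standing in for a specialized and a general corpus.
medical = "the nurse checked the dosage and the nurse recorded the dosage"
general = "the dog chased the ball and the cat watched the dog"

print(key_terms(medical, general))
```

Words that are equally common in both texts, such as "the", score close to 1 and drop to the bottom of the ranking, while words found only in the specialized text rise to the top.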

Research Fields and Applications for Corpus Linguistics

  • More theoretical:
    • Phraseology (study of multi-word units)
    • Lexicogrammar: connection between lexical and grammatical aspects
    • Register, language change
  • More practical:
    • (Foreign) language teaching
    • Lexicography
    • Translation (studies)
    • Language for specific purposes
    • Writing assistance/language reference

Authentic and Naturally Occurring Language

  • Every utterance (be it spoken, written or transcribed) that has been produced for communication and not for the purpose of being put into a corpus
  • Language that has been produced in natural communicative settings.
  • Authenticity often also implies that researchers want language material produced by native speakers.
    • Not all corpora have to meet this requirement.
    • Some corpora look at the language of English learners.

Corpus Access

  • Download corpus + download corpus software
  • Download corpus + access corpus software online
  • Access corpus + corpus software online

Corpus Compilation

  • Collect texts in accordance with purpose of corpus/research interests
  • Use texts from books, newspapers, journals, transcriptions, texts written by language learners, etc.
  • Use texts from the web.

Useful Corpora for Language Teaching and Learning

  • Use online corpus tools like Sketch Engine or corpus tools you can download to your computer.
  • General corpora for working on lexical and grammar skills include:
    • BNC
    • enTenTen
    • English Corpus for SkELL
    • Open American National Corpus
  • Specialized corpora, for working on current topics or for bilingual subject teaching, include:
    • Brexit corpus
    • ScienceBlogs
    • Environment corpus
    • e-flux
    • EcoLexicon English Corpus
  • Literary corpora, for working on literary skills, include:
    • Project Gutenberg English
    • English Drama Corpus
    • Shakespeare English Drama Corpus

Sketch Engine and LancsBox Differences

  • Sketch Engine:
    • Is accessible online (servers in CZ)
    • Offers around 700 pre-loaded corpora
    • Supports more than 100 languages
    • Costs around 100€ per year (including up to 1 million words for own corpora)
    • Can build corpora from own files and web sources (websites, URLs, and seed words)
    • Requires only basic knowledge of corpus usage
  • LancsBox:
    • Requires local storage on the device
    • Contains 7 pre-loaded corpora
    • Supports only 2 languages
    • Has free access (no word limits for own corpora)
    • Can build corpora from own files and web sources (websites and URLs, but no seed words)
    • Benefiting from its search options requires sound knowledge of corpus linguistics (and statistics)

Sketch Engine "Concordance" Tool

  • A concordance is a collection of the occurrences of a word form, each in its own textual environment.
  • It provides an index of word forms, with a reference to each place of occurrence in a text.
  • The tool helps in finding words, phrases, tags, documents, text types, or corpus structures.
  • It displays the results in context in the form of a concordance.
  • The concordance can be sorted, filtered, counted, and processed further to obtain the desired result.
  • It is one of the most powerful tools.
  • With large corpora, the results can be tedious to analyze and interpret.
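
What a concordance display does can be illustrated with a few lines of Python. This is a minimal sketch, not Sketch Engine's implementation: the sample text is invented, and the function name `kwic` and the context width are assumptions chosen for the example.

```python
import re

def kwic(text, node, width=3):
    """Return concordance lines: `width` tokens of context on each side of `node`."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the search word lines up in a column.
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines

# Invented sample text with one literal and one figurative use of "bridge".
sample = (
    "They built a bridge over the river. "
    "The talks helped to bridge the gap between the two sides."
)

for line in kwic(sample, "bridge"):
    print(line)
```

Aligning the search word in a central column is what makes patterns to its left and right easy to scan; with thousands of hits, sorting and filtering these lines becomes necessary.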

"Concordance" Tool Query Types

  • KWIC (key word in context): the display format of concordance lines
  • Simple: Automatically searches for all forms of a word when its base form is entered
  • Lemma: Finds all word forms of the lemma (base form)
  • Phrase: Finds a phrase composed of several tokens (words) exactly as typed; no other word forms included
  • Word: Finds a word form exactly as typed
  • Character: Finds tokens (words) which contain the character(s)
    • looks for the actual punctuation
  • CQL: Uses Corpus Query Language for complex criteria
    • makes use of part-of-speech tags and regular expressions
      • regular expressions are collections of special symbols used to search for patterns instead of specific characters
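
The difference between the query types can be made concrete with a small emulation in Python. This is a sketch under stated assumptions: the lemma table `LEMMAS` is a hand-made toy (a real corpus gets its lemmas from automatic annotation), the function names are hypothetical, and the behaviour only approximates what the query types are described as doing above.

```python
import re

# Toy lemma table; a real corpus gets lemmas from automatic annotation.
LEMMAS = {"go": "go", "goes": "go", "going": "go", "gone": "go", "went": "go"}

def tokenize(text):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def word_query(tokens, form):
    """Word-style search: the form exactly as typed."""
    return [t for t in tokens if t == form]

def lemma_query(tokens, lemma):
    """Lemma-style search: every form mapped to the same base form."""
    return [t for t in tokens if LEMMAS.get(t.lower(), t.lower()) == lemma]

def character_query(tokens, chars):
    """Character-style search: tokens containing the given character(s)."""
    return [t for t in tokens if chars in t]

def pattern_query(tokens, pattern):
    """Regular-expression search over whole tokens, e.g. '.*on'."""
    return [t for t in tokens if re.fullmatch(pattern, t)]

sample = "She went on. He goes, then the lesson is gone; nobody is going."
tokens = tokenize(sample)

print(word_query(tokens, "goes"))
print(lemma_query(tokens, "go"))
print(character_query(tokens, ";"))
print(pattern_query(tokens, ".*on"))
```

In the regular expression `.*on`, the dot stands for any character and the asterisk for any number of repetitions, so the pattern matches every token that ends in "on".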

"Concordance" Additional Search Functions

  • Subcorpus: Limits the search to certain parts of the corpus, divided for example by:
    • genre: legal/news/blog/fiction
    • topic: science/religion/sports/politics/tourism
    • domain: .uk/.us/.au
  • Macro: Saves search criteria for quick reuse
  • Filter context: Keeps lines fulfilling additional conditions
  • Text types: Helps exclude or include specific documents or parts of the corpus
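
How these functions narrow a search can be sketched with a toy document collection in Python. Everything here is an illustrative assumption: the documents and genre labels are invented, the `search` function is hypothetical, and `functools.partial` merely stands in for the idea of a macro as saved search criteria.

```python
from functools import partial

# Toy documents with metadata; the genre labels are illustrative.
DOCS = [
    {"genre": "news", "text": "The bridge was closed after the storm"},
    {"genre": "fiction", "text": "She tried to bridge the silence between them"},
    {"genre": "news", "text": "A new bridge will open over the river"},
]

def search(docs, word, genre=None, context_word=None):
    """Return texts containing `word`, optionally restricted by genre and context."""
    hits = []
    for doc in docs:
        tokens = doc["text"].lower().split()
        if word not in tokens:
            continue
        if genre is not None and doc["genre"] != genre:
            continue  # subcorpus-style restriction
        if context_word is not None and context_word not in tokens:
            continue  # filter-context-style restriction
        hits.append(doc["text"])
    return hits

# A "macro" in the loose sense of saved search criteria that can be reused.
news_bridge = partial(search, word="bridge", genre="news")

print(search(DOCS, "bridge"))
print(news_bridge(DOCS))
print(news_bridge(DOCS, context_word="river"))
```

Each added condition shrinks the result set, which is the practical answer to the problem of overwhelming numbers of concordance lines in large corpora.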

Tags

  • Also called part-of-speech tag, POS tag or morphological tag.
  • Assigned to each token in an annotated corpus.
  • Indicates the part of speech and often also grammatical categories and morphological information.
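
Tags are what make searches by part of speech possible. The sketch below uses a hand-tagged toy sample in Python; the tag labels (NOUN, VERB, and so on) are simplified placeholders rather than a real tagset, and the function names are hypothetical. In CQL, a query of roughly the shape `[tag="V.*"] [word="bridges"]` expresses a similar idea, although the exact tag names depend on the tagset of the corpus.

```python
# Hand-tagged toy data: (word, lemma, tag). The tag labels are simplified
# placeholders; real corpora use a specific tagset.
TAGGED = [
    ("Engineers", "engineer", "NOUN"), ("build", "build", "VERB"),
    ("bridges", "bridge", "NOUN"), ("and", "and", "CONJ"),
    ("diplomats", "diplomat", "NOUN"), ("burn", "burn", "VERB"),
    ("bridges", "bridge", "NOUN"), (".", ".", "PUNCT"),
    ("This", "this", "DET"), ("bridges", "bridge", "VERB"),
    ("the", "the", "DET"), ("gap", "gap", "NOUN"), (".", ".", "PUNCT"),
]

def verbs_before(tagged, word, tag="NOUN"):
    """Return lemmas of verbs immediately preceding `word` used with `tag`."""
    hits = []
    for prev, cur in zip(tagged, tagged[1:]):
        if cur[0] == word and cur[2] == tag and prev[2] == "VERB":
            hits.append(prev[1])
    return hits

def pos_distribution(tagged, lemma):
    """Count how often `lemma` occurs with each tag."""
    counts = {}
    for _, lem, tag in tagged:
        if lem == lemma:
            counts[tag] = counts.get(tag, 0) + 1
    return counts

print(verbs_before(TAGGED, "bridges"))
print(pos_distribution(TAGGED, "bridge"))
```

The second function mirrors the noun/verb distinction that a word sketch shows at a glance: the same lemma is counted separately for each part of speech.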

Sketch Engine “Wordlist” Tool

  • Used to generate frequency lists of all kinds:
    • Lists of words
    • Lemmas
    • Nouns, verbs, tags
    • Words containing/not containing certain characters
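
A frequency list of the kind described above can be approximated in a few lines of Python. This is only a sketch with an invented sample sentence; the function name `wordlist` and its `contains` parameter are assumptions made for illustration and do not correspond to the tool's actual options.

```python
import re
from collections import Counter

def wordlist(text, contains=None, top=5):
    """Frequency list of word forms, optionally limited to forms containing `contains`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if contains is not None:
        tokens = [t for t in tokens if contains in t]
    return Counter(tokens).most_common(top)

# Invented sample text, for illustration only.
sample = "The cat sat on the mat. The cat saw the station on the map."

print(wordlist(sample))
print(wordlist(sample, contains="on"))
```

Lists of lemmas or tags would work the same way, counting the lemma or tag of each token instead of its word form.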

Description

Explore corpus linguistics: methodologies, verb forms, corpus comparison, and multi-word units. Learn about tool functions, deviations from grammar, collocations, and applications in lexicogrammar. Also, study concordance tool search functions and the use of macros in corpus analysis.
