Corpus Linguistics 09
45 Questions

Questions and Answers

What is the primary purpose of concordance in corpus linguistics?

  • To develop algorithms for machine translation.
  • To analyze the frequency of words and their patterns of use. (correct)
  • To create dictionaries of rare or specialized vocabulary.
  • To identify grammatical errors in written text.

Which of the following is NOT a characteristic of a corpus?

  • Designed for artistic expression rather than analysis. (correct)
  • Compiled following a specific design.
  • Structured and typically annotated.
  • Collection of linguistic data.

What term describes the analysis of language data using statistical methods to identify patterns?

  • Lexical analysis.
  • Semantic analysis.
  • Quantitative data analysis. (correct)
  • Qualitative data analysis.

Which of the following best describes the qualitative approach to corpus linguistics?

    Analyzing the meaning and context of words and phrases. (B)

    What is the purpose of POS (Part-of-Speech) tagging in corpus linguistics?

    To identify the frequency of different grammatical categories. (C)

    What is the significance of the example 'outdoor/outdooring (V/N): the bringing ‘out of doors’ of a child after seven days.' in the lecture?

    It illustrates how quantitative data can be used to explore language usage. (C)

    What is the relationship between quantitative and qualitative approaches in corpus linguistics?

    They are often complementary and can be used together. (D)

    Which of the following are popular corpus families used in corpus linguistics? (Select all that apply)

    BNC (A), COCA (B), GloWbE (C), BROWN (D)

    What is the main point of the provided text?

    To prove that the concept of "outdooring" is mainly a Ghanaian tradition. (A)

    What is the significance of the word "outdooring" being used for various events such as political party launches and product launches?

    It suggests a shift in meaning, indicating that the term has become more widely applicable. (C)

    What does the text suggest about the term "outdooring" in relation to other English-speaking countries besides Ghana?

    It remains a primarily Ghanaian expression. (A)

    What is the most significant difference between 'types' and 'tokens' in the context of language analysis?

    Types refer to unique words, while tokens refer to all instances of words, including repetitions. (D)

    What is the approximate type-token ratio (TTR) of the ICE GB corpus?

    3.2% (D)

    What is a "frequency analysis"?

    A process of counting the number of times a specific word or phrase appears in a corpus. (B)

    How does the size of a corpus influence the type-token ratio (TTR)?

    Smaller corpora have a higher TTR because there is more lexical diversity. (C)

    Which of the following is NOT mentioned as an example of how the word "outdooring" has been used?

    The unveiling of a company's new brand identity. (B)

    Why is it important to normalize frequency data when comparing frequencies across different corpora?

    To account for differences in the number of tokens in each corpus. (D)

    What is the purpose of comparing the frequency of "outdooring" in GloWbE and NOW corpora?

    To support the claim that "outdooring" is primarily a Ghanaian expression. (A)

    What does 'per-X-word frequency' refer to?

    The frequency of a word per unit of text, such as per million words or per thousand words. (A)

    What is the significance of the phrase "semantic shift" as used in the text?

    It describes the process of a word acquiring a new meaning. (A)

    What type of data is primarily used to analyze the term "outdooring" in the provided text?

    Quantitative data, focusing on numerical statistics. (B)

    What is the purpose of calculating 'per-million-word' (pmw) frequencies?

    To compare the relative frequency of words between different corpora, irrespective of their size. (B)

    Which statement accurately describes the COHA corpus?

    COHA (Corpus of Historical American English) is a diachronic corpus of written American English, organized by decade and covering texts from the early 19th century to the early 21st century.

    What is the approximate size of the ICE GB spoken corpus?

    600,000 words (A)

    Which variety of English is expected to exhibit the strongest influence from American English?

    Canada (D)

    What is the primary focus of lists that compare British and American English words?

    Categorical differences (A)

    In GloWbE, how can one filter to find only nominal uses of a word?

    By using _nn in the search (B)

    Which word represents the British English term for 'French fries'?

    Chips (D)

    What is a challenge in researching terms like 'chips' and 'crisps'?

    They have multiple meanings (D)

    Which component of GloWbE is larger compared to JM or TZ components?

    US component (B)

    What adjective might accompany the noun 'aubergine' in English usage?

    Eggplant (B)

    Which of the following regions is least likely to show American English influence?

    Great Britain (C)

    What is the key difference between a 'type' and a 'token' in corpus linguistics?

    A 'type' refers to a unique word form, while a 'token' is each instance of that word form in a text. (D)

    When studying the impact of American English on other varieties of English, what is the term used to describe the process by which words or phrases from one variety are adopted into another?

    Americanisation (C)

    What type of data analysis focuses on the frequency of words and their occurrences in a text?

    Quantitative analysis (A)

    What does the acronym 'PMW' stand for in the context of language studies?

    Per million words

    What does the Type/Token Ratio (TTR) measure?

    The diversity of vocabulary used in a text. (C)

    What is the purpose of the 'Frequency' section?

    To illustrate how the term 'television' has become more common over time. (C)

    What is a 'collocation'?

    A statistical association between two or more words. (A)

    What is the purpose of setting a 'minimum collocate frequency' in AntConc?

    To filter out irrelevant collocations that occur too infrequently. (B)

    What is a key argument presented regarding the 'Americanization of English'?

    American English is rapidly becoming the dominant form of English globally. (C)

    Identify a factor contributing to the 'Americanization of English' based on the provided content.

    The popularity of American media, such as movies and music. (D)

    What is the primary purpose of the 'Case Studies' section?

    To explore the reasons behind the increasing influence of American English. (A)

    Based on the content, what is a potential reason for the increasing influence of American English?

    The global reach of American media and cultural products. (B)

    The provided information focuses primarily on the analysis of:

    The influence of American English on other varieties. (D)

    Flashcards

    Corpus Linguistics

    The empirical analysis of authentic language data using structured collections of linguistic data (corpora).

    Corpus

    A collection of linguistic data compiled with a specific design, often structured and annotated.

    Corpus Typology

    Classification of corpora based on features like written/spoken or synchronic/diachronic.

    Concordancing

    A method in corpus linguistics that shows keywords in context to analyze usage patterns.

    AntConc

    A frequently-used software tool in corpus linguistics for analyzing linguistic data.

    Part-of-Speech (POS) Tagging

    A process that adds grammatical information to elements within a corpus, helping identify their roles in sentences.

    Quantitative Approach

    An analytical method focusing on counting occurrences and comparing frequencies, often using statistics.

    Qualitative Approach

    An analytical method that focuses not on frequency, but on describing language usage examples from the data.

    Outdooring

    A Ghanaian tradition marking the official introduction of a newborn into the community.

    GloWbE

    Global Web-Based English corpus used for linguistic analysis.

    Frequency Analysis

    A method to count occurrences of words or phrases in linguistic data.

    Semantic Shift

    Changes in the meanings of words or phrases over time.

    Libation

    A ritual pouring of a drink as an offering to spirits or deities.

    Outdoored (verb)

    The act of officially introducing something or someone publicly.

    Meanings in Context

    Understanding a word's meaning based on its use in specific situations.

    Americanisation

    The process of adapting language, culture, or practices to the American style.

    Collocation

    A natural combination of words that often appear together in a language.

    Type/Token Ratio (TTR)

    A measure comparing unique words (types) to total words (tokens) in a text.

    Qualitative vs Quantitative

    Qualitative focuses on descriptive data while quantitative emphasizes numerical analysis.

    Word frequency list

    A compiled list showing how often specific words appear in a corpus.

    Minimum collocate frequency

    The least number of times a word must appear to show collocations in a search.

    Window span

    The number of words to include to the left or right of a searched word when finding collocations.

    Americani[sz]ation of English

    The process by which American English becomes more dominant globally.

    Pingo

    A platform or tool used for linguistic analysis and searching for terms in corpora.

    Globalisation in language

    The spread of language and its variants across the world due to interconnectedness.

    Predictable combinations

    Word pairings that sound right to native speakers, based on learned use.

    American English Forms

    Variations of English influenced by American usage found globally.

    Canadian English

    A variety of English influenced heavily by American English and local factors, prevalent in Canada.

    Philippine English

    A variety of English in the Philippines, showing American English influence due to historical ties.

    British vs American English Lexis

    Comparison of lexical items differing in British and American English usage.

    Nominal Uses

    Refers to the use of words specifically as nouns within text analysis.

    Normalized Frequencies

    Frequency counts adjusted for corpus size, allowing for fair comparisons across varieties.

    Plural Forms Analysis

    The study of words in their pluralized versions to understand variations better.

    Token

    The total number of words or constructions in a text.

    Type

    The total number of different words or constructions in a text.

    Lexical Variation

    The diversity of vocabulary used in a text.

    Normalization

    Adjusting data for comparison by using a per-word frequency count.

    Per-million-word frequency (pmw)

    A normalization method showing occurrences per million words.

    Per-thousand-word frequency (ptw)

    A normalization method showing occurrences per thousand words.

    Corpus Size Effect on TTR

    TTR is usually higher in smaller corpora due to fewer total words diluting the types.

    Study Notes

    Corpus Linguistics (2)

    • Corpus linguistics is the empirical analysis of authentic language data using corpora
    • A corpus is a collection of linguistic data, compiled following a design, often structured and annotated
    • Corpus typology can include written/spoken, synchronic/diachronic data
    • Frequently used corpus families include BROWN, ICE, COCA, BNC, GloWbE, and NOW
    • Concordancing is a method in corpus linguistics that displays keywords in context (KWIC) so that usage patterns can be inspected
    • AntConc is frequently used software in corpus linguistics
    • Part-of-speech (POS) tagging provides grammatical information in a corpus
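
The idea of a concordance (keywords shown in context) can be sketched in a few lines of Python. This is only an illustration, not how AntConc works internally: the function name `kwic`, the `width` parameter, and the two sample sentences (paraphrased from the 'outdooring' examples in these notes) are assumptions made for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per match of `keyword` in `text`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Take up to `width` characters on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text, not real corpus data.
sample = (
    "The outdooring and naming ceremony starts when a family elder pours "
    "libation. The party has grown since its outdooring in Cape Coast."
)
concordance = kwic(sample, "outdooring", width=25)
for line in concordance:
    print(line)
```

Aligning the keyword in one column is what makes recurring left and right contexts easy to spot by eye.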

    Today's Lecture

    • Frequency
    • Collocations
    • Case Studies

    1 Frequency: Quantitative and Qualitative Data Analysis

    • Concordances are central to corpus linguistics
    • Quantitative approaches count occurrences, compare frequencies, and use statistics to find patterns
    • Qualitative approaches focus not on how often a feature occurs, but on identifying and describing language usage in context
    • The data serve as a basis for describing language usage, providing real-life examples of phenomena

    1 Frequency: Quantitative Data Analysis

    • 'Outdooring' (V/N): the bringing 'out of doors' of a child after seven days

      • GloWbE: 72 hits (1 from Canada, 71 from Ghana)
      • NOW: 339 hits (1 each from Canada, Nigeria, South Africa and US; 336 from Ghana)
    • 'Outdoored' (verb)

      • GloWbE: 64 hits (1 from Nigeria, 63 from Ghana)
      • NOW: 438 hits (1 each from Kenya, Nigeria, 6 from South Africa, 430 from Ghana)
    • Good evidence that the tradition of 'outdooring' is very Ghanaian

    • The 'outdooring' and naming ceremony starts when a family elder pours libation

    • A child is considered human after being outdoored

    • NDC (National Democratic Congress) succeeded since its outdooring in Cape Coast in 1992

    • African Union (AU) replaced the Organisation of African Unity (OAU) in 2002

    • Frequency analysis is among the most common methods in corpus linguistics

    • Find all instances of a construction (e.g., in GloWbE)

    • Find words and spelling variations

    • All word forms for the lemma 'GIVE'

    • Frequency of fixed expressions (e.g., merry Christmas vs. happy Christmas)

    • Comparison of frequencies across varieties and over time (e.g., GloWbE; "GO on holiday" vs. "GO on vacation," COHA "telephone" vs. "phone")
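
Two of the search types listed above, spelling variants and all word forms of a lemma, can be approximated with regular expressions and a counter. The sketch below is an illustration under stated assumptions: the sample text is invented, and the set of GIVE forms is listed by hand, whereas a real study would rely on the lemma annotation of the corpus interface.

```python
import re
from collections import Counter

# Invented sample text, not corpus data.
text = (
    "The Americanisation of English is debated. Some call it Americanization. "
    "She gave him a book, he gives one back, and giving is what they have given."
)

# Spelling variants: the character class [sz] matches either spelling.
variants = Counter(
    m.lower() for m in re.findall(r"\bamericani[sz]ation\b", text, re.IGNORECASE)
)

# Word forms of the lemma GIVE, listed by hand (an assumption for this sketch).
give_forms = {"give", "gives", "gave", "given", "giving"}
tokens = re.findall(r"[a-z]+", text.lower())
give_counts = Counter(t for t in tokens if t in give_forms)

print(variants)
print(give_counts)
```

Counting each form separately and then summing gives the lemma frequency, while the individual counts show which forms dominate.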

    1 Frequency: Types and Tokens

    • Token: total number of words/constructions in a text, corpus or sub-corpus
    • Type: total number of different words/constructions in a text, corpus, or sub-corpus
    • ICE GB: 1,071,926 tokens and 34,421 types
    • Each sample contains ~2,000 words, with complete sentences
    • Tags are included in the token count
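
The type/token distinction can be made concrete with a toy example. The tokenizer below is a deliberately simple assumption (lowercased alphabetic strings); real corpora such as ICE GB apply their own tokenization rules, which is why, for instance, tags can end up in the token count.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplified rule)."""
    return re.findall(r"[a-z']+", text.lower())

# Invented example sentence pair.
text = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(text)   # every running word, repetitions included
types = set(tokens)       # each distinct word form counted once

print(len(tokens), "tokens,", len(types), "types")
```

Here "the" contributes four tokens but only one type.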

    1 Frequency: Type-Token Ratio (TTR)

    • TTR: a measure of diversification
    • Calculated as the number of types divided by the number of tokens, multiplied by 100
    • TTR for ICE GB is approximately 3.2% (34,421 / 1,071,926 × 100)
    • TTR is strongly dependent on corpus size, usually higher in smaller corpora
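
As a worked check of the formula, the following sketch computes the TTR from the ICE GB figures given in these notes and then illustrates the size effect with a toy text. The function name `ttr` and the toy sentence are assumptions for illustration; repeating a sentence is an extreme case, chosen only because it adds tokens without adding types.

```python
def ttr(n_types, n_tokens):
    """Type-token ratio as a percentage: types / tokens * 100."""
    return n_types / n_tokens * 100

# ICE GB figures as stated in the notes.
ice_gb = ttr(34_421, 1_071_926)
print(f"ICE GB TTR: {ice_gb:.1f}%")

# Toy illustration of the size effect (invented text).
words = "the cat sat on the mat".split()
small = ttr(len(set(words)), len(words))
large = ttr(len(set(words * 10)), len(words * 10))
print(f"small sample: {small:.1f}%, ten times larger sample: {large:.1f}%")
```

Because TTR falls as a text grows, it is usually only meaningful to compare TTR values for samples of similar size.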

    1 Frequency: Comparing Frequencies

    • Corpora often have different sizes

      • To compare frequencies, normalize the data
      • Calculate per-X-word frequency
        • per-million-word (pmw), per-thousand-word (ptw)
    • Example: Compare frequency of "Scotland" in spoken vs. written sections of ICE-GB
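
Normalization can be sketched as a one-line calculation. In the example below the function name `per_x_words` is hypothetical, and both the hit counts and the sub-corpus sizes are invented round numbers, NOT real ICE-GB figures; the point is only that raw counts and normalized frequencies can rank two sub-corpora differently.

```python
def per_x_words(hits, corpus_size, base=1_000_000):
    """Normalized frequency: occurrences per `base` words (default: per million)."""
    return hits * base / corpus_size

# Hypothetical counts and sizes, chosen for illustration only.
spoken = per_x_words(hits=120, corpus_size=600_000)
written = per_x_words(hits=100, corpus_size=400_000)

print(f"spoken:  {spoken:.0f} pmw")
print(f"written: {written:.0f} pmw")

# The same function gives per-thousand-word (ptw) values with base=1_000.
spoken_ptw = per_x_words(hits=120, corpus_size=600_000, base=1_000)
```

With these made-up numbers the spoken section has more raw hits, yet the written section has the higher per-million-word frequency.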

    Activity 2: Comparing Frequencies

    • COHA search for "television" yields results by decade

    2 Collocations: Introduction

    • Collocations: predictable word combinations
    • How to determine predictability? Native speaker intuition & learned construction
    • Examples include "fast food" vs. "slow food," "quick food" vs. "unhurried food," "quick shower" vs. "fast shower"

    2 Collocations: AntConc

    • Create word frequency list in the Word list tab
    • Enter search term (word, construction or regular expression) in the Collocates tab
    • Set the window span (e.g., 1L, 3R)
    • Choose minimum collocate frequency (n)
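
The roles of the window span and the minimum collocate frequency can be illustrated with a small sketch. It counts raw co-occurrences only and is not a reimplementation of AntConc, which also offers statistical association measures; the function name `collocates`, its parameters, and the sample sentences are assumptions for the example.

```python
import re
from collections import Counter

def collocates(tokens, node, left=1, right=3, min_freq=2):
    """Count words within `left`/`right` positions of each `node` occurrence
    and keep only those that occur at least `min_freq` times."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Window span, e.g. 1L and 3R; note that it ignores sentence boundaries.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return {word: n for word, n in counts.items() if n >= min_freq}

# Invented sample text.
text = "They sell fast food here. We eat fast food often. He drives fast cars."
tokens = re.findall(r"[a-z]+", text.lower())
result = collocates(tokens, "fast", left=1, right=3, min_freq=2)
print(result)
```

Raising `min_freq` removes one-off neighbours such as "cars" here, which is the filtering effect the minimum collocate frequency setting is meant to have.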

    3 Case Studies: The Americanization of English

    • Many factors contribute to American English gaining prominence (e.g., globalization, popular culture, media exposure)
    • American English usage observed in other varieties
    • Likely strongest in Canada and Philippines, least likely in Great Britain

    3 Case Studies: The Americanization of English - Lexis

    • Dozens of word lists cover words that differ between British and American English
    • Categorical lists for concepts (cf. OALD)
    • Examples include "chips" vs. "fries," "crisp" vs. "chip", "aubergine" vs. "eggplant"

    3 Case Studies: The Americanization of English - Lexis (GloWbE)

    • Include nouns only in analysis
    • Extend analysis to include plural forms, capitalized terms
    • Examine normalized frequencies (GloWbE components - US/GB/others)

    3 Case Studies: The Americanization of English - Past Tense

    • Examine past tense forms using GloWbE

    Keywords

    • Americanization
    • Collocation
    • Construction
    • Frequency
    • PMW
    • Qualitative
    • Quantitative
    • Semantic shift
    • Token
    • Type
    • Type/token ratio (TTR)

    Quiz Team

    Related Documents

    Corpus Linguistics (2) PDF

    Description

    Test your understanding of key concepts in corpus linguistics with this quiz. It covers various aspects, including the purpose of concordance, analysis methods, and characteristics of a corpus. Perfect for students and enthusiasts of linguistics.
