Corpus Linguistics 09

Questions and Answers

What is the primary purpose of concordance in corpus linguistics?

  • To develop algorithms for machine translation.
  • To analyze the frequency of words and their patterns of use. (correct)
  • To create dictionaries of rare or specialized vocabulary.
  • To identify grammatical errors in written text.

Which of the following is NOT a characteristic of a corpus?

  • Designed for artistic expression rather than analysis. (correct)
  • Compiled following a specific design.
  • Structured and typically annotated.
  • Collection of linguistic data.

What term describes the analysis of language data using statistical methods to identify patterns?

  • Lexical analysis.
  • Semantic analysis.
  • Quantitative data analysis. (correct)
  • Qualitative data analysis.

Which of the following best describes the qualitative approach to corpus linguistics?

  • Analyzing the meaning and context of words and phrases. (B)

What is the purpose of POS (Part-of-Speech) tagging in corpus linguistics?

  • To identify the frequency of different grammatical categories. (C)

What is the significance of the example 'outdoor/outdooring (V/N): the bringing ‘out of doors’ of a child after seven days.' in the lecture?

  • It illustrates how quantitative data can be used to explore language usage. (C)

What is the relationship between quantitative and qualitative approaches in corpus linguistics?

  • They are often complementary and can be used together. (D)

Which of the following are popular corpus families used in corpus linguistics? (Select all that apply)

  • BNC (A), COCA (B), GloWbE (C), BROWN (D)

What is the main point of the provided text?

  • To prove that the concept of "outdooring" is mainly a Ghanaian tradition. (A)

What is the significance of the word "outdooring" being used for various events such as political party launches and product launches?

  • It suggests a shift in meaning, indicating that the term has become more widely applicable. (C)

What does the text suggest about the term "outdooring" in relation to other English-speaking countries besides Ghana?

  • It remains a primarily Ghanaian expression. (A)

What is the most significant difference between 'types' and 'tokens' in the context of language analysis?

  • Types refer to unique words, while tokens refer to all instances of words, including repetitions. (D)

What is the approximate type-token ratio (TTR) of the ICE GB corpus?

  • 3.2% (D)

What is a "frequency analysis"?

  • A process of counting the number of times a specific word or phrase appears in a corpus. (B)

How does the size of a corpus influence the type-token ratio (TTR)?

  • Smaller corpora usually have a higher TTR, because words repeat as a corpus grows and tokens accumulate faster than new types. (C)

Which of the following is NOT mentioned as an example of how the word "outdooring" has been used?

  • The unveiling of a company's new brand identity. (B)

Why is it important to normalize frequency data when comparing frequencies across different corpora?

  • To account for differences in the number of tokens in each corpus. (D)

What is the purpose of comparing the frequency of "outdooring" in GloWbE and NOW corpora?

  • To support the claim that "outdooring" is primarily a Ghanaian expression. (A)

What does 'per-X-word frequency' refer to?

  • The frequency of a word per unit of text, such as per million words or per thousand words. (A)

What is the significance of the phrase "semantic shift" as used in the text?

  • It describes the process of a word acquiring a new meaning. (A)

What type of data is primarily used to analyze the term "outdooring" in the provided text?

  • Quantitative data, focusing on numerical statistics. (B)

What is the purpose of calculating 'per-million-word' (pmw) frequencies?

  • To compare the relative frequency of words between different corpora, irrespective of their size. (B)

Which statement accurately describes the COHA corpus?

  • COHA is a diachronic corpus of written American English that covers roughly two centuries and can be searched by decade.

What is the approximate size of the ICE GB spoken corpus?

  • 600,000 words (A)

Which variety of English is expected to exhibit the strongest influence from American English?

  • Canada (D)

What is the primary focus of lists that compare British and American English words?

  • Categorical differences (A)

In GloWbE, how can one filter to find only nominal uses of a word?

  • By using _nn in the search (B)

Which word represents the British English term for 'French fries'?

  • Chips (D)

What is a challenge in researching terms like 'chips' and 'crisps'?

  • They have multiple meanings (D)

Which component of GloWbE is larger compared to JM or TZ components?

  • US component (B)

What adjective might accompany the noun 'aubergine' in English usage?

  • Eggplant (B)

Which of the following regions is least likely to show American English influence?

  • Great Britain (C)

What is the key difference between a 'type' and a 'token' in corpus linguistics?

  • A 'type' refers to a unique word form, while a 'token' is each instance of that word form in a text. (D)

When studying the impact of American English on other varieties of English, what is the term used to describe the process by which words or phrases from one variety are adopted into another?

  • Americanisation (C)

What type of data analysis focuses on the frequency of words and their occurrences in a text?

  • Quantitative analysis (A)

What does the acronym 'PMW' stand for in the context of language studies?

  • Per million words

What does the Type/Token Ratio (TTR) measure?

  • The diversity of vocabulary used in a text. (C)

What is the purpose of the 'Frequency' section?

  • To illustrate how the term 'television' has become more common over time. (C)

What is a 'collocation'?

  • A statistical association between two or more words. (A)

What is the purpose of setting a 'minimum collocate frequency' in AntConc?

  • To filter out irrelevant collocations that occur too infrequently. (B)

What is a key argument presented regarding the 'Americanization of English'?

  • American English is rapidly becoming the dominant form of English globally. (C)

Identify a factor contributing to the 'Americanization of English' based on the provided content.

  • The popularity of American media, such as movies and music. (D)

What is the primary purpose of the 'Case Studies' section?

  • To explore the reasons behind the increasing influence of American English. (A)

Based on the content, what is a potential reason for the increasing influence of American English?

  • The global reach of American media and cultural products. (B)

The provided information focuses primarily on the analysis of:

  • The influence of American English on other varieties. (D)

Flashcards

Corpus Linguistics

The empirical analysis of authentic language data using structured collections of linguistic data (corpora).

Corpus

A collection of linguistic data compiled with a specific design, often structured and annotated.

Corpus Typology

Classification of corpora based on features like written/spoken or synchronic/diachronic.

Concordancing

A method in corpus linguistics that shows keywords in context to analyze usage patterns.

AntConc

A frequently-used software tool in corpus linguistics for analyzing linguistic data.

Part-of-Speech (POS) Tagging

A process that adds grammatical information to elements within a corpus, helping identify their roles in sentences.

Quantitative Approach

An analytical method focusing on counting occurrences and comparing frequencies, often using statistics.

Qualitative Approach

An analytical method that focuses not on frequency, but on describing language usage examples from the data.

Outdooring

A Ghanaian tradition marking the official introduction of a newborn into the community.

GloWbE

Global Web-Based English corpus used for linguistic analysis.

Frequency Analysis

A method to count occurrences of words or phrases in linguistic data.

Semantic Shift

Changes in the meanings of words or phrases over time.

Libation

A ritual pouring of a drink as an offering to spirits or deities.

Outdoored (verb)

The act of officially introducing something or someone publicly.

Meanings in Context

Understanding a word's meaning based on its use in specific situations.

Americanisation

The process of adapting language, culture, or practices to the American style.

Collocation

A natural combination of words that often appear together in a language.

Type/Token Ratio (TTR)

A measure comparing unique words (types) to total words (tokens) in a text.

Qualitative vs Quantitative

Qualitative focuses on descriptive data while quantitative emphasizes numerical analysis.

Word frequency list

A compiled list showing how often specific words appear in a corpus.

Minimum collocate frequency

The minimum number of times a word must co-occur with the search term to be listed as a collocate.

Window span

The number of words to include to the left or right of a searched word when finding collocations.

Americani[sz]ation of English

The process by which American English becomes more dominant globally.

Pingo

A platform or tool used for linguistic analysis and searching for terms in corpora.

Globalisation in language

The spread of language and its variants across the world due to interconnectedness.

Predictable combinations

Word pairings that sound right to native speakers, based on learned use.

American English Forms

Variations of English influenced by American usage found globally.

Canadian English

A variety of English influenced heavily by American English and local factors, prevalent in Canada.

Philippine English

A variety of English in the Philippines, showing American English influence due to historical ties.

British vs American English Lexis

Comparison of lexical items differing in British and American English usage.

Nominal Uses

Refers to the use of words specifically as nouns within text analysis.

Normalized Frequencies

Frequency counts adjusted for corpus size, allowing for fair comparisons across varieties.

Plural Forms Analysis

The study of words in their pluralized versions to understand variations better.

Token

The total number of words or constructions in a text.

Type

The total number of different words or constructions in a text.

Lexical Variation

The diversity of vocabulary used in a text.

Normalization

Adjusting data for comparison by using a per-word frequency count.

Per-million-word frequency (pmw)

A normalization method showing occurrences per million words.

Per-thousand-word frequency (ptw)

A normalization method showing occurrences per thousand words.

Corpus Size Effect on TTR

TTR is usually higher in smaller corpora: as a corpus grows, words repeat, so tokens accumulate faster than new types.

Study Notes

Corpus Linguistics (2)

  • Corpus linguistics is the empirical analysis of authentic language data using corpora
  • A corpus is a collection of linguistic data, compiled following a design, often structured and annotated
  • Corpus typology can include written/spoken, synchronic/diachronic data
  • Frequently used corpus families include BROWN, ICE, COCA, BNC, GloWbE, and NOW
  • Concordancing is a method in corpus linguistics that displays keywords in context
  • AntConc is frequently used software in corpus linguistics
  • Part-of-speech (POS) tagging provides grammatical information in a corpus

Today's Lecture

  • Frequency
  • Collocations
  • Case Studies

1 Frequency: Quantitative and Qualitative Data Analysis

  • Concordances are central to corpus linguistics
  • Quantitative approaches count occurrences, compare frequencies, and use statistics to find patterns
  • Qualitative approaches focus not on how often a feature occurs, but on identifying and describing language usage in context
  • The data serve as a basis for describing language usage and provide real-life examples of phenomena

1 Frequency: Quantitative Data Analysis

  • 'Outdooring' (V/N): the bringing 'out of doors' of a child after seven days

    • GloWbE: 72 hits (1 from Canada, 71 from Ghana)
    • NOW: 339 hits (1 each from Canada, Nigeria, South Africa and US; 336 from Ghana)
  • 'Outdoored' (verb)

    • GloWbE: 64 hits (1 from Nigeria, 63 from Ghana)
    • NOW: 438 hits (1 each from Kenya and Nigeria, 6 from South Africa, 430 from Ghana)
  • Good evidence that the tradition of 'outdooring' is very Ghanaian

  • The 'outdooring' and naming ceremony starts when a family elder pours libation

  • A child is considered human after being outdoored

  • NDC (National Democratic Congress) succeeded since its outdooring in Cape Coast in 1992

  • African Union (AU) replaced the Organisation of African Unity (OAU) in 2002

  • Frequency analysis is among the most common methods in corpus linguistics

  • Find all instances of a construction (e.g., in GloWbE)

  • Find words and spelling variations

  • All word forms for the lemma 'GIVE'

  • Frequency of fixed expressions (e.g., merry Christmas vs. happy Christmas)

  • Comparison of frequencies across varieties and over time (e.g., GloWbE; "GO on holiday" vs. "GO on vacation," COHA "telephone" vs. "phone")
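
As a rough illustration of these search types outside a corpus interface, the Python sketch below counts the word forms of the lemma GIVE and two competing fixed expressions. The sample text, the hand-written list of forms and the function names are assumptions made for this example; a real study would query a corpus such as GloWbE instead.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for this sketch.
SAMPLE = (
    "She gave him a card that said Merry Christmas. "
    "They give presents and wish everyone a happy Christmas. "
    "He has given up, but she keeps giving and gives more."
)

# Word forms of the lemma GIVE, listed by hand (an assumption of this sketch).
GIVE_FORMS = {"give", "gives", "gave", "given", "giving"}


def count_forms(text, forms):
    """Count each listed word form, ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in forms)


def count_phrase(text, phrase):
    """Count case-insensitive occurrences of a fixed expression."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


give_counts = count_forms(SAMPLE, GIVE_FORMS)
merry = count_phrase(SAMPLE, "merry Christmas")
happy = count_phrase(SAMPLE, "happy Christmas")
print(give_counts, merry, happy)
```

Raw counts like these are only comparable within one text or corpus; comparisons across corpora of different sizes need normalized frequencies.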

1 Frequency: Types and Tokens

  • Token: total number of words/constructions in a text, corpus or sub-corpus
  • Type: total number of different words/constructions in a text, corpus, or sub-corpus
  • ICE GB: 1,071,926 tokens and 34,421 types
  • Each sample contains ~2,000 words, with complete sentences
  • Tags are included in the token count
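
The distinction can be made concrete with a minimal Python sketch that counts tokens and types in a short string. The tokenizer here (lowercased alphabetic strings) and the function name are simplifying assumptions; unlike the ICE GB figures above, it does not count tags.

```python
import re


def tokens_and_types(text):
    """Return (token_count, type_count) for a text.

    Tokenization is deliberately naive: lowercased alphabetic strings only.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return len(tokens), len(set(tokens))


n_tokens, n_types = tokens_and_types("The cat sat on the mat and the dog sat too")
print(n_tokens, n_types)
```

In the example sentence, "the" and "sat" recur, so the number of types is smaller than the number of tokens.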

1 Frequency: Type-Token Ratio (TTR)

  • TTR: a measure of diversification
  • Calculated as the number of types divided by the number of tokens, multiplied by 100
  • TTR for ICE GB is approximately 3.2% (34,421 types / 1,071,926 tokens × 100)
  • TTR is strongly dependent on corpus size, usually higher in smaller corpora
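
The calculation can be sketched in a few lines of Python. The ICE GB figures come from the notes above; the function name and the toy sentence used to show the size effect are assumptions for illustration.

```python
def type_token_ratio(types, tokens):
    """TTR as a percentage: types / tokens * 100."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return types / tokens * 100


# ICE GB figures from the notes: 34,421 types and 1,071,926 tokens.
ice_gb_ttr = type_token_ratio(34_421, 1_071_926)

# Toy illustration of the size effect: the first four words of a sentence
# versus the whole sentence (invented example).
words = "the cat saw the dog and the dog saw the cat".split()
small_ttr = type_token_ratio(len(set(words[:4])), 4)
full_ttr = type_token_ratio(len(set(words)), len(words))

print(round(ice_gb_ttr, 1), round(small_ttr, 1), round(full_ttr, 1))
```

The shorter sample has the higher TTR because the longer one mostly adds repetitions of words already seen, which is why TTR values should only be compared across samples of similar size.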

1 Frequency: Comparing Frequencies

  • Corpora often have different sizes

    • To compare frequencies, normalize the data
    • Calculate per-X-word frequency
      • per-million-word (pmw), per-thousand-word (ptw)
  • Example: Compare frequency of "Scotland" in spoken vs. written sections of ICE-GB
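
A minimal Python sketch of the normalization step follows. The function name is an assumption, and the hit counts and corpus sizes are made-up round numbers chosen only to show the arithmetic; they are not actual ICE-GB results.

```python
def normalized_frequency(hits, corpus_size, per=1_000_000):
    """Frequency per `per` words (1_000_000 gives pmw, 1_000 gives ptw)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * per / corpus_size


# Hypothetical figures: the same raw count in two sections of different size.
spoken_pmw = normalized_frequency(hits=30, corpus_size=600_000)
written_pmw = normalized_frequency(hits=30, corpus_size=400_000)
written_ptw = normalized_frequency(hits=30, corpus_size=400_000, per=1_000)

print(spoken_pmw, written_pmw, written_ptw)
```

With identical raw counts, the word is relatively more frequent in the smaller section, which only becomes visible after normalization.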

Activity 2: Comparing Frequencies

  • COHA search for "television" yields results by decade

2 Collocations: Introduction

  • Collocations: predictable word combinations
  • How to determine predictability? Native speaker intuition & learned construction
  • Examples include "fast food" vs. "slow food," "quick food" vs. "unhurried food," "quick shower" vs. "fast shower"

2 Collocations: AntConc

  • Create word frequency list in the Word list tab
  • Enter search term (word, construction or regular expression) in the Collocates tab
  • Set the window span (e.g., 1L, 3R)
  • Choose minimum collocate frequency (n)
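
The roles of window span and minimum collocate frequency can be sketched in Python as below. This is not AntConc's implementation: it uses raw co-occurrence counts instead of a statistical association measure, ignores sentence boundaries, and the sample text and function name are invented for the example.

```python
import re
from collections import Counter


def collocates(text, node, left=1, right=3, min_freq=2):
    """Count words inside a window around each occurrence of `node`.

    left/right give the window span (e.g. 1L, 3R); collocates that occur
    fewer than min_freq times are dropped.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return {w: n for w, n in counts.items() if n >= min_freq}


# Hypothetical sample text.
SAMPLE = (
    "We ate fast food on Monday. Fast food is cheap. "
    "She had a quick shower and then more fast food."
)
result = collocates(SAMPLE, "fast", left=1, right=1, min_freq=2)
print(result)
```

Raising `min_freq` removes one-off neighbours such as "ate" or "more", leaving only the recurrent partner "food".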

3 Case Studies: The Americanization of English

  • Many factors contribute to American English gaining prominence (e.g., globalization, popular culture, media exposure)
  • American English usage observed in other varieties
  • Influence is likely strongest in Canada and the Philippines, and weakest in Great Britain

3 Case Studies: The Americanization of English - Lexis

  • Dozens of word lists contrast British and American English
  • Categorical lists for concepts (cf. OALD)
  • Examples include "chips" vs. "fries," "crisps" vs. "chips," "aubergine" vs. "eggplant"

3 Case Studies: The Americanization of English - Lexis (GloWbE)

  • Include nouns only in analysis
  • Extend analysis to include plural forms, capitalized terms
  • Examine normalized frequencies (GloWbE components - US/GB/others)
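
A small Python sketch of this workflow follows: it matches singular, plural and capitalized forms of a word and reports a normalized frequency per component. The two mini "components" are invented sentences, the helper names are assumptions, and the sketch does no part-of-speech filtering, so it cannot restrict hits to nouns the way a tagged corpus search can.

```python
import re

# Hypothetical mini "components"; real GloWbE components are far larger.
COMPONENTS = {
    "US": "Eggplant is cheap. We grilled eggplants and one aubergine.",
    "GB": "Aubergines are in season. An aubergine curry, no eggplant.",
}


def variant_pattern(word):
    """Regex for a word's singular and plural forms, any capitalization."""
    return re.compile(r"\b" + re.escape(word) + r"s?\b", re.IGNORECASE)


def per_thousand(text, word):
    """Hits for `word` (plural and capitalized forms included) per thousand words."""
    tokens = re.findall(r"[A-Za-z]+", text)
    hits = len(variant_pattern(word).findall(text))
    return hits * 1_000 / len(tokens)


table = {
    name: {w: per_thousand(text, w) for w in ("eggplant", "aubergine")}
    for name, text in COMPONENTS.items()
}
print(table)
```

Because each value is normalized by the size of its own component, the US and GB columns can be compared directly even when the components differ in size.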

3 Case Studies: The Americanization of English - Past Tense

  • Examine past tense forms using GloWbE

Keywords

  • Americanization
  • Collocation
  • Construction
  • Frequency
  • PMW
  • Qualitative
  • Quantitative
  • Semantic shift
  • Token
  • Type
  • Type/token ratio (TTR)
