Questions and Answers
What is the primary advantage of using Likelihood Ratios over the Chi-Square test for collocation discovery?
- Likelihood Ratios are computationally simpler to calculate.
- Likelihood Ratios directly measure the strength of association, unlike Chi-Square.
- Likelihood Ratios always yield lower p-values.
- Likelihood Ratios are more suitable for sparse data. (correct)
In the context of collocation discovery using Likelihood Ratios, what null hypothesis (H1) is typically examined?
- The occurrences of $w_1$ and $w_2$ are equally frequent in the corpus.
- The occurrence of $w_2$ is dependent on the previous occurrence of $w_1$.
- The occurrence of $w_2$ is independent of the previous occurrence of $w_1$. (correct)
- The bigram $w_1 w_2$ occurs more frequently than expected by chance.
What is the role of corpora in discovering subject-specific collocations using likelihood ratios?
- Larger corpora are needed to offset the effect of sparse data on the likelihood ratio test.
- Comparing relative frequencies across different corpora helps identify collocations characteristic of specific subjects. (correct)
- Using multiple corpora ensures that all possible collocations are identified, regardless of subject matter.
- The corpora identify the grammatical relations between words in a collocation.
Which of the following best describes the primary reason for the shift towards statistical methods in NLP?
Which activity falls under the umbrella of Natural Language Processing (NLP)?
Given the log-likelihood formula provided, what does the term $L(c_{12}, c_1, p_1)$ represent?
Under what specific condition would pointwise mutual information be an unreliable measure for collocation discovery, and what is an alternative approach better suited for this condition?
Which of the following is NOT typically considered a subdivision of NLP?
What is the main focus of 'Pragmatics' within the context of Natural Language Processing?
What distinguishes Language Engineering from Computational Linguistics?
Imagine a system designed to analyze customer reviews and automatically categorize them as positive, negative, or neutral. Which area of NLP is MOST directly involved in enabling this functionality?
A highly advanced NLP system is designed to not only translate text between languages but also to adapt the translated text to suit the cultural norms and expectations of the target audience. Which aspect of NLP is MOST critical for this adaptation?
What does the probability mass function for a random variable X provide?
Which statistical measure describes the consistency of a random variable's values across multiple trials?
What is the defining characteristic of a joint probability distribution involving two random variables, X and Y?
In the context of estimating probabilities from data, what does the 'relative frequency of the outcome' represent?
What is the critical difference between parametric and non-parametric approaches when modeling aspects of language or other data?
Which concept defines families of probability mass functions (pmfs) characterized by different constants?
In Bayesian updating, what role does the Maximum A Posteriori (MAP) distribution play after a new datum is observed?
What is the primary purpose of using Bayesian Statistics in the context of Bayesian Decision Theory?
Given two models for an event, how does Bayesian statistics assess which model better explains observed data?
What is the primary function of lemmatization in text processing?
Which of the following is a common heuristic approach for sentence boundary detection?
In end-of-sentence detection, under what condition should a period NOT be considered an end-of-sentence marker?
What is the purpose of mark-up schemes?
Which of the following is an example of a mark-up scheme?
What does grammatical coding (tagging) primarily indicate in text?
What is a key characteristic of collocations?
Which concept shares a large overlap with collocations?
According to the definition provided, which attribute is essential for a sequence of words to be considered a collocation?
Imagine you are designing a system for sentiment analysis of movie reviews. Which of the following NLP steps would be LEAST crucial in the preprocessing stage, considering the primary goal is to capture the overall emotional tone of the reviews?
In the context of the t-test described, what does the variable 'n' represent?
What is the purpose of calculating the pooled variance ($s^2$) in the t-test?
If the calculated t-value is less than the critical t-value, what conclusion can be drawn?
Why might the t-test be criticized in the context of statistical NLP?
According to the content, what is the null hypothesis for the Chi-Square test?
In the Chi-Square formula, what do Oij and Eij represent?
What is one of the early applications of the Chi-Square test in Statistical NLP, as mentioned?
What is a limitation of using the Chi-Square test, according to the content?
Relating to the Chi-Square test, what is the implication of a very large $X^2$ value?
Given $O_{11} = 50$, $O_{12} = 30$, $O_{21} = 20$, and $O_{22} = 40$, calculate $X^2$ using the provided formula, and determine if there is a statistically significant association at α = 0.05 (critical value = 3.841). Report your answer, and whether the null hypothesis should be accepted or rejected.
Flashcards
Natural Language Processing (NLP)
A field focused on enabling computers to process, understand, and generate human language.
Information Retrieval, Extraction, and Filtering
Finding relevant information, extracting specific data, and filtering content based on user needs.
Linguistics
The scientific study of language, including its structure, meaning, and context.
Language Engineering
Computational Linguistics (CL)
Parts of Speech and Morphology
Semantics
Likelihood Ratios
Independence Hypothesis (H1)
Dependence Hypothesis (H2)
Corpus Comparison
Mutual Information
Probability Mass Function
Expectation
Variance
Joint Probability Distribution
Marginal Probability Mass Function
Relative Frequency
Parametric Approach
Bayesian Updating
Bayesian Decision Theory
Lemmatization
End-of-Sentence Detection
Mark-up Schemes
Grammatical Coding (Tagging)
Collocations
Compositionality
Collocations Overlap
Collocation Definition
t-test
Degrees of Freedom (df)
Chi-Square test
Null Hypothesis
Observed frequencies
Expected frequencies
Text corpora in two languages aligned at the sentence level.
Probability Level (α)
Corpus
Study Notes
- Instructor: Diana Inkpen, email: [email protected]
- Focus on preliminaries in Natural Language Processing, CSI 5386
Importance of Studying NLP
- NLP is crucial for numerous beneficial applications and stands as a significant area of current investigation.
- Applications include information retrieval, extraction, and filtering; intelligent Web searching; spelling and grammar checking; and automatic text summarization.
- Pseudo-understanding, natural language generation, and multilingual systems that support machine translation are also applications.
Linguistics
- Considers what kind of things people say and how people learn, produce, and understand language.
- Explores what utterances say, ask, or request about the world by connecting utterances to the world.
NLP and Related Terms
- Natural Language Processing (NLP) involves manipulating, processing, and "understanding" natural language in text or speech.
- NLP may not be the same as full-blown AI or what people think of as "language comprehension."
- Language engineering is the development of NLP techniques and emphasizes large-scale system-building and software engineering.
- Computational Linguistics (CL) refers to the research aspect of NLP, which is inclusive of linguistics, relevant parts of AI, and cognitive science.
Why Study NLP Statistically
- NLP relied mainly on a rule-based method until the late 1980s.
- Rules appear inflexible when characterizing language usage.
- Individuals often stretch and bend rules to accommodate their communication needs.
- Statistical approaches offer the required flexibility for more accurate language modeling.
NLP Subdivisions
- Parts of Speech and Morphology focus on words and their sentence functions, and study the various forms they can take.
- Phrase Structure and Syntax are concerned with word order and phrase structural constraints and regularity.
- Semantics studies the meaning of words, also known as lexical semantics, as well as how those meanings combine to form sentence meanings.
- Pragmatics is the study of how language norms and knowledge about the world interact with, and go beyond, the literal meaning of utterances.
Course Topics
- Studying Words consists of Morphology, Collocations, N-gram Models, Markov Models, and Part-of-Speech Tagging
- Studying Grammars consists of Grammars and Parsing.
- Semantics consists of Compositional semantics, Shallow Semantics, Word Sense Disambiguation and Lexical Acquisition.
- Applications consist of Information Retrieval, Text Categorization, Text Clustering, Statistical Alignment and Machine Translation.
NLP Tools and Resources
- Probability/Statistical Theory: Involves statistical distributions and Bayesian Decision Theory.
- Linguistics Knowledge: Encompasses morphology, syntax, semantics, and pragmatics.
- Corpora: collections of marked-up or raw text, to which statistical methods are applied together with linguistic knowledge in order to discover linguistic theories or organize knowledge.
Course Requirements
- There will be two written and programming assignments. They are 20% each and done in groups of 2-3 students.
- An in-class presentation of a current research paper will be 10% of the grade and done in groups.
- There are two types of participation marks: a quiz (5%) and class participation (5%).
- There will be a Final Project (40%), done in groups.
Textbooks
- Jurafsky, Daniel, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 3rd edition, Prentice-Hall, 2020.
- Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Approaches to Language
- In models of NLP, you must consider how much prior knowledge should be built in.
- The rationalist answer focuses on what knowledge in the human mind is not derived from the senses but presumably from genetic inheritance.
- The empiricist answer is that the brain's ability to use association, generalization, and pattern recognition could also be used for learning natural language structures
- Chomskyan/generative linguists are interested in describing the language module of the mind or "I-language".
- The I-language only has indirect evidence from text or "E-language", supplemented by native speakers' intuitions.
- Empiricists describe "E-language" or how language actually is used.
- Chomskyans differentiate between linguistic competence and performance, and believe competence can be described in isolation; empiricists disagree.
- From 1970-1989, the focus was on mind science and toy systems that were built with the goal of intelligent behaviour.
- Currently, there's more focus on automatic learning or knowledge induction through machine learning, like deep learning
- Chomskyans focus on categorical judgements of rare sentence types, while statistical NLP focuses on sentences that are common.
NLP Difficulty
- Natural language is highly ambiguous, which makes NLP difficult.
- The sentence "The company is training workers" contains multiple syntactic analyses or parse trees.
- "List the sales of the products produced in 1973 with the products produced in 1972" can have 455 parses.
- NLP systems need to make good disambiguation decisions on word sense, word category, syntactic structure, and semantic scope.
Inefficient Methods
- Symbolic NLP faces a tension between maximizing coverage and minimizing ambiguity.
- Hand-coded syntax, constraint, and preference rules are time-consuming to build, brittle, and do not scale across language.
- Example: metaphors.
Statistical NLP
- This NLP approach seeks to solve such problems by automatically learning structural and lexical preferences from corpora.
- Statistical NLP offers a good solution to ambiguity: generalized, robust statistical models that degrade gracefully.
Corpora Examples
- Corpora examples include the Brown Corpus (1 million words), the British National Corpus (100 million words), and the American National Corpus (10 million words, growing toward 100 million).
- The Penn Treebank is parsed WSJ text, and the Canadian Hansard is a parallel (bilingual) corpus.
- Other Corpora examples are English Gigaword Corpus and Wikipedia dumps.
Dictionaries
- Dictionaries include Longman Dictionary of Contemporary English, WordNet (hierarchy of synsets), and Wiktionary
Analyzing Word Counts
- Word counts in word vectors allow you to find the most common words in the text, the amount of words in the text (tokens vs types), and the average frequency of each word.
- A limitation of word counts is predicting a word's behaviour, because most words appear only rarely.
- Zipf's Law states that a word's frequency is inversely proportional to its rank: $f \propto 1/r$. For most words, data about their use is exceedingly sparse.
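As a quick illustration, here is a minimal Python sketch that checks Zipf's prediction that frequency times rank stays roughly constant; the corpus path and the regex tokenization are placeholder assumptions, not part of the course materials.

```python
import re
from collections import Counter

# Count word frequencies in a plain-text corpus (the path is a placeholder).
with open("corpus.txt", encoding="utf-8") as f:
    tokens = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(tokens)

# Zipf's law predicts f proportional to 1/r, so f * r should be roughly constant.
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:>4}  {word:<12} f={freq:<8} f*r={freq * rank}")
```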
Collocations
- A collocation is any turn of phrase or accepted usage where somehow the whole is perceived as having an existence beyond the sum of its parts. (e.g., disk drive, make up, bacon and eggs).
- Collocations are important and can be extracted from a text.
- The most common bigrams can be extracted (e.g., "at the", "of a"), but they must be filtered out, since such function-word pairs are uninformative.
Concordances
- Finding concordances corresponds to finding the different contexts in which a given word occurs and uses a Key Word In Context (KWIC) concordancing program.
- Concordances are useful both for building dictionaries for learners of foreign languages and for guiding statistical parsers.
Linguistic Essentials
- Focus on Chapter 3 of Manning & Schütze.
- Based on slides by Diana Inkpen, 2004-2021.
Parts of Speech and Morphology
- These correspond to syntactic or grammatical categories that include noun, verb, adjective, adverb, pronoun, etc.
- Word categories are connected systematically through morphological processes, such as producing a plural form from the singular.
- The morphological processes are inflection, derivation, and compounding.
Words' Syntactic Functions
- Nouns typically refer to entities in the world like people, animals and things.
- Determiners describe the particular reference of a noun and adjectives the properties of nouns.
- Verbs describe actions, activities and states.
- Adverbs modify verbs the same way adjectives modify nouns.
- Prepositions are typically small words that communicate time or space.
- Prepositions are used as ‘particles’ to make phrasal verbs.
- Conjunctions and complementizers link words, phrases or clauses.
Features of Nouns
- Number: singular, plural (example: book/books)
- Gender: masculine, feminine (example: waiter/waitress; English mostly has natural gender)
- Case: nominative, accusative, genitive (possessive), dative (indirect object)
Determiners
- Definite article: the
- Indefinite articles: a, an
- Demonstrative adjectives: that, those
Adjectives
- Adjectives have several features that include number, gender and case.
- Degree: positive, comparative, superlative. (Example good, better, best)
- Adjectives can be quantifiers like all, many, some.
Verbs
- Number: singular, plural
- Person: first, second, third
- Tense: past, present, future
- Aspect: progressive, perfect
- Base form / infinitive (to eat)
- Modality / mood (subjunctive, conditional)
- Voice: active, passive
Adverbs
- Degree: positive, comparative, superlative (fast, faster, fastest)
- Qualifiers such as very modify the degree of adjectives and adverbs.
Prepositions
- in, over, on (typically express spatial or time relationships)
- Particles: in phrasal verbs or other compounds (make up, show off)
Conjunctions
- Coordinating (apples and oranges)
- Subordinating: (I would like to go to the movie, although I have to study)
- Complementizers introduce a subordinate clause that serves as a direct object (I think that he will come to class.)
A simple context-free grammar
- Syntax and phrase structure rules: S --> NP VP, NP --> AT NNS, VP --> VP PP | VBD, PP --> IN NP.
- Lexicon: AT --> the; NNS --> children | students | mountains; VBD --> slept | ate | saw; IN --> in | of; NN --> cake. The rewrite rules together with the lexicon make up the grammar.
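The toy grammar can be tried directly with NLTK's chart parser. This is a hedged sketch: the rule VP → VBD is made explicit so the recursion VP → VP PP bottoms out, and the test sentence is an assumed example, not one from the notes.

```python
import nltk

# The toy grammar from the section, written in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> AT NNS
    VP  -> VP PP | VBD
    PP  -> IN NP
    AT  -> 'the'
    NNS -> 'children' | 'students' | 'mountains'
    VBD -> 'slept' | 'ate' | 'saw'
    IN  -> 'in' | 'of'
""")

parser = nltk.ChartParser(grammar)
sentence = "the children slept in the mountains".split()
for tree in parser.parse(sentence):
    print(tree)   # prints the parse tree(s) licensed by the grammar
```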
Local and Non-Local Dependencies
- A local dependency occurs when two words are expressed within the same syntactic rule.
- A non-local dependency occurs when two words do not occur within the same syntactic rule; examples include:
- subject-verb agreement
- long-distance dependencies such as wh-extraction
- Statistical NLP approaches commonly model local dependencies, because non-local phenomena are much harder to handle.
Semantic Roles
- Semantic roles, such as agent, patient, instrument, and goal, are filled by the arguments of a verb (typically noun phrases).
- Agent: the entity performing the action.
- Patient: the entity being acted upon.
- In English, semantic roles map onto subject and object, but the mapping is complicated by active vs. passive voice and by direct vs. indirect objects.
Subcategorization
- Verbs relate to entities in different ways and are categorized as transitive or intransitive. For example, a transitive verb takes a direct object (hits the ball).
- Adjuncts are phrases giving prototypical descriptions of the time, place, or manner of the action; complements are the objects and phrases tightly bound to the verb.
- Verbs are subcategorized according to the complements they allow, which captures syntactic and semantic regularities.
Ambiguity and Garden-Path Sentences
- Attachment ambiguities arise when a generated phrase can attach to more than one node in the parse tree: in "The child ate the cake with a spoon", the phrase "with a spoon" could modify the cake or the eating.
- An ambiguous sentence: "Fruit flies like a banana."
- Garden-path sentences lead the reader down a parse that turns out not to work.
Semantics
- Semantics is the study of the meaning of words, constructions, and utterances.
- It divides into lexical semantics and combinational (compositional) semantics.
- Lexical semantics covers relations such as hypernymy, hyponymy, and antonymy.
- Compositionality means the meaning of the whole is built from the meanings of its parts; in collocations the whole differs from the sum of its parts.
- Idioms: the phrase means something completely different from its parts; such meanings are not predictable.
Pragmatics
- Pragmatics studies phenomena that go beyond single sentences.
- It covers what the speaker means to express beyond the literal meaning of the utterance.
- Topics include discourse structure, quantifier scope, speech acts, and reference resolution; these are crucial for information extraction.
Notions of Probability Theory
- Probability theory is used to predict how likely events are.
- An experiment (or trial) is the process by which an observation is made.
- The set of possible basic outcomes is called the sample space.
- An event is a subset of the sample space.
- Probabilities are numbers between 0 and 1, where 0 denotes impossibility and 1 certainty.
- A probability function distributes a total mass of 1 over the sample space.
Conditional Probability and Independence
- Conditional probability measures the probability of an event given that another event has occurred.
- The prior probability is the probability measured before any evidence is seen.
- The posterior probability is the probability updated using the data.
- Notions important for NLP: independence, conditional probability, and the chain rule relating joint probabilities to conditional ones.
Bayes Theorem
- Bayes' theorem is important for reversing the order of conditioning between two events, especially when one conditional probability is difficult to determine directly.
- Formula: $P(B|A) = P(A|B)P(B)/P(A)$
- $P(A)$ acts as a normalizing constant.
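A small worked example of the formula, with made-up numbers given a spam-filtering flavour (the events and probabilities are illustrative assumptions, not from the notes):

```python
# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A).
# Toy example: B = "document is spam", A = "document contains 'winner'".
p_b = 0.2               # prior P(B)
p_a_given_b = 0.6       # likelihood P(A|B)
p_a_given_not_b = 0.05  # P(A|not B)

# Normalizing constant P(A), via the law of total probability.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

p_b_given_a = p_a_given_b * p_b / p_a
print(f"P(spam | 'winner') = {p_b_given_a:.3f}")  # 0.12 / 0.16 = 0.75
```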
Random Variable
- A random variable is a function X: sample space --> $\mathbb{R}^n$.
- A discrete random variable is a function X: sample space --> S, where S is a countable subset of $\mathbb{R}$.
- If X maps the sample space to {0, 1}, the experiment is called a Bernoulli trial.
- The function giving the probability of each value of X is the probability mass function (pmf).
Expectation and Variance
- The expectation is the mean (average) of a random variable.
- The variance measures how much the values of a random variable tend to vary over trials: whether they are consistent or vary a lot.
Joint and Conditional Distributions
- A joint probability distribution describes more than one random variable at once: for discrete X and Y, $p(x, y) = P(X = x, Y = y)$.
- Summing over the values of one variable separately gives the marginal pmf of the other: $p(x) = \sum_y p(x, y)$.
Estimating Probability Functions
- "What is the probability that the sentence "The cow chewed its cud" will be uttered? " needs estimating
- An important measure for the rate is relative frequency
- Models called using parametric are certain aspects of language are modeled by using the well known distribution
- To make no assumption we must user non-parametric approach.
- The well common distribution is discrete with the binomial, multinomial distribution distribution.
- With continuation, a common form is the standard normal distribution
Bayesian Updating
- Bayesian updating applies when data arrive sequentially and independently.
- When a new datum arrives, we update our beliefs by calculating the maximum a posteriori (MAP) distribution.
- The posterior becomes the new prior.
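A minimal sketch of sequential updating, assuming a Beta prior over the parameter of a Bernoulli trial; the conjugate-prior setup and the numbers are illustrative choices, not prescribed by the notes.

```python
# Beta-Bernoulli Bayesian updating: after each datum the posterior
# becomes the new prior. For Beta(a, b) with a, b > 1, the MAP
# estimate of the Bernoulli parameter is (a - 1) / (a + b - 2).
a, b = 2, 2          # prior pseudo-counts (a weak, near-uniform belief)

data = [1, 1, 0, 1]  # observed Bernoulli outcomes, arriving one at a time
for datum in data:
    if datum == 1:
        a += 1       # success updates a
    else:
        b += 1       # failure updates b
    map_estimate = (a - 1) / (a + b - 2)
    print(f"after observing {datum}: MAP p = {map_estimate:.3f}")
```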
Bayesian Decision Theory
- Bayesian statistics evaluates competing models of an event, for example by the likelihood ratio of two different models.
- Text corpora tend to be large and representative of a group of interest, giving quick access to a vast amount of information.
Tools
- Many text corpora and software tools for processing them exist.
Formatting Text
- Junk formatting/content: headers and other extraneous data in the file.
- Text obtained through OCR contains unrecognized words that must be filtered out.
- Upper and lower case: often everything is lowercased (sentence-initial "The" can be replaced with "the"), but case matters for names ("Brown" vs. "brown dog").
- Tokenization breaks the text down into tokens: words and punctuation marks.
Formatting Text (II)
- Periods usually end sentences, but also mark abbreviations.
- Homographs: a single written form can correspond to two different lexemes with different pronunciations.
- Single apostrophes are tricky to tokenize (clitics such as the 're in you're).
- Whitespace is a good sign of a token boundary.
- Word segmentation in other languages: difficult where no whitespace is used.
Morphology
- Stemming strips off affixes to reduce a word to its stem.
- Lemmatization transforms a word into its base form (lemma).
- Stemming is often not very helpful in English.
What Makes Up a Sentence
- Sentences can end with different symbols (. ! ?); this is true about 90% of the time.
- Sentence-internal periods and quotation marks complicate detection; solutions involve hand-built heuristic methods.
End Of Sentence (EOS)
- Place a putative EOS after all occurrences of . ? !
- Move the EOS after following quotation marks, if any.
- Disqualify a period if it is preceded by a known abbreviation that is usually not sentence-final (e.g., Mr., Prof.).
- Disqualify a period preceded by a known abbreviation that is not followed by an uppercase word (e.g., Jr., etc. -- abbreviations that sometimes do end a sentence).
- Disqualify an EOS after ! or ? if it is followed by a lowercase word.
- Regard everything else as an EOS. (A sketch of these rules follows.)
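A rough Python rendering of these rules over pre-tokenized text; the abbreviation lists and the example sentence are illustrative placeholders, not the course's reference implementation.

```python
# Abbreviation lists are illustrative; a real system would use larger ones.
NOT_FINAL = {"Mr.", "Prof.", "Dr.", "vs."}   # rarely end a sentence
MAYBE_FINAL = {"Jr.", "etc."}                # end a sentence only before uppercase

def split_sentences(tokens):
    """Group tokens into sentences using the heuristic rules above."""
    sentences, current, i = [], [], 0
    while i < len(tokens):
        tok = tokens[i]
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in {"?", "!"}:
            eos = not nxt[:1].islower()      # no boundary before lowercase
        elif tok.endswith("."):
            eos = tok not in NOT_FINAL       # Mr./Prof. never mark an EOS
            if tok in MAYBE_FINAL and not nxt[:1].isupper():
                eos = False                  # Jr./etc. need uppercase after
        else:
            eos = False
        if eos and nxt in {'"', "'"}:        # move boundary past quotes
            current.append(nxt)
            i += 1
        if eos:
            sentences.append(" ".join(current))
            current = []
        i += 1
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences('He said " stop ! " Mr. Smith left .'.split()))
# -> ['He said " stop ! "', 'Mr. Smith left .']
```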
Marked-up Data
- Mark-up schemes used to annotate text include COCOA and SGML (the encoding family behind HTML, TEI, and XML).
Grammatical coding:
- Part-of-speech tags are assigned automatically.
- Tag sets are drawn from different sources, such as the Brown and Penn Treebank tag sets.
- Tag set design trades off target features against predictive features.
Collocations
- Characterized by limited compositionality.
- The concept overlaps substantially with the notions of term and technical phrase.
- Near-synonyms are not interchangeable in collocations (strong tea, but powerful computer).
Collocations (Definitions)
- A collocation consists of two or more words whose exact meaning or connotation cannot be derived directly from the meaning or connotation of its components (Choueka, 1988).
- The words of a collocation are not necessarily adjacent to each other.
Collocation Criteria
- Non-compositionality, non-substitutability, non-modifiability.
- Collocations usually cannot be translated word for word into other languages.
- These criteria apply to the word sequence as a whole, not to its individual words.
Subclasses of Linguistic Collocations
- Light verbs: verbs with little semantic content.
- Verb particle constructions, or phrasal verbs.
- Proper nouns.
- Terminological expressions.
Collocation Detection Techniques
- Techniques for detecting collocations have been surveyed:
- selection by frequency
- selection by the mean and variance of the distance between collocates
- mutual information
- hypothesis testing
Collocation: Frequency
- Select the most frequent bigrams.
- Pass them through a part-of-speech filter.
- This simple method works very well for fixed-phrase collocations.
Mean and Variance
- This method handles collocations whose words can occur at flexible distances from each other.
- Look at the mean and variance of the offset (signed distance) between the two words across the corpus.
- The mean tells us how many words typically separate the pair; the sample deviation tells us how consistently that offset occurs.
- A peaked offset distribution (low deviation) suggests a collocation. (See the sketch below.)
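A minimal sketch of collecting offsets and computing their statistics; the window size, helper name, and toy text are assumptions for illustration.

```python
from statistics import mean, stdev

def offset_stats(tokens, w1, w2, window=3):
    """Collect signed offsets of w2 within a window around each w1."""
    offsets = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] == w2:
                offsets.append(j - i)
    return mean(offsets), stdev(offsets)

text = ("she knocked on his door . minutes later someone "
        "knocked at the door .").split()
# 'door' always occurs 3 positions after 'knocked': a peaked distribution.
print(offset_stats(text, "knocked", "door"))   # -> (3.0, 0.0)
```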
Collocation: Hypothesis Testing
- High frequency and low variance can still be accidental.
- Hypothesis testing is the classical statistical answer: formulate a null hypothesis of chance co-occurrence, compute how likely the observed values would be under it, and reject the null hypothesis if that probability is too low.
Hypothesis Testing: The t-test
- The t-test compares the observed sample mean with the mean expected under the null hypothesis, scaled by the sample variance; it tells us how likely such a sample is if it was drawn from a distribution with the expected mean.
- The test assumes the data are approximately normally distributed.
- For bigrams, the corpus is treated as N Bernoulli trials, one per bigram token; under $H_0$, $p = P(w_1 w_2) = P(w_1)P(w_2)$.
- The statistic is $t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$, where $\bar{x}$ is the observed bigram relative frequency and $\mu$ the frequency expected under independence.
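A sketch of the computation; the counts are illustrative (in the style of the "new companies" example from Manning & Schütze) and the helper name is an assumption.

```python
import math

def t_score(c1, c2, c12, N):
    """t-test for the bigram w1 w2, given unigram/bigram counts."""
    x_bar = c12 / N              # observed mean: bigram relative frequency
    mu = (c1 / N) * (c2 / N)     # expected mean under H0 (independence)
    s2 = x_bar * (1 - x_bar)     # Bernoulli variance, close to x_bar here
    return (x_bar - mu) / math.sqrt(s2 / N)

t = t_score(c1=15_828, c2=4_675, c12=8, N=14_307_668)
print(f"t = {t:.4f}")  # below the 2.576 critical value (alpha = 0.005),
                       # so H0 (independence) cannot be rejected
```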
Collocation: Chi-Square Test
- The t-test is criticized because it assumes that probabilities are approximately normally distributed; the chi-square test does not.
- The chi-square test compares observed frequencies with the frequencies expected under independence.
- If the difference is large, the independence (null) hypothesis can be rejected.
Chi-Square Formula
- $X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $O_{ij}$ are the observed and $E_{ij}$ the expected frequencies of the contingency table, and N is the sample size.
- For a 2×2 table there is a shortcut: $X^2 = \frac{N (O_{11} O_{22} - O_{12} O_{21})^2}{(O_{11}+O_{12})(O_{11}+O_{21})(O_{12}+O_{22})(O_{21}+O_{22})}$.
- An early application: finding translation pairs in aligned corpora.
- A later application: chi-square as a measure of corpus similarity.
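Applying the 2×2 shortcut to the contingency table from the quiz question above ($O_{11}=50$, $O_{12}=30$, $O_{21}=20$, $O_{22}=40$):

```python
def chi_square_2x2(o11, o12, o21, o22):
    """2x2 chi-square shortcut: no explicit expected frequencies needed."""
    n = o11 + o12 + o21 + o22
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return num / den

x2 = chi_square_2x2(50, 30, 20, 40)
print(f"X^2 = {x2:.2f}")  # 11.67 > 3.841, so reject H0 at alpha = 0.05
```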
Likelihood Ratios
- Likelihood ratios are more appropriate for sparse data than the chi-square test, and the resulting statistic is easier to interpret than an $X^2$ value: it expresses how much more likely one hypothesis is than the other.
Mutual Information (I)
- An information-theoretic measure, also used for finding collocations.
- Pointwise mutual information between two words: $I(w_1, w_2) = \log_2 \frac{P(w_1 w_2)}{P(w_1)P(w_2)}$, the amount of information the occurrence of one word gives us about the occurrence of the other.
- Pointwise MI is unreliable for sparse (low-frequency) data.
Mutual Information (II)
- Entropy is the average uncertainty of a random variable: $H(p) = H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$, where $p(x) = P(X = x)$.
Entropy, Conditional Entropy, and Joint Entropy
- For two discrete random variables X and Y, the joint entropy is $H(X, Y) = -\sum_x \sum_y p(x, y) \log_2 p(x, y)$; it measures the average uncertainty of the pair.
- The conditional entropy of Y given X is $H(Y|X) = -\sum_x \sum_y p(x, y) \log_2 p(y|x)$: the uncertainty that remains about Y once X is known.
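A small sketch estimating these quantities from a toy sample of (x, y) pairs, using the chain-rule identity $H(Y|X) = H(X,Y) - H(X)$; the data is made up.

```python
import math
from collections import Counter

# Toy sample of observed (x, y) pairs.
pairs = [("a", 1), ("a", 1), ("a", 2), ("b", 1), ("b", 2), ("b", 2)]
n = len(pairs)

def entropy(counts):
    """Entropy of an empirical distribution given by a Counter."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())

h_x = entropy(Counter(x for x, _ in pairs))   # H(X)
h_xy = entropy(Counter(pairs))                # H(X, Y)
print(f"H(X) = {h_x:.3f}, H(X,Y) = {h_xy:.3f}, H(Y|X) = {h_xy - h_x:.3f}")
```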
Channel Model and Relative Entropy
- The noisy channel model views communication as sending a message through a channel; the channel's capacity is the maximum of the mutual information between input and output, taken over all possible input distributions.
- For two pmfs p and q, the relative entropy (KL divergence) is $D(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$; it measures the difference between two probability distributions and is zero exactly when they are equal.
The Entropy of Language
- The cross entropy $H(p, q) = -\sum_x p(x) \log_2 q(x) = H(p) + D(p \| q)$ measures how well a model q predicts language drawn from the true distribution p: on average, how surprised the model is by the next word.
n-gram Models of English
- Language can be modeled as a Markov chain.
- An n-gram model predicts each word from the k previous words: a Markov model of order k, with n = k + 1.
Perplexity
- The speech recognition community reports perplexity rather than cross entropy: $PP = 2^{H(p,q)}$.
- A perplexity of k means that, on average, predicting the next word is as hard as choosing uniformly among k alternatives.
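A minimal sketch, with a made-up unigram model q evaluated on a short test sequence; real perplexity figures require a held-out corpus.

```python
import math

# Toy unigram model q (probabilities are invented for illustration).
q = {"the": 0.4, "cow": 0.2, "chewed": 0.2, "its": 0.1, "cud": 0.1}
test = ["the", "cow", "chewed", "its", "cud"]

# Cross entropy of the model on the test data, in bits per word.
cross_entropy = -sum(math.log2(q[w]) for w in test) / len(test)
print(f"perplexity = {2 ** cross_entropy:.2f}")  # ~5.7: like a uniform
                                                 # choice among ~6 words
```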
Corpus Work
- Corpora are collections of text that contain the information needed.
Steps for Tokenization:
- Handle text formatting, OCR errors, and lowercasing.
- Tokenize the text into tokens and sentences.
Morphology:
- Make use of stemming and lemmatization.
Text:
- Sentence boundary marks and mark-up codes are needed to produce good data.
Mark-up Schemes
- Scheme design.
- Grammatical coding (tagging).
- Tag set design: target features versus predictive features.
Analyzing Collocations
- Collocations are characterized by limited compositionality.
- They overlap with terms and technical phrases.
- Near-synonyms differ in their collocational behaviour (strong vs. powerful).
Theoretical Overlapping Definitions
Linguistic Subclasses
- Light verbs.
- Verb particle constructions / phrasal verbs.
- Proper nouns.
- Terminological expressions.
Key Detection Techniques
- Hypothesis testing.
- Selection of collocates by frequency and by distance measurements (mean and variance).
Frequency
- Select the most frequent bigrams.
- Pass them through a part-of-speech filter.
Variance:
- Compute the mean and variance of the distance between collocates.
- Low variance suggests a collocation; high variance suggests a random relationship.
Hypothesis Testing
- High frequency and low variance can be accidental
- Hypothesis testing is a classical statistical problem.
Equations
- The formulas use the collocate counts and the sample deviation.
- The t statistic tests whether the bigram's observed parameters are consistent with the independence hypothesis: if t is high enough, the words co-occur more often than independence would predict; otherwise they do not.
Testing Differences (Hanks)
- The t-test can also be used to find the words that best distinguish between two near-synonyms; such distinguishing words were reported to separate 73% of the sentences.
- The test compares the parameters of the two words' co-occurrence distributions, again under the normality assumption.
t-test for Differences
- The hypothesis is that the two words' co-occurrence rates do not differ; the test uses a pooled variance estimate.
- The null hypothesis can be rejected by comparing the t score against the critical value in the t table.
- The scores of candidate words are then ranked and compared against each other.
Chi-Square Test Methodology
- Used when the t-test's normality assumption is not appropriate.
- Observed frequencies are compared with the frequencies expected under independence; the choice of what to include in the contingency table involves assumptions.
- A large difference rejects the null hypothesis of independence.
- The same $X^2$ formula over the bigram's contingency table applies: a high value means the words of the bigram do correspond to each other; otherwise they do not.
Chi-Square Test (III): Applications
- One of the early uses of the chi-square test in Statistical NLP was the identification of translation pairs in aligned corpora (Church & Gale, 1991).
- A more recent application uses chi-square as a metric for corpus similarity (Kilgarriff and Rose, 1998).
- Nevertheless, the chi-square test should not be used with small corpora.
Likelihood Ratio Equations:
- $L(H_1) = b(c_{12}; c_1, p)\, b(c_2 - c_{12}; N - c_1, p)$ and $L(H_2) = b(c_{12}; c_1, p_1)\, b(c_2 - c_{12}; N - c_1, p_2)$, where $b(\cdot;\cdot,\cdot)$ is the binomial distribution, $p = c_2/N$, $p_1 = c_{12}/c_1$, and $p_2 = (c_2 - c_{12})/(N - c_1)$.
- Hypothesis 1 (independence): $P(w_2|w_1) = p = P(w_2|\neg w_1)$. Hypothesis 2 (dependence): $P(w_2|w_1) = p_1 \neq p_2 = P(w_2|\neg w_1)$.
- The log likelihood ratio is $\log \lambda = \log L(H_1) - \log L(H_2)$; the binomial coefficients cancel, leaving terms $\log L(k, n, x)$ with $L(k, n, x) = x^k (1 - x)^{n-k}$.
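A sketch of the computation with illustrative counts (the helper names are assumptions; note that $\log L$ is undefined when $p_1 = 1$ or $p_2 = 0$, i.e., when $w_2$ only ever follows $w_1$):

```python
import math

def log_l(k, n, x):
    """log of L(k, n, x) = x^k * (1 - x)^(n - k)."""
    return k * math.log(x) + (n - k) * math.log(1 - x)

def log_lambda(c1, c2, c12, N):
    """log likelihood ratio for the bigram w1 w2 (Manning & Schuetze style)."""
    p = c2 / N
    p1 = c12 / c1
    p2 = (c2 - c12) / (N - c1)
    return (log_l(c12, c1, p) + log_l(c2 - c12, N - c1, p)
            - log_l(c12, c1, p1) - log_l(c2 - c12, N - c1, p2))

# -2 log(lambda) is asymptotically chi-square distributed, so it can be
# compared against chi-square critical values.
print(-2 * log_lambda(c1=100, c2=80, c12=20, N=100_000))
```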
Vector Semantics & Embeddings
- Explores how to represent word meaning, starting from dictionary word senses.
Key Points
- Lemma: the dictionary (citation) form of a word.
- Words are represented as vectors of values.
- The vectors are produced by training a classifier.
- What gives words their meaning is where they are placed in the vector space (their values).
Relations Between Senses
- Synonymy: senses with the same meaning in context.
- Similarity: senses sharing aspects of their meaning in context.
- Connotation: affective dimensions of meaning, such as politeness.
Word ↔ Meaning
- The contrast between word forms and their meanings must be kept in mind.
How to Evaluate
- Ask humans to rate the similarity of pairs of words and compare the ratings with the model's scores.
Joint Evaluation:
- Visualize the paired human and model scores as a labeled scatter plot on a 2D graph.
Terms:
- Pearson correlation: measures the linear relationship between two sets of values.
- Spearman correlation: compares the rankings of the values.
- Correlation scores range from -1 to 1; values close to ±1 indicate a strong relationship, and 0 indicates none.
Other Relationships
- Similarity is distinct from relatedness: related words need not be similar.
- Relatedness covers the overall association between words.
- Superordinate (hypernym) relations organize words hierarchically.
Word Relations
- Words enter into many associations, which leads to several kinds of relations:
- similarity and difference
- relatedness (connections)
- superordinate relations
Zellig Harris's Theory of Usage and Meaning
- The distributional hypothesis: a word's location and the words around it give it its meaning.
Example: "Ong Choi"
- We can infer what an unfamiliar word means from the contexts in which it appears.
- Words that occur in nearly identical environments can be inferred to have closely related meanings.
Vectors
- Words are represented as points in a vector space; similar words are close together in the space.
- The vectors are learned by training an algorithm on data.
- Dense vectors generally work better than sparse ones.
- The similarity of two vectors reflects the similarity of the contexts in which the words occur.
Vectorized Data
1:
- tf-idf
- Sparse count vectors, weighted by term frequency and inverse document frequency (the standard model in information retrieval).
- The counts of words across documents or contexts supply the vector values.
2:
- word2vec
- Produces dense vectors by training a classifier that reads words within a context window and predicts which words occur in that context.
- The resulting vectors are compared using cosine similarity (see the sketch below).
- Cosine similarity cannot, however, compensate for bad representations.
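A minimal cosine-similarity sketch over toy count vectors; the words, dimensions, and counts are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

digital = [0, 1, 2, 1]      # co-occurrence counts over 4 context words
information = [1, 6, 1, 4]
print(f"cosine = {cosine(digital, information):.3f}")  # -> 0.667
```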
Models for Dense Vectors
- There are many models for producing dense vectors, such as static embedding models.
- The different models take certain parameters, for example the size of the context window.
- These parameters control which co-occurrences are counted, and therefore what information about a word the model can capture.
Skip-Gram Model
- The skip-gram model asks, for each (target, candidate) word pair, a binary question: did the candidate really occur near the target (label 1) or not (label 0)?
- The model only needs to answer this question well to some degree; it never has to see or enumerate full contexts.
Skip-Grams
- Skipping words lets the model capture relationships between words that are not strictly adjacent.
- This helps when the training data is small.
How Are New Items Trained?
- First, take a target word and the words near it as positive examples.
- Then pair the target with randomly sampled words as negative examples.
- Train a classifier on this labeled set.
- Training maximizes the similarity of the positive pairs and minimizes the similarity of the negative ones; the learned weights become the embeddings.
Loss Functions
- The loss function defines the learning strategy.
- It makes the model more accurate by indicating how the weights should be tuned.
- Instead of tuning weights manually, gradient descent on the loss adjusts them automatically.
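If the gensim library is available, a skip-gram model with negative sampling can be trained in a few lines; this is a hedged sketch, and the toy corpus below is a placeholder far too small for meaningful vectors.

```python
from gensim.models import Word2Vec

# Placeholder corpus: a list of tokenized sentences.
sentences = [
    ["the", "company", "is", "training", "workers"],
    ["the", "workers", "are", "training", "too"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # embedding dimensionality
    window=2,         # context window size
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # negative samples per positive pair
    min_count=1,      # keep even rare words in this toy corpus
)
print(model.wv.most_similar("workers", topn=3))
```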
Embeddings in Action
- The vector space is created by the corpus the embeddings are trained on, so it reflects where the words come from.
- The connections between words can then be read off as distances and directions between points in the space.
BERT(Bidirectional Encoder Representations from Transformers)
- BERT applies the bidirectional training of the Transformer to language modelling.
- Previous language models (LMs) read a text sequence in a single direction and model its relationships that way.
- BERT uses just the encoder portion of the Transformer; the output is a language representation model.
Masked Tokens:
- Some tokens in the input text are replaced by a mask token; the model then needs to predict the original words.
Steps for training the model to predict the next sentence
- The model is also trained to predict whether one sentence follows another.
- Each token is transformed into a vector and classified; there are two learned matrices to work with.
- A softmax then turns the outputs into probabilities.
- This is only a high-level view of how such models work; BERT is just a starting model, and later models build on these basics.
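A hedged example of the masked-LM objective in action via the Hugging Face transformers pipeline (assumes the library is installed; this downloads the bert-base-uncased weights on first run).

```python
from transformers import pipeline

# BERT fills in the [MASK] token with its most probable predictions.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The company is [MASK] workers."):
    print(f"{prediction['token_str']:<12} {prediction['score']:.3f}")
```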
Description
This quiz covers key concepts in Natural Language Processing (NLP), focusing on collocation discovery, likelihood ratios, and the role of corpora. It tests understanding of statistical methods, null hypotheses, and limitations of pointwise mutual information in NLP.