Questions and Answers
What is the primary advantage of using Likelihood Ratios over the Chi-Square test for collocation discovery?
- Likelihood Ratios are computationally simpler to calculate.
- Likelihood Ratios directly measure the strength of association, unlike Chi-Square.
- Likelihood Ratios always yield lower p-values.
- Likelihood Ratios are more suitable for sparse data. (correct)
In the context of collocation discovery using Likelihood Ratios, what null hypothesis (H1) is typically examined?
- The occurrences of $w_1$ and $w_2$ are equally frequent in the corpus.
- The occurrence of $w_2$ is dependent on the previous occurrence of $w_1$.
- The occurrence of $w_2$ is independent of the previous occurrence of $w_1$. (correct)
- The bigram $w_1 w_2$ occurs more frequently than expected by chance.
What is the role of corpora in discovering subject-specific collocations using likelihood ratios?
- Larger corpora are needed to offset the effect of sparse data on the likelihood ratio test.
- Comparing relative frequencies across different corpora helps identify collocations characteristic of specific subjects. (correct)
- Using multiple corpora ensures that all possible collocations are identified, regardless of subject matter.
- The corpora identify the grammatical relations between words in a collocation.
Which of the following best describes the primary reason for the shift towards statistical methods in NLP?
Which activity falls under the umbrella of Natural Language Processing (NLP)?
Given the log-likelihood formula provided, what does the term $L(c_{12}, c_1, p_1)$ represent?
Under what specific condition would pointwise mutual information be an unreliable measure for collocation discovery, and what is an alternative approach better suited for this condition?
Which of the following is NOT typically considered a subdivision of NLP?
What is the main focus of 'Pragmatics' within the context of Natural Language Processing?
What distinguishes Language Engineering from Computational Linguistics?
Imagine a system designed to analyze customer reviews and automatically categorize them as positive, negative, or neutral. Which area of NLP is MOST directly involved in enabling this functionality?
A highly advanced NLP system is designed to not only translate text between languages but also to adapt the translated text to suit the cultural norms and expectations of the target audience. Which aspect of NLP is MOST critical for this adaptation?
What does the probability mass function for a random variable X provide?
Which statistical measure describes the consistency of a random variable's values across multiple trials?
What is the defining characteristic of a joint probability distribution involving two random variables, X and Y?
In the context of estimating probabilities from data, what does the 'relative frequency of the outcome' represent?
What is the critical difference between parametric and non-parametric approaches when modeling aspects of language or other data?
Which concept defines families of probability mass functions (pmfs) characterized by different constants?
In Bayesian updating, what role does the Maximum A Posteriori (MAP) distribution play after a new datum is observed?
What is the primary purpose of using Bayesian Statistics in the context of Bayesian Decision Theory?
Given two models for an event, how does Bayesian statistics assess which model better explains observed data?
What is the primary function of lemmatization in text processing?
Which of the following is a common heuristic approach for sentence boundary detection?
In end-of-sentence detection, under what condition should a period NOT be considered an end-of-sentence marker?
What is the purpose of mark-up schemes?
Which of the following is an example of a mark-up scheme?
What does grammatical coding (tagging) primarily indicate in text?
What is a key characteristic of collocations?
Which concept shares a large overlap with collocations?
According to the definition provided, which attribute is essential for a sequence of words to be considered a collocation?
Imagine you are designing a system for sentiment analysis of movie reviews. Which of the following NLP steps would be LEAST crucial in the preprocessing stage, considering the primary goal is to capture the overall emotional tone of the reviews?
In the context of the t-test described, what does the variable 'n' represent?
What is the purpose of calculating the pooled variance ($s^2$) in the t-test?
If the calculated t-value is less than the critical t-value, what conclusion can be drawn?
Why might the t-test be criticized in the context of statistical NLP?
According to the content, what is the null hypothesis for the Chi-Square test?
In the Chi-Square formula, what do Oij and Eij represent?
What is one of the early applications of the Chi-Square test in Statistical NLP, as mentioned?
What is a limitation of using the Chi-Square test, according to the content?
Relating to the Chi-Square test, what is the implication of a very large $X^2$ value?
Given $O_{11} = 50$, $O_{12} = 30$, $O_{21} = 20$, and $O_{22} = 40$, calculate $X^2$ using the provided formula, and determine if there is a statistically significant association at α = 0.05 (critical value = 3.841). Report your answer, and whether the null hypothesis should be accepted or rejected.
Flashcards
Natural Language Processing (NLP)
A field focused on enabling computers to process, understand, and generate human language.
Information Retrieval, Extraction, and Filtering
Finding relevant information, extracting specific data, and filtering content based on user needs.
Linguistics
The scientific study of language, including its structure, meaning, and context.
Language Engineering
Computational Linguistics (CL)
Parts of Speech and Morphology
Semantics
Likelihood Ratios
Independence Hypothesis (H1)
Dependence Hypothesis (H2)
Corpus Comparison
Mutual Information
Probability Mass Function
Expectation
Variance
Joint Probability Distribution
Marginal Probability Mass Function
Relative Frequency
Parametric Approach
Bayesian Updating
Bayesian Decision Theory
Lemmatization
End-of-Sentence Detection
Mark-up Schemes
Grammatical Coding (Tagging)
Collocations
Compositionality
Collocations Overlap
Collocation Definition
t-test
Degrees of Freedom (df)
Chi-Square test
Null Hypothesis
Observed frequencies
Expected frequencies
Text corpora in two languages aligned at the sentence level.
Probability Level (α)
Corpus
Study Notes
- Instructor: Diana Inkpen, email: [email protected]
- Focus on preliminaries in Natural Language Processing, CSI 5386
Importance of Studying NLP
- NLP is crucial for numerous beneficial applications and stands as a significant area of current investigation.
- Applications include information retrieval, extraction, and filtering; intelligent Web searching; spelling and grammar checking; and automatic text summarization.
- Pseudo-understanding, natural language generation, and multilingual systems that support machine translation are also applications.
Linguistics
- Considers what kind of things people say and how people learn, produce, and understand language.
- Explores what utterances say, ask, or request about the world by connecting utterances to the world.
NLP and Related Terms
- Natural Language Processing (NLP) involves manipulating, processing, and "understanding" natural language in text or speech.
- NLP may not be the same as full-blown AI or what people think of as "language comprehension."
- Language engineering is the development of NLP techniques and emphasizes large-scale system-building and software engineering.
- Computational Linguistics (CL) refers to the research aspect of NLP, which is inclusive of linguistics, relevant parts of AI, and cognitive science.
Why Study NLP Statistically
- NLP relied mainly on a rule-based method until the late 1980s.
- Rules appear inflexible when characterizing language usage.
- Individuals often stretch and bend rules to accommodate their communication needs.
- Statistical approaches offer the required flexibility for more accurate language modeling.
NLP Subdivisions
- Parts of Speech and Morphology focus on words and their sentence functions, and study the various forms they can take.
- Phrase Structure and Syntax are concerned with word order and phrase structural constraints and regularity.
- Semantics studies the meaning of words, also known as lexical semantics, as well as how those meanings combine to form sentence meanings.
- Pragmatics is the study of how language norms and knowledge about the world interact with, and go beyond, the literal meaning of utterances.
Course Topics
- Studying Words consists of Morphology, Collocations, N-gram Models, Markov Models, and Part-of-Speech Tagging
- Studying Grammars consists of Grammars and Parsing.
- Semantics consists of Compositional semantics, Shallow Semantics, Word Sense Disambiguation and Lexical Acquisition.
- Applications consist of Information Retrieval, Text Categorization, Text Clustering, Statistical Alignment and Machine Translation.
NLP Tools and Resources
- Probability/Statistical Theory: Involves statistical distributions and Bayesian Decision Theory.
- Linguistics Knowledge: Encompasses morphology, syntax, semantics, and pragmatics.
- Corpora: collections of marked-up or raw text, to which statistical methods are applied together with linguistic knowledge in order to discover linguistic theories or organize knowledge.
Course Requirements
- There will be two written and programming assignments. They are 20% each and done in groups of 2-3 students.
- An in-class presentation of a current research paper will be 10% of the grade and done in groups.
- There are two types of participation marks: a quiz (5%) and class participation (5%).
- There will be a Final Project (40%), done in groups.
Textbooks
- Jurafsky, Daniel, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 3rd edition, Prentice-Hall, 2020.
- Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Approaches to Language
- In models of NLP, you must consider how much prior knowledge should be built in.
- The rationalist answer focuses on what knowledge in the human mind is not derived from the senses but presumably from genetic inheritance.
- The empiricist answer is that the brain's ability to use association, generalization, and pattern recognition could also be used for learning natural language structures
- Chomskyan/generative linguists are interested in describing the language module of the mind or "I-language".
- The I-language only has indirect evidence from text or "E-language", supplemented by native speakers' intuitions.
- Empiricists describe "E-language" or how language actually is used.
- Chomskyans differentiate between linguistic competence and performance, and believe competence can be described in isolation; empiricists disagree.
- From 1970-1989, the focus was on mind science and toy systems that were built with the goal of intelligent behaviour.
- Currently, there's more focus on automatic learning or knowledge induction through machine learning, like deep learning
- Chomskyans focus on categorical judgements of rare sentence types, while statistical NLP focuses on sentences that are common.
NLP Difficulty
- Natural language is highly ambiguous, which makes NLP difficult.
- The sentence "The company is training workers" contains multiple syntactic analyses or parse trees.
- "List the sales of the products produced in 1973 with the products produced in 1972" can have 455 parses.
- NLP systems need to make good disambiguation decisions on word sense, word category, syntactic structure, and semantic scope.
Inefficient Methods
- Symbolic NLP faces a tension between maximizing coverage and minimizing ambiguity.
- Hand-coded syntax, constraint, and preference rules are time-consuming to build, brittle, and do not scale across language.
- Example: metaphors.
Statistical NLP
- This NLP approach seeks to solve such problems by automatically learning structural and lexical preferences from corpora.
- Statistical NLP offers a good solution to ambiguity: generalized, robust statistical models that degrade gracefully.
Corpora Examples
- Corpora examples include the Brown Corpus (1 million words), the British National Corpus (100 million words), and the American National Corpus (10 million words, growing toward 100 million).
- The Penn Treebank is parsed WSJ text, and the Canadian Hansard is a parallel (bilingual) corpus.
- Other Corpora examples are English Gigaword Corpus and Wikipedia dumps.
Dictionaries
- Dictionaries include Longman Dictionary of Contemporary English, WordNet (hierarchy of synsets), and Wiktionary
Analyzing Word Counts
- Word counts in word vectors allow you to find the most common words in the text, the amount of words in the text (tokens vs types), and the average frequency of each word.
- A limitation of word counts is predicting a word's behaviour, because most words appear only rarely.
- Zipf's Law states that a word's frequency is inversely proportional to its rank: $f \propto 1/r$. For most words, data about their use is exceedingly sparse.
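As a quick illustration, here is a minimal Python sketch that checks Zipf's prediction that frequency times rank stays roughly constant; the corpus path and the regex tokenization are placeholder assumptions, not part of the course materials.

```python
import re
from collections import Counter

# Count word frequencies in a plain-text corpus (the path is a placeholder).
with open("corpus.txt", encoding="utf-8") as f:
    tokens = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(tokens)

# Zipf's law predicts f proportional to 1/r, so f * r should be roughly constant.
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:>4}  {word:<12} f={freq:<8} f*r={freq * rank}")
```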
Collocations
- A collocation is any turn of phrase or accepted usage where somehow the whole is perceived as having an existence beyond the sum of its parts. (e.g., disk drive, make up, bacon and eggs).
- Collocations are important and can be extracted from a text.
- The most common bigrams can be extracted (e.g., "at the", "of a"), but they must be filtered out, since such function-word pairs are uninformative.
Concordances
- Finding concordances corresponds to finding the different contexts in which a given word occurs and uses a Key Word In Context (KWIC) concordancing program.
- Concordances are useful both for building dictionaries for learners of foreign languages and for guiding statistical parsers.
Linguistic Essentials
- Focus on Chapter 3 of Manning & Schütze.
- Based on slides by Diana Inkpen, 2004-2021.
Parts of Speech and Morphology
- These correspond to syntactic or grammatical categories that include noun, verb, adjective, adverb, pronoun, etc.
- Word categories are connected systematically through morphological processes, such as producing a plural form from the singular.
- The morphological processes are inflection, derivation, and compounding.
Words' Syntactic Functions
- Nouns typically refer to entities in the world like people, animals and things.
- Determiners describe the particular reference of a noun and adjectives the properties of nouns.
- Verbs describe actions, activities and states.
- Adverbs modify verbs the same way adjectives modify nouns.
- Prepositions are typically small words that communicate time or space.
- Prepositions are used as ‘particles’ to make phrasal verbs.
- Conjunctions and complementizers link words, phrases or clauses.
Features of Nouns
- Number: singular, plural (example: book/books)
- Gender: masculine, feminine (example: waiter/waitress; English mostly has natural gender)
- Case: nominative, accusative, genitive (possessive), dative (indirect object)
Determiners
- Definite article: the
- Indefinite articles: a, an
- Demonstrative adjectives: that, those
Adjectives
- Adjectives have several features that include number, gender and case.
- Degree: positive, comparative, superlative. (Example good, better, best)
- Adjectives can be quantifiers like all, many, some.
Verbs
- Number: singular, plural
- Person: first, second, third
- Tense: past, present, future
- Aspect: progressive, perfect
- Base form / infinitive (to eat)
- Modality / mood (subjunctive, conditional)
- Voice: active, passive
Adverbs
- Degree: positive, comparative, superlative (fast, faster, fastest)
- Qualifiers such as very modify the degree of adjectives and adverbs.
Prepositions
- in, over, on (typically express spatial or time relationships)
- Particles: in phrasal verbs or other compounds (make up, show off)
Conjunctions
- Coordinating (apples and oranges)
- Subordinating: (I would like to go to the movie, although I have to study)
- Complementizers introduce a subordinate clause that serves as a direct object (I think that he will come to class.)
A simple context-free grammar
- Syntax and phrase structure rules: S --> NP VP, NP --> AT NNS, VP --> VP PP | VBD, PP --> IN NP.
- Lexicon: AT --> the; NNS --> children | students | mountains; VBD --> slept | ate | saw; IN --> in | of; NN --> cake. The rewrite rules together with the lexicon make up the grammar.
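The toy grammar can be tried directly with NLTK's chart parser. This is a hedged sketch: the rule VP → VBD is made explicit so the recursion VP → VP PP bottoms out, and the test sentence is an assumed example, not one from the notes.

```python
import nltk

# The toy grammar from the section, written in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> AT NNS
    VP  -> VP PP | VBD
    PP  -> IN NP
    AT  -> 'the'
    NNS -> 'children' | 'students' | 'mountains'
    VBD -> 'slept' | 'ate' | 'saw'
    IN  -> 'in' | 'of'
""")

parser = nltk.ChartParser(grammar)
sentence = "the children slept in the mountains".split()
for tree in parser.parse(sentence):
    print(tree)   # prints the parse tree(s) licensed by the grammar
```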
Local and Non-Local Dependencies
- A local dependency occurs when two words are expressed within the same syntactic rule.
- A non-local dependency occurs when two words do not occur within the same syntactic rule; examples include:
- subject-verb agreement
- long-distance dependencies such as wh-extraction
- Statistical NLP approaches commonly model local dependencies, because non-local phenomena are much harder to handle.
Semantic Roles
- Semantic roles, such as agent, patient, instrument, and goal, are filled by the arguments of a verb (typically noun phrases).
- Agent: the entity performing the action.
- Patient: the entity being acted upon.
- In English, semantic roles map onto subject and object, but the mapping is complicated by active vs. passive voice and by direct vs. indirect objects.
Subcategorization
- Verbs relate to entities in different ways and are categorized as transitive or intransitive. For example, a transitive verb takes a direct object (hits the ball).
- Adjuncts are phrases giving prototypical descriptions of the time, place, or manner of the action; complements are the objects and phrases tightly bound to the verb.
- Verbs are subcategorized according to the complements they allow, which captures syntactic and semantic regularities.
Ambiguity and Garden-Path Sentences
- Attachment ambiguities arise when a generated phrase can attach to more than one node in the parse tree: in "The child ate the cake with a spoon", the phrase "with a spoon" could modify the cake or the eating.
- An ambiguous sentence: "Fruit flies like a banana."
- Garden-path sentences lead the reader down a parse that turns out not to work.
Semantics
- Semantics is the study of the meaning of words, constructions, and utterances.
- It divides into lexical semantics and combinational (compositional) semantics.
- Lexical semantics covers relations such as hypernymy, hyponymy, and antonymy.
- Compositionality means the meaning of the whole is built from the meanings of its parts; in collocations the whole differs from the sum of its parts.
- Idioms: the phrase means something completely different from its parts; such meanings are not predictable.
Pragmatics
- Pragmatics studies phenomena that go beyond single sentences.
- It covers what the speaker means to express beyond the literal meaning of the utterance.
- Topics include discourse structure, quantifier scope, speech acts, and reference resolution; these are crucial for information extraction.
Notions of Probability Theory
- Probability theory is used to predict how likely events are.
- An experiment (or trial) is the process by which an observation is made.
- The set of possible basic outcomes is called the sample space.
- An event is a subset of the sample space.
- Probabilities are numbers between 0 and 1, where 0 denotes impossibility and 1 certainty.
- A probability function distributes a total mass of 1 over the sample space.
Conditional Probability and Independence
- Conditional probability measures the probability of an event given that another event has occurred.
- The prior probability is the probability measured before any evidence is seen.
- The posterior probability is the probability updated using the data.
- Notions important for NLP: independence, conditional probability, and the chain rule relating joint probabilities to conditional ones.
Bayes Theorem
- Bayes' theorem is important for reversing the order of conditioning between two events, especially when one conditional probability is difficult to determine directly.
- Formula: $P(B|A) = P(A|B)P(B)/P(A)$
- $P(A)$ acts as a normalizing constant.
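A small worked example of the formula, with made-up numbers given a spam-filtering flavour (the events and probabilities are illustrative assumptions, not from the notes):

```python
# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A).
# Toy example: B = "document is spam", A = "document contains 'winner'".
p_b = 0.2               # prior P(B)
p_a_given_b = 0.6       # likelihood P(A|B)
p_a_given_not_b = 0.05  # P(A|not B)

# Normalizing constant P(A), via the law of total probability.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

p_b_given_a = p_a_given_b * p_b / p_a
print(f"P(spam | 'winner') = {p_b_given_a:.3f}")  # 0.12 / 0.16 = 0.75
```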
Random Variable
- A random variable is a function X: sample space --> $\mathbb{R}^n$.
- A discrete random variable is a function X: sample space --> S, where S is a countable subset of $\mathbb{R}$.
- If X maps the sample space to {0, 1}, the experiment is called a Bernoulli trial.
- The function giving the probability of each value of X is the probability mass function (pmf).
Expectation and Variance
- The expectation is the mean (average) of a random variable.
- The variance measures how much the values of a random variable tend to vary over trials: whether they are consistent or vary a lot.
Joint and Conditional Distributions
- A joint probability distribution describes more than one random variable at once: for discrete X and Y, $p(x, y) = P(X = x, Y = y)$.
- Summing over the values of one variable separately gives the marginal pmf of the other: $p(x) = \sum_y p(x, y)$.
Estimating Probability Functions
- "What is the probability that the sentence "The cow chewed its cud" will be uttered? " needs estimating
- An important measure for the rate is relative frequency
- Models called using parametric are certain aspects of language are modeled by using the well known distribution
- To make no assumption we must user non-parametric approach.
- The well common distribution is discrete with the binomial, multinomial distribution distribution.
- With continuation, a common form is the standard normal distribution
Bayesian Updating
- Bayesian updating applies when data arrive sequentially and independently.
- When a new datum arrives, we update our beliefs by calculating the maximum a posteriori (MAP) distribution.
- The posterior becomes the new prior.
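A minimal sketch of sequential updating, assuming a Beta prior over the parameter of a Bernoulli trial; the conjugate-prior setup and the numbers are illustrative choices, not prescribed by the notes.

```python
# Beta-Bernoulli Bayesian updating: after each datum the posterior
# becomes the new prior. For Beta(a, b) with a, b > 1, the MAP
# estimate of the Bernoulli parameter is (a - 1) / (a + b - 2).
a, b = 2, 2          # prior pseudo-counts (a weak, near-uniform belief)

data = [1, 1, 0, 1]  # observed Bernoulli outcomes, arriving one at a time
for datum in data:
    if datum == 1:
        a += 1       # success updates a
    else:
        b += 1       # failure updates b
    map_estimate = (a - 1) / (a + b - 2)
    print(f"after observing {datum}: MAP p = {map_estimate:.3f}")
```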
Bayesian Decision Theory
- Bayesian statistics evaluates competing models of an event, for example by the likelihood ratio of two different models.
- Text corpora tend to be large and representative of a group of interest, giving quick access to a vast amount of information.
Tools
- Many text corpora and software tools for processing them exist.
Formatting Text
- Junk formatting/content: headers and other extraneous data in the file.
- Text obtained through OCR contains unrecognized words that must be filtered out.
- Upper and lower case: often everything is lowercased (sentence-initial "The" can be replaced with "the"), but case matters for names ("Brown" vs. "brown dog").
- Tokenization breaks the text down into tokens: words and punctuation marks.
Formatting Text (II)
- Periods usually end sentences, but also mark abbreviations.
- Homographs: a single written form can correspond to two different lexemes with different pronunciations.
- Single apostrophes are tricky to tokenize (clitics such as the 're in you're).
- Whitespace is a good sign of a token boundary.
- Word segmentation in other languages: difficult where no whitespace is used.
Morphology
- Stemming strips off affixes to reduce a word to its stem.
- Lemmatization transforms a word into its base form (lemma).
- Stemming is often not very helpful in English.
What Makes Up a Sentence
- Sentences can end with different symbols (. ! ?); this is true about 90% of the time.
- Sentence-internal periods and quotation marks complicate detection; solutions involve hand-built heuristic methods.
End Of Sentence (EOS)
- Place a putative EOS after all occurrences of . ? !
- Move the EOS after following quotation marks, if any.
- Disqualify a period if it is preceded by a known abbreviation that is usually not sentence-final (e.g., Mr., Prof.).
- Disqualify a period preceded by a known abbreviation that is not followed by an uppercase word (e.g., Jr., etc. -- abbreviations that sometimes do end a sentence).
- Disqualify an EOS after ! or ? if it is followed by a lowercase word.
- Regard everything else as an EOS. (A sketch of these rules follows.)
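A rough Python rendering of these rules over pre-tokenized text; the abbreviation lists and the example sentence are illustrative placeholders, not the course's reference implementation.

```python
# Abbreviation lists are illustrative; a real system would use larger ones.
NOT_FINAL = {"Mr.", "Prof.", "Dr.", "vs."}   # rarely end a sentence
MAYBE_FINAL = {"Jr.", "etc."}                # end a sentence only before uppercase

def split_sentences(tokens):
    """Group tokens into sentences using the heuristic rules above."""
    sentences, current, i = [], [], 0
    while i < len(tokens):
        tok = tokens[i]
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in {"?", "!"}:
            eos = not nxt[:1].islower()      # no boundary before lowercase
        elif tok.endswith("."):
            eos = tok not in NOT_FINAL       # Mr./Prof. never mark an EOS
            if tok in MAYBE_FINAL and not nxt[:1].isupper():
                eos = False                  # Jr./etc. need uppercase after
        else:
            eos = False
        if eos and nxt in {'"', "'"}:        # move boundary past quotes
            current.append(nxt)
            i += 1
        if eos:
            sentences.append(" ".join(current))
            current = []
        i += 1
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences('He said " stop ! " Mr. Smith left .'.split()))
# -> ['He said " stop ! "', 'Mr. Smith left .']
```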
Marked-up Data
- Mark-up schemes used to annotate text include COCOA and SGML (the encoding family behind HTML, TEI, and XML).
Grammatical coding:
- Part-of-speech tags are assigned automatically.
- Tag sets are drawn from different sources, such as the Brown and Penn Treebank tag sets.
- Tag set design trades off target features against predictive features.
Collocations
- Characterized by limited compositionality.
- The concept overlaps substantially with the notions of term and technical phrase.
- Near-synonyms are not interchangeable in collocations (strong tea, but powerful computer).
Collocations (Definitions)
- A collocation consists of two or more words whose exact meaning or connotation cannot be derived directly from the meaning or connotation of its components (Choueka, 1988).
- The words of a collocation are not necessarily adjacent to each other.
Collocation Criteria
- Non-compositionality, non-substitutability, non-modifiability.
- Collocations usually cannot be translated word for word into other languages.
- These criteria apply to the word sequence as a whole, not to its individual words.
Subclasses of Linguistic Collocations
- Light verbs: verbs with little semantic content.
- Verb particle constructions, or phrasal verbs.
- Proper nouns.
- Terminological expressions.
Collocation Detection Techniques
- Techniques for detecting collocations have been surveyed:
- selection by frequency
- selection by the mean and variance of the distance between collocates
- mutual information
- hypothesis testing
Collocation: Frequency
- Select the most frequent bigrams.
- Pass them through a part-of-speech filter.
- This simple method works very well for fixed-phrase collocations.
Mean and Variance
- This method handles collocations whose words can occur at flexible distances from each other.
- Look at the mean and variance of the offset (signed distance) between the two words across the corpus.
- The mean tells us how many words typically separate the pair; the sample deviation tells us how consistently that offset occurs.
- A peaked offset distribution (low deviation) suggests a collocation. (See the sketch below.)
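A minimal sketch of collecting offsets and computing their statistics; the window size, helper name, and toy text are assumptions for illustration.

```python
from statistics import mean, stdev

def offset_stats(tokens, w1, w2, window=3):
    """Collect signed offsets of w2 within a window around each w1."""
    offsets = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] == w2:
                offsets.append(j - i)
    return mean(offsets), stdev(offsets)

text = ("she knocked on his door . minutes later someone "
        "knocked at the door .").split()
# 'door' always occurs 3 positions after 'knocked': a peaked distribution.
print(offset_stats(text, "knocked", "door"))   # -> (3.0, 0.0)
```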
Collocation: Hypothesis Testing
- High frequency and low variance can still be accidental.
- Hypothesis testing is the classical statistical answer: formulate a null hypothesis of chance co-occurrence, compute how likely the observed values would be under it, and reject the null hypothesis if that probability is too low.
Hypothesis Testing: The t-test
- The t-test compares the observed sample mean with the mean expected under the null hypothesis, scaled by the sample variance; it tells us how likely such a sample is if it was drawn from a distribution with the expected mean.
- The test assumes the data are approximately normally distributed.
- For bigrams, the corpus is treated as N Bernoulli trials, one per bigram token; under $H_0$, $p = P(w_1 w_2) = P(w_1)P(w_2)$.
- The statistic is $t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$, where $\bar{x}$ is the observed bigram relative frequency and $\mu$ the frequency expected under independence.
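A sketch of the computation; the counts are illustrative (in the style of the "new companies" example from Manning & Schütze) and the helper name is an assumption.

```python
import math

def t_score(c1, c2, c12, N):
    """t-test for the bigram w1 w2, given unigram/bigram counts."""
    x_bar = c12 / N              # observed mean: bigram relative frequency
    mu = (c1 / N) * (c2 / N)     # expected mean under H0 (independence)
    s2 = x_bar * (1 - x_bar)     # Bernoulli variance, close to x_bar here
    return (x_bar - mu) / math.sqrt(s2 / N)

t = t_score(c1=15_828, c2=4_675, c12=8, N=14_307_668)
print(f"t = {t:.4f}")  # below the 2.576 critical value (alpha = 0.005),
                       # so H0 (independence) cannot be rejected
```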
Collocation: Chi-Square Test
- The t-test is criticized because it assumes that probabilities are approximately normally distributed; the chi-square test does not.
- The chi-square test compares observed frequencies with the frequencies expected under independence.
- If the difference is large, the independence (null) hypothesis can be rejected.
Chi-Square Formula
- $X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $O_{ij}$ are the observed and $E_{ij}$ the expected frequencies of the contingency table, and N is the sample size.
- For a 2×2 table there is a shortcut: $X^2 = \frac{N (O_{11} O_{22} - O_{12} O_{21})^2}{(O_{11}+O_{12})(O_{11}+O_{21})(O_{12}+O_{22})(O_{21}+O_{22})}$.
- An early application: finding translation pairs in aligned corpora.
- A later application: chi-square as a measure of corpus similarity.
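Applying the 2×2 shortcut to the contingency table from the quiz question above ($O_{11}=50$, $O_{12}=30$, $O_{21}=20$, $O_{22}=40$):

```python
def chi_square_2x2(o11, o12, o21, o22):
    """2x2 chi-square shortcut: no explicit expected frequencies needed."""
    n = o11 + o12 + o21 + o22
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return num / den

x2 = chi_square_2x2(50, 30, 20, 40)
print(f"X^2 = {x2:.2f}")  # 11.67 > 3.841, so reject H0 at alpha = 0.05
```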
Likelihood Ratios
- Likelihood ratios are more appropriate for sparse data than the chi-square test, and the resulting statistic is easier to interpret than an $X^2$ value: it expresses how much more likely one hypothesis is than the other.
Mutual Information (I)
- An information-theoretic measure, also used for finding collocations.
- Pointwise mutual information between two words: $I(w_1, w_2) = \log_2 \frac{P(w_1 w_2)}{P(w_1)P(w_2)}$, the amount of information the occurrence of one word gives us about the occurrence of the other.
- Pointwise MI is unreliable for sparse (low-frequency) data.
Mutual Information (II)
- Entropy is the average uncertainty of a random variable: $H(p) = H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$, where $p(x) = P(X = x)$.
Entropy, Conditional Entropy, and Joint Entropy
- For two discrete random variables X and Y, the joint entropy is $H(X, Y) = -\sum_x \sum_y p(x, y) \log_2 p(x, y)$; it measures the average uncertainty of the pair.
- The conditional entropy of Y given X is $H(Y|X) = -\sum_x \sum_y p(x, y) \log_2 p(y|x)$: the uncertainty that remains about Y once X is known.
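A small sketch estimating these quantities from a toy sample of (x, y) pairs, using the chain-rule identity $H(Y|X) = H(X,Y) - H(X)$; the data is made up.

```python
import math
from collections import Counter

# Toy sample of observed (x, y) pairs.
pairs = [("a", 1), ("a", 1), ("a", 2), ("b", 1), ("b", 2), ("b", 2)]
n = len(pairs)

def entropy(counts):
    """Entropy of an empirical distribution given by a Counter."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())

h_x = entropy(Counter(x for x, _ in pairs))   # H(X)
h_xy = entropy(Counter(pairs))                # H(X, Y)
print(f"H(X) = {h_x:.3f}, H(X,Y) = {h_xy:.3f}, H(Y|X) = {h_xy - h_x:.3f}")
```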
Channel Model and Relative Entropy
- The noisy channel model views communication as sending a message through a channel; the channel's capacity is the maximum of the mutual information between input and output, taken over all possible input distributions.
- For two pmfs p and q, the relative entropy (KL divergence) is $D(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$; it measures the difference between two probability distributions and is zero exactly when they are equal.
The Entropy of Language
- The cross entropy $H(p, q) = -\sum_x p(x) \log_2 q(x) = H(p) + D(p \| q)$ measures how well a model q predicts language drawn from the true distribution p: on average, how surprised the model is by the next word.
n-gram Models of English
- Language can be modeled as a Markov chain.
- An n-gram model predicts each word from the k previous words: a Markov model of order k, with n = k + 1.
Perplexity
- The speech recognition community reports perplexity rather than cross entropy: $PP = 2^{H(p,q)}$.
- A perplexity of k means that, on average, predicting the next word is as hard as choosing uniformly among k alternatives.
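A minimal sketch, with a made-up unigram model q evaluated on a short test sequence; real perplexity figures require a held-out corpus.

```python
import math

# Toy unigram model q (probabilities are invented for illustration).
q = {"the": 0.4, "cow": 0.2, "chewed": 0.2, "its": 0.1, "cud": 0.1}
test = ["the", "cow", "chewed", "its", "cud"]

# Cross entropy of the model on the test data, in bits per word.
cross_entropy = -sum(math.log2(q[w]) for w in test) / len(test)
print(f"perplexity = {2 ** cross_entropy:.2f}")  # ~5.7: like a uniform
                                                 # choice among ~6 words
```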
Corpus Work
- Corpora are collections of text that contain the information needed.
Steps for Tokenization:
- Handle text formatting, OCR errors, and lowercasing.
- Tokenize the text into tokens and sentences.
Morphology:
- Make use of stemming and lemmatization.
Text:
- Sentence boundary marks and mark-up codes are needed to produce good data.
Mark-up Schemes
- Scheme design.
- Grammatical coding (tagging).
- Tag set design: target features versus predictive features.
Analyzing Collocations
- Collocations are characterized by limited compositionality.
- They overlap with terms and technical phrases.
- Near-synonyms differ in their collocational behaviour (strong vs. powerful).
Theoretical Overlapping Definitions
Linguistic Subclasses
- Light verbs.
- Verb particle constructions / phrasal verbs.
- Proper nouns.
- Terminological expressions.
Key Detection Techniques
- Hypothesis testing.
- Selection of collocates by frequency and by distance measurements (mean and variance).
Frequency
- Select the most frequent bigrams.
- Pass them through a part-of-speech filter.
Variance:
- Compute the mean and variance of the distance between collocates.
- Low variance suggests a collocation; high variance suggests a random relationship.
Hypothesis Testing
- High frequency and low variance can be accidental
- Hypothesis testing is a classical statistical problem.
Equations
- The formulas use the collocate counts and the sample deviation.
- The t statistic tests whether the bigram's observed parameters are consistent with the independence hypothesis: if t is high enough, the words co-occur more often than independence would predict; otherwise they do not.
Testing Differences (Hanks)
- The t-test can also be used to find the words that best distinguish between two near-synonyms; such distinguishing words were reported to separate 73% of the sentences.
- The test compares the parameters of the two words' co-occurrence distributions, again under the normality assumption.
t-test for Differences
- The hypothesis is that the two words' co-occurrence rates do not differ; the test uses a pooled variance estimate.
- The null hypothesis can be rejected by comparing the t score against the critical value in the t table.
- The scores of candidate words are then ranked and compared against each other.
Chi-Square Test Methodology
- Used when the t-test's normality assumption is not appropriate.
- Observed frequencies are compared with the frequencies expected under independence; the choice of what to include in the contingency table involves assumptions.
- A large difference rejects the null hypothesis of independence.
- The same $X^2$ formula over the bigram's contingency table applies: a high value means the words of the bigram do correspond to each other; otherwise they do not.
Chi-Square Test (III): Applications
- One of the early uses of the chi-square test in Statistical NLP was the identification of translation pairs in aligned corpora (Church & Gale, 1991).
- A more recent application uses chi-square as a metric for corpus similarity (Kilgarriff and Rose, 1998).
- Nevertheless, the chi-square test should not be used with small corpora.
Likelihood Ratio Equations:
- $L(H_1) = b(c_{12}; c_1, p)\, b(c_2 - c_{12}; N - c_1, p)$ and $L(H_2) = b(c_{12}; c_1, p_1)\, b(c_2 - c_{12}; N - c_1, p_2)$, where $b(\cdot;\cdot,\cdot)$ is the binomial distribution, $p = c_2/N$, $p_1 = c_{12}/c_1$, and $p_2 = (c_2 - c_{12})/(N - c_1)$.
- Hypothesis 1 (independence): $P(w_2|w_1) = p = P(w_2|\neg w_1)$. Hypothesis 2 (dependence): $P(w_2|w_1) = p_1 \neq p_2 = P(w_2|\neg w_1)$.
- The log likelihood ratio is $\log \lambda = \log L(H_1) - \log L(H_2)$; the binomial coefficients cancel, leaving terms $\log L(k, n, x)$ with $L(k, n, x) = x^k (1 - x)^{n-k}$.
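A sketch of the computation with illustrative counts (the helper names are assumptions; note that $\log L$ is undefined when $p_1 = 1$ or $p_2 = 0$, i.e., when $w_2$ only ever follows $w_1$):

```python
import math

def log_l(k, n, x):
    """log of L(k, n, x) = x^k * (1 - x)^(n - k)."""
    return k * math.log(x) + (n - k) * math.log(1 - x)

def log_lambda(c1, c2, c12, N):
    """log likelihood ratio for the bigram w1 w2 (Manning & Schuetze style)."""
    p = c2 / N
    p1 = c12 / c1
    p2 = (c2 - c12) / (N - c1)
    return (log_l(c12, c1, p) + log_l(c2 - c12, N - c1, p)
            - log_l(c12, c1, p1) - log_l(c2 - c12, N - c1, p2))

# -2 log(lambda) is asymptotically chi-square distributed, so it can be
# compared against chi-square critical values.
print(-2 * log_lambda(c1=100, c2=80, c12=20, N=100_000))
```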
Vector Semantics & Embeddings
- Explores how to represent word meaning, starting from dictionary word senses.
Key Points
- Lemma: the dictionary (citation) form of a word.
- Words are represented as vectors of values.
- The vectors are produced by training a classifier.
- What gives words their meaning is where they are placed in the vector space (their values).
Relations Between Senses
- Synonymy: senses with the same meaning in context.
- Similarity: senses sharing aspects of their meaning in context.
- Connotation: affective dimensions of meaning, such as politeness.
Word ↔ Meaning
- The contrast between word forms and their meanings must be kept in mind.
How to Evaluate
- Ask humans to rate the similarity of pairs of words and compare the ratings with the model's scores.
Joint Evaluation:
- Visualize the paired human and model scores as a labeled scatter plot on a 2D graph.
Terms:
- Pearson correlation: measures the linear relationship between two sets of values.
- Spearman correlation: compares the rankings of the values.
- Correlation scores range from -1 to 1; values close to ±1 indicate a strong relationship, and 0 indicates none.
Other Relationships
- Similarity is distinct from relatedness: related words need not be similar.
- Relatedness covers the overall association between words.
- Superordinate (hypernym) relations organize words hierarchically.
Word Relations
- Words enter into many associations, which leads to several kinds of relations:
- similarity and difference
- relatedness (connections)
- superordinate relations
Zellig Harris's Theory of Usage and Meaning
- The distributional hypothesis: a word's location and the words around it give it its meaning.
Example: "Ong Choi"
- We can infer what an unfamiliar word means from the contexts in which it appears.
- Words that occur in nearly identical environments can be inferred to have closely related meanings.
Vectors
- Words are represented as points in a vector space; similar words are close together in the space.
- The vectors are learned by training an algorithm on data.
- Dense vectors generally work better than sparse ones.
- The similarity of two vectors reflects the similarity of the contexts in which the words occur.
Vectorized Data
1:
- tf-idf
- Sparse count vectors, weighted by term frequency and inverse document frequency (the standard model in information retrieval).
- The counts of words across documents or contexts supply the vector values.
2:
- word2vec
- Produces dense vectors by training a classifier that reads words within a context window and predicts which words occur in that context.
- The resulting vectors are compared using cosine similarity (see the sketch below).
- Cosine similarity cannot, however, compensate for bad representations.
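A minimal cosine-similarity sketch over toy count vectors; the words, dimensions, and counts are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

digital = [0, 1, 2, 1]      # co-occurrence counts over 4 context words
information = [1, 6, 1, 4]
print(f"cosine = {cosine(digital, information):.3f}")  # -> 0.667
```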
Models for Dense Vectors
- There are many models for producing dense vectors, such as static embedding models.
- The different models take certain parameters, for example the size of the context window.
- These parameters control which co-occurrences are counted, and therefore what information about a word the model can capture.
Skip-Gram Model
- The skip-gram model asks, for each (target, candidate) word pair, a binary question: did the candidate really occur near the target (label 1) or not (label 0)?
- The model only needs to answer this question well to some degree; it never has to see or enumerate full contexts.
Skip-Grams
- Skipping words lets the model capture relationships between words that are not strictly adjacent.
- This helps when the training data is small.
How Are New Items Trained?
- First, take a target word and the words near it as positive examples.
- Then pair the target with randomly sampled words as negative examples.
- Train a classifier on this labeled set.
- Training maximizes the similarity of the positive pairs and minimizes the similarity of the negative ones; the learned weights become the embeddings.
Loss Functions
- The loss function defines the learning strategy.
- It makes the model more accurate by indicating how the weights should be tuned.
- Instead of tuning weights manually, gradient descent on the loss adjusts them automatically.
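If the gensim library is available, a skip-gram model with negative sampling can be trained in a few lines; this is a hedged sketch, and the toy corpus below is a placeholder far too small for meaningful vectors.

```python
from gensim.models import Word2Vec

# Placeholder corpus: a list of tokenized sentences.
sentences = [
    ["the", "company", "is", "training", "workers"],
    ["the", "workers", "are", "training", "too"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # embedding dimensionality
    window=2,         # context window size
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # negative samples per positive pair
    min_count=1,      # keep even rare words in this toy corpus
)
print(model.wv.most_similar("workers", topn=3))
```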
Embeddings in Action
- The vector space is created by the corpus the embeddings are trained on, so it reflects where the words come from.
- The connections between words can then be read off as distances and directions between points in the space.
BERT(Bidirectional Encoder Representations from Transformers)
- BERT applies the bidirectional training of the Transformer to language modelling.
- Previous language models (LMs) read a text sequence in a single direction and model its relationships that way.
- BERT uses just the encoder portion of the Transformer; the output is a language representation model.
Masked Tokens:
- Some tokens in the input text are replaced by a mask token; the model then needs to predict the original words.
Steps for training the model to predict the next sentence
- The model is also trained to predict whether one sentence follows another.
- Each token is transformed into a vector and classified; there are two learned matrices to work with.
- A softmax then turns the outputs into probabilities.
- This is only a high-level view of how such models work; BERT is just a starting model, and later models build on these basics.
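A hedged example of the masked-LM objective in action via the Hugging Face transformers pipeline (assumes the library is installed; this downloads the bert-base-uncased weights on first run).

```python
from transformers import pipeline

# BERT fills in the [MASK] token with its most probable predictions.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The company is [MASK] workers."):
    print(f"{prediction['token_str']:<12} {prediction['score']:.3f}")
```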
Description
This quiz covers key concepts in Natural Language Processing (NLP), focusing on collocation discovery, likelihood ratios, and the role of corpora. It tests understanding of statistical methods, null hypotheses, and limitations of pointwise mutual information in NLP.