Pseudowords & Semantics

Questions and Answers

In computational linguistics, what primary role do pseudowords play in research?

  • They enable controlled experiments by removing prior meaning associations. (correct)
  • They serve as replacements for outdated vocabulary.
  • They are used to test the processing speed of native speakers.
  • They help in understanding the historical evolution of languages.

What is the main goal when using the 'Wuggy' algorithm to create pseudowords?

  • To ensure the created pseudowords follow the phonotactic constraints of a given language. (correct)
  • To create words that are as different as possible from real words.
  • To generate words that have clear emotional valence.
  • To produce words that are universally pronounceable across all languages.

Why is it important for computational models to account for the emotional valence of words?

  • To optimize search engine rankings based on sentiment.
  • To better predict stock market fluctuations.
  • To simulate human-like understanding and processing of language nuances. (correct)
  • To improve the accuracy of machine translation between languages.

What is the primary function of 'edit distance' in computational linguistics?

  • To quantify the number of edits required to transform one word into another. (correct)

How does the 'Systematicity Hypothesis' explain the relationship between word forms and their meanings?

  • Word forms predict word meaning, such that similar-sounding words have similar meanings. (correct)

In the context of word embeddings, what is the significance of representing words as vectors?

  • It enables mathematical operations to determine semantic relationships between words. (correct)

What key advantage does the FastText model offer over Word2Vec when handling language data?

  • FastText uses subword information, enhancing its ability to handle rare words and morphological variations. (correct)

When evaluating word embeddings, what does 'intrinsic evaluation' primarily assess?

  • The alignment of embeddings with human intuition about word relationships. (correct)

According to the experiment by Gatti et al. (2024), what can be inferred about how humans process the valence of novel vs. real words?

  • Humans rely more on letter n-grams to predict the valence of novel words compared to real words. (correct)

What is the main purpose of using Pointwise Mutual Information (PMI) when improving word representations?

  • To give higher weight to words that co-occur more than expected by chance. (correct)

When discussing N-gram language models, what is a major limitation related to 'data sparsity'?

  • Long sequences of words may rarely appear in the training data. (correct)

What is the purpose of using an <UNK> token in handling unseen words in language modeling?

  • To replace rare words with a generic token, allowing the model to estimate probabilities for unseen words. (correct)

What key advantage do neural language models offer compared to N-gram models?

  • Neural language models can capture longer-term dependencies in text. (correct)

What is the main function of the 'self-attention' mechanism in transformer models?

  • To allow each word to attend to all other words in the sequence, capturing relationships regardless of distance. (correct)

What problem do contextualized word embeddings solve that fixed word embeddings do not?

  • They allow word meaning to vary based on context. (correct)

Why is positional encoding necessary in transformer models?

  • To provide information about the position of words in a sentence, as transformers process all words at once. (correct)

What is a key motivation behind using subword tokenization in modern NLP models?

  • To improve the model's ability to generalize and handle rare or unseen words. (correct)

What is a key difference in word processing between humans and transformer models?

  • Humans leverage real-world knowledge, while transformers primarily use text-based information. (correct)

What aspect of language does the Jaccard distance primarily measure?

  • Contextual overlap (correct)

Which of the following describes 'lexicon' as defined in the text?

  • A resource that matches words or expressions with their meanings (correct)

What does the distributional hypothesis propose about word meaning?

  • Word meanings are influenced by the contexts in which they occur. (correct)

Why is 'valence' considered an important feature in computational linguistics?

  • It helps models understand sentiment. (correct)

What is a drawback of using trigram encoding in language models?

  • It leads to greater sparsity. (correct)

What is the purpose of text preprocessing in NLP before training a model?

  • To convert text into a suitable format for the model. (correct)

How do language models use probabilities?

  • They assign likelihoods to word predictions. (correct)

Flashcards

Pseudo Words

Words that, through repeated use, gain understanding and acceptance, effectively becoming real words.

Valence

A measure of how positive or negative the meaning conveyed by a word is.

Edit Distance

The smallest number of edits (insert, delete, replace) needed to transform one string into another.

Lexicon

A resource that pairs words or expressions with their corresponding meanings or definitions.

Distributional Hypothesis

A hypothesis stating that the meaning of a word is understood by the company it keeps; words in similar contexts have similar meanings.

Jaccard's Distance

A distance based on set overlap: the Jaccard index (shared items over all items) runs from 0 (no overlap) to 1 (identical sets), and the distance is 1 minus that index.

Word Embeddings

Continuous, numerical representations of words in a multi-dimensional space, capturing semantic relationships.

Word2Vec

A model that helps computers understand word meanings instead of just seeing them as letters.

FastText

An improved version of Word2Vec that understands words at the subword level, better handling rare words and misspellings.

Why use pseudowords?

They enable controlled experiments free from prior meaning associations, which is useful for studying language learning.

"Wuggy" Algorithm

Takes real words and flips letters while keeping language rules to create realistic-sounding non-words.

Bigram Encoding

Looks at pairs of letters (2-grams) to encode words, capturing more context than individual letters.

Trigram Encoding

Looks at sequences of three letters to encode words, capturing more context than bigrams.

Form-Meaning Mapping

The idea that word forms (spellings) may reflect their meanings to some extent.

Systematicity Hypothesis

States word form predicts meaning; learning is easier, but there's risk of confusion.

Arbitrariness Hypothesis

States word form is random and unrelated to meaning; less confusion, but harder to learn new words.

What do language models do?

Assigns probabilities to sequences, generates new sentences, helps in speech recognition, spelling correction, machine translation.

N-Gram Language Models

Predict a word based on the previous n−1 words (bigram: 1 word of history; trigram: 2).

LSTM & GRU

Recurrent neural network variants that improve on plain RNNs by retaining memory for longer, handling long-term dependencies.

Positional Encoding

Adds information about each word's position in the sequence, since transformers process all words at once.

Subword Tokenization

Breaks words into subunits so the model generalizes better and represents rare words and morphological variations.

Contextualized word embeddings

The same word gets a slightly different embedding depending on co-occurring words.

Self-Attention

Each word attends to the other words in the sentence, computing a weighted influence from each of them.

Issue with Word2Vec & FastText

These models assign one vector per word, ignoring contextual nuances.

Contextualized word embeddings

Generate different vectors for the same word depending on its sentence. Each word form no longer maps to a single embedding but varies by context.

Study Notes

Exams

  • Group assignment answers count, incentivizing correct answer recognition for job relevance
  • Only 1 in 10 points are outcome-based.
  • Midterm: closed questions, 1 hour, open book (meant to be relied on only when truly stuck)
  • Final: mix of closed and open questions, open book, 2.5 hours

Lecture 1

  • Pseudowords become words through enough usage and understanding
  • Pseudowords possess semantics and some level of meaning
  • A pseudoword is plausible if 70% of people agree on its meaning or emotional sentiment
  • Plausibility of pseudowords varies by country
  • Native English speakers struggle to recognize words with 3+ consecutive consonants
  • Valence indicates how positive or negative a word is
  • Valence is computed from crowd-sourced rating scales, often collected together with estimates of arousal and dominance
  • Edit distance determines word neighbors based on the number of edits
  • Strings are more similar when fewer edits (insert, delete, replace) are needed to transform one into the other
  • Form-meaning mapping: if meaning follows form, words are easier to learn but communication is more prone to confusion; if form does not predict meaning, the mapping is arbitrary, harder to learn, but less prone to confusion
  • When two terms share meaning, they tend to carry different names but are more likely to co-occur

Lexicon

  • A lexicon serves as a resource connecting words and expressions to their meanings

Lecture 2

  • The distributional hypothesis states that word meaning is determined by its context or "the company it keeps"
  • Word similarity increases when their context overlaps
  • The Jaccard index measures set overlap, from 0 (no overlap) to 1 (complete overlap); Jaccard's distance is 1 minus this index
  • The index applies the formula: jacc(x, y) = (x AND y) / (x OR y), i.e., shared contexts over all contexts (a minimal sketch follows)
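
A minimal Python sketch of this overlap measure, using made-up context-word sets for two target words (the sets are illustrative, not from the lecture):

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Set overlap: |A AND B| / |A OR B| (1 = identical context sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Made-up context sets observed around two target words
cat_contexts = {"my", "is", "sleeping", "pet", "fur"}
dog_contexts = {"my", "is", "sleeping", "pet", "bark"}

sim = jaccard_similarity(cat_contexts, dog_contexts)
print(f"Jaccard similarity: {sim:.2f}, distance: {1 - sim:.2f}")
```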

Embeddings

  • Continuous numerical representations of words are embedded in an n-dimensional space
  • Embedding turns words into numbers, assigning meaningful positions in space, clustering similar words
  • Word2Vec is a tool that helps computers understand words via their meanings, not just their letters

FastText Model

  • An enhanced version of Word2Vec from Facebook AI Research (FAIR) that uses subwords to better handle rare words, misspellings, and word forms
  • Treats words as pieces
  • For example, "playing" is split into ["pla", "lay", "ayi", "yin", "ing"]
  • It helps recognize similar words like "play" and "playing"

ChatGPT Summary: Lecture 1

  • Title: That word sounds nice – Gauging semantic connotations for entirely novel words
  • Authors: Giovanni Cassani & Afra Alishahi
  • Topic: How humans interpret novel words (pseudowords) and whether they assign meaning based on existing linguistic patterns.

Introduction: Learning From Novel Words

  • All words were once meaningless pseudowords
  • People generalize meanings to new words based on form (spelling/sound)

Key question

  • Do people recognize positive or negative pseudowords?

Example experiment

  • Participants receive unfamiliar pseudowords and are asked "Which word feels more positive?"
  • The goal is to see how emotional valence is applied to meaningless words

Why use pseudowords?

  • Controlled experiments are useful for research
  • Employed in lexical decision, priming, and sentence processing tasks
  • Some examples of pseudowords are Keex, Plufgok, Bixmel

Pseudoword creation

  • A pseudoword must "sound real"
  • "Wuggy" Algorithm flips real word letters
  • Ensures phonotactic constraints are followed

Phonotactic constraints in English

  • Valid: "Stray" – the initial consonant cluster /str/ is permitted in English
  • Invalid: "Spfovik" – too many consonants together; /spf/ is not a permitted cluster
  • "Trst" is a valid word in Croatian, but not in English

Valence: Emotional Meaning in Words

  • How is it determined whether a word “feels” positive or negative?

Defining Valence

  • A core aspect of meaning
  • Evolutionarily important for survival
  • Affects processing, learning, and memory of words

How to Compute Valence?

  • Human ratings are determined via surveys using the Likert scale.
  • Crowdsourcing Studies collect valence ratings for thousands of words.

How Computers Encode Words

  • How can words be converted into numbers for analysis?

Basic Method: Letter Counting

  • Each word becomes a 26-dimensional vector (one value per letter).
  • The issue is that this is too simple: it ignores letter order and context

Using N-Grams for More Context

  • Bigram encoding looks at letter pairs
  • Trigram Encoding identifies three letter sequences
  • The trade-off is that more context means more sparsity (a minimal sketch of these encodings follows)
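
A minimal Python sketch of these letter-based encodings; the example word is a pseudoword from the lecture examples, and the function name is just illustrative:

```python
from collections import Counter

def char_ngrams(word: str, n: int) -> Counter:
    """Count the character n-grams of a word (n=1 gives plain letter counts)."""
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))

word = "keex"  # a pseudoword example
print(char_ngrams(word, 1))  # letter counts (the 26-dimensional encoding)
print(char_ngrams(word, 2))  # bigrams: 'ke', 'ee', 'ex'
print(char_ngrams(word, 3))  # trigrams: 'kee', 'eex' (more context, sparser features)
```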

Similarity Between Words: Nearest Neighbors

  • How is it decided if two words “look alike”?
  • Edit distance (Levenshtein distance) can be used to find which candidate word is most similar to a target such as "minced"

Edit Distance (Levenshtein Distance)

  • Counts changes (insert, delete, replace) needed to turn one word into another
  • Lower edit distance equals more similarity
  • Normalization Issues: Short words naturally have lower edit distance
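
A minimal dynamic-programming sketch of Levenshtein distance in Python, with unit costs for insert, delete, and replace; the add-ons listed next (weighted costs, transpositions) are not included:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and replacements to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # replacement
        prev = curr
    return prev[-1]

print(levenshtein("minced", "minted"))  # 1 (replace 'c' with 't')
print(levenshtein("minced", "mined"))   # 1 (delete 'c')
```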

Add-ons to Edit Distance

  • Weighted costs can make some edits cheaper (e.g., replacing vowels, which is a more common error)
  • Transposition recognizes swapping two letters as a minor error
  • External Features factor in keyboard layout and typing frequency

Form-Meaning Mapping in Language

  • How do word forms (spellings) reflect their meanings?

Two Theories

  • Systematicity Hypothesis (Dante's view): word form predicts meaning; learning is easier, but there is a risk of confusion
  • Arbitrariness Hypothesis (Shakespeare's view): word form is random and unrelated to meaning; there is less confusion, but new words are harder to learn
  • Language balances both principles, where common words are systematic and rare words are arbitrary

Applying These Ideas to Pseudowords

  • Determining whether people can guess the emotional valence of a new word just from its letters.
  • Gatti et al. (2024) collected valence ratings, trained a model, and tested it on pseudoword valence

Findings

  • Real Words: Letter n-grams predict valence poorly (r² = 0.01).
  • Novel Words: Letter n-grams predict valence better (r² = 0.18).

Conclusion

  • Despite the form-meaning mapping being arbitrary, people generalize valence from known to novel words!

Final Thoughts & Open Questions

  • Humans continuously learn new words
  • Meaning is instinctively assigned to unfamiliar words
  • Whether computational models learn in a similar way is a question

Future Research Questions

  • How do people generalize emotional meaning from form?
  • What patterns in language influence this process?
  • How can AI models be improved to handle new words better?

Key Takeaways

  • Humans subconsciously assign emotional meaning to unfamiliar words.
  • Pseudowords remove pre-existing meaning, aiding the study of language learning
  • A computational model can be trained to predict word valence

Lecture 2

  • Title: This word might co-occur with nice words – A distributional approach to novel words
  • Authors: Giovanni Cassani & Afra Alishahi
  • Topic: Understanding how word meaning emerges from co-occurrence patterns in language, using distributional semantics and word embeddings.

Introduction: Words in Context

  • In Class 1, words were isolated strings
  • Words co-occur

Key Ideas from Linguistics

  • Meaning as Use (Wittgenstein, 1953) holds that a word's meaning comes from how it is used, as opposed to its definition
  • Distributional Hypothesis (Harris, 1957) which states "Words that occur in similar contexts have similar meanings."
  • Firth’s Principle (Firth, 1957) which states "You shall know a word by the company it keeps."

Example

  • Cat and dog often appear in similar sentences (e.g., "My __ is sleeping").
  • The prior statement suggests they have related meanings, even if not defining them explicitly

Measuring Word Similarity: Jaccard Distance

  • How much do two words share the same context?

Limitation

  • Jaccard distance ignores frequency

Word Embeddings: Representing Words as Vectors

  • Turns words into numbers (vectors) to analyze relationships, discarding the isolated-symbol approach

What are embeddings?

  • Continuous, numerical representations of words as points embedded in an n-dimensional space
  • Words as points in a multi-dimensional space
  • Similar words have closer vectors
  • The space itself depends on how words are used in a language
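
A minimal numpy sketch of how "closer vectors" is usually measured (cosine similarity); the 3-dimensional vectors below are made up for illustration, real embeddings have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors (1 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 3-dimensional embeddings (real models use 100-300+ dimensions)
cat = np.array([0.9, 0.8, 0.1])
dog = np.array([0.8, 0.9, 0.2])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, dog))  # high: similar contexts, close in the space
print(cosine_similarity(cat, car))  # low: dissimilar contexts, far apart
```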

Example of Context-Based Word Representations

  • First define a target word ("car")
  • A context window of a fixed size is defined around it

Improving Word Representations: Pointwise Mutual Information (PMI)

  • Better measurement is needed regarding the significance of a co-occurrence

Why PMI is Useful

  • Words that co-occur more than expected are given a higher weight
  • Reduces interference from random co-occurrences

Example

  • "Doctor" and "Hospital" shows a High PMI with strong semantic link
  • "Doctor" and "Random" shows a Low PMI unlikely to co-occur

Reducing Dimensionality

  • Raw co-occurrence matrices are too large (often 100,000+ dimensions), so the data is reduced while keeping the important information
  • Singular Value Decomposition (SVD) captures the main data patterns
  • t-SNE & PCA project high-dimensional data into 2D or 3D for visualization (a minimal SVD sketch follows this list)
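
A minimal sketch of dimensionality reduction with scikit-learn's TruncatedSVD on a made-up word-by-context count matrix; the words, counts, and the choice of 2 components are illustrative, and scikit-learn is assumed to be available:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Made-up word-by-context co-occurrence counts (rows: words, columns: contexts)
words = ["cat", "dog", "car", "bus"]
counts = np.array([
    [10, 8, 0, 1],   # cat
    [9, 10, 1, 0],   # dog
    [0, 1, 12, 9],   # car
    [1, 0, 10, 11],  # bus
])

# Compress to 2 dense dimensions that capture the main co-occurrence patterns
svd = TruncatedSVD(n_components=2, random_state=0)
dense = svd.fit_transform(counts)
for w, vec in zip(words, dense):
    print(w, np.round(vec, 2))  # cat/dog and car/bus should land near each other
```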

Goal

  • Making embeddings smaller, denser, and more informative

Predicting Word Meaning: Word2Vec

  • Train the model to predict word occurrences

Word2Vec Model

  • Two approaches: Continuous Bag of Words (CBOW), which predicts a word from its surrounding words, and Skip-Gram with negative sampling (SGNS), which predicts the surrounding words from a given word

Training Process

  • Start with random word vectors.
  • Adjust the vectors so that observed co-occurrences are predicted correctly
  • The resulting embeddings capture word meaning

Example Results

  • King - Man + Woman ≈ Queen
  • Paris - France + Italy ≈ Rome
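
A minimal sketch of training Word2Vec with gensim and running this analogy-style query; the toy corpus below is far too small to reproduce results like King − Man + Woman ≈ Queen, it only shows the API pattern (gensim is assumed to be installed):

```python
from gensim.models import Word2Vec

# Toy corpus (real models are trained on billions of tokens)
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "the", "dog"],
    ["the", "woman", "walks", "the", "dog"],
]

# Skip-gram (sg=1) with small vectors, just to illustrate the workflow
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, seed=0)

# Analogy query pattern: king - man + woman ≈ ?
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```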

Handling Pseudowords: FastText Instead of Word2Vec

  • How are embeddings created for words that don't exist in the training data?

FastText Model (Facebook AI Research)

  • Key improvement: it uses character-level n-grams instead of whole words
  • For example, "windowist" is composed of n-grams like win, ind, ndo, dow, ist
  • The meaning of "windowist" can be inferred from n-grams it shares with similar words, even though "windowist" was never seen before (a minimal sketch follows)
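
A minimal numpy sketch of the subword idea: an unseen word's embedding is built by summing the vectors of its character n-grams. The n-gram vectors here are random placeholders; a trained FastText model learns them from data:

```python
import numpy as np

def char_ngrams(word, n=3):
    """Character n-grams with FastText-style boundary markers, e.g. '<wi', 'win', ..."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

rng = np.random.default_rng(0)
ngram_vectors = {}  # placeholder: a real model stores trained n-gram vectors here

def embed(word, dim=8):
    """Sum the vectors of a word's n-grams, so even unseen words get an embedding."""
    vecs = []
    for g in char_ngrams(word):
        if g not in ngram_vectors:
            ngram_vectors[g] = rng.normal(size=dim)
        vecs.append(ngram_vectors[g])
    return np.sum(vecs, axis=0)

print(char_ngrams("windowist"))   # ['<wi', 'win', 'ind', 'ndo', 'dow', 'owi', 'wis', 'ist', 'st>']
print(embed("windowist").shape)   # (8,) -- an embedding for a never-seen word
```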

Why is FastText Useful?

  • Handles rare words better
  • Works well for languages with complex morphology
  • Captures subword patterns

Evaluating Embeddings: How Do We Know They Work?

  • Embeddings are tested by checking whether they match human intuition

Intrinsic Evaluation

  • Includes word similarity, Analogy, and Clustering of similar words
  • Word similarity tasks (e.g., WordSim-353, SimLex-999) are assessed.
  • Analogy tasks (e.g., King - Man + Woman = Queen) are tested
  • Finally, clustering similar words in a 2D space (e.g., via PCA or t-SNE).

Extrinsic Evaluation

  • Utilizes embeddings in real tasks (e.g., sentiment analysis, machine translation).
  • If the embeddings improve performance on the task, they are considered useful

Conclusion: What About Pseudoword Valence?

  • Do people judge pseudowords based on co-occurrence patterns or letter structure?

Findings

  • For real words, FastText embeddings capture valence well (r² = 0.62).
  • Letter n-grams work for pseudowords (r² = 0.12).

Takeaway

  • Humans process known words and pseudowords with different strategies
  • Meaning stems from form whenever co-occurrences are missing

Final Thoughts

  • Words get their meanings from context and co-occurrence patterns, aided by embeddings
  • Models can handle unknown words, but humans still rely on surface-level features

Lecture 3

  • Title: A Replication – What Matters When Building on Another Study
  • Authors: Giovanni Cassani & Afra Alishahi
  • Topic: how to replicate, evaluate, and extend a study, focusing on valence prediction with different models

Introduction: Why Replication Matters

  • A study's findings need to be confirmed before they can be built on
  • Key steps in replication are reproducing the existing results, modifying variables, and evaluating the findings
  • Example: replicating Gatti et al. (2024), who tested how pseudowords encode emotional valence, using the Warriner et al. (2014) valence ratings

Replication Pipeline

  • The experiment is carried out with a structured pipeline to predict valence for pseudowords (a minimal sketch follows this list)
  • Steps: extract unigram vectors, train a linear regression model, apply the model to pseudowords, and check whether the predicted valence aligns with ratings via model fit
  • Finally, r² is computed to measure the model's accuracy
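
A minimal scikit-learn sketch of the pipeline's shape: letter-count (unigram) features, a linear regression fitted on valence ratings, then applied to pseudowords. The training words and ratings below are made up and only stand in for the Warriner et al. (2014) norms:

```python
import numpy as np
from string import ascii_lowercase
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def unigram_vector(word: str) -> np.ndarray:
    """26-dimensional letter-count representation of a word."""
    return np.array([word.count(ch) for ch in ascii_lowercase], dtype=float)

# Made-up training words and valence ratings (stand-ins for real norms)
train_words = ["love", "happy", "sun", "war", "pain", "dark"]
train_valence = [8.5, 8.0, 7.2, 2.1, 2.4, 3.0]

X = np.stack([unigram_vector(w) for w in train_words])
model = LinearRegression().fit(X, train_valence)

# Apply the fitted model to pseudowords
pseudowords = ["gorpeous", "tutoured"]
preds = model.predict(np.stack([unigram_vector(w) for w in pseudowords]))
print(dict(zip(pseudowords, np.round(preds, 2))))

# Model fit; on real data r² would be computed on held-out items
print("r²:", round(r2_score(train_valence, model.predict(X)), 2))
```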

Extending the Study: Testing Different Models & Stimuli

  • Not all pseudowords behave the same way
  • Testing different models: comparing word2vec, FastText, and LLMs
  • Pseudoword selection has to be carried out carefully, to make sure suitable pseudowords (e.g., Combatman) are used
  • Humans may categorize a pseudoword as a real word if it looks realistic enough

Pseudowords in Lexical Decision Tasks

  • People are asked to categorize pseudowords versus actual words
  • Shown a mix of words and pseudowords, participants have to decide which are real words
  • Stimuli are real words (e.g., MINE) or pseudowords (e.g., QWEFQK)

Notes

  • Confirms pseudowords that are too similar to a real word get mistaken for one
  • Complicates form-meaning mapping studies

Pseudoword Valence Ratings

  • The study examines whether people can judge the valence of pseudowords
  • Participants rate the positivity/negativity of each pseudoword
  • For example, "Gorpeous" sounds positive and "Tutoured" sounds negative

Methodology

  • Some pseudowords sound similar to existing words
  • Others convey little sense

Morphology

  • Words made from meaningful parts are easier to understand, for example "happiness" from the word "happy"
  • If pseudowords are built using recognizable parts, they may be easier to interpret

Challenges

  • Lack of meaning leads to regression to the mean
  • Ratings cluster around the average because associations are lacking
  • Focusing on pseudowords with extreme predicted valence is one solution

Correlation Methods for Evaluation

  • Statistical methods determine how well model predictions correlate with human ratings
  • Different models emphasize different aspects, which makes some tests, like Pearson's r, less ideal for pseudowords

Testing New Models

  • Do new NLP models capture pseudoword valence?
  • FastText was tried previously; LLMs are explored next
  • LLMs are trained on massive datasets with transformer architectures and predict meaning over large context windows

Challenges

  • LLM embeddings require regularization/normalization
  • Training across multiple languages could reveal universal patterns

Text Preprocessing

  • Different NLP models require different text preprocessing methods
  • Text processing is carried out using 3 steps

Key Concepts

  • Tokenization is splitting the text into units
  • Three main approaches: word-based splitting, FastText's character n-grams, and the subword/special-token schemes used by LLMs
  • The other two steps are lemmatization (converting a word to its base form) and stemming (removing affixes)
  • Model performance will vary with the preprocessing chosen (a minimal sketch follows this list)
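
A minimal NLTK sketch of the three steps; it assumes nltk is installed and that the 'punkt' and 'wordnet' resources have been downloaded (other toolkits, such as spaCy, would work as well):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Assumes: nltk.download("punkt") and nltk.download("wordnet") have been run
text = "The cats were playing happily"

tokens = nltk.word_tokenize(text.lower())                    # 1. tokenization (word-based)
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]  # 2. lemmatization: base forms
stems = [PorterStemmer().stem(t) for t in tokens]            # 3. stemming: strip affixes

print(tokens)  # ['the', 'cats', 'were', 'playing', 'happily']
print(lemmas)  # ['the', 'cat', 'were', 'playing', 'happily'] (noun lemmas by default)
print(stems)   # ['the', 'cat', 'were', 'play', 'happili']
```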

Summary & Takeaways

Refines study

  • The replication refines current understanding of pseudoword processing by testing new evaluations
  • Replication matters; verification is needed when extending a study
  • Different categories of pseudowords behave differently
  • Form-meaning mappings are multi-faceted, and text has to be preprocessed appropriately for each model
  • LLM embeddings introduce new dimensions that need normalization

Lecture 4

  • Title: Modeling Language Through Prediction – Word After Word After Word After…
  • Authors: Giovanni Cassani & Afra Alishahi
  • Topic: Understanding how language models predict words, covering n-gram models, Markov chains, neural embeddings, and transformers

Introduction: Language Models & Prediction

  • Language models rely on prediction and word sequences
  • Words are not independent; they appear in patterns
  • Language models assign probabilities and generate new sentences, which in turn aids speech recognition and spelling correction

N-Gram Language Models

  • They predict a word based on the previous n−1 words
  • Bigram and trigram models follow the same method with different history lengths
  • However, data sparsity is an issue, and they capture no long-term context (a minimal sketch follows this list)
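
A minimal Python sketch of a bigram model with maximum likelihood estimates on a toy corpus; the sentences and the <s>/</s> boundary markers are illustrative:

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries
corpus = [
    "<s> the cat is sleeping </s>",
    "<s> the dog is sleeping </s>",
    "<s> the cat is playing </s>",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_mle(word: str, prev: str) -> float:
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_mle("cat", "the"))       # 2/3: 'the cat' seen twice, 'the' seen three times
print(p_mle("elephant", "the"))  # 0.0 -- data sparsity: the bigram never occurred
```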

Probabilities in Language Models

  • Prediction works by assigning probabilities to possible next words
  • Used in spelling correction and machine translation

Chain Rule of Probability

  • The probability of a whole sentence is the product of conditional probabilities (the chain rule): P(w1 … wn) = P(w1) · P(w2 | w1) · … · P(wn | w1 … wn−1)
  • Long histories are rarely observed, so the solution is Markov approximations and word embeddings

Markov Chains & Maximum Likelihood Estimation (MLE)

  • The tactic of approximating a word's probability using only a limited history
  • This has two main challenges

Challenges

  • Unknown words
  • Data sparsity

Handling Unseen Words

  • Rare words can be replaced with a generic <UNK> token, so the model can estimate probabilities for words it has never seen
  • Unseen word sequences also need a solution: smoothing

Smoothing Techniques

  • Probabilities have to be adjusted so that unseen n-grams do not receive zero probability
  • Laplace smoothing adds a small count to every possible n-gram; backoff & interpolation fall back on a unigram estimate when a bigram is missing (a minimal sketch follows)
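
A minimal Python sketch of add-one (Laplace) smoothing on bigram counts; the toy tokens and the vocabulary size V are illustrative:

```python
from collections import Counter

# Toy token stream with sentence boundary markers
tokens = "<s> the cat is sleeping </s> <s> the dog is sleeping </s>".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)  # vocabulary size

def p_laplace(word: str, prev: str) -> float:
    """Add-one smoothing: (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(p_laplace("cat", "the"))       # seen bigram: probability is slightly discounted
print(p_laplace("elephant", "the"))  # unseen bigram: small but non-zero probability
```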

N-Grams to Word Embeddings

  • Instead of storing raw probabilities, a language model maps words to vectors in a space
