PCA and Spectral Clustering

Questions and Answers

Which French phrase best describes the action of habitually engaging in a specific activity?

  • Je sais.
  • Couramment.
  • Peut-être.
  • Avoir l'habitude de. (correct)

If someone says "J'en ai assez", which of the following is the most likely sentiment they are expressing?

  • They are expressing their tiredness.
  • They are encouraging you to sit down.
  • They have plenty of time to spare.
  • They have had enough of something. (correct)

Which French phrase would you use to tell someone to sit down?

  • Je pars.
  • Assieds-toi. (correct)
  • Je me lève.
  • Je m'habille.

In a conversation about daily routines, what is the closest meaning to 'Je me couche'?

  • I go to bed. (correct)

If someone says they are going home at one o'clock, what French phrase would they use?

  • À une heure. (correct)

Which of these options refers to midnight?

  • À minuit. (correct)

How would you express 'a quarter to two' in French?

  • Deux heures moins le quart. (correct)

If you wanted to ask a question, which French phrase would you use?

  • Poser une question. (correct)

Which of these options translates to 'to fall'?

  • Tomber. (correct)

What is the closest meaning to the French phrase 'en plein air'?

  • Outdoors. (correct)

Which of the following options best translates to 'the surroundings'?

  • Les environs. (correct)

What is the best translation of 'le bac'?

  • The exam. (correct)

Which option is the closest to 'each'?

  • Chaque. (correct)

In what context would you use the French word 'pourtant'?

  • Contradicting a previous statement. (correct)

Which of the following best translates to 'to believe'?

  • Croire. (correct)

What is the meaning of the French verb 'réussir'?

  • To succeed. (correct)

Which option best translates to 'the studies'?

  • Les études. (correct)

How would you say 'none' (not any) in French?

  • Ne... aucun(e). (correct)

What does 'il était' mean?

  • He was. (correct)

Which French verb means 'to stop'?

  • Arrêter. (correct)

Flashcards

Croire

To believe; to have faith.

Réussir

To have a successful outcome.

Le bac

The baccalauréat, the final school-leaving exam.

Le monde

The world.

Le travail

The job, the work.

Pourtant

However or yet.

Quitter

To leave (a place or a person).

Arrêter

To stop.

Plusieurs

Several.

Entre

Between.

Ensemble

Together.

Tous les jours

Every day.

Les études

Studies.

La banlieue

The suburbs.

Nombreux, -euse

Numerous; many.

Ne... aucun(e)

None.

Study Notes

Principal Component Analysis (PCA)

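  • Goal: reduce dimensionality while retaining as much variance as possible.
  • Data is represented as $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$ and is centered (mean is zero).
  • New basis vectors $u_1, \dots, u_k \in \mathbb{R}^d$ are orthonormal; they span the $k$-dimensional subspace the data is projected onto.
  • PCA chooses them to maximize the retained variance $\sum_{i=1}^k \mathrm{Var}(u_i^T x)$, where $x$ is a sample from the data.
  • The $u_i$'s are the eigenvectors of the covariance matrix $\frac{1}{n}XX^T$ with the $k$ largest eigenvalues; scaling does not change eigenvectors, so the eigenvectors of $XX^T$ work equally well.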
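
A minimal NumPy sketch of the PCA recipe above, assuming the $d \times n$ one-sample-per-column layout used in these notes; the helper name `pca` and the synthetic data are illustrative. It centers the data, eigendecomposes the sample covariance $\frac{1}{n}XX^T$, keeps the eigenvectors with the $k$ largest eigenvalues, and checks that the variance retained along $u_1$ equals the top eigenvalue.

```python
import numpy as np

def pca(X, k):
    """Top-k PCA for d x n data X with one sample per column (illustrative helper)."""
    # Center each feature (row) so the data has zero mean, as the notes assume.
    Xc = X - X.mean(axis=1, keepdims=True)
    n = Xc.shape[1]
    # Sample covariance (1/n) X X^T; dropping 1/n would not change the eigenvectors.
    C = (Xc @ Xc.T) / n
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    top = np.argsort(eigvals)[::-1][:k]
    U = eigvecs[:, top]   # d x k, orthonormal columns u_1, ..., u_k
    Z = U.T @ Xc          # k x n coordinates of the data in the new basis
    return U, Z, eigvals[top]

# Synthetic 3-D data that varies mostly along the direction (2, 1, 0).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.vstack([2 * t, t, 0.1 * rng.normal(size=200)])

U, Z, top_vals = pca(X, k=1)
# The variance retained along u_1 equals the largest eigenvalue of the covariance.
print(np.allclose(Z.var(axis=1), top_vals))  # True
```

Using `eigh` rather than `eig` exploits the symmetry of the covariance matrix and guarantees real eigenvalues and orthonormal eigenvectors.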

Spectral Clustering

  • Similarity graph is represented as $G = (V, E)$, where $V$ is the set of data points and $E$ is the set of edges, weighted by similarity.
  • Similarity matrix is defined as $W_{ij} = \mathrm{similarity}(x_i, x_j)$.
  • Degree matrix is defined as $D_{ii} = \sum_j W_{ij}$.
  • Laplacian matrix is defined as $L = D - W$.
  • Normalized Laplacian is defined as $L_{norm} = D^{-1/2}LD^{-1/2}$.
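
To make these definitions concrete, here is a small NumPy sketch (the helper `laplacians` and the toy graph are hypothetical) that builds $D$, $L$ and $L_{norm}$ from a hand-written similarity matrix $W$. It illustrates two standard properties: every row of $L$ sums to zero, and the number of zero eigenvalues of $L$ equals the number of connected components of the graph.

```python
import numpy as np

def laplacians(W):
    """Degree matrix, Laplacian and normalized Laplacian of a symmetric W (hypothetical helper)."""
    d = W.sum(axis=1)                    # D_ii = sum_j W_ij
    D = np.diag(d)
    L = D - W                            # unnormalized Laplacian
    # D^{-1/2} requires every node to have positive degree (no isolated nodes).
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt
    return D, L, L_norm

# Toy similarity graph: nodes {0, 1, 2} and {3, 4} form two disconnected components.
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
], dtype=float)

D, L, L_norm = laplacians(W)
print(L.sum(axis=1))                     # each row of L sums to zero
eigvals = np.linalg.eigvalsh(L)          # ascending: approximately [0, 0, 3, 3, 4]
print(np.sum(np.isclose(eigvals, 0.0)))  # 2 zero eigenvalues, one per component
```

The zero eigenvalues are exactly what spectral clustering exploits: their eigenvectors are constant on each connected component, so they encode the cluster structure.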

Spectral Clustering Algorithm

  • Construct the similarity graph.
  • Compute the Laplacian matrix.
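  • Find the $k$ eigenvectors of the Laplacian with the smallest eigenvalues.
  • Stack these eigenvectors as the columns of a matrix $U \in \mathbb{R}^{n \times k}$ and run $k$-means on its rows, treating row $i$ as the new representation of $x_i$.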
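
A compact NumPy sketch of the algorithm, under assumptions the notes leave open: points are stored as rows, the similarity is a Gaussian (RBF) kernel with bandwidth `sigma`, the Laplacian is the unnormalized $L = D - W$, and $k$-means is a hand-written Lloyd's iteration with farthest-point initialization rather than a library call. For $L = D - W$ the informative eigenvectors are those with the smallest eigenvalues, so the sketch takes the first $k$ columns returned by `np.linalg.eigh`, which sorts eigenvalues in ascending order.

```python
import numpy as np

def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means on the rows of Y, with farthest-point initialization."""
    # Start from row 0, then repeatedly add the row farthest from all chosen centers.
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each row to its nearest center.
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its rows (keep it if the cluster is empty).
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(X, k, sigma=1.0):
    """Cluster the rows of X (n x p) into k groups (illustrative sketch)."""
    # 1. Similarity graph with Gaussian weights W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no self-loops
    # 2. Unnormalized Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # 3. eigh sorts eigenvalues in ascending order, so the first k columns
    #    are the eigenvectors with the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]                       # row i embeds x_i in R^k
    # 4. k-means on the rows of U.
    return kmeans(U, k)

# Two well-separated synthetic blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])
labels = spectral_clustering(X, k=2)
print(labels)
```

On these blobs the cross-blob weights are nearly zero, so the graph has two almost disconnected components and the embedding separates them cleanly; on less tidy data the choice of `sigma` matters much more.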

Topic Modeling

  • Documents are made up of words.
  • Words are tokens from a vocabulary.
  • The bag-of-words approach ignores the order of words in a document, keeping only how often each word occurs.
  • Corpus is a collection of documents.
  • Goal is to discover the hidden thematic structure of a corpus.
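
A bag-of-words representation can be sketched with Python's standard `collections.Counter`; the two-sentence corpus and the whitespace tokenization are illustrative assumptions. Each document becomes a vector of counts over the corpus vocabulary, so reordering a document's words leaves its bag unchanged.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a string.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Naive whitespace tokenization (real pipelines usually also lowercase,
# strip punctuation and drop stop words).
docs = [doc.split() for doc in corpus]

# The vocabulary is the set of distinct tokens in the corpus.
vocab = sorted({w for doc in docs for w in doc})

# Bag of words: one count per vocabulary entry; word order is discarded.
bows = [Counter(doc) for doc in docs]
matrix = [[bow[w] for w in vocab] for bow in bows]

print(vocab)   # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(matrix)  # [[1, 0, 0, 1, 1, 1, 2], [0, 1, 1, 0, 1, 1, 2]]
```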

Latent Dirichlet Allocation (LDA)

  • Uses a generative model: a probabilistic model describing how the data is generated.
  • Uses latent variables: hidden variables that influence the observed data.

LDA Model Parameters

  • $\alpha$ is the Dirichlet prior on document-topic distributions.
  • $\eta$ is the Dirichlet prior on topic-word distributions.

LDA Model Latent Variables

  • $\theta_d$ is the topic distribution for document $d$.
  • $z_{dn}$ is the topic assignment for the $n$th word in document $d$.
  • $\beta_k$ is the word distribution for topic $k$.

LDA Model Observed Variables

  • $w_{dn}$ is the $n$th word in document $d$.

LDA Generative Process

  1. For each topic $k \in \{1, \dots, K\}$:
    • Draw $\beta_k \sim \mathrm{Dirichlet}(\eta)$.
  2. For each document $d \in \{1, \dots, D\}$:
    • Draw $\theta_d \sim \mathrm{Dirichlet}(\alpha)$.
    • For each word $n \in \{1, \dots, N_d\}$:
      • Draw $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$.
      • Draw $w_{dn} \sim \mathrm{Multinomial}(\beta_{z_{dn}})$.
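
The generative process can be simulated directly with NumPy's `Generator.dirichlet` and `Generator.choice`. The helper name `generate_corpus`, the sizes, and the use of symmetric Dirichlet priors (a single scalar $\alpha$ and $\eta$) are assumptions for illustration. A single draw from $\mathrm{Multinomial}(\theta_d)$ is a categorical draw, which is what `rng.choice(..., p=...)` performs.

```python
import numpy as np

def generate_corpus(K, V, doc_lengths, alpha, eta, seed=0):
    """Sample topics, topic proportions, assignments and words from LDA's generative process."""
    rng = np.random.default_rng(seed)
    # 1. One word distribution per topic: beta_k ~ Dirichlet(eta), a K x V matrix.
    beta = rng.dirichlet(np.full(V, eta), size=K)
    thetas, z, w = [], [], []
    for N_d in doc_lengths:
        # 2. Topic proportions for document d: theta_d ~ Dirichlet(alpha).
        theta_d = rng.dirichlet(np.full(K, alpha))
        # For each of the N_d positions, draw a topic z_dn from theta_d ...
        z_d = rng.choice(K, size=N_d, p=theta_d)
        # ... then a word w_dn from that topic's distribution beta_{z_dn}.
        w_d = np.array([rng.choice(V, p=beta[k]) for k in z_d])
        thetas.append(theta_d)
        z.append(z_d)
        w.append(w_d)
    return beta, np.array(thetas), z, w

beta, theta, z, w = generate_corpus(K=3, V=10, doc_lengths=[5, 8], alpha=0.5, eta=0.5)
print(beta.shape, theta.shape)   # (3, 10) (2, 3)
print(w[0])                      # five word ids for document 0
```

Only `w` would be observed in practice; `beta`, `theta` and `z` are the latent quantities that inference tries to recover.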

LDA Inference

  • The goal is to estimate the posterior distribution $p(\theta, \beta, z \mid w, \alpha, \eta)$.
  • Exact inference is intractable, so approximate methods are used, for example variational inference or collapsed Gibbs sampling.

Collapsed Gibbs Sampling

  • Collapsed means analytically integrating out $\theta$ and $\beta$, so that only the topic assignments $z$ are sampled.
  • Gibbs sampling involves sampling each variable conditioned on all other variables.
  • Update rule, where $V$ is the vocabulary size and $K$ the number of topics: $p(z_{di} = j \mid z_{-di}, w, \alpha, \eta) \propto \frac{n_{-di,j}^{(w_{di})} + \eta}{\sum_{w=1}^V n_{-di,j}^{(w)} + V\eta} \cdot \frac{n_{-di,d}^{(j)} + \alpha}{\sum_{j'=1}^K n_{-di,d}^{(j')} + K\alpha}$
    • $n_{-di,j}^{(w_{di})}$ is the number of times word $w_{di}$ is assigned to topic $j$ across the corpus, excluding the current position $di$.
    • $n_{-di,d}^{(j)}$ is the number of words in document $d$ assigned to topic $j$, excluding the current position $di$.
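
A worked instance of the update rule with made-up counts (all numbers are hypothetical, chosen only to show the arithmetic): take $K = 2$ topics, $V = 3$ vocabulary words and $\alpha = \eta = 0.5$. Excluding position $di$, suppose word $w_{di}$ appears 3 times among the 10 tokens assigned to topic 1 and once among the 8 tokens assigned to topic 2, and that document $d$ has 4 other tokens in topic 1 and 2 in topic 2.

```latex
\begin{aligned}
p(z_{di} = 1 \mid z_{-di}, w, \alpha, \eta) &\propto \frac{3 + 0.5}{10 + 3(0.5)} \cdot \frac{4 + 0.5}{6 + 2(0.5)} = \frac{3.5}{11.5} \cdot \frac{4.5}{7} \approx 0.1957 \\
p(z_{di} = 2 \mid z_{-di}, w, \alpha, \eta) &\propto \frac{1 + 0.5}{8 + 3(0.5)} \cdot \frac{2 + 0.5}{6 + 2(0.5)} = \frac{1.5}{9.5} \cdot \frac{2.5}{7} \approx 0.0564
\end{aligned}
```

Normalizing gives $p(z_{di} = 1 \mid \cdot) \approx 0.1957 / (0.1957 + 0.0564) \approx 0.776$ and $p(z_{di} = 2 \mid \cdot) \approx 0.224$. The document-side denominator ($6 + 1 = 7$) is the same for both topics, so it cancels in the normalization.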

Collapsed Gibbs Sampling Algorithm

  1. Initialize $z_{di}$ randomly for all $d$ and $i$.
  2. For each iteration:
    • For each document $d$ and word $i$:
      • Compute $p(z_{di} = j | z_{-di}, w, \alpha, \eta)$ for all topics $j$.
      • Sample a new topic $z_{di}$ from this distribution.
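  3. After convergence, estimate $\theta$ and $\beta$ from the assignment counts: $\hat\theta_{dj} = \frac{n_d^{(j)} + \alpha}{\sum_{j'} n_d^{(j')} + K\alpha}$ and $\hat\beta_{jw} = \frac{n_j^{(w)} + \eta}{\sum_{w'} n_j^{(w')} + V\eta}$.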
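
A straightforward, unoptimized NumPy sketch of the collapsed Gibbs sampler; the function name, the toy corpus and the hyperparameter values are assumptions for illustration. It keeps the three count tables the update rule needs, removes the current token's assignment before computing its conditional, and reads point estimates of $\theta$ and $\beta$ off the final sample's counts rather than averaging over several samples.

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha, eta, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper, not a library API).

    docs: list of integer arrays holding word ids in [0, V).
    Returns point estimates (theta, beta) computed from the final sample.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # words in document d assigned to topic k
    n_kw = np.zeros((K, V))          # times word w is assigned to topic k
    n_k = np.zeros(K)                # total number of words assigned to topic k

    # 1. Initialize every z_di uniformly at random and fill the count tables.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    # 2. Sweep over all tokens, resampling each z_di from its full conditional.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Exclude the current assignment to obtain the "-di" counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Update rule, vectorized over topics. The document
                # denominator (N_d - 1 + K*alpha) is the same for every
                # topic, so it cancels when normalizing.
                p = (n_kw[:, w] + eta) / (n_k + V * eta) * (n_dk[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # 3. Point estimates from the counts of the final sample.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    beta = (n_kw + eta) / (n_k[:, None] + V * eta)
    return theta, beta

# Toy corpus (hypothetical): docs 0-1 use word ids {0, 1, 2}, docs 2-3 use {3, 4, 5}.
docs = [np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0]),
        np.array([2, 1, 0, 2, 1, 0, 2, 1, 0, 2]),
        np.array([3, 4, 5, 3, 4, 5, 3, 4, 5, 3]),
        np.array([5, 4, 3, 5, 4, 3, 5, 4, 3, 5])]
theta, beta = collapsed_gibbs_lda(docs, K=2, V=6, alpha=0.1, eta=0.1, n_iter=300)
print(theta.round(2))
print(beta.round(2))
```

Topic labels are only identified up to permutation, so which topic index ends up holding words {0, 1, 2} depends on the random initialization.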

LDA Uses

  • Document clustering: Documents with similar topic distributions are clustered together.
  • Topic discovery: Discover the hidden topics in a corpus.
  • Document summarization: Summarize a document by identifying its main topics.
