Podcast
Questions and Answers
Which French phrase best describes the action of habitually engaging in a specific activity?
- Je sais.
- Couramment.
- Peut-être.
- Avoir l'habitude de. (correct)
If someone says "J'en ai assez", which of the following is the most likely sentiment they are expressing?
If someone says "J'en ai assez", which of the following is the most likely sentiment they are expressing?
- They are expressing their tiredness.
- They are encouraging you to sit down.
- They have plenty of time to spare.
- They have had enough of something. (correct)
Which French phrase would you use to tell someone to sit down?
- Je pars.
- Assieds-toi. (correct)
- Je me lève.
- Je m'habille.
In a conversation about daily routines, what is the closest meaning to 'Je me couche'?
If someone says they are going home at one o'clock, what French phrase would they use?
Which of these options refers to midnight?
How would you express 'a quarter to two' in French?
If you wanted to ask a question, which French phrase would you use?
Which of these options translates to 'falling'?
What is the closest meaning to the French phrase 'en plein air'?
Which of the following options best translates to 'the surroundings'?
What is the best translation of 'le bac'?
Which option is the closest to 'each'?
In what context would you use the French word 'pourtant'?
Which of the following best translates to 'to believe'?
What is the meaning of the French verb 'réussir'?
Which option best translates to 'the studies'?
How would you say 'nowhere' in French?
What does 'il était' mean?
Which French verb means 'to stop'?
Flashcards
Croire
The act of believing or having faith.
Réussir
To have a successful outcome.
Le bac
Final exam or school-leaving exam.
Le monde
The world.
Le travail
Work; a job.
Pourtant
However; yet.
Quitter
To leave (a place or a person).
Arrêter
To stop.
Plusieurs
Several.
Entre
Between.
Ensemble
Together.
Tous les jours
Every day.
Les études
Studies; schooling.
La banlieue
The suburbs.
Nombreux, -euse
Numerous; many.
Ne... aucun(e)
Not any; none at all.
Study Notes
Principal Component Analysis (PCA)
- Goal: reduce the dimensionality of the data while retaining as much variance as possible.
- Data is represented as $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$ and is centered (mean is zero).
- New basis vectors are represented as $u_1, \dots, u_k \in \mathbb{R}^d$, forming an orthonormal basis.
- The variance retained is $\sum_{i=1}^k Var(u_i^T x)$, where $x$ is a sample from the data.
- The $u_i$ are the top $k$ eigenvectors of $XX^T$ (equivalently, of the covariance matrix $\frac{1}{n}XX^T$; the scaling changes only the eigenvalues, not the eigenvectors).
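A minimal numpy sketch of this procedure (the $(d, n)$ data layout follows the notes; the function name and return convention are illustrative):

```python
import numpy as np

def pca(X, k):
    """Project data X of shape (d, n) onto its top-k principal components."""
    # Center the data so each feature has zero mean.
    Xc = X - X.mean(axis=1, keepdims=True)
    # XX^T is the covariance matrix up to a 1/n factor, which does not
    # change the eigenvectors.
    C = Xc @ Xc.T
    # eigh handles the symmetric case and sorts eigenvalues ascending,
    # so reverse the columns and take the first k.
    eigvals, eigvecs = np.linalg.eigh(C)
    U = eigvecs[:, ::-1][:, :k]
    return U, U.T @ Xc   # orthonormal basis (d, k) and projected data (k, n)
```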
Spectral Clustering
- Similarity graph is represented as $G = (V, E)$, where $V$ is the set of data points and $E$ is the set of edges, weighted by similarity.
- Similarity matrix is defined as $W_{ij} = \text{similarity}(x_i, x_j)$.
- Degree matrix is defined as $D_{ii} = \sum_j W_{ij}$.
- Laplacian matrix is defined as $L = D - W$.
- Normalized Laplacian is defined as $L_{norm} = D^{-1/2}LD^{-1/2}$.
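These definitions in numpy, assuming a fully connected graph with a Gaussian similarity (the kernel and its bandwidth `sigma` are assumptions, since the notes leave the similarity function unspecified):

```python
import numpy as np

def graph_laplacians(X, sigma=1.0):
    """Build W, D, L, and L_norm from data X of shape (n, d)."""
    # Pairwise squared Euclidean distances, then a Gaussian similarity.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)            # no self-loops
    d = W.sum(axis=1)                   # degrees D_ii = sum_j W_ij
    D = np.diag(d)
    L = D - W                           # unnormalized Laplacian
    # Assumes every node has positive degree.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt
    return W, D, L, L_norm
```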
Spectral Clustering Algorithm
- Construct the similarity graph.
- Compute the Laplacian matrix.
- Find the eigenvectors corresponding to the $k$ smallest eigenvalues of the Laplacian.
- Run $k$-means on the rows of the matrix formed by these eigenvectors (each data point becomes a $k$-dimensional embedding).
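The remaining steps, sketched as a function taking the normalized Laplacian (for example from the helper above). Row-normalizing the embedding follows the Ng-Jordan-Weiss variant, an assumption the notes do not spell out:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(L_norm, k):
    """Embed points via Laplacian eigenvectors, then cluster with k-means."""
    # eigh sorts eigenvalues in ascending order, so the first k columns
    # are the eigenvectors for the k smallest eigenvalues.
    _, vecs = np.linalg.eigh(L_norm)
    U = vecs[:, :k]
    # Row-normalize so each embedded point lies on the unit sphere (NJW).
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # Each row of U is a k-dimensional embedding of one data point.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```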
Topic Modeling
- Documents are made up of words.
- Words are tokens drawn from a fixed vocabulary.
- The bag-of-words approach ignores the order of words in a document (see the toy example below).
- A corpus is a collection of documents.
- The goal is to discover the hidden thematic structure of a corpus.
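A toy example of the bag-of-words representation (the two-document corpus is invented):

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat"]
# The vocabulary is the set of distinct tokens across the corpus.
vocab = sorted({w for doc in corpus for w in doc.split()})
# Each document reduces to word counts; word order is discarded.
counts = [Counter(doc.split()) for doc in corpus]
# Document-term matrix: one row per document, columns in vocab order.
X = [[c[w] for w in vocab] for c in counts]
```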
Latent Dirichlet Allocation (LDA)
- LDA is a generative model: a probabilistic model describing how the data is generated.
- It uses latent variables: hidden variables that influence the observed data.
LDA Model Parameters
- $\alpha$ is the Dirichlet prior on document-topic distributions.
- $\eta$ is the Dirichlet prior on topic-word distributions (the generative process below draws each topic's word distribution $\beta_k$ from $\text{Dirichlet}(\eta)$).
LDA Model Latent Variables
- $\theta_d$ is the topic distribution for document $d$.
- $\beta_k$ is the word distribution for topic $k$.
- $z_{dn}$ is the topic assignment for the $n$th word in document $d$.
LDA Model Observed Variables
- $w_{dn}$ is the $n$th word in document $d$.
LDA Generative Process
- For each topic $k \in \{1, \dots, K\}$:
  - Draw $\beta_k \sim \text{Dirichlet}(\eta)$.
- For each document $d \in \{1, \dots, D\}$:
  - Draw $\theta_d \sim \text{Dirichlet}(\alpha)$.
  - For each word $n \in \{1, \dots, N_d\}$:
    - Draw $z_{dn} \sim \text{Multinomial}(\theta_d)$.
    - Draw $w_{dn} \sim \text{Multinomial}(\beta_{z_{dn}})$.
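The same process transcribed into numpy (the corpus sizes are toy values, and drawing the document length $N_d$ from a Poisson is an assumption the notes do not state):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, V = 3, 5, 20        # topics, documents, vocabulary size (toy values)
alpha, eta = 0.1, 0.01    # Dirichlet hyperparameters

# beta_k ~ Dirichlet(eta): one word distribution per topic.
beta = rng.dirichlet(np.full(V, eta), size=K)
docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))   # theta_d ~ Dirichlet(alpha)
    N_d = rng.poisson(15) + 1                  # document length (assumption)
    z = rng.choice(K, size=N_d, p=theta)       # z_dn ~ Multinomial(theta_d)
    # w_dn ~ Multinomial(beta_{z_dn}): draw each word from its topic.
    w = np.array([rng.choice(V, p=beta[j]) for j in z])
    docs.append(w)
```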
LDA Inference
- The goal is to estimate the posterior distribution $p(\theta, \beta, z \mid w, \alpha, \eta)$, which is intractable to compute exactly.
- Common approximate methods include variational inference and collapsed Gibbs sampling.
Collapsed Gibbs Sampling
- Collapsed means integrating out $\theta$ and $\beta$.
- Gibbs sampling involves sampling each variable conditioned on all other variables.
- Update rule: $p(z_{di} = j \mid z_{-di}, w, \alpha, \eta) \propto \frac{n_{-di,j}^{(w_{di})} + \eta}{\sum_{w=1}^V n_{-di,j}^{(w)} + V\eta} \cdot \frac{n_{-di,d}^{(j)} + \alpha}{\sum_{j'=1}^K n_{-di,d}^{(j')} + K\alpha}$
- $n_{-di,j}^{(w_{di})}$ is the number of times word $w_{di}$ is assigned to topic $j$, excluding the current assignment.
- $n_{-di,d}^{(j)}$ is the number of words in document $d$ assigned to topic $j$, excluding the current assignment.
Collapsed Gibbs Sampling Algorithm
- Initialize $z_{di}$ randomly for all $d$ and $i$.
- For each iteration:
  - For each document $d$ and word $i$:
    - Compute $p(z_{di} = j \mid z_{-di}, w, \alpha, \eta)$ for all topics $j$.
    - Sample a new topic $z_{di}$ from this distribution.
- After convergence, estimate $\theta$ and $\beta$ from the samples.
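A compact sketch of this sampler, implementing the update rule above (the count-table layout and the posterior-mean estimates at the end are standard choices rather than prescribed by the notes; the document-side denominator is dropped because it is constant in $j$ and cancels under normalization):

```python
import numpy as np

def collapsed_gibbs(docs, K, V, alpha, eta, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of int arrays of word ids."""
    rng = np.random.default_rng(seed)
    n_wt = np.zeros((V, K))                  # n_j^{(w)}: word-topic counts
    n_dt = np.zeros((len(docs), K))          # n_d^{(j)}: document-topic counts
    z = [rng.integers(K, size=len(w)) for w in docs]   # random initialization
    for d, w in enumerate(docs):             # seed the count tables
        for i, wi in enumerate(w):
            n_wt[wi, z[d][i]] += 1
            n_dt[d, z[d][i]] += 1
    for _ in range(n_iters):
        for d, w in enumerate(docs):
            for i, wi in enumerate(w):
                j = z[d][i]
                n_wt[wi, j] -= 1             # exclude the current assignment
                n_dt[d, j] -= 1
                # Unnormalized update rule over all K topics at once.
                p = ((n_wt[wi] + eta) / (n_wt.sum(axis=0) + V * eta)
                     * (n_dt[d] + alpha))
                j = rng.choice(K, p=p / p.sum())
                z[d][i] = j
                n_wt[wi, j] += 1
                n_dt[d, j] += 1
    # Posterior-mean estimates of beta (K, V) and theta (D, K) from counts.
    beta_hat = (n_wt + eta).T / (n_wt.sum(axis=0)[:, None] + V * eta)
    theta_hat = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True) + K * alpha)
    return z, beta_hat, theta_hat
```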
LDA Uses
- Document clustering: Documents with similar topic distributions are clustered together.
- Topic discovery: Discover the hidden topics in a corpus.
- Document summarization: Summarize a document by identifying its main topics.