CSAI 470: Deep Learning - Transformers

Questions and Answers

What is the primary purpose of Attention in Deep Learning models?

  • To reduce overfitting
  • To create brain-like models
  • To improve computational efficiency
  • To capture long-term dependencies (correct)

Which type of attention is used in Transformers to relate different positions of a single sequence?

  • Self Attention (correct)
  • Masked Attention
  • Multi-Head Attention
  • Cross Attention

What is the primary advantage of using Transformers over traditional RNNs and CNNs?

  • Improved computational efficiency
  • Ability to capture complex contextual relationships (correct)
  • Ability to handle sequential and non-sequential data
  • Ability to handle long-term dependencies

What is the primary purpose of Masked Language Modeling in BERT pre-training?

    To predict masked tokens

    What is the primary difference between Self-Attention and Multi-Head Attention?

    Self-Attention is a single attention mechanism, while Multi-Head Attention runs several attention mechanisms in parallel and combines their outputs

    What is the primary advantage of using Vision Transformers (ViTs) over traditional CNNs?

    Ability to capture complex contextual relationships

    What is the primary purpose of the Positional Embeddings in Transformers?

    To preserve positional information

    What is the primary difference between BERT and traditional language models?

    BERT is pre-trained on a large corpus, while traditional language models are trained on specific tasks

    What is the goal of the 'Attention is All You Need' approach in deep learning?

    To focus on relevant parts of the input sequence

    What is the primary function of the Self-Attention mechanism in Transformers?

    To relate different positions of a single sequence

    What is the purpose of the Final Layer in a Transformer model?

    To generate the output sequence

    What is the main advantage of using Transformers over traditional RNNs and CNNs?

    They can handle long-term dependencies more effectively

    What is the purpose of Masked Language Modeling in BERT pre-training?

    To generate a contextualized representation of each input token

    What is the main difference between BERT and traditional language models?

    BERT generates a contextualized representation of each input token

    What is the purpose of the 'Residuals and Normalization' component in a Transformer model?

    To stabilize the training process

    What is the main advantage of using Vision Transformers (ViTs) over traditional CNNs?

    They can model relationships between different image regions

    Study Notes

    Deep Learning Recap

    • Deep learning models include Multilayer Perceptron (MLP), Convolutional Neural Networks (CNNs), Sequence to Sequence Models, and Recurrent Neural Networks (RNNs)
    • RNNs struggle with long-term dependencies; gated cells (e.g., LSTM/GRU) only partially mitigate this, and attention mechanisms address it more directly

    Attention

    • Attention is a mechanism that allows models to focus on specific parts of the input data (see the sketch after this list)
    • There are two types of attention: cross-attention and self-attention
    • Attention can be used with RNNs to address limitations
    • Attention is All You Need: a paper that introduced the transformer architecture, which relies solely on attention mechanisms
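
The following is a minimal sketch (assuming PyTorch; tensor names and sizes are illustrative, not from the lesson) of scaled dot-product attention, the computation at the core of both cross-attention and self-attention: each query is compared against every key, the scores become weights via a softmax, and the values are averaged with those weights.

```python
# Minimal scaled dot-product attention sketch (PyTorch); names and shapes are illustrative.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k). Returns the weighted values and the attention weights."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5          # how strongly each query matches each key
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))   # masked positions get zero weight after softmax
    weights = F.softmax(scores, dim=-1)                    # attention distribution over input positions
    return weights @ v, weights

# Self-attention: queries, keys, and values all come from the same sequence.
x = torch.randn(1, 5, 16)                                  # (batch, seq_len, d_model)
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)                               # (1, 5, 16) and (1, 5, 5)
```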

    Transformers

    • Transformers are a type of neural network architecture that uses self-attention mechanisms (a minimal encoder-block sketch follows this list)
    • Self-attention allows the model to weigh the importance of different input elements
    • Multi-head attention is a technique that allows the model to attend to different aspects of the input data
    • Masking is used to prevent the model from attending to certain input elements
    • Positional embeddings are used to encode the position of input elements
    • Residuals and normalization are techniques used to improve the stability and performance of the model
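
A minimal sketch of how these pieces fit together in a single encoder block, assuming PyTorch and illustrative hyperparameters: token embeddings plus learned positional embeddings feed multi-head self-attention, residual connections and layer normalization wrap each sub-layer, and an optional mask restricts which positions may be attended to.

```python
# Minimal Transformer encoder block sketch (PyTorch); dimensions are illustrative.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        # Multi-head self-attention, then residual connection + layer normalization.
        attn_out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + attn_out)
        # Position-wise feed-forward network, then residual connection + layer normalization.
        return self.norm2(x + self.ff(x))

# Token embeddings plus learned positional embeddings preserve positional information.
vocab_size, max_len, d_model = 1000, 32, 64
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_len, d_model)

tokens = torch.randint(0, vocab_size, (2, 10))             # (batch, seq_len)
positions = torch.arange(tokens.size(1)).unsqueeze(0)      # (1, seq_len)
x = tok_emb(tokens) + pos_emb(positions)

# Masking example: a causal mask so position i attends only to positions <= i.
causal_mask = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)
out = EncoderBlock()(x, attn_mask=causal_mask)
print(out.shape)                                            # (2, 10, 64)
```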

    BERT

    • BERT is a pre-trained language model that uses transformers and attention mechanisms
    • Input representations include token embeddings, segment embeddings, and position embeddings (combined in the sketch after this list)
    • Pre-training tasks include masked language modeling and next sentence prediction
    • BERT can be fine-tuned for specific tasks, such as text classification and question-answering
    • Vision Transformers (ViTs) are a type of transformer architecture that can be used for computer vision tasks
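
A hedged sketch of BERT-style input representations in plain PyTorch (toy sizes, not BERT's real dimensions): the token, segment, and position embeddings are summed per input position before entering the transformer encoder. Fine-tuning for a downstream task would add a small task-specific head on top of the encoder outputs.

```python
# Sketch of BERT-style input representations (PyTorch); sizes here are toy values.
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    def __init__(self, vocab_size=100, max_len=16, d_model=32, n_segments=2):
        super().__init__()
        self.token = nn.Embedding(vocab_size, d_model)      # token ids (e.g., WordPiece)
        self.segment = nn.Embedding(n_segments, d_model)    # sentence A vs. sentence B
        self.position = nn.Embedding(max_len, d_model)      # absolute positions
        self.norm = nn.LayerNorm(d_model)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        # The three embeddings are summed element-wise, then normalized.
        return self.norm(self.token(token_ids) + self.segment(segment_ids) + self.position(positions))

# Two-sentence input: segment ids mark which sentence each token belongs to.
token_ids = torch.randint(0, 100, (1, 8))
segment_ids = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]])
emb = BertInputEmbeddings()(token_ids, segment_ids)
print(emb.shape)                                            # (1, 8, 32)
```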



    Description

    This quiz covers the basics of transformers in deep learning: a recap of multilayer perceptrons, convolutional neural networks, and recurrent neural networks, followed by attention mechanisms, the transformer architecture, BERT, and Vision Transformers.
