Generative Artificial Intelligence PDF
Document Details
Prof. Mousa Ayyash
Summary
This document provides an outline of generative artificial intelligence, covering foundational concepts of generative modeling, core probabilistic principles, and different generative model families, and explores applications in various domains. The document delves into the historical context, the theoretical framework, and relevant practical considerations of generative AI, including codebase setup for training models.
Full Transcript
Generative Artificial Intelligence Chapter One B y P r o f. M o u s a Ay y a s h Outline 1. Introduction to Generative Modeling 2. Understanding Generative Models 3. Generative vs. Discriminative Modeling 4. The Rise of Generative Modeling 5. Representation Learning in Generative Models 6. Core Probability Theory Concepts 7. Generative Model Taxonomy 8. Deep Learning in Generative Models 9. The Generative Deep Learning Codebase 2 1. Introduction to Generative Modeling Generative Modeling is a branch of machine learning that focuses on training models to produce new data resembling a given dataset. It does not require labeled data and aims to capture the underlying data distribution to generate new observations. 3 1. Introduction to Generative Modeling But what is Generative Modeling in practice? Suppose we have a dataset containing photos of horses. We can train a generative model on this dataset to capture the rules that govern the complex relationships between pixels in images of horses. Then we can sample from this A generative model trained to generate model to create novel, realistic realistic photos of horses. images of horses that did not (adapted from Book) exist in the original dataset. 4 2. Understanding Generative Models Learn Key Differences Explore Properties Understand how generative Examine the desirable models differ from characteristics of effective discriminative models in their generative models through approach and capabilities. simple examples. Grasp Core Concepts Survey Model Families Learn fundamental probabilistic Explore the different families concepts that form the of generative models and foundation of generative their unique approaches. modeling approaches. 5 2. Understanding Generative Models Learn Key Differences Generative models and discriminative models differ in their approach to learning from data and their capabilities. Generative models aim to understand the underlying distribution of the data and are used to generate new data that is similar to the training data. On the other hand, discriminative models focus on learning the boundary between different classes and are used for classification tasks. Generative models can be more challenging to train compared to discriminative models, but they can capture more complex relationships within the data. 6 2. Understanding Generative Models Explore Properties Consider aspects such as the ability to accurately capture the underlying data distribution, the capacity to generate realistic and diverse samples, and the efficiency in learning from limited data. Delve into specific techniques and methods used in generative modeling and analyze their impact on the quality of generated outputs. 7 2. Understanding Generative Models Grasp Core Concepts Acquire a thorough understanding of the core concepts by delving into fundamental probabilistic principles. These principles serve as the building blocks for various generative modeling approaches. 8 2. Understanding Generative Models Survey Model Families Dive into an in-depth exploration of the various families of generative models, including their unique approaches and methodologies. This includes but is not limited to, autoregressive models, generative adversarial networks (GANs), variational autoencoders (VAEs), flow-based models, and other emerging paradigms in the field of generative modeling. 9 3. Generative vs. 
Discriminative Modeling This model would learn that certain colors, shapes, and textures are more likely to indicate that a painting is by the Dutch master, and for paintings with these features, the model would upweight its prediction accordingly. A discriminative model trained to predict if a given image is painted by Van Gogh. (adapted from Book) 10 4. The Rise of Generative Modeling
1. Historical Focus on Discriminative Models: Discriminative tasks were easier to tackle and more readily applicable to practical problems across industries.
2. Recent Advancements: Maturation of machine learning technologies has enabled significant progress in generative modeling tasks.
3. Emerging Applications: Proliferation of companies offering generative services for specific business problems, such as content generation and creative tasks.
4. Future Potential: Generative modeling is seen as key to unlocking more sophisticated forms of artificial intelligence beyond discriminative approaches.
11 4. The Rise of Generative Modeling Face generation using generative modeling has improved significantly over the last decade (adapted from Book) 12 5. Representation Learning in Generative Models The following are the desirable properties of generative models, which enable the creation of new data with high fidelity to the original dataset: Accuracy, Generation, Representation 13 5. Representation Learning in Generative Models
1. High-Dimensional Data: Start with complex, high-dimensional data like images or text.
2. Latent Space Mapping: Learn a lower-dimensional latent space that captures key features of the data.
3. Generative Mapping: Develop a function to map points in latent space back to the original data domain.
4. Novel Generation: Sample new points in latent space and map them to generate novel, realistic data examples.
14 6. Core Probability Theory Concepts Understanding sample space, probability density functions, parametric modeling, likelihood, and maximum likelihood estimation is essential for grasping the foundations of generative modeling. 15 6. Core Probability Theory Concepts
Sample Space: The complete set of all possible values an observation can take.
Probability Density Function: A function representing the relative likelihood of a continuous random variable falling within different intervals.
Parametric Modeling: Using a family of density functions described by a finite number of parameters to estimate the true data distribution.
Likelihood and Maximum Likelihood Estimation: Techniques for finding the optimal values of the parameters that best explain the observed data (a short worked example follows the taxonomy section below).
16 7. Generative Model Taxonomy Generative models are categorized into Explicit Density Models, which directly model the density function, and Implicit Density Models, which generate data through a stochastic process. There are six families of generative models. 17 7. Generative Model Taxonomy Generative models split first into Explicit Density and Implicit Density. Explicit Density divides into Approximate Density (Variational Autoencoders, Energy-based Models, Diffusion Models) and Tractable Density (Autoregressive Models, Normalizing Flow Models). 18 7. Generative Model Taxonomy An Approximate Density Model uses: Variational Autoencoders, which introduce a latent variable and optimize its joint density function; Energy-based Models, which utilize Markov chain sampling instead of variational methods; and Diffusion Models, which approximate the density by training the model to gradually remove noise from a corrupted image. A Tractable Model imposes constraints on its architecture that make it easier to calculate the density; it uses Autoregressive Models, which require an ordering of the input features to generate output sequentially, and Normalizing Flow Models, which employ a set of invertible functions to produce complex distributions. 19
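To make the parametric modeling and maximum likelihood estimation ideas above concrete, here is a minimal sketch (not from the book; the toy data values are made up) that fits a Gaussian to observed data. For a Gaussian, the maximum likelihood estimates have a closed form: the sample mean and the (biased) sample standard deviation.

import numpy as np

# Toy dataset: observations assumed to come from some unknown Gaussian
data = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7])

def gaussian_log_likelihood(data, mu, sigma):
    # Sum of log N(x | mu, sigma^2) over all observations
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (data - mu) ** 2 / (2 * sigma**2))

mu_mle = data.mean()        # MLE of the mean
sigma_mle = data.std()      # MLE of the standard deviation (divides by n, not n-1)

print("MLE estimates:", mu_mle, sigma_mle)
print("log-likelihood at the MLE:", gaussian_log_likelihood(data, mu_mle, sigma_mle))
print("log-likelihood elsewhere:", gaussian_log_likelihood(data, 0.0, 1.0))  # always lower

Any other choice of parameters gives a lower log-likelihood than the MLE, which is exactly what "finding the optimal values of the parameters that best explain the observed data" means.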
8. Deep Learning in Generative Models
Complex Relationships: Deep neural networks can learn complex data relationships from scratch.
Performance: Enables state-of-the-art results in generating realistic and diverse data.
Flexibility: Adaptable to various generative modeling approaches and architectures.
20 9. The Generative Deep Learning Codebase
Clone Repository: Use Git to clone the book's repository to your local machine.
Set Up Docker: Use Docker for easy setup across different architectures and operating systems.
GPU Options: Run on CPU or set up a cloud-based GPU environment for faster training.
21 9. The Generative Deep Learning Codebase Clone Repository: You will first need to clone the Git repository for the book. Git is an open-source version control system and will allow you to copy the code locally so that you can run the notebooks on your own machine, or in a cloud-based environment. It can be installed from here: Git - Downloads (git-scm.com). To clone the repository, navigate to the folder where you would like to store the files and type the following into your terminal:
git clone https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition.git
22 9. The Generative Deep Learning Codebase Set Up Docker: What is Docker? Docker is an open platform for developing, shipping, and running applications. Docker allows you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. Get set up with Docker: If you've never used Docker before, don't worry! I have included a guide to Docker in the Docker README file in this repository, here: https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/blob/main/docs/docker.md and https://docs.docker.com/get-started/get-docker/ 23 9. The Generative Deep Learning Codebase GPU Options: What is a GPU accelerator? GPU acceleration, or Graphics Processing Unit acceleration, is a computing technique that uses both a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) to speed up the performance of data-intensive applications. 24 Generative Artificial Intelligence Chapter Two By Prof. Mousa Ayyash Outline 1. Introduction to Deep Learning 2. The Anatomy of a Neural Network 3. Convolutional Layers: Exploiting Spatial Structure 4. Regularization Techniques 5. Generative Modeling with Deep Learning 6. Applications of Deep Learning 7. Ethical Considerations 8. Democratizing Deep Learning 2 1. Introduction to Deep Learning Deep learning is a class of machine learning algorithms that uses multiple stacked layers of processing units to learn high-level representations from unstructured data. (Diagram: Deep Learning is a subset of Machine Learning, which is a subset of Artificial Intelligence) 3 1.
Introduction to Deep Learning Structured data is a tabular Unstructured data refers to data, arranged into columns any data that is not naturally of features that describe arranged into columns of each observation, such as, a features, such as images, person’s age, gender, income, and height, all audio, and text features that helps to predict something. Many types of machine learning algorithms require structured data as input. Photo adopted from DryvIQ 4 1. Introduction to Deep Learning Deep learning is a powerful machine learning technique that excels at modeling complex, Unstructured Data. Unlike traditional algorithms that rely on carefully engineered features using structured data, deep learning networks can automatically learn high-level representations from raw unstructured data, making them incredibly versatile and capable of tackling a wide range of problems. 5 1. Introduction to Deep Learning Deep learning CAN BE applied to structured data, BUT its real power, especially regarding generative modeling, comes from its ability to work with unstructured data. 6 2. The Anatomy of a Neural Network Input Layer Hidden Layer Output Layer 7 2. The Anatomy of a Neural Network Input Layer The neural network starts with an input layer that receives the raw data, such as pixel values for an image or word 1 embeddings for text. This layer simply passes the input Input Layer through to the next layer without any transformation. 8 2. The Anatomy of a Neural Network Hidden Layers The real magic happens in the hidden layers, where the network learns to extract increasingly complex features 2 from the input. Each hidden layer applies a nonlinear transformation to its inputs, allowing the network to model highly intricate relationships in the data. Hidden Layer 9 2. The Anatomy of a Neural Network Output Layer The final output layer produces the desired prediction or generation, such as a classification label or a generated 3 image. The output is shaped by the specific task and the activation function used in this layer. Output Layer 10 3. Convolutional Layers: Exploiting Spatial Structure Spatial Parameter Hierarchical Awareness Sharing Learning 11 3. Convolutional Layers: Exploiting Spatial Structure Convolutional layers are designed to Spatial take advantage of the spatial Awareness structure inherent in data like images. By applying a sliding filter across the input, the network can efficiently detect local patterns and features, such as edges, textures, and shapes. 12 3. Convolutional Layers: Exploiting Spatial Structure The weights of the convolutional filter Parameter are shared across all spatial locations, Sharing reducing the number of parameters in the network and allowing the model to learn features that are useful regardless of their position in the input. 13 3. Convolutional Layers: Exploiting Spatial Structure Hierarchical Stacking multiple convolutional layers Learning allows the network to build a hierarchical representation, with lower layers detecting simple features and higher layers combining these into more complex, abstract concepts. 14 4. 
Regularization Techniques
1. Batch Normalization: Batch normalization helps stabilize the training process by normalizing the inputs to each layer for each mini-batch, reducing the internal covariate shift and allowing for faster convergence.
2. Dropout: Dropout randomly deactivates a proportion of the units in a layer during training, forcing the network to learn more robust and generalizable features that are not overly dependent on any single part of the model.
3. Early Stopping: Early stopping monitors the performance of the model on a validation set and stops training when the validation loss stops improving, preventing overfitting to the training data.
15 4. Regularization Techniques Internal Covariate Shift in Batch Normalization Internal covariate shift refers to the phenomenon where the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This can slow down the training process because the network has to continuously adapt to the new distributions. (Photo adopted from Patrick Glauner in his published paper "On the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage") 16 4. Regularization Techniques Dropout During training, dropout layers randomly select a set of units from the previous layer and set their output to zero. This simple technique effectively reduces overfitting by preventing the network from relying too much on specific units or groups of units that may only memorize training set observations. 17 5. Generative Modeling with Deep Learning Variational Autoencoders Variational autoencoders learn a probabilistic latent representation of the input data, allowing them to generate new samples that are similar to the training data. 18 5. Generative Modeling with Deep Learning Generative Adversarial Networks Generative adversarial networks pit a generator model against a discriminator model in an adversarial game, leading to the generator learning to produce highly realistic synthetic data. 19 5. Generative Modeling with Deep Learning Diffusion Models Diffusion models learn to gradually transform noise into realistic samples by modeling the reverse process of adding noise to data, enabling them to generate diverse and high-quality outputs. 20 6. Applications of Deep Learning
Computer Vision: Deep learning excels at tasks like image classification, object detection, and semantic segmentation, with applications in fields like healthcare, autonomous vehicles, and security.
Natural Language Processing: Deep learning models can understand, generate, and translate human language, enabling applications in chatbots, machine translation, text summarization, and sentiment analysis.
Speech Recognition: Deep learning-based speech recognition systems can transcribe audio into text with high accuracy, powering voice assistants, dictation software, and audio indexing.
21 7. Ethical Considerations Bias and Fairness, Transparency and Accountability, Privacy and Security 22 7. Ethical Considerations Bias and Fairness Deep learning models can perpetuate and amplify societal biases present in training data, leading to unfair and discriminatory outcomes. Careful data curation and model evaluation are crucial to ensure fairness. 23 7. Ethical Considerations Privacy and Security The powerful data processing capabilities of deep learning raise concerns about privacy, with potential misuse of personal information. Robust security measures and ethical guidelines are needed to protect user privacy. 24 7.
Ethical Considerations Transparency and Accountability As deep learning systems become more complex and influential, there is a growing need for transparency in their decision- making processes and clear lines of accountability for their actions. 25 8. Democratizing Deep Learning Open-Source Educational Frameworks Resources Cloud Computing 26 8. Democratizing Deep Learning The availability of powerful open- source deep learning frameworks like TensorFlow and PyTorch has lowered the barrier to entry, enabling more Open-Source researchers and developers to experiment with and apply deep Frameworks learning techniques. 27 8. Democratizing Deep Learning Cloud-based deep learning platforms and services provide accessible and scalable computing resources, allowing individuals and small Cloud teams to train and deploy complex models without the need for Computing expensive hardware. 28 8. Democratizing Deep Learning A wealth of educational materials have emerged, empowering people from diverse backgrounds to learn Educational about deep learning and apply it to their own projects and domains. Resources 29 Generative Artificial Intelligence Chapter Three B y P r o f. M o u s a Ay y a s h Outline 1. Brian, the Stitch, and the Wardrobe (Story) 2. Introduction to Autoencoders 3. The Autoencoder Architecture 4. Limitations of Basic Autoencoders 5. Variational Autoencoders 6. Variational Autoencoders Architecture 7. Training on the CelebFaces Attributes Dataset 8. Generating New Faces 2 1. Brian, the Stitch, and the Wardrobe Imagine a magical wardrobe where items are organized, and with just the location, your stylist Brian can recreate any garment using his sewing machine. This incredible discovery also allows the creation of entirely new clothing items when an empty location is provided within the wardrobe. This story can be related to Autoencoders especially with their latent space and encoder. Photo created by Generative AI 3 2. Introduction to Autoencoders You play the part of the encoder, moving each item of clothing to a location in the wardrobe. This process is called encoding. Brian plays the part of the decoder, taking a location in the wardrobe and attempting to re-create the item. This process is called decoding. Each location in the wardrobe is represented by two numbers (a 2D vector). For example, the trousers here, are encoded to the point [6.3, –0.9]. This vector is also known as an embedding because the encoder attempts to embed as much information into it as possible, so that the decoder can produce an accurate Photo adopted from the book reconstruction. 4 2. Introduction to Autoencoders An Autoencoder is simply a neural network that is trained to perform the task of encoding and decoding an item, such that the output from this process is as close to the original item as possible. Crucially, it can be used as a generative model, because we can decode any point in the 2D space that we want (in particular, those that are not embeddings of original items) to produce a novel item of clothing. 5 2. Introduction to Autoencoders Now, we will be using the Fashion-MNIST dataset to test Autoencoders and to show you how do they actually work. Fashion-MNIST dataset is a collection of 70,000 grayscale images of clothing items. In this project we will be using both TenserFlow and Keras to build the structure of our Autoencoder using Fashion-MNIST dataset. 
Photo adopted from Margot Wagner's research about Multi-Layer Neural Network Classification of the Fashion MNIST Dataset. GitHub repository for the project: Generative_Deep_Learning_2nd_Edition/notebooks/03_vae/01_autoencoder/autoencoder.ipynb at main · davidADSP/Generative_Deep_Learning_2nd_Edition · GitHub 6 2. Introduction to Autoencoders Fashion-MNIST Dataset
Image size: Each image is 28 x 28 pixels.
Number of classes: The images are categorized into 10 fashion-related classes, such as t-shirt, trouser, and ankle boot.
Training and test sets: The dataset is split into 60,000 images for training and 10,000 images for testing.
Structure: The Fashion-MNIST dataset shares the same image size, data format, and training and testing split structure as the original MNIST dataset.
Purpose: The Fashion-MNIST dataset is intended to be a modern replacement for the original MNIST dataset, which contains handwritten digits.
Pixel values: Each pixel has a value between 0 and 255, with higher numbers indicating darker pixels.
(Photo adopted from a visualization of the Fashion MNIST Dataset in the Deep Lake UI) 7 2. Introduction to Autoencoders The dataset comes prepackaged with TensorFlow, so it can be downloaded as follows:
from tensorflow.keras import datasets
(x_train, y_train), (x_test, y_test) = datasets.fashion_mnist.load_data()
These are 28 × 28 grayscale images (pixel values between 0 and 255) out of the box, which we need to preprocess to ensure that the pixel values are scaled between 0 and 1. 8 2. Introduction to Autoencoders We will also pad each image to 32 × 32 for easier manipulation of the tensor shape as it passes through the network; this is how we preprocess our data:
def preprocess(imgs):
    imgs = imgs.astype("float32") / 255.0
    imgs = np.pad(imgs, ((0, 0), (2, 2), (2, 2)), constant_values=0.0)
    imgs = np.expand_dims(imgs, -1)
    return imgs

x_train = preprocess(x_train)
x_test = preprocess(x_test)
9
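As a quick sanity check (a small sketch, not from the book), we can confirm that the preprocessing above produced the expected shapes and value range:

# Assumes the load and preprocess steps above have been run
print(x_train.shape)                 # expected: (60000, 32, 32, 1)
print(x_test.shape)                  # expected: (10000, 32, 32, 1)
print(x_train.min(), x_train.max())  # expected: 0.0 1.0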
3. The Autoencoder Architecture
Encoder: Compresses high-dimensional input data like an image into a lower-dimensional embedding vector.
Latent Space: The compressed representation of the input data.
Decoder: Decompresses the embedding vector back to the original data domain.
(Diagram: Input, Encoder, Latent Space, Decoder, Output) 10 3. The Autoencoder Architecture An input image is encoded to a latent embedding vector (also called the latent space) z, which is then decoded back to the original pixel space. The autoencoder is trained to reconstruct an image after it has passed through the encoder and back out through the decoder. 11 3. The Autoencoder Architecture Autoencoders as Denoising Models Autoencoders can be used to clean noisy images, since the encoder learns that it is not useful to capture the position of the random noise inside the latent space in order to reconstruct the original. For tasks such as this, a 2D latent space is probably too small to encode sufficient relevant information from the input. However, as we shall see, increasing the dimensionality of the latent space quickly leads to problems if we want to use the autoencoder as a generative model. 12 3. The Autoencoder Architecture Encoder In an autoencoder, the encoder's job is to take the input image and map it to an embedding vector in the embedding space (latent space). It uses convolutional layers to compress the input image into a latent vector.
✓ We first create an Input layer for the image and pass this through three Conv2D layers in sequence, each capturing increasingly high-level features.
✓ We use a stride of 2 to halve the size of the output of each layer, while increasing the number of channels.
✓ The last convolutional layer is flattened and connected to a Dense layer of size 2, which represents our two-dimensional latent space.
13 3. The Autoencoder Architecture Encoder Architecture
Layer (type) | Output shape | Param #
InputLayer | (None, 32, 32, 1) | 0
Conv2D | (None, 16, 16, 32) | 320
Conv2D | (None, 8, 8, 64) | 18,496
Conv2D | (None, 4, 4, 128) | 73,856
Flatten | (None, 2048) | 0
Dense | (None, 2) | 4,098
Total params 96,770 | Trainable params 96,770 | Non-trainable params 0
14 3. The Autoencoder Architecture We define the Input layer of the encoder (the image), stack Conv2D layers sequentially on top of each other, flatten the last convolutional layer to a vector, and connect this vector to the 2D embeddings with a Dense layer. The Keras Model that defines the encoder takes an input image and encodes it into a 2D embedding.
encoder_input = layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, CHANNELS), name="encoder_input")
x = layers.Conv2D(32, (3, 3), strides=2, activation="relu", padding="same")(encoder_input)
x = layers.Conv2D(64, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2D(128, (3, 3), strides=2, activation="relu", padding="same")(x)
shape_before_flattening = K.int_shape(x)[1:]  # the decoder will need this!
x = layers.Flatten()(x)
encoder_output = layers.Dense(EMBEDDING_DIM, name="encoder_output")(x)
encoder = models.Model(encoder_input, encoder_output)
15 3. The Autoencoder Architecture Decoder The decoder is a mirror image of the encoder: instead of convolutional layers, it uses transposed convolutional layers to reconstruct the image from the latent vector. In Keras, the Conv2DTranspose layer allows us to perform convolutional transpose operations on tensors. By stacking these layers, we can gradually expand the size of each layer, using strides of 2, until we get back to the original image dimension of 32 × 32. 16 3. The Autoencoder Architecture Decoder Architecture
Layer (type) | Output shape | Param #
InputLayer | (None, 2) | 0
Dense | (None, 2048) | 6,144
Reshape | (None, 4, 4, 128) | 0
Conv2DTranspose | (None, 8, 8, 128) | 147,584
Conv2DTranspose | (None, 16, 16, 64) | 73,792
Conv2DTranspose | (None, 32, 32, 32) | 18,464
Conv2D | (None, 32, 32, 1) | 289
Total params 246,273 | Trainable params 246,273 | Non-trainable params 0
17 3. The Autoencoder Architecture Convolutional Transpose Layers Standard convolutional layers (Conv2D) allow us to halve the size of an input tensor in both dimensions (height and width), by setting strides = 2. The convolutional transpose layer (Conv2DTranspose) uses the same principle as a standard convolutional layer (passing a filter across the image) but is different in that setting strides = 2 doubles the size of the input tensor in both dimensions. 18 3. The Autoencoder Architecture Convolutional Transpose Layers In a convolutional transpose layer, the strides parameter determines the internal zero padding between pixels in the image. Here, a 3 × 3 × 1 filter (gray) is being passed across a 3 × 3 × 1 image (blue) with strides = 2, to produce a 6 × 6 × 1 output tensor (green). 19
3. The Autoencoder Architecture We define the Input layer of the decoder (the embedding), connect the input to a Dense layer, reshape this vector into a tensor that can be fed as input into the first Conv2DTranspose layer, and stack Conv2DTranspose layers on top of each other. The Keras Model that defines the decoder takes an embedding in the latent space and decodes it into the original image domain.
decoder_input = layers.Input(shape=(EMBEDDING_DIM,), name="decoder_input")
x = layers.Dense(np.prod(shape_before_flattening))(decoder_input)
x = layers.Reshape(shape_before_flattening)(x)
x = layers.Conv2DTranspose(128, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(64, (3, 3), strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
decoder_output = layers.Conv2D(
    CHANNELS,
    (3, 3),
    strides=1,
    activation="sigmoid",
    padding="same",
    name="decoder_output",
)(x)
decoder = models.Model(decoder_input, decoder_output)
20 3. The Autoencoder Architecture Joining the Encoder to the Decoder To train the encoder and decoder simultaneously, we need to define a model that will represent the flow of an image through the encoder and back out through the decoder. The output from the autoencoder is simply the output from the encoder after it has been passed through the decoder. After defining our model, we need to compile it with a loss function and optimizer.
✓ The loss function is usually chosen to be either the root mean squared error (RMSE) or binary cross-entropy between the individual pixels of the original image and the reconstruction.
21 3. The Autoencoder Architecture The Keras Model that defines the full autoencoder is a model that takes an image and passes it through the encoder and back out through the decoder to generate a reconstruction of the original image:
autoencoder = models.Model(encoder_input, decoder(encoder_output))
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
22 3. The Autoencoder Architecture Choosing the Loss Function
RMSE: Penalizes overestimations and underestimations symmetrically, leading to outputs centered around the average pixel values. It produces sharper images but may result in pixelized edges.
Binary Cross-Entropy: Penalizes errors asymmetrically, with more severe penalties for predictions closer to extremes. This tends to produce blurrier images by pushing predictions toward the middle (around 0.5).
23 3. The Autoencoder Architecture Training the autoencoder:
autoencoder.fit(
    x_train, x_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    shuffle=True,
    validation_data=(x_test, x_test),
    callbacks=[model_checkpoint_callback, tensorboard_callback],
)
Reconstructing images using the autoencoder:
n_to_predict = 5000
example_images = x_test[:n_to_predict]
example_labels = y_test[:n_to_predict]
24 3. The Autoencoder Architecture Visualizing the Latent Space We can visualize how images are embedded into the latent space by passing the test set through the encoder and plotting the resulting embeddings. In the scatter plot each black point represents an image that has been embedded into the latent space. We can color each point based on the label of the corresponding image to produce the plot (see the sketch after this section).
embeddings = encoder.predict(example_images)
figsize = 8
plt.figure(figsize=(figsize, figsize))
plt.scatter(embeddings[:, 0], embeddings[:, 1], c="black", alpha=0.5, s=3)
plt.show()
Photo adopted from the book 25
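The scatter above plots every embedding in a single color. To color each point by its clothing label, as described in the text, one possible sketch (not the book's exact code) is:

# Color the latent space by Fashion-MNIST class label (0-9)
plt.figure(figsize=(8, 8))
plt.scatter(
    embeddings[:, 0], embeddings[:, 1],
    c=example_labels, cmap="rainbow", alpha=0.8, s=3,
)
plt.colorbar()
plt.show()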
The Autoencoder Architecture Generating New Images We can generate novel images by sampling some points in the latent space and using the decoder to convert these back into pixel space:
mins, maxs = np.min(embeddings, axis=0), np.max(embeddings, axis=0)
sample = np.random.uniform(mins, maxs, size=(6, 2))
reconstructions = decoder.predict(sample)
Each blue dot maps to one of the images shown on the right of the diagram, with the embedding vector shown underneath. (Photo adopted from the book) 26 4. Limitations of Basic Autoencoders
1. Discontinuous Latent Space: Large gaps between clusters of encoded points make sampling difficult.
2. Undefined Distribution: No clear way to sample new points from the latent space.
3. Poor Generalization: Decoded points between clusters may not produce well-formed outputs.
(Photo adopted from the book) 27 4. Limitations of Basic Autoencoders In order to solve these problems and limitations, we need to convert our Autoencoder into a Variational Autoencoder. 28 5. Variational Autoencoders Revisiting the Infinite Wardrobe Instead of placing each clothing item at a specific spot in the wardrobe, you now assign a general area where it's more likely to be found. This helps smooth out gaps in the wardrobe. To keep things organized, you and Brian agree that each item's area should stay near the center and within a one-meter deviation. If you deviate from this rule, you pay Brian more. After some time, the new layout works well—there's more variety in the items, and no gaps. Photo created by Generative AI 29 5. Variational Autoencoders Variational Autoencoders (VAEs) are a fundamental deep learning architecture for generative modeling. A VAE is a type of autoencoder whose encoding distribution is regularized during training to ensure that its latent space has good properties for generating new data. The term "variational" refers to its close relationship with variational inference in statistics. 30 5. Variational Autoencoders
Probabilistic Encoder: Maps inputs to a distribution in latent space rather than a single point.
Reparameterization Trick: Enables backpropagation through the sampling process.
KL Divergence Loss: Encourages latent distributions to approximate a standard normal distribution.
31 5. Variational Autoencoders In an autoencoder, each image is mapped directly to one point in the latent space. In a variational autoencoder, each image is instead mapped to a multivariate normal distribution around a point in the latent space. (Photo adopted from the book) 32 5. Variational Autoencoders The Multivariate Normal Distribution A normal distribution (or Gaussian distribution) N(𝜇, 𝜎) is a probability distribution characterized by a distinctive bell curve shape, defined by two variables: the mean (𝜇) and the variance (𝜎²). The standard deviation (𝜎) is the square root of the variance. Its density function is
f(x | 𝜇, 𝜎²) = (1 / √(2𝜋𝜎²)) · exp(−(x − 𝜇)² / (2𝜎²))
33 5. Variational Autoencoders The Multivariate Normal Distribution We can sample a point z from a normal distribution with mean 𝜇 and standard deviation 𝜎 using the following equation:
z = 𝜇 + 𝜎𝜖
where 𝜖 is sampled from a standard normal distribution. (Photo adopted from Wikipedia; it shows several normal distributions in one dimension, for different values of the mean and variance.) 34
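As a tiny numerical illustration (a sketch, not from the book) of the sampling equation z = 𝜇 + 𝜎𝜖: drawing 𝜖 from a standard normal and rescaling reproduces the statistics of N(𝜇, 𝜎²).

import numpy as np

mu, sigma = 2.0, 0.5
eps = np.random.normal(0.0, 1.0, size=100_000)  # epsilon ~ N(0, 1)
z = mu + sigma * eps                            # reparameterized samples
print(z.mean(), z.std())                        # approximately 2.0 and 0.5

This is exactly the reparameterization trick the VAE encoder uses in the next section: the randomness lives in 𝜖, so gradients can flow through 𝜇 and 𝜎.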
5. Variational Autoencoders The Variational Autoencoder Encoder The encoder will take each input image and encode it to two vectors that together define a multivariate normal distribution in the latent space:
z_mean: The mean point of the distribution.
z_log_var: The logarithm of the variance of each dimension.
We can sample a point z from the distribution defined by these values using the following equation:
z = z_mean + z_sigma * epsilon
where z_sigma = exp(z_log_var * 0.5) and epsilon ~ N(0, I).
The derivation of the relationship between z_sigma (𝜎) and z_log_var (log(𝜎²)) is as follows:
𝜎 = exp(log(𝜎)) = exp(2 log(𝜎) / 2) = exp(log(𝜎²) / 2)
35 6. Variational Autoencoders Architecture To summarize, the encoder encodes each input image to two vectors, z_mean and z_log_var, and a point z is sampled from the distribution they define as z = z_mean + z_sigma * epsilon, where z_sigma = exp(z_log_var * 0.5) and epsilon ~ N(0, I). 36 6. Variational Autoencoders Architecture Previously, there was no requirement for the latent space to be continuous: even if the point (–2, 2) decodes to a well-formed image of a sandal, there's no requirement for (–2.1, 2.1) to look similar. Now, since we are sampling a random point from an area around z_mean, the decoder must ensure that all points in the same neighborhood produce very similar images when decoded, so that the reconstruction loss remains small. 37 6. Variational Autoencoders Architecture
Sampling Layer: Samples from the latent distribution using the reparameterization trick.
Encoder Outputs: Mean and log variance of the latent distribution.
Decoder: Reconstructs the input from the sampled latent vector.
Loss Function: Combines reconstruction loss and KL divergence.
38 6. Variational Autoencoders Architecture Sampling Layer This layer allows us to sample from the distribution defined by z_mean and z_log_var. We create a new layer by subclassing the Keras base Layer class; it handles the sampling of z from a normal distribution with parameters defined by z_mean and z_log_var. This is useful when you want to apply a transformation to a tensor that is not already included as one of the out-of-the-box Keras layer types. We use the reparameterization trick to build a sample from the normal distribution parameterized by z_mean and z_log_var: we sample epsilon from a standard normal and then manually adjust the sample to have the correct mean and variance.
class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = K.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
39 6. Variational Autoencoders Architecture Encoder Instead of connecting the Flatten layer directly to the 2D latent space, we connect it to the layers z_mean and z_log_var. The Sampling layer samples a point z in the latent space from the normal distribution defined by the parameters z_mean and z_log_var. The Keras Model that defines the encoder takes an input image and outputs z_mean, z_log_var, and a sampled point z from the normal distribution defined by these parameters.
x = layers.Flatten()(x)
z_mean = layers.Dense(EMBEDDING_DIM, name="z_mean")(x)
z_log_var = layers.Dense(EMBEDDING_DIM, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = models.Model(encoder_input, [z_mean, z_log_var, z], name="encoder")
40
6. Variational Autoencoders Architecture Encoder Model Architecture
Layer (type) | Output shape | Param # | Connected to
InputLayer (input) | (None, 32, 32, 1) | 0 | []
Conv2D (conv2d_1) | (None, 16, 16, 32) | 320 | [input]
Conv2D (conv2d_2) | (None, 8, 8, 64) | 18,496 | [conv2d_1]
Conv2D (conv2d_3) | (None, 4, 4, 128) | 73,856 | [conv2d_2]
Flatten (flatten) | (None, 2048) | 0 | [conv2d_3]
Dense (z_mean) | (None, 2) | 4,098 | [flatten]
Dense (z_log_var) | (None, 2) | 4,098 | [flatten]
Sampling (z) | (None, 2) | 0 | [z_mean, z_log_var]
Total params 100,868 | Trainable params 100,868 | Non-trainable params 0
41 6. Variational Autoencoders Architecture Loss Function Previously, our loss function only consisted of the reconstruction loss between images and their attempted copies after being passed through the encoder and decoder. The reconstruction loss also appears in a variational autoencoder, but we now require one extra component: the Kullback–Leibler (KL) divergence term. It helps us obtain a well-defined distribution that we can use for choosing points in the latent space: the standard normal distribution. Also, since this term tries to force all encoded distributions toward the standard normal distribution, there is less chance that large gaps will form between point clusters. Instead, the encoder will try to use the space around the origin symmetrically and efficiently.
kl_loss = tf.reduce_mean(
    tf.reduce_sum(
        -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)),
        axis=1,
    )
)
total_loss = reconstruction_loss + kl_loss
42 6. Variational Autoencoders Architecture The KL divergence term in the loss function of a VAE The Kullback–Leibler (KL) divergence, also known as relative entropy, is a measure of how much one probability distribution differs from another. In a VAE, we want to measure how much our normal distribution with parameters z_mean and z_log_var differs from a standard normal distribution. In this special case, it can be shown that the KL divergence has the following closed form (a small numerical check follows this section):
kl_loss = -0.5 * sum(1 + z_log_var - z_mean ^ 2 - exp(z_log_var))
In mathematical notation:
D_KL( N(𝜇, 𝜎) || N(0, 1) ) = −(1/2) (1 + log(𝜎²) − 𝜇² − 𝜎²)
43 6. Variational Autoencoders Architecture The KL divergence term in the loss function of a VAE In the specific case of VAEs, the KL divergence term is used to ensure that the latent space representation of the data is close to a standard normal distribution (mean of 0 and standard deviation of 1). The latent space in a VAE is where data is encoded before being decoded back into the original domain. 44 6. Variational Autoencoders Architecture The KL divergence term ensures that the encoded representations (z_mean and z_log_var) stay close to a normal distribution. This helps the model generate more continuous and realistic images, as the encoder is now stochastic rather than deterministic. (Photo adopted from the book: the new latent space; the black dots show the z_mean value of each encoded image, while blue dots show some sampled points in the latent space, with their decoded images on the right.) 45
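As a quick numerical check (a sketch, not from the book) of the closed-form KL term above: it is exactly zero when z_mean = 0 and z_log_var = 0 (the encoded distribution already is the standard normal), and it grows as the encoded distribution drifts away from it.

import numpy as np

def kl_term(z_mean, z_log_var):
    # KL divergence between N(z_mean, exp(z_log_var)) and N(0, 1), summed over dimensions
    return np.sum(-0.5 * (1 + z_log_var - z_mean**2 - np.exp(z_log_var)))

print(kl_term(np.array([0.0, 0.0]), np.array([0.0, 0.0])))    # 0.0, no penalty
print(kl_term(np.array([2.0, -1.0]), np.array([0.5, 0.3])))   # positive, larger penalty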
6. Variational Autoencoders Architecture When visualizing the latent space by clothing type, no specific type dominates, indicating that the VAE has learned to differentiate between types without using labels during training, aiming only to minimize reconstruction loss. By coloring points in the latent space by clothing type, we can see that there is no preferential treatment of any one type. (Photo adopted from the book: the latent space of the VAE colored by clothing type; the righthand plot shows the space transformed into p-values, and we can see that each color is approximately equally represented.) 46 7. Training on the CelebFaces Attributes Dataset The CelebA dataset images cover large pose variations and background clutter, and include faces with different expressions, lighting conditions, and occlusions. The dataset is widely used for training and evaluating face attribute recognition and generative models. This dataset is also available through Kaggle, so you can download it by running the Kaggle dataset downloader script in the book repository (Photo adopted from the book):
bash scripts/download_kaggle_data.sh jessicali9530 celeba-dataset
47 7. Training on the CelebFaces Attributes Dataset
Dataset: CelebFaces Attributes (CelebA)
Images: Over 200,000 celebrity face images
Size: Resized to 32x32 pixels
Channels: RGB (3 channels)
Latent Dimensions: 200
48 7. Training on the CelebFaces Attributes Dataset We need to change the number of channels in the final convolutional transpose layer of the decoder to 3, because our data has 3 RGB input channels instead of one grayscale channel.
1. Sample Latent Vector: Draw random samples from a 200-dimensional standard normal distribution instead of 2 dimensions.
2. Decode: Pass the sampled vectors through the trained VAE decoder.
3. Generate: Obtain novel face images as decoder output after around five epochs of training.
(Photo adopted from the book) 49 8. Generating New Faces We sample 30 points from a standard multivariate normal distribution with 200 dimensions, decode the sampled points, and plot the images. The VAE is able to take the set of points that we sampled from a standard normal distribution and convert each into a convincing image of a person's face. This is our first glimpse of the true power of generative models!
grid_width, grid_height = (10, 3)
z_sample = np.random.normal(size=(grid_width * grid_height, Z_DIM))
reconstructions = decoder.predict(z_sample)
fig = plt.figure(figsize=(18, 5))
fig.subplots_adjust(hspace=0.4, wspace=0.4)
# Output the grid of faces
for i in range(grid_width * grid_height):
    ax = fig.add_subplot(grid_height, grid_width, i + 1)
    ax.axis("off")
    ax.imshow(reconstructions[i, :, :])
50 8. Generating New Faces Newly generated faces (Photo adopted from the book) 51 8. Generating New Faces Latent Space Arithmetic
Adjust Age: Add or subtract the "young" vector to change apparent age.
Add Smile: Add the "smile" vector to make the face happier.
Add Glasses: Add the "glasses" vector to add eyewear.
Change Hair: Add hair color vectors to modify style.
52 8. Generating New Faces Conceptually, we are performing the following vector arithmetic in the latent space, where alpha is a factor that determines how much of the feature vector is added or subtracted:
z_new = z + alpha * feature_vector
You will notice in the photo (adopted from the book) that we can add or subtract multiples of a certain vector (e.g., Smiling, Black_Hair, Eyeglasses, Young, Male, Blond_Hair) to obtain different versions of the image, with only the relevant feature changed. 53 8.
Generating New Faces Morphing Between Faces 1 2 3 4 Encode Start Interpolate Face Generate points Encode End Decode Map first face to Face along line from A latent vector A to B Convert Map second face interpolated points to latent vector B to face images 54 8. Generating New Faces Mathematically, we are traversing a straight line, which can be described by the following equation: z_new = z_A * (1- alpha) + z_B * alpha Here, alpha is a number between 0 and 1 that determines how far along the line we are, away from point A. Photo adopted from the book 55 Generative Artificial Intelligence Chapter Four B y P r o f. M o u s a Ay y a s h Outline 1. Brickki Bricks and the Forgers (story) 2. Introduction to Generative Adversarial Network (GAN) 3. Deep Convolutional GAN (DCGAN) 4. Training the DCGAN 5. Challenges in GAN Training 6. Wasserstein GAN with Gradient Penalty (WGAN-GP) 7. Conditional GAN (CGAN) 8. Paired Image-to-Image Translation (Pix2Pix) 2 1. Brickki Bricks and the Forgers To illustrate the fundamental concepts of GANs … Consider the story of Brickki, a company producing high-quality building blocks. A competitor starts making counterfeit bricks, mixing them into Brickki's supply. As the quality control expert, you must learn to spot the fakes. Over time, you improve at detection, while the forgers enhance their techniques. This cat-and- mouse game drives significant improvements in both forgery and detection quality. Photo created by Generative AI 3 1. Brickki Bricks and the Forgers Photo created by Generative AI This analogy mirrors the GAN training process, where the generator (forger) and discriminator (quality control) continuously improve through adversarial training. The generator converts random noise into fake data, while the discriminator predicts whether samples are real or fake. 4 2. Introduction to Generative Adversarial Network (GAN) Generative adversarial networks (GANs) are a powerful class of generative models that have revolutionized the field of artificial intelligence. Introduced in 2014 by Ian Goodfellow et al., GANs consist of two neural networks - a Generator and a Discriminator - that are trained simultaneously through an adversarial process. The generator learns to create realistic fake data, while the discriminator learns to distinguish real data from fake. Photo created by Generative AI 5 2. Introduction to Generative Adversarial Network (GAN) Inputs and outputs of the two networks in a GAN Photo adopted from the book 6 2. Introduction to Generative Adversarial Network (GAN) 𝑋𝑟𝑒𝑎𝑙 Discriminator Y z Generator 𝑋𝑓𝑎𝑘𝑒 Generative Adversarial Network Diagram 7 2. Introduction to Generative Adversarial Network (GAN) 1 2 3 Generator Creates Discriminator Adversarial Evaluates Training The generator network The networks are takes random noise as The discriminator trained alternately, with input and produces fake network examines both the generator trying to data samples. real and generated fool the discriminator and the discriminator samples, predicting if improving its detection they are real or fake. abilities. 8 3. Deep Convolutional GAN (DCGAN) The Deep Convolutional GAN (DCGAN) is one of the first major developments in GAN architecture. It uses convolutional layers in both the generator and discriminator networks to generate realistic images. We'll build a DCGAN to generate images of toy bricks using Keras. 9 3. 
Deep Convolutional GAN (DCGAN) The DCGAN architecture includes several key components: Convolutional Layers LeakyReLU Activation Both the generator and LeakyReLU activation functions discriminator use convolutional are used in the discriminator to layers to process and generate prevent vanishing gradients. image data. Batch Normalization Tanh Output Batch normalization is used to The generator uses a tanh stabilize training and improve activation in the output layer to gradient flow. produce images in the range [- 1, 1]. 10 3. Deep Convolutional GAN (DCGAN) The Bricks Dataset consists of computer-generated images of LEGO bricks. It contains 40,000 photos of 50 different toy bricks captured from various angles, the dataset offers a diverse training set for the GAN. This allows the GAN to create new, realistic images resembling LEGO bricks. This dataset is also available through Kaggle, so you can download the dataset by running the Kaggle dataset downloader script in the Photos adopted from Kaggle book repository: bash scripts/download_kaggle_data.sh joosthazelzet lego-brick- images 11 3. Deep Convolutional GAN (DCGAN) We start preparing our data. train_data = utils.image_dataset_from_directory This allows us to read batches ( "/app/data/lego-brick- of images into memory only images/dataset/", when required (e.g., during labels=None, training), so that we can work color_mode="grayscale", with large datasets and not image_size=(6, 6), batch_size=128, worry about having to fit the shuffle=True, entire dataset into memory. It seed=42, also resizes the images to 64 interpolation="bilinear", × 64, interpolating between ) pixel values. 12 3. Deep Convolutional GAN (DCGAN) The Discriminator The goal of the discriminator is to predict if an image is real or fake. This is a supervised image classification problem, so we can use a similar architecture to those we worked with in Chapter 2: stacked convolutional layers, with a single output node. 13 3. Deep Convolutional GAN (DCGAN) discriminator_input = layers.Input(shape=(64, 64, 1)) 1. Define the Input layer of the x = layers.Conv2D(64, kernel_size=4, strides=2, discriminator (the image). padding="same", use_bias=False)( 2. Stack Conv2D layers on top of discriminator_input each other, with ) BatchNormalization, LeakyReLU activation, and Dropout layers x = layers.LeakyReLU(0.2)(x) sandwiched in between.... )(x) 3. Flatten the last convolutional... x = layers.BatchNormalization(momentum=0.9)(x) layer—by this point, the shape of... the tensor is 1 × 1 × 1, so there is no need for a final Dense layer. )(x) discriminator_output = layers.Flatten()(x) 4. The Keras model that defines the discriminator—a model that discriminator = models.Model(discriminator_input, takes an input image and outputs discriminator_output) a single number between 0 and 1. 14 3. Deep Convolutional GAN (DCGAN) The Generator The input to the generator will be a vector drawn from a multivariate standard normal distribution. The output is an image of the same size as an image in the original training data. This description may remind you of Photos adopted from Source the decoder in a Variational Autoencoder (VAE). 15 3. Deep Convolutional GAN (DCGAN) generator_input = layers.Input(shape=(Z_DIM,)) 1. Define the Input layer of the generator—a x = layers.Reshape((1, 1, Z_DIM))(generator_input) vector of length 100. x = layers.Conv2DTranspose( 2. 
We use a Reshape layer to give a 1 × 1 × 100 tensor, so that we can start applying 512, kernel_size=4, strides=1, padding="valid", convolutional transpose operations. use_bias=False 3. We pass this through four )(x) Conv2DTranspose layers, with x = layers.BatchNormalization(momentum=0.9)(x) BatchNormalization and LeakyReLU layers sandwiched in between. x = layers.LeakyReLU(0.2)(x) 4. The final Conv2DTranspose layer uses a... tanh activation function to transform the output to the range [–1, 1], to match the... original image domain.... 5. The Keras model that defines the )(x) generator—a model that accepts a vector of length 100 and outputs a tensor of generator = models.Model(generator_input, shape [64, 64, 1]. generator_output) 16 3. Deep Convolutional GAN (DCGAN) UpSampling VS. Conv2DTranspose X = layers.UpSampling2D(size = 2)(x) X = layers.Conv2D(256, kernel_size=4, We can use an UpSampling2D layer followed by strides=1, padding="same")(x) a Conv2D layer with a stride of 1. Instead of using Conv2DTranspose layers. In this method, UpSampling2D doubles the size of the input by repeating each row and column, and the Conv2D layer applies convolution. Unlike Conv2DTranspose, which adds zeros between pixels, UpSampling2D repeats pixel values. Conv2DTranspose can sometimes create unwanted patterns (checkerboard artifacts), but it’s still widely used in many powerful GANs. Artifacts when using convolutional transpose layers Photos adopted from Source 17 4. Training the DCGAN Photo adopted from the book 18 4. Training the DCGAN class DCGAN(models.Model): def __init__(self, discriminator, generator, The discriminator and generator latent_dim): are constantly fighting for... dominance, which can make the def compile(self, d_optimizer, g_optimizer): DCGAN training process... unstable. def metrics(self):... Ideally, the training process will def train_step(self, real_images): find an equilibrium that allows... the generator to learn meaningful information from the dcgan = DCGAN( discriminator and the quality of discriminator=discriminator, generator=generator, latent_dim=Z_DIM) the images will start to improve. 19 4. Training the DCGAN Discriminator Training 1 The discriminator is trained on both real and generated images, learning to distinguish between them. Training a DCGAN Generator Training involves alternating 2 The generator is trained to produce images between updating the that fool the discriminator, using the discriminator's feedback. discriminator and Loss Functions generator networks. The 3 Binary cross-entropy loss is typically used for process can be both networks, measuring how well they perform their respective tasks. challenging due to the need to balance the Hyperparameter Tuning training of both networks. 4 Careful tuning of learning rates, batch sizes, and network architectures is crucial for stable training. 20 4. Training the DCGAN After enough epochs, the discriminator tends to end up dominating, but this may not be a problem as the generator may have already learned to produce sufficiently high- quality images by this point. Photo adopted from the book 21 4. Training the DCGAN One of the requirements of a successful def compare_images(img1, img2): generative model is that it doesn't only return np.mean(np.abs(img1-img2)) reproduce images from the training set. To test this, we can find the image from the training set that is closest to a particular generated example. A good measure for distance is the compare_images function. 
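As a usage sketch (not from the book; the variable names generated and train_imgs are assumptions), compare_images can be applied to find the training image closest to a particular generated example:

import numpy as np

# `generated` is one generated image, `train_imgs` is the training set as a numpy array
distances = np.array([compare_images(generated, img) for img in train_imgs])
closest_idx = int(np.argmin(distances))
print("closest training image index:", closest_idx, "mean absolute difference:", distances[closest_idx])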
Through the figure (adopted from the book) we can see that while there is some degree of similarity between the generated images and the training set, they are not identical. This shows that the generator has understood these high-level features and can generate examples that are distinct from those it has already seen. 22 5. Challenges in GAN Training
Mode Collapse: The generator may produce a limited variety of samples, failing to capture the full diversity of the training data.
Vanishing Gradients: If the discriminator becomes too powerful, it may provide little useful feedback to the generator, leading to vanishing gradients.
Training Instability: The adversarial nature of GANs can lead to oscillations and failure to converge during training.
Difficulty in Evaluation: Unlike other models, GANs lack a clear objective metric for evaluating performance, making it challenging to assess progress.
23 6. Wasserstein GAN with Gradient Penalty (WGAN-GP) Wasserstein GAN Wasserstein GAN (WGAN) is a variant of generative adversarial network (GAN) proposed in 2017 that aims to "improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches". Compared with the original GAN discriminator, the Wasserstein GAN discriminator provides a better learning signal to the generator. This allows the training to be more stable when the generator is learning distributions in very high dimensional spaces. (Photo adopted from Source: for the GAN (the red line), the curve fills with areas of diminishing or exploding gradients; for the WGAN (the blue line), the gradient is smoother everywhere and the generator learns better even when it is not yet producing good images.) 24 6. Wasserstein GAN with Gradient Penalty (WGAN-GP) We are using binary cross-entropy loss to train the discriminator and generator of the GAN. Binary cross-entropy loss:
−(1/n) Σᵢ₌₁ⁿ [ yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ) ]
To train the GAN discriminator D, we calculate the loss when comparing predictions for real images pᵢ = D(xᵢ) to the response yᵢ = 1 and predictions for generated images pᵢ = D(G(zᵢ)) to the response yᵢ = 0 (GAN discriminator loss minimization). 25 6. Wasserstein GAN with Gradient Penalty (WGAN-GP) To train the GAN generator G, we calculate the loss when comparing predictions for generated images pᵢ = D(G(zᵢ)) to the response yᵢ = 1 (GAN generator loss minimization). Wasserstein loss requires that we use yᵢ = 1 and yᵢ = −1 as labels, rather than 1 and 0. We also remove the sigmoid activation from the final layer of the discriminator, so that predictions pᵢ are no longer constrained to fall in the range [0, 1] but instead can now be any number in the range (−∞, ∞). For this reason, the discriminator in a WGAN is usually referred to as a critic that outputs a score rather than a probability (the Wasserstein loss function). 26 6. Wasserstein GAN with Gradient Penalty (WGAN-GP) To train the WGAN critic D, we calculate the loss when comparing predictions for real images pᵢ = D(xᵢ) to the response yᵢ = 1 and predictions for generated images pᵢ = D(G(zᵢ)) to the response yᵢ = −1. In other words, the WGAN critic tries to maximize the difference between its predictions for real images and generated images (full WGAN loss minimization). To train the WGAN generator, we calculate the loss when comparing predictions for generated images pᵢ = D(G(zᵢ)) to the response yᵢ = 1. In other words, the WGAN generator tries to produce images that are scored as highly as possible by the critic (i.e., the critic is fooled into thinking they are real): the WGAN generator loss function. 27
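A minimal sketch (not necessarily the book's exact implementation) of the Wasserstein loss described above, using labels of +1 for real and −1 for fake so that the same function serves both the critic and the generator:

import tensorflow as tf

def wasserstein_loss(y_true, y_pred):
    # y_true is +1 for real images and -1 for generated images; y_pred is the critic score.
    # Minimizing -mean(y_true * y_pred) makes the critic push real scores up and fake scores
    # down, and makes the generator push the scores of its fakes up.
    return -tf.reduce_mean(y_true * y_pred)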
In other words, the WGAN generator tries to produce images that are scored as highly as possible by the critic (i.e., the critic is fooled into thinking they are real).

WGAN generator loss function

27

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

The Lipschitz Constraint

The Lipschitz constraint ensures that the critic function in a WGAN (Wasserstein GAN) behaves in a controlled way. Instead of restricting the critic's output to a range like [0, 1] using a sigmoid function, the output can now be any value in (-∞, ∞). The critic is a function D that converts an image into a prediction. We say that this function is 1-Lipschitz if it satisfies the following inequality for any two input images, x_1 and x_2:

    \frac{\lvert D(x_1) - D(x_2) \rvert}{\lvert x_1 - x_2 \rvert} \le 1

28

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

The Lipschitz Constraint

Here, |x_1 - x_2| is the average pixelwise absolute difference between two images and |D(x_1) - D(x_2)| is the absolute difference between the critic predictions. Essentially, we require a limit on the rate at which the predictions of the critic can change between two images (i.e., the absolute value of the gradient must be at most 1 everywhere).

We can see this applied to a Lipschitz continuous 1D function in this figure: at no point does the line enter the cone, wherever you place the cone on the line. In other words, there is a limit on the rate at which the line can rise or fall at any point.
Photo adopted from the book

29

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

Wasserstein GAN with Gradient Penalty (WGAN-GP)

Wasserstein GAN with Gradient Penalty (WGAN-GP) is a very popular method for training generative models to produce high-quality synthetic data. WGAN-GP uses the Wasserstein loss formulation plus a gradient norm penalty to achieve Lipschitz continuity. The gradient penalty (GP) in WGAN-GP enforces a Lipschitz constraint on the discriminator, which further improves the model's performance and stability during training.

30

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

WGAN vs. WGAN-GP

The problems with WGAN mostly stem from the way it uses weight clipping to make sure the critic is Lipschitz continuous. WGAN-GP, on the other hand, replaces weight clipping with a constraint on the gradient norm of the critic to ensure Lipschitz continuity. This change makes the network training more stable compared to WGAN and needs very little hyperparameter tuning.

Photo adopted from Source

31

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

The WGAN-GP critic training process
Photo adopted from the book

32

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

Building a WGAN-GP

1. Remove Sigmoid: The critic (discriminator) no longer uses a sigmoid activation in the output layer.
2. Wasserstein Loss: Implement the Wasserstein loss function for both the generator and critic.
3. Gradient Penalty: Add a gradient penalty term to the critic's loss function to enforce the Lipschitz constraint.
4. Training Ratio: Train the critic multiple times for each generator update to ensure it approaches optimality.

33

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

Gradient Penalty

The gradient penalty loss measures the squared difference between the norm of the gradient of the predictions with respect to the input images and 1. The model will naturally be inclined to find weights that ensure the gradient penalty term is minimized, thereby encouraging the model to conform to the Lipschitz constraint.
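A minimal sketch of how such a penalty term might be computed in TensorFlow (the function name, the random pairwise interpolation between real and generated images described on the next slide, and the batch handling are illustrative assumptions, not the book's exact code):

    import tensorflow as tf

    def gradient_penalty(critic, real_images, fake_images):
        # Interpolate pairwise between real and generated images at random
        # points along the connecting lines (see the next slide).
        batch_size = tf.shape(real_images)[0]
        alpha = tf.random.uniform([batch_size, 1, 1, 1], 0.0, 1.0)
        interpolated = real_images + alpha * (fake_images - real_images)

        with tf.GradientTape() as tape:
            tape.watch(interpolated)
            scores = critic(interpolated, training=True)

        # Gradient of the critic scores with respect to the interpolated images
        grads = tape.gradient(scores, interpolated)
        norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))

        # Squared difference between the gradient norm and 1, averaged over the batch
        return tf.reduce_mean(tf.square(norms - 1.0))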
Because it would be impractical to calculate this gradient everywhere during the training process, the WGAN-GP evaluates the gradient at only a handful of points. To ensure a balanced mix, we use a set of interpolated images that lie at randomly chosen points along lines connecting the batch of real images to the batch of fake images pairwise.

Photo adopted from the book

34

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

This image shows some results after training the WGAN-GP for 25 epochs to generate new faces.

Photo adopted from the book

35

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

We can also see how the loss functions of the model evolve over time: the loss functions of both the critic and generator are highly stable and convergent.

WGAN-GP loss curves: the critic loss (epoch_c_loss) is broken down into the Wasserstein loss (epoch_c_wass) and the gradient penalty loss (epoch_c_gp)
Photo adopted from the book

36

6. Wasserstein GAN with Gradient Penalty (WGAN-GP)

It is also true that GANs are generally more difficult to train than VAEs and take longer to reach a satisfactory quality. However, many state-of-the-art generative models today are GAN-based, as the rewards for training large-scale GANs on GPUs over a longer period of time are significant.

37

7. Conditional GAN (CGAN)

Conditional GAN (CGAN) is about building a GAN whose output we can control. This idea, first introduced in "Conditional Generative Adversarial Nets" by Mirza and Osindero in 2014, is a relatively simple extension to the GAN architecture. It takes advantage of labels during the training process.

Generator: Given a label and a random array as input, this network generates data with the same structure as the training data observations corresponding to the same label.

38

7. Conditional GAN (CGAN)

Conditional Generative Adversarial Network Diagram (showing z, Y, X_real, X_fake, and Y_fake flowing through the Generator and Discriminator)

39

7. Conditional GAN (CGAN)

Both the generator and discriminator are fed a class label and conditioned on it. All other components are exactly what you see in a typical GAN framework; this is more of an architectural modification.

During the training of the CGAN:
The Generator is parameterized to learn and produce realistic samples for each label in the training dataset.
The Discriminator learns to distinguish fake and real samples, given the label information.

However, their roles don't change. The Generator and Discriminator continue to generate and classify images just like before, but with conditional auxiliary information.

40

7. Conditional GAN (CGAN)

Conditional Input: Both the generator and discriminator receive additional input specifying the desired output class or attributes.

Controlled Generation: The generator learns to produce samples that match the given condition, allowing for more targeted output.

Versatile Discrimination: The discriminator learns to judge not only if a sample is real or fake, but also if it matches the given condition.

Improved Applications: CGANs enable tasks like generating images of specific classes or with specific attributes, enhancing the utility of GANs in various domains.

41

7. Conditional GAN (CGAN)

Inputs and outputs of the generator and critic in a CGAN
Photo adopted from the book

42

7. Conditional GAN (CGAN)

NOTE: In our example, our one-hot encoded label will have length 2, because there are two classes (Blond and Not Blond).
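As a rough sketch of how this two-class conditioning might be wired up (illustrative assumptions: a 64 × 64 × 1 image, a batch of 8, and TensorFlow/Keras ops chosen for the example, not the book's exact code), the one-hot label is concatenated with the generator's noise input and broadcast into extra channels for the critic's input:

    import tensorflow as tf
    from tensorflow.keras import layers

    LABEL_DIM = 2      # e.g., Blond / Not Blond
    Z_DIM = 100
    IMAGE_SIZE = 64

    # Generator side: the one-hot label is concatenated with the noise vector,
    # so the first layer of the generator sees a vector of length Z_DIM + LABEL_DIM.
    noise = tf.random.normal([8, Z_DIM])
    labels = tf.one_hot([0, 1, 0, 1, 0, 1, 0, 1], depth=LABEL_DIM)
    generator_input = layers.Concatenate()([noise, labels])               # shape (8, 102)

    # Critic side: the same label is broadcast to full-size image channels and
    # concatenated with the (real or generated) image along the channel axis.
    images = tf.random.normal([8, IMAGE_SIZE, IMAGE_SIZE, 1])
    label_channels = tf.reshape(labels, [8, 1, 1, LABEL_DIM])
    label_channels = tf.tile(label_channels, [1, IMAGE_SIZE, IMAGE_SIZE, 1])
    critic_input = layers.Concatenate(axis=-1)([images, label_channels])  # shape (8, 64, 64, 3)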
However, you can have as many labels as you like; for example, you could train a CGAN on the Fashion-MNIST dataset to output one of the 10 different fashion items, by incorporating a one-hot encoded label vector of length 10 into the input of the generator and 10 additional one-hot encoded label channels into the input of the critic.

43

7. Conditional GAN (CGAN)

Implementing a CGAN

1. Conditional Input: Add an additional input to both the generator and discriminator for the conditional information (e.g., class labels).
2. Generator Modification: Concatenate the conditional input with the random noise input in the generator.
3. Discriminator Modification: Provide the conditional information alongside the image input to the discriminator.
4. Training Process: Train the CGAN similarly to a standard GAN, but ensure that matching conditional information is provided to both networks during training.

44

7. Conditional GAN (CGAN)