Deep Neural Networks (DNN)
Deep Learning
Dr. Mohammed SALEM

DNN
A DNN can be referred to by any of these names:
- ANN (Artificial Neural Networks)
- DNN (Deep Neural Networks)
- MLP (Multilayer Perceptrons)
- Fully Connected Networks (FCN)
- Feedforward Networks

Linear vs. Nonlinear Problems
Linear datasets: the data can be split with a single straight line.
Nonlinear datasets: the data cannot be split with a single straight line. We need more than one line to form a shape that splits the data.

MLP
A perceptron is a linear function that produces a straight line. In order to fit a nonlinear dataset, we try to create a triangle-like shape that splits the dark dots. It looks like three lines would do the job, so we use three perceptrons, each with its own weights and bias:

z1 = (x1 w11 + x2 w21) + b1
z2 = (x1 w12 + x2 w22) + b2
z3 = (x1 w13 + x2 w23) + b3

Try this website: https://playground.tensorflow.org

How many layers, and how many nodes in each layer?
A network can have one or more hidden layers, and each layer has one or more neurons. Your main job, as a machine learning engineer, is to design these layers; there is no single prescribed recipe that fits all models. Usually, when we have two or more hidden layers, we call the network a deep neural network (DNN). Start from that point, maybe three to five layers, and observe the network's performance. If it is performing poorly (underfitting), add more layers. If you see signs of overfitting, decrease the number of layers. To build intuition, play around with https://playground.tensorflow.org.

Overfitting vs. Underfitting
The deeper your network is, the better it will fit the training data. But too much depth is not a good thing, because the network can fit the training data so closely that it fails to generalize when you show it new data (overfitting).

TensorFlow
TensorFlow provides a simplified workflow and uses Keras as its main API for building deep learning models. To install it:

pip install tensorflow

Keras
Keras is another widely used deep learning library. tf.keras is the implementation of the Keras API, the high-level API of TensorFlow. It consists of modules, classes, and functions.

Keras Modules
- activations module: built-in activation functions.
- applications module: Keras Applications are premade architectures with pre-trained weights.
- backend module: Keras backend API.
- callbacks module: utilities called at certain points during model training.
- constraints module: functions that impose constraints on weight values.
- datasets module: small NumPy datasets for debugging/testing.
- dtensor module: Keras' DTensor library.
- estimator module: Keras estimator API.
- experimental module: public API for the tf.keras.experimental namespace.
- initializers module: Keras initializer serialization/deserialization.
- layers module: Keras layers API.
- losses module: built-in loss functions.
- metrics module: all Keras metrics.
- mixed_precision module: Keras mixed precision API.
- models module: Keras models API.
- optimizers module: built-in optimizer classes.
- preprocessing module: utilities to preprocess data before training.
- regularizers module: built-in regularizers.
- utils module: public Keras utilities.

Keras Classes & Functions
Classes:
- class Model: groups layers into an object with training and inference features.
- class Sequential: groups a linear stack of layers into a tf.keras.Model.
Functions:
- Input(...): used to instantiate a Keras tensor.

First contact with Keras
The core data structures of Keras are layers and models. The simplest type of model is the Sequential model, a linear stack of layers. For more complex architectures, you should use the Keras functional API, which allows you to build arbitrary graphs of layers, or write models entirely from scratch via subclassing.

Here is the Sequential model:

from tensorflow.keras.models import Sequential
model = Sequential()

Stacking layers is as easy as .add():

from tensorflow.keras.layers import Dense
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=10, activation='softmax'))

Once your model looks good, configure its learning process with .compile():

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

If you need to, you can further configure your optimizer:

from tensorflow import keras
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True))

You can now iterate on your training data in batches:

# x_train and y_train are NumPy arrays
model.fit(x_train, y_train, epochs=5, batch_size=32)

Evaluate your test loss and metrics in one line:

loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

XOR Example with TensorFlow
XOR_Example_dev
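The slide refers to a notebook (XOR_Example_dev) that is not included in this transcript. Below is a minimal sketch of what such an XOR example could look like in Keras; the hidden-layer size, optimizer, and epoch count are illustrative assumptions, not values from the original notebook.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# XOR truth table: the classic nonlinear dataset from the earlier slides
x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype='float32')
y_train = np.array([[0], [1], [1], [0]], dtype='float32')

model = Sequential()
model.add(Dense(8, input_shape=(2,), activation='relu'))  # hidden-layer size is an arbitrary choice
model.add(Dense(1, activation='sigmoid'))                 # single output: probability of class 1

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=500, verbose=0)
print(model.predict(x_train).round())  # ideally [[0], [1], [1], [0]]

A single perceptron cannot separate XOR, which is exactly the motivation for hidden layers discussed above: the hidden layer lets the network combine several lines into a nonlinear decision boundary.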
Model investigation
https://netron.app/

Image classification using MLP
Let's recall the MLP architecture: neurons are stacked in layers on top of each other, with weight connections. The MLP architecture consists of an input layer, one or more hidden layers, and an output layer. We will use an MLP to solve an image classification problem on the MNIST dataset. The goal of this classifier is to classify images of digits from 0 to 9 (10 classes).

Input layer
An MNIST image is seen by the computer as a 28 × 28 matrix, with pixel values ranging from 0 to 255 (0 for black, 255 for white, and the range in between for grayscale).

Since MLPs only take as input 1D vectors with dimensions (1, n), they cannot take a raw 2D image matrix with dimensions (x, y). To fit the image into the input layer, we first need to transform it into one long vector with dimensions (1, n) that contains all the pixel values in the image. This process is called image flattening. In this example, the total number (n) of pixels in the image is 28 × 28 = 784, so we flatten the (28 × 28) matrix into one long vector with dimensions (1, 784). The input vector looks like this:

x = [row1, row2, row3, ..., row28]

The input layer in this example will have a total of 784 nodes: x1, x2, ..., x784.

from keras.models import Sequential
from keras.layers import Flatten  # imports a layer called Flatten to convert the image matrix into a vector

# Defines the model
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))  # adds the Flatten layer, also known as the input layer

The Flatten layer in Keras takes the 2D image matrix input and converts it into a 1D vector. Note that the Flatten layer must be given the shape of the input image. Now the image is ready to be fed to the neural network.

Hidden layers
As a DL engineer, you will often have a lot of different choices when you are building your network:
- Choose the number of layers (let's say 2 layers).
- Choose the size of each layer (let's say 512 and 32).
- Choose the activation function of each layer (let's say ReLU and ReLU).
There is no single best answer that fits all problems, but in most cases the ReLU function performs best in the hidden layers.

Add the two fully connected (also known as dense) layers, using Keras:

from keras.layers import Dense
model.add(Dense(512, activation='relu'))
model.add(Dense(32, activation='relu'))

Output layer
In classification problems, the number of nodes in the output layer should be equal to the number of classes you are trying to detect. In this problem, we are classifying 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), so we need to add one last Dense layer that contains 10 nodes:

model.add(Dense(10, activation='softmax'))

For most classification problems where classes are mutually exclusive, softmax is generally a good choice in the output layer. The softmax function gives us the probability that the input image belongs to each of the n classes.

Putting it all together
When we put all these layers together, we get the complete neural network:

from keras.models import Sequential
from keras.layers import Flatten, Dense

# Defines the neural network architecture
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(512, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()

When you run this code, you will see the model summary printed. The output of the Flatten layer is a vector with 784 nodes, since we have 784 pixels in each 28 × 28 image. As designed, the hidden layers produce 512 and 32 nodes, and finally the output layer (dense_3) produces a layer with 10 nodes.
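The slides stop at the architecture; to train it, the compile/fit/evaluate steps from the earlier Keras slides apply. Here is a minimal sketch that reuses the model defined above; the optimizer, epoch count, and batch size are illustrative assumptions carried over from the earlier generic example, not tuned values.

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Loads MNIST: 60,000 training and 10,000 test images of shape (28, 28)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scales pixel values from [0, 255] to [0, 1]

# One-hot encodes the digit labels to match categorical_crossentropy and the 10-node softmax output
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)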
Learnable (Trainable) Parameters
The Param # field in the summary represents the number of parameters (weights) produced at each layer. These are the weights that will be adjusted and learned (trained) during the training process. They are calculated as follows:
- Params after the Flatten layer = 0, because this layer only flattens the image into a vector for feeding into the input layer; no weights have been added yet.
- Params after layer 1 = (784 nodes in the input layer) × (512 nodes in hidden layer 1) + (512 connections to biases) = 401,920.
- Params after layer 2 = (512 nodes in hidden layer 1) × (32 nodes in hidden layer 2) + (32 connections to biases) = 16,416.
- Params after layer 3 = (32 nodes in hidden layer 2) × (10 nodes in the output layer) + (10 connections to biases) = 330.
- Total params in the network = 401,920 + 16,416 + 330 = 418,666.

In this network, we have a total of 418,666 parameters (weights and biases) that the network needs to learn and whose values it needs to tune to optimize the error function. This is a huge number for such a small network, and you can see how it would grow out of control if we added more nodes and layers or used bigger images. This is one of the two major drawbacks of MLPs.
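The pattern in these calculations is that a dense layer with n_in inputs and n_out outputs has n_in × n_out weights plus n_out biases. A quick sketch to verify the arithmetic (Keras itself reports the same total via model.count_params()):

# Parameters of a dense layer: one weight per input-output pair, plus one bias per output node
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layer1 = dense_params(784, 512)  # 401,920
layer2 = dense_params(512, 32)   # 16,416
layer3 = dense_params(32, 10)    # 330
print(layer1 + layer2 + layer3)  # 418666, matching the total in the model summary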