Artificial Intelligence for Big Data, Neural Networks, and Deep Learning

Summary

This presentation introduces artificial intelligence for big data, focusing on neural networks and deep learning. It covers fundamental concepts, perceptron models, and different types of networks. The material explores the function of neurons and activation functions, ultimately aiming to understand how these AI models can address complex problems.

Full Transcript


Artificial Intelligence for Big Data: Neural Networks & Deep Learning for Big Data
Dr. Feras Al-Obeidat

Introduction

The previous chapter introduced a basic foundation for building intelligent systems. Machine learning's two primary groups are supervised and unsupervised algorithms. We explored how the Spark programming model is a handy tool for implementing these algorithms, covered the fundamentals of regression analysis, clustering, decision trees (DT), and random forests (RF) with supporting code in Spark ML, and explored the K-means algorithm and how to use it for dimensionality reduction, representing the same information with fewer dimensions without any loss of information.

Fundamentals of Artificial Neural Networks

This chapter explores neural networks and how they have evolved with the increase in computing power and distributed computing frameworks. Neural networks take inspiration from the human brain and help us solve complex problems that are not feasible with traditional mathematical models. Main topics:

- Fundamentals of artificial neural networks
- Perceptron and linear models
- Nonlinearities model
- Feed-forward neural networks
- Gradient descent, backpropagation, and overfitting
- Recurrent neural networks

Neural Networks vs. the Human Brain

The basic algorithms and mathematical modeling concepts covered in the last chapter are great for solving structured, simpler problems. They are simple compared to what the human brain is easily capable of doing. For instance, when a baby starts to identify objects through various senses (sight, sound, touch, and so on), it learns about those objects based on some fundamental building blocks within the human brain. Neurological studies of the brains of various animals and of human beings reveal that the basic building blocks of the brain are neurons.

In more complex species, such as humans, the brain contains more neurons than in less complex species; the human brain contains on the order of 100 billion interconnected neurons. Researchers have found a direct correlation between the quantity and level of interconnection of neurons and the intelligence of various species. This has led to the development of artificial neural networks (ANNs), which can solve more complex problems, such as image recognition.

ANNs

ANNs offer an alternate approach to computing and to understanding the human brain. While our understanding of the exact functioning of the human brain is limited, the application of ANNs to complex problems has so far shown encouraging results, primarily for developing machines that learn on their own from contextual inputs, unlike the traditional computing and algorithmic approach. Keep in mind that neural networks and algorithmic computing do not compete with each other; instead, they complement each other.

A Simple ANN

Similar to biological neurons, ANNs have input and output units. An ANN consists of one input layer, which provides the input data to the network; one output layer, which represents the finished computation of the ANN; and one or more (depending on complexity) hidden layers for the actual computation and logic implementation. The theory of neural networks was introduced long before the present era of big data; however, at the time of its origin, computational resources and datasets were too limited to leverage the full potential of ANNs.
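As a concrete picture of this layered structure, here is a minimal sketch in Python/NumPy. The 2-3-1 layer sizes are an assumption chosen to match the worked example later in this chapter, and the random weights are placeholders, not trained values.

```python
import numpy as np

# Minimal sketch of the layered structure described above, assuming a
# 2-neuron input layer, a 3-neuron hidden layer, and a 1-neuron output
# layer (the shapes used in the worked example later in this chapter).
rng = np.random.default_rng(seed=42)

layer_sizes = [2, 3, 1]  # input, hidden, output
weights = [rng.standard_normal((m, n))  # W(1) is 2x3, W(2) is 3x1
           for m, n in zip(layer_sizes, layer_sizes[1:])]

for l, W in enumerate(weights, start=1):
    print(f"W({l}) shape: {W.shape}")
```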
With the advent of big data technologies and massively parallel distributed computing frameworks, we can explore the power of ANNs for innovative use cases and solve some of the most challenging problems, such as image recognition and natural language processing.

Perceptron and Linear Models

Consider the example of a regression problem with two input variables and one output (dependent) variable, and let us illustrate the use of an ANN to create a model that can predict the value of the output variable for a set of input variables. In this example, x1 and x2 are the input variables and y is the output variable. [Table: sample training set of five (x1, x2) data points with the dependent variable y.] The goal is to predict the value of y when x1 = 6 and x2 = 10. Note that any given continuous function can be implemented exactly by a three-layer neural network.

Component Notations of the Neural Network

x1 and x2 are the inputs (it is possible to call the activation function on the input layer). There are three layers in this network: the input layer, the hidden layer, and the output layer. There are two neurons in the input layer for this example, corresponding to the two input variables. Remember, two neurons are used for illustration; in reality we may have hundreds of thousands of dimensions and hence input variables. There are three neurons in the hidden layer (layer 2): a(2)1, a(2)2, a(2)3. The neuron in the final layer produces the output a(3)1. In total, we have six neurons.

a(j)i represents the activation (the value computed and output by a node) of unit i in layer j. The activation function of a node defines the output of the node for a set of inputs. The simplest and most common activation function is a binary function representing the two states of a neuron's output: whether the neuron is activated (firing) or not. For example, a(2)1 is the activation of the first unit in the second layer. Activation is crucial for introducing nonlinearity into the network, enabling it to solve more complex problems.

Activation and Non-Activation

1. Activation: The activation function transforms the weighted input (typically a linear combination of the inputs and weights) into a nonlinear output. Without activation, the neural network would behave like a linear model regardless of its depth, limiting its capacity to solve complex tasks on data such as sound, images, and videos.
2. Non-activation (linear activation): If no activation function is used, the network becomes a linear model. No matter how many layers the network has, the composition of linear functions remains linear, and it cannot model complex patterns in the data.

W(l)ij represents the weight on a connector: l is the layer from which a signal is moving, i is the neuron number from which we are moving, and j is the neuron number in the next layer to which the signal is moving. Weights are used to reduce the difference between the actual and desired output of the ANN. For example, W(1)12 represents the weight on the connection from the first neuron in layer 1 to the second neuron in layer 2.

Mathematical Representation of the Perceptron Model

The output of the neural network depends on the input values, the activation functions on each of the neurons, and the weights on the connections. The goal is to find appropriate weights on each of the connections to accurately predict the output value. A minimal sketch of a single perceptron follows.
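To make the sum-of-products idea concrete, here is a minimal single-perceptron sketch using the binary step activation described above. The weight and bias values are hypothetical, chosen only for illustration.

```python
import numpy as np

def binary_step(z):
    # The simplest activation: the neuron fires (1) when the weighted
    # sum is non-negative, and stays off (0) otherwise.
    return np.where(z >= 0, 1, 0)

# Hypothetical weights and bias, for illustration only.
x = np.array([6.0, 10.0])   # the inputs x1 = 6, x2 = 10 from the example
w = np.array([0.4, -0.1])   # one weight per input connection
b = -1.0                    # bias term

z = np.dot(x, w) + b        # sum of products of inputs and weights
print(binary_step(z))       # 1 means activated (firing), 0 means not
```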
ANN Components and Their Correlation

There is a correlation between the inputs, weights, transfer functions, and activation functions. Within an ANN, we take the sum of the products of the inputs (X) and their weights (W) and apply the activation function f(x) to get the output of a layer, which is passed as input to the next layer. If there is no activation function, the correlation between the input and output values will be a linear function.

A side note on bias: a model has high bias when it does not do well during training and during validation. Bias can also live inside the data used to train models; it determines the model's behavior, and we cannot expect any fair treatment from algorithms that were built from biased data.

Mathematical Model with Matrices

Since we have multiple values of x1 and x2 in our example, the computation is best done with matrix multiplication, so that all the transfer and activation functions can be computed in parallel. These mathematical models are well suited to distributed parallel computation frameworks for performing the matrix multiplications.

Let's now represent our example with matrix notation. The input dataset can be represented as X; in our example, X holds the five input samples (a 5 x 2 matrix). The weights can be represented as W(1), a 2 x 3 matrix. The resultant matrix, Z(2), is a 5 x 3 matrix representing the activity of the second (hidden) layer: each row corresponds to a set of input values, and each column represents the transfer function, or activity, on each of the nodes in the hidden layer. Matrix notation allows us to perform this complex computation in a single step:

Z(2) = X W(1)

With this formula, we sum the products of the inputs and the corresponding weights for each set of inputs. The output of a layer is then obtained by applying an activation function to each individual value of a node. The main purpose of an activation function is to convert the input signal of a node into an output signal. As a parallel to the biological neuron, the output after the activation function indicates whether the neuron fires or not. A sketch of this matrix step follows.
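Here is a minimal sketch of the step Z(2) = X W(1), with made-up numbers: X stacks the five training samples (5 x 2) and W(1) is 2 x 3, so a single multiplication computes the hidden-layer activity for all samples at once.

```python
import numpy as np

# X: five training samples, one row each (values are illustrative).
X = np.array([[3.0,  5.0],
              [5.0,  1.0],
              [10.0, 2.0],
              [6.0,  1.5],
              [8.0,  4.0]])

# W(1): one weight per input-to-hidden connection (2 x 3, made up).
W1 = np.array([[0.2, -0.5, 0.1],
               [0.4,  0.3, -0.2]])

Z2 = X @ W1        # 5 x 3: row = sample, column = hidden-node activity
print(Z2.shape)    # (5, 3)
print(Z2)
```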
Importance of Activation Functions

Linear functions are easy to work with, but their usage is very limited: they cannot be used to learn complex data such as images, audio, or video. The question is how to separate objects that are not linearly separable. The main purpose of the activation function is to introduce nonlinearity into the network; if we don't use an activation function, the purpose of the neural network will not be served. Most real-life problems are complex and nonlinear, so we need activation functions for the network to solve them.

Without an activation function, the output is a linear function of the input values. A linear function is a straight-line equation, a polynomial equation of the first degree. A linear equation represents the simplest form of mathematical model and is not representative of real-world scenarios; it cannot map the correlations within complex datasets. Without an activation function, a neural network has very limited capability to learn and model unstructured datasets such as images and videos.

Activation Function Types

Using a nonlinear activation function, we can generate a nonlinear mapping between the input and output variables and model complex real-world scenarios. Three primary activation functions are used at the neurons of a neural network:

1. Sigmoid function: One of the most popular nonlinear functions. The output values are bound between 0 and 1, so the output of each neuron is normalized, and it provides a clear prediction with 0.5 as the threshold between the predictions 0 and 1. The function curve takes an S shape, hence the name sigmoid. For values of x between -2 and +2, the curve is very steep, which makes it an ideal choice as an activation function for binary classification problems.
2. Tanh (hyperbolic tangent): Bound to the range -1 to 1; the function is zero-centered, unlike sigmoid.
3. ReLU (rectified linear unit): Cheap to compute, and it accelerates the convergence of gradient descent compared to the other activation functions. For negative inputs the result is 0 and the neuron does not get activated. It is a much simpler and more efficient activation function than the other two.

Nonlinearities Model

Consider the activity of the hidden layer in the previous example, and apply the sigmoid activation function to the activity of each node in the hidden layer. This gives our second formula in the perceptron model:

Z(2) = X W(1)
a(2) = s(Z(2))

Once we apply the activation function s, the resultant matrix is the same size as Z(2), that is, 5 x 3. The next step is to multiply the activities of the hidden layer by the weights on the output layer. Note that we have three weights, one for each link from the nodes in the hidden layer to the output layer; call these weights W(2). Then the activity of the output layer can be expressed with our third formula:

Z(3) = a(2) W(2)

As we know, a(2) is a 5 x 3 matrix and W(2) is a 3 x 1 matrix, so each row of Z(3), representing an activity value, corresponds to an individual entry in the training dataset. Finally, we apply the sigmoid activation function to Z(3) to get the output value estimate based on the training dataset:

ŷ = s(Z(3))

The application of activation functions at the hidden and output layers ensures nonlinearity in the model. [Figure: worked example network with inputs 3 and 2 propagating to the output value 0.9933071.]

Feed-Forward Neural Networks

The ANN we have referred to so far is called a feed-forward neural network, since the connections between the units and layers do not form a cycle and move only in one direction (from the input layer to the output layer). A feed-forward neural network is an artificial neural network where the nodes never form a cycle; it has an input layer, hidden layers, and an output layer, and it is the first and simplest type of artificial neural network. Usually, the output value predicted by a model that has only been forward-propagated once is not accurate; we need the neural network to optimize its weights for better modeling. This is achieved with a technique called backpropagation, which we will discuss in the next section.
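Putting the pieces together, here is a minimal sketch of one full forward pass through this feed-forward network, ŷ = s(s(X W(1)) W(2)), with the three activation functions named above. All weight values are random placeholders, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes any input into (0, 1)

def tanh(z):
    return np.tanh(z)                # zero-centered, squashes into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # 0 for negative inputs: not activated

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))  # compare the three on sample inputs

# Illustrative data and random placeholder weights.
rng = np.random.default_rng(seed=42)
X  = np.array([[3.0, 5.0], [5.0, 1.0], [10.0, 2.0], [6.0, 1.5], [8.0, 4.0]])
W1 = rng.standard_normal((2, 3))     # input -> hidden
W2 = rng.standard_normal((3, 1))     # hidden -> output

Z2    = X @ W1          # hidden-layer activity, 5 x 3
A2    = sigmoid(Z2)     # a(2) = s(Z(2))
Z3    = A2 @ W2         # output-layer activity, 5 x 1
y_hat = sigmoid(Z3)     # y-hat = s(Z(3))
print(y_hat.ravel())    # one estimate per training sample
```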
Cost Function

The difference between the actual and predicted value for an individual training sample contributes to the overall error of the prediction function. The goodness of fit for a neural network is defined with a cost function, which measures how well the neural network performed with respect to the training dataset when it modeled the training data. The cost function value depends on the weights on each connection and the biases on each of the nodes, and it is a single value representative of the overall neural network. The cost function takes the following form in a neural network:

C(W, Xr, Yr)

where W represents the weights of the neural network, Xr represents the input values of a single training sample, and Yr represents the output corresponding to Xr.

Cost Function Calculation

Our fifth equation for the neural network represents the cost:

C(W, Xr, Yr) = J = Σ ½ (y - ŷ)²

Since the input training data is something we cannot control, the goal of a neural network is to derive the weights and biases so as to minimize the value of the cost function. As we minimize the cost, our model becomes more accurate in predicting values for unknown input data. A naive approach is to initialize the weights to random values, test a large number of arbitrary values, and plot the corresponding cost on a weight-to-cost graph. Referring to our example, we have nine individual weights in the neural network; essentially, there is a combination of these nine weights that gives us the minimum cost. We keep tuning and changing the weights until we minimize the cost of errors, or reach the minimum cost of error.

Computational Complexity

It may be computationally easy and feasible to calculate the minimum cost for a number of input weights selected at random. However, as the number of weights increases (nine in our case) along with the number of input dimensions (just two in our example), it becomes computationally impossible to get to the minimum cost in a reasonable amount of time. In real-world scenarios, we are going to have hundreds or thousands of dimensions and highly complex neural networks with a large number of hidden layers, and hence a large number of independent weight values.

Gradient Descent Algorithm

For high-dimensional data, we can use the simple and widely used gradient descent algorithm to significantly reduce the computational requirement of training the neural network. To understand gradient descent, let's combine our five equations into a single equation:

J = Σ ½ (y - s(s(X W(1)) W(2)))²

In this case, we are interested in the rate of change of J with respect to W, which can be represented as the partial derivative ∂J/∂W. If the derivative evaluates to a positive value, we are going up the hill and not in the direction of minimum cost; if it evaluates to a negative value, we are descending in the right direction.

Gradient Descent Pseudocode

1. Let W be some initial value that can be chosen randomly.
2. Compute the gradient ∂J/∂W.
3. If ∂J/∂W < t, where t is some predefined threshold value, EXIT: we have found the weight vector that gives the minimum error for the predicted output.
4. Otherwise, update W: W = W - L (∂J/∂W), and repeat from step 2. L is called the learning rate and needs to be chosen carefully: if it is too large, the gradient will overshoot and we will miss the minimum; if it is too small, it will take too many iterations to converge. Usually L is in the range between 0.0 and 1.0. The learning rate controls how quickly the model is adapted to the problem.
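The pseudocode above translates almost line for line into code. The sketch below assumes a toy single-layer sigmoid model so it stays short, and it estimates ∂J/∂W numerically; a real network would compute the gradient analytically via backpropagation. All data values are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, X, y):
    # J = sum of 0.5 * (y - y_hat)^2 over the training samples.
    y_hat = sigmoid(X @ W)
    return 0.5 * np.sum((y - y_hat) ** 2)

def gradient(W, X, y, eps=1e-6):
    # Numerical estimate of dJ/dW (central differences), for brevity.
    g = np.zeros_like(W)
    for i in range(W.size):
        d = np.zeros_like(W)
        d.flat[i] = eps
        g.flat[i] = (cost(W + d, X, y) - cost(W - d, X, y)) / (2 * eps)
    return g

X = np.array([[3.0, 5.0], [5.0, 1.0], [10.0, 2.0]])  # toy inputs
y = np.array([0.9, 0.7, 0.8])                        # toy targets
W = np.random.default_rng(0).standard_normal(2)      # random initial W

L, t = 0.1, 1e-4                # learning rate L and threshold t
for _ in range(10_000):
    g = gradient(W, X, y)
    if np.linalg.norm(g) < t:   # exit: (near-)minimum-cost weights found
        break
    W = W - L * g               # update step: W = W - L * (dJ/dW)
print(W, cost(W, X, y))
```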
Recurrent Neural Networks (RNN)

A recurrent neural network (RNN) is a special type of artificial neural network adapted to work with time series data, or data that involves sequences. RNNs have a concept of "memory" that helps them store the states or information of previous inputs to generate the next output of the sequence. A special class of real-life problems involves optimizing the ANN for training on sequences of data, for example, text, speech, or any other form of audio input. In simple terms, when the output of one forward propagation is fed as input to the next iteration of training, the network topology is called a recurrent neural network.

Why Recurrent Neural Networks?

RNNs were created because the feed-forward neural network has a few limitations: it cannot handle sequential data, it considers only the current input, and it cannot memorize previous inputs. The solution to these issues is the RNN. An RNN can handle sequential data, accepting both the current input and previously received inputs, and it can memorize previous inputs thanks to its internal memory.

RNN Architecture

An RNN saves the output of a particular layer and feeds it back to the input in order to predict the output of the layer. A minimal sketch of this recurrence appears after the list of RNN types below.

Types of RNNs

1. One to one: There is a single (xt, yt) pair. Traditional neural networks employ a one-to-one architecture: single input, single output.
2. One to many: A single input at xt can produce multiple outputs, e.g., (yt0, yt1, yt2). Audio is sequential data too: it can be considered a signal modulated over time, similar to time series data, where data points are collected in a sequence of time values. Music generation is one area where one-to-many networks are employed; image captioning is another, where an RNN captions an image by analyzing the activities present in it.
3. Many to one: Many inputs from different time steps produce a single output. For example, (xt, xt+1, xt+2) can produce a single output yt. Such networks are employed in sentiment analysis or emotion detection, where the class label depends on a sequence of words.
4. Many to many: For example, two inputs can produce three outputs. Many-to-many networks are applied in machine translation, e.g., English-to-French translation systems or vice versa.

See also: https://www.youtube.com/watch?v=AsNTP8Kwu80
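Here is a minimal sketch of the recurrence that gives an RNN its memory: at each time step, the hidden state h mixes the current input with the state carried over from the previous step. The dimensions and weights are placeholder assumptions, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Placeholder weights for a tiny RNN cell (input dim 4, state dim 8).
Wxh = rng.standard_normal((4, 8)) * 0.1   # current input  -> hidden state
Whh = rng.standard_normal((8, 8)) * 0.1   # previous state -> hidden state
Why = rng.standard_normal((8, 1)) * 0.1   # hidden state   -> output

h = np.zeros(8)                           # initial state: no memory yet
sequence = rng.standard_normal((5, 4))    # 5 time steps of 4-dim inputs

for x_t in sequence:
    # The new state depends on the current input AND the previous state;
    # this feedback loop is the network's "memory".
    h = np.tanh(x_t @ Wxh + h @ Whh)
    y_t = h @ Why                         # output at this time step
    print(y_t)
```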
Frequently Asked Questions

Q: Are ANNs exactly the same as biological neurons in terms of information storage and processing?

A: Although it cannot be stated with 100% certainty that ANNs are an exact replica in terms of memory and processing logic, there is evidence in medical science that the basic building block of the brain is the neuron, and that neurons are interconnected. When an external stimulus is received, or when one is generated by involuntary processes, the neurons react by communicating with each other through the transmission of neural signals. Although the functioning of the brain is very complex and far from fully understood, the theory of ANNs has been evolving, and we are seeing a great deal of success in modeling some very complex problems that were not possible with traditional programming models. In order to make modern machines that possess the cognitive abilities of the human brain, there needs to be more research and a much better understanding of biological neural networks.

Q: What are the basic building blocks of an ANN?

A: An ANN consists of various layers. The layer that receives input from the environment (the independent variables) is the input layer. The final layer, which emits the output of the model based on generalization of the training data, is called the output layer. In between the input and output layers there can be one or many layers that process the signals; these are called hidden layers. The nodes within the layers are connected by synapses, or connectors. Each connector carries a weight that is optimized so as to reduce the value of the cost function, which represents the accuracy of the neural network.

Q: What is the need for nonlinearity within an ANN?

A: Neural networks are mathematical models in which the inputs are multiplied by the weights, and the sum of all the node-connection products constitutes the value on a node. However, if we do not include nonlinearity via an activation function, multi-layer neural networks effectively cease to exist: the model collapses to a single layer, and we can only model very simple problems with linear modeling. To model more complex, real-world problems, we need multiple layers and hence nonlinearity within the activation functions.

Q: Which activation functions are most commonly used in building ANNs?

A: Commonly used activation functions within ANNs are:

- Sigmoid function: The output value is between 0 and 1. The function takes the geometric shape of an S, hence the name sigmoid.
- Tanh function: The hyperbolic tangent function is a slight variation of the sigmoid function that is zero-centered.
- Rectified linear unit (ReLU): The simplest, most computationally optimized, and hence most popular activation function for ANNs. The output value is 0 for all negative inputs and the same as the input for positive inputs.

Q: What is a feed-forward ANN and how are the initial values of the weights selected?

A: A single pass through the network from the input layer to the output layer via the hidden layers is called a forward pass. During this pass, the nodes are activated as sum-products of the node values and the connection weights. The initial values of the weights are selected randomly, and as a result, the first pass output may deviate from the expected output based on the training data. This delta is called the network cost and is represented with a cost function. The intuition and goal of the ANN is to ultimately reduce the cost to a minimum; this is achieved with multiple forward and backward passes through the network. One round trip is called an epoch.

Q: What is the meaning of model overfitting?

A: Model overfitting occurs when the model memorizes the training input and cannot generalize to new input data. Once this happens, the model is virtually unusable for real-world problems. Overfitting can be identified by the variation in model accuracy between runs on the training and validation datasets.

Q: What are RNNs and where are they used?

A: RNNs are recurrent neural networks, which utilize the output of one forward pass through the network as an input for the next iteration. RNNs are used when the inputs are not independent of each other.
As an example, a language translation model needs to predict the next possible word based on the previous sequence of words. RNNs therefore have great significance in the field of natural language processing and in audio/video processing systems.
