Module 1 – Introduction to Soft Computing and Neural Network

Content 1.2: Biological neurons and their working, ANN terminologies, basic models (biological neuron, artificial neuron, comparison, ANN, history, NN models (architecture, learning algorithm, activation function)), basic terminologies.

Neural Network
The biological nervous system is the most important part of many living things, in particular human beings. At the centre of the human nervous system is the brain. In fact, any biological nervous system consists of a large number of interconnected processing units called neurons. Each neuron is approximately 10 µm long, and neurons can operate in parallel. Typically, a human brain consists of approximately 10^11 neurons communicating with each other with the help of electrical impulses.

Neuron: basic unit of the nervous system
(figure: different parts of a biological neuron)

Neuron and its working
Dendrite (input): receives signals from other neurons.
Soma (cell body, the processing unit): sums up all signals; it also contains a threshold unit.
Synapse (weighted connection): the point of interconnection of one neuron with other neurons. The amount of signal transmitted depends upon the strength (synaptic weight) of the connection. A connection can be inhibitory (decreasing strength) or excitatory (increasing strength).
Axon (output): when the sum reaches the threshold value, the neuron fires and generates an output signal.

Biological Neural Network
A biological neural network has three main parts:
○ Soma or cell body, where the cell nucleus is located.
○ Dendrites, where the nerve is connected to the cell body.
○ Axon, which carries the impulses of the neuron.
The electric impulse is passed between synapse and dendrites. The axon splits into strands, and each strand terminates in a small bulb-like organ called a synapse. The synapse is a chemical process which results in an increase or decrease of the electric potential inside the body of the receiving cell. If the electric potential reaches a threshold value, the receiving cell fires, and a pulse (action potential) of fixed strength and duration is sent through the axon to the synaptic junctions of other cells. After firing, the cell has to wait for a period called the refractory period.

Artificial Neural Network
An ANN is an information processing system that has certain performance characteristics in common with biological neural networks. Several key features of the processing elements of an ANN are suggested by the properties of biological neurons:
○ The processing element receives many signals.
○ Signals may be modified by a weight at the receiving synapse.
○ The processing element sums the weighted inputs.
○ Under appropriate circumstances (sufficient input), the neuron transmits a single output.
○ The output from a particular neuron may go to many other neurons.

Artificial Neurons
(figures: a physical neuron and an artificial neuron)
○ An artificial neuron learns from experience: examples / training data.
○ The strength of a connection between neurons is stored as a weight value for the specific connection.
○ Learning the solution to a problem = changing the connection weights.
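A minimal Python sketch of the kind of processing element just described: it receives many signals, weights them, sums them, and transmits a single output when the input is sufficient. The class name ThresholdNeuron, the weights, and the threshold value are illustrative assumptions, not taken from the slides.

```python
# A tiny processing element: receives many signals, weights them at the
# "synapses", sums them in the "soma", and fires a single output when the
# summed input is sufficient (exceeds the threshold).
class ThresholdNeuron:
    def __init__(self, weights, threshold):
        self.weights = weights        # one weight per incoming connection
        self.threshold = threshold    # firing threshold of the soma

    def fire(self, inputs):
        total = sum(w * x for w, x in zip(self.weights, inputs))  # weighted sum
        return 1 if total > self.threshold else 0                 # single output

# Example: a two-input neuron that behaves like a logical AND gate.
neuron = ThresholdNeuron(weights=[1.0, 1.0], threshold=1.5)
print(neuron.fire([1, 1]))  # 1 -> fires
print(neuron.fire([1, 0]))  # 0 -> does not fire
```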
Artificial Neurons
(figures: the four basic components of a human biological neuron and the components of a basic artificial neuron)

Terminology: relation between biological and artificial neuron
Biological neuron → artificial neuron
Cell → neuron
Dendrites → weights or interconnections
Soma → net input
Axon → output

Model of a Neuron
(figure: model of a neuron – input units X1, X2, X3 (dendrites), connection weights Wa, Wb, Wc (synapses), a summing/computation function f() (soma), and the output Y (axon))

ANN
A neural net consists of a large number of simple processing elements called neurons, units, cells or nodes. Each neuron is connected to other neurons by means of directed communication links, each with an associated weight. The weights represent information being used by the net to solve a problem. Each neuron has an internal state, called its activation or activity level, which is a function of the inputs it has received. Typically, a neuron sends its activation as a signal to several other neurons.

Brain vs. Computer – comparison between the biological and the artificial neuron
Speed: brain – execution time is a few milliseconds; computer – execution time is a few nanoseconds.
Processing: brain – performs massive parallel operations simultaneously; computer – performs several parallel operations simultaneously and is faster than the biological neuron.
Size and complexity: brain – the number of neurons is 10^11 and the number of interconnections is 10^15, so the complexity of the brain is higher than that of the computer; computer – depends on the chosen application and the network designer.
Storage capacity: brain – i) information is stored in the interconnections or in synapse strengths, ii) new information is stored without destroying the old, iii) it sometimes fails to recollect information; computer – i) information is stored in continuous memory locations, ii) overloading may destroy older locations, iii) information can be easily retrieved.
Tolerance: brain – i) fault tolerant, ii) stores and retrieves information even if interconnections fail, iii) accepts redundancies; computer – i) no fault tolerance, ii) information is corrupted if the network connections are disconnected, iii) no redundancies.
Control mechanism: brain – depends on active chemicals, and neuron connections are strong or weak; computer – the CPU; the control mechanism is very simple.

Evolution of neural networks
1943 – McCulloch-Pitts neuron (McCulloch and Pitts): the arrangement of neurons is a combination of logic gates; its unique feature is the threshold.
1949 – Hebb network (Hebb): if two neurons are active, then their connection strength should be increased.
1958, 1959, 1960, 1962, 1988 – Perceptron and Adaline (Frank Rosenblatt, Block, Minsky and Papert; Widrow and Hoff): weights are adjusted to reduce the difference between the net input to the output unit and the desired output.
1972 – Kohonen self-organizing feature map (Kohonen): inputs are clustered to obtain a fired output neuron.
1982, 1984, 1985, 1986, 1987 – Hopfield network (John Hopfield and Tank): based on fixed weights; can act as associative memory nets.
1986 – Back propagation network (Rumelhart, Hinton and Williams): i) multilayered network, ii) the error is propagated backward from the output to the hidden units.
1988 – Counter propagation network (Grossberg): similar to the Kohonen network.
1987-1990 – Adaptive Resonance Theory (ART) (Carpenter and Grossberg): designed for binary and analog inputs.
1988 – Radial basis function network (Broomhead and Lowe): resembles the back propagation network, but the activation function used is a Gaussian function.
1988 – Neocognitron (Fukushima): for character recognition.
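The 1949 Hebb network entry above states that if two neurons are active, their connection strength should be increased. A minimal Python sketch of that weight update under common textbook assumptions (bipolar inputs and targets, a hypothetical learning rate eta); this is an illustration, not the slides' own derivation.

```python
# Hebbian weight update: strengthen w_i when both the input x_i and the
# output y are active (eta is a hypothetical learning rate).
def hebb_update(weights, x, y, eta=1.0):
    return [w + eta * xi * y for w, xi in zip(weights, x)]

# Example: one pass over the AND function with bipolar inputs and targets.
weights = [0.0, 0.0]
bias = 0.0
samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
for x, target in samples:
    weights = hebb_update(weights, x, target)
    bias += target               # bias updated with the fixed input x_0 = 1
print(weights, bias)             # [2.0, 2.0] -2.0
```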
Basic models of ANN
Models are based on three entities:
○ The model's synaptic interconnections.
○ The training or learning rules adopted for updating and adjusting the connection weights.
○ Their activation functions.
The arrangement of neurons to form layers and the connection pattern formed within and between layers is called the network architecture.

Five types of ANN
1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network

Single-layer feed-forward network
A layer is formed by taking processing elements and combining them with other processing elements. Input and output are linked with each other. Inputs are connected to the processing nodes with various weights, resulting in a series of outputs, one per node.

Multilayer feed-forward network
Formed by the interconnection of several layers. The input layer receives the input and buffers the input signal. The output layer generates the output. A layer between the input and output layers is called a hidden layer; it is internal to the network. A network may have zero to several hidden layers. The more hidden layers there are, the greater the complexity of the network, but more efficient output is produced.

Feedback network
If no neuron in the output layer is an input to a node in the same layer or a preceding layer, the network is a feed-forward network. If outputs are directed back as inputs to processing elements in the same layer or a preceding layer, it is a feedback network. If the outputs are directed back to the inputs of the same layer, this is lateral feedback. Recurrent networks are feedback networks with a closed loop. Fig 2.8 (A) shows a simple recurrent neural network having a single neuron with feedback to itself. Fig 2.9 shows a single-layer network with feedback, in which the output can be directed to the processing element itself, to other processing elements, or to both.

Single-layer recurrent network
The output of a processing element can be directed to the processing element itself or to other processing elements in the same layer.

Multilayer recurrent network
The output of a processing element can be directed back to nodes in a preceding layer, forming a multilayer recurrent network.

Learning
Two broad kinds of learning in ANNs:
i) Parameter learning – updates the connecting weights in a neural net.
ii) Structure learning – focuses on changes in the network structure.
Apart from these, learning in an ANN is classified into three categories:
i) supervised learning, ii) unsupervised learning, iii) reinforcement learning.

Supervised learning
In an ANN, each input vector requires a corresponding target vector, which represents the desired output. The input vector together with the target vector is called a training pair. The input vector produces an output vector. The actual output vector is compared with the desired output vector; if there is a difference, an error signal is generated by the network. This error signal is used for adjusting the weights until the actual output matches the desired output.
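A minimal Python sketch of the supervised-learning loop just described: the actual output is compared with the desired output and the resulting error signal adjusts the weights. The perceptron-style step output and the learning rate alpha are illustrative assumptions, not prescribed by the slides.

```python
# Supervised learning sketch: generate an error signal from (desired - actual)
# and adjust the weights until the actual output matches the desired output.
def step(net):
    return 1 if net > 0 else 0

def train(pairs, n_inputs, alpha=0.1, epochs=20):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for x, desired in pairs:                 # each (input, target) is a training pair
            net = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = desired - step(net)          # error signal
            weights = [w + alpha * error * xi for w, xi in zip(weights, x)]
            bias += alpha * error                # weights adjusted by the error
    return weights, bias

# Example: learning the logical OR function from labelled training pairs.
pairs = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train(pairs, n_inputs=2))
```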
Unsupervised learning
Learning is performed without the help of a teacher. Example: a tadpole learns to swim by itself. In an ANN, during the training process, the network receives input patterns and organizes them to form clusters. From the figure it is observed that no feedback is applied from the environment to inform the network what the outputs should be or whether they are correct. The network itself discovers patterns, regularities, features or categories from the input data, and the relations of the input data to the output. Exact clusters are formed by discovering similarities and dissimilarities, which is why this is called self-organizing.

Reinforcement learning
Reinforcement learning is similar to supervised learning. For example, the network might be told that its actual output is only "50% correct". Thus only critic information is available here, not exact information. Learning based on critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal. The network receives some feedback from the environment, but the feedback is only evaluative. The external reinforcement signals are processed in the critic signal generator, and the obtained critic signals are sent to the ANN for proper adjustment of the weights so as to get better critic feedback in the future.

Simple Model of an Artificial Neuron
(figure: inputs x1 … xn with weights w1 … wn feed a summation unit Σ that forms the sum of the weighted inputs, followed by a thresholding unit f that produces the output)

Activation functions
To make the work more efficient and to obtain an exact output, some force or activation is applied. The activation function is applied over the net input to calculate the output of an ANN. The information processing of a processing element has two major parts: input and output. An integration function (f) is associated with the input of the processing element.

Simple Model of an Artificial Neuron
Let $I$ be the total input received by the soma of the artificial neuron:
$$I = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i$$
To generate the output $y$, the sum $I$ is passed on to a non-linear filter $f$ called the activation function (also transfer function or squash function):
$$y = f(I)$$

Linear Activation Functions

Activation functions: identity function
1. Identity function (linear activation function):
○ It is a linear function defined as $f(x) = x$ for all $x$.
○ The output is the same as the input.

Linear activation function (identity function)
Range: $-\infty$ to $+\infty$.
Uses: used only in the output layer.
Limitations:
○ The derivative of the function is constant and has no relation to the input, so it is not suitable for backpropagation.
○ All layers in the network will collapse into one layer: no matter the number of layers, the output layer will be a linear function of the first layer.
○ It is not able to learn complex patterns from the data.

Activation functions: Heaviside function / step function
A very commonly used activation function is the thresholding function. The sum is compared with a threshold value $\theta$: if $I > \theta$, the output is 1, otherwise it is 0.
$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$
where $f$ is the step function, known as the Heaviside function, defined as
$$f(I) = \begin{cases} 1, & I > 0 \\ 0, & I \le 0 \end{cases}$$

Binary step function (unipolar binary)
Range: 0 or 1.
Uses: used for binary classification (yes or no) problems.
Limitations:
○ Cannot be used for multi-class classification problems.
○ The gradient of this function is zero, which causes problems in the backpropagation process.
○ It is not able to learn complex patterns from the data.

Activation functions: signum function (bipolar step function)
Also known as the quantizer function:
$$f(I) = \begin{cases} +1, & I > 0 \\ -1, & I \le 0 \end{cases}$$

Bipolar binary function
Range: -1 or 1.
Uses: used for binary classification (yes or no) problems.
Limitations:
○ Cannot be used for multi-class classification problems.
○ The gradient of this function is zero, which causes problems in the backpropagation process.
○ It is not able to learn complex patterns from the data.
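A minimal Python sketch of the net input $I = \sum_i w_i x_i$ together with the identity, binary step, and bipolar step activations defined above; the function names and sample values are illustrative assumptions.

```python
# Net input I = sum_i w_i * x_i followed by the linear / step activations above.
def net_input(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def identity(I):
    return I                           # f(I) = I, range (-inf, +inf)

def binary_step(I, theta=0.0):
    return 1 if I > theta else 0       # Heaviside / unipolar binary step

def bipolar_step(I, theta=0.0):
    return 1 if I > theta else -1      # signum / quantizer function

I = net_input([0.5, -0.2, 0.1], [1.0, 2.0, 3.0])   # I is approximately 0.4
print(identity(I), binary_step(I), bipolar_step(I))
```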
Non-Linear Activation Functions

Activation functions: sigmoidal function
Varies gradually between the asymptotic values 0 and 1 (the logistic function):
$$f(I) = \frac{1}{1 + e^{-\alpha I}}$$
where $\alpha$ is the slope parameter. The function is differentiable. It is prone to the vanishing gradient problem: when the gradient reaches 0, the network does not learn.

Activation functions: sigmoidal function (contd.)
Range: 0 to 1. The function takes a real value as input and outputs values in the range 0 to 1. The larger the input, the closer the output is to 1; the smaller the input, the closer the output is to 0.
Uses: used to predict the probability of an output (a probability lies between 0 and 1). The function is differentiable.
Limitation:
○ Suffers from the vanishing gradient problem: when inputs are in the saturated regions of the function (very high or very low values), the derivatives are close to zero. During backpropagation, gradients are calculated as the product of these derivatives; if many derivatives are close to zero, the gradient diminishes exponentially as it is propagated backward through each layer, and the network does not learn.

Activation functions: hyperbolic tangent function
Also known as the tanh function:
$$f(I) = \tanh(I)$$
A scaled version of the sigmoid function. It leads to the vanishing gradient problem in very deep neural networks.

Activation functions: bipolar sigmoidal function / tanh
Range: -1 to 1. The function takes a real value as input and outputs values in the range -1 to 1. The output of the tanh function is zero-centered.
Uses: usually used in hidden layers. The function is differentiable.
Limitation:
○ Suffers from the vanishing gradient problem like the sigmoid, but tanh is zero-centered and the gradients are not restricted to move in a certain direction; therefore the tanh non-linearity is preferred to the sigmoid non-linearity.

Other popular activation functions: ReLU (rectified linear unit) and Softmax
Softmax function: Softmax is a type of sigmoid function, used in handling classification problems and ideally used in the output layer of a classifier:
$$I_n = \frac{e^{z_n}}{\sum_{k=1}^{m} e^{z_k}}$$
ReLU: the most widely used activation function. It does not activate all neurons at the same time; if the input is negative, the neuron will not get activated. It overcomes the vanishing gradient problem and is suited for hidden layers.

Activation functions: ReLU
Range: 0 to infinity. The ReLU function does not activate all neurons at the same time.
Uses:
○ Computationally less expensive than tanh and sigmoid.
○ Learns much faster than sigmoid and tanh.
○ Mostly implemented in the hidden layers of a NN.
The function is differentiable.
Limitation:
○ The dying ReLU problem: all negative values are converted to zero, and this conversion is so aggressive that the function can neither map nor fit the data properly, which creates a problem.

Activation function: Softmax
Uses: usually used when handling multiple classes. It is often used as the activation function of a NN to normalize the output of the network to a probability distribution over the predicted output classes.

Ramp function
(figure: ramp function)
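A minimal NumPy sketch of the non-linear activations discussed above (sigmoid, tanh, ReLU, and softmax). The function names, the default slope parameter, and the max-shift used in softmax for numerical stability are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def sigmoid(I, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * I))   # logistic, output in (0, 1)

def tanh_act(I):
    return np.tanh(I)                          # zero-centered, output in (-1, 1)

def relu(I):
    return np.maximum(0.0, I)                  # 0 for negative input, I otherwise

def softmax(z):
    e = np.exp(z - np.max(z))                  # subtract max for numerical stability
    return e / e.sum()                         # normalize scores to probabilities

I = np.array([-2.0, 0.0, 3.0])
print(sigmoid(I), tanh_act(I), relu(I))
print(softmax(I), softmax(I).sum())            # the probabilities sum to 1.0
```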
Properties of a good activation function
Zero-centered: if the output of the activation function is zero-centered, the gradients do not shift in one direction.
Computational expense: the activation function should be computationally inexpensive, as it is applied after every layer and is calculated millions of times in a deep network.
Differentiable: neural networks are trained using the gradient descent process, hence the layers in the network should be differentiable or nearly differentiable; hence this requirement for the activation function.
Range: the range of the values generated by the activation function is an important factor for its application.
Robust to the vanishing gradient problem: NNs are trained using the gradient descent process. Gradient descent involves backpropagation steps where the weights are updated to reduce the loss in each epoch. The activation function should withstand the vanishing gradient problem.
Non-linearity: non-linear activation functions are preferred over linear activation functions to solve complex problems.

Common models of neurons
Binary perceptrons
Continuous perceptrons

Weights
Each neuron is connected to every other neuron by means of directed links. The links are associated with weights. The weights contain information about the input signal and are represented as a matrix. The weight matrix is also called the connection matrix.

Weight matrix
$$W = \begin{bmatrix} w_1^T \\ w_2^T \\ w_3^T \\ \vdots \\ w_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} & \dots & w_{1m} \\ w_{21} & w_{22} & w_{23} & \dots & w_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \dots & w_{nm} \end{bmatrix}$$

Weights (contd.)
$w_{ij}$ is the weight from processing element $i$ (source node) to processing element $j$ (destination node). With $x_0 = 1$ and $b_j = w_{0j}$, the net input to unit $Y_j$ is
$$y_{in_j} = \sum_{i=0}^{n} x_i w_{ij} = x_0 w_{0j} + x_1 w_{1j} + x_2 w_{2j} + \dots + x_n w_{nj} = b_j + \sum_{i=1}^{n} x_i w_{ij}$$

Bias
The bias has an impact in calculating the net input. The bias is included by adding a component $x_0 = 1$ to the input vector X. The net output is then calculated from the net input, which includes the bias term as in the formula above.
(figure: a neuron Y with inputs X1, X2 weighted by W1, W2 and a bias b on the fixed input 1)
The bias is of two types:
○ Positive bias – increases the net input.
○ Negative bias – decreases the net input.

Threshold
○ It is a set value based upon which the final output is calculated.
○ The calculated net input and the threshold are compared to get the network output.
○ The activation function of the threshold is defined as a step function with respect to θ, the fixed threshold value.

Learning rate
Denoted by α. It is used to control the amount of weight adjustment at each step of training. The learning rate, ranging from 0 to 1, determines the rate of learning at each time step.

Other terminologies
Momentum factor:
○ Used for faster convergence; the momentum factor is added to the weight updating process.
Vigilance parameter:
○ Denoted by ρ.
○ Used to control the degree of similarity required for patterns to be assigned to the same cluster.
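A minimal NumPy sketch of the net input calculation $y_{in_j} = b_j + \sum_{i=1}^{n} x_i w_{ij}$ using the weight matrix notation above; the array shapes and sample values are illustrative assumptions.

```python
import numpy as np

# Weight matrix W: W[i, j] is the weight from source unit i to destination unit j.
W = np.array([[0.2, -0.5],
              [0.4,  0.1],
              [-0.3, 0.7]])     # n = 3 input units, m = 2 output units
b = np.array([0.1, -0.2])       # bias b_j for each destination unit
x = np.array([1.0, 0.5, 2.0])   # input vector

# y_in_j = b_j + sum_i x_i * w_ij, computed for every output unit j at once.
y_in = b + x @ W
print(y_in)                     # net input received by each of the two output units
```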