Neural Networks and Deep Learning Lecture 02 - PDF

Summary

This lecture introduces fundamental concepts of neural networks and deep learning. It covers the biological motivation, the artificial neuron model and activation functions, the McCulloch-Pitts model, common network architectures (MLPs, RNNs, CNNs, LSTMs, Transformers, GANs, autoencoders, GNNs, capsule networks), how to build prior information and invariances into network design, and the supervised learning process (batch, stochastic, and mini-batch training).

Full Transcript

MLAI504 – NEURAL NETWORKS AND DEEP LEARNING (2024/2025)
Dr. Zein Al Abidin IBRAHIM – [email protected]

NEURAL NETWORKS: A QUICK OVERVIEW

WHAT IS A NEURAL NETWORK?
▪ Idea → inspired by the human brain.
▪ A brain is composed of a very large number of processing units → called neurons.
▪ Their number is approximately 10^11 (on the order of 100 billion) neurons.
▪ Neurons work in parallel → the computational power of the brain is very high.
▪ In addition → neurons are highly connected; the connections are called synapses.
▪ A neuron has connections to around 10^4 other neurons.
▪ The brain takes approximately 100-200 ms to perform perceptual recognition.

WHAT IS AN ANN?
▪ A neural network → a massive number of parallel, distributed processing units.
▪ It can store experiential knowledge (experience).
▪ It is similar to the brain in two main ways:
1. Knowledge → acquired from its environment through a learning phase.
2. Interneuron connection weights are used to store the acquired knowledge.

LOOKING INSIDE THE HUMAN BRAIN

HOW TO MODEL A NEURON
▪ One representation of the bias is to consider it as an additional input with fixed value x_0 = +1, so the summation starts from zero:
▪ v_k = Σ_{j=0}^{m} w_kj · x_j
▪ y_k = φ(v_k)
▪ with x_0 = +1 and w_k0 = b_k.
▪ The connection weights hold the knowledge. (A small numerical sketch of this neuron follows this part of the transcript.)

SOME ACTIVATION FUNCTIONS

MCCULLOCH-PITTS MODEL
▪ The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
▪ Inputs → Boolean; output → Boolean.
▪ Activation → thresholding.
▪ What we can build with it → OR, AND, and other threshold comparisons.
▪ No learning from data; it is just a theoretical model.

ONE-INPUT NEURON: EXAMPLE
▪ A 1-D pattern p, with associated weight w and bias value b.
▪ The net input is n; the value n passes through the activation function f to generate the output a.
▪ Ex: w = 3, p = 2, b = -1.5 → a = f(3(2) - 1.5) = f(4.5).

MULTIPLE-INPUT NEURON: EXAMPLE

THE NEURON IN THE SPACE
▪ The output of the neuron can be written y = w^T x, with w = [w0, w1, …, wd] and x = [1, x1, …, xd]^T.
▪ The weights need to be learned from training data so that the patterns of the training data are correctly classified.
▪ Ex: when d = 1 → one-dimensional patterns; y = w1 x + w0 is the equation of a line.
▪ The line separates the space into two zones: a positive side and a negative side.
▪ Find the weights in such a way that any new input x is assigned to one of the two classes depending on which side it falls. (See the sign-based classification sketch below.)

HOW TO REPRESENT A NEURAL NETWORK: DIRECTED GRAPHS
▪ (Margin note: each layer contains a group of neurons; neurons within the same layer are not connected to each other.)
▪ The directed graph can be complete (known as fully connected or dense) or incomplete.
▪ Fully connected → each neuron in layer n is connected to each neuron in layer n+1.
▪ Incomplete (partially connected) → a neuron in layer n is connected to only some neurons in layer n+1 (used in CNNs, RNNs, …).
▪ Such a graph is referred to as an architectural graph.
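The neuron model and the McCulloch-Pitts unit above can be made concrete with a short sketch. This is a minimal illustration, not the lecture's code: the bias is folded in as the extra input x_0 = +1 with weight w_k0 = b_k as on the slide, the one-input example (w = 3, p = 2, b = -1.5) is reproduced as a check, and the AND/OR weights and thresholds are hand-picked assumptions.

```python
import numpy as np

def neuron(x, w, b, phi):
    """v = sum_j w_j * x_j with x0 = +1 and w0 = b, then a = phi(v)."""
    x_aug = np.concatenate(([1.0], x))   # prepend x0 = +1
    w_aug = np.concatenate(([b], w))     # prepend w0 = b
    v = w_aug @ x_aug                    # induced local field v_k
    return phi(v)

# Some activation functions
step    = lambda v: 1.0 if v >= 0 else 0.0       # thresholding (McCulloch-Pitts style)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))     # logistic
relu    = lambda v: max(0.0, v)

# One-input example from the slide: w = 3, p = 2, b = -1.5  ->  a = f(3*2 - 1.5) = f(4.5)
print(neuron(np.array([2.0]), np.array([3.0]), -1.5, sigmoid))

# McCulloch-Pitts style Boolean gates via thresholding (illustrative weights/thresholds)
AND = lambda x1, x2: step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only if both inputs are 1
OR  = lambda x1, x2: step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires if at least one input is 1
print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
print([OR(a, b)  for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```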
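The "neuron in the space" slide says that y = w^T x defines a line separating the input space into a positive and a negative side. The toy snippet below, with hand-picked (not learned) weights for a 2-D input, assigns a point to a class by the sign of w^T x; it is only a sketch of the idea.

```python
import numpy as np

# Illustrative weights for a 2-D input: y = w0 + w1*x1 + w2*x2 (w0 plays the role of the bias).
w = np.array([-1.0, 2.0, 1.0])           # chosen by hand for the sketch, not learned from data

def side(x):
    """Return +1 or -1 depending on which side of the line w^T [1, x] = 0 the point lies."""
    x_aug = np.concatenate(([1.0], x))   # prepend x0 = 1 so the bias is part of w
    return 1 if w @ x_aug >= 0 else -1

for point in [np.array([1.0, 0.0]), np.array([0.0, 0.0]), np.array([0.2, 0.1])]:
    print(point, "->", side(point))
```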
SOME NETWORK ARCHITECTURES (1/2)
▪ Here are some network architectures:
▪ Feedforward Neural Networks (Multilayer Perceptron, MLP)
▪ Used in simple classification and regression tasks where the input data is structured.
▪ RNN → Recurrent Neural Networks
▪ Used in NLP for time-series prediction, speech recognition, and any task where order and context over sequences are important.
▪ CNN → Convolutional Neural Networks
▪ Used in image and video recognition, object detection, segmentation, and any task involving visual data.
▪ LSTM → Long Short-Term Memory Networks
▪ Similar to RNNs but more efficient for longer sequences and temporal dependencies. Used extensively in NLP tasks, machine translation, and speech processing.
▪ Transformer Networks
▪ Transformer networks use attention mechanisms to process input data, enabling them to capture global dependencies in sequences of data.
▪ Used in NLP tasks such as translation, summarization, and language modeling (e.g., BERT, GPT). Also used in image processing (Vision Transformers).

SOME NETWORK ARCHITECTURES (2/2)
▪ Here are some more network architectures:
▪ GAN → Generative Adversarial Networks
▪ Consist of two neural networks, a generator and a discriminator, that compete against each other in a zero-sum game.
▪ The generator creates fake data, and the discriminator tries to distinguish between real and fake data.
▪ Used for image generation, style transfer, data augmentation, and synthetic data generation for various applications.
▪ Autoencoders
▪ Consist of an encoder and a decoder. The encoder compresses the input into a latent-space representation, and the decoder reconstructs the input from this representation.
▪ Used for dimensionality reduction, image denoising, anomaly detection, and generating data (e.g., variational autoencoders).
▪ Graph Neural Networks (GNN)
▪ Designed to work with graph-structured data. They use nodes, edges, and adjacency matrices to process data.
▪ Used for social network analysis, molecular biology, recommendation systems, and any application involving network or graph-structured data.
▪ Capsule Networks (CapsNets)
▪ Attempt to model hierarchical relationships in data more effectively than traditional CNNs.
▪ Used for image classification, object segmentation, and tasks requiring spatial hierarchies in the data.

FEEDFORWARD NETWORK

SINGLE-LAYER AND MULTI-LAYER
▪ (Margin note: we need to vectorize the computation, especially for the network.)
▪ (Figure: a multi-layer network with Weight matrix 1, Weight matrix 2, and Weight matrix 3 between its layers.)

EXAMPLE – USE CASE
▪ Consider a handwritten-digit recognition problem.
▪ Input → the pixels of the image, each pixel as a separate feature.
▪ Output → one of the 10 digits (10 neurons).
▪ Training set → a large variety of handwritten digits that are representative of a real-world situation.
▪ Design of the network:
▪ Input dimension = number of pixels in the image.
▪ Output = 10 neurons, each representing one of the 10 digits (0-9); one output should be 1 and the rest zeros.
▪ (Margin notes: for a binary classification problem a single output can suffice, e.g., spam / not spam, or car → 0, bicycle → 1; the hidden layers are the important part being trained. Example layout: 784 × 100 × 100 × 100 × 100 × 10. A vectorized forward pass for this layout is sketched after this part of the transcript.)

BUILDING PRIOR INFORMATION INTO NN DESIGN
▪ Examples of how to build prior information into the NN design.
▪ We have ad hoc procedures:
▪ Restricting the network architecture, which is achieved through the use of local connections known as receptive fields.
▪ Constraining the choice of synaptic weights, which is implemented through the use of weight sharing.
▪ (Margin note: because not every connection needs its own weight; with local connections we can reuse the same weights.)
▪ (Figure: a partially connected feedforward network.)

EXAMPLE: LOCAL CONNECTIONS AND WEIGHT SHARING
▪ Weight sharing, as in CNNs: there is no need to have different weights at every position. (A small 1-D sketch follows below.)
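The single-layer/multi-layer slide and the digit-recognition use case describe the forward pass as a sequence of weight-matrix multiplications. Below is a minimal NumPy sketch of that vectorized forward pass for the 784-100-100-100-100-10 layout noted on the slide; the random weights, the ReLU hidden activation, and the softmax output are illustrative assumptions (a real network would learn the weights).

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the slide annotation: 784 pixel inputs ... 10 outputs (digits 0-9).
sizes = [784, 100, 100, 100, 100, 10]

# One weight matrix and one bias vector per layer (random here, learned in practice).
weights = [rng.normal(0, 0.01, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    """Vectorized forward pass: a = phi(W @ a_prev + b) at every layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, W @ a + b)        # hidden layers: ReLU (illustrative choice)
    z = weights[-1] @ a + biases[-1]
    e = np.exp(z - z.max())
    return e / e.sum()                        # softmax over the 10 digit classes

x = rng.random(784)                 # a fake "image" of 784 pixel features
probs = forward(x)
print(probs.argmax(), probs.sum())  # predicted digit, and check that the probabilities sum to 1
```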
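The local-connections and weight-sharing idea (receptive fields, as in CNNs) can be sketched in 1-D: each output unit looks only at a small window of the input, and every window reuses the same small weight vector. The window size and weights below are made up for illustration.

```python
import numpy as np

def locally_connected_shared(x, shared_w, bias=0.0):
    """Each output unit sees a window of len(shared_w) inputs; all units share the same weights."""
    k = len(shared_w)
    return np.array([shared_w @ x[i:i + k] + bias for i in range(len(x) - k + 1)])

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
shared_w = np.array([1.0, -1.0, 0.5])   # one small weight vector reused everywhere (weight sharing)
print(locally_connected_shared(x, shared_w))
# A fully connected layer would need len(x) weights per output unit;
# local connections with sharing need only len(shared_w) weights in total.
```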
BUILDING INVARIANCES INTO NN DESIGN
▪ A problem to consider during NN design:
▪ A network is trained to detect objects → what if the objects appear rotated, translated, scaled, or in different colors than during training?
▪ → The classifier should be invariant to these transformations.
▪ How to do that? There are three approaches:
▪ Invariance by structure
▪ Invariance by training
▪ Invariance by feature space

BUILDING INVARIANCES INTO NN DESIGN
▪ Invariance by structure
▪ The weights of the neurons are created so that transformed versions of the same input are forced to produce the same output.
▪ Disadvantage → the number of synaptic connections becomes prohibitively large, even for images of moderate size.
▪ Invariance by training
▪ Train the network to recognize the image and its rotations (different views).
▪ Data augmentation → generate more samples from your dataset by applying several types of transformations (rotation, scaling, translation, …); a small sketch follows this part of the transcript.
▪ Disadvantages → computational cost, overfitting, …

BUILDING INVARIANCES INTO NN DESIGN
▪ Invariance by feature space
▪ Extract features that characterize the essential information content of the patterns and that are invariant to transformations of the input.
▪ Advantages:
▪ The number of features can be reduced to realistic levels.
▪ The requirements imposed on the network are relaxed.
▪ Invariance with respect to known transformations is assured for all objects.
▪ Disadvantage:
▪ Not easy; it requires a feature-engineering step to study the set of features.

LEARNING PROCESS
▪ Learning can be with a teacher or without a teacher.
▪ Learning with a teacher can be categorized into supervised and reinforcement learning.
▪ Learning with a teacher → main steps:
▪ Patterns in the dataset are fed into the network.
▪ The error is calculated as the difference between the predicted and the real label.
▪ The weights are updated in a backward manner, each in proportion to its contribution to the error.
▪ Updates continue until an optimum is reached.

LEARNING PROCESS
▪ Learning with a teacher:
▪ The weights are updated by moving against the gradient of the error function (gradient descent).
▪ The update is done in one of three ways (sketched in code after this section):
▪ Batch training → one update after processing the whole batch of samples.
▪ Stochastic training → one update after processing one individual sample.
▪ Mini-batch training → one update after processing a small subset of the data.
▪ (Figure: the weight-updating loop driven by the error function.)
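Of the three options, "invariance by training" amounts to data augmentation: generate transformed copies (rotations, translations, flips, …) of each training sample. The NumPy-only sketch below uses made-up transformation choices; a real pipeline would typically rely on a library such as torchvision or albumentations.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Return a few transformed copies of a 2-D image array (illustrative transformations only)."""
    copies = [image]
    copies.append(np.rot90(image, k=rng.integers(1, 4)))              # rotation by 90/180/270 degrees
    copies.append(np.roll(image, shift=rng.integers(-2, 3), axis=1))  # small horizontal translation
    copies.append(np.fliplr(image))                                   # horizontal flip
    return copies

image = rng.random((28, 28))            # a fake 28x28 "digit"
augmented = augment(image)
print(len(augmented), [a.shape for a in augmented])
```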
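The three update schemes (batch, stochastic, mini-batch) differ only in how many samples are processed before each weight update. The sketch below shows that loop structure for a linear model trained with squared error; the toy data, learning rate, and model are assumptions for illustration, not the lecture's example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                                  # 64 samples, 3 features (toy data)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=64)

def grad(w, Xb, yb):
    """Gradient of the mean squared error for the linear model y_hat = Xb @ w."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

def train(batch_size, epochs=50, lr=0.1):
    """batch_size=len(X): batch training; 1: stochastic; anything in between: mini-batch."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            w -= lr * grad(w, X[idx], y[idx])   # move against the gradient (gradient descent)
    return w

print("batch      :", train(batch_size=len(X)))
print("stochastic :", train(batch_size=1))
print("mini-batch :", train(batch_size=8))
```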
