Lecture 6: Machine Learning for Remote Sensing Image Processing - Part I PDF

Document Details

The Hong Kong Polytechnic University

Dr. Zhiwei Li

Tags

machine learning, remote sensing, image processing, deep learning

Summary

This document is a lecture on machine learning for remote sensing image processing. It covers different aspects of machine learning, including traditional methods and deep learning models. The document also discusses some datasets and applications for image processing.

Full Transcript

LSGI536 Remote Sensing Image Processing
Lecture 6: Machine Learning for Remote Sensing Image Processing: Part I
Dr. Zhiwei Li, Research Assistant Professor
Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University
Email: [email protected]

Outlines
▪ Brief history of machine learning
▪ Remote sensing image analysis: traditional methods
▪ Traditional machine learning models
▪ Deep neural networks (DNN)

I. Brief history of machine learning

What is machine learning? Machine learning is a branch of AI comprising algorithms that parse data, learn from that data, and then apply what has been learned to make informed decisions. Artificial intelligence refers to programs with the ability to learn and reason like humans, while machine learning refers to algorithms with the ability to learn from data, without being explicitly programmed, in order to predict. (Figure: the relationship between artificial intelligence, machine learning, and deep learning; image source: NVIDIA's blog.)

Machine learning history
▪ Birth [1952–1956]
▪ First Winter of AI [1974–1980]
▪ The Explosion of the 1980s [1980–1987]
▪ Second Winter of AI [1987–1993]
▪ Explosion and Commercial Adoption [2006–Present]
Milestones: 1950, the Turing Test; 1997, IBM Deep Blue; 2016, AlphaGo vs. Ke Jie.

Machine learning at present
Machine learning research has advanced greatly, and it is now present everywhere around us: language translators, voice assistants, self-driving systems, recommendation systems, generative AI (GenAI) applications, and many more.

Any-to-any multimodal large language models:
Wu, S., Fei, H., Qu, L., Ji, W., & Chua, T. S. (2023). NExT-GPT: Any-to-Any Multimodal Large Language Model. arXiv preprint arXiv:2309.05519.

EarthGPT, a multimodal large language model for remote sensing image comprehension:
Zhang, W., Cai, M., Zhang, T., Zhuang, Y., & Mao, X. (2024). EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain. arXiv preprint arXiv:2401.16822.

II. Remote sensing image analysis: traditional methods

Traditional remote sensing image analysis methods fall into three groups: pixel-based image analysis (spectral domain), object-based image analysis (spatial domain), and time series image analysis (temporal domain).

Pixel-based image analysis
Spectral indices may be used to enhance a particular feature in an image. Different indices enhance different earth surface features, such as vegetation, water, snow, and buildings:
1) Band ratio
2) Normalized Difference Vegetation Index (NDVI)
3) Normalized Difference Water Index (NDWI)
4) Normalized Difference Snow Index (NDSI)
5) Normalized Difference Built-up Index (NDBI)
…
(Figure: a satellite image and the corresponding NDVI.)
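Not part of the original slides: a minimal NumPy sketch of the NDVI computation listed above; the band arrays and reflectance values are made up for illustration.

```python
import numpy as np

def ndvi(red, nir, eps=1e-9):
    """Compute NDVI = (NIR - Red) / (NIR + Red) for every pixel."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero

# Tiny synthetic 2x2 scene (surface reflectance values)
red = np.array([[0.10, 0.20], [0.05, 0.30]])
nir = np.array([[0.50, 0.25], [0.40, 0.30]])
print(ndvi(red, nir))  # values close to +1 indicate dense vegetation
```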
Object-based image analysis (OBIA)
Traditional pixel-based image classification assigns a land cover class to each pixel. All pixels are the same size and shape, and they have no concept of their neighbours. OBIA instead groups pixels into image "objects" based on their spectral similarity, through the process of segmentation. With the segmented objects, their spectral, geometrical, and spatial properties can be utilized to classify them into distinct land cover classes. (Figures: an aerial photograph, fine-scale and coarse-scale segmentations, and an object-based classification of woody cover, from Levick and Rogers, 2008; an idealized GEOBIA workflow illustrating the iterative nature of the object-building and classification process, which incorporates GIScience concepts, from Blaschke, T. et al., 2014, ISPRS J.)

Time series image analysis
Pixel-based image analysis involves only spectral information and does not utilize the spatial and temporal information that remote sensing imagery contains. The objective of time series image analysis is to obtain temporally and spatially continuous data of high quality and to use it to support time series land change monitoring. Filtering is applied to eliminate noise as well as to reconstruct and predict missing values.

Savitzky–Golay filter: a simplified least-squares-fit convolution for smoothing and computing derivatives of a set of consecutive values (a spectrum). Savitzky–Golay smoothing improves the smoothness of the spectrum and reduces noise interference (Savitzky, A. & Golay, M. J., 1964; Chen, J., 2004).

Harmonic Analysis of Time Series (HANTS): when a periodic, time-dependent data set such as NDVI is decomposed into a sum of sinusoidal functions, the procedure is called harmonic analysis of time series. It can be used to reconstruct time series of remotely sensed data (Verhoef, W., 1996; Zhou, J. et al., 2015).

Time series model based on ordinary least squares (OLS): surface reflectance time series modelling and its application to land cover change detection (Zhu et al., 2014, RSE).

III. Traditional machine learning models

Regression is the process of finding the correlation between dependent and independent variables; the output variable y is continuous. Classification is the process of finding a function that divides the dataset into classes based on different parameters; the output y is discrete.

Random forests (RF), or random decision forests, are an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time. Given a training set, a forest of m trees with individual weight functions W_j makes the prediction
F(x') = (1/m) · Σ_{j=1..m} Σ_i W_j(x_i, x') · y_i,
where y_i is the label of the i-th training point and W_j(x_i, x') is the non-negative weight of the i-th training point relative to the new point x' in the same tree. For any particular x', the weights for the points x_i must sum to one.
Tin Kam Ho, 1995. Random decision forests, in: Proceedings of 3rd International Conference on Document Analysis and Recognition. IEEE Comput. Soc. Press, pp. 278–282. https://doi.org/10.1109/ICDAR.1995.598994
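A minimal scikit-learn sketch, not from the lecture, of training a random forest on per-pixel spectral features; the synthetic four-band data and the toy vegetation-labelling rule are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic example: 500 pixels x 4 spectral bands (blue, green, red, NIR)
rng = np.random.default_rng(0)
X = rng.random((500, 4))
# Toy rule standing in for reference labels: "vegetation" when NIR clearly exceeds red
y = (X[:, 3] - X[:, 2] > 0.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)                 # builds 100 decision trees
print("Test accuracy:", clf.score(X_test, y_test))
```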
Support-vector machines (SVMs, also called support-vector networks) are supervised learning models with associated learning algorithms that analyse data for classification and regression. An SVM constructs a maximum-margin hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space, which can then be used for classification or regression. When an SVM is trained with samples from two classes, the samples lying on the margin are called the support vectors.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Machine Learning 20, 273–297. https://doi.org/10.1007/BF00994018

IV. Deep neural networks (DNN)

Artificial neural network
A biological neuron with a myelinated axon carries signals from inputs at the dendrites to outputs at the axon terminals (figure after Prof. Loc Vu-Quoc). In a diagram of an artificial neural network, each circular node represents an artificial neuron and each arrow represents a connection from the output of one artificial neuron to the input of another. An artificial neural network is an interconnected group of such nodes, inspired by a simplification of the neurons in a brain.

For a single neuron with input vector x = [x1, x2, …, xN], weights ω1, …, ωN, and bias b, the output of a hidden unit is the weighted sum v = Σ_{i=1..N} x_i·ω_i + b, and the activation gives the neuron output y = f(v) = f(Σ_{i=1..N} x_i·ω_i + b). (Example: a forward pass through a neuron using a given activation function.)

Comparison of machine learning and deep learning (figure).

❖ Features for classification/regression
General features (e.g., of a person): gender, age, height, weight, appearance, personality, hair style, skin colour, …
Image features (e.g., of high-resolution images): spectral, colour, shape, texture, temporal, semantic, …

❖ Texture feature extraction by traditional methods versus deep learning: the local binary patterns (LBP) descriptor at radii r = 1, 2, 4 (Ojala, T., 1994 & 2002), versus 1-, 2-, and 4-dilated convolution kernels (Fisher Yu et al., 2016).

Computer vision tasks, and computer vision tasks applied to GIS and remote sensing (figure).

Datasets for deep learning
❖ Deep learning is a data-driven approach and relies on datasets for end-to-end learning, e.g. large-scale datasets such as ImageNet and BigEarthNet.

What does "deep" mean? Deep = many hidden layers. On the ImageNet benchmark: AlexNet (2012), 8 layers, 16.4% top-5 error; VGG (2014), 19 layers, 7.3%; GoogLeNet (2014), 22 layers, 6.7%; ResNet (Residual Net, 2015), 152 layers, 3.57%. By making the network very deep, ResNet achieves a top-5 error rate of 3.57%, which beats human-level performance on the ImageNet dataset.

Image processing with convolution kernels
A convolution kernel can be used for blurring, sharpening, embossing, edge detection, and more (figure: original image, filter kernel, filtered image).
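A small sketch (not from the lecture) applying a 3 × 3 sharpening kernel to a toy single-band image with SciPy; the kernel values and the random image are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

# A 3x3 sharpening kernel; other kernels give blurring, embossing, edge detection, etc.
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=float)

# Toy single-band "image" (e.g., a grayscale satellite chip)
image = np.random.default_rng(0).random((8, 8))

filtered = convolve2d(image, kernel, mode="same", boundary="symm")
print(filtered.shape)  # (8, 8): same spatial size as the input
```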
Example: a convolutional network for image classification
For high-level vision tasks, the network learns high-dimensional features for pattern recognition: stacked convolution and pooling layers extract low-level, mid-level, and then high-level features; a fully connected layer turns them into a feature vector; and softmax regression outputs a classification probability vector.

Example: handwritten digit recognition
MNIST is a dataset of handwritten digits with 55,000 training samples, 5,000 validation samples, and 10,000 test samples. Each sample is a labelled 28 × 28 pixel grayscale image of a handwritten digit.

Convolutional Neural Network (CNN)
Data flow in LeNet: the input is a handwritten digit and the output is a probability over the 10 possible outcomes. The input passes through convolution, normalization and pooling, a second convolution, normalization and pooling, two fully connected layers, and a softmax layer that produces the output.

From CNN to Fully Convolutional Network (FCN)
Fully convolutional networks can efficiently learn to make dense predictions for per-pixel tasks such as semantic segmentation. Transforming the fully connected layers into convolution layers enables a classification net to output a spatial map (Shelhamer et al., 2017, TPAMI).

Layers in a DNN model: convolution
Convolution can be applied to 1D signals, 2D images, and 3D volumes, sliding a kernel over the input to produce an output feature map (figures: 1D, 2D, and 3D convolution of an input with a kernel). 2D convolution also comes in different parameter configurations: plain convolution (padding, no strides); strided convolution (padding, strides), which down-samples; deconvolution, i.e. transposed convolution (padding, strides, transposed), which up-samples; and dilated convolution (no padding, no strides, dilation).

2D convolution has a single-channel version (one input image, one kernel, one output feature map) and a multi-channel version. In the multi-channel case at layer l, each output channel k is obtained by convolving every input channel with its own kernel and summing: Y_{l,k} = Σ_c W_{l,k,c} ⊙ X_{l-1,c}, followed by the activation X_{l,k} = R(Y_{l,k}). The number of convolutions in each group equals the number of channels of the input feature maps, and the number of groups equals the number of channels of the output feature maps. This is how the forward pass of a convolutional network is computed.

Layers in a DNN model: batch normalization
Batch Normalization is, basically, the normalization of the output of each hidden layer.
✓ Normalization (whitening) of the inputs to each layer: zero mean and unit variance, though not decorrelated.
✓ Faster learning: the learning rate can be increased compared with the non-batch-normalized version.
✓ Higher accuracy: flexibility in the mean and variance for every dimension in every hidden layer provides better learning, and hence better accuracy of the network.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe et al., 2015, ICML).
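A minimal NumPy sketch of the batch-normalization idea described above (normalize each feature over the mini-batch, then scale and shift with learnable gamma and beta); the shapes and values are assumptions.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    x: mini-batch of activations, shape (batch_size, num_features)
    gamma, beta: learnable per-feature scale and shift, shape (num_features,)
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # gamma/beta restore representational flexibility

x = np.random.default_rng(0).normal(5.0, 3.0, size=(8, 4))   # one mini-batch
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))        # approximately 0 and 1
```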
Layers in a DNN model: activation function
The activation function defines how the weighted sum of the inputs is transformed into an output for a node, or nodes, in a layer of the network. Common choices are:
Sigmoid: f(x) = 1 / (1 + e^(−x))
Tanh: f(x) = (1 − e^(−2x)) / (1 + e^(−2x))
ReLU: f(x) = x if x > 0, and 0 otherwise

Layers in a DNN model: pooling
Pooling layers are used to reduce the dimensions of the feature maps; they summarize the features present in a region of the feature map generated by a convolution layer.
◆ Two common pooling methods: max pooling and average pooling.

Layers in a DNN model: skip connection and residual architecture
Skip connections are used to build the residual architecture, a common special network structure that improves model performance and accelerates the training of deep models. (Figures: a residual learning building block, and plain networks versus ResNets trained on CIFAR-10, where dashed lines denote training error and bold lines denote testing error; He et al., 2016, CVPR.)

Receptive field
◼ The receptive field is defined as the size of the region in the input that produces a feature. Basically, it is a measure of the association of an output feature (of any layer) with a region (patch) of the input.
◼ Note that the idea of receptive fields applies to local operations (i.e., convolution and pooling). The receptive field of each convolution layer with a 3 × 3 kernel grows with depth (Lin Haoning et al., 2017, RS), and 1-, 2-, and 4-dilated convolutions support an exponential expansion of the receptive field (Fisher Yu et al., 2016, ICLR).

Network construction and model training
The concepts of batch size, epoch, and number of iterations in model training: the N training samples are divided into batches of a chosen batch size (e.g., batch size = 8), giving batches 1, 2, …, m. One epoch of training includes m iterations, where the number of batches m ≈ N / batch size.
◼ The model is usually validated on the validation set after each round of training, and then tested for accuracy on the test set every several epochs.
◼ For n epochs, the total number of iterations for model training is n × m, meaning that the model parameters are updated n × m times. (Illustration: a complete model training process.)

Loss function
Regression losses: mean square error (quadratic loss, L2 loss) and mean absolute error (L1 loss), computed between the ground-truth label and the prediction for each training example.
Classification loss: cross-entropy loss (negative log likelihood).
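A short NumPy sketch, not from the lecture, of the three losses listed above; the toy labels and predictions are assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error (L2 loss) for regression."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error (L1 loss) for regression."""
    return np.mean(np.abs(y_true - y_pred))

def cross_entropy(y_true_onehot, y_prob, eps=1e-12):
    """Cross-entropy (negative log likelihood) for classification.

    y_true_onehot: one-hot ground-truth labels, shape (n_samples, n_classes)
    y_prob: predicted class probabilities (e.g., softmax output), same shape
    """
    return -np.mean(np.sum(y_true_onehot * np.log(y_prob + eps), axis=1))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.3])
print(mse(y_true, y_pred), mae(y_true, y_pred))

labels = np.array([[1, 0], [0, 1]])          # two samples, two classes
probs = np.array([[0.8, 0.2], [0.3, 0.7]])
print(cross_entropy(labels, probs))
```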
Loss function and parameter optimization
◼ Optimizers: 1. Gradient Descent; 2. Stochastic Gradient Descent; 3. Mini-Batch Gradient Descent; 4. Adagrad; 5. RMSProp; 6. AdaDelta; 7. Adam; … (Comparison between various optimizers; image source: Sanket Doshi, 2019.)

▪ Batch gradient descent (GD) computes the gradient over all N training samples at once: Gradient = Σ_{i=1..N} G(w, b; input_i).
▪ Stochastic gradient descent (SGD) uses one sample for the gradient computation each time: Gradient = G(w, b; input_i).
▪ Mini-batch gradient descent (Mini-Batch GD) averages the gradient over a batch of n samples: Gradient = (1/n) · Σ_{i=1..n} G(w, b; input_i).

For a single linear model the loss is Error = (label − (w·input + b))² = E({w, b}; input), and its gradient is Gradient = {∂Error/∂w, ∂Error/∂b} = {∂E(w, b; input)/∂w, ∂E(w, b; input)/∂b} = G(w, b; input). In a multi-layer network the gradient is obtained with the chain rule, propagating backwards layer by layer: ∂L/∂ω_i = (∂L/∂X_i)·(∂X_i/∂ω_i) and ∂L/∂X_{i−1} = (∂L/∂X_i)·(∂X_i/∂X_{i−1}).

Computing a gradient descent step with back-propagation (BP):
W^{n+1} = W^n − η·∂L(Output^n; Label)/∂W^n,  B^{n+1} = B^n − η·∂L(Output^n; Label)/∂B^n,  Output^{n+1} = Net(Input; {W^{n+1}, B^{n+1}}),
where η is the learning rate (LR), a tuning parameter that determines the step size at each iteration while moving toward a minimum of the loss function L(W, B). The gradient of L at (W^n, B^n), {∂L/∂W^n, ∂L/∂B^n}, indicates the direction in which L ascends most quickly.
With momentum and weight decay, the plain update Δ(W_n, B_n) = −η·∇L(W_n, B_n) becomes
Δ(W_n, B_n) = m·Δ(W_{n−1}, B_{n−1}) − η·∇L(W_n, B_n) − η·λ·(W_n, B_n),
where m is the momentum coefficient, η the learning rate (LR), and λ the weight decay (WD).

Hyperparameters in model training
The learning rate is the most important hyperparameter; it controls how fast your neural net "learns". (Learning rate configurations in the training of deep models; image source: Jeremy Jordan.)
Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations, helping the optimizer move past local minima toward the global cost minimum. (Image sources: Tejas Khare, 2021; Hazarapet Tunanyan, 2021.)
Weight decay is a regularization technique applied to the weights of a neural network; it effectively deactivates some neurons by shrinking their weights close to zero, which helps prevent overfitting. (Weight decay configurations in the training of deep models; image source: Deep Learning, Goodfellow et al.)
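A minimal NumPy sketch of a mini-batch gradient-descent update with momentum and weight decay for a one-parameter linear model, following the update rule above; the data, learning rate, momentum, and weight-decay values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random(100)                              # inputs
y = 2.0 * X + 0.5 + rng.normal(0, 0.05, 100)     # labels from a noisy linear rule

w, b = 0.0, 0.0                      # parameters to learn
dw_prev, db_prev = 0.0, 0.0          # previous updates (for momentum)
lr, momentum, wd = 0.1, 0.9, 1e-4    # learning rate, momentum, weight decay

for step in range(200):
    idx = rng.choice(len(X), size=8, replace=False)   # one mini-batch
    xb, yb = X[idx], y[idx]
    err = (w * xb + b) - yb
    grad_w = np.mean(2 * err * xb)   # d/dw of the mean squared error
    grad_b = np.mean(2 * err)        # d/db of the mean squared error
    dw = momentum * dw_prev - lr * grad_w - lr * wd * w
    db = momentum * db_prev - lr * grad_b - lr * wd * b
    w, b = w + dw, b + db
    dw_prev, db_prev = dw, db

print(round(w, 2), round(b, 2))      # should approach 2.0 and 0.5
```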
Summary: brief summary of DNN
◆ The design of a deep network structure is essentially the design of combinations of linear and nonlinear operations, and the objective of model training is to make the model output close to the labelled true value, i.e. Y* ≈ Y, where
Y = f(g(f(g(··· f(g(X)))))),
g(X) = W·X + b denotes the linear operations and f(·) denotes the nonlinear activation functions.
◆ Compared with traditional machine learning methods, which typically use multiple manually designed features at one or a few scales, deep learning methods benefit from strong feature representation capabilities that allow feature extraction at multiple scales and levels: multi-level features ranging from low-level spatial features to high-level semantic features, and multi-scale features extracted with convolutional kernels of different sizes.

A Neural Network Playground: https://playground.tensorflow.org/

Homework
Introduction to Google Earth Engine
Beginner's Cookbook: https://developers.google.com/earth-engine/tutorials/community/beginners-cookbook
Video tutorial: https://www.youtube.com/playlist?list=PLgxX4AQ_KUQ9IjZl_4SVPbWVQ7hkZxl5B
