Semantic Segmentation Techniques
Document Details
Uploaded by RespectableCotangent
Damascus University
Summary
This document provides an overview of semantic segmentation techniques, focusing on architectures such as fully convolutional networks (FCNs), DeconvNet, SegNet, and U-Net. It also discusses operations used for dense prediction, such as transposed convolutions and unpooling, and examines loss functions such as the pixel-wise cross entropy and the Dice coefficient.
Full Transcript
Outline
- Semantic segmentation & applications
- Fully convolutional networks
- Architectures for dense prediction: DeconvNet, SegNet, U-Net
- Operations for dense prediction: transposed convolutions, unpooling
- Instance segmentation

Example of semantic segmentation
The task
[Figure: an image segmented into the classes person, grass, trees, motorbike, and road]

Semantic segmentation vs. instance segmentation
[Figure: semantic segmentation compared with instance segmentation]

Medical semantic segmentation
[Figure: example of segmentation output compared to ground truth ("Manual segmentation"). Lesions in green, liver in red.]

Medical image diagnostics
Machines can augment the analysis performed by radiologists, greatly reducing the time required to run diagnostic tests.
[Figure: a chest X-ray in which the heart (red), lungs (green), and clavicles (blue) are segmented.]

CNNs for dense image labeling
[Figure: image classification, object detection, semantic segmentation, instance segmentation]

Instance segmentation
Instance segmentation is an extension of object detection in which a binary mask (object vs. background) is associated with every bounding box. The mask gives finer-grained information about the extent of the object within the box:
Instance segmentation = object detection + semantic segmentation
[Figure: a real-time segmented road scene for autonomous driving.]

The task in detail
Take either an RGB color image (height × width × 3) or a grayscale image (height × width × 1) and output a segmentation map (height × width) in which each pixel contains a class label represented as an integer.
To produce this map, the network creates an output channel for each of the possible classes (height × width × number of classes). Each pixel is then assigned the class whose channel scores highest.
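The relationship between the integer label map and the per-class output channels can be sketched in plain Python. This is a minimal sketch, not code from the lecture. It assumes images are stored as nested lists in a channels-first layout ([C][H][W]). The helper names `one_hot` and `argmax_labels` are hypothetical. `one_hot` builds the one-hot target used by the loss, and `argmax_labels` turns per-class scores back into a label map.

```python
# Hypothetical helpers illustrating label maps vs. per-class channels.
# Assumed layout: images as nested lists, channels-first ([C][H][W]).

def one_hot(label_map, num_classes):
    """Turn an [H][W] map of integer labels into [C][H][W] one-hot channels."""
    return [
        [[1 if label == c else 0 for label in row] for row in label_map]
        for c in range(num_classes)
    ]


def argmax_labels(channels):
    """Collapse [C][H][W] per-class scores into an [H][W] label map.

    Each pixel gets the index of its highest-scoring channel
    (ties resolve to the lowest class index, as Python's max does).
    """
    num_classes = len(channels)
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [max(range(num_classes), key=lambda c: channels[c][i][j]) for j in range(width)]
        for i in range(height)
    ]


# Usage: a 2x2 label map with 3 classes round-trips through one-hot channels.
labels = [[0, 2], [1, 1]]
target = one_hot(labels, 3)  # 3 channels, each 2x2
assert argmax_labels(target) == labels
```

Real networks output unnormalized scores rather than one-hot values. The argmax is unaffected, because a softmax over the channels does not change which class scores highest.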
Fully convolutional networks
Design a network with only convolutional layers and make predictions for all pixels at once.
Can the network operate at full image resolution? Keeping full-resolution feature maps through every layer would be very expensive in memory and computation.
Practical solution: first downsample, then upsample.
Source: Stanford CS231n

Image classification with CNNs: recap

Semantic segmentation idea: sliding window
Extract a patch around each pixel of the full image and classify its center pixel with a CNN.
[Figure: patches extracted from the full image and classified as cow, cow, grass]
Problem: very inefficient! Shared features between overlapping patches are not reused.

Fully convolutional networks (FCN)
- Predictions by 1x1 conv layers, bilinear upsampling to the original image resolution.
- Predictions by 1x1 conv layers, learned 2x upsampling using transposed convolutions, fusion by summing.
Refining fully convolutional nets by fusing information from layers with different strides improves segmentation details.
[Table: comparison on a subset of PASCAL 2011 validation data]
J. Long, E. Shelhamer, and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 2015

Loss function
The full network is trained with a pixel-wise cross entropy loss.
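The pixel-wise cross entropy loss can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation. The network output is assumed to be raw class scores (logits) in a [C][H][W] nested-list layout, and `pixelwise_cross_entropy` is a hypothetical name. At each pixel it applies a softmax over the class dimension and takes the negative log-probability of the true class. It then averages the result over all pixels.

```python
import math


def pixelwise_cross_entropy(scores, labels):
    """Mean cross entropy over all pixels (hypothetical helper).

    scores: [C][H][W] raw class scores (logits); the layout is an assumption.
    labels: [H][W] integer class labels in range(C).
    """
    num_classes = len(scores)
    height, width = len(labels), len(labels[0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            # Depth-wise score vector for pixel (i, j).
            pixel_scores = [scores[c][i][j] for c in range(num_classes)]
            # Stable log-sum-exp: subtract the max before exponentiating.
            m = max(pixel_scores)
            log_sum_exp = m + math.log(sum(math.exp(s - m) for s in pixel_scores))
            # With a one-hot target, cross entropy reduces to -log p(true class),
            # and log p(true class) = score(true class) - log_sum_exp.
            total += log_sum_exp - pixel_scores[labels[i][j]]
    # Average over the H * W pixels.
    return total / (height * width)
```

Consider every pixel having equal scores for all C classes. Each pixel then contributes log C, so the loss is log 2 (about 0.693) for two classes. A confident correct prediction drives a pixel's contribution toward 0.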
Defining the loss function
The most commonly used loss function for image segmentation is a pixel-wise cross entropy loss. This loss examines each pixel individually, comparing the class predictions (the depth-wise pixel vector) to the one-hot encoded target vector, and averages the result over all pixels.

Loss function (pixel-wise cross entropy loss)
With N pixels, C classes, one-hot targets y_{i,c} and predicted (softmax) probabilities p_{i,c}:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c}$$

Dice coefficient
Another popular loss function for image segmentation is based on the Dice coefficient, which is essentially a measure of overlap between two samples. It ranges from 0 to 1, where a Dice coefficient of 1 denotes perfect and complete overlap and 0 denotes no overlap. The Dice coefficient was originally developed for binary data and can be calculated as:

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

Here |A ∩ B| is the number of elements common to sets A and B, and |A| is the number of elements in set A (likewise for set B). When evaluating a Dice coefficient on predicted segmentation masks, |A ∩ B| can be approximated by multiplying the prediction and target masks element-wise and summing the resulting matrix. |A| and |B| are approximated by summing each mask; some implementations use the sum of squares instead. The corresponding loss is typically 1 - Dice, so that minimizing the loss maximizes overlap.

[Figure: Softmax]

DeconvNet
[Figure: DeconvNet architecture; the max indices recorded by each pooling layer are reused by the matching unpooling layer]
[Figure: decoder activations for an original image at successive layers: 14×14 deconv, 28×28 unpooling, 28×28 deconv, 56×56 unpooling, 56×56 deconv, 112×112 unpooling, 112×112 deconv, 224×224 unpooling, 224×224 deconv]
H. Noh, S. Hong, and B. Han, Learning Deconvolution Network for Semantic Segmentation, ICCV 2015

DeconvNet results on PASCAL VOC 2012
Method                           mIoU (%)
FCN-8                            62.2
DeconvNet                        69.6
Ensemble of DeconvNet and FCN    71.7

Similar architecture: SegNet
Compared with DeconvNet, SegNet drops the fully connected (FC) layers and gets better results.
V. Badrinarayanan, A. Kendall and R. Cipolla, SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, PAMI 2017

U-Net
Like FCN, U-Net fuses upsampled higher-level feature maps with higher-resolution, lower-level feature maps.
Unlike FCN, it fuses by concatenation and makes its prediction only at the end.
O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI 2015
[Figure: U-Net architecture]

[Figure: summary of dense prediction architectures]

Transposed convolution
A transposed convolutional layer is an upsampling layer: it produces an output feature map larger than its input feature map. It is often called a deconvolutional layer, although it does not compute the mathematical inverse of a convolution.

Its operation is closely related to that of a normal convolutional layer. Its forward pass is the backward pass (the gradient with respect to the input) of an ordinary convolution with the same kernel, stride, and padding.

A normal convolution slides the kernel over the input and sums element-wise products into a single output value. A transposed convolution instead takes each input element, multiplies the entire kernel by it, and adds the scaled kernel into the output at a position offset by the stride. Where these contributions overlap, they are summed. The result is larger than the input, and its size is controlled by the kernel size, stride, and padding of the layer.
[Figure: transposed convolution with stride 2]
[Figure: transposed convolution with stride 1]

Output shape
For input size I, stride S, kernel size K and padding p (here p rows and columns are removed from each border of the output), the output shape is:

$$O_h = (I_h - 1)\,S + K_h - 2p, \qquad O_w = (I_w - 1)\,S + K_w - 2p$$

For example, a 2 × 2 input with a 2 × 2 kernel, stride 2 and no padding gives (2 - 1) × 2 + 2 = 4, i.e. a 4 × 4 output.

Upsampling by unpooling
An alternative to transposed convolution is max unpooling:
[Figure: max pooling and the corresponding max unpooling]
- Max pooling remembers the pooling indices (which element in each window was the max).
- Max unpooling places each value back at its remembered position and fills the rest with zeros.
- The output is sparse, so unpooling is typically followed by a transposed convolution layer.

SegNet uses these stored max-pooling indices for upsampling in its decoder.
V. Badrinarayanan, A. Kendall and R. Cipolla, SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, PAMI 2017