Machine Vision Lecture 1 PDF
Document Details
CDUT
Dr Happy N.
Summary
This lecture introduces the fundamental concepts of machine vision, its applications, and its algorithms. It covers classification, localization, detection, and segmentation, along with practical implementation in Python using frameworks such as Keras and TensorFlow. The lecture also discusses the visual data used in machine vision, the key problems in this domain, and the structure of a vision system, including low-level feature extraction and recognition.
Full Transcript
Machine Vision CHC6781
Lecture 1: Introduction to Machine Vision
Module Leader: Dr Happy N. Monday
Contact via: Email: [email protected] Office: Room 8304

Class Rules
1. Be on time.
2. Bring your stuff (pen, papers, folder, brain).
3. No phone use or gaming in class.
4. Develop the habits of effective learners.
5. Prepare for your coursework and exam on time.
6. Always revise your PPTs before and after class.
7. Have fun.

Learning Outcomes
1. Understand the fundamental concepts of machine vision (MV).
2. Learn the different applications of MV and the workflow involved in building and deploying machine vision models.
3. Gain knowledge of various machine vision algorithms and their applications, such as classification, localization, detection, and segmentation.
4. Develop practical skills in implementing MV models using Python and popular frameworks such as Keras and TensorFlow.

References
1. Szeliski, R. (2017). Computer Vision: Algorithms and Applications. Springer.
2. Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media.
3. Aggarwal, C.C. (2018). Neural Networks and Deep Learning. Springer.
4. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
5. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
6. Gulli, A., & Pal, S. (2017). Deep Learning with Keras. Packt Publishing.
7. Montavon, G., Orr, G., & Müller, K.-R. (2012). Neural Networks: Tricks of the Trade. Springer.

Purpose of machine vision
To enable machines to interact with the environment in the same natural way humans do. Vision is an essential part of this interaction. Machine vision is about processing this visual information to support reasoning and decision making.
Visual information processing looks natural and easy in humans, but a lot of processing goes on inside the eye and the visual cortex, and it is very difficult to model with algorithms.

Visual data for Machine Vision

Machine Vision Application Domains

Applications of vision
Robotics: self-localization, map building, autonomous navigation
Surveillance: "anomalous" event detection
Biometrics: face/identity recognition
Human-machine interface: natural interaction, video games

Applications of vision
1. Smart rooms: assistance to elderly people
2. Entertainment industry: augmented reality (AR), special effects
3. Military applications
4. Target tracking
5. Medical imaging

Key problems of machine vision
Images are very high-dimensional data: e.g. a 100 x 100 image is a vector of 10^4 values.
A two-dimensional image does not capture 3D structural information about the scene.
The quality of an image is highly dependent on the imaging sensor.
There is always a lot of background (useless) information.

Structure of a vision system
A machine vision system is typically composed of a number of cascaded blocks, ending in decision/classification/estimation. (Block diagram)
First block: low-level image processing designed to improve image quality or remove artifacts, also known as preprocessing, e.g. Gaussian blurring and contrast stretching.
Second block: feature extraction is the process of extracting relevant information from the image, e.g. edge detection and histogram of oriented gradients (HOG).
Third block: a model that takes in the extracted features and models their relationships for the task at hand, e.g. SVM, decision tree, neural network.
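
The three blocks above can be sketched as a tiny pipeline. This is only an illustrative sketch, not the lecture's implementation: the function names, the stripe images, and the nearest-centroid classifier (standing in for the SVM, decision tree, or neural network named above) are all assumptions. It uses plain Python lists to stay self-contained; real code would normally use a library such as NumPy or OpenCV.

```python
def contrast_stretch(image, lo=0, hi=255):
    """Block 1 (preprocessing): linearly map pixel values onto [lo, hi]."""
    flat = [p for row in image for p in row]
    p_min, p_max = min(flat), max(flat)
    if p_max == p_min:  # constant image: nothing to stretch
        return [[lo for _ in row] for row in image]
    scale = (hi - lo) / (p_max - p_min)
    return [[lo + (p - p_min) * scale for p in row] for row in image]


def gradient_features(image):
    """Block 2 (feature extraction): mean absolute horizontal and vertical
    pixel differences. Assumes the image is at least 2 x 2."""
    h, w = len(image), len(image[0])
    gx = sum(abs(image[y][x + 1] - image[y][x])
             for y in range(h) for x in range(w - 1))
    gy = sum(abs(image[y + 1][x] - image[y][x])
             for y in range(h - 1) for x in range(w))
    return (gx / (h * (w - 1)), gy / ((h - 1) * w))


def nearest_centroid(features, centroids):
    """Block 3 (model): return the label whose centroid is closest."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(features, centroids[label]))


def classify(image, centroids):
    """Run the three blocks in cascade."""
    return nearest_centroid(gradient_features(contrast_stretch(image)), centroids)


# Hypothetical "training" images: one example per class.
vertical = [[10, 60, 10, 60] for _ in range(4)]
horizontal = [[10] * 4, [60] * 4, [10] * 4, [60] * 4]
centroids = {
    "vertical_stripes": gradient_features(contrast_stretch(vertical)),
    "horizontal_stripes": gradient_features(contrast_stretch(horizontal)),
}

# A query with different brightness and contrast but the same pattern.
query = [[100, 200, 100, 200] for _ in range(4)]
label = classify(query, centroids)
print(label)
```

Because the preprocessing block rescales every image to the same range, the query is matched on its stripe pattern rather than on its absolute brightness, which is the kind of nuisance variation preprocessing is meant to remove.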
Final goal: decision/action, classification/recognition, estimation/reconstruction
Making decisions. Example: granting access to a person after face recognition.
Classifying the input images/videos. Example: gesture recognition in human-machine interfaces.
Estimating a quantity of interest. Examples: estimating the depth of each point in the scene; estimating the pose of a moving object.

Low level feature extraction
Images are complex descriptions of the external world, so it is often handier to extract some sort of salient measurements or features from an image.
Features are later used to make inferences, decisions, or estimations on the action to take or the structure of the outside world.
Features come in many different forms: points of interest; edges and contours; textures.

Interest points
Images are complex descriptions of the external world, with lots of variability determined by nuisance factors.
We need to find the areas where the relevant information is contained.
Interest points: local neighborhoods rich in information and stable with respect to changes of view/illumination.

Edge detection
An edge is a region of the image with a very high gradient value in either the horizontal or the vertical direction.
The gradient represents the change in pixel value relative to its surroundings.
Edges are normally related either to variations in the appearance of the viewed object(s) (e.g. trees, persons, cars) or to boundaries between separate objects.

Face detection

Textures
Textures: areas of an image in which a certain "appearance pattern" is present.
Texture classification is used for object recognition.
Texture information can be used for synthesis and 3D rendering.

Image and video compression
Images and videos come in lots of different formats: why? Limited transmission capacity and storage space.
Compression is used to reduce the amount of memory used to store the image or video.
1. Lossy compression: Discrete Cosine Transform, Discrete Wavelet Transform
2. Lossless compression: Run Length Encoding
Examples: TIFF, JPG (images), MPEG (videos)

Image segmentation
Image segmentation is the process of assigning each pixel in an image to a predefined category.
Segmentation can be based on differences in brightness, color, texture, or motion (often with two regions: foreground and background).
[Figure: original image and its semantic segmentation, showing the objects appearing in the image. Slide logo: OXFORD BROOKES UNIVERSITY]

Background segmentation
For a lot of applications we need to know what is foreground and what is background.
Background segmentation uses motion in the video to separate the two parts.

Recognition
Recognition involves identifying the category of the object present in an image.
Face recognition
Object recognition
Action recognition (generally uses a sequence of frames)

Detection
Detection involves identifying the location of objects in the image.
Face detection
Pedestrian detection
Object detection
Object localization: does not provide the category of the objects.

Segmentation
Semantic segmentation: the process of classifying each pixel as belonging to a particular label. It doesn't differentiate between different instances of the same object.
Instance segmentation: gives a unique label to every instance of a particular object in the image.
Panoptic segmentation: combines semantic segmentation and instance segmentation.
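
The difference between semantic and instance segmentation described above can be illustrated on a toy image. This is a hedged sketch under stated assumptions, not how segmentation models are built in practice: the brightness threshold of 128, the function names, and the use of 4-connected component labeling to separate instances are all assumptions chosen for illustration. In particular, connected components would merge two touching objects of the same class, which learned instance segmentation models are designed to avoid.

```python
from collections import deque


def semantic_mask(image, threshold):
    """Semantic step: label every pixel 1 (foreground) or 0 (background)
    based on brightness alone. All foreground objects share one label."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


def instance_labels(mask):
    """Instance step: give each 4-connected foreground region its own id
    (1, 2, ...). Background pixels keep the id 0."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 0 or labels[sy][sx] != 0:
                continue  # background, or already part of a labeled region
            next_id += 1
            labels[sy][sx] = next_id
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels, next_id


# Toy grayscale image with two bright objects on a dark background.
image = [
    [200, 200, 10, 10, 10],
    [200, 200, 10, 10, 10],
    [10, 10, 10, 220, 220],
    [10, 10, 10, 220, 220],
]

mask = semantic_mask(image, threshold=128)
labels, count = instance_labels(mask)

for row in mask:
    print(row)    # both objects carry the same label, 1
for row in labels:
    print(row)    # each object carries its own id
print(count)
```

In the semantic mask both bright squares are indistinguishable, whereas the instance labels assign them different ids.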
Panoptic segmentation has two components: background and objects.

Different Computer Vision tasks
[Figure: computer vision tasks compared. Single object: Classification (CAT); Classification + Localization (CAT). Multiple objects: Object Detection (CAT, DOG, DUCK); Instance Segmentation (CAT, DOG, DUCK).]
[Figure: Semantic Segmentation (no objects, just pixels: GRASS, CAT, TREE, SKY); Classification + Localization (single object: CAT); Object Detection and Instance Segmentation (multiple objects: DOG, DOG, CAT).]

Panoptic segmentation
[Figure: semantic segmentation, instance segmentation, and panoptic segmentation compared]

Basic Image Processing
Read and Display Image (with output)
Saving The Output Image
Image Manipulation

Computer Vision in Real-World
There are a number of visual recognition problems related to image classification, such as object detection, action classification, and image captioning.
Convolutional neural networks (CNNs) have become an important tool for object recognition.
Convolutional neural networks (CNNs) were not invented overnight.

Image Preprocessing

The End!