Week 2 Module 1: Introduction to Digital Image Processing
Uploaded by QuickerEucalyptus
Summary
This document provides an introduction to computer vision and image processing. It covers fundamental concepts, core techniques, and applications in healthcare, automotive, and retail. The document also details the course overview and prerequisites.
Full Transcript
**1. Introduction to Computer Vision and Image Processing**

- **Computer Vision (CV):**
  - Computer vision aims to enable machines to interpret and understand visual information from the world (e.g., images or videos).
  - It mimics human vision and allows for applications like object detection, facial recognition, and scene understanding.
- **Image Processing (IP):**
  - Focuses on transforming and analyzing images to improve their quality or extract information.
  - Techniques include filtering, noise reduction, and color adjustments.
  - Often acts as a preprocessing step for computer vision tasks.
- **Differences Between CV and IP:**
  - Image processing is focused on pixel-level manipulation, while computer vision works on higher-level interpretation.

**2. Fundamental Concepts**

- **Digital Images:**
  - An image is composed of pixels, the smallest units of information in an image.
  - **Resolution:** the number of pixels in an image, which affects its quality.
  - **Color Models:** define how colors are represented; examples include RGB, grayscale, and CMYK.
- **Basic Operations:**
  - **Image Acquisition:** capturing an image using devices like cameras or sensors.
  - **Preprocessing:** steps like resizing, cropping, and noise reduction to prepare the image for analysis.
  - **Enhancement:** improving image quality for better visual interpretation (e.g., adjusting contrast or sharpness).

**3. Core Techniques**

- **Edge Detection:**
  - Identifying boundaries in images to highlight regions of interest.
  - Common algorithms: Canny, Sobel, Prewitt.
- **Segmentation:**
  - Dividing an image into meaningful regions or segments (e.g., separating a foreground object from the background).
  - Approaches include thresholding, clustering (e.g., K-means), and region-based methods.
- **Feature Extraction:**
  - Extracting key characteristics or descriptors from an image (e.g., shapes, edges, or textures).
  - These features are used in machine learning and deep learning for classification or recognition tasks.

**4. Applications**

- **Healthcare:**
  - Medical imaging for diagnosing diseases (e.g., X-rays, CT scans).
  - Analyzing biological images (e.g., cell segmentation).
- **Automotive Industry:**
  - Self-driving cars rely on computer vision for lane detection, object recognition, and obstacle avoidance.
- **Surveillance and Security:**
  - Face recognition and behavior monitoring in security systems.
- **Retail and E-commerce:**
  - Image-based product search, augmented reality for virtual try-ons.
- **Other Emerging Trends:**
  - AI-powered vision systems in agriculture, manufacturing, and robotics.

**5. Course Overview**

- The video likely provides a structured roadmap of topics covered throughout the course, including:
  - Mathematical foundations (linear algebra, probability).
  - Image representation, filtering, and transformations.
  - Advanced computer vision methods like convolutional neural networks (CNNs).
  - Practical tools and frameworks for implementation (e.g., OpenCV, TensorFlow, PyTorch).
- **Prerequisites for the Course:**
  - Basic knowledge of mathematics, particularly linear algebra and calculus.
  - Familiarity with programming languages like Python or MATLAB.
  - Awareness of machine learning fundamentals is helpful.
- **Resources:**
  - Suggested textbooks, research papers, and online resources to deepen understanding.

**Why This Field is Important**

- Computer vision and image processing play a pivotal role in making machines intelligent.
- These technologies are becoming increasingly integral to diverse fields like AI, automation, and real-time decision-making systems.

**1. Image Processing in Real Life**

Image processing involves transforming and enhancing raw images for better quality or feature extraction. Here are some examples:

**Photo Editing and Enhancement**

- **Example:** Apps like Adobe Photoshop, Instagram filters, and Snapseed.
- These tools apply image processing techniques like color correction, sharpening, and blurring.
- For example, Instagram filters adjust contrast, brightness, and saturation in real time.

**Medical Imaging**

- **Example:** X-rays, MRI scans, and CT scans.
- Image processing helps enhance medical images to improve diagnosis.
- **Use case:** Removing noise in MRI images to identify tumors or abnormalities.

**Document Scanning**

- **Example:** OCR (Optical Character Recognition) software in apps like CamScanner.
- Image processing enhances the scanned document (e.g., increasing contrast) before text is extracted using OCR.

**Barcode and QR Code Scanning**

- **Example:** Barcode readers in supermarkets or QR code payments.
- The scanned image is preprocessed to detect and decode the pattern for product identification or payment processing.

**Satellite and Remote Sensing**

- **Example:** Google Earth or weather prediction models.
- Satellite images are processed to analyze terrain, detect deforestation, or predict crop health.

**2. Computer Vision in Real Life**

Computer vision goes a step further by interpreting images and making decisions based on their content.

**Facial Recognition**

- **Example:** Face Unlock in smartphones, airport security, or attendance systems.
- CV detects and matches faces against a database to identify individuals.
- **Use case:** Apple's Face ID uses a 3D map of your face for secure authentication.

**Self-Driving Cars**

- **Example:** Tesla's Autopilot or Waymo vehicles.
- Cameras and CV algorithms are used for:
  - **Lane detection** to keep the car within road boundaries.
  - **Pedestrian detection** to avoid collisions.
  - **Traffic sign recognition** to obey road rules.

**Healthcare -- Disease Diagnosis**

- **Example:** AI-assisted tools like DeepMind's retinal disease detection system.
- Computer vision analyzes medical images (like retinal scans) to detect diabetic retinopathy or cancer.
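The contrast and brightness adjustments described earlier (e.g., Instagram-style filters) come down to simple per-pixel arithmetic. A minimal NumPy sketch, where the function name and the `alpha`/`beta` parameters are purely illustrative, not taken from any particular app:

```python
import numpy as np

def adjust_contrast_brightness(img, alpha=1.0, beta=0):
    """Linear point operation: output = alpha * pixel + beta, clipped to [0, 255].

    alpha > 1 increases contrast; beta > 0 increases brightness.
    """
    out = img.astype(np.float32) * alpha + beta
    return np.clip(out, 0, 255).astype(np.uint8)

# A flat mid-gray 4x4 test image.
img = np.full((4, 4), 100, dtype=np.uint8)

boosted = adjust_contrast_brightness(img, alpha=1.5, beta=20)  # 100 * 1.5 + 20 = 170
```

Real editing apps layer many such point operations (plus saturation changes in a color space like HSV), but each one is ultimately this kind of pixel-wise transform.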
**Retail -- Visual Search**

- **Example:** Amazon's and Pinterest's "visual search" features.
- You can take a photo of a product, and the app finds similar products online.
- **How it works:** CV extracts features like color, shape, and texture to match items.

**Augmented Reality (AR)**

- **Example:** Snapchat filters or Pokémon GO.
- CV identifies facial landmarks to apply filters or detects the ground to place virtual Pokémon in the real world.

**Security and Surveillance**

- **Example:** CCTV systems with motion detection or person tracking.
- CV algorithms automatically identify suspicious activities or unauthorized access.
- **Use case:** Airports use CV to track individuals in crowded terminals.

**Agriculture**

- **Example:** Monitoring crop health using drones.
- Drones capture field images, and CV algorithms analyze them for plant diseases, water stress, or pest infestation.

**Manufacturing -- Quality Control**

- **Example:** Automated assembly lines.
- CV systems inspect products for defects, ensuring consistent quality.
- **Use case:** Detecting scratches or uneven surfaces in car manufacturing.

**E-commerce -- Try-On Features**

- **Example:** Virtual try-on for clothes or glasses (e.g., Warby Parker, Sephora).
- CV maps your face or body to overlay products like glasses, makeup, or clothes in real time.

**3. Combined Use of Image Processing and Computer Vision**

In most cases, image processing is a preprocessing step for computer vision tasks. Here's how they work together:

**Example 1: Automatic Number Plate Recognition (ANPR)**

- **Step 1 (Image Processing):** The camera captures an image of a vehicle, and preprocessing is applied to enhance contrast and remove noise.
- **Step 2 (Computer Vision):** CV algorithms detect the number plate, segment the characters, and recognize the text for applications like toll collection or traffic violation tracking.
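Step 1 of the ANPR pipeline (contrast enhancement) can be sketched with a simple linear contrast stretch. This plain-NumPy version is only an illustration; a real ANPR system would typically use a library such as OpenCV, and Step 2 would add plate detection and character recognition on top:

```python
import numpy as np

def contrast_stretch(img):
    """Linearly rescale pixel intensities so they span the full 0-255 range."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                        # flat image: nothing to stretch
        return img.copy()
    scaled = (img.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return scaled.astype(np.uint8)

# Low-contrast crop around a number plate (values bunched in 60..120).
plate = np.array([[60, 80],
                  [100, 120]], dtype=np.uint8)

enhanced = contrast_stretch(plate)  # intensities now span 0..255
```

After a stretch like this, the characters on the plate stand out far more sharply from the background, which is exactly what the downstream detection step needs.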
**Example 2: Medical Imaging**

- **Step 1 (Image Processing):** Enhance a CT scan to improve clarity and remove noise.
- **Step 2 (Computer Vision):** Analyze the enhanced image to detect patterns, such as identifying cancerous cells or measuring organ sizes.

**Emerging Trends with Examples**

**1. Gesture Recognition**

- **Example:** Gaming consoles like Xbox Kinect or Google's Soli project.
- Recognizing hand gestures to control devices or interact with virtual objects.

**2. Real-Time Translation**

- **Example:** Google Translate's camera feature.
- CV identifies text in an image, and OCR extracts the characters.
- The extracted text is then translated into the target language.

**3. Fraud Detection**

- **Example:** Banks use CV for fraud prevention.
- **Use case:** Verifying in real time that ID documents (like a driver's license) match the face of the user.

**4. Urban Planning**

- **Example:** Detecting urban sprawl or illegal construction.
- Satellite images are processed and analyzed using CV to monitor changes over time.

**Why This Field is Transformational**

- **Efficiency:** Automated systems can process thousands of images in seconds, far beyond human capabilities.
- **Accuracy:** Machine learning-powered CV systems are often more accurate than manual methods, for example when detecting early-stage diseases or inspecting microscopic defects.
- **Scalability:** CV and IP applications can scale to industries like education, entertainment, and even social media moderation (e.g., detecting harmful content).

TYPES OF IMAGES

There are three types of images:

**1. Binary Images**

This is the simplest type of image. Each pixel takes only two values: black and white, i.e. 0 and 1. A binary image is a 1-bit image, since only one binary digit is needed to represent each pixel. Binary images are mostly used to capture a general shape or outline. **For example:** Optical Character Recognition (OCR). Binary images are generated using a threshold operation:
When a pixel is above the threshold value, it is turned white ('1'); pixels below the threshold are turned black ('0').

**2. Grayscale Images**

Grayscale images are monochrome images: they have only one color channel and contain no color information. Each pixel stores one of the available gray levels. A typical grayscale image uses 8 bits/pixel, giving 256 different gray levels. In medical imaging and astronomy, 12- or 16-bit/pixel images are used.

**3. Colour Images**

Colour images are three-band monochrome images in which each band contains a different color, and the actual information is stored in the digital image. Color images contain gray-level information in each spectral band. The images are represented as red, green, and blue (RGB images), and each color image has 24 bits/pixel: 8 bits for each of the three color bands (RGB).

**8-bit color format**

8-bit color is used for storing image information in a computer's memory or in an image file. In this format, each pixel occupies one 8-bit byte, giving a 0-255 range of values, in which 0 is black, 255 is white, and 127 is mid-gray. The 8-bit format is also commonly used for grayscale images. Initially, it was used by the UNIX operating system.

**16-bit color format**

The 16-bit color format is also known as the high color format. It has 65,536 different color shades. It is used in systems developed by Microsoft. The 16-bit format is divided into three channels, Red, Green, and Blue, also known as RGB format. In this format there are 5 bits for R, 6 bits for G, and 5 bits for B. The additional bit is given to green because, of the three colors, the human eye is most sensitive to green.

**24-bit color format**

The 24-bit color format is also known as the true color format. The 24-bit color format is likewise distributed among Red, Green, and Blue.
Since 24 divides evenly by 3, the bits are distributed equally among the three colors: 8 bits for R, 8 bits for G, and 8 bits for B.

**Color Codes Conversion**

**Different color codes**

Color here is in 24-bit format, which means 8 bits of red, 8 bits of green, and 8 bits of blue. By changing the amounts of the three portions, you can make different colors.

**Binary color format**

**Color:** Black -- For pure black, all three portions R, G, B are 0.

**Color:** White -- For pure white, all three portions R, G, B are 255.

**Color:** Red -- For red, green and blue are set to 0, and the red portion is given its maximum value, i.e. 255.

**Color:** Green -- For green, red and blue are set to 0, and the green portion is given its maximum value, i.e. 255.

**Color:** Blue -- For blue, red and green are set to 0, and the blue portion is given its maximum value, i.e. 255.

**Color:** Gray -- For gray, all three values are 128.

**CMYK color model**

The CMYK model is used for printers, which use two cartridges: one for the CMY colors and one for black. CMY can be converted to RGB. In the CMYK color model, C stands for cyan, M stands for magenta, Y stands for yellow, and K stands for black.

**Color:** Cyan -- For cyan, red is set to 0, and the green and blue portions are given their maximum value, i.e. 255.

**Color:** Magenta -- For magenta, green is set to 0, and the red and blue portions are given their maximum value, i.e. 255.

**Color:** Yellow -- For yellow, blue is set to 0, and the red and green portions are given their maximum value, i.e. 255.

**RGB to Hex code**

For example, suppose we want to convert the white color code (255, 255, 255) to a hex code. **Following are the steps for RGB to hex conversion:**

1. Take the first portion, i.e. the value of red (R), 255.
2. Divide it by 16.
   We get a quotient of 15 and a remainder of 15; in hex both are written F, so the red portion becomes FF.
3. Repeat steps 1 and 2 for the other two portions.
4. Combine all three hex codes into one, and we get #FFFFFF.

**Hex to RGB code**

For example, suppose we want to convert the white color code #FFFFFF. Following are the steps for converting a hex code to RGB decimal format:

1. Divide the code into 3 equal parts: FF FF FF.
2. Take the first part and separate its digits: F F.
3. Convert each digit into binary separately: (1111) (1111).
4. Combine the two binary values into one: 11111111.
5. Convert the binary number into a decimal number: 255.
6. Repeat steps 1 to 5 for the other two portions.
7. Combine all three values into one, and we get (255, 255, 255).

**Following are some colors and their hex codes**

| Color   | Hex Code |
|---------|----------|
| Black   | #000000  |
| White   | #FFFFFF  |
| Gray    | #808080  |
| Red     | #FF0000  |
| Green   | #00FF00  |
| Blue    | #0000FF  |
| Cyan    | #00FFFF  |
| Magenta | #FF00FF  |
| Yellow  | #FFFF00  |

**Steps involved in an image processing pipeline**

[Image processing](https://www.educative.io/answers/what-is-image-processing) is a set of techniques for manipulating and analyzing digital images to enhance their quality, extract information, or perform specific tasks. These tasks can be of various types, like recognizing objects, finding objects that are not visible, and sharpening and restoring images. Image processing is widely used in fields such as computer vision, medical imaging, remote sensing, and digital photography. The steps of the image processing pipeline are described below.

**Image acquisition**

The first step in the image processing pipeline is image acquisition. This step involves capturing raw image data, including pixel values and metadata, from cameras, scanners, or other sources and converting it into digital format. The resulting digital image is a matrix of pixel values, where each pixel is assigned a specific binary code or numeric value.
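The "matrix of pixel values" idea is easy to make concrete in code. A small NumPy sketch; in practice the matrix would come from a camera or a file via a library such as OpenCV or Pillow rather than being built by hand:

```python
import numpy as np

# Grayscale: one 8-bit value per pixel (0 = black, 255 = white).
gray = np.zeros((2, 3), dtype=np.uint8)  # a 2x3 all-black image
gray[0, 1] = 255                         # turn one pixel white

# Color (RGB): three 8-bit values per pixel, so the shape is (rows, cols, 3).
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)                  # a pure red pixel
```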
The digital image is typically stored in a specific file format, such as JPEG, PNG, TIFF, or RAW. These formats determine how the pixel values, metadata, and other information are encoded and stored within the file to preserve image quality or metadata.

**Image preprocessing**

After the image is acquired, it is preprocessed. **Image preprocessing** refers to a set of techniques and operations performed on images before they undergo further analysis. The goal of preprocessing is to enhance the quality of the image, remove unwanted artifacts, and prepare the image for subsequent tasks such as feature extraction, object recognition, or image analysis. Basic preprocessing techniques include resizing, scaling, rotating, cropping, and flipping an image. A few common preprocessing techniques:

- **Image enhancement** aims to improve the visual quality, clarity, and interpretability of an image by adjusting its brightness, contrast, and color balance to make it more visually appealing or to highlight specific features. Common enhancement techniques include histogram equalization, contrast stretching, gamma correction, and adaptive filtering.

- **Image restoration** techniques are used to recover degraded or damaged images and improve their quality, removing artifacts caused by noise, blurring, compression, or other factors. Examples of image restoration techniques include denoising, deblurring, super-resolution, and inpainting.

![Image restoration example](media/image10.jpeg)

- **Image denoising** is a technique used to reduce or remove noise from an image. Noise can be introduced during image acquisition, transmission, or storage, and it can degrade image quality and affect subsequent analysis or processing tasks.
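One classic spatial-filter denoiser, the median filter, can be sketched in a few lines. The naive loop version below is for clarity only; in practice a library routine such as OpenCV's `cv2.medianBlur` would be used:

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

# A flat image with a single bright "salt" noise pixel in the middle.
noisy = np.full((5, 5), 50, dtype=np.uint8)
noisy[2, 2] = 255

clean = median_filter_3x3(noisy)  # the outlier is replaced by its neighborhood median
```

Unlike a mean filter, the median discards outliers entirely, which is why it works well on salt-and-pepper noise while preserving edges.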
Denoising can be done with spatial filters such as the mean, median, and Gaussian filters, or with frequency-domain filters.

- **Image segmentation** involves dividing an image into meaningful and distinct regions or objects. Segmentation techniques can be based on various criteria, such as color, intensity, texture, or edge information. It is useful for object detection, tracking, and extracting region-specific information for further analysis. Common segmentation methods include thresholding, edge detection, region growing, clustering, and watershed segmentation.

**Feature extraction**

After the image is processed, useful features are extracted from it. **Feature extraction** in image processing refers to the process of identifying and extracting meaningful and relevant information, or features, from an image. These features capture distinctive characteristics of the image that can be used for various tasks such as image recognition, object detection, image classification, and image retrieval. Commonly used techniques for feature extraction include edge detection and texture analysis. **Edge detection** algorithms identify and highlight the boundaries or edges of objects in an image, while **texture analysis** methods capture the spatial arrangement and statistical properties of texture patterns in an image.

**Recognition or detection**

The recognition or detection step in image processing involves identifying and classifying objects or patterns of interest within an image. This step uses the features extracted in the previous step to make decisions about the presence, location, or characteristics of specific objects or classes.

Now that we have overviewed the steps in an image processing pipeline, let's walk through an example.

**Example**

Consider the scenario of an autonomous vehicle navigating a busy city road.
In this scenario, the image processing pipeline involves acquiring images from the vehicle's cameras, applying preprocessing techniques such as enhancement, segmenting the scene, detecting and recognizing objects like traffic signs, detecting lanes, and tracking obstacles. Together, these steps provide the information the autonomous vehicle system needs to navigate the city street safely.

**Conclusion**

In conclusion, the image processing pipeline plays a vital role in many real-world applications, such as computer vision, medical imaging, autonomous vehicles, and surveillance. It enables us to extract meaningful information from images, make decisions based on visual data, and automate tasks that would otherwise be time-consuming to perform.

[Steps involved in an image processing pipeline](https://www.educative.io/answers/steps-involved-in-an-image-processing-pipeline)
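As a recap, the pipeline stages discussed above -- acquisition, preprocessing, segmentation, feature extraction, and recognition -- can be sketched end to end on a tiny synthetic image. Plain NumPy with naive loops, purely illustrative; a real system would use OpenCV or a similar library:

```python
import numpy as np

# 1. Acquisition: a synthetic 8-bit grayscale frame with a bright square "object".
img = np.full((8, 8), 30, dtype=np.uint8)
img[2:6, 2:6] = 200

# 2. Preprocessing: 3x3 mean smoothing of interior pixels (simple noise reduction).
smooth = img.astype(np.float32).copy()
for i in range(1, 7):
    for j in range(1, 7):
        smooth[i, j] = img[i - 1:i + 2, j - 1:j + 2].mean()

# 3. Segmentation: a global threshold separates the object from the background.
mask = smooth > 100

# 4. Feature extraction: area (pixel count) of the segmented region.
area = int(mask.sum())

# 5. Recognition/decision: classify the scene based on the extracted feature.
label = "object present" if area > 0 else "empty scene"
```

Each stage consumes the previous stage's output, which is the defining property of the pipeline: better preprocessing directly improves the segmentation, and better features directly improve the final decision.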