Document Details


Uploaded by FlourishingExpressionism

Tags

computer vision, image processing, object recognition, machine learning

Summary

This document presents an overview of computer vision basics, emphasizing concepts such as object recognition, segmentation, and localization, with a detailed description and examples for each concept.

Full Transcript


· COMPUTER VISION BASICS. Computer vision focuses on teaching machines to understand and interpret visual information from the world, such as images and videos. It aims to enable computers to recognize objects, people, or patterns. In data science, computer vision involves using data to train models that can perform tasks like identifying faces, etc.

· OBJECT RECOGNITION. Recognition problems are fundamental tasks in computer vision: they focus on understanding and interpreting images by identifying the objects within them. Recognition often involves breaking the process down into key components:

SEGMENTATION. The first step; the goal is to distinguish between the pixels that belong to the object of interest (the foreground) and those that make up the background. This process creates a precise boundary around the object, enabling the system to isolate it from the other elements in the image. EXAMPLE: in an image of a dog on the grass, segmentation would separate the pixels forming the dog from those representing the grass.

LOCALIZATION / DETECTION. It identifies where the object is located within the scene. This includes pinpointing the object's position using bounding boxes or coordinates, and estimating its pose, which refers to properties like orientation, size, scale and 3D position. These components work together to provide a detailed understanding of an object's presence and its spatial context. EXAMPLE: localization might indicate that a car is in the top-right corner of the image, while pose estimation could reveal that the car is facing forward, is scaled to appear close, etc.

So, recognition focuses on understanding and interpreting the content of an image. Within it there are two specific subtasks: object identification and object classification. Both rely on recognizing patterns, but identification is more detailed, while classification abstracts these details to categorize objects based on shared characteristics.

· OBJECT IDENTIFICATION.
It refers to the precise recognition of a specific instance of an object. EXAMPLE: identifying "your apple" involves distinguishing that particular apple from all other apples (or objects), based on unique features (shape, color, etc.). It requires detailed knowledge about the exact object being observed, often gained through extensive training on that object's characteristics. STEPS: 1) Gather a dataset that includes multiple images of the specific object to be identified; each image must be accurately labeled to ensure the model can associate the correct features with the object. 2) Advanced algorithms analyze the images to extract the unique features of the specific object (color, shape, distinguishing marks, etc.). 3) A deep learning model is trained on the labeled dataset, in order to learn the unique characteristics of the object and distinguish it from others, even in varying conditions (changes in angle, lighting, etc.). 4) The trained model is validated and tested in real-world scenarios to confirm it can consistently recognize the specific object.

ADVANTAGES:
· Highly accurate recognition of specific instances of objects; ideal for inventory tracking, personal object recognition, etc.
· Excels at distinguishing nearly identical items.
DISADVANTAGES:
· Requires extensive and detailed datasets of the specific object to achieve high accuracy.
· Models trained for specific object identification may struggle when faced with changes (lighting, angle, etc.).
· Highly specialized: it may require re-training for each new object to be identified.

· OBJECT CLASSIFICATION. It involves recognizing objects as belonging to a general category or class, such as identifying any apple, any cup, any dog. This process doesn't focus on specific instances, but rather on the shared features that define a category (the general roundness of an apple, the handles of cups, etc.).
It is often associated with "basic-level categories", the most intuitive and general levels at which humans naturally classify objects (it is quicker to identify a "dog" than a "Golden Retriever"). It relies on training models to learn these shared features and categorize objects accurately. STEPS: 1) Pre-processing: in order to provide better image data for computer vision models to work with, unwanted deformities are removed and some important parts of the image are enhanced. 2) Objects are localized, which involves object segmentation and object position determination. 3) Deep learning algorithms identify patterns in the image that can be specific to a certain label. 4) Machine learning divides the observed objects into predefined classes using the classification strategy.

ADVANTAGES:
· Since your product will need to recognize items in the physical world, it makes sense to train it on real-life images.
· Image classification products can be used in areas like object identification in satellite images, traffic control systems, etc.
DISADVANTAGES:
· In some images the target object may not be entirely visible, like a dog hiding in a bush: even if the dog's head is plainly visible, the algorithm might not identify it.
· The ability of a model to correctly identify the object in the image may be impacted by interfering ambient texture and color, like having trouble distinguishing a red apple from a similar-colored table.

· LINEAR IMAGE FILTERING. A basic yet powerful technique used to enhance images or extract important information from them. It involves applying a mathematical function to the pixels of an image to achieve a desired effect: smoothing out noise, sharpening edges, highlighting certain features, etc. (e.g. applying a filter to a blurry photo to make it clearer).
This process typically works by examining a small group of pixels and adjusting their values based on specific rules. An image can be seen as a mathematical function f(x, y), where "x" and "y" represent the spatial coordinates corresponding to specific locations within the image. The function f(x, y) assigns a value to each coordinate pair; this value is referred to as the pixel value. In an image with dimensions M x N, x ranges from 0 to M-1 and y ranges from 0 to N-1, defining a grid of pixels. This view is useful because it allows algorithms to treat the image as a structured dataset, where each point corresponds to a measurable quantity that can be processed.

GRAYSCALE IMAGES. In a grayscale image each pixel is represented by a single component that captures the intensity of gray at a specific location; it indicates how light or dark the pixel appears (lower = darker, higher = lighter). The range of values is from 0 to 255.

RGB IMAGES. An RGB (Red, Green, Blue) image represents each pixel using three separate components, r(x, y), g(x, y) and b(x, y): the intensities of the red, green and blue color channels at a specific location (x, y). The combination of these three channels determines the final color of each pixel; an RGB model can produce 256^3 = 16,777,216 possible colors.

· CONVOLUTIONS. They are fundamental mathematical operations in image processing. They operate by applying a kernel (a small matrix of numerical values) to localized regions of an image, performing element-wise multiplication and summation across overlapping areas. Convolutional operations are shift-invariant, so spatial structures (shapes, patterns) are preserved during processing. Among the benefits of convolutions:
· Detection of local patterns, such as edges, shapes, etc.
· They work on small localized regions of the input (through kernels or filters), enabling tasks like smoothing, sharpening, etc.
· They preserve the spatial structure of the input data.
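The description of an image as a function f(x, y) over a pixel grid can be made concrete with a small NumPy sketch; the array values below are invented purely for illustration:

```python
import numpy as np

# A grayscale image is a function f(x, y): an M x N grid where each
# entry is a single intensity in the range 0..255.
M, N = 2, 3
gray = np.array([[0, 128, 255],
                 [64, 32, 16]], dtype=np.uint8)
assert gray.shape == (M, N)
print(int(gray[0, 2]))        # pixel value at row 0, column 2 -> 255

# An RGB image stores three components per location: r, g and b channels.
rgb = np.zeros((M, N, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)       # a pure red pixel at (0, 0)

# With 256 levels per channel the model can produce 256**3 colors.
print(256 ** 3)               # -> 16777216
```

Treating the image as an array in this way is what lets filtering and convolution be expressed as plain arithmetic on the grid.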
Image filtering is a crucial technique in image processing, where a filter is defined as a mathematical function applied to the pixels of an image to transform it or extract information. This transformation is performed by systematically altering pixel values based on their surroundings. Filters play diverse roles:
· Enhancement: improving the contrast of an image, highlighting specific features, etc.
· Smoothing: reducing noise by averaging or other techniques, resulting in a cleaner image.
· Template matching: detecting predefined patterns or shapes within an image.

Filtering is a fundamental technique used to reduce noise and enhance image quality. Noise refers to unwanted variations in pixel values that obscure the true signal of the image (caused by sensor errors, quantization, etc.). Filtering methods are designed to smooth out these random fluctuations while preserving the image details. Noise can be categorized into two types:
1) Low-level noise: includes issues like light fluctuations, etc. These types of noise often manifest as random variations in intensity that make the image look grainy or distorted.
2) Complex noise: includes larger-scale disruptions, like shadows or extraneous objects, that interfere with the interpretation of the image.

The additive noise model in image processing assumes that an observed image I is the sum of the true signal S and a noise component N: I = S + N.

Gaussian filtering works by applying a Gaussian kernel to the image, which smooths out variations by averaging the pixel values in a local neighborhood. It assigns higher weights to pixels closer to the center of the neighborhood and progressively lower weights to farther pixels, ensuring that nearby pixels have a more significant influence on the final output. This makes Gaussian filtering effective at reducing random noise while preserving important image features like edges.
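The additive noise model and the shape of a Gaussian kernel can be sketched as follows; the image size, noise level and σ are arbitrary choices for illustration, not values from the notes:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian kernel: weights peak at the center,
    decay with distance from it, and sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

# Additive noise model: observed image I = true signal S + noise N.
rng = np.random.default_rng(0)
S = np.full((5, 5), 100.0)              # flat "true" signal
noise = rng.normal(0.0, 10.0, S.shape)  # zero-mean random noise
I = S + noise

k = gaussian_kernel(3, sigma=1.0)
print(round(float(k.sum()), 6))   # -> 1.0 (weights are normalized)
print(bool(k[1, 1] == k.max()))   # center weight is the largest -> True
```

Because the weights sum to 1, averaging with this kernel leaves flat regions unchanged while damping the random fluctuations of N.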
The standard deviation σ controls the spread of the kernel: a smaller σ results in a narrower kernel (only very close pixels have significant influence), while a larger σ creates a broader kernel.

2D CONVOLUTION. Linear filtering involves replacing the value of each pixel in an image with a weighted sum of its own value and the values of its neighboring pixels. The weights used for this operation are defined by a filter (or kernel), a small matrix that specifies how much influence each neighboring pixel has. The mathematical operation of a 2D convolution is defined as:

f[m, n] = (I ⊗ g)[m, n] = Σ_{k,l} I[m-k, n-l] · g[k, l]

where:
· I[m, n]: the input image, a grid of pixels.
· g[k, l]: the filter (kernel) values, the matrix that is applied to the image.
· I[m-k, n-l]: the image region to which the kernel is applied.
· f[m, n]: the filtered image.

The convolution process works by sliding the filter (kernel) over the input image and calculating the weighted sum of the pixel values within the filter's receptive field. Each resulting value forms a pixel in the filtered image.

EXAMPLE. Given the 3x3 input image I

I = |  8  5  2 |
    |  7  5  3 |
    |  9  4  1 |

and the 3x3 filter kernel g[k, l], used for edge detection,

g = | -1  0  1 |
    | -1  0  1 |
    | -1  0  1 |

the resulting filtered pixel f[1, 1] is obtained by element-wise multiplication of the kernel with the image region it overlays, then summing up the results:

f[1, 1] = (-1·8) + (0·5) + (1·2)
        + (-1·7) + (0·5) + (1·3)
        + (-1·9) + (0·4) + (1·1) = -18.

The output size decreases after a convolution operation, because the kernel requires a full overlap with the input image's pixels to compute valid results.

SEPARABILITY OF BOX AND GAUSSIAN FILTERS. The separability property of filters refers to the ability to decompose a 2D convolution operation into two sequential 1D convolutions, one along the rows and the other along the columns:

f ⊗ I = f_x ⊗ (f_y ⊗ I), where f_x and f_y are 1D kernels and I is the input image.
This reduces the computational cost and simplifies the implementation, as two sequential 1D convolutions are easier to compute and optimize. EXAMPLE: filters like the Box and Gaussian filters can exploit their separability to be efficiently applied as two sequential 1D filters instead of a full 2D convolution. First we convolve the image with a 1D filter along the rows, then we convolve the result with a 1D filter along the columns.

BOX FILTER. It is a simple averaging filter represented by a 3x3 kernel where each value is 1/9. This 3x3 box filter can be decomposed into two 1D filters: the horizontal filter [1/3, 1/3, 1/3], which applies averaging along the rows, and the vertical filter [1/3, 1/3, 1/3]^T, which performs averaging along the columns of the image. By applying these two 1D filters sequentially, we achieve the same result as the full 3x3 convolution.

When the kernel is centered near the edges of the image, part of it falls outside the image area, leaving insufficient pixels for a valid computation. The outermost rows and columns of pixels are therefore excluded from the output, resulting in an image with reduced dimensions. For an input of size N x N and a kernel of size K x K, the formula for the output size is: (N - K + 1) x (N - K + 1). So if we have a 5x5 input image and a 3x3 kernel, the output will be (5 - 3 + 1) x (5 - 3 + 1) = 3x3. The convolution reduces the spatial dimensions of the image because the boundary regions are excluded, as the kernel can't extend beyond the image. The larger the kernel, the greater the reduction in dimensions, so it is important to choose an appropriate kernel size.

PADDING. To preserve the output size after applying a convolution, padding can be used to add extra rows and columns around the image. It involves surrounding the original image with a border of zero values, to ensure that the convolutional kernel can fully overlap every pixel of the original image. Usually, for a K x K kernel, a typical padding size is ⌊K/2⌋: 1 pixel for a 3x3 kernel, 2 pixels for a 5x5, etc.
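The output-size formula, the box-filter separability, and the worked edge-detection example can all be checked with a small NumPy sketch. Note that, like the worked example, this helper multiplies the kernel with the image region as-is (no flip), i.e. it is strictly the element-wise multiply-and-sum operation:

```python
import numpy as np

def corr2d_valid(image, kernel):
    """'Valid' sliding-window filter: element-wise multiply the kernel
    with each full-overlap region and sum (no kernel flipping)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output height: N - K + 1
    ow = image.shape[1] - kw + 1   # output width:  N - K + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# 1) Output size: a 5x5 image with a 3x3 kernel yields a 3x3 output.
img = np.arange(25, dtype=float).reshape(5, 5)
box2d = np.full((3, 3), 1 / 9)
full = corr2d_valid(img, box2d)
print(full.shape)              # -> (3, 3)

# 2) Separability: a horizontal [1/3, 1/3, 1/3] pass followed by the
#    same weights as a vertical pass reproduces the full 3x3 box filter.
row = np.full((1, 3), 1 / 3)
col = np.full((3, 1), 1 / 3)
sep = corr2d_valid(corr2d_valid(img, row), col)
print(np.allclose(full, sep))  # -> True

# 3) The edge-detection example: the 3x3 kernel over the 3x3 image
#    gives a single (3-3+1) x (3-3+1) output value.
I = np.array([[8, 5, 2], [7, 5, 3], [9, 4, 1]], dtype=float)
g = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
print(corr2d_valid(I, g))      # -> [[-18.]]
```

Running the two 1D passes costs 2K multiplications per pixel instead of K² for the full kernel, which is where separability saves work.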
CORRELATION VS CONVOLUTION. The key difference lies in the flipping of the kernel: for an operation to qualify as a convolution, the kernel must be flipped both horizontally and vertically before being applied to the input data, so that the order of its entries is reversed along both the rows and the columns. If this flipping is omitted, the operation becomes a cross-correlation instead of a convolution.
· Convolution emphasizes specific transformations (edge detection, blurring, etc.) by combining the flipped kernel with the input data to generate a transformed output.
· Cross-correlation measures the similarity between two signals by directly applying the kernel to the input, without flipping.

MULTI-SCALE IMAGE REPRESENTATION. Objects can appear at different sizes in an image; a multi-scale representation helps search for objects more effectively. Across scales, larger and more generalized features are retained as the scale decreases; on the other hand, fine details and small features are lost as the image is blurred and downsampled.

GAUSSIAN PYRAMID. It is a technique used to represent images at multiple scales by progressively blurring the image and reducing its size. There are two main operations:
1) Gaussian blurring: smooths out high-frequency details (sharp edges).
2) Downsampling: reduces the resolution by removing pixels from the image in both the horizontal and vertical directions.
By repeating these steps, the image ...
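The two main operations of the Gaussian pyramid (blurring, then downsampling) can be sketched in NumPy; the kernel size, σ and image sizes here are illustrative assumptions, not values from the notes:

```python
import numpy as np

def blur(image, kernel):
    """Filter with zero padding so the output keeps the input size."""
    kh, kw = kernel.shape
    p = kh // 2
    padded = np.pad(image, p)           # zero-value border of width p
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def pyramid_level(image, kernel):
    """One Gaussian-pyramid step: blur, then drop every other row/column."""
    return blur(image, kernel)[::2, ::2]

# Small normalized Gaussian kernel (5x5, sigma = 1).
ax = np.arange(5) - 2
xx, yy = np.meshgrid(ax, ax)
k = np.exp(-(xx ** 2 + yy ** 2) / 2.0)
k /= k.sum()

img = np.random.default_rng(0).random((16, 16))
level1 = pyramid_level(img, k)       # 16x16 -> 8x8
level2 = pyramid_level(level1, k)    # 8x8  -> 4x4
print(level1.shape, level2.shape)    # -> (8, 8) (4, 4)
```

Blurring before downsampling is what removes the high-frequency detail that could not be represented at the coarser resolution, which matches the note that fine details are lost as the scale decreases.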
