Computer Vision Lecture Notes PDF
Document Details
Uploaded by AccurateZombie5867
Summary
These lecture notes provide an introduction to computer vision and image processing. They cover topics such as image types (binary, grayscale, color), transformations, and applications like medical imaging and robotics. The document also describes the difference between image processing and computer vision.
Full Transcript
Computer Vision Lecture (1)

Course Outline
This course is an introductory course to image processing and computer vision. The main objectives of this course are to:
- Identify the difference between image processing and computer vision.
- Identify different data structures for image analysis.
- Identify and understand image features.

Introduction
Humans are extremely good at understanding the world from visual input alone. For example, looking at a framed group portrait, a human can easily count (and name) all of the people in the picture and even guess at their emotions from their facial appearance. This comes so easily to us that we underestimate how difficult perception is, and how hard it is for machines.

So, what does it mean "to see"?
- To know what is where by looking.
- To discover from images what is present in the world, where things are, and what actions are taking place, and to predict and anticipate events in the world.

However, …
[Figures: four optical illusions]

What is Computer Vision?
Computer vision is a subfield of AI focused on getting machines to see as humans do. It seeks to automate tasks that the human visual system can do. In simple words, computer vision aims to make computers understand images and video.

Goal
The goal of computer vision is to bridge the gap between pixels and meaning.
[Figure: what we see vs. what a computer sees]

Difference between image processing & computer vision
Image processing: processes an image.
- Input: image
- Output: image
Computer vision: tries to emulate human vision.
- Input: image, image sequence, video
- Output: decision, classification, …
Image processing is one part of computer vision: a computer vision system uses image processing algorithms.
[Figure: computer vision and its related fields: robotics, machine learning, human-computer interaction, image processing, graphics, feature matching, medical imaging, recognition, computational photography, neuroscience, optics]
[Figure: computer vision system]

Why study computer vision?
Vision is useful: images and video are everywhere!
- Personal photo albums
- Movies, news, sports
- Medical and scientific images
- Surveillance and security

Image retrieval
Identify a picture from a large database or on the web without words.

Optical character recognition (OCR)
Technology to convert scanned documents to text. Example: license plates.

Face detection
Many digital cameras now detect faces. Some also offer smile detection.

3D model building
Computing the 3D shape of the world.
[Figure: Internet photos ("Colosseum"), reconstructed 3D cameras and points, 3D model]

Forensics

Biometrics
Fingerprint scanners on many new laptops and other devices. Face recognition systems now appear more widely.

Vision-based interaction & games
Nintendo Wii has camera-based IR tracking built in. Kinect.

Smart cars
Self-driving cars: Google Waymo.

Medical imaging
3D imaging (MRI, CT). Image-guided surgery.

Current state of the art
You just saw many examples of current systems; many of these are less than 5 years old. This is a very active and rapidly changing research area:
- Many new apps are expected in the next 5 years.
- Deep learning powers many modern applications.

However, …

Computer vision is difficult, why?
- Viewpoint variation
- Illumination
- Scale
- Motion
- Intra-class variation
- Background clutter

Computer Vision Lecture (2)
Image Processing

What is an image?
An image is a single picture which represents something. It may be a picture of a person, of people or animals, or of an outdoor scene.
[Figure: digital camera]

What is Image Processing?
Image processing involves changing the nature of an image in order to either:
1. improve its pictorial information for human interpretation, or
2. render it more suitable for autonomous machine perception.
Humans like their images to be sharp, clear and detailed. Machines prefer their images to be simple and uncluttered.

Examples of (1)
- Enhancing the edges of an image to make it appear sharper.
- Removing noise from an image.
Noise is random errors in the image.
- Removing motion blur from an image.

Examples of (2)
- Obtaining the edges of an image. This may be necessary for the measurement of objects in an image.
- Removing detail from an image for measurement or counting purposes. We could measure the size and shape of the animal without being distracted by unnecessary detail.

Why study Image Processing?
The first stage in most computer vision applications is the use of image processing to preprocess the image and convert it into a form suitable for further analysis.
While some may consider image processing to be outside the purview of computer vision, most computer vision applications, such as computational photography and even recognition, require care in designing the image processing stages in order to achieve acceptable results.

Types of Images
There are three basic types of images:
1. Binary images.
2. Grayscale images.
3. Color images.

Binary images
Each pixel is just black or white. Since there are only two possible values for each pixel, we only need one bit per pixel (0 for black and 1 for white).

Grayscale images
Each pixel is a shade of grey, normally from 0 (black) to 255 (white), so each pixel can be represented by exactly one byte (8 bits).

Color images (RGB)
Each pixel has a particular color, described by the amount of red, green and blue in it. If each of these components has a range from 0 to 255, this gives a total of $256^3 = 16{,}777{,}216$ different possible colors in the image. Since the total number of bits required for each pixel is 3 × 8 = 24, such images are also called 24-bit color images.
Such an image may be considered as a stack of three matrices, representing the red, green and blue values for each pixel. This means that every pixel corresponds to three values.
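The three image types above can be sketched with plain Python lists. This is only an illustration: the 2x2 pixel values and the names `BITS_PER_PIXEL`, `split_channels` and `storage_bits` are assumptions made for this example, not part of the lecture.

```python
# Toy 2x2 images stored as nested lists (hypothetical pixel values).
binary_image = [[0, 1],
                [1, 0]]                        # 0 = black, 1 = white
gray_image = [[0, 128],
              [200, 255]]                      # 0 = black ... 255 = white
rgb_image = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (255, 255, 255)]]   # (red, green, blue) per pixel

# Bits needed per pixel for each image type, as stated in the notes.
BITS_PER_PIXEL = {"binary": 1, "grayscale": 8, "rgb": 3 * 8}


def split_channels(image):
    """Return the red, green and blue matrices of an RGB image."""
    return tuple([[pixel[c] for pixel in row] for row in image]
                 for c in range(3))


def storage_bits(kind, width, height):
    """Uncompressed size in bits of a width x height image of the given type."""
    return BITS_PER_PIXEL[kind] * width * height


num_colors = 256 ** 3                          # distinct 24-bit colors
red, green, blue = split_channels(rgb_image)
```

Splitting the RGB image into `red`, `green` and `blue` makes the "stack of three matrices" view explicit: each matrix has the same shape as the image and holds one value per pixel.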
[Figure: color images (RGB)]

Digital Image
An image can be considered as a two-dimensional function, where the function values give the brightness of the image at any given point.
A digital image can be considered as a large array of sampled points, each of which has a particular quantized brightness; these points are the pixels which constitute the digital image.
The pixels surrounding a given pixel constitute its neighborhood.

Two important terms for digital (discrete) images:
- Sampling: discretizing the 2D space onto a regular grid.
- Quantization: rounding each sample to the nearest integer.

A digital image is a grid (matrix) of intensity values (it is common to use one byte per value: 0 = black, 255 = white):

255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255  20   0 255 255 255 255 255 255 255
255 255 255  75  75  75 255 255 255 255 255 255
255 255  75  95  95  75 255 255 255 255 255 255
255 255  96 127 145 175 255 255 255 255 255 255
255 255 127 145 175 175 175 255 255 255 255 255
255 255 127 145 200 200 175 175  95 255 255 255
255 255 127 145 200 200 175 175  95  47 255 255
255 255 127 145 145 175 127 127  95  47 255 255
255 255  74 127 127 127  95  95  95  47 255 255
255 255 255  74  74  74  74  74  74 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255 255 255 255 255

Images as functions
A grayscale image can be considered as a function from $\mathbb{R}^2$ to $\mathbb{R}$: $f(x, y)$ gives the intensity at position $(x, y)$. Pixel value (or intensity): [0, 255].
A color image is just three functions pasted together. We can write this as a "vector-valued" function:
$f(x, y) = \begin{bmatrix} r(x, y) \\ g(x, y) \\ b(x, y) \end{bmatrix}$

Computer Vision Lecture 3
Image Processing

Image transformation
Image processing operations are divided into three classes, based on the information required to perform the transformation:
1. Point operations: a pixel's grey value is changed without any knowledge of its surroundings.
2. Neighborhood processing: to change the grey level of a given pixel, we need to know the grey levels in a small neighborhood of pixels around the given pixel.
3. Transforms: the entire image is processed as a single large block.

Point Processing
The simplest kinds of image processing transforms are point operations, where each output pixel's value depends only on the corresponding input pixel value.
Although point operations are the simplest, they include some of the most powerful and widely used of all image processing operations. They are especially useful in image pre-processing, where an image must be modified before the main job is attempted.

Arithmetic operations of grayscale images
These operations act by applying a simple function $y = f(x)$ to each gray value in the image. Thus, $f(x)$ maps gray values in the range 0 to 255 to values in the same range.
Simple functions include adding or subtracting a constant value to each pixel:
$y = x \pm C$
or multiplying each pixel by a constant:
$y = Cx$
In each case, we may have to adjust the output slightly in order to ensure that the results are integers in the range 0 to 255. We can do this by rounding and then limiting the values:
$y \leftarrow \begin{cases} 255 & \text{if } y > 255, \\ 0 & \text{if } y < 0. \end{cases}$
Thus, when adding 128, all gray values of 127 or greater will be mapped to 255. And when subtracting 128, all gray values of 128 or less will be mapped to 0.
In general, adding a constant will lighten an image, and subtracting a constant will darken it.
[Figures: examples of adding and subtracting a constant, including $g(x, y) = f(x, y) + 20$ (brightened image) and $g(x, y) = f(-x, y)$ (mirrored image)]

Lightening or darkening of an image can also be performed by multiplication.
Note that b3, although darker than the original, is still quite clear, whereas a lot of information has been lost by the subtraction process, as can be seen in image b2.
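The difference between darkening by subtraction and darkening by multiplication can be checked on a few sample values. The sketch below is a minimal illustration with a hypothetical 2x2 image; the helper names `clip` and `point_op`, and the choice of multiplying by 0.5, are assumptions for this example rather than the exact operations used to produce images b2 and b3.

```python
def clip(value):
    """Round to an integer and limit the result to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def point_op(image, func):
    """Apply y = f(x) independently to every pixel, then clip."""
    return [[clip(func(x)) for x in row] for row in image]


# Hypothetical gray values chosen to show the clipping behavior.
image = [[10, 100],
         [128, 250]]

added = point_op(image, lambda x: x + 128)       # lightens; 127 and above saturate at 255
subtracted = point_op(image, lambda x: x - 128)  # darkens; 128 and below collapse to 0
halved = point_op(image, lambda x: x * 0.5)      # darkens but keeps levels distinct
complement = point_op(image, lambda x: 255 - x)  # photographic negative
```

In `subtracted`, three of the four pixels become 0 and can no longer be told apart, while `halved` keeps four distinct values.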
This is because in image b2 all pixels with gray values of 128 or less have become zero.
[Figures: further examples of arithmetic operations on grayscale images]

Complements
The complement of a grayscale image is its photographic negative: each gray value $x$ is replaced by $255 - x$.
Interesting special effects can be obtained by complementing only part of the image, for example:
- taking the complement of pixels of gray value 128 or less, and leaving other pixels untouched; or
- taking the complement of pixels which are 128 or greater, and leaving other pixels untouched.
The effect of these functions is called solarization.
[Figure: solarization examples]

Histograms
Given a grayscale image, its histogram is a graph indicating the number of times each gray level occurs in the image.
We can infer a great deal about the appearance of an image from its histogram, as the following examples indicate:
- In a dark image, the gray levels (and hence the histogram) would be clustered at the lower (left) end.
- In a uniformly bright image, the gray levels would be clustered at the upper (right) end.
- In a well-contrasted image, the gray levels would be well spread out over much of the range.
[Figure: an image and its histogram]
From the result shown in the previous figure, and since the gray values are all clustered together in the center of the histogram, we would expect the image to be poorly contrasted, as indeed it is.
Given a poorly contrasted image, we would like to enhance its contrast by spreading out its histogram. There are two ways of doing this:
1. Histogram stretching (contrast stretching).
2. Histogram equalization.

Histogram stretching
Suppose a 4-bit grayscale image has the histogram shown in the next figure, associated with a table of the numbers $n_i$ of gray values ($n = 360$):
[Figure and table: histogram and gray-level counts $n_i$]
We can stretch the gray levels in the center of the range by applying the linear function shown at the right of the same figure.
This function has the effect of stretching the gray levels 5 to 9 out to gray levels 2 to 14 according to the equation:
$j = \frac{14 - 2}{9 - 5}(i - 5) + 2$
where $i$ is the original grey level and $j$ is its result after the transformation. For example, $i = 5, 6, 7, 8, 9$ map to $j = 2, 5, 8, 11, 14$.
Gray levels outside this range are either left alone (as in this case) or transformed according to the linear functions at the ends of the graph above. This yields:
[Table: gray levels after stretching]
And the corresponding histogram indicates an image with greater contrast than the original:
[Figure: histogram after stretching]

Histogram equalization
The trouble with histogram stretching is that it requires user input. Sometimes a better approach is provided by histogram equalization, which is an entirely automatic procedure.
Suppose a 4-bit grayscale image has the histogram shown in the next figure, associated with a table of the numbers $n_i$ of gray values ($n = 360$):
[Figure and table: histogram and gray-level counts $n_i$]
We would expect this image to be uniformly bright, with a few dark dots on it.
To equalize this histogram, we form running totals of the $n_i$ and multiply each by $\frac{15}{360} = \frac{1}{24}$. Here 15 is $2^4 - 1$ (the largest 4-bit gray level), while 360 is the total number of pixels in the image.
We now have the following transformation of grey values, obtained by reading the first and last columns in the above table:
[Table: original gray levels $i$ and equalized gray levels $j$]
and the histogram of the $j$ values is shown in the next figure. This is far more spread out than the original histogram, and so the resulting image should exhibit greater contrast.
[Figures: histogram after equalization, and two further examples of images before and after histogram equalization]

Computer Vision Lecture 4

Types of Images
There are three basic types of images:
1. Binary images.
2. Grayscale images.
3. Color images.

Switching between formats
RGB to grayscale:
$I_{grey}(p) = \frac{I_R(p) + I_G(p) + I_B(p)}{3}$
where $I_R$, $I_G$, $I_B$ are the red, green, and blue channels.
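The averaging formula above translates directly into code. This is a minimal sketch using nested lists of `(R, G, B)` tuples; the function name `rgb_to_gray`, the sample pixels, and the decision to round the average to the nearest integer are assumptions made for illustration.

```python
def rgb_to_gray(rgb_image):
    """Convert an RGB image to grayscale by averaging the three channels.

    Each pixel is an (R, G, B) tuple with components in 0..255; the
    average is rounded so that the result is again an integer in 0..255.
    """
    return [[round((r + g + b) / 3) for (r, g, b) in row]
            for row in rgb_image]


# Hypothetical 2x2 color image.
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(30, 60, 90), (255, 255, 255)]]

gray = rgb_to_gray(rgb)
```

With equal weights, pure red and pure green map to the same gray value, which shows how the conversion discards color information.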
Grayscale to binary:
The simplest way is to apply a threshold $d$:
$I_{bin}(p) = \begin{cases} 1 & \text{if } I_{grey}(p) \ge d, \\ 0 & \text{otherwise.} \end{cases}$

Disadvantages of using a threshold
- Throws away information.
- The threshold must generally be chosen by hand.
- Not robust: different thresholds are usually required for different images.

How to choose a threshold
- The intensity histogram can be useful.
- Often it is necessary to choose a threshold adaptively for a particular application.

Geometric transformations
Examples include:
- Rotation.
- Resizing or scaling an image.
Geometric transformations involve 2 steps:
- Spatial transformation: relocating known pixel values.
- Interpolation: determining image values at the integer pixel locations of the transformed image.
Numerous interpolation schemes exist. Two simple methods are:
- Nearest neighbor: assign the value of the nearest known point.
- Bilinear interpolation.

Bilinear Interpolation
Bilinear interpolation estimates the intensity at the location $(x', y')$ (shown in red in the figure) from the intensities $I(x, y)$, $I(x + 1, y)$, $I(x, y + 1)$, and $I(x + 1, y + 1)$ of the four surrounding pixels (shown in yellow).
[Figure: nearest neighbor vs. bilinear interpolation]

Resizing an Image
Step 1: Determine the new location of each pixel after scaling by $s_x$ in the x-direction and $s_y$ in the y-direction.
Step 2: Interpolate the values at the new integer pixel locations, for example by taking a weighted average of known values in the vicinity of the new pixel.

Rotating an Image
Step 1: Determine the new location of each pixel (rotation about the origin by angle $\theta$).
Step 2: Interpolate the values at the new integer pixel locations, for example using bilinear interpolation.

Neighborhood Processing
We have seen in the previous lecture that an image can be modified by applying a particular function to each pixel value.
Neighborhood processing may be considered as an extension of this, where a function is applied to a neighborhood of each pixel.
The idea is to move a "mask" over the given image. The mask is a rectangle (usually with sides of odd length) or other shape.
As we do this, we create a new image whose pixels have grey values calculated from the grey values under the mask.
The combination of mask and function is called a filter.
If the function by which the new grey value is calculated is a linear function of all the grey values in the mask, then the filter is called a linear filter.

Image Filtering
Filtering forms a new image whose pixels are a combination of the original pixels: each pixel is modified based on some function of a local neighborhood of that pixel.
[Figure: local image data (3x3 neighborhood with rows 10 5 3, 4 5 1, 1 1 7) is passed through some function to give the modified value 7]

Why use filters?
- To get useful information from images, such as extracting edges or contours (to understand shape).
- To enhance the image by removing noise or sharpening it.
- Filtering is a key operator in Convolutional Neural Networks.

Linear filtering
Replace each pixel by a linear combination (a weighted sum) of its neighbors.
The prescription for the linear combination is called the "kernel" (or "mask", "filter").
Linear filtering is implemented in the spatial domain by convolving a filter kernel across an image.
[Figure: local image data (rows 10 5 3, 4 6 1, 1 1 8) combined with the kernel (rows 0 0 0, 0 0.5 0, 0 1 0.5) gives the modified value 0.5·6 + 1·1 + 0.5·8 = 8; in this figure the kernel weights are applied directly, without flipping]

2D Convolution
To convolve image I with filter kernel K:
1. Flip the rows and columns of the kernel.
2. Multiply each pixel in range of the kernel by the corresponding element of the flipped kernel, sum all these products, and write the result to the center pixel.
3. Do this for every pixel in the image.
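The three steps above can be sketched as follows. This is a minimal, unoptimized illustration on nested lists; the function name `convolve2d` and the sample image are assumptions for this example, and the image is assumed to be padded with zeros around its borders.

```python
def convolve2d(image, kernel):
    """Convolve `image` with an odd-sized `kernel`, assuming zero padding."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Step 1: flip the rows and columns of the kernel.
    flipped = [row[::-1] for row in kernel[::-1]]
    r_off, c_off = k_rows // 2, k_cols // 2
    out = [[0] * cols for _ in range(rows)]
    # Step 3: repeat for every pixel in the image.
    for i in range(rows):
        for j in range(cols):
            total = 0
            # Step 2: multiply pixels under the kernel by the flipped
            # kernel entries and sum the products.
            for a in range(k_rows):
                for b in range(k_cols):
                    y, x = i + a - r_off, j + b - c_off
                    if 0 <= y < rows and 0 <= x < cols:  # outside = zero padding
                        total += image[y][x] * flipped[a][b]
            out[i][j] = total  # write the sum to the center pixel
    return out


# Hypothetical 3x3 image.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
shift = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]

same = convolve2d(image, identity)
shifted_left = convolve2d(image, shift)
```

The identity kernel returns the image unchanged, and the kernel with a single 1 to the left of center shifts the image left by one pixel, with zeros entering from the padded right border.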
[Figures: step-by-step animation of a flipped 3x3 kernel sliding over a small image, with the output filled in one pixel at a time]
When calculating the convolution of an image, we assume the image is padded with zeros around its borders.

Computer Vision Lecture 5

2D Convolution
Let $F$ be the image, $H$ be the kernel (of size $(2k+1) \times (2k+1)$), and $G$ be the output image:
$G[i, j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} H[u, v]\, F[i - u, j - v]$

Linear filters: examples
- Convolving with the kernel with rows (0 0 0), (0 1 0), (0 0 0) gives an image identical to the original.
- Convolving with the kernel with rows (0 0 0), (1 0 0), (0 0 0) gives the original shifted left by 1 pixel.

Convolution Filtering
Two examples of convolution filtering:
- Blurring.
- Sharpening.

Blurring
To blur an image is to make it less sharp (smoother).
Blurring is low-pass filtering.
Goal: noise reduction.
A simple blurring kernel is a matrix of 1's divided by the area of the kernel (box filter), for example:
$\frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$
[Figure: original image blurred with box filters of size 3x3, 5x5, 9x9, 15x15 and 35x35; noise is reduced, small areas blend into the background, and border effects appear]
[Figure: 15x15 smoothing followed by thresholding]

Sharpening
Goal: highlight fine detail in an image.
Convolve the image with a sharpening kernel:
[Figure: sharpening kernel]
$I_{sharp}$ will typically lie outside [0, 1], so we must truncate or rescale.
What does blurring take away?
original − smoothed (5x5) = detail
(This "detail extraction" operation is also called a high-pass filter.)
Let's add it back:
original + α · detail = sharpened

2D Convolution is Expensive
2D convolution is very useful.
However, it is computationally expensive, because convolving an image with an n x m kernel takes nm products per pixel.

Separable convolution kernels
A 2D convolution is separable if the kernel $K$ can be written as a convolution of 2 vectors, $K = k_1 * k_2^T$.
A separable convolution can be written as two 1D convolutions:
$I * K = (I * k_1) * k_2^T$

Separable kernel
What does it mean to convolve 2 vectors in 2D? For example, with the row vector $k_1 = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}$ and the column vector $k_2^T = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$:
$K = k_1 * k_2^T = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$
[Figures: step-by-step construction of the separable kernel]
Separable convolutions are much more efficient to compute. Convolving an image with an n x m kernel takes nm products per pixel; performing the same convolution separably requires only n + m.

Computer Vision Lecture 6
Edge Detection

Edge Detection
- Converts a 2D image into a set of curves.
- Extracts salient features of the scene.
Goal: identify visual changes (discontinuities) in an image.

Why do we care about edges?
Edges (lines) convey meaning. They are useful when trying to extract information, recognize objects, reconstruct scenes, segment objects, determine boundaries, identify features, …

Causes of visual edges
Edges are caused by a variety of factors:
- surface normal discontinuity
- depth discontinuity
- surface color discontinuity
- illumination discontinuity
[Figures: closeups of edges]

Images as functions…
Edges look like steep cliffs.
Consider a single row or column of the image. Plotting intensity as a function of position gives a signal.
So, where is the edge?

Characterizing edges
An edge is a place of rapid change in the image intensity function.
[Figure: an image, its intensity function along a horizontal scanline, and the first derivative; edges correspond to extrema of the derivative]

Image derivatives
How can we differentiate a digital image F[x,y]?
Could we implement this differentiation as a linear filter?
Differentiating an Image
An image is differentiated by convolution with a derivative kernel.
One such derivative kernel is the 3x3 Prewitt operator.

Image gradient
An image gradient is a directional change in the intensity or color in an image.
At each image point, the gradient vector points in the direction of the largest possible intensity increase, and the length of the gradient vector corresponds to the rate of change in that direction.
Gradient images are created from the original image (generally by convolving with a filter) for this purpose.
Each pixel of a gradient image measures the change in intensity at that same point in the original image, in a given direction.
To get the full range of directions, gradient images in the x and y directions are computed.

The gradient of an image is:
$\nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]$
The gradient points in the direction of most rapid increase in intensity.
The edge strength is given by the gradient magnitude:
$\|\nabla f\| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}$
The gradient direction is given by:
$\theta = \operatorname{atan2}\left(\frac{\partial f}{\partial y}, \frac{\partial f}{\partial x}\right)$

Differentiation example
Let's look at how this works.
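As a small numeric illustration of the gradient formulas above, the sketch below applies a Prewitt kernel and its transpose at one interior pixel of a toy image. Several things here are assumptions for the example: the helper names, the 4x4 step-edge image, and the choice to apply the kernel weights directly (without flipping) so that the x derivative is positive where intensity increases to the right.

```python
import math

# Prewitt kernel for the x direction and its transpose for the y direction.
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]
PREWITT_Y = [list(col) for col in zip(*PREWITT_X)]


def filter3x3(image, kernel, i, j):
    """Weighted sum of the 3x3 neighborhood centered on interior pixel (i, j)."""
    return sum(kernel[a][b] * image[i + a - 1][j + b - 1]
               for a in range(3) for b in range(3))


def gradient(image, i, j):
    """Return (gx, gy, magnitude, direction in degrees) at pixel (i, j)."""
    gx = filter3x3(image, PREWITT_X, i, j)
    gy = filter3x3(image, PREWITT_Y, i, j)
    magnitude = math.hypot(gx, gy)                # sqrt(gx^2 + gy^2)
    direction = math.degrees(math.atan2(gy, gx))  # theta = atan2(gy, gx)
    return gx, gy, magnitude, direction


# Hypothetical image with a vertical step edge: dark left, bright right.
image = [[0, 0, 100, 100] for _ in range(4)]

gx, gy, magnitude, direction = gradient(image, 1, 1)
```

At the pixel next to the step, only the x derivative responds, so the gradient points horizontally toward the brighter side.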
[Figures: step-by-step differentiation example. The derivative kernel is moved across the image to compute the derivative in the x direction, and so on for every pixel; the output is then shaded. Having determined the x derivative, the transpose of the same kernel is used, with the same procedure, to determine the y derivative, and that output is shaded as well.]
[Figure: original image and a gradient image]

Image gradient
[Figure: an intensity image of a cat; a gradient image in the x direction, measuring horizontal change in intensity; a gradient image in the y direction, measuring vertical change in intensity]
Gray pixels have a small gradient; black or white pixels have a large gradient.
[Figure: intensity profile and intensity derivative along x, with little noise]

Effects of noise
[Figure: noisy input image]
Where is the edge?
Solution: smooth first.
[Figure: signal $f$, smoothing kernel $h$, and the smoothed signal $f * h$]
To find edges, look for peaks in $\frac{d}{dx}(f * h)$.

Example
Solution? …

Computer Vision Lecture 7

Example
[Figure: original image]
Finding edges
[Figure: smoothed gradient magnitude]
Where is the edge? Thresholding.
Solution? …

Criteria for Edge Detection
The general criteria for edge detection include:
- Detection of edges with a low error rate, which means that the detector should accurately catch as many of the edges shown in the image as possible.
- The detected edge point should be accurately localized at the center of the edge.
- A given edge in the image should only be marked once, and where possible, image noise should not create false edges.

Canny Edge Detector
Canny edge detection is a technique to extract useful structural information from different vision objects.
It is probably the most widely used edge detector in computer vision.
[Figures: example original RGB image and its grayscale version]

Canny edge detector
Steps:
1. Convolve the image with a Gaussian filter to smooth the image and remove noise.
2. Find the magnitude and orientation of the gradient.
3. Perform non-maximum suppression (thin multi-pixel-wide "ridges" to single-pixel width).
4. 'Hysteresis' thresholding.

Step 1: Gaussian smoothing
[Figure: example image after the Gaussian filter]

Step 2: Compute gradients
[Figures: X derivative of Gaussian and Y derivative of Gaussian]
Compute the gradient magnitude:
sqrt( XDerivOfGaussian.^2 + YDerivOfGaussian.^2 ) = gradient magnitude
Compute the gradient orientation:
- Threshold the magnitude at a minimum level.
- Get the orientation via theta = atan2(yDeriv, xDeriv).
[Figure: unthresholded orientation]
The edge direction angle is rounded to one of four angles representing vertical, horizontal and the two diagonals (0°, 45°, 90° and 135°).
An edge direction falling in each color region of the figure is set to a specific angle value; for example, θ in [0°, 22.5°] maps to 0°.

Step 3: Non-maximum suppression
Non-maximum suppression is an edge thinning technique.
After applying the gradient calculation, the edge extracted from the gradient values is still quite blurred, but there should only be one accurate response to the edge.
Non-maximum suppression helps by suppressing all the gradient values (setting them to 0) except the local maxima, which indicate the locations with the sharpest change of intensity value.
The algorithm for each pixel in the gradient image is:
1. Compare the edge strength of the current pixel with the edge strength of the pixels in the positive and negative gradient directions.
2. If the edge strength of the current pixel is the largest compared to the other pixels in the mask with the same direction (e.g., a pixel whose gradient points in the y-direction is compared to the pixels above and below it on the vertical axis), the value is preserved. Otherwise, the value is suppressed.

Non-maximum suppression for each orientation
At pixel q, we have a maximum if the value is larger than those at both p and r.
Interpolate along the gradient direction to get the values at p and r.
[Figures: before and after non-maximum suppression]

Step 4: 'Hysteresis' thresholding
1. Define two thresholds, high and low:
- Gradient magnitude > high threshold: strong edge.
- Gradient magnitude < low threshold: noise.
- Gradient magnitude in between: weak edge.
2. Accept all weak edges that are "connected" to strong edges.
[Figure: final Canny edges]

Computer Vision Lecture 8
Feature extraction: Corners

Corner Detection
Corner detection is an approach used within computer vision systems to extract certain kinds of features in an image.
It is a key step in many image processing and computer vision applications, such as:
- Object detection/recognition.
- Motion tracking.
- Stitching of panoramic photographs.
- Image registration (especially in medical imaging).

Object detection/recognition
Object recognition is a technology in the field of computer vision for finding and identifying objects in an image or video sequence.
Humans recognize objects in images with little effort, even though the image of an object may vary with viewpoint, size and scale, or rotation.
Humans can even recognize objects that are partially obstructed from view.
This task is still a challenge for computer vision systems.

Motion tracking
Motion tracking is the process of detecting a change in the position of an object relative to its surroundings, or a change in the surroundings relative to an object.
It is the process of recording the movement of objects or people.

Panorama stitching
Image stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image.
We have two images; how do we combine them?
Step 1: extract features.
Step 2: match features.
Step 3: align images.

Image matching
[Figures: an image matching example, a harder case, and a harder case still; in the answer, the matches are marked with tiny colored squares]

Image registration
Image registration is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors.
Registration geometrically aligns two images (the reference and sensed images).
[Figures: image registration examples]

Feature Matching
[Figures: feature matching examples]

Invariant local features
Find features that are invariant to transformations:
- geometric invariance: translation, rotation, scale, …
- photometric invariance: brightness, exposure, …

Feature Descriptors
A feature descriptor is an algorithm which takes an image and outputs feature descriptors (feature vectors).

Local features: main components
1. Feature detection: find it.
2. Feature descriptor: represent it.
3. Feature matching: match it.
4. Feature tracking: track it, when there is motion.

What is a "good feature"? …

Computer Vision Lecture 9

Local features: main components
1. Detection: identify the interest points.
2. Description: extract a vector feature descriptor surrounding each interest point, e.g. $x_1 = [x_1^{(1)}, \ldots, x_d^{(1)}]$ in one view and $x_2 = [x_1^{(2)}, \ldots, x_d^{(2)}]$ in another.
3. Matching: determine the correspondence between descriptors in two views.

What makes a good feature?
Uniqueness:
- Look for image regions that are unusual.
- These lead to unambiguous matches in other images.
But how do we define "unusual"?

Local measures of uniqueness
Suppose we only consider a small window of pixels. What defines whether a feature is a good or bad candidate?
How does the window change when you shift it? For a good feature, shifting the window in any direction causes a big change.
- "Flat" region: no change in all directions.
- "Edge": no change along the edge direction.
- "Corner": significant change in all directions.

Finding Corners
A corner is a place in the image where two different strong edge directions are present.
Corners contain strong gradients AND gradients oriented in more than one direction.
Edge detectors perform poorly at corners.

Harris corner detection
Consider shifting the window $W$ by $(u, v)$.
Compare each pixel before and after the shift by summing up the squared differences (SSD).
This defines an SSD "error" $E(u, v)$:
$E(u, v) = \sum_{(x, y) \in W} \left[ I(x + u, y + v) - I(x, y) \right]^2$
[Figures: window $W$ shifted by $(u, v)$ over image $I$]
For small shifts, $E$ can be approximated by a quadratic form in $(u, v)$:
$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \quad M = \begin{bmatrix} A & B \\ B & C \end{bmatrix}, \quad A = \sum_{W} I_x^2, \; B = \sum_{W} I_x I_y, \; C = \sum_{W} I_y^2$
where $I_x$ and $I_y$ are the image derivatives in the x and y directions.
[Figures: the cases of a horizontal edge and a vertical edge]
The eigenvalues of $M$ will be proportional to the principal curvatures of the image surface.
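The flat / edge / corner distinction can be illustrated numerically by computing the eigenvalues of $M$ for small windows. The sketch below assumes the usual definition $M = \begin{bmatrix} A & B \\ B & C \end{bmatrix}$ with $A = \sum I_x^2$, $B = \sum I_x I_y$, $C = \sum I_y^2$ summed over the window. The central-difference derivatives, the 5x5 toy windows, the function names and the threshold value are all assumptions made for this example.

```python
import math


def structure_matrix(window):
    """Sum Ix^2, Ix*Iy and Iy^2 over the interior pixels of a small window.

    Derivatives are estimated with central differences (an assumption).
    """
    A = B = C = 0.0
    for y in range(1, len(window) - 1):
        for x in range(1, len(window[0]) - 1):
            ix = (window[y][x + 1] - window[y][x - 1]) / 2
            iy = (window[y + 1][x] - window[y - 1][x]) / 2
            A += ix * ix
            B += ix * iy
            C += iy * iy
    return A, B, C


def eigenvalues(A, B, C):
    """Eigenvalues of the symmetric matrix [[A, B], [B, C]], smallest first."""
    mean = (A + C) / 2
    radius = math.sqrt(((A - C) / 2) ** 2 + B * B)
    return mean - radius, mean + radius


def classify(window, threshold=1.0):
    """Label a window as 'flat', 'edge' or 'corner' (threshold is hypothetical)."""
    lam_small, lam_large = eigenvalues(*structure_matrix(window))
    if lam_large < threshold:
        return "flat"      # both eigenvalues small
    if lam_small < threshold:
        return "edge"      # one eigenvalue much larger than the other
    return "corner"        # both eigenvalues large


# Hypothetical 5x5 windows.
flat = [[50] * 5 for _ in range(5)]
edge = [[0, 0, 0, 100, 100] for _ in range(5)]
corner = [[100 if (x >= 2 and y >= 2) else 0 for x in range(5)]
          for y in range(5)]

labels = [classify(w) for w in (flat, edge, corner)]
```

For the vertical edge only $A$ is non-zero, so one eigenvalue vanishes; for the corner window both eigenvalues are large, which matches the qualitative description above.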
Classification of image points using eigenvalues of M
[Figure: plane with axes $\lambda_1$ and $\lambda_2$, divided into regions]
- "Corner": $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$; $E$ increases in all directions.
- "Edge": $\lambda_1 \gg \lambda_2$ or $\lambda_2 \gg \lambda_1$.
- "Flat" region: $\lambda_1$ and $\lambda_2$ are small; $E$ is almost constant in all directions.

Harris corner detection
Harris noted that exact computation of the eigenvalues is computationally expensive, and instead suggested the following:
$C(x, y) = \det(M) - k\,(\operatorname{trace}(M))^2$
$\det(M) = \lambda_1 \lambda_2 = AC - B^2$
$\operatorname{trace}(M) = \lambda_1 + \lambda_2 = A + C$
where $k$ is a tunable sensitivity parameter.

Harris Corners: why so complicated?
Can't we just check for regions with lots of gradients in the x and y directions?
[Figure: current window containing a diagonal line]
No! A diagonal line would satisfy that criterion.

Computer Vision Lecture 10
Introduction to Computer Vision

Can we use photographs to create a 360° panorama?
In order to figure this out, we need to learn what a camera is …

Let's design a camera
Idea 1: Put a piece of film in front of an object. Do we get a reasonable image?
No! This is a bad camera.
Idea 2: Add a barrier to block off most of the rays.
- This reduces blurring.
- The opening (pinhole in the barrier) is known as the aperture.

Pinhole camera
[Figures: pinhole camera and a home-made pinhole camera]
Why so blurry?

Adding a lens
A lens focuses light onto the film.
There is a specific distance at which objects are "in focus".
Other points project to a "circle of confusion" in the image.

Circle of confusion
Precise focus is only possible at an exact distance from the lens.
At that distance, a point object will produce a point image.
Otherwise, a point object will produce a blur spot shaped like the aperture, typically and approximately a circle.

How Points in 3D Space Map to an Image?
It is traditional to draw the image plane in front of the focal point.
So, the pinhole camera model is typically drawn as shown.
[Figure: pinhole camera model with the image plane in front of the focal point]
A 3D point located at depth $Z$ and at a distance $X$ from the optical axis, for a lens with focal length $f$, maps to the image coordinate:
$x = f \frac{X}{Z}$

Lenses
So, what is the relation between:
- the focal length ($f$),
- the distance of the object from the optical center ($D$), and
- the distance at which the object will be in focus ($D'$)?
Any point satisfying this equation is in focus:
$\frac{1}{D'} + \frac{1}{D} = \frac{1}{f}$

Parallel rays
All parallel rays converge to one point on a plane located at the focal length $f$.

The eye
The human eye is a camera:
- Iris: colored annulus with radial muscles.
- Pupil: the hole (aperture) whose size is controlled by the iris.
- Photoreceptor cells (rods and cones) in the retina: these act as the film.

Digital camera
A digital camera replaces film with a sensor array.
Each cell in the array is a charge-coupled device (CCD): a light-sensitive integrated circuit that stores and displays the data for an image in such a way that each pixel in the image is converted into an electrical charge whose intensity is related to a color in the color spectrum.
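The two formulas from this lecture, the projection $x = f X / Z$ and the lens equation $1/D' + 1/D = 1/f$, can be checked numerically with a short sketch. The function names and the sample numbers are assumptions for illustration; all lengths are assumed to be in the same unit, and the lens equation is only solved for objects farther away than the focal length.

```python
def project(X, Z, f):
    """Image coordinate x = f * X / Z of a point at depth Z and offset X."""
    return f * X / Z


def focus_distance(D, f):
    """Solve 1/D' + 1/D = 1/f for D', the distance at which the object is in focus.

    Assumes D > f; for D == f the rays leave the lens parallel and never focus.
    """
    if D <= f:
        raise ValueError("object must be farther from the lens than f")
    return 1 / (1 / f - 1 / D)


# Hypothetical numbers (all in the same length unit).
x_near = project(X=2.0, Z=10.0, f=0.05)   # point 10 units away
x_far = project(X=2.0, Z=20.0, f=0.05)    # same point, twice as far
d_prime = focus_distance(D=3.0, f=2.0)
```

Doubling the depth halves the image coordinate, which is the familiar effect of distant objects appearing smaller.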