Introduction to Computer Vision - Fall 2024 - Toronto Metropolitan University
Document Details
Institution: Toronto Metropolitan University
Year: 2024
Instructor: Dr. Omar Falou
Summary
These are lecture notes from a computer vision class at Toronto Metropolitan University, covering image transformations and image warping, image resampling and interpolation, and edge detection.
Full Transcript
CPS834/CPS8307 Introduction to Computer Vision
Dr. Omar Falou, Toronto Metropolitan University, Fall 2024

Image transformations and image warping
Reading: Szeliski, Chapter 3.6

Image alignment
Why don't these images line up exactly? What is the geometric relationship between the two images? Answer: a similarity transformation (translation, rotation, uniform scale). This question is very important for creating mosaics: first, we need to know what the transformation is; second, we need to figure out how to compute it using feature matches.

Image warping
Image filtering changes the range of an image: g(x) = h(f(x)).
Image warping changes the domain of an image: g(x) = f(h(x)).
(Richard Szeliski, Image Stitching)

Parametric (global) warping
Examples of parametric warps: translation, rotation, aspect. A transformation T is a coordinate-changing machine: p' = T(p), mapping p = (x, y) to p' = (x', y'). Saying that T is global means it is the same for any point p and can be described by just a few numbers (parameters). Let's first consider linear transforms, which can be represented by a 2x2 matrix:
x' = a x + b y
y' = c x + d y

Common linear transformations
Uniform scaling by s: x' = s x, y' = s y. The inverse is uniform scaling by 1/s.
Rotation by angle θ about the origin: x' = x cos θ − y sin θ, y' = x sin θ + y cos θ. The inverse is rotation by −θ; for rotation matrices, the inverse is the transpose.

2x2 matrices
What types of transformations can be represented with a 2x2 matrix?
– 2D mirror across the y axis? Yes: x' = −x, y' = y.
– 2D mirror across the line y = x? Yes: x' = y, y' = x.
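The 2x2 transforms above are easy to check numerically. A minimal NumPy sketch (the function names are mine, not from the slides):

```python
import numpy as np

def scale(s):
    """Uniform scaling by s; the inverse is scale(1/s)."""
    return np.array([[s, 0.0],
                     [0.0, s]])

def rotation(theta):
    """Rotation by theta (radians) about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

mirror_y_axis = np.array([[-1.0, 0.0],   # x' = -x
                          [ 0.0, 1.0]])  # y' =  y
mirror_y_eq_x = np.array([[0.0, 1.0],    # x' = y
                          [1.0, 0.0]])   # y' = x

# For rotations the inverse is the transpose: R^-1 = R^T.
p = np.array([2.0, 1.0])
R = rotation(np.pi / 2)
assert np.allclose(R.T @ (R @ p), p)
```

Applying `rotation(np.pi / 2)` to the point (2, 1) rotates it 90° counterclockwise to (−1, 2), confirming the formulas above.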
– 2D translation? No! Translation is not a linear operation on 2D coordinates.

All 2D linear transformations
Linear transformations are combinations of scale, rotation, shear, and mirror:
x' = a x + b y
y' = c x + d y
Properties of linear transformations:
– Origin maps to origin
– Lines map to lines
– Parallel lines remain parallel
– Ratios are preserved
– Closed under composition: applying one linear transform after another multiplies their matrices, e.g. [x'; y'] = [a b; c d] [e f; g h] [i j; k l] [x; y]

Homogeneous coordinates
Trick: add one more coordinate, writing a 2D point as (x, y, w) in homogeneous image coordinates; image points live on the homogeneous plane w = 1. Converting from homogeneous coordinates: (x, y, w) corresponds to (x/w, y/w).

Translation
Translation cannot be written as a 2x2 matrix. Solution: homogeneous coordinates to the rescue.

Affine transformations
Any transformation represented by a 3x3 matrix with last row [0 0 1] is called an affine transformation.

Basic affine transformations (writing [x'; y'; 1] = M [x; y; 1]):
Translate: M = [1 0 tx; 0 1 ty; 0 0 1]
Scale: M = [sx 0 0; 0 sy 0; 0 0 1]
2D in-plane rotation: M = [cos θ  −sin θ  0; sin θ  cos θ  0; 0 0 1]
Shear: M = [1 shx 0; shy 1 0; 0 0 1]

Affine transformations are combinations of linear transformations and translations:
[x'; y'; w] = [a b c; d e f; 0 0 1] [x; y; w]
Properties of affine transformations:
– Origin does not necessarily map to origin
– Lines map to lines
– Parallel lines remain parallel
– Ratios are preserved
– Closed under composition (when you apply a specific operation to elements within the set, the result also belongs to the set)

Where do we go from here? What happens when we mess with the last row?
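The homogeneous-coordinate trick can be sketched in a few lines of NumPy (illustrative; the helper names are mine):

```python
import numpy as np

def translate(tx, ty):
    """3x3 affine translation; note the last row [0 0 1]."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotate(theta):
    """3x3 affine in-plane rotation about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def to_homogeneous(x, y):
    return np.array([x, y, 1.0])

def from_homogeneous(p):
    return p[:2] / p[2]          # (x, y, w) -> (x/w, y/w)

# Affine transforms compose by matrix multiplication:
# rotate 90 degrees about the origin, then translate by (5, 0).
A = translate(5.0, 0.0) @ rotate(np.pi / 2)
p = from_homogeneous(A @ to_homogeneous(1.0, 0.0))
# (1, 0) rotates to (0, 1), then translates to (5, 1).
```

Note the order: in `translate(...) @ rotate(...)`, the rotation is applied first, because matrices act on the point from the right.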
If we change that last row, we move beyond the affine transformation.

Projective transformations, a.k.a. homographies, a.k.a. planar perspective maps
A 3x3 transformation with an arbitrary last row is called a homography (or planar perspective map). Homographies can map ordinary points to points at infinity, and vice versa. Warping an image with a homography (slides show the image plane in front vs. the image plane below) can leave black areas where no pixel maps to.

Properties of projective transformations:
– Origin does not necessarily map to origin
– Lines map to lines
– Parallel lines do not necessarily remain parallel
– Ratios are not preserved
– Closed under composition

2D image transformations
These transformations form a nested set of groups: each is closed under composition, and the inverse of a member is also a member.

Implementing image warping
Given a coordinate transform (x', y') = T(x, y) and a source image f(x, y), how do we compute a transformed image g(x', y') = f(T(x, y))?

Forward warping
Send each pixel (x, y) to its corresponding location (x', y') = T(x, y) in g(x', y'). What if a pixel lands "between" two pixels? Answer: add a "contribution" to several pixels and normalize later (splatting). This can still result in holes.

Inverse warping
Get each pixel g(x', y') from its corresponding location (x, y) = T⁻¹(x', y') in f(x, y). This requires taking the inverse of the transform. What if a pixel comes from "between" two pixels?
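A minimal sketch of the inverse-warping loop just described, for the affine case; the "between two pixels" question is resolved here by bilinear resampling, which the notes discuss next. This is illustrative code, not from the slides (real code would vectorize the loops):

```python
import numpy as np

def bilinear_sample(f, x, y):
    """Sample image f at a continuous location (x, y) by bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, f.shape[1] - 1), min(y0 + 1, f.shape[0] - 1)
    a, b = x - x0, y - y0
    return ((1 - a) * (1 - b) * f[y0, x0] + a * (1 - b) * f[y0, x1]
            + (1 - a) * b * f[y1, x0] + a * b * f[y1, x1])

def inverse_warp(f, T, out_shape):
    """g(x', y') = f(T^-1(x', y')) for a 3x3 affine T.
    (For a full homography, divide by the third coordinate after applying T^-1.)"""
    T_inv = np.linalg.inv(T)
    g = np.zeros(out_shape)
    for yp in range(out_shape[0]):
        for xp in range(out_shape[1]):
            x, y, _ = T_inv @ np.array([xp, yp, 1.0])
            if 0 <= x < f.shape[1] and 0 <= y < f.shape[0]:
                # Every output pixel gets a value: no holes, unlike forward warping.
                g[yp, xp] = bilinear_sample(f, x, y)
    return g
```

With the identity transform this returns the input unchanged; with a translation, pixels that map outside the source are simply left at zero (the "black area" effect mentioned above).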
Answer: resample the color value from an interpolated (prefiltered) source image.

Interpolation
Possible interpolation filters: nearest neighbor, bilinear, bicubic, sinc. Interpolation is needed to prevent "jaggies" and "texture crawl."

Image resampling and interpolation
Reading: Szeliski 2.3.1, 3.4-3.5

Image scaling
This image is too big to fit on the screen. How can we generate a half-sized version? (Source: S. Seitz)

Image sub-sampling
Throw away every other row and column to create a 1/2-size image; this is called image sub-sampling. Repeating it gives 1/4 and 1/8 versions. Why does the result look so crufty? (Source: S. Seitz) The artifacts are even worse for synthetic images. (Sources: F. Durand, L. Zhang)

Aliasing
Aliasing occurs when your sampling rate is not high enough to capture the amount of detail in your image. It can give you the wrong signal or image: an alias. To do sampling right, you need to understand the structure of your signal or image. Enter Monsieur Fourier: "But what is the Fourier Transform? A visual introduction," https://www.youtube.com/watch?v=spUNpyF58BY

To avoid aliasing, the sampling rate must be at least 2 × the maximum frequency in the image; said another way, at least two samples per cycle. This minimum sampling rate is called the Nyquist rate. (Source: L. Zhang)

Wagon-wheel effect
See http://www.michaelbach.de/ot/mot-wagonWheel/index.html and https://en.wikipedia.org/wiki/Wagon-wheel_effect. Temporal aliasing in helicopter blades: https://www.youtube.com/watch?v=yr3ngmRuGUc

Aliasing in practice: the Nyquist limit in a 2D example separates good sampling from bad sampling. When downsampling by a factor of two, the original image has frequencies that are too high. How can we fix this?

Gaussian pre-filtering
Solution: filter the image, then subsample (Gaussian 1/2, then G 1/4, then G 1/8). Compare the results with naïve subsampling. (Source: S. Seitz)
The naïve 1/2, 1/4 (2x zoom), and 1/8 (4x zoom) subsampled versions look far worse than the Gaussian-prefiltered ones. (Source: S. Seitz)

Gaussian pyramid
Filter-then-subsample produces a sequence F0, F1, F2, ...: blur F0 and subsample to get F1 (a subsampled F0 * H), blur and subsample again to get F2, and so on. The result is called a Gaussian pyramid [Burt and Adelson, 1983]. In computer graphics the same structure is a mip map [Williams, 1983], and it is a precursor to the wavelet transform. Gaussian pyramids have all sorts of applications in computer vision. (Source: S. Seitz)

Back to the checkerboard
What should happen when you make the checkerboard smaller and smaller? With proper prefiltering ("antialiasing"), the image turns grey: each pixel contains both black and white squares, so it averages them. Naïve subsampling produces aliasing artifacts instead.

Upsampling
This image is too small for this screen: how can we make it 10 times as big? The simplest approach is to repeat each row and column 10 times ("nearest neighbor interpolation").

Image interpolation
Recall that a digital image is formed as a discrete point-sampling of a continuous function (with sample spacing d = 1 in the slide's example). If we could somehow reconstruct the original function, any new image could be generated, at any resolution and scale. What if we don't know the continuous function? Guess an approximation. This can be done in a principled way, by filtering: convert the samples to a continuous function by convolution with a reconstruction filter h. (Adapted from: S. Seitz)

Reconstruction options include "ideal" (sinc) reconstruction, nearest-neighbor interpolation, linear interpolation, and Gaussian reconstruction. (Source: B. Curless)

Reconstruction filters
What does the 2D version of the 1D hat function look like?
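The hat-function question has a concrete answer: the 2D version is separable, the outer product of two 1D tents. A small sketch (illustrative, not from the slides) that also shows the tent kernel performing linear interpolation during 2x upsampling:

```python
import numpy as np

def tent1d(x):
    """1D hat (tent) function on [-1, 1]; convolving samples with it
    performs linear interpolation."""
    return np.maximum(0.0, 1.0 - np.abs(x))

# The 2D tent is the outer product of two 1D tents; convolving with it
# performs bilinear interpolation.
xs = np.linspace(-1, 1, 5)
tent2d = np.outer(tent1d(xs), tent1d(xs))

# 1D upsampling by 2: insert zeros between samples, then convolve with
# the tent kernel [0.5, 1, 0.5] to fill in the missing values.
samples = np.array([0.0, 2.0, 4.0])
stuffed = np.zeros(5)
stuffed[::2] = samples
upsampled = np.convolve(stuffed, np.array([0.5, 1.0, 0.5]), mode='same')
```

Here `upsampled` recovers the in-between values (0, 1, 2, 3, 4) by averaging each missing sample's two neighbors, which is exactly linear interpolation.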
In 1D, the tent function performs linear interpolation; its 2D version performs bilinear interpolation, often implemented without cross-correlation (see http://en.wikipedia.org/wiki/Bilinear_interpolation). Better filters give better resampled images; bicubic is a common choice.

Image interpolation example: the original image magnified 10x with nearest-neighbor, bilinear, and bicubic interpolation.

Related problems include raster-to-vector graphics and depixelating pixel art. For modern methods, see Romano et al., "RAISR: Rapid and Accurate Image Super Resolution," https://arxiv.org/abs/1606.01299

Super-resolution with multiple images
You can do better upsampling if you have multiple images of the scene taken with small (subpixel) shifts. Some cellphone cameras (like the Google Pixel line) capture a burst of photos. Can we use that burst for upsampling? Google Pixel 3 Super Res Zoom does exactly this: the effect of hand tremor is visible in a cropped burst of photos after global alignment, and example photos with and without Super Res Zoom (smart burst align and merge) show the improvement. https://ai.googleblog.com/2018/10/see-better-and-further-with-super-res.html

Summary
Key points:
– Subsampling an image can cause aliasing. Better is to blur ("pre-filter") to remove high frequencies, then downsample.
– If you repeatedly blur and downsample by 2x, you get a Gaussian pyramid.
– Upsampling an image requires interpolation. This can be posed as convolution with a "reconstruction kernel."

Edge detection
(From Sandlot Science) Edge detection converts a 2D image into a set of curves. It extracts salient features of the scene and is more compact than pixels.

Origin of edges
Edges are caused by a variety of factors: surface normal discontinuities, depth discontinuities, surface color discontinuities, and illumination discontinuities. Viewing images as functions, edges look like steep cliffs.

Characterizing edges
An edge is a place of rapid change in the image intensity function. Looking at the intensity function along a horizontal scanline of the image, edges correspond to extrema of the first derivative. (Source: L. Lazebnik)
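The "edges are extrema of the derivative" idea is easy to verify on a synthetic scanline; a sigmoid step (my choice, standing in for the intensity cliff in the slide) works well:

```python
import numpy as np

# A smoothed step edge along one scanline: intensity rises from 0 to 1.
x = np.linspace(-3, 3, 61)
scanline = 1.0 / (1.0 + np.exp(-4 * x))      # sigmoid "cliff"

# Discrete first derivative (finite difference between adjacent samples).
deriv = np.diff(scanline)

# The edge shows up as an extremum of the derivative, located where
# the intensity changes fastest (near x = 0 here, i.e. mid-array).
edge_index = np.argmax(np.abs(deriv))
```

The flat regions of the scanline contribute near-zero derivative; only the cliff produces a strong peak, which is exactly what edge detectors look for.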
Image derivatives
How can we differentiate a digital image F[x, y]?
– Option 1: reconstruct a continuous image f, then compute the derivative.
– Option 2: take a discrete derivative (finite difference).
The discrete derivative can be implemented as a linear filter, e.g. [1 −1] or [−1 1]. (Source: S. Seitz)

Image gradient
The gradient of an image, ∇f = (∂f/∂x, ∂f/∂y), points in the direction of most rapid increase in intensity. The edge strength is given by the gradient magnitude, ||∇f|| = sqrt((∂f/∂x)² + (∂f/∂y)²), and the gradient direction is given by θ = arctan((∂f/∂y) / (∂f/∂x)). (Sources: S. Seitz, L. Lazebnik)

Effects of noise
In a noisy input image, where is the edge? Differentiating amplifies the noise. Solution: smooth first, computing f * h, and look for peaks in d/dx (f * h). (Source: S. Seitz)

Associative property of convolution
Differentiation is convolution, and convolution is associative: d/dx (f * h) = f * (d/dx h). This saves us one operation: convolve the image directly with the derivative of the smoothing filter. (Source: S. Seitz)

The 1D Gaussian and its derivatives lead to 2D edge detection filters: the Gaussian and the derivative-of-Gaussian filter in the x and y directions.

The Sobel operator
A common approximation of the derivative of Gaussian:
Sx = 1/8 [−1 0 1; −2 0 2; −1 0 1]
Sy = 1/8 [1 2 1; 0 0 0; −1 −2 −1]
The standard definition of the Sobel operator omits the 1/8 term. This doesn't make a difference for edge detection, but the 1/8 term is needed to get the right gradient magnitude.

Sobel operator example: for a grayscale test image of a brick wall and bike rack, the slides show the normalized x-gradient, the normalized y-gradient, and the normalized gradient magnitude from the Sobel operator. (Source: Wikipedia)

Canny edge detector
(Demo: http://bigwww.epfl.ch/demo/ip/demos/edgeDetector/; image credit: Joseph Redmon)
Finding edges: starting from the original image, compute the smoothed gradient magnitude. But where exactly is the edge?

Get orientation at each pixel
Get the orientation (thresholded at a minimum gradient magnitude): theta is the inverse tangent of (Gy, Gx), where Gx is the first derivative in the horizontal direction and Gy is the first derivative in the vertical direction. (The slide's 0-360° color wheel shows the gradient orientation angle.)
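A sketch of the Sobel gradient computation described above, with the 1/8 normalization included so magnitudes come out right. The tiny cross-correlation loop is for illustration only; real code would use a library routine:

```python
import numpy as np

# Sobel kernels; the common definition omits the 1/8 factor, but it is
# needed to get correct gradient magnitudes.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]]) / 8.0
SOBEL_Y = SOBEL_X.T   # vertical derivative (positive downward here)

def filter2d(img, k):
    """Tiny 'valid' cross-correlation, as conventional in image filtering."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def sobel_gradient(img):
    gx = filter2d(img, SOBEL_X)          # horizontal derivative
    gy = filter2d(img, SOBEL_Y)          # vertical derivative
    magnitude = np.hypot(gx, gy)         # edge strength
    orientation = np.arctan2(gy, gx)     # gradient direction, as in the notes
    return magnitude, orientation
```

On a unit vertical step edge the magnitude comes out to 0.5, matching the centered difference (1 − 0) / 2, which is the point of keeping the 1/8 term.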
Non-maximum suppression
Check whether each pixel is a local maximum along the gradient direction; this requires interpolating the neighboring values p and r if they are not known at pixel centers. The slides compare the edge map before and after non-maximum suppression.

Thresholding edges
After suppression there is still some noise, and we only want strong edges. Use 2 thresholds, giving 3 cases:
– R > T: strong edge
– R < T but R > t: weak edge
– R < t: no edge

Connecting edges
Strong edges are edges! Weak edges are edges if and only if they connect to strong edges. Look in some neighborhood (usually the 8 closest pixels).

Canny edge detector
1. Filter the image with a derivative of Gaussian.
2. Find the magnitude and orientation of the gradient.
3. Apply non-maximum suppression.
4. Linking and thresholding (hysteresis): define two thresholds, low and high; use the high threshold to start edge curves and the low threshold to continue them.
(Source: D. Lowe, L. Fei-Fei, J. Redmon)

This is our first computer vision pipeline! Canny is still a widely used edge detector in computer vision. J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, 8:679-714, 1986.

The detector depends on several parameters: the high threshold, the low threshold, and σ, the width of the Gaussian blur. The choice of σ depends on the desired behavior: a large σ detects "large-scale" edges, while a small σ detects fine edges. (Source: S. Seitz)
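Step 4 of the pipeline, hysteresis linking, can be sketched as a breadth-first flood outward from the strong pixels. This is an illustrative implementation of the rule stated above (strong pixels seed edges; weak pixels survive only if 8-connected to them), not code from the notes:

```python
import numpy as np
from collections import deque

def hysteresis_threshold(mag, low, high):
    """Canny linking: pixels with magnitude >= high are strong edges;
    pixels between low and high are kept only if they connect
    (8-neighborhood) to a strong edge."""
    strong = mag >= high
    weak = (mag >= low) & ~strong
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))   # seed the flood with strong pixels
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mag.shape[0] and 0 <= nx < mag.shape[1]
                        and weak[ny, nx] and not edges[ny, nx]):
                    edges[ny, nx] = True      # weak pixel promoted: it touches an edge
                    queue.append((ny, nx))
    return edges
```

A weak pixel chain attached to a strong pixel is kept in full, while an isolated weak pixel is discarded, which is exactly the behavior the two-threshold scheme is designed to produce.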