Vision Algorithms for Mobile Robotics Lecture 07 PDF
University of Zurich
Davide Scaramuzza
Summary
This document is a lecture on vision algorithms for mobile robotics, focusing on multiple view geometry. The author, Davide Scaramuzza, provides an overview, including lab exercises and topics like stereo vision, epipolar geometry, and triangulation.
Full Transcript
Vision Algorithms for Mobile Robotics
Lecture 07: Multiple View Geometry 1
Davide Scaramuzza
https://rpg.ifi.uzh.ch

Lab Exercise 5 (today)
Stereo vision: rectification, epipolar matching, disparity, triangulation.
[Figure: 3D point cloud; disparity map (cold = far, hot = close)]

Multiple View Geometry
[Figure: San Marco square, Venice, reconstructed from 14,079 images; 4,515,157 points]
Agarwal, Snavely, Simon, Seitz, Szeliski, "Building Rome in a Day", International Conference on Computer Vision (ICCV), 2009. Most influential paper of 2009.
State-of-the-art software: COLMAP. Schoenberger, Frahm, "Structure-from-Motion Revisited", Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Multiple View Geometry
3D reconstruction from multiple views:
Assumptions: $K$, $T$ and $R$ are known.
Goal: recover the 3D structure from images, i.e., find the points $P_i$ given the camera parameters $K_i, R_i, T_i$.
Structure From Motion:
Assumptions: none ($K$, $T$ and $R$ are unknown).
Goal: recover simultaneously the 3D scene structure and the camera poses (up to scale) from multiple images, i.e., find both $P_i$ and $K_i, R_i, T_i$.

2-View Geometry
Depth from stereo (i.e., stereo vision):
Assumptions: $K$, $T$ and $R$ are known.
Goal: recover the 3D structure from two images, i.e., find $P_i$ given $K_1, R_1, T_1$ and $K_2, R_2, T_2$.
2-view Structure From Motion:
Assumptions: none ($K$, $T$ and $R$ are unknown).
Goal: recover simultaneously the 3D scene structure and the camera poses (up to scale) from two images, i.e., find $P_i$ together with $K_1, R_1, T_1$ and $K_2, R_2, T_2$.

Today's outline
Stereo Vision
Epipolar Geometry

Depth from Stereo
Goal: recover the 3D structure by computing the intersection of corresponding rays.
[Figure: a 3D object seen in the left and right images by cameras with centers $C_L$ and $C_R$]

The Human Binocular System
Stereopsis is the principle by which our brain allows us to perceive depth from the left and right images.
Images project on our retinas upside-down, but our brain makes us perceive them as upright.
Radial distortion is also removed, and the left and right images are aligned: this process is called rectification.
[Figure: image from the left eye; image from the right eye]

A simple test:
1. Fixate on an object.
2. Alternately open and close the left and right eyes.
The horizontal displacement you observe is called disparity. The smaller the disparity, the farther the object.

What happens if you wear a pair of mirrors (inverting goggles) for a week?
An early experiment in "perceptual plasticity" was conducted by psychologist George Stratton in 1896. He wore his inverted-vision goggles over a period of 8 days and over time adapted to the point where he was able to function normally.
https://en.wikipedia.org/wiki/George_M._Stratton#Wundt's_lab_and_the_inverted-glasses_experiments

Stereo Vision
Outline: triangulation (simplified case, general case), correspondence problem, stereo rectification.
Intel RealSense D455 stereo camera: uses stereo and structured infrared light for depth estimation. https://www.intelrealsense.com/stereo-depth/

Stereo Vision
Goal: find an expression of the 3D point coordinates as a function of the 2D image coordinates.
Assumptions: the cameras are calibrated (both intrinsic and extrinsic parameters are known), and point correspondences are given.

Stereo Vision
Simplified case: identical and aligned cameras.
General case: non-identical and non-aligned cameras.
[Figure: point $P_w$ observed by cameras $C_l$ and $C_r$ in both configurations]

Stereo Vision: Simplified Case
Both cameras are identical (i.e., same intrinsics) and are aligned with the x-axis.
[Figure: point $P = (X_P, Y_P, Z_P)$ projects to $u_l$ in the left image and to $u_r$ in the right image; both cameras have focal length $f$; the optical centers $C_l$ and $C_r$ are separated by $b$ along the X axis]
Baseline $b$ = distance between the optical centers of the two cameras.
From similar triangles:
$$\frac{f}{Z_P} = \frac{u_l}{X_P}, \qquad \frac{f}{Z_P} = \frac{-u_r}{b - X_P} \quad\Rightarrow\quad Z_P = \frac{b f}{u_l - u_r}$$
Disparity $u_l - u_r$: horizontal distance between the projections of a 3D point on the two image planes.
1. What's the max disparity of a stereo camera?
2. What's the disparity of a point at infinity?

Choosing the Baseline
What's the optimal baseline?
Large baseline: small depth error, but the minimum measurable depth increases and the search problem becomes difficult for close objects.
Small baseline: large depth error, but the minimum measurable depth decreases and the search problem becomes easier for close objects.
[Figure: large baseline vs. small baseline]
1. Can you compute the depth uncertainty as a function of the disparity?
2. Can you compute the depth uncertainty as a function of the distance?
3. How can we increase the accuracy of a stereo system?
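The simplified-case relation $Z_P = b f / (u_l - u_r)$ and the baseline trade-off can be explored numerically. The following is a minimal sketch, not part of the lecture material: the function names, the units (baseline in meters, focal length and disparity in pixels) and the default one-pixel disparity error are assumptions made for illustration. The uncertainty uses first-order error propagation, $|\partial Z / \partial d| = b f / d^2 = Z^2 / (b f)$ with $d = u_l - u_r$.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Depth Z = b * f / d for an identical, horizontally aligned stereo pair."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px


def depth_uncertainty(depth_m, baseline_m, focal_px, disparity_sigma_px=1.0):
    """First-order depth error: sigma_Z = Z^2 / (b * f) * sigma_d.

    disparity_sigma_px is an assumed matching error in pixels.
    """
    return depth_m ** 2 / (baseline_m * focal_px) * disparity_sigma_px


if __name__ == "__main__":
    # Hypothetical camera: 10 cm baseline, 500 px focal length.
    z = depth_from_disparity(10.0, baseline_m=0.1, focal_px=500.0)
    print("depth [m]:", z)
    print("depth error [m]:", depth_uncertainty(z, 0.1, 500.0))
```

Under these assumptions the depth error grows quadratically with distance and shrinks as the baseline or focal length increases, which is one way to read the three questions above.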
Stereo Vision: General Case
Two identical cameras do not exist in nature.
Aligning both cameras on a horizontal axis is very hard (in fact impossible; why?).
[Figure: point $P_w$ observed by two non-aligned cameras $C_l$ and $C_r$]
In order to be able to use a stereo camera, we need:
the extrinsic parameters (relative rotation and translation);
the intrinsic parameters (focal length, principal point, lens distortion coefficients) of each camera.
Use a calibration method: Tsai's method (3D object) or Zhang's method (2D grid); see Lectures 2 and 3.
How do we compute the relative pose between the left and right cameras?

Triangulation
Triangulation is the problem of determining the 3D position of a point given a set of corresponding image locations and known camera poses.
We want to intersect the two visual rays corresponding to $p_1$ and $p_2$, but, because of feature uncertainty, calibration uncertainty, and numerical errors, they won't meet exactly, so we can only compute an approximation.
[Figure: unknown point $P$, image points $p_1$ and $p_2$, camera centers $C_1$ and $C_2$]

Triangulation: Least-Squares Approximation
We construct the system of equations of the left and right cameras, and solve it.
Left camera (often taken as the world frame):
$$\lambda_1 \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix} = K_1 \, [\, I \mid 0 \,] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
Right camera:
$$\lambda_2 \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix} = K_2 \, [\, R \mid T \,] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

Review: Cross Product (or Vector Product)
$$a \times b = c$$
The vector cross product takes two vectors and returns a third vector $c$ that is perpendicular to both inputs, which means that both dot products are zero:
$$a \cdot c = 0, \qquad b \cdot c = 0$$
Also, recall that the cross product of two parallel vectors is the zero vector.
The vector cross product can also be expressed as the product of a skew-symmetric matrix and a vector:
$$a \times b = [a]_\times \, b = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix}$$

Triangulation: Least-Squares Approximation
Write the two projection equations as $\lambda_1 p_1 = M_1 P$ and $\lambda_2 p_2 = M_2 P$, with $M_1 = K_1 [\, I \mid 0 \,]$ and $M_2 = K_2 [\, R \mid T \,]$.
Left camera:
$$\lambda_1 p_1 = M_1 P \;\Rightarrow\; p_1 \times \lambda_1 p_1 = p_1 \times M_1 P \;\Rightarrow\; 0 = p_1 \times M_1 P \;\Rightarrow\; [p_1]_\times M_1 P = 0$$
Right camera:
$$\lambda_2 p_2 = M_2 P \;\Rightarrow\; p_2 \times \lambda_2 p_2 = p_2 \times M_2 P \;\Rightarrow\; 0 = p_2 \times M_2 P \;\Rightarrow\; [p_2]_\times M_2 P = 0$$
We get a homogeneous system of equations. $P$ can be determined using SVD, as we already did when we talked about DLT (see Lecture 03).

Geometric Interpretation of the Least-Squares Approximation
$P$ is computed as the midpoint of the shortest segment connecting the two lines.
[Figure: point $P$ between the two rays through $p_1$ and $p_2$ from $C_1$ and $C_2$]

Triangulation: Nonlinear Refinement
Initialize $P$ using the least-squares approximation; then refine $P$ by minimizing the sum of the left and right squared reprojection errors (for the definition of reprojection error, refer to Lecture 3):
$$P = \arg\min_P \; \| p_1 - \pi(P, K_1, I, 0) \|^2 + \| p_2 - \pi(P, K_2, R, T) \|^2$$
This can be minimized using Levenberg-Marquardt (more robust than Gauss-Newton to local minima).
[Figure: observed left point $p_1$ and reprojected point $\pi(P, K_1, I, 0)$, with left reprojection error $\| p_1 - \pi(P, K_1, I, 0) \|$; observed right point $p_2$ and reprojected point $\pi(P, K_2, R, T)$, with right reprojection error $\| p_2 - \pi(P, K_2, R, T) \|$; the left camera frame is the world frame and $(R, T)$ relates it to the right camera frame]
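The linear triangulation described above (stack $[p_1]_\times M_1 P = 0$ and $[p_2]_\times M_2 P = 0$ into one homogeneous system and take the right singular vector of the smallest singular value) can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the lecture's reference implementation: the function names are hypothetical, image points are given in pixels, and $M_1$, $M_2$ are the 3x4 projection matrices $K_1 [I \mid 0]$ and $K_2 [R \mid T]$.

```python
import numpy as np


def skew(a):
    """Skew-symmetric matrix [a]_x such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def triangulate_linear(p1, p2, M1, M2):
    """Linear (DLT-style) triangulation of one point from two views.

    p1, p2: (u, v) pixel coordinates in the left and right image.
    M1, M2: 3x4 projection matrices of the left and right camera.
    Returns the 3D point in the world (left camera) frame.
    """
    p1_h = np.array([p1[0], p1[1], 1.0])
    p2_h = np.array([p2[0], p2[1], 1.0])
    # Each view contributes [p]_x M P = 0 (3 rows, 2 of them independent).
    A = np.vstack([skew(p1_h) @ M1, skew(p2_h) @ M2])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P_h = Vt[-1]
    return P_h[:3] / P_h[3]


if __name__ == "__main__":
    # Hypothetical setup: identical intrinsics, right camera shifted 10 cm along X.
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    M2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
    P = np.array([0.2, -0.1, 4.0, 1.0])
    x1, x2 = M1 @ P, M2 @ P
    print(triangulate_linear(x1[:2] / x1[2], x2[:2] / x2[2], M1, M2))
```

The result could then serve as the initial guess for the nonlinear refinement, for example with a Levenberg-Marquardt solver minimizing the two reprojection errors.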
Correspondence Problem
Given a point $p_L$ on the left image, how do we find its correspondence $p_R$ on the right image?

Correspondence Search
Block matching: compare each candidate patch from the left image with all possible candidate patches from the right image.
Use one of these similarity measures: (Z)NCC, (Z)SSD, (Z)SAD, or Census Transform plus Hamming distance.

Correspondence Problem
This 2D exhaustive search is computationally very expensive! How many comparisons?
Can we make the correspondence search 1D?
Potential matches for $p$ must lie on the corresponding epipolar line $l'$.
The epipolar line is the projection of the back-projected ray $\pi^{-1}(p) = \lambda K^{-1} p$ onto the other camera image.
The epipole is the projection of the optical center on the other camera image.
A stereo camera has two epipoles.
[Figure: point $p$, epipolar line $l'$, epipoles $e$, camera centers $C_l$ and $C_r$]

The Epipolar Constraint
The camera centers $C_l$, $C_r$ and the image point $p$ determine the so-called epipolar plane.
The intersections of the epipolar plane with the two image planes are called epipolar lines.
Corresponding points must therefore lie along the epipolar lines: this constraint is called the epipolar constraint.
The epipolar constraint reduces the correspondence problem to a 1D search along the epipolar line.

1D Correspondence Search via Epipolar Constraint
Thanks to the epipolar constraint, corresponding points can be searched for along epipolar lines: the search is reduced from two dimensions to one.

Example: Converging Cameras
Remember: all the epipolar lines intersect at the epipole (NB.
The epipole can also be outside the image.)
As the position of the 3D point $P$ changes, the epipolar plane rotates around the baseline, and the epipolar lines rotate around the epipoles.

Example: Identical and Horizontally-Aligned Cameras
[Figure: left image and right image]

Example: Forward Motion (parallel to the optical axis)
The epipole has the same coordinates in both images.
Points move along lines radiating from the epipole: the "focus of expansion".
[Figure: epipoles $e$ and $e'$ in the left and right images]

Stereo Rectification
Even in commercial stereo cameras the left and right images are never perfectly aligned.
In practice, it is convenient to have the scanlines coincide with the epipolar lines, because then the correspondence search can be made very efficient (only search for the point along the same scanline).
Stereo rectification warps the left and right images into new "rectified" images such that the epipolar lines coincide with the scanlines.
[Figure: raw stereo pair (unrectified), where scanlines do not coincide with epipolar lines; rectified stereo pair, where scanlines coincide with epipolar lines]

Stereo Rectification
Rectification warps the original image planes onto coplanar planes parallel to the baseline.
It works by computing two homographies, one for each image.
As a result, the new epipolar lines coincide with the scanlines, and the scanlines of the left and right images are aligned.
[Figure: point $P$, original image points $p_1$, $p_2$ and rectified image points $p'_1$, $p'_2$, camera centers $C_1$ and $C_2$]

Stereo Rectification
The idea behind rectification is to define two new Perspective Projection Matrices (PPMs) obtained by rotating the old ones around their optical centers until the image planes become parallel to each other.
This ensures that the epipoles are at infinity, hence the epipolar lines are parallel.
To have horizontal epipolar lines, the baseline must be parallel to the new X axis of both cameras.
In addition, to have a proper rectification, corresponding points must have the same vertical coordinate. This is obtained by requiring that the new cameras have the same intrinsic parameters.
Note that, since the focal length is the same, the new image planes are coplanar too.
Fusiello, Trucco, Verri, "A Compact Algorithm for Rectification of Stereo Pairs", International Conference on Computer Vision (ICCV), 1999.

Stereo Rectification (1/5)
In Lecture 02, we have seen that the Perspective Equation for a point $P_w$ in the world frame is defined by the following equation, where $R = R_{cw}$ and $T = T_{cw}$ transform points from the World frame to the Camera frame:
$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left( R \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T \right)$$
For Stereo Vision, however, it is more common to use $R \equiv R_{wc}$ and $T \equiv T_{wc}$, where now $R$ and $T$ transform points from the Camera frame to the World frame. This is more convenient because $T \equiv C$ directly represents the world coordinates of the camera center. The projection equation can be rewritten as:
$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K R^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - T \right) \;\rightarrow\; \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K R^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - C \right)$$

Stereo Rectification (2/5)
We can now write the Perspective Equation for the Left and Right cameras.
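For generality, we assume that the Left and Right cameras have different intrinsic parameter matrices, $K_L$ and $K_R$.
Left camera:
$$\lambda_L \begin{bmatrix} u_L \\ v_L \\ 1 \end{bmatrix} = K_L R_L^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - C_L \right)$$
Right camera:
$$\lambda_R \begin{bmatrix} u_R \\ v_R \\ 1 \end{bmatrix} = K_R R_R^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - C_R \right)$$
[Figure: world point $P_w$ in the world frame $W$ $(X_w, Y_w, Z_w)$ projecting to $p_L = (u_L, v_L)$ and $p_R = (u_R, v_R)$; camera poses $[R_L \mid C_L]$ and $[R_R \mid C_R]$]

Stereo Rectification (3/5)
The goal of stereo rectification is to warp the left and right camera images such that their image planes are coplanar (i.e., same $\hat{R}$) and their intrinsic parameters are identical (i.e., same $\hat{K}$).
Old Left camera and old Right camera:
$$\lambda_L \begin{bmatrix} u_L \\ v_L \\ 1 \end{bmatrix} = K_L R_L^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - C_L \right), \qquad \lambda_R \begin{bmatrix} u_R \\ v_R \\ 1 \end{bmatrix} = K_R R_R^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - C_R \right)$$
New Left camera and new Right camera:
$$\hat{\lambda}_L \begin{bmatrix} \hat{u}_L \\ \hat{v}_L \\ 1 \end{bmatrix} = \hat{K} \hat{R}^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - C_L \right), \qquad \hat{\lambda}_R \begin{bmatrix} \hat{u}_R \\ \hat{v}_R \\ 1 \end{bmatrix} = \hat{K} \hat{R}^{-1} \left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - C_R \right)$$

Stereo Rectification (4/5)
By solving with respect to $(X_w, Y_w, Z_w)$ for each camera, we can compute the Homography that needs to be applied to rectify each camera image.
Homography of the Left camera:
$$\hat{\lambda}_L \begin{bmatrix} \hat{u}_L \\ \hat{v}_L \\ 1 \end{bmatrix} = \lambda_L \, \hat{K} \hat{R}^{-1} R_L K_L^{-1} \begin{bmatrix} u_L \\ v_L \\ 1 \end{bmatrix}$$
Homography of the Right camera:
$$\hat{\lambda}_R \begin{bmatrix} \hat{u}_R \\ \hat{v}_R \\ 1 \end{bmatrix} = \lambda_R \, \hat{K} \hat{R}^{-1} R_R K_R^{-1} \begin{bmatrix} u_R \\ v_R \\ 1 \end{bmatrix}$$

Stereo Rectification (5/5)
How do we choose the new $\hat{K}$? A common choice is to take the arithmetic average of $K_L$ and $K_R$:
$$\hat{K} = \frac{K_L + K_R}{2}$$
How do we choose the new $\hat{R} = [\hat{r}_1, \hat{r}_2, \hat{r}_3]$, with $\hat{r}_1, \hat{r}_2, \hat{r}_3$ being the column vectors of $\hat{R}$? A common choice is as follows:
$$\hat{r}_1 = \frac{C_R - C_L}{\| C_R - C_L \|}$$
This makes the new image planes parallel to the baseline.
$$\hat{r}_2 = r_{3L} \times \hat{r}_1$$
where $r_{3L}$ is the 3rd column of the rotation matrix of the left camera, i.e., $R_L$ (normalize $\hat{r}_2$ to unit length so that $\hat{R}$ is a valid rotation matrix).
$$\hat{r}_3 = \hat{r}_1 \times \hat{r}_2$$
Fusiello, Trucco, Verri, "A Compact Algorithm for Rectification of Stereo Pairs", International Conference on Computer Vision (ICCV), 1999.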
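The choice of the new intrinsics (average of $K_L$ and $K_R$), the new rotation (first axis along the baseline) and the two rectifying homographies described above can be sketched with NumPy. This is a minimal sketch under stated assumptions rather than the published algorithm's code: the function name is hypothetical, $R_L$ and $R_R$ are camera-to-world rotations, $C_L$ and $C_R$ are camera centers in world coordinates, lens distortion is assumed to be already removed, and the explicit normalization of the second axis is added here so that the new rotation is orthonormal.

```python
import numpy as np


def rectifying_homographies(K_L, R_L, C_L, K_R, R_R, C_R):
    """Return (K_new, R_new, H_L, H_R) for a calibrated stereo pair.

    R_L, R_R: 3x3 camera-to-world rotations; C_L, C_R: camera centers (world).
    H_L, H_R map old pixel coordinates to rectified pixel coordinates.
    """
    # New shared intrinsics: arithmetic average of the two matrices.
    K_new = 0.5 * (np.asarray(K_L, float) + np.asarray(K_R, float))

    # New X axis: direction of the baseline.
    baseline = np.asarray(C_R, float) - np.asarray(C_L, float)
    r1 = baseline / np.linalg.norm(baseline)
    # New Y axis: orthogonal to the old left Z axis and to the new X axis.
    r2 = np.cross(np.asarray(R_L, float)[:, 2], r1)
    r2 = r2 / np.linalg.norm(r2)  # normalization added for orthonormality
    # New Z axis: completes the right-handed frame.
    r3 = np.cross(r1, r2)
    R_new = np.column_stack([r1, r2, r3])

    # H = K_new R_new^{-1} R_old K_old^{-1}; for rotations, inverse == transpose.
    H_L = K_new @ R_new.T @ R_L @ np.linalg.inv(K_L)
    H_R = K_new @ R_new.T @ R_R @ np.linalg.inv(K_R)
    return K_new, R_new, H_L, H_R
```

After warping with `H_L` and `H_R`, corresponding points should share the same vertical coordinate, so the search for matches can run along a single scanline. In practice the warp itself would use an interpolation scheme such as bilinear interpolation.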
Stereo Rectification: Example
[Figure: raw left and right images with scanlines]
First, undistort the images to remove their lens distortion.
Then, compute the homographies and rectify. Use bilinear interpolation for warping (see Lecture 06).

Dense Stereo Correspondence: Disparity Map
1. Rectify the stereo pair (if not already rectified) to make the scanlines coincide with the epipolar lines.
2. For every pixel in the left image, find its corresponding point in the right image along the same scanline.
3. Compute the disparity for each pair of correspondences (i.e., $u_l - u_r$).
4. Visualize it as a grayscale or color-coded image: the disparity map.
Close objects experience bigger disparity and therefore appear brighter in the disparity map.
[Figure: left image, right image, disparity map]

From Disparity Map to Point Cloud
Once the stereo pair is rectified, the depth of each point can be computed recalling that:
$$Z_P = \frac{b f}{u_l - u_r}$$

Correspondence Problem (continued)
Once the left and right images are rectified, the correspondence search can be done along the same scanlines.
To average out the effects of feature uncertainty and camera calibration uncertainty, use a window around the point of interest (assumption: neighboring pixels have similar intensity).
Find the correspondence by maximizing or minimizing: (Z)NCC, (Z)SSD, (Z)SAD, or Census Transform plus Hamming distance.
[Figure: example using (Z)NCC]

Textureless regions: the aperture problem
Textureless regions are not distinctive; high ambiguity for matches.
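The dense correspondence procedure above (for each left pixel, slide a window along the same scanline of the right image and keep the best-scoring disparity) can be sketched as a brute-force block matcher. This is a minimal, unoptimized sketch: the function name and parameters are assumptions, the images are assumed to be rectified grayscale arrays of equal size, and SSD is used as the cost purely for brevity (any of the measures listed above could be swapped in).

```python
import numpy as np


def disparity_map_ssd(left, right, max_disp, half_win):
    """Brute-force window-based disparity (u_l - u_r) for a rectified pair.

    left, right: 2D grayscale arrays of the same shape.
    max_disp: largest disparity searched, in pixels.
    half_win: half window size, so the window is (2*half_win + 1)^2 pixels.
    Pixels too close to the border keep disparity 0.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for v in range(half_win, h - half_win):
        for u in range(half_win + max_disp, w - half_win):
            patch_l = left[v - half_win:v + half_win + 1,
                           u - half_win:u + half_win + 1]
            best_cost, best_d = np.inf, 0
            # The match in the right image lies at u - d on the same scanline.
            for d in range(max_disp + 1):
                patch_r = right[v - half_win:v + half_win + 1,
                                u - d - half_win:u - d + half_win + 1]
                cost = np.sum((patch_l - patch_r) ** 2)  # SSD
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[v, u] = best_d
    return disp
```

With a known baseline and focal length, each disparity value could then be converted to depth through $Z_P = b f / (u_l - u_r)$. In textureless regions many candidate windows give nearly the same cost, which is the ambiguity the slide above refers to.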
Textureless regions: the aperture problem
Solution: increase the window size until the patch becomes distinctive from its neighbors.

Effects of window size ($W$) on the disparity map
Smaller window: more detail but more sensitive to noise.
Larger window: smoother disparity maps but less detail.
[Figure: disparity maps for W = 3 pixels and W = 20 pixels]

Accuracy
[Figure: data, block matching result, ground truth]

Challenges
Occlusions and repetitive patterns.
Non-Lambertian surfaces (e.g., specularities) and textureless surfaces.

Correspondence Problems: Multiple Matches
Multiple match hypotheses satisfy the epipolar constraint, but which one is correct?

How can we improve window-based matching?
Beyond the epipolar constraint, there are "soft" constraints that help identify corresponding points:
Uniqueness: only one match in the right image for every point in the left image.
Ordering: points on the same surface will be in the same order in both views.
Disparity gradient: disparity changes smoothly between points that lie on the same surface.

Example: Semi-Global Matching (SGM)
SGM is a popular open-source algorithm that estimates a dense disparity map from a rectified stereo image pair.
[Figure: left image, right image, estimated disparity]
Main idea: perform coarse-to-fine block matching followed by regularization (e.g., smoothing); the estimated disparity map is a piecewise-smooth surface passing through the initial disparity map (see Lecture 12a).
Hirschmuller, "Stereo Processing by Semiglobal Matching and Mutual Information", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2007.

Better methods exist
For the latest and greatest:
Middlebury dataset and leaderboard: http://vision.middlebury.edu/stereo/
KITTI dataset and leaderboard: http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo
Using Deep Learning:
Jia-Ren Chang, Yong-Sheng Chen, "Pyramid Stereo Matching Network", Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
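One simple way to act on the uniqueness constraint listed above is a left-right consistency check, a technique that is not spelled out in the slides and is brought in here only as an illustration. The sketch below assumes two integer disparity maps are available: `disp_left` (for each left pixel $u$, the match is at $u - d$ in the right image) and `disp_right` (for each right pixel $u$, the match is at $u + d$ in the left image). The function name and the one-pixel tolerance are assumptions.

```python
import numpy as np


def left_right_consistent(disp_left, disp_right, tol=1):
    """Boolean mask of left pixels whose match maps back to (nearly) themselves.

    disp_left, disp_right: integer disparity maps of the same shape.
    tol: allowed difference in pixels between the two disparities.
    """
    disp_left = np.asarray(disp_left)
    disp_right = np.asarray(disp_right)
    h, w = disp_left.shape
    cols = np.tile(np.arange(w), (h, 1))
    rows = np.arange(h)[:, None]
    # Column of the matched pixel in the right image.
    u_right = cols - disp_left
    inside = (u_right >= 0) & (u_right < w)
    # Clip only to keep the indexing valid; outside pixels are masked anyway.
    d_back = disp_right[rows, np.clip(u_right, 0, w - 1)]
    return inside & (np.abs(d_back - disp_left) <= tol)
```

Pixels where the mask is false, typically occlusions or ambiguous matches on repetitive patterns, could be discarded or filled in by a later regularization step.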
Things to Remember
Disparity
Triangulation: simplified and general case, linear and nonlinear approach
Choosing the baseline
Correspondence problem: epipoles, epipolar lines, epipolar plane
Stereo rectification

Reading
Szeliski book, 2nd edition: Chapter 12
Autonomous Mobile Robot book: Chapter 4.2.5
Peter Corke book: Chapter 14.3

Understanding Check
Are you able to answer the following questions?
Can you relate Structure from Motion to 3D reconstruction? What's their difference?
Can you define disparity in both the simplified and the general case?
Can you provide a mathematical expression of depth as a function of the baseline, the disparity and the focal length?
Can you apply error propagation to derive an expression for depth uncertainty? How can we improve the uncertainty?
Can you analyze the effects of a large/small baseline?
What is the closest depth that a stereo camera can measure?
Are you able to show mathematically how to compute the intersection of two lines (linearly and nonlinearly)?
What is the geometric interpretation of the linear and nonlinear approaches and what error do they minimize?
Are you able to provide a definition of epipole, epipolar line and epipolar plane?
Are you able to draw the epipolar lines for two converging cameras, for a forward motion situation, and for a side-moving camera?
Are you able to define stereo rectification and to derive mathematically the rectifying homographies?
How is the disparity map computed?
How can one establish stereo correspondences with subpixel accuracy?
Describe one or more simple ways to reject outliers in stereo correspondences.
Is stereo vision the only way of estimating depth information? If not, are you able to list alternative options? (make link to other lectures)