Lecture 5 - Image Features (Part I)
Image Features (Part I)
(Feature Detectors, Descriptors and Matching)
Instructor: Karam Almaghout
25.09.2024

Contents
- Introduction
- Feature Detectors: Harris Detector, Lowe Detector (SIFT)
- Feature Descriptor: Lowe Descriptor (SIFT)
- Feature Matching
- Application Areas

Slides Credit and Source of the Material
This lecture is based on the following resources:
- Lecture slides of Prof. Muhammad Fahim
- http://www.cs.cornell.edu/courses/cs5670/2019sp/lectures/lec04_harris.pdf
- A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe
- 16-385 Computer Vision, Spring 2019, CMU
- Material found on the internet, adapted to align the subject with the needs of the students

Feature Extraction
We have two images – how do we combine them?

1. Identify Interest Points
Detect interest points (features) in both images.

2. Feature Description
Extract a vector feature descriptor around each interest point.

3. Feature Matching
Determine the correspondence between descriptors in the two views.

Feature Extraction – Panorama Stitching

Visual SLAM
Video link: https://www.youtube.com/watch?v=VOHloE1mnos

Feature Detectors
- Harris Detector
- SIFT

What makes a good feature/interest point?
Suppose we only consider a small window of pixels. What defines whether a feature is a good or bad candidate?
Credit: S. Seitz, D. Frolova, D. Simakov

What makes a good feature?
How does the window change when you shift it?
- "Flat" region: no change in all directions.
- "Edge": no change along the edge direction.
- "Corner": significant change in all directions.
Harris corner detection is based on the corner case.
Credit: S. Seitz, D. Frolova, D. Simakov

Feature Detector: Harris Corner Detector

Mathematics of Harris Corner Detection
Consider shifting the window W by (u, v). How do the pixels in W change? We compare each pixel before and after the shift by summing the squared differences:

E(u, v) = Σ_{(x,y) ∈ W} [I(x + u, y + v) − I(x, y)]²

Limitation: this is slow to compute exactly for each pixel and each offset (u, v).

Taylor series expansion of I:

I(x + u, y + v) ≈ I(x, y) + I_x u + I_y v

If the motion (u, v) is small, then the first-order approximation is good. Substituting into E(u, v):

E(u, v) ≈ Σ_{(x,y) ∈ W} (I_x u + I_y v)² = [u v] H [u v]ᵀ

We have

H = Σ_{(x,y) ∈ W} [[I_x², I_x I_y], [I_x I_y, I_y²]]

Since H is symmetric, it can be diagonalized by a rotation of the coordinate axes.

What does the matrix H reveal? The eigenvalues of H reveal the amount of intensity change in the two principal orthogonal gradient directions in the window.
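To make this concrete, here is a minimal NumPy sketch (not from the slides) that builds H over one window of a stand-in image and inspects its eigenvalues; the image, window position and window size are hypothetical placeholders.

```python
import numpy as np

# Stand-in grayscale image; in practice, load a real one (e.g. with OpenCV).
I = np.random.rand(100, 100).astype(np.float64)

# Image gradients (np.gradient differentiates along rows, then columns).
Iy, Ix = np.gradient(I)

# Structure tensor H summed over a small window W around pixel (r, c).
r, c, half = 50, 50, 2  # hypothetical window center and half-size
win = (slice(r - half, r + half + 1), slice(c - half, c + half + 1))
Ix2 = (Ix[win] ** 2).sum()
Iy2 = (Iy[win] ** 2).sum()
Ixy = (Ix[win] * Iy[win]).sum()
H = np.array([[Ix2, Ixy],
              [Ixy, Iy2]])

# Eigenvalues of the symmetric matrix H:
# both large and comparable => corner; one large => edge; both small => flat.
lam1, lam2 = np.linalg.eigvalsh(H)
print("eigenvalues:", lam1, lam2)
```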
Review: Eigenvectors and Eigenvalues
The eigenvector x of a matrix A is a special vector with the property:

A x = λ x

The eigenvalues λ of A are found by solving the characteristic equation:

det(A − λI) = 0

To find the eigenvector for a given eigenvalue λ, solve (A − λI) x = 0.
Source: Dr. Mubarak Shah
Worked example: https://matrixcalc.org/vectors.html#eigenvectors(%7B%7B-1,2,0%7D,%7B0,3,4%7D,%7B0,0,7%7D%7D)

Mathematics of Harris Corner Detection – Ellipse Equation
The set of offsets (u, v) with constant error, [u v] H [u v]ᵀ = const, is an ellipse whose axis lengths and orientation are determined by the eigenvalues and eigenvectors of H.

Interpretation of Eigenvalues
Classification of image points using the eigenvalues of H:
- "Corner": λ1 and λ2 are both large and λ1 ~ λ2; E increases in all directions.
- "Edge": λ1 >> λ2 or λ2 >> λ1.
- "Flat" region: λ1 and λ2 are small; E is almost constant in all directions.

Harris Corner Detection
Measure of corner response:

R = λ1 λ2 − α (λ1 + λ2)² = det(H) − α trace(H)²

where 0 < α < 0.25 is an empirical constant (value ~0.05).
- R depends only on the eigenvalues of H.
- R is large for a corner.
- R is negative with large magnitude for an edge.
- |R| is small for a flat region.
Corners are the points with a large corner response (R > threshold).

Compute Non-Maximal Suppression
Keep only the points that are local maxima of R.

Harris Features
[Example images with detected Harris features.]

Weighting the Derivatives
In practice, using a simple window W doesn't work too well. Instead, we weight each derivative value based on its distance from the center pixel (e.g., with a Gaussian).

Harris Corner Detection: Summary
1. Compute the x and y derivatives of the image.
2. Compute the products of the derivatives at every pixel.
3. Compute the sums of the products of derivatives at each pixel.
4. Define the matrix H at each pixel.
5. Compute the response of the detector R at each pixel.
6. Threshold on the value of R and apply non-maximum suppression.
7. For each pixel that meets the criteria in step 6, compute a feature descriptor (more on descriptors a little later).
A hedged code sketch of this pipeline follows below.
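The sketch below implements steps 1–6 with NumPy/OpenCV. It is not the lecture's code; the parameter values (sigma, alpha, thresholds, neighbourhood size) are illustrative assumptions.

```python
import cv2
import numpy as np

def harris_corners(img_gray, sigma=1.0, alpha=0.05, thresh_rel=0.01, nms_size=5):
    """Hedged sketch of the Harris pipeline summarized above."""
    I = img_gray.astype(np.float64)
    # 1. x and y derivatives.
    Ix = cv2.Sobel(I, cv2.CV_64F, 1, 0, ksize=3)
    Iy = cv2.Sobel(I, cv2.CV_64F, 0, 1, ksize=3)
    # 2-3. Products of derivatives, summed with a Gaussian weight
    # (the weighted window discussed above).
    Sxx = cv2.GaussianBlur(Ix * Ix, (0, 0), sigma)
    Syy = cv2.GaussianBlur(Iy * Iy, (0, 0), sigma)
    Sxy = cv2.GaussianBlur(Ix * Iy, (0, 0), sigma)
    # 4-5. Response R = det(H) - alpha * trace(H)^2 at every pixel.
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    R = det - alpha * trace * trace
    # 6. Threshold on R and keep only local maxima (non-max suppression):
    # grayscale dilation gives each pixel its neighbourhood maximum.
    local_max = cv2.dilate(R, np.ones((nms_size, nms_size), np.uint8))
    corners = (R == local_max) & (R > thresh_rel * R.max())
    return np.argwhere(corners)  # (row, col) corner coordinates

# Usage (hypothetical filename):
# pts = harris_corners(cv2.imread("building.png", cv2.IMREAD_GRAYSCALE))
```

OpenCV also ships a built-in cv2.cornerHarris; the manual version is shown only to mirror the summary steps.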
Harris Corner Response is Rotation Invariant
The ellipse rotates, but its shape (the eigenvalues) remains the same, so the corner response R is invariant to image rotation.

Intensity Changes
Harris is partially invariant to affine intensity changes:
- Only derivatives are used, so it is invariant to an intensity shift I → I + b.
- Under intensity scaling I → aI, the response is scaled as well, so points near the threshold may change classification.
[Figure: response R vs. image coordinate x, plotted against the threshold, before and after intensity scaling.]

Scaling
Harris is not invariant to changes in scale: when a corner is enlarged, each window sees only a gently curving contour, so all points will be classified as edges.

Shi-Tomasi Corner Detector
The Harris corner response was given by R = det(H) − α trace(H)². Shi and Tomasi made a small modification to it in their paper "Good Features to Track", which shows better results compared to the Harris corner detector:

R = min(λ1, λ2)

If R is greater than a threshold value, the point is considered a corner.

Scale Invariant Feature Transform (SIFT) – David Lowe (ICCV 1999)
SIFT is a local feature detector and descriptor. It is reasonably invariant to changes in:
- illumination
- image noise
- rotation
- scaling
- small changes in viewpoint
Journal + conference versions: 60,000+ citations. SIFT Features [Lowe, ICCV 1999]. Patented by the University of British Columbia (Canada).

Detection Stages for SIFT Features
1. Scale-space extrema detection: potential locations for finding features. (Detector)
2. Keypoint localization: accurately locating the feature keypoints. (Detector)
3. Orientation assignment: assigning an orientation to the keypoints. (Detector)
4. Keypoint descriptor: describing the keypoints as a high-dimensional vector. (Descriptor)

1. Scale-space Extrema Detection – Scale Invariant Detection
How do we choose corresponding windows independently in each image? Do objects have a characteristic scale that we can identify?

Solution: design a function f on the region which has the same shape even if the image is resized, and take a local maximum of this function. If Image 2 is Image 1 at scale 1/2, the peak of f over region size moves accordingly, from s1 to s2 = s1/2.
Source: A. Torralba

A "good" function for scale detection has one stable, sharp peak over region size; functions with flat or multi-modal responses over region size are bad candidates.
Source: A. Torralba

We need to identify those locations and scales that are identifiable from different views of the same object. This can be efficiently achieved using a "scale space" function; a code sketch of the characteristic-scale idea follows, and the formal definition comes next.
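To make the "stable peak over region size" idea concrete, here is a hedged sketch (not from the slides) that evaluates a scale-normalized Laplacian-of-Gaussian response at one point over a range of sigmas and picks the sigma where the magnitude peaks; the image and point are stand-ins.

```python
import numpy as np
from scipy import ndimage

I = np.random.rand(100, 100)   # stand-in for a real image
r, c = 50, 50                  # hypothetical interest point

# Evaluate |sigma^2 * LoG| at (r, c) over a range of scales; the sigma^2
# factor normalizes the response so peaks are comparable across scales.
sigmas = np.linspace(1.0, 10.0, 30)
responses = [
    (s ** 2) * ndimage.gaussian_laplace(I, sigma=s)[r, c]
    for s in sigmas
]

# The "characteristic scale" is where the response magnitude peaks.
char_scale = sigmas[np.argmax(np.abs(responses))]
print("characteristic scale:", char_scale)
```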
Reasonable assumption: the scale-space function must be based on the Gaussian function, defined as:

L(x, y, σ) = G(x, y, σ) * I(x, y)

where * is the convolution operator, G(x, y, σ) is a variable-scale Gaussian, and I(x, y) is the input image.

The Laplacian of Gaussians (LoG) is one such technique to locate scale-space extrema, but the calculation is costly. What is the solution? Approximate the LoG.

Approximation of the Laplacian of Gaussians – Difference of Gaussians (DoG):

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)

where k is a constant multiplicative factor between neighbouring scales.

This step is also known as scale-space generation: image octaves in a Gaussian pyramid. SIFT suggests that 4 octaves and 5 blur levels are ideal for the algorithm.

The difference-of-Gaussian images are searched for local extrema over scale and space: one pixel is compared with its 8 neighbours in the same scale as well as 9 pixels in the next scale and 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint.

2. Keypoint Localization
Now we have far fewer points than pixels; however, there are still lots of points (~1000s).
Source: Brown & Lowe 2002

SIFT also uses a Taylor series expansion of the scale space to get a more accurate location of each extremum. If the intensity change at the extremum (i.e., the contrast) is below a certain threshold, the keypoint is rejected, because it indicates that the region around that keypoint is too flat.

3. Orientation Assignment
After localization, we have stable keypoints, and an orientation is assigned to each one. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created.
Source: Deepanshu Tyagi

4. Keypoint Descriptor
At this point, each keypoint has a location, scale and orientation, and we can compute a descriptor for the local image region around it: a 16x16 window around the keypoint is taken and divided into 16 sub-blocks of 4x4 size, with an 8-bin orientation histogram in each sub-block, giving 4*4*8 = 128 dimensions in total.
Source: Deepanshu Tyagi

Stages of Keypoint Selection
For a 233x189 image: 832 initial interest points; 729 keypoints remain after keypoint localization (the gradient/contrast threshold); 536 keypoints remain after the ratio threshold.

SIFT is Robust
SIFT is an extraordinarily robust matching technique:
- It can handle changes in viewpoint, up to about 60 degrees.
- It can handle significant changes in illumination, sometimes even day vs. night.
- It is fast and efficient, and can run in real time.
Adapted from S. Seitz
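The full four-stage pipeline is available in OpenCV; a minimal sketch (the filename is a hypothetical placeholder, and the patent has since expired, so SIFT ships with standard opencv-python builds):

```python
import cv2

# Detect SIFT keypoints and compute their 128-D descriptors.
img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries the attributes discussed above:
# kp.pt = location, kp.size = scale, kp.angle = orientation.
print(len(keypoints), "keypoints; descriptor array shape:", descriptors.shape)
```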
Other Detectors and Descriptors
- SURF: approximates SIFT; works almost equally well; very fast.
- HOG: Histogram of Oriented Gradients; sliding window, pedestrian detection.
- FREAK: Fast Retina Keypoint; perceptually motivated; used in visual SLAM.
- LIFT: Learned Invariant Feature Transform; learned via deep learning.

Feature Matching

Which Features Match?
Source: Noah Snavely

Harder Case
Source: Noah Snavely

Harder Case: NASA Mars Rover images with SIFT feature matches.
Source: Noah Snavely

Feature Matching
Keypoints between two images are matched using a nearest-neighbour algorithm based on the L2 distance. How do we discard bad matches? A threshold on the L2 distance gives bad performance. Solution: threshold on a ratio. The second-closest match may be very near to the first; in this case, the ratio of the closest distance to the second-closest distance is taken.

Given a feature in I1, how do we find the best match in I2?
1. Define a distance function that compares two descriptors.
2. Test all the features in I2.
3. Find the one with the minimum distance.

Feature Distance
How do we define the difference between two features f1, f2?
Simple approach: the L2 distance, ||f1 − f2||. It can give small distances for ambiguous (incorrect) matches.
Better approach: the ratio distance, ||f1 − f2|| / ||f1 − f2'||, where f2 is the best match to f1 in I2 and f2' is the second-best match to f1 in I2. This gives large values for ambiguous matches.
Source: Noah Snavely

Feature Matching Example
Source: Noah Snavely

Evaluating the Results
How can we measure the performance of a feature matcher?
[Figure: candidate matches along a feature-distance axis, e.g. at distances 50 (true match), 75, and 200 (false match).]
The distance threshold affects performance:
- True positives = number of detected matches that are correct; we want to maximize these.
- False positives = number of detected matches that are incorrect; we want to minimize these.
Source: Noah Snavely

As the threshold varies, we can trace out an ROC curve ("Receiver Operating Characteristic"):
- true positive rate (recall) = # true positives / # matching features (positives)
- false positive rate (1 − specificity) = # false positives / # unmatched features (negatives)
[Figure: ROC curve passing through, e.g., the point (0.1, 0.7).]
A single number summarizes the curve: the Area Under the Curve (AUC), e.g. AUC = 0.78; 1 is the best.
Source: Noah Snavely
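Returning to the ratio test described above, here is a hedged OpenCV sketch (not the lecture's code): nearest-neighbour matching on L2 distance with Lowe's ratio test. The descriptor arrays des1/des2 are assumed to come from the SIFT example earlier, and the ratio 0.75 is an illustrative choice.

```python
import cv2

def ratio_match(des1, des2, ratio=0.75):
    """Match descriptors and keep only unambiguous nearest-neighbour matches."""
    bf = cv2.BFMatcher(cv2.NORM_L2)       # brute-force matcher, L2 distance
    good = []
    for pair in bf.knnMatch(des1, des2, k=2):  # two nearest neighbours each
        if len(pair) < 2:
            continue
        m, n = pair                       # m = closest, n = second-closest
        if m.distance < ratio * n.distance:   # ratio test: reject ambiguous
            good.append(m)
    return good
```

The matches kept by the ratio test can then be scored against ground truth with the true/false-positive and ROC methodology just described.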
Lots of Applications
Features are used for:
- Image alignment (e.g., mosaics)
- 3D reconstruction
- Motion tracking
- Object recognition
- Indexing and database retrieval
- Robot navigation
- … other

Reading
Kenneth Dawson-Howe: Sections 7.2 and 7.4.

Thank you …!