COMP9517 Computer Vision 2024 Term 2 Week 3 Feature Representation Part 2
Summary
This document contains lecture notes from COMP9517 Computer Vision at UNSW Sydney, 2024 Term 2, Week 3. The lecture covers feature representation techniques: colour features, texture features, and shape features, including SIFT, Bag-of-Words (BoW) feature encoding, shape context, and HOG.
Full Transcript
COMP9517 Computer Vision 2024 Term 2 Week 3
Professor Erik Meijering
Feature Representation Part 2

Different types of features (recap)
Colour features (Part 1)
– Colour moments
– Colour histogram
Texture features (Part 1)
– Haralick texture features
– Local binary patterns (LBP)
– Scale-invariant feature transform (SIFT) … one more example application today
Shape features (Part 2)
– Basic shape features
– Shape context
– Histogram of oriented gradients (HOG)

Another example application of SIFT
Classifying images based on texture (for example bread versus cracker)
– The number of SIFT keypoints may vary highly between images
– Thus the number of SIFT descriptors may vary as well
– Distance calculations require equal numbers of descriptors
– How do we deal with this problem?

Feature encoding
Global encoding of local SIFT features
– Combine the local SIFT keypoint descriptors of an image into one global vector

Feature encoding
Most popular method: Bag-of-Words (BoW)
– A variable number of local image features
– Encoded into a fixed-dimensional histogram
(Training and testing pipeline: http://cs.brown.edu/courses/cs143/2011/results/proj3/hangsu/)

Feature encoding
Bag-of-Words (BoW): Step 1
– Extract local SIFT keypoint descriptors from the training images
– Create the vocabulary from the set of SIFT keypoint descriptors
– This vocabulary represents the categories of local descriptors
– The main technique used to create the vocabulary is k-means clustering
– One of the simplest and most popular unsupervised learning approaches
– Performs automatic clustering (partitioning) of the training data into k categories

Recap of k-means clustering
Initialize: k cluster centres (typically randomly)
Iterate:
1. Assign the data (feature vectors) to the closest cluster centre (Euclidean distance)
2. Update the cluster centres as the mean of the data samples in each cluster
Terminate: when converged or when the number of iterations reaches the maximum

Demonstration of k-means clustering (https://www.youtube.com/watch?v=BVFG7fd1H30)

Points    Clusters    Iterations
 1,000        5           30
 1,000       10           36
 1,000       20           26
 5,000        5           33
 5,000       10           42
 5,000       20           37
10,000        5           30
10,000       10           38
10,000       20           89
10,000       30           68
20,000       30           87

The number of iterations may vary depending on the number of points, the number of clusters, and the cluster initialization.

Feature encoding
Bag-of-Words (BoW): Step 2
– The cluster centres are the "visual words" of the "vocabulary" used to represent an image
– Each local feature descriptor is assigned to the one visual word with the smallest distance
– Compute the number of local image feature descriptors assigned to each visual word
– Concatenate these numbers into a vector, which is the BoW representation of the image
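To make the two steps concrete, here is a minimal sketch of the BoW pipeline (not code from the lecture), assuming OpenCV's SIFT implementation (cv2.SIFT_create) and scikit-learn's KMeans; the vocabulary size k = 100 and the function names are illustrative choices.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_images, k=100):
    """Step 1: extract SIFT descriptors from all training images and
    cluster them into k visual words (the vocabulary)."""
    sift = cv2.SIFT_create()
    descriptors = []
    for img in train_images:  # grayscale uint8 images
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    kmeans = KMeans(n_clusters=k, n_init=10)
    kmeans.fit(np.vstack(descriptors))  # cluster centres = visual words
    return kmeans

def bow_encode(img, kmeans):
    """Step 2: assign each local descriptor to its nearest visual word
    and return the fixed-length histogram of word counts."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(img, None)
    hist = np.zeros(kmeans.n_clusters)
    if desc is not None:
        for word in kmeans.predict(desc):  # nearest centre per descriptor
            hist[word] += 1
        hist /= hist.sum()  # normalize so the keypoint count drops out
    return hist
```

The resulting histograms have the same dimension for every image, so they can be compared with ordinary distance measures or fed to a classifier, which is exactly how the variable number of SIFT descriptors per image is dealt with.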
Example application of feature encoding
SIFT-based texture classification (recognizing, for example, "bread")
1. SIFT feature extraction
2. BoW encoding (using the vocabulary)
3. Classification (using the trained classification model)
In other words: build the vocabulary, train the classifier, then classify new images
(http://heraqi.blogspot.com/2017/03/BoW.html)

Feature encoding final notes
– Local features can be other types of features (not just SIFT): LBP, SURF, BRIEF, ORB
– There are also more advanced encoding techniques than BoW: VLAD, Fisher Vector
– A very good source of additional information is VLFeat.org (http://www.vlfeat.org/)

Shape features
– Shape is an essential characteristic of material objects
– Shape features are typically extracted after image segmentation
– They can be used to identify and classify objects (for example in object recognition)

Shape features
Challenges in defining shape features
– Invariance to rigid transformations
– Tolerance to non-rigid deformations
– Unknown correspondence between shapes

Basic shape features
Simple geometrical shape descriptors
– Net area
– Principal axes
– Convex area = area of the convex hull that encloses the object

Basic shape features
Convexity versus concavity of an object
– An object is called convex if the straight line between any two points a and b in the object is contained in the object, and concave otherwise
– Convex hull of an object: the smallest convex set that contains the object
– Convex deficiency of an object: the set difference between the convex hull and the object

Basic shape features
More simple geometrical shape descriptors
– Compactness: ratio of the area of an object to the area of a circle with the same perimeter
– Circularity: ratio of $4\pi$ times the area of an object to the second power of its perimeter ($4\pi A / P^2 = 1$ for a circle)
– Elongation: ratio between the length and the width of the object's bounding box
– Eccentricity: ratio of the length of the minor axis to the length of the major axis
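As a concrete illustration (not from the lecture), these descriptors can be computed from a binary segmentation mask with scikit-image. The sketch below assumes scikit-image 0.19+ property names and a non-degenerate object; note that regionprops has its own eccentricity attribute following the standard ellipse definition, so the slide's minor-to-major axis ratio is computed directly instead.

```python
import numpy as np
from skimage import measure

def basic_shape_features(mask):
    """Simple geometrical shape descriptors for the largest object in
    a binary mask (nonzero pixels = object)."""
    labels = measure.label(mask > 0)
    obj = max(measure.regionprops(labels), key=lambda p: p.area)

    A = obj.area            # net area (number of object pixels)
    P = obj.perimeter       # estimated contour length
    circularity = 4 * np.pi * A / P**2   # equals 1 for a perfect circle

    minr, minc, maxr, maxc = obj.bbox
    height, width = maxr - minr, maxc - minc
    elongation = max(height, width) / min(height, width)  # bounding-box ratio

    # Eccentricity as defined on the slide: minor over major axis length
    # (of the ellipse with the same second moments as the object)
    eccentricity = obj.axis_minor_length / obj.axis_major_length

    convex_deficiency = obj.area_convex - A  # hull area minus object area

    return {"area": A, "circularity": circularity,
            "elongation": elongation, "eccentricity": eccentricity,
            "convex_deficiency": convex_deficiency}
```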
Boundary descriptors
Chain code descriptor
– Represents the object shape by the relative positions of consecutive boundary points
– Consists of a list of directions from a starting point
– Provides a compact boundary representation
The eight directions are numbered as follows (centre = current boundary point):
5 6 7
4 . 0
3 2 1
Example: 6,7,0,1,1,0,7,7

Boundary descriptors
Local curvature descriptor
– The curvature of an object is a local shape attribute
– Convex (versus concave) parts have positive (versus negative) curvature
[Figure: a closed contour parameterized by arc length $s$ and the corresponding curvature profile $\kappa(s)$]

Boundary descriptors
Two interpretations of local curvature
– Geometrical interpretation: $\kappa(s) = \pm \frac{1}{r(s)}$, where $r(s)$ is the radius of the circle locally fitting the contour at position $s$
– Physical interpretation: $\kappa(s) = \pm \frac{d\tau}{ds}(s)$, where $\tau(s)$ is the tangent direction of the contour at position $s$
– In both cases the ± sign distinguishes locally convex (positive) from locally concave (negative) parts

Boundary descriptors
Global curvature descriptors
– Total bending energy: $B = \oint_C \kappa^2(s)\,ds$
> Amount of physical energy stored in a rod bent to the contour
> Circular objects have the smallest contour bending energy, $B = 2\pi/r$
– Total absolute curvature: $K = \oint_C |\kappa(s)|\,ds$
> Absolute value of the curvature integrated along the object contour
> Convex objects have the smallest total absolute curvature, $K = 2\pi$

Boundary descriptors
Radial distance descriptor
– Use the centroid $C$ of the shape as the reference point and compute the radial distance $d(i)$ for all $N$ pixels $i$ along its boundary
– Scale invariance is achieved by normalizing $d(i)$ by the maximum distance, giving the normalized radial distance $r(i)$
– The number of times the signal $r(i)$ crosses its mean can be used as a measure of boundary roughness

Example application of shape features
Combining feature descriptors to classify objects
[Scatter plot: numbered objects plotted by circularity (horizontal axis, 0 to 1) against area (vertical axis, 0 to 3,000)]

Shape context
Shape context is a point-wise local feature descriptor
– Pick $n$ points $p_i$ on the contour of a shape
– For each point, make a histogram $h_i$ of the relative coordinates of the other $n-1$ points: $h_i(k) = \#\{\,q \neq p_i : (q - p_i) \in \mathrm{bin}(k)\,\}$
– This is the shape context of $p_i$
Belongie et al. (2002). Shape matching and object recognition using shape contexts. IEEE TPAMI 24(4):509-522. https://doi.org/10.1109/34.993558

Example application of shape context: shape matching
– Step 1: Sample a list of points on the shape edges, for example using the Canny edge detector:
> Gaussian filtering
> Intensity gradient
> Non-maximum suppression
> Hysteresis thresholding
> Edge tracking
J. Canny (1986). A computational approach to edge detection. IEEE TPAMI 8(6):679-698. https://doi.org/10.1109/TPAMI.1986.4767851
– Step 2: Compute the shape context for each point: $h_i(k) = \#\{\,q \neq p_i : (q - p_i) \in \mathrm{bin}(k)\,\}$
– Step 3: Compute the cost matrix between two shapes $P$ and $Q$: $C(p_i, q_j) = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$, where $h_i$ is the shape context of $p_i \in P$ and $h_j$ is the shape context of $q_j \in Q$
– Step 4: Find the one-to-one matching $\pi$ minimizing the total cost between point pairs: $H(\pi) = \sum_i C(p_i, q_{\pi(i)})$
– Step 5: Transform one shape to the other based on the one-to-one point matching:
> Choose the desired transformation (for example affine)
> Apply least-squares or RANSAC fitting
> This yields the optimal transformation $T$
– Step 6: Compute the shape distance: $D(P, Q) = \frac{1}{n} \sum_{p \in P} \min_{q \in Q} C(p, T(q)) + \frac{1}{m} \sum_{q \in Q} \min_{p \in P} C(p, T(q))$
– Other costs may also be taken into consideration:
> Appearance of the image at the points
> Bending energy of the transformation

In summary: 1. sample points, 2. compute shape contexts, 3. compute the cost matrix, 4. find the point matching, 5. perform the transformation, 6. compute the distance.
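A compact sketch (not the lecture's code) of Steps 2 to 4 follows, assuming the sampled contour points are given as an (n, 2) NumPy array; the log-polar bin edges and histogram normalization are simplified assumptions rather than the exact settings of Belongie et al.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shape_contexts(points, n_r=5, n_theta=12):
    """Step 2: log-polar histogram h_i(k) of the relative coordinates
    of all other points, for each point p_i."""
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = q_j - p_i
    r = np.linalg.norm(diff, axis=2)
    theta = np.arctan2(diff[..., 1], diff[..., 0])
    # Log-spaced radial bins, scaled by the mean pairwise distance
    r_edges = np.logspace(-1, 0.5, n_r + 1) * r[r > 0].mean()
    r_bin = np.clip(np.digitize(r, r_edges) - 1, 0, n_r - 1)
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    H = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i != j:  # count q != p_i falling into bin(k)
                H[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return H / (n - 1)   # normalized histograms

def chi2_cost(Hp, Hq):
    """Step 3: C(p_i, q_j) = 0.5 * sum_k (h_i - h_j)^2 / (h_i + h_j)."""
    num = (Hp[:, None, :] - Hq[None, :, :]) ** 2
    den = Hp[:, None, :] + Hq[None, :, :] + 1e-12   # avoid division by zero
    return 0.5 * (num / den).sum(axis=2)

# Step 4: one-to-one matching minimizing the total cost (Hungarian algorithm),
# illustrated on two noisy samplings of similar shapes
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
P = np.c_[np.cos(t), np.sin(t)] + rng.normal(0, 0.02, (60, 2))
Q = np.c_[np.cos(t), 0.8 * np.sin(t)] + rng.normal(0, 0.02, (60, 2))
rows, cols = linear_sum_assignment(chi2_cost(shape_contexts(P), shape_contexts(Q)))
```

The matched point pairs (rows, cols) can then be fed to least-squares or RANSAC fitting to estimate the transformation $T$ of Step 5.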
Histogram of oriented gradients
– Histogram of oriented gradients is popularly referred to as HOG
– Describes the distributions of gradient orientations in localized areas
– Does not require initial segmentation
N. Dalal and B. Triggs (2005). Histograms of oriented gradients for human detection. Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2005.177

Histogram of oriented gradients
Step 1: Calculate the gradient vector at each pixel
– Gradient magnitude
– Gradient orientation

Histogram of oriented gradients
Step 2: Construct the gradient histogram of all pixels in a cell
– Divide the orientations into $N$ bins (typically $N = 9$ bins evenly splitting 180 degrees)
– Assign the gradient magnitude of each pixel to the bin corresponding to its orientation

Histogram of oriented gradients
Step 3: Generate the detection-window level HOG descriptor
– Concatenate the cell histograms
– Block-normalise the cell histograms
– # features = (7 × 15 blocks) × (4 cells per block) × (9 orientations per cell) = 3,780

Histogram of oriented gradients
Detection via sliding window on the image
– Compute the HOG descriptor for many example windows from a training dataset
– Manually label each example window as either "person" or "background"
– Train a classifier (such as an SVM) from these example windows and labels
– For each new (test) image, predict the label of each window using this classifier (HOG feature map → detector response map)

Example application of HOG
Detecting humans in images
https://www.pyimagesearch.com/2015/11/09/pedestrian-detection-opencv/

Example application of HOG
Detecting and tracking humans in videos
https://www.youtube.com/watch?v=0hMMRlB9DUc

Example application of HOG
Fine-grained detection using a deformable parts model
https://doi.org/10.1109/CVPR.2008.4587597
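The pedestrian-detection example linked above can be reproduced with OpenCV's built-in HOG descriptor and pre-trained person detector. A minimal sketch, with "street.jpg" as a placeholder filename:

```python
import cv2

# OpenCV's HOGDescriptor defaults match the setup on the slides:
# 64x128 window, 8x8 cells, 2x2-cell blocks, 9 orientation bins.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")  # hypothetical input image

window = cv2.resize(img, (64, 128))  # a single detection window
descriptor = hog.compute(window)     # 3,780 values: (7x15) x 4 x 9

# Sliding-window detection over an image pyramid
rects, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)
for (x, y, w, h) in rects:  # draw a box around each detected person
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```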
Summary
Feature representation is essential in solving many computer vision problems
Most commonly used image features:
– Colour features (Part 1): colour moments and colour histogram
– Texture features (Part 1): Haralick, LBP, SIFT
– Shape features (Part 2): basic shape features, shape context, HOG

Summary
Other techniques discussed (Parts 1 and 2):
– Descriptor matching
– Least squares and RANSAC
– Spatial transformations
– Feature encoding (BoW)
– k-means clustering
– Shape matching
– Sliding window detection

Further reading on discussed topics
– Chapters 4 and 6 of Szeliski

Acknowledgements
– Some content from slides of James Hays, Michael A. Wirth, and Cordelia Schmid
– From BoW to CNN: Two decades of texture representation for texture classification
– And other resources as indicated by the hyperlinks

Example exam question
The image on the right shows the result of a segmentation of various objects and the desired classification of these objects. The two different colours (red and green) indicate the two different classes to which the objects are to be assigned. A straightforward way to perform the classification is to compute the value of a quantitative shape measure for each object and then threshold those values. Suppose we compute the circularity and the eccentricity. Which of these two measures can be used to produce the shown classification?
A. Only circularity
B. Only eccentricity
C. Both circularity and eccentricity
D. Neither circularity nor eccentricity
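As a side note, a minimal sketch of the thresholding approach the question refers to (the measure values, threshold, and class labels below are hypothetical; this does not answer the question itself):

```python
def classify_objects(measures, threshold):
    """Two-class classification by thresholding one shape measure
    (e.g. circularity) computed per segmented object."""
    return ["red" if m < threshold else "green" for m in measures]

# Hypothetical usage: per-object circularities from the segmentation
labels = classify_objects([0.25, 0.90, 0.30, 0.85], threshold=0.5)
```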