Summary

These lecture notes cover fundamental concepts in computer vision, focusing on Fourier Transforms and their application to image analysis and manipulation. They explore how frequency components can be used to understand and represent image signals, and discuss the magnitude and phase information of signals, spatial signals, and other topics related to feature extraction in image processing.

Full Transcript

Lecture 5: Fourier Transform I

Feature extraction
You cannot use raw signals in ML; you need to extract features.
Speech
- Represented in signals as pitch/loudness
- Combine raw data into a "container" that is more meaningful or that better reflects the relevant info
- So: what amount of low/medium/high frequencies is present at one point in time
Vision
- Represented in signals as pixel intensity
- The same can be done for vision. However, vision does not consist of sine waves in one dimension
- Rather, images are represented as sine waves in two dimensions, which can be visualized as gratings

Fourier Transforms
Main idea
- Understand and analyze sound and image changes in terms of how their frequency components change. By decomposing the signals into simpler waveforms, we can process, modify, or interpret complex data
Fourier analysis
- The representation of a periodic sound or waveform as a sum of Fourier components (= pure sinusoidal waves)
- So: goes from the original signal to multiple frequencies
Fourier synthesis
- The reconstruction of a periodic signal on the basis of Fourier coefficients, which give the amplitude and phase of each component sine wave
- So: goes from multiple frequencies back to the original signal

Representing signals using "basic" waves
- The image shows an example of Fourier series approximation, where a square wave is represented by adding together several sine waves of different frequencies and amplitudes (see the code sketch below)
- K is a parameter showing the complexity of the signal, more specifically the number of harmonics (terms)
- A single sine wave cannot perfectly represent the square wave because it is smooth and continuous
- The more harmonics are added, the closer you get to the target waveform, even though some small oscillations might remain

DC component
- A constant signal that doesn't change over time: represents the 0 hertz frequency
- Important in understanding the overall power or energy of a signal. It also shifts the signal's mean value up or down
- Example: imagine a square wave signal oscillating between -1 and 1. If we add a DC component of 1, the wave would oscillate between 0 and 2 instead

Spatial signal
- A signal that varies over space rather than time. A 2D Fourier transform is often applied to decompose the spatial signal into its frequency components, helping analyze textures, patterns, and structures within the signal
- In sound, the signal represents the waveform in the time domain
- In images, it represents the arrangement of pixels across space (the spatial domain)
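A minimal numpy sketch of the square-wave approximation described above: the standard Fourier series of a square wave sums odd harmonics n = 1, 3, 5, ... with amplitudes 4/(pi*n). K counts the harmonics kept; the function name and parameter values are illustrative, not from the lecture.

```python
import numpy as np

def square_wave_synthesis(t, freq=1.0, K=5):
    """Sum the first K odd harmonics of a unit square wave."""
    y = np.zeros_like(t)
    for k in range(1, K + 1):
        n = 2 * k - 1                              # odd harmonic: 1, 3, 5, ...
        y += np.sin(2 * np.pi * n * freq * t) / n  # amplitude falls off as 1/n
    return 4.0 / np.pi * y                         # scale to levels -1 and +1

t = np.linspace(0.0, 2.0, 1000)
coarse = square_wave_synthesis(t, K=1)   # a plain sine wave
fine = square_wave_synthesis(t, K=25)    # much closer to a square wave
```

With K = 1 this is just a sine; larger K sharpens the edges, and the small residual oscillations mentioned above are the Gibbs phenomenon.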
Magnitude & phase
Magnitude = strength of individual frequencies
Phase = position of individual frequencies
- Image (first box): the original image in the spatial domain, meaning it represents variations in pixel intensities across the 2D space of the image
- Magnitude spectrum (second box): the magnitude spectrum after applying the Fourier transform. Shows the strength of the different spatial frequencies present in the image. Brighter spots in this visualization indicate frequencies that have stronger contributions. Tells us how much of each spatial frequency contributes to the overall image, but without giving us orientation or position information
- Phase spectrum (third box): represents the phase (or alignment) of the various frequency components. Contains crucial information about how different frequency components are aligned spatially in the image. Phase tells us how these frequencies are positioned relative to each other
- Inverse Fourier transform: when reconstructing an image, both the magnitude and phase are important, but the phase plays the more critical role in preserving the recognizable structure
  - Magnitude only: you lose crucial information about how those frequencies are positioned or aligned
  - Phase only: contains most of the structural information necessary to reconstruct the recognizable features of the image

Lecture 6: Fourier Transform II

Fast convolution
Methods that perform convolution operations more efficiently, typically by reducing computational complexity. While standard convolution computes the result directly in the spatial domain, fast convolution uses mathematical techniques to achieve the same result with fewer operations.
Key method: Fourier Transform-based convolution
- By transforming both the input and the filter into the frequency domain using the Fast Fourier Transform (FFT), convolution becomes simple element-wise multiplication. This approach can be much faster for large inputs, as it reduces the complexity from O(N^2) in the spatial domain to O(N log N) in the frequency domain (see the sketch below)

Discrete Fourier Transform vs Fast Fourier Transform
Discrete Fourier Transform (DFT)
- A mathematical process used to analyze the frequency content of a signal
- Takes a signal (a sequence of numbers) and breaks it down into its frequency components, which show how much of each "wave" (or oscillation) is present in the signal
- Performs a sum of multiplications involving every data point in the signal
- Because it performs operations to calculate all frequency components, the time complexity of the algorithm is O(N^2), meaning the number of calculations grows quickly as the number of data points increases
- This makes the DFT relatively slow for large datasets
Fast Fourier Transform (FFT)
- An optimized algorithm for computing the DFT more efficiently by exploiting the symmetry and periodicity properties of the Fourier transform
- The most commonly used FFT algorithm is the Cooley-Tukey algorithm, which recursively breaks down the DFT into smaller DFTs of size N/2, significantly reducing the number of required computations
- This has a time complexity of O(N log N), making the FFT standard for many real-time applications and large-scale computations in digital signal and image processing
Why do we need the FFT to perform the DFT?
- Because the FFT is an efficient algorithm that calculates the DFT much faster. It rearranges the steps of the DFT to minimize the calculations needed, so it's quicker and more practical for large datasets
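A minimal sketch of FFT-based fast convolution in 1D, assuming numpy; the function name is illustrative. Both signals are zero-padded to the full output length so that the element-wise product of their FFTs equals the linear convolution.

```python
import numpy as np

def fft_convolve(x, h):
    """Linear convolution via the FFT: pad, multiply element-wise, invert."""
    n = len(x) + len(h) - 1              # full linear-convolution length
    X = np.fft.fft(x, n)                 # fft() zero-pads inputs to length n
    H = np.fft.fft(h, n)
    return np.real(np.fft.ifft(X * H))   # imaginary residue is numerical noise

x = np.random.randn(4096)
h = np.random.randn(128)
direct = np.convolve(x, h)               # spatial domain, O(N^2)
fast = fft_convolve(x, h)                # frequency domain, O(N log N)
assert np.allclose(direct, fast)         # same result up to floating-point error
```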
Template matching using cross-correlation
Where
- I: the original image in which we want to find the location of a smaller region or pattern
- T: the template, a small image or region that represents the pattern we want to locate in I
- Trot: the template rotated by 180 degrees, aligning the template for cross-correlation with the image, which is mathematically similar to convolution but matches patterns rather than blurring or filtering
- FFT: applied to both I and Trot
- Element-wise multiplication in the spectral domain: instead of performing convolution directly in the spatial domain, we multiply the transformed I and Trot element-wise in the frequency domain
- IFFT: the inverse FFT, which gives us the cross-correlation map C, showing how well the template T matches different regions of I
- Keep the real part only: we take the real part of the IFFT result because any small imaginary components arise from numerical computation errors and do not contribute meaningfully to the cross-correlation
Final outcome
The result C is a correlation map where each pixel represents how well the rotated template Trot matches the corresponding region in the image I. Peaks in C indicate strong matches, making it easy to locate where T occurs in I.
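A sketch of the template-matching pipeline above, assuming numpy and grayscale arrays; the names and the synthetic image are illustrative. Rotating the template by 180 degrees turns FFT-based convolution into cross-correlation, and the real part of the inverse FFT gives the correlation map C.

```python
import numpy as np

def cross_correlate_fft(I, T):
    """Cross-correlation map C via the FFT, as described above."""
    T_rot = T[::-1, ::-1]                          # template rotated 180 degrees
    shape = (I.shape[0] + T.shape[0] - 1,          # pad to full correlation size
             I.shape[1] + T.shape[1] - 1)
    product = np.fft.fft2(I, shape) * np.fft.fft2(T_rot, shape)
    return np.real(np.fft.ifft2(product))          # keep the real part only

I = np.random.rand(256, 256)                       # stand-in for a real image
T = I[100:116, 80:96].copy()                       # a 16x16 patch as the pattern
C = cross_correlate_fft(I, T)
peak = np.unravel_index(np.argmax(C), C.shape)     # peak marks the best match
```

In practice a normalized cross-correlation is often preferred, so that uniformly bright regions do not dominate the raw correlation values.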
Image compression
A process applied to a graphics file to minimize its size in bytes without degrading image quality below an acceptable threshold.
Process
1. FFT2 transformation (vertical first, then horizontal)
   FFT2 (the 2D FFT) converts the image from the spatial domain (pixel values) to the frequency domain (frequency components). For a 2D image, we first apply the FFT to each column (vertical) and then to each row (horizontal). The transformation captures the entire image's details in terms of frequency components.
2. Keep only the top 1% magnitudes
   After the transformation, the image is represented by a mix of low and high frequencies. To compress the image, we keep only the 1% of highest-magnitude values in the frequency domain. This step removes much of the less significant data, reducing the storage size significantly.
3. Sparse representation
   Sparse representation involves retaining only the most essential data points, such as the top 1% of values, to efficiently represent an image in the frequency domain. This method leverages image redundancy, storing only key frequency components or pixel intensities, enabling a recognizable reconstruction with minimal data. Sparse data is stored as a list of significant components (e.g., row, column, frequency), greatly reducing storage needs.
4. Inverse FFT2
   To reconstruct the compressed image, we apply the inverse FFT2, which transforms the image back from the frequency domain to the spatial domain (pixel values).

Applying FFT to images
For a 2D image, we need to apply the FFT to both rows and columns to transform the whole image into the frequency domain:
1. FFT on rows first: transform each row from the spatial domain to the frequency domain
2. FFT on columns next: after the rows are transformed, apply the FFT to each column of the result from step 1. This completes the 2D transformation, putting the entire image into the frequency domain
Note: you could also do columns first and then rows; the result is the same.

Blur detection
Used to automatically determine if an image is blurry, helping photographers or algorithms filter out low-quality images.
Why detect blur?
- Ideal conditions matter: computer vision algorithms work best on clear, sharp images
- Automatic grading: photographers, especially after large photoshoots, need to quickly find the best images. A blur detector helps by grading or flagging blurry photos so they can focus on the high-quality ones
How does it work?
1. Magnitude spectrum
2. Convolution in the Fourier domain
3. Gaussians in convolution: Gaussian filters are commonly used in image processing, especially for blurring, because they smoothly spread pixel values, creating natural, soft blurs
4. Deconvolution and noise sensitivity
   Deconvolution: the process of reversing a blur or distortion to restore an image
   Noise sensitivity: deconvolution tries to reverse the blurring effect, which is very sensitive to noise. When you try to restore a blurred image, any tiny noise gets exaggerated, resulting in a poor-quality or noisy output. Because of this sensitivity, deconvolution is rarely perfect for recovering sharp images from heavily blurred ones
5. Blind deconvolution: tries to de-blur an image without knowing the exact kernel (the shape or pattern of the blur)

Lecture 7: Gabor Filters

Background on visual selectivity and receptive fields
Selectivity to oriented lines
- Discovered by neurophysiologists David Hubel and Torsten Wiesel in the 1950s and 60s, this concept revealed that certain neurons in the visual cortex exhibit a selective response to edges or lines at specific orientations
- These "simple cells" in the primary visual cortex respond to particular orientations, forming a foundational understanding of visual processing by enabling the edge, shape, and pattern recognition essential for spatial perception
Receptive fields
- In visual neuroscience, a receptive field refers to the specific part of the visual field where a neuron is responsive. For example, some neurons only respond to patterns or orientations within a small, localized area
- Techniques such as Spike-Triggered Average (STA) and Spike-Triggered Covariance (STC) are used to map receptive fields
  - STA: averages the stimuli preceding each of a neuron's spikes, identifying the pattern that likely triggered the response. This provides insight into a neuron's preferred stimulus
  - STC: an extension of STA, this method captures variability in a neuron's response, revealing complex receptive fields sensitive to multiple pattern types
- STA and STC are categorized as "reverse correlation techniques", which allow for receptive field mapping and are especially useful for identifying neurons' sensitivity to different visual features like orientation and spatial frequency
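A minimal sketch of the spike-triggered average, assuming numpy; the synthetic white-noise stimuli, spike probability, and one-frame lag are illustrative assumptions, not lecture specifics.

```python
import numpy as np

T, H, W = 10000, 16, 16
stimuli = np.random.randn(T, H, W)             # white-noise stimulus frames
spikes = np.random.rand(T) < 0.05              # stand-in binary spike train

lag = 1                                        # look one frame before each spike
spike_times = np.where(spikes)[0]
spike_times = spike_times[spike_times >= lag]  # drop spikes with no prior frame
sta = stimuli[spike_times - lag].mean(axis=0)  # H x W average preceding stimulus
```

For a real neuron (rather than this random stand-in), the STA converges to the stimulus pattern that tends to precede spikes, i.e., an estimate of the receptive field.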
Gabor Filters: functionality and similarity to the visual cortex
Introduction to Gabor filters
- Gabor filters are defined by sinusoidal waveforms modulated by a Gaussian function and characterized by frequency, orientation, and spatial localization parameters
- These filters detect edges, textures, and specific spatial frequencies by highlighting select frequencies and orientations
- Their role in visual processing parallels neurons in the visual cortex (like V1 neurons) that respond to particular spatial frequencies and orientations, acting as biological "filters" that detect edges and patterns
Mechanism of Gabor filters
- Sinusoidal carrier wave: determines the frequency that the Gabor filter will detect
- Gaussian envelope: localizes the sine or cosine wave to a specific area in the visual field, allowing Gabor filters to focus on localized regions instead of analyzing the entire image
- Localization: Gabor filters are often described as "local Fourier transforms" because they analyze frequencies within a specific region. They differ from a standard Fourier transform, which provides global frequency information across the entire signal or image without spatial specificity
- Band-pass filtering: Gabor filters act as band-pass filters, isolating specific frequency bands and attenuating others, filtering out irrelevant frequencies and focusing on the desired range
Parameters of Gabor filters
- Wavelength: determines the size of the features the filter will respond to
- Orientation: filters can be adjusted to detect specific angles (e.g., horizontal, vertical, diagonal), enhancing edge and texture detection along these orientations
- Phase offset: controls wave positioning, affecting how the filter responds to different textures or edges
- Sigma (σ): controls the Gaussian envelope width, balancing spatial and frequency resolution
- Spatial aspect ratio: controls elongation, making filters more sensitive to features in certain directions
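A numpy sketch of a Gabor kernel built from the parameters just listed, using the standard real-valued form (a cosine carrier modulated by an elongated Gaussian envelope); the default parameter values are illustrative.

```python
import numpy as np

def gabor_kernel(size=31, wavelength=8.0, theta=0.0, psi=0.0, sigma=4.0, gamma=0.5):
    """Real Gabor filter: cosine carrier times an elongated Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates by theta
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))  # localizes
    carrier = np.cos(2 * np.pi * x_r / wavelength + psi)  # sets the frequency
    return envelope * carrier

k0 = gabor_kernel(theta=0.0)            # tuned to one orientation
k45 = gabor_kernel(theta=np.pi / 4)     # rotated to detect another
```

Here `wavelength` sets the feature size, `theta` the orientation, `psi` the phase offset, `sigma` the envelope width, and `gamma` the spatial aspect ratio, matching the parameter list above.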
Spatial and frequency domains in Gabor filters
Fourier transform vs. Gabor filters
- The Fourier transform analyzes a signal's frequency components globally but does not provide localized information
- In contrast, Gabor filters provide a localized frequency response by applying a Gaussian envelope, making them suited for detecting edges, textures, and spatial patterns at specific points within an image
Inverse relationship between domains
- Sigma (σ) controls the balance between the spatial and frequency domains
  - A small sigma (narrow Gaussian) localizes the spatial focus, resulting in a broader frequency response
  - A large sigma (wide Gaussian) captures more of the spatial field, narrowing the frequency range
- This relationship allows Gabor filters to be finely tuned for spatial focus (local details) or broader frequency detection (general patterns)

Gabor filter application and practical use in image processing
Edge detection and texture analysis
- By adjusting frequency and orientation, Gabor filters can detect edges and textures across different angles and scales
- In the Fourier domain, they isolate specific frequencies and orientations, enabling precise analysis of edges and textures that align with the parameters
Comparison to the standard Fourier transform
- While the Fourier transform detects overall frequency content, Gabor filters provide spatial specificity, making them better suited to image processing tasks such as texture analysis, feature extraction, and segmentation
Localized sinusoidal functions
- Gabor filters use a Gaussian envelope to "localize" the sine and cosine waves within a particular spatial region, enhancing detection of spatially varying patterns like textures and edges

Modeling simple cells in the visual cortex using Gabor functions
2D receptive field profiles
- Simple cells in the visual cortex have 2D receptive fields sensitive to specific spatial frequencies and orientations, making Gabor functions ideal for approximating these fields
Model fitting and residual analysis
- Gabor models are used to approximate receptive fields by tuning parameters such as orientation and wavelength to match neural responses
- Residuals measure discrepancies between the model and the actual data, identifying areas where Gabor filters may not fully capture a neuron's response
Biological plausibility
- The Gaussian envelope in Gabor filters mirrors the response profile of simple cells, providing a biologically plausible model of edge detection and spatial frequency sensitivity

Advanced computer vision applications of Gabor filters
Gabor filter bank
- A collection of filters with varying frequencies and orientations, enabling diverse feature detection across images
- Convolution process: each filter in the bank is convolved with the image to create a response highlighting specific features at each location. The resulting responses form a "map" of edges, textures, and patterns in various directions and scales
Frequency and orientation selectivity in the Fourier domain
- In the Fourier domain, each filter in the bank acts as a band-pass filter for its respective frequency and orientation, ensuring comprehensive coverage and selective analysis of image content
Image texture and feature extraction
- Textures such as brick walls, grass fields, or gravel paths contain structure and periodicity that Gabor filters can capture
- Using a filter bank, textures can be segmented based on the Gabor response at different orientations and frequencies. Comparing histograms or feature maps generated from Gabor responses enables texture classification
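A sketch of a small Gabor filter bank, assuming scikit-image's gabor_kernel and scipy's convolve are available; the chosen orientations and frequencies are illustrative. Each filter yields one response map, and the magnitude of the complex response is a common choice for texture features.

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.filters import gabor_kernel

image = np.random.rand(64, 64)                    # stand-in for a grayscale image

responses = []
for theta in np.arange(0, np.pi, np.pi / 4):      # 4 orientations
    for frequency in (0.1, 0.2, 0.4):             # 3 spatial frequencies
        kernel = gabor_kernel(frequency, theta=theta)   # complex Gabor kernel
        real = convolve(image, np.real(kernel))   # band-pass response (real)
        imag = convolve(image, np.imag(kernel))   # band-pass response (imag)
        responses.append(np.hypot(real, imag))    # response magnitude per pixel

feature_maps = np.stack(responses)                # one map per filter (12 here)
```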
Lecture 8: PCA and natural images

Natural images
The type of images that biological visual systems are exposed to.
- They are predictable, meaning a pixel and its neighboring pixels are highly correlated (similar to each other)
- They have structure, because the arrangement of pixel values follows specific patterns or correlations
- Random images, by contrast, have independent pixels
Global coherence = the arrangement of elements within a natural image contributes to a recognizable object that we are used to.
Mean subtraction refers to taking out the DC component.
Image (c) shows the correlation structure as a plot where the center is the peak; the further you move from the center, the more the correlation decreases.
A plot of the distribution of line orientations in an image shows peaks at 90 degrees (vertical) and 180 degrees (horizontal).

Power spectrum of natural images
Power spectrum = the spectrum of frequencies encountered when traversing the image.
Spatial frequencies in natural images follow a special distribution, i.e., a specific amplitude spectrum:
- The Fourier amplitude of spatial frequencies is proportional to ~1/f
- The power spectrum is proportional to ~1/f^2 -> a balance between smooth variations and fine details in natural images
Noise -> randomness -> no correlation
- Lacks a predictable pattern
- In natural images, by contrast, amplitude is inversely proportional to frequency
Large objects carry high power at low frequencies, meaning that natural images are dominated by low frequencies; as power decreases with increasing frequency, the high frequencies add texture and details.

Image patches
Image patches can be enough for a task and avoid too much work.
Definition: a patch-based method is a technique in computer vision that processes images by dividing them into smaller overlapping sections called patches; these patches are analyzed and measured individually to perform various tasks.

PCA
PCA stands for Principal Component Analysis.
Natural images have patterns
- The points in every dimension will be clustered together
- Random images will be randomly distributed (scattered)
- We need to work with images in small parts (patches)
PCA reduces dimensionality
- It identifies the directions in space that capture the most variance (the spread of points in the dataset)
- A technique to compute the (orthogonal) directions of maximum possible variance in the data
- Consider the data (vectors) as rows in a matrix X
- Assume the components of the data vectors have zero mean
- The directions of maximum variance correspond to the eigenvectors of the covariance matrix of the data
PCA in general is a useful technique for dimensionality reduction, data analysis, and visualization.
Example 2: two variables control the shape of fishes, width and height
- Width and height are strongly correlated, so we can identify each fish using a single number. This is the first PC
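A minimal PCA sketch following the recipe above (zero-mean rows in a matrix X, eigenvectors of the covariance matrix); the synthetic correlated data stands in for the fish width/height example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])  # correlated
X = X - X.mean(axis=0)                      # zero-mean components, as assumed

cov = X.T @ X / (len(X) - 1)                # covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: symmetric, ascending order
order = np.argsort(eigvals)[::-1]           # largest variance first
components = eigvecs[:, order]              # principal components as columns

first_pc_scores = X @ components[:, 0]      # one number per sample: the 1st PC
```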
PCA of natural image patches
Natural images can be broken down into patches, which you represent as points in an n-dimensional space.
Because pixels in a natural image are correlated with one another (not random), we can use PCA to break them down into principal components.
These components represent a compressed version of the image, and are similar to our visual system.
So far, we have used square patches -> boundary effects, an inaccurate model of a neuron's receptive field.
A receptive field can be localized using a Gaussian mask
- This makes it more similar to the neuron's receptive field

PCA as a (simplified) model of V1 simple cells
PCA is not biologically plausible
- Nonetheless, there exist algorithms that can compute PCs of data in more biologically plausible ways (like Oja's rule)
- Hebbian rule: "neurons that fire together, wire together"
- The Oja learning rule (Oja, 1982) is a mathematical formalization of this Hebbian learning rule, such that over time the neuron learns to compute a principal component of its input stream
- Olshausen and Field (1996): we can also go one step further with ICA (independent component analysis)
  - Not only do you remove correlation (as in PCA), you actually remove all overlapping information. So, if one component contains certain information, no other component will also have it
  - Was used for biological analysis
  - A technique to decompose a multivariate signal into its independent, non-Gaussian components
  - Goal: find a linear transformation of the data such that the transformed data is as statistically independent as possible
  - This is very efficient: only one neuron needs to fire for a certain set of information
- This is sparsity: as few neurons active as possible, while still faithfully representing the input image
Difference between PCA and ICA
- PCA looks for components that encode the largest variance
- ICA looks for components that are statistically independent

Information theory
The study of how information is quantified and analyzed within visual data.

Entropy
A measure of the information content or uncertainty in an image.
Quantifies how much information is contained in the image's pixel intensity distribution, providing insight into its complexity, texture, or randomness.
This concept comes from information theory and is often used in tasks like image segmentation, thresholding, or texture analysis.
- Low entropy: an image with mostly uniform or predictable intensities
- High entropy: an image with diverse intensities

H = -Σᵢ pᵢ log₂(pᵢ)

Where:
- pᵢ: the probability of intensity i in the image
- N: the total number of intensity levels (the sum runs over i = 1, ..., N)
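A sketch of the entropy computation, assuming an 8-bit grayscale image so that N = 256 intensity levels; pᵢ is estimated from the normalized intensity histogram.

```python
import numpy as np

def image_entropy(img):
    """Shannon entropy of the intensity distribution, in bits."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()                   # p_i: probability of intensity i
    p = p[p > 0]                            # skip empty bins (0 * log 0 -> 0)
    return -np.sum(p * np.log2(p))

flat = np.full((64, 64), 128, dtype=np.uint8)                 # uniform image
noisy = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # diverse image
print(image_entropy(flat), image_entropy(noisy))              # ~0 vs ~8 bits
```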
Applications in computer vision
1. Image segmentation: entropy helps identify regions of interest by analyzing the distribution of intensities in different parts of the image
2. Texture analysis: high-entropy regions often indicate complex textures, while low-entropy regions represent smooth or uniform areas
3. Image compression: images with low entropy are easier to compress because they contain less information or redundancy
4. Saliency detection: entropy can guide focus to areas with high information content, aiding feature extraction or object detection

Lecture 9: Scale space

Scale
Scale introduces a third dimension to a 2D image, enabling us to manipulate the level of detail
- Zooming in: small scope, high spatial frequency, fine details are visible
- Zooming out: large scope, low spatial frequency, a coarse, less detailed view
Humans naturally perceive and interpret scale.
Filter -> a mathematical operation applied to an image to emphasize or suppress features.

Image pyramids
An image pyramid is a hierarchical structure of images at varying resolutions, created by progressively blurring and subsampling an original image.
Blurring = applying a filter repeatedly to suppress features
- Level 0: the original image (large scope, full resolution)
- Last level: a heavily blurred and downsampled version (a small image with fewer details)
Purpose: helps in analyzing features at different scales, which is crucial for tasks like edge detection, image compression, and feature matching.

Intensity profiles and derivatives
Intensity profile
- Low represents dark regions
- High represents bright regions
1st derivative
- A transition from dark to bright creates an edge
- Useful for detecting the presence of edges
- Measures the rate of change of intensity
- Large positive peak -> dark to bright
- Large negative peak -> bright to dark
2nd derivative
- Has zero-crossings -> points where the curve crosses the zero line
- More precise at detecting edges
- A positive peak followed by a negative one -> transition from dark to bright
- Used for edge detection and for precisely locating edges

1D example (Witkin, 1984)
Kernels can be applied recursively to 1D signals (e.g., time series or audio) to create progressively smoother versions of the signal. This process separates the signal into distinct scales.
Key points
- By recursively smoothing, we analyze the signal at varying resolutions
- Scale space: a new dimension representing different resolutions of the signal/image
Application to edge detection
- Smoothing reduces noise, leaving only significant variations (edges)
- The second derivative helps detect edges
- Locate points where the second derivative is zero (zero-crossings)
Observations
- Higher scales (more smoothing) result in fewer zero-crossings, as finer details are smoothed out
- Once the signal has been smoothed, only large features remain
- The zero-crossings show which features were preserved at a given sigma
- Paths that are not smoothed out are persistent features
- Gaussian limit -> repeated smoothing results in a Gaussian distribution due to the central limit theorem: repeated convolution with almost any kernel approximates a Gaussian

Difference of Gaussian
A method used for edge detection and feature enhancement.
It approximates the Laplacian of Gaussian: instead of computing the second derivative directly, the Difference of Gaussian achieves a similar effect by taking the difference between two Gaussian filters with different standard deviations.
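A minimal Difference of Gaussian sketch, assuming scipy; the sigma ratio k = 1.6 is the value commonly used to approximate the Laplacian of Gaussian, not a lecture-specified constant.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussian(img, sigma=1.0, k=1.6):
    """Band-pass filtering: narrow blur minus wide blur."""
    fine = gaussian_filter(img, sigma)        # narrow Gaussian keeps more detail
    coarse = gaussian_filter(img, k * sigma)  # wider Gaussian smooths more
    return fine - coarse                      # approximates Laplacian of Gaussian

img = np.zeros((64, 64))
img[:, 32:] = 1.0                             # a vertical step edge
dog = difference_of_gaussian(img)             # response peaks around column 32
```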
Gaussian Pyramid
Construction
1. Downsample the image by applying a Gaussian filter and reducing its size
2. Repeat to create smaller and smoother versions of the image
Purpose: emphasize large-scale features and reduce noise -> smoothing can help remove high-frequency noise while preserving meaningful patterns at larger scales.
Features
- Each level represents the image at a lower resolution with reduced high-frequency details
- Useful for downsampling images while preserving important structural details, and for multi-scale analysis

2D example: Fourier spectrum of the Gaussian Pyramid
As you go down the levels, downsampling and blurring are applied. You observe a gradual decrease in high-frequency components, and lower frequencies dominate as you move to coarser levels. This is because the blurring in the Gaussian pyramid removes fine details and the downsampling reduces the overall resolution, resulting in a shift towards low frequencies.

Laplacian Pyramid
Construction (see the pyramid sketch below)
1. Start with the Gaussian pyramid
2. Upscale each image to the natural size of the previous level
3. Subtract the upscaled image from the original in the Gaussian pyramid
   - This yields edges and isolates frequencies
   - It emphasizes certain features, acting as a band-pass filter
Features
- Highlights contours and edges at various scales
- Acts as a tool for edge detection and compression
Purpose: the Laplacian pyramid represents an image as a series of contour-based layers, each corresponding to a specific spatial frequency band.

Fourier spectrum of the Laplacian Pyramid
The levels contain the high-pass components of an image. At each level, you capture the difference between the original and the smoothed, downsampled image. In the Fourier spectrum, for each level you would observe a higher emphasis on higher-frequency components.

Scale invariance
- Natural objects exhibit self-similar patterns across different scales (e.g., fractals)
- Challenge for CNNs: recognizing objects across varying scales
- Solution: process subsampled versions of an image (e.g., a Gaussian pyramid) in parallel (a multi-scale pyramid)
- This enables CNNs to analyze patterns at multiple scales simultaneously, improving recognition accuracy
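A sketch of the two pyramid constructions above, assuming scipy; blur-then-subsample gives the Gaussian pyramid, and upscale-then-subtract gives the band-pass Laplacian levels. The level count and sigma are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_pyramids(img, levels=4, sigma=1.0):
    gaussian = [img]
    for _ in range(levels - 1):
        blurred = gaussian_filter(gaussian[-1], sigma)  # suppress high frequencies
        gaussian.append(blurred[::2, ::2])              # downsample by 2
    laplacian = []
    for fine, coarse in zip(gaussian[:-1], gaussian[1:]):
        up = zoom(coarse, 2.0, order=1)                 # upscale to the finer size
        up = up[:fine.shape[0], :fine.shape[1]]         # guard against odd sizes
        laplacian.append(fine - up)                     # band-pass detail layer
    laplacian.append(gaussian[-1])                      # coarsest level as residual
    return gaussian, laplacian

g_pyr, l_pyr = build_pyramids(np.random.rand(128, 128))
```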
Lecture 10: SIFT and features

Object classification
To classify objects like dogs and cats, we rely on image features that are discriminative and invariant to transformations like pose, scale, and rotation. Here are possible strategies:
Feature examples for classification
- Color features (HSV, LAB, RGB)
  - Advantage: invariant to translation, rotation, pose, and potentially luminance (HSV, LAB)
  - Limitation: low predictiveness due to high variability in color across objects
- Shape features
  - Advantage: capture the contour and structure of objects
  - Limitation: shape depends heavily on rotation, angle, and pose
General principles for features
Good features should
- Be discriminative (distinct across categories but consistent within a category)
- Be invariant (robust to transformations like translation, scale, and rotation)
- Be robust to noise
- Be computationally efficient

Features in computer vision
Definition: components of an image that convey essential information for tasks like classification, recognition, or detection.
Types of features
- Color histograms
  - Describe the distribution of colors in the image
  - Advantage: robust to translation, rotation, and some lighting changes
  - Limitation: no shape information
- Corners
  - Points where edges meet; highly localized and unique
  - Example: the Harris corner detection algorithm detects corners by analyzing the eigenvalues of the image gradient matrix
- Scale-invariant features (like SIFT)
  - Designed to handle translations, rotations, and scale changes
  - Detect key points and compute descriptors robust to various transformations
Manual vs. random feature extraction
- Manually designed filters or descriptors (e.g., convolutional filters with pre-determined weights)
- Random filters, such as random lines for OCR, rely on counting intersections of features but require large filter sets for accuracy
  - A non-traditional approach to feature extraction that initializes random values in the convolution filter instead of learning or designing it
  - Generates diversity (the random filters are diverse)
  - The process needs to be repeated multiple times, so it is not computationally efficient
  - Lacks optimization

Baseline feature extraction pipeline
- Identify the regions of interest containing useful information (points of interest)
  - Locations in the image that are distinctive and carry important information
  - Help us reduce computational complexity
- Compute feature descriptors, which encode these regions into vectors
Feature descriptors
- Vector representations
- Should be distinctive and robust to noise and transformations
- Examples: color histograms, SIFT descriptors
Algorithms for points of interest
- Harris corner detector -> identifies corners by analyzing pixel intensity variations
- DoG
Algorithms for feature descriptors
- SIFT descriptor -> represents key points using histograms of oriented gradients
- SURF -> similar but faster
This approach is a passive variant, which enables flexibility.

Harris Corner Detection Algorithm (see the sketch after this section)
Identifies corners by analyzing image patches that are distinct from their surroundings.
- Measures how much a patch changes using the sum of squared differences
- Corners are points where patches change significantly in all directions
Steps
1. Sobel filters are used to compute the gradients
2. Create the gradient products
3. Apply Gaussian smoothing
4. Compute the Harris matrix: combine the smoothed values into a matrix M that summarizes the changes around each pixel
5. Look at the eigenvalues, because they determine the features
   - If both are small, you have detected a flat region
   - When one is large, you have detected an edge
   - When both eigenvalues are large, a corner is detected
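A sketch of the Harris steps above, assuming scipy; the response R = det(M) - k·trace(M)² is the standard stand-in for examining the eigenvalues directly (large positive at corners, negative at edges, near zero on flat regions), and k = 0.05 is a conventional choice, not a lecture-specified value.

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def harris_response(img, sigma=1.0, k=0.05):
    Ix = sobel(img, axis=1)                  # horizontal gradient (Sobel filter)
    Iy = sobel(img, axis=0)                  # vertical gradient
    Sxx = gaussian_filter(Ix * Ix, sigma)    # Gaussian-smoothed gradient
    Syy = gaussian_filter(Iy * Iy, sigma)    # products: the entries of the
    Sxy = gaussian_filter(Ix * Iy, sigma)    # Harris matrix M at each pixel
    det_M = Sxx * Syy - Sxy**2               # product of the eigenvalues
    trace_M = Sxx + Syy                      # sum of the eigenvalues
    return det_M - k * trace_M**2            # large positive -> corner

img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0                      # a bright square has four corners
R = harris_response(img)
corners = np.argwhere(R > 0.1 * R.max())     # threshold the response map
```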
SIFT (Scale-Invariant Feature Transform)
A robust technique for feature detection and description, invariant to translation, rotation, and scale.
Key processes
1. Detection of key points
   - Analyze scale-space extrema using a Difference of Gaussian (DoG) approach
   - Identify points that are local extrema in both the spatial and scale dimensions
   - Remove weak key points (e.g., those with low contrast or edge-like responses)
2. Descriptor computation
   - Around each key point, extract a 16x16 pixel window
   - Compute the gradient orientation for each pixel, weighted by a Gaussian function
   - Build a histogram of gradient directions to form a feature descriptor
Properties
- Invariant to scale, rotation, and minor viewpoint changes
- Highly distinctive, ensuring accurate matching across images
Applications
- Image alignment, panorama stitching, object detection, and scene recognition
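A usage sketch with OpenCV's SIFT implementation (available as cv2.SIFT_create in OpenCV 4.4+); the random arrays stand in for real grayscale images, and Lowe's ratio test for filtering matches is standard practice rather than lecture material.

```python
import cv2
import numpy as np

img1 = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # stand-ins for
img2 = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # real photographs

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # key points + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to keep distinctive matches;
# guard against images where no key points (and hence no descriptors) are found.
matcher = cv2.BFMatcher()
pairs = matcher.knnMatch(des1, des2, k=2) if des1 is not None and des2 is not None else []
good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
```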
