Fundamentals of Data Science Chapter 2 PDF

Summary

This chapter covers edge detection in image processing, focusing on identifying boundaries and transitions. It explains how line drawings and image derivatives are used to simplify images and highlight edges, and describes techniques such as Gaussian smoothing, derivative filters, and hysteresis. It then turns to color-based object recognition with histograms, histogram comparison measures (intersection, Euclidean, and chi-square distances), and the evaluation of recognition performance with accuracy, precision, recall, and related metrics.

Full Transcript


Chapter 2: Edge Detection

Edge detection is a fundamental concept in image processing, focusing on identifying the boundaries or transitions between different regions in an image. One common approach is recognition using line drawings, where the goal is to extract the structural outlines or edges in an image, simplifying it into a representation of shapes and objects. This process often relies on image derivatives, which measure changes in pixel intensity. The first-order derivative detects rapid intensity changes, identifying edges as areas where the gradient (rate of change) is highest. The second-order derivative goes further, detecting variations in the gradient itself, which can highlight finer edge details or transitions. These techniques provide the mathematical foundation for identifying and analyzing edges, helping in tasks like object recognition, segmentation, and scene understanding.

Recognition Using Line Drawings

Recognition using line drawings is an image processing technique that focuses on extracting and analyzing the structural outlines of objects in an image. The primary purpose of line drawings in image processing is to simplify visual information by reducing it to essential lines and edges. This simplification helps algorithms focus on an object's shape and structure without being influenced by less relevant details like textures, colors, or noise. By abstracting the image into a basic representation, line drawings make it easier to analyze the underlying geometry and structural features. Additionally, line drawings are invaluable for feature extraction, as they highlight critical elements such as edges, corners, and outlines. These features are essential for recognition algorithms, which often rely on geometric and structural information to classify objects, detect patterns, or understand scenes. This makes line drawings an effective tool in applications such as object recognition, robotics, and computer vision.

Typical use cases of line drawings include sketch-based recognition, a technique where users provide sketches or line drawings as input for a system to recognize and match them to real-world objects. This approach leverages the simplified representation of objects through their contours and edges, allowing the system to focus on shape and structure rather than textures or colors. By comparing the geometric and structural features of the sketch to a database of objects, the system can identify similar items or categories. This method is widely used in applications such as design tools, where users create sketches to search for matching templates, and in augmented reality, where hand-drawn inputs are used to interact with virtual objects. Sketch-based recognition highlights the power of line drawings in bridging human creativity with computational understanding.

Edge detection involves a series of steps to identify significant transitions in intensity within an image:

1. Smoothing with a Gaussian filter: this reduces noise and high-frequency variations that could obscure meaningful edges. By smoothing the image, only the significant transitions in intensity are retained, making subsequent steps more accurate.
2. Derivatives of the smoothed image: these are calculated to measure the rate of intensity change. The derivatives (gradients) highlight regions where intensity varies significantly, which typically correspond to edges.
3. Maxima of the derivative: these are identified as the locations of edges.
These maxima represent points of the highest intensity change, marking the boundaries between different regions in the image. This step-by-step approach ensures that edges are detected robustly while minimizing false positives caused by noise or minor intensity variations.

Goals of Edge Detection

The primary goals of edge detection are to accurately identify significant intensity changes in an image while minimizing errors caused by noise or artifacts.

- Good detection: ensuring that the filter or algorithm can differentiate between actual edges and random noise. This requires high sensitivity to edges and low sensitivity to noise, allowing the detection of true boundaries without being misled by irrelevant variations.
- Good location: the second goal, ensuring that the detected edge corresponds precisely to the location where the intensity change occurs, without any shifting or offset.
- Single response: each edge should be detected only once. Multiple responses to the same edge can lead to redundancy, confusion, and errors in subsequent processing.

Together, these goals ensure accurate, precise, and reliable edge detection.

Edge Detection Issues

Despite these goals, several issues can arise in edge detection:

- Poor localization: a common problem where detected edges are shifted from their true location, leading to inaccuracies in identifying boundaries. This can occur due to improper filtering or interference from noise.
- Too many responses: multiple edges are identified for a single true edge. This redundancy can result in false detections and clutter, complicating further image analysis.

These challenges highlight the importance of designing robust edge detection algorithms that balance sensitivity, precision, and noise handling to achieve accurate results.

2.1 1D Edge Detection Steps

1D edge detection involves analyzing intensity changes along a single dimension of an image, typically a row or column, to identify boundaries or transitions. The process detects sharp changes in pixel intensity, which correspond to edges in the image. This approach is useful for simplifying edge detection to a single dimension, where intensity profiles are analyzed using techniques like derivatives or gradient-based methods to locate peaks and troughs representing edges.

Example: a specific section of the "Barbara" image is selected, corresponding to line 250, which represents a single row of pixels. The intensity values of the pixels along this row are plotted as a profile graph.

Figure 2.1: Single Dimension Line Function

In this plot:
- the x-axis corresponds to the pixel positions (image length);
- the y-axis represents the intensity values.

The profile shows variations in intensity, with sharp changes indicating potential edges. This visualization helps highlight the changes in intensity along a single dimension, making it easier to identify edges within the selected row.

The intensity profile extracted from the image section appears noisy, particularly in regions with rapid and frequent changes in intensity. These variations are caused by high-frequency components in the image, such as textures, fine patterns, or sharp transitions between pixel intensities. High-frequency fluctuations represent intricate details, like the fabric's texture in the "Barbara" image, which are visually important but can complicate edge detection. Such fluctuations mask the larger, more meaningful intensity transitions that define edges, making it harder to identify them accurately. Without preprocessing, this noise can lead to false edge detections or missed boundaries. Techniques like smoothing filters (e.g., Gaussian filters) are often applied to reduce the influence of high-frequency components, suppressing noise while retaining significant edges for better analysis and detection.
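As a small illustration of the profile shown in Figure 2.1, the sketch below extracts a single row of a grayscale image and plots its intensity values. The file name "barbara.png", and the use of Pillow and Matplotlib, are assumptions made for this example rather than part of the original material.

```python
# Sketch: extract one row of a grayscale image and plot its intensity profile.
# Assumes a local grayscale copy of the test image at "barbara.png" with more
# than 250 rows; the path and row index are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

img = np.asarray(Image.open("barbara.png").convert("L"), dtype=float)

row = 250                      # the scan line discussed above
profile = img[row, :]          # intensity values along that row

plt.plot(profile)
plt.xlabel("pixel position along the row")
plt.ylabel("intensity")
plt.title(f"Intensity profile of row {row}")
plt.show()
```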
2.1.1 Preprocessing the Image with Gaussian Smoothing

Smoothing an intensity profile is a crucial preprocessing step for enhanced edge detection in 1D, as it reduces noise and emphasizes significant intensity transitions. Applying a Gaussian smoothing filter is a common method for this purpose. The Gaussian filter reduces high-frequency components and noise in the intensity profile, resulting in a smoother curve that is easier to analyze. This smoothing minimizes random fluctuations caused by noise or fine textures, which can otherwise obscure meaningful edges in the profile. By attenuating these minor variations, the process highlights more prominent changes in intensity, which correspond to edges or boundaries between different regions in the image. The smoothed profile thus provides a clearer indication of where significant transitions occur, improving the reliability of edge detection algorithms.

The purpose of smoothing is to enhance the ability to identify edges or transitions by filtering out unnecessary noise while preserving important intensity changes. This enables more accurate detection of boundaries, leading to better recognition and analysis of objects or patterns in the image. Overall, smoothing is an essential step that balances noise reduction with edge preservation, making edge detection more robust and effective.

Figure 2.2: Smoothed Function

Edge detection via derivatives of smoothed images is a fundamental method in image processing for identifying significant transitions in intensity within an image. An edge is mathematically defined as a point or region where there is a sharp change in intensity, marking the boundary between different regions. To detect these changes, derivatives are employed to measure the rate of change in intensity values.

2.1.2 First Derivative (Gradient)

The first derivative in image processing measures the rate of change of intensity across the image. It quantifies how quickly pixel values change, providing a mathematical representation of intensity variations. Large values of the first derivative correspond to rapid changes in intensity, making it a reliable indicator of edges, where significant transitions occur between regions. This property makes the first derivative particularly useful for detecting edges, as it highlights areas with sharp intensity transitions. By applying gradient operators like Sobel or Prewitt, the first derivative can be computed efficiently, aiding in edge detection tasks and emphasizing critical boundaries within an image.

First Derivative Formulation

The first derivative measures the rate of change of a function f(x). It is defined as the limit of the difference quotient as the step size h approaches zero:

\[
\frac{d}{dx} f(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\]

In image processing, this derivative is approximated using a difference operation, where h is set to 1 (the distance between adjacent pixels). The approximation becomes

\[
\frac{d}{dx} f(x) \approx f(x+1) - f(x),
\]

which calculates the difference in intensity between neighboring pixels. This discrete difference is widely used in digital images to compute derivatives efficiently, as the continuous form of the derivative cannot be applied directly to discrete pixel data. The difference operation allows for detecting intensity changes across an image, which is the basis of many edge detection techniques. By using simple subtraction, this approach highlights regions with significant changes in brightness, making it a computationally efficient method for approximating the derivative and analyzing intensity transitions in digital images.
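A minimal sketch of this smoothing-then-differentiation step on a synthetic noisy step profile is shown below; the signal, the noise level, and the Gaussian sigma are illustrative choices, not values taken from the chapter.

```python
# Sketch: smooth a noisy 1D step profile with a Gaussian, then take the discrete
# derivative f(x+1) - f(x); the edge shows up as a peak in the derivative.
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
profile = np.concatenate([np.full(50, 40.0), np.full(50, 180.0)])  # ideal step edge at x = 50
profile += rng.normal(0, 8, profile.size)                          # additive noise

smoothed = gaussian_filter1d(profile, sigma=3)   # Gaussian smoothing (sigma is a free choice)
derivative = np.diff(smoothed)                   # discrete first derivative

edge_position = np.argmax(np.abs(derivative))    # maximum of |derivative| marks the edge
print("estimated edge location:", edge_position)
```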
Linear Kernels

Linear kernels are commonly used to approximate the derivatives of an image by calculating intensity differences between pixels, enabling the detection of edges and transitions. A direct filter uses a simple kernel, [−1, 1], to compute the difference between two neighboring pixels. This filter directly subtracts the intensity of one pixel from its adjacent pixel, making it an efficient method for identifying rapid intensity changes and detecting edges. A symmetric filter employs the kernel [−1, 0, 1], which computes the central difference by comparing two symmetrical pixels relative to a center pixel. This approach provides a more balanced estimate of intensity changes by considering both sides of the central pixel, resulting in more accurate edge detection. Both filters serve as fundamental tools for deriving intensity variations in an image, with the symmetric filter offering greater precision in preserving directional information.

Worked Example: Linear Kernels on a 2D Image

Consider a 2D grayscale image represented as a matrix of intensity values:

\[
I = \begin{pmatrix}
10 & 15 & 20 & 25 \\
30 & 35 & 40 & 45 \\
50 & 55 & 60 & 65 \\
70 & 75 & 80 & 85
\end{pmatrix}
\]

We want to compute the horizontal derivative using the direct filter [−1, 1] and the symmetric filter [−1, 0, 1].

Using the direct filter [−1, 1]: the kernel K = [−1, 1] is applied row-wise, computing differences between adjacent pixels in each row. For the first row:

\[
(15 - 10),\ (20 - 15),\ (25 - 20) = [5,\ 5,\ 5]
\]

Applying this to the entire image:

\[
\text{Horizontal Derivative (Direct Filter)} = \begin{pmatrix}
5 & 5 & 5 \\
5 & 5 & 5 \\
5 & 5 & 5 \\
5 & 5 & 5
\end{pmatrix}
\]

This highlights the changes along the horizontal direction.

Using the symmetric filter [−1, 0, 1]: the symmetric kernel K = [−1, 0, 1] considers the difference between two neighbors, skipping the center pixel. For the first row, the central difference calculation is:

\[
(20 - 10),\ (25 - 15) = [10,\ 10]
\]

Applying this across the image (ignoring boundaries):

\[
\text{Horizontal Derivative (Symmetric Filter)} = \begin{pmatrix}
10 & 10 \\
10 & 10 \\
10 & 10 \\
10 & 10
\end{pmatrix}
\]

This provides a more balanced difference measure compared to the direct filter.

Extending to vertical derivatives: for vertical derivatives, the kernels are transposed,

\[
\text{Direct filter: } K = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad
\text{Symmetric filter: } K = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
\]

These are applied column-wise to compute intensity changes along the vertical direction, and the resulting matrices highlight intensity transitions between rows. This approach shows how linear kernels are applied to 2D images to detect edges and intensity changes in both horizontal and vertical directions.
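The worked example above can be reproduced with a few lines of NumPy. The sketch below uses array slicing rather than an explicit convolution call, which is equivalent for these two kernels.

```python
# Sketch: reproduce the worked example with plain NumPy slicing.
# Direct filter [-1, 1]      -> difference of adjacent pixels.
# Symmetric filter [-1, 0, 1] -> central difference, skipping the middle pixel.
import numpy as np

I = np.array([[10, 15, 20, 25],
              [30, 35, 40, 45],
              [50, 55, 60, 65],
              [70, 75, 80, 85]], dtype=float)

direct_h    = I[:, 1:] - I[:, :-1]    # horizontal, direct filter    -> 4x3 matrix of 5s
symmetric_h = I[:, 2:] - I[:, :-2]    # horizontal, symmetric filter -> 4x2 matrix of 10s

direct_v    = I[1:, :] - I[:-1, :]    # vertical, direct filter      -> rows differ by 20
symmetric_v = I[2:, :] - I[:-2, :]    # vertical, symmetric filter   -> rows differ by 40

print(direct_h)
print(symmetric_h)
```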
Types of Edges in 1D

Edges in an image are critical features that represent transitions or boundaries between different regions. Various types of edges exist, each characterized by the way intensity values change across them. Step edges show an abrupt change in intensity, indicating sharp boundaries. Ramp edges, on the other hand, have a gradual transition in intensity, often caused by lighting variations or smoother boundaries. Line (or bar) edges appear as narrow, intense changes in a single region, often representing thin structures or lines. Finally, roof edges exhibit a peak in intensity, with a rise and fall resembling a roof shape, commonly seen in curved or soft boundaries. Understanding these types of edges is essential for designing detection techniques tailored to their unique characteristics.

Step Edges

A step edge occurs when there is an abrupt change in intensity from one level to another, creating a distinct boundary between two regions. It is characterized by a sharp, vertical transition in the intensity profile, where pixel values suddenly shift from one level to another without gradual variation. This type of edge involves significantly different brightness levels, such as the boundary between a dark and a bright region. Step edges are prominent in images with clear, well-defined objects and are crucial for identifying boundaries and segmenting regions effectively in image analysis.

Ramp Edges

A ramp edge is characterized by a gradual transition between two intensity levels, where the change in brightness occurs smoothly over a region rather than abruptly. For example, a shadow transitioning gradually from dark to light illustrates a ramp edge, as the intensity values increase progressively rather than jumping sharply. Ramp edges are common in images with soft boundaries or gradual shading, and they appear as sloped transitions in the intensity profile. This type of edge is typically more challenging to detect compared to step edges, as the lack of abrupt changes requires more sensitive techniques to identify the transition accurately.

Line or Bar Edges

A line or bar edge consists of a narrow region of high intensity flanked by two lower-intensity regions, creating a distinct, thin feature within the image. For example, a thin bright line against a darker background is a typical representation of a line or bar edge. This type of edge highlights thin, well-defined features, such as wires, cracks, or small structural elements, and is essential for detecting fine details in images. The intensity profile of a line or bar edge shows a sharp peak, making it easily identifiable and useful for tasks requiring precise feature recognition.

Roof Edges

A roof edge is defined by a sharp peak in intensity with sloping sides, resembling a ridge or pinnacle in the intensity profile. It typically represents thin structures or sharp transitions in an image where the intensity gradually increases to a peak and then decreases, forming a triangular or roof-like shape. Roof edges are common in features such as ridges, creases, or curved surfaces and are characterized by their distinct yet smooth transitions, making them crucial for identifying detailed structures or subtle boundaries in images.

Figure 2.3: Types of Edges

2.1.3 Second Derivative (Laplacian)

The second derivative in image processing measures the rate at which the first derivative changes, providing insight into the curvature of the intensity profile. It determines whether the first derivative is increasing or decreasing, effectively capturing changes in the shape of the intensity curve.
This makes the second derivative particularly useful for identifying where the curve changes direction, such as at edges or inflection points. A key concept in second-derivative edge detection is the zero-crossing, which occurs when the second derivative transitions from positive to negative or vice versa. These zero-crossings often correspond to points where the first derivative reaches a peak, marking the location of edges. This property is leveraged in methods like the Laplacian of Gaussian (LoG) filter, where zero-crossings help identify fine edge details and enhance edge localization in an image.

Figure 2.4: First and Second Derivatives in 1D Edge Detection

2.1.4 Simplified Edge Detection

Normally the edge detection process follows the sequence of steps below:

Figure 2.5: Edge Detection Steps

The process of edge detection can be simplified by combining smoothing and derivative calculations into a single step. Instead of first convolving the image with a Gaussian function g for smoothing and then calculating the derivative of the smoothed image, the derivative of the Gaussian filter g can be computed beforehand. This results in a derivative filter that combines both operations. By directly convolving this derivative filter with the image, the smoothing and edge detection processes are performed simultaneously. This approach not only reduces computational complexity but also ensures that the edges detected are based on the smoothed intensity transitions, effectively minimizing noise while identifying significant changes. This simplification is widely used in edge detection algorithms.

\[
\frac{d}{dx}(g \otimes f) = \left(\frac{d}{dx} g\right) \otimes f
\]

where:
- g ⊗ f is the convolution of the Gaussian filter and the signal;
- d/dx g is the derivative of the Gaussian filter.

Figure 2.6: Simplified Edge Detection Steps

The operation simplifies by swapping the order of convolution and differentiation. This is possible because both convolution and differentiation are linear operations, so their order of application can be swapped without altering the final result. Instead of smoothing the image with a Gaussian filter and then calculating its derivative, the derivative of the Gaussian can be computed beforehand and directly applied to the image as a single convolution operation. This significantly reduces computational complexity by eliminating the need for intermediate calculations, such as separately convolving the image with the Gaussian filter and then taking the derivative. By combining these steps into one, the process becomes more efficient while still preserving accuracy in edge detection.

2.1.5 Hysteresis

Hysteresis is a technique used in edge detection to ensure that detected edges are continuous and not fragmented due to minor intensity fluctuations. It employs two thresholds to make the edge detection process more robust:

- High threshold: used to start detecting an edge, requiring the gradient (or derivative) to exceed this value to initialize an edge segment.
- Low threshold: applied to continue the edge, allowing connected points with gradient values below the high threshold but above the low threshold to be included as part of the edge.

This dual-threshold mechanism ensures that strong edges are reliably detected and weaker but connected edge segments are preserved; a small sketch of this dual-threshold tracking in one dimension follows.
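The sketch below applies the dual-threshold idea to a 1D gradient magnitude, using scipy.ndimage.label to find runs of connected above-low-threshold points. The threshold values and the gradient array are illustrative; this is only the hysteresis step, not a full Canny implementation.

```python
# Sketch: dual-threshold hysteresis on a 1D gradient magnitude.
# Weak responses (above t_low) are kept only if they are connected to a
# strong response (above t_high). Threshold values are illustrative.
import numpy as np
from scipy.ndimage import label

gradient = np.array([0.05, 0.2, 0.6, 0.9, 0.5, 0.15, 0.05, 0.3, 0.2, 0.0])
t_high, t_low = 0.7, 0.1

strong = gradient >= t_high
weak   = gradient >= t_low           # includes the strong responses

labels, n = label(weak)              # connected runs of above-low responses
edges = np.zeros_like(weak)
for k in range(1, n + 1):
    run = labels == k
    if strong[run].any():            # keep the whole run if it touches a strong point
        edges |= run

print(edges.astype(int))             # 1s mark the retained edge points
```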
The benefits of hysteresis include robust edge detection, which maintains edge continuity despite small fluctuations in intensity, and avoidance of fragmentation, ensuring edges are smooth and not broken into disconnected segments. This makes hysteresis a key component in advanced edge detection techniques, such as the Canny edge detector.

Figure 2.7: Hysteresis

2.2 Color Recognition

A useful feature for object recognition can be the color of the object.

2.2.1 Advantages of Color Recognition

- Color is consistent under geometric transformations: when an object undergoes translation, rotation, or scaling, its color remains unchanged.
- Color is a local feature: color is defined at each pixel, making it a highly localized feature. This makes it robust to partial occlusion; even if part of the object is hidden, the visible part's color still aids in recognition.

The direct color usage approach uses the exact color of objects for identification or recognition. Instead of relying on a single dominant color of the object, we can use statistics of the object's colors, computing histograms that capture the distribution of colors within the object. This adds robustness to noise and other variations in appearance.

2.2.2 Color Histograms

Color histograms are a representation of the distribution of colors in an image. For each pixel, the values for Red, Green, and Blue (RGB) are given, and histograms are computed for these color channels. For each color channel, a histogram counts how many pixels in the image have a particular intensity of that color. Luminance histograms represent the brightness of the image: they measure how many pixels have specific levels of brightness, independent of color. These histograms are used as features to describe the color distribution of an object in an image, which can be compared against histograms of other images to identify similarities or match objects.

2.2.3 Joint 3D Color Histograms

Instead of separate 1D histograms for Red, Green, and Blue, a 3D histogram considers the RGB values together as a vector. This allows for a more precise representation of the color combinations present in the image. Each entry in this 3D space represents a combination of Red, Green, and Blue, and the count in each bin represents how many pixels have that specific color combination. This representation makes it easier to compute the similarity of two images: comparing two 3D histograms can show whether two objects have a similar color composition, even if they are rotated or partially occluded. Like the 1D case, this is a robust representation because it works even if the objects in the image are rotated, partially occluded, or viewed under different lighting conditions.

Figure 2.8: Histogram of color distribution

2.2.4 Color Normalization by Intensity

When dealing with color images, each pixel's color is typically represented by its Red, Green, and Blue (RGB) components. However, the intensity of a color can vary due to changes in lighting or shading: even if the colors are the same, varying intensity can make them appear different. We can handle this with normalization.

Intensity of a pixel: the total brightness of each pixel is defined as I = R + G + B. The chromatic representation is obtained by normalizing the color of each pixel, dividing each color component (R, G, B) by the intensity I. This transformation removes the effect of varying brightness or illumination, making the color representation consistent across different lighting conditions.
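A minimal sketch of this normalization, assuming two hand-picked RGB pixels that differ only in brightness:

```python
# Sketch: intensity normalization of RGB pixels. Dividing each channel by
# I = R + G + B keeps only the chromatic part of the color, so a pixel and a
# darker version of the same pixel map to the same normalized values.
import numpy as np

pixels = np.array([[200, 100,  50],    # a color...
                   [100,  50,  25]],   # ...and the same color at half intensity
                  dtype=float)

intensity = pixels.sum(axis=1, keepdims=True)   # I = R + G + B per pixel
chromatic = pixels / intensity                  # normalized components, summing to 1

print(chromatic)      # both rows give the same normalized color
```

Because the normalized components sum to 1, only two of them are needed to describe the color, which leads to the 2D chromatic space discussed next.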
Using normalization: once the components are normalized so that R + G + B = 1, knowing two of them (for example R and G) determines the third:

B = 1 − R − G

so the color can be fully described using just two values. The cube represents the range of possible values for RGB, with axes corresponding to each color channel. The constraint R + G + B = 1 implies that the normalized colors lie on a plane within this cube, forming a 2D space for normalized color.

Figure 2.9: Histogram of 3D color distribution

In image recognition, it is important to have a color representation that is invariant to lighting changes. The chromatic representation allows systems to focus on the actual color of the object rather than being influenced by lighting conditions.

2.2.5 Recognition Using Histograms

This is a method for identifying objects based on their color distributions.

1. Histogram comparison step: a histogram representing the color distribution of a "test image" is compared to histograms from a database of "known objects". The object whose histogram most closely resembles that of the test image is identified as the best match.
2. Multiple views per object: since an object can appear in different orientations and lighting conditions, the database stores multiple views of each object, each with its own histogram. This increases the accuracy of object recognition, as the system can compare the test image's histogram with histograms from different angles or views of the same object.
3. Histogram-based retrieval: a "query" object is given (e.g., a yellow cat figurine), and its color histogram is used to retrieve similar objects from the database. The system retrieves objects whose color histograms closely match that of the query, displaying items such as yellow cars or other yellow objects (e.g., a nuclear waste barrel). This process highlights the use of histograms for identifying objects based on color similarities, even when the objects belong to different categories but share similar color profiles.

Figure 2.10: Normalized Colors

2.3 Histogram Comparison Technique

The histogram comparison technique is a method used to measure the similarity or dissimilarity between two histograms. A histogram represents the distribution of certain features (such as pixel intensity, color, or texture) within an image or dataset. In image processing and computer vision, histograms are often used to represent the distribution of colors, brightness, or other properties of an image.

2.3.1 Histogram Comparison: Intersection Method

Histogram comparison is a fundamental method in image analysis and computer vision for determining how similar two histograms are. One commonly used metric is histogram intersection, which measures the common parts between two histograms. This method is particularly useful in tasks such as object recognition, color analysis, and retrieval of images from databases based on their content.

Histogram Intersection Formula

The histogram intersection method calculates the similarity between two histograms Q and V by taking the sum of the minimum values for each bin:

\[
\cap(Q, V) = \sum_i \min(q_i, v_i)
\]

where:
- Q = [q_1, q_2, ..., q_n] is the histogram of the first image;
- V = [v_1, v_2, ..., v_n] is the histogram of the second image;
- q_i and v_i are the values of the i-th bin in histograms Q and V, respectively;
- min(q_i, v_i) returns the minimum value between the corresponding bins;
- the sum runs over all n bins.
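As a sketch of how a joint 3D histogram and the intersection measure might be computed in practice, the snippet below builds normalized 8×8×8 RGB histograms for two stand-in images (random arrays here) and takes the sum of bin-wise minima. The bin count and image sizes are arbitrary choices.

```python
# Sketch: joint 3D RGB histograms of two images and their intersection.
# The images here are random stand-ins; in practice they would be the query
# image and a stored database view.
import numpy as np

rng = np.random.default_rng(1)
img_q = rng.integers(0, 256, size=(64, 64, 3))
img_v = rng.integers(0, 256, size=(64, 64, 3))

bins = (8, 8, 8)
range_rgb = [(0, 256)] * 3

def rgb_hist(img):
    # histogramdd expects one row per sample: reshape to (num_pixels, 3)
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=bins, range=range_rgb)
    return h / h.sum()                     # normalize so the bins sum to 1

Q, V = rgb_hist(img_q), rgb_hist(img_v)
intersection = np.minimum(Q, V).sum()      # in [0, 1]; 1 means identical histograms
print("histogram intersection:", intersection)
```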
Figure 2.11: RGB Cube

Motivation

Histogram intersection has the following motivations and properties:

- Measures the common parts: this method directly measures the overlap between the histograms by focusing on their common parts. The more similar two histograms are, the greater their intersection.
- Range: the result of the histogram intersection lies in the range [0, 1]. A value of 1 means the histograms are perfectly similar; a value of 0 indicates no similarity between the histograms.
- Unnormalized histograms: for unnormalized histograms, the intersection is scaled by the sums of the histogram values. The following formula is used to normalize the intersection:

\[
\cap(Q, V) = \frac{1}{2} \left( \frac{\sum_i \min(q_i, v_i)}{\sum_i q_i} + \frac{\sum_i \min(q_i, v_i)}{\sum_i v_i} \right)
\]

This normalization ensures that the comparison is not biased by differences in the overall number of elements in the histograms.

Advantages and Disadvantages

Advantages:
- Easy to implement and computationally efficient.
- Measures commonality between histograms, which is useful in applications where overlapping features are important.

Disadvantages:
- Less sensitive to subtle differences between histograms.
- It only focuses on common parts and ignores differences, which might lead to poor results in discriminative tasks.

Figure 2.12: Recognition using histograms

Applications

The histogram intersection method is widely used in tasks such as:
- Object recognition: comparing the color histograms of objects for recognition purposes.
- Image retrieval: finding images in a database based on similar content.
- Color-based matching: comparing the color distributions in different images, useful in pattern recognition.

Figure 2.13: Histogram Intersection

The histogram intersection method is a simple yet effective technique for comparing histograms. Its ability to focus on the commonality between two histograms makes it suitable for tasks such as object recognition and image retrieval, where the overlap between histograms represents similarity. However, its inability to highlight differences between histograms limits its use in more discriminative tasks.

2.3.2 Histogram Comparison: Euclidean Distance

The Euclidean distance is a common metric used to measure the difference between two histograms. It calculates the straight-line distance between two points in a multidimensional space, where each bin of the histogram represents a dimension. In the context of histograms, the Euclidean distance is computed from the sum of squared differences between corresponding bins of two histograms.

Euclidean Distance Formula

The Euclidean distance between two histograms Q and V is given by:

\[
d(Q, V) = \sqrt{\sum_i (q_i - v_i)^2}
\]

where Q and V are the two histograms being compared, and q_i and v_i are the values in the i-th bin of Q and V, respectively.
In the slides, the square root is omitted and the formula is written as:

\[
d(Q, V) = \sum_i (q_i - v_i)^2
\]

Motivation

The Euclidean distance focuses on the absolute differences between corresponding bins in two histograms. This metric has the following characteristics:

- Focus on differences: it highlights how different the two histograms are by directly measuring the squared difference between corresponding bins.
- Range: the Euclidean distance has a range of [0, ∞). A distance of 0 means that the two histograms are identical; as the difference between the histograms increases, the Euclidean distance grows larger.
- Equal weighting: each bin contributes equally to the total distance, meaning that this metric does not prioritize certain parts of the histogram over others. This can be a disadvantage when certain regions of the histogram are more important than others.
- Not very discriminant: while the Euclidean distance is simple to compute, it is not very discriminant for histogram comparison. Small differences in some bins may get overshadowed by large differences in others, making it less effective at distinguishing between similar but slightly different histograms.

Advantages and Disadvantages

Advantages:
- Simple and computationally efficient to calculate.
- Effective when the histograms are well aligned and the differences are significant.

Disadvantages:
- Sensitive to noise and small variations in the histogram, which may result in large differences even for slightly different histograms.
- Treats all differences equally, which may not always be appropriate when some features of the histogram are more important than others.
- Not invariant to transformations such as scaling or translation of the histogram values.

Applications

The Euclidean distance is commonly used in applications where a straightforward, direct comparison of differences between two distributions is sufficient. These include:
- Image retrieval: comparing feature histograms of images to find similar images in a database.
- Object detection: detecting objects based on their color or texture histograms in different frames or images.

Figure 2.14: Histogram Euclidean Distance

The Euclidean distance provides a simple and effective method for comparing histograms, particularly when we are interested in the overall difference between two distributions. However, it has some limitations, especially in terms of discriminative power, making it less ideal for situations where small differences matter or when robustness to noise is required.

2.3.3 Histogram Comparison: Chi-square Distance

The chi-square (χ²) distance is a commonly used metric to measure the similarity between two histograms. It has its roots in statistics, where it is used to compare observed distributions to expected distributions. In histogram comparison, it is used to calculate how different two histograms are by considering the relative differences between their bins, weighted by the sum of their bin values.

Chi-square Distance Formula

The chi-square distance between two histograms Q and V is given by:

\[
\chi^2(Q, V) = \sum_i \frac{(q_i - v_i)^2}{q_i + v_i}
\]

where Q and V are the two histograms being compared, q_i and v_i are the values in the i-th bin, and the denominator (q_i + v_i) normalizes the difference between q_i and v_i, making the distance more robust to larger bin values.
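A short sketch comparing the two measures on a pair of small normalized histograms; the bin values are made up, and a small epsilon is added to the chi-square denominator to guard against empty bins (a common practical safeguard, not something prescribed by the text):

```python
# Sketch: Euclidean and chi-square distances between two normalized histograms.
import numpy as np

Q = np.array([0.10, 0.40, 0.30, 0.20])
V = np.array([0.15, 0.35, 0.25, 0.25])

euclidean = np.sum((Q - V) ** 2)                         # squared form, as in the text
chi_square = np.sum((Q - V) ** 2 / (Q + V + 1e-12))      # per-bin differences, normalized

print(f"Euclidean (squared): {euclidean:.4f}")
print(f"Chi-square:          {chi_square:.4f}")
```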
Motivation

The chi-square distance has several important properties that make it useful for histogram comparison:

- Statistical background: the chi-square distance is based on the chi-square test from statistics, which is used to determine whether two distributions differ. This background provides a rigorous way to compare histograms, with the possibility of computing a significance score. It tests whether the distributions of two histograms are significantly different from each other, and it accounts for the fact that some bins may have higher values than others by normalizing by the sum of the bin values.
- Range: the chi-square distance is non-negative, with range [0, ∞). A value of χ²(Q, V) = 0 indicates that the two histograms are identical; as the difference between the histograms increases, the chi-square distance increases without an upper bound.
- Non-equal weighting of cells: unlike the Euclidean distance, the chi-square distance does not weight all bins equally. Bins with higher values contribute less to the overall distance than bins with smaller values, making the chi-square distance more sensitive to differences in smaller bins. This property makes it more discriminative in some applications; cells with higher values are treated as less important compared to cells with lower values.
- Sensitivity to outliers: the chi-square distance can be sensitive to outliers, especially if the bin counts in some regions are very low. To address this, it is often assumed that each bin contains at least a minimum number of samples, to avoid large contributions from small differences.

Advantages and Disadvantages

Advantages:
- More discriminative than simpler metrics such as the Euclidean distance, because it emphasizes relative differences between bins.
- Can be used in applications where the statistical significance of the difference between histograms is important.

Disadvantages:
- Sensitive to bins with small values, potentially leading to overemphasis on small differences in sparsely populated bins.
- May have problems with outliers or sparse histograms, especially if some bins contain very low values.

Applications

The chi-square distance is commonly used in applications where it is important to compare the relative proportions of values in different categories. Typical applications include:
- Image retrieval: comparing color histograms of images in large image databases.
- Texture analysis: comparing texture histograms in image processing.
- Statistical analysis: comparing histograms in situations where the statistical distribution of values is of interest.

The chi-square distance is a useful and statistically motivated method for comparing histograms, particularly when the relative difference between bin values is important. It provides a more discriminative measure than simpler metrics, but care must be taken to manage sensitivity to outliers and sparsely populated bins.

Figure 2.15: Histogram Chi-Square

2.3.4 Which Measure is Best?

The choice of the best histogram comparison measure depends on the application. Below are some of the most commonly used measures.

Intersection
- Robustness: intersection is generally more robust because it only considers the overlapping parts of the histograms.
- Best use: it works well when the goal is to compare similar color distributions, but may be less effective when there are large differences between histograms.
Chi-square (χ²)
- Discriminative power: chi-square is more discriminative than intersection, giving more weight to differences relative to the bin values.
- Best use: it is ideal for distinguishing between histograms representing complex data or when the histograms differ only slightly.

Euclidean Distance
- Not robust: the Euclidean distance is less robust because it gives equal weight to all parts of the histogram, making it sensitive to outliers.
- Best use: works well when histograms are smooth and similar, without many outliers.

Other Measures

Many other measures exist, depending on the context:
- Kolmogorov-Smirnov test: a statistical test used to compare two distributions and determine whether they differ significantly.
- Kullback-Leibler divergence: an information-theoretic measure used to quantify the difference between two probability distributions.
- Jeffrey divergence: a symmetrized version of the Kullback-Leibler divergence that handles situations where distributions differ in both directions.

2.3.5 Recognition Using Histograms: Nearest-Neighbor Strategy

The recognition process using histograms is based on a simple nearest-neighbor strategy. The algorithm proceeds as follows (a short sketch is given after the list):

1. Build a set of histograms H = {M1, M2, M3, ...} for each known object. More precisely, build histograms for each view of each object, to account for changes in perspective.
2. Build a histogram T for the test image.
3. Compare T to each Mk in H using a suitable comparison measure.
4. Select the object with the best matching score, or reject the test image if no object is similar enough (i.e., the distance is above a threshold t).

This approach is commonly referred to as the "nearest-neighbor" strategy, where the histogram of the test image is matched to the closest histogram in the database.
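A compact sketch of this strategy, using the chi-square distance from the previous section; the object names, histograms, and threshold value are all invented for illustration:

```python
# Sketch of the nearest-neighbour strategy: compare the test histogram T against
# every model histogram in H and either return the best match or reject it when
# the distance exceeds a threshold t.
import numpy as np

def chi_square(q, v):
    return np.sum((q - v) ** 2 / (q + v + 1e-12))

H = {                                   # one histogram per stored object view
    "cup_view_1":  np.array([0.7, 0.2, 0.1]),
    "book_view_1": np.array([0.1, 0.3, 0.6]),
}
T = np.array([0.65, 0.25, 0.10])        # histogram of the test image
t = 0.5                                 # rejection threshold (illustrative)

best_name, best_dist = min(((name, chi_square(T, M)) for name, M in H.items()),
                           key=lambda item: item[1])
result = best_name if best_dist <= t else "rejected: no object is similar enough"
print(result, f"(distance {best_dist:.3f})")
```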
2.4 Performance Evaluation

When comparing methods for the same task, there are two main approaches to evaluate performance:

1. Compare a single number: metrics such as accuracy (recognition rate) or top-k accuracy are commonly used. Accuracy is defined as the percentage of correct predictions over the total number of test cases. Example: the accuracy bar chart compares the recognition accuracy across different experimental setups; the method "PIPER" achieves the highest accuracy across all setups compared to "naeil" and the "bbq baseline".
2. Compare curves: performance curves such as precision-recall curves and ROC curves provide a more detailed evaluation of a model's performance. The precision-recall curve is useful in cases of class imbalance, showing the trade-off between precision and recall. The ROC curve plots the true positive rate (recall) against the false positive rate, with the Area Under the Curve (AUC) summarizing performance. Example: the precision-recall curve illustrates the performance of the DetectorNet method in two stages for a bird classification task, and the ROC plot shows ROC curves for multiple face verification models, where higher curves represent better performance.

Figure 2.16: Accuracy. Figure 2.17: Precision-Recall. Figure 2.18: ROC.

2.4.1 Score-Based Evaluation

In object recognition tasks, the recognition algorithm evaluates the similarity between the query object and the training image using a similarity score. The recognition decision is made based on a threshold t, which determines whether the query object is classified as matching the training image.

- Similarity score: the similarity score s is a normalized value between 0 and 1 that quantifies how closely the query object resembles the training image. s = 1 indicates a perfect match between the query and training images; s = 0 indicates no match.
- Threshold t: the threshold t is the cutoff value used to classify the query object as matching the training image. If s ≥ t, the query object is classified as a match (positive example); if s < t, it is classified as a non-match (negative example).
- Positive and negative examples: a positive example is represented by a filled circle, where the similarity score s is greater than or equal to the threshold t; a negative example is represented by an empty circle, where the similarity score s is less than the threshold t.

2.4.2 Score-Based Evaluation Example

In the diagram, positive examples (green circles) are cases where the similarity score is above the threshold t, and negative examples (empty circles) are cases where the score is below the threshold.

Figure 2.19: Score-based evaluation showing positive and negative examples.

Threshold, Classifier, and Point Metrics

The recognition algorithm identifies (classifies) the query object as matching the training image if their similarity is above a threshold t. This decision process is central to building a classifier: the decision boundary is set by the threshold t, and based on this, objects are classified as either positive or negative. The performance of this classifier can be evaluated using metrics such as the confusion matrix, precision, and recall.

Confusion Matrix

A confusion matrix is used to evaluate the classification performance of a model. It is a table that describes the performance of a classification model on a set of test data for which the true values are known. The matrix contains the following values:

- True Positives (TP): number of positive samples correctly predicted as positive.
- False Positives (FP): number of negative samples incorrectly predicted as positive.
- False Negatives (FN): number of positive samples incorrectly predicted as negative.
- True Negatives (TN): number of negative samples correctly predicted as negative.

The goal is to have high values on the diagonal (TP, TN) and low values off the diagonal (FP, FN):

\[
\begin{pmatrix}
TP & FP \\
FN & TN
\end{pmatrix}
\]

Figure 2.20: Score-based evaluation showing positive and negative examples.

2.4.3 Overall Accuracy in Classification

In the context of machine learning classification tasks, accuracy is one of the most commonly used evaluation metrics. It represents the proportion of correctly predicted instances (both positive and negative) out of the total number of predictions made by the model. More formally, overall accuracy is the ratio of the sum of true positives (TP) and true negatives (TN) to the total number of instances, which includes false positives (FP) and false negatives (FN) as well:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]

where:
- TP (True Positives): correctly predicted positive instances;
- TN (True Negatives): correctly predicted negative instances;
- FP (False Positives): incorrectly predicted positive instances;
- FN (False Negatives): incorrectly predicted negative instances.

The accuracy measures the diagonal elements (TP and TN), which are the correct classifications, as a fraction of the total number of instances. Accuracy gives us a simple and intuitive way of understanding the overall performance of a model: it measures how often the classifier is correct across all categories. In other words, if you predict on 100 test cases and 90 of them are correct, then your accuracy is 90%.
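A small sketch computing the confusion-matrix counts and the accuracy from thresholded similarity scores; the scores, labels, and threshold are made up for illustration:

```python
# Sketch: confusion-matrix counts and overall accuracy from thresholded scores.
import numpy as np

scores = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.4])   # similarity scores s in [0, 1]
labels = np.array([1,   1,   1,   0,   0,   0])     # 1 = true match, 0 = non-match
t = 0.5                                              # decision threshold
pred = (scores >= t).astype(int)

TP = np.sum((pred == 1) & (labels == 1))
FP = np.sum((pred == 1) & (labels == 0))
FN = np.sum((pred == 0) & (labels == 1))
TN = np.sum((pred == 0) & (labels == 0))

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(f"TP={TP} FP={FP} FN={FN} TN={TN}  accuracy={accuracy:.2f}")
```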
Limitations of Accuracy

While accuracy might seem like a comprehensive metric at first glance, it can be misleading in certain scenarios, especially when dealing with imbalanced datasets (i.e., where one class significantly outnumbers the other). In imbalanced datasets, a model that always predicts the majority class can have high accuracy even though it fails to correctly predict the minority class. If a dataset has a 95% negative class (non-spam) and only a 5% positive class (spam), a classifier that predicts all instances as negative will still achieve high accuracy, even though it never correctly identifies any positive instances.

Figure 2.21: Score-based evaluation showing positive and negative examples.

For example, if the model simply predicts "non-spam" for every email, the accuracy would still be 95% even though it is completely ineffective at catching spam:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{0 + 950}{0 + 950 + 50 + 0} = 0.95 = 95\%
\]

Accuracy is a good metric when:
- classes are balanced in the dataset;
- the cost of different types of errors (false positives vs. false negatives) is similar.

In cases of class imbalance, or when different error types have different costs, it is better to consider additional metrics (such as precision, recall, F1-score, or ROC-AUC) to make a more informed evaluation of the model's performance.

Alternatives to Accuracy

Given the limitations of accuracy in certain scenarios, it is often helpful to use other evaluation metrics that provide a more detailed understanding of the model's performance, such as:
- Precision: focuses on the accuracy of positive predictions (how many of the predicted positives were actually positive).
- Recall (Sensitivity / True Positive Rate): measures how many actual positives were correctly identified by the model.
- F1 Score: the harmonic mean of precision and recall, balancing the trade-off between the two.
- AUC-ROC (Area Under the Receiver Operating Characteristic Curve): measures the model's ability to discriminate between classes across various thresholds, providing a more comprehensive evaluation than accuracy alone.

2.4.4 Overall Precision in Classification

Precision is a metric that measures the accuracy of the positive predictions made by the model. It is the proportion of true positive predictions out of all predicted positive instances:

\[
\text{Precision} = \frac{TP}{TP + FP}
\]

where TP (True Positives) is the number of correctly predicted positive instances and FP (False Positives) is the number of incorrectly predicted positive instances (false alarms).

Interpretation

- High precision: a high precision value means the model is confident in its positive predictions, with few false positives.
- Low precision: a low precision value suggests that the model is making a large number of false positive predictions.

Precision focuses on the quality of positive predictions, while recall focuses on how many actual positives are correctly identified.
The trade-off between precision and recall can be managed using the F1 score, which is the harmonic mean of precision and recall:

\[
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

Precision answers: "Out of all instances predicted as positive, how many were correct?" Recall answers: "Out of all actual positive instances, how many were correctly predicted?" Precision does not account for false negatives, so it may not be suitable when identifying all positive cases is important. In imbalanced datasets, precision may be misleading if the model makes very few positive predictions. There is often a trade-off between precision and recall: increasing precision typically reduces recall, and vice versa. This trade-off is managed using the F1 score, which offers a balance between the two metrics.

Limitations of Precision

- Precision does not consider false negatives: precision only evaluates how accurate the positive predictions are; it does not consider the instances where the model failed to identify positive cases (false negatives). This can be problematic in situations where identifying all positive instances is important, as in medical tests.
- Imbalanced datasets: in highly imbalanced datasets (where one class significantly outnumbers the other), precision can be misleading. For example, if there are very few actual positive instances, the model could achieve high precision by making very few positive predictions but still miss many of the actual positives.

2.4.5 Recall in Classification

Recall is a metric that measures the ability of a classifier to correctly identify all positive instances in a dataset. It is also known as True Positive Rate (TPR) or Sensitivity. In simpler terms, recall answers the question: "Out of all the actual positive cases, how many were correctly predicted as positive?" It is defined as:

\[
\text{Recall} = \frac{TP}{TP + FN}
\]

where TP (True Positives) is the number of correctly predicted positive instances and FN (False Negatives) is the number of actual positive instances that were incorrectly predicted as negative.

Importance of Recall

- High recall: a high recall value indicates that the model is good at identifying positive instances. In applications where missing a positive case is critical, such as detecting cancer or fraud, a high recall is important.
- Low recall: a low recall value means that the model is missing many positive cases and is likely to produce many false negatives.

Trade-off Between Recall and Precision

In classification tasks, there is often a trade-off between recall and precision: as recall increases, it typically comes at the expense of precision, and vice versa. This happens because:

- Recall increases when the model predicts more instances as positive, which can lead to an increase in the number of false positives (FP). The model might capture more actual positives, but at the cost of wrongly classifying negatives as positives.
- Precision improves when the model becomes more conservative in predicting positives, which reduces the number of false positives, but this can decrease recall by missing some actual positives (i.e., increasing the false negatives, FN).

To balance the trade-off between recall and precision, we can use the F1 score, the harmonic mean of precision and recall:

\[
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

The F1 score provides a single metric that takes both precision and recall into account; a higher F1 score indicates a better balance between the two.
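A short sketch computing precision, recall, and the F1 score from illustrative confusion-matrix counts:

```python
# Sketch: precision, recall and F1 from confusion-matrix counts (values made up).
TP, FP, FN = 40, 10, 20

precision = TP / (TP + FP)                                   # quality of positive predictions
recall    = TP / (TP + FN)                                   # coverage of actual positives
f1        = 2 * precision * recall / (precision + recall)    # harmonic mean of the two

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```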
- 100% recall: achieving 100% recall means that the model correctly identifies all actual positive cases (i.e., no false negatives), but it may result in a large number of false positives, lowering precision.
- High recall vs. high precision: a model with high recall might have many false positives, while a model with high precision might miss some actual positive cases. Striving for a balance between recall and precision is crucial, depending on the specific application.
- F1 score: the F1 score is most useful when both precision and recall are important, as it balances the two. A high F1 score indicates good performance in terms of both precision and recall.
