Introduction to Computer Vision PDF Fall 2024
Document Details
Toronto Metropolitan University
2024
Dr. Omar Falou
Summary
This document is a set of lecture notes for a course on Computer Vision, covering topics including digital image formation, sources of images, image processing problems, and visual perception. The lectures were given by Dr. Omar Falou at Toronto Metropolitan University during Fall 2024.
Full Transcript
CPS834/CPS8307 Introduction to Computer Vision — Dr. Omar Falou, Toronto Metropolitan University, Fall 2024

What is a Digital Image?
A digital image is a two-dimensional function f(x, y), where x and y are spatial coordinates. The amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. The elements of a digital image are called pixels.

Sources for Images
- Electromagnetic (EM) energy spectrum
- Acoustic
- Ultrasonic
- Electronic
- Synthetic images produced by computer

Electromagnetic (EM) Energy Spectrum — Major Uses
- Gamma-ray imaging: nuclear medicine and astronomical observations
- X-rays: medical diagnostics, industry, and astronomy, etc.
- Ultraviolet: lithography, industrial inspection, microscopy, lasers, biological imaging, and astronomical observations
- Visible and infrared bands: light microscopy, astronomy, remote sensing, industry, and law enforcement
- Microwave band: radar
- Radio band: medicine (such as MRI) and astronomy

[Example slides: gamma-ray imaging, X-ray imaging, ultraviolet imaging, light microscopy, visual and infrared imaging, automated visual inspection (including automated reading of a licence plate and the area in which the imaging system detected the plate), a radar image, MRI (radio band), and ultrasound imaging.]

Image Processing → Computer Vision
[Slides: a series of examples illustrating typical image processing problems.]

Elements of Visual Perception
Although the digital image processing field is built on a foundation of mathematical and probabilistic formulations, human intuition and analysis play a central role in the choice of one technique over another, and this choice is often made on the basis of subjective, visual judgments. Hence, developing a basic understanding of human visual perception is very important.

Structure of the Human Eye
- Cornea: tough, transparent tissue that covers the anterior surface of the eye.
- Sclera: opaque membrane that encloses the remainder of the optic globe.
- Choroid: lies below the sclera and contains a network of blood vessels that serve as the major source of nutrition to the eye.
- Lens: absorbs both infrared and ultraviolet light through proteins within the lens structure; in excessive amounts, this light can damage the eye.
- Retina: the innermost membrane of the eye, lining the inside of the wall's entire posterior portion. When the eye is properly focused, light from an object outside the eye is imaged on the retina.

Receptors
Receptors (neurons) are distributed over the surface of the retina. They are divided into two classes: cones and rods.

Cones (6–7 million neurons)
- Located primarily in the central portion of the retina (the fovea); muscles controlling the eye rotate the eyeball until the image falls on the fovea.
- Highly sensitive to color.
- Each cone is connected to its own nerve end, so humans can resolve fine detail with them.
- Cone vision is called photopic or bright-light vision.

Rods (75–150 million)
- Distributed over the retinal surface.
- Several rods are connected to a single nerve end, which reduces the amount of discernible detail.
- Serve to give a general, overall picture of the field of view.
- Sensitive to low levels of illumination.
- Rod vision is called scotopic or dim-light vision.
Blind Spot
Figure 2.2 shows the density of rods and cones for a cross section of the right eye, passing through the region where the optic nerve emerges from the eye. The absence of receptors in this area results in the so-called blind spot (see Fig. 2.1). Except for this region, the distribution of receptors is radially symmetric about the fovea.
[Slide: blind spot figure.]

Image Formation in the Eye
The principal difference between the lens of the eye and an ordinary optical lens is that the former is flexible. The distance between the center of the lens and the retina (the focal length) varies from approximately 17 mm to about 14 mm as the refractive power of the lens increases from its minimum to its maximum.

Brightness Adaptation and Discrimination
Two phenomena clearly demonstrate that perceived brightness is not a simple function of intensity. The first is based on the fact that the visual system tends to undershoot or overshoot around the boundary of regions of different intensities. Figure 2.7(a) shows a striking example of this phenomenon. Although the intensity of the stripes is constant, we actually perceive a brightness pattern that is strongly scalloped, especially near the boundaries [Fig. 2.7(b)]. These seemingly scalloped bands are called Mach bands after Ernst Mach, who first described the phenomenon in 1865.
[Slide: brightness adaptation and discrimination figure.]

The second phenomenon, called simultaneous contrast, is related to the fact that a region's perceived brightness does not depend simply on its intensity, as Fig. 2.8 demonstrates. All the center squares have exactly the same intensity; however, they appear to the eye to become darker as the background gets lighter.
[Slide: examples of optical illusions.]

Light and EM Spectrum
c = λν and E = hν, where h is Planck's constant,
c = 2.998 × 10^8 m/s, h = 6.62607004 × 10^-34 m^2 kg / s

The colors that humans perceive in an object are determined by the nature of the light reflected from the object. For example, green objects reflect light with wavelengths primarily in the 500 to 570 nm range while absorbing most of the energy at other wavelengths.

Monochromatic light is void of color: intensity is its only attribute, ranging from black to white. Monochromatic images are referred to as gray-scale images.

Chromatic light spans the band of roughly 0.43 to 0.79 μm. The quality of a chromatic light source is described by:
- Radiance (W, watts): the total amount of energy that flows from the light source.
- Luminance (lm, lumens): the amount of energy an observer perceives from the light source.
- Brightness: a subjective descriptor of light perception that is impossible to measure. It embodies the achromatic notion of intensity and is one of the key factors in describing color sensation.

Image Acquisition
Image acquisition transforms illumination energy into digital images.
[Slides: image acquisition using a single sensor, image acquisition using sensor strips, and the overall image acquisition process.]

A Simple Image Formation Model
The simple image formation model describes how an image is generated by the interaction of light with the objects in a scene. The equation is:

f(x, y) = i(x, y) · r(x, y)

Components:
1. f(x, y): the intensity of the image at the pixel coordinates (x, y), representing what the sensor captures.
2. i(x, y): the illumination component, representing the amount of light falling on the scene at (x, y). It depends on the light source.
3. r(x, y): the reflectance component, representing the proportion of light reflected by the surface at (x, y). It depends on the surface material and color.

Explanation:
- The model assumes the observed image intensity f(x, y) is the product of the illumination i(x, y) and the reflectance r(x, y).
- Illumination i(x, y) captures environmental lighting conditions.
- Reflectance r(x, y) is a property of the object in the scene.

Applications: this model helps in tasks such as illumination normalization (separating i(x, y) from r(x, y) for consistent recognition) and image enhancement (adjusting illumination or reflectance for better visualization or analysis). A minimal numerical sketch of the model is given below.
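To make the product model concrete, here is a minimal numerical sketch (an illustration, not material from the slides). The image size, the illumination gradient, and the use of NumPy are assumptions; the reflectance values are taken from the typical ranges listed later in these notes.

```python
import numpy as np

# Minimal sketch of f(x, y) = i(x, y) * r(x, y).
# The illumination gradient below is an illustrative assumption.
M, N = 4, 4

# Illumination i(x, y): a horizontal gradient, e.g. a scene lit from one side.
i = np.linspace(100.0, 1000.0, N).reshape(1, N).repeat(M, axis=0)

# Reflectance r(x, y) in [0, 1]: a dark region on the left, a brighter one on the right.
r = np.full((M, N), 0.65)   # stainless-steel-like reflectance
r[:, :2] = 0.01             # black-velvet-like reflectance

# Image formation: element-wise (array) product, not a matrix product.
f = i * r

# Scale to an 8-bit gray-scale range [0, 255] for display or storage.
f_8bit = np.round(255.0 * (f - f.min()) / (f.max() - f.min())).astype(np.uint8)
print(f_8bit)
```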
Some Typical Ranges of Illumination
Lumen — a unit of light flow or luminous flux. Lumen per square meter (lm/m^2) — the metric unit of measure for the illuminance of a surface.
- On a clear day, the sun may produce in excess of 90,000 lm/m^2 of illumination on the surface of the Earth.
- On a cloudy day, the sun may produce less than 10,000 lm/m^2 of illumination on the surface of the Earth.
- On a clear evening, the moon yields about 0.1 lm/m^2 of illumination.
- The typical illumination level in a commercial office is about 1,000 lm/m^2.

Some Typical Ranges of Reflectance
- 0.01 for black velvet
- 0.65 for stainless steel
- 0.80 for flat-white wall paint
- 0.90 for silver-plated metal
- 0.93 for snow

Gray Level
The intensity of a monochrome image f at coordinates (x, y) is called the gray level l of the image at that point. The gray scale is the interval [Lmin, Lmax]; in common practice the interval is shifted to [0, L], where 0 represents black and L represents white.

Image Sampling and Quantization
- Sampling: digitizing the coordinate values.
- Quantization: digitizing the amplitude values.
[Slides: illustration of sampling and quantization, and the coordinate convention used for digital images.]

Representing Digital Images
An M × N digital image can be represented as a numerical array:

f(x, y) = [ f(0, 0)      f(0, 1)      ...  f(0, N-1)
            f(1, 0)      f(1, 1)      ...  f(1, N-1)
            ...          ...               ...
            f(M-1, 0)    f(M-1, 1)    ...  f(M-1, N-1) ]

or, equivalently, as a matrix:

A = [ a_{0,0}      a_{0,1}      ...  a_{0,N-1}
      a_{1,0}      a_{1,1}      ...  a_{1,N-1}
      ...          ...               ...
      a_{M-1,0}    a_{M-1,1}    ...  a_{M-1,N-1} ]

The discrete intensity interval is [0, L-1], with L = 2^k. The number of bits b required to store an M × N digitized image is

b = M × N × k

When an image can have 2^k gray levels, it is common practice to refer to it as a "k-bit image." For example, an image with 256 possible gray-level values is called an 8-bit image.
[Slides: further examples of representing digital images, and a coloured image example.]

Spatial and Intensity Resolution
- Spatial resolution: a measure of the smallest discernible detail in an image, stated in line pairs per unit distance, dots (pixels) per unit distance, or dots per inch (dpi).
- Intensity resolution: the smallest discernible change in intensity level, stated as 8 bits, 12 bits, 16 bits, etc.
[Slides: examples of varying spatial and intensity resolution.]
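To make the storage formula b = M × N × k and the idea of intensity resolution concrete, here is a minimal sketch (not from the slides; the image size, the bit depths, and the use of NumPy are assumptions). It computes the storage requirement for an assumed image and re-quantizes an 8-bit image to fewer intensity levels.

```python
import numpy as np

# Storage requirement b = M * N * k for an assumed 1024 x 1024, 8-bit image.
M, N, k = 1024, 1024, 8          # illustrative image size and bit depth
b = M * N * k
print(f"{M}x{N} {k}-bit image needs {b} bits ({b // 8} bytes)")

# Reducing intensity resolution: re-quantize an 8-bit image to k_new bits.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)   # stand-in 8-bit image

k_new = 3                        # keep only 2**3 = 8 gray levels
levels = 2 ** k_new
step = 256 // levels
img_k = (img // step) * step     # only 2**k_new distinct gray levels remain
print(img, img_k, sep="\n\n")
```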
Image Interpolation
Interpolation is the process of using known data to estimate values at unknown locations, e.g., in zooming, shrinking, rotating, and geometric correction. Interpolation (sometimes called resampling) is an imaging method used to increase (or decrease) the number of pixels in a digital image. Some digital cameras use interpolation to produce a larger image than the sensor captured, or to create digital zoom.

Image Interpolation: Nearest Neighbor Interpolation
Each new location is assigned the intensity of its nearest neighbor in the original image:
f1(x', y') = f(round(x'), round(y'))
so that, for example, points (x2, y2) and (x3, y3) whose nearest neighbor is (x1, y1) both receive the value f(x1, y1).

Image Interpolation: Bilinear Interpolation
The intensity value assigned to point (x, y) is
f2(x, y) = a·x + b·y + c·x·y + d
where the four coefficients are determined from the four equations in four unknowns that can be written using the four nearest neighbors of the point.

Image Interpolation: Bicubic Interpolation
The intensity value assigned to point (x, y) is obtained from
f3(x, y) = Σ_{i=0}^{3} Σ_{j=0}^{3} a_ij · x^i · y^j
The sixteen coefficients are determined by using the sixteen nearest neighbors of the point.
[Slides: examples comparing the interpolation methods.]

Distance Measures
Given pixels p, q and z with coordinates (x, y), (s, t), (u, v) respectively, a distance function D has the following properties:
a. D(p, q) ≥ 0, with D(p, q) = 0 iff p = q
b. D(p, q) = D(q, p)
c. D(p, z) ≤ D(p, q) + D(q, z)

The following are different distance measures between p(x, y) and q(s, t):
a. Euclidean distance: De(p, q) = [(x-s)^2 + (y-t)^2]^(1/2)
b. City-block distance: D4(p, q) = |x-s| + |y-t|
c. Chessboard distance: D8(p, q) = max(|x-s|, |y-t|)

Introduction to Mathematical Operations in DIP
Array vs. matrix operations. For

A = [ a11  a12        B = [ b11  b12
      a21  a22 ]             b21  b22 ]

the array (element-wise) product is

A .* B = [ a11·b11  a12·b12
           a21·b21  a22·b22 ]

while the matrix product is

A * B = [ a11·b11 + a12·b21   a11·b12 + a12·b22
          a21·b11 + a22·b21   a21·b12 + a22·b22 ]

Linear vs. Nonlinear Operations
Let H[f(x, y)] = g(x, y). H is a linear operator if it satisfies additivity and homogeneity:
H[a_i·f_i(x, y) + a_j·f_j(x, y)]
  = H[a_i·f_i(x, y)] + H[a_j·f_j(x, y)]      (additivity)
  = a_i·H[f_i(x, y)] + a_j·H[f_j(x, y)]      (homogeneity)
  = a_i·g_i(x, y) + a_j·g_j(x, y)
H is a nonlinear operator if it does not meet this condition.

Arithmetic Operations
Arithmetic operations between images are array operations. The four arithmetic operations are denoted
s(x, y) = f(x, y) + g(x, y)
d(x, y) = f(x, y) - g(x, y)
p(x, y) = f(x, y) × g(x, y)
v(x, y) = f(x, y) ÷ g(x, y)

Example: Addition of Noisy Images for Noise Reduction
Let f(x, y) be a noiseless image and n(x, y) be noise which, at every pair of coordinates (x, y), is uncorrelated and has zero average value. The corrupted image is
g(x, y) = f(x, y) + n(x, y)
The noise can be reduced by averaging a set of K noisy images {g_i(x, y)}:
ḡ(x, y) = (1/K) Σ_{i=1}^{K} g_i(x, y)
Taking the expectation,
E{ḡ(x, y)} = (1/K) Σ_{i=1}^{K} E{g_i(x, y)} = f(x, y) + (1/K) Σ_{i=1}^{K} E{n_i(x, y)} = f(x, y)
since the noise has zero mean.

In astronomy, imaging under very low light levels frequently causes sensor noise to render single images virtually useless for analysis. In astronomical observations, the same scene is therefore imaged over long periods of time, and image averaging is then used to reduce the noise. A minimal sketch of this averaging step follows.
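Here is a minimal sketch of the averaging idea (illustrative only; the frame count K, the noise level, and the use of NumPy are assumptions, not values from the notes). It corrupts a constant image with zero-mean Gaussian noise and shows how the residual noise shrinks after averaging K frames.

```python
import numpy as np

rng = np.random.default_rng(42)

# A stand-in "noiseless" image f(x, y); a constant patch keeps the example simple.
f = np.full((64, 64), 100.0)

K = 50          # number of noisy frames to average (assumed)
sigma = 20.0    # noise standard deviation (assumed)

# g_i(x, y) = f(x, y) + n_i(x, y), with zero-mean, uncorrelated noise.
frames = [f + rng.normal(0.0, sigma, size=f.shape) for _ in range(K)]

# g_bar(x, y) = (1/K) * sum_i g_i(x, y)
g_bar = np.mean(frames, axis=0)

print("noise std of a single frame:", np.std(frames[0] - f))   # roughly sigma
print("noise std after averaging: ", np.std(g_bar - f))        # roughly sigma / sqrt(K)
```

Because the noise is zero-mean and uncorrelated from frame to frame, the residual noise standard deviation falls roughly as sigma/√K as more frames are averaged.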
An Example of Image Subtraction: Mask Mode Radiography
- Mask h(x, y): an X-ray image of a region of a patient's body.
- Live images f(x, y): X-ray images captured at TV rates after injection of a contrast medium.
- Enhanced detail g(x, y):
g(x, y) = f(x, y) - h(x, y)
The procedure gives a movie showing how the contrast medium propagates through the various arteries in the area being observed.

[Slide: an example of image multiplication.]

Set and Logical Operations
Let A be the set of elements of a gray-scale image. The elements of A are triplets of the form (x, y, z), where x and y are spatial coordinates and z denotes the intensity at the point (x, y):
A = {(x, y, z) | z = f(x, y)}
The complement of A is denoted A^c:
A^c = {(x, y, K - z) | (x, y, z) ∈ A},  K = 2^k - 1
where k is the number of intensity bits used to represent z.
The union of two gray-scale images (sets) A and B is defined as the set
A ∪ B = {max_z(a, b) | a ∈ A, b ∈ B}
[Slides: further examples of set and logical operations.]

Spatial Operations
Single-pixel operations alter the values of an image's pixels based only on their intensity, s = T(z) (an example transformation is shown on the slide).
Neighborhood operations: the value of the output pixel at (x, y) is determined by a specified operation involving the pixels of the input image whose coordinates lie in a neighborhood Sxy.
[Slides: illustrations of neighborhood operations.]

Geometric Spatial Transformations
A geometric transformation (rubber-sheet transformation) consists of two basic operations:
- a spatial transformation of coordinates, (x, y) = T{(v, w)};
- an intensity interpolation that assigns intensity values to the spatially transformed pixels.
One of the most commonly used spatial coordinate transformations is the affine transform:

[x  y  1] = [v  w  1] · [ t11  t12  0
                          t21  t22  0
                          t31  t32  1 ]

Intensity Assignment
- Forward mapping: (x, y) = T{(v, w)}. Scan the pixels of the input image and, at each location (v, w), compute the spatial location (x, y) of the corresponding pixel in the output image. It is possible for two or more input pixels to be transformed to the same location in the output image.
- Inverse mapping: (v, w) = T^{-1}{(x, y)}. Scan the output pixel locations and, at each location (x, y), compute the corresponding location in the input image, then interpolate among the nearest input pixels to determine the intensity of the output pixel. Inverse mappings are more efficient to implement than forward mappings (and are used in MATLAB).
[Slide: example of image rotation and intensity interpolation.]

Image Registration
In image registration, the input and output images are available, but the transformation function is unknown. The goal is to estimate the transformation function and use it to register the two images. One of the principal approaches is to use tie points (also called control points): corresponding points whose locations are known precisely in the input and output (reference) images.

A simple model based on a bilinear approximation:
x = c1·v + c2·w + c3·v·w + c4
y = c5·v + c6·w + c7·v·w + c8
where (v, w) and (x, y) are the coordinates of tie points in the input and reference images. A sketch of estimating these coefficients from tie points is given below.
[Slide: image registration example.]
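The eight coefficients c1, ..., c8 can be estimated from four (or more) tie-point pairs, since each pair contributes one equation for x and one for y. Below is a minimal least-squares sketch; the tie-point coordinates are made up for illustration, and the use of NumPy is an assumption. With exactly four pairs, the least-squares solution reduces to solving the 4×4 system exactly.

```python
import numpy as np

# Made-up tie points: (v, w) in the input image, (x, y) in the reference image.
vw = np.array([[10.0, 12.0], [200.0, 15.0], [18.0, 240.0], [210.0, 230.0]])
xy = np.array([[13.0, 10.0], [205.0, 18.0], [22.0, 244.0], [216.0, 238.0]])

# Design matrix for the bilinear model x = c1*v + c2*w + c3*v*w + c4 (same form for y).
A = np.column_stack([vw[:, 0], vw[:, 1], vw[:, 0] * vw[:, 1], np.ones(len(vw))])

cx, *_ = np.linalg.lstsq(A, xy[:, 0], rcond=None)   # c1..c4 for x
cy, *_ = np.linalg.lstsq(A, xy[:, 1], rcond=None)   # c5..c8 for y

def map_point(v, w):
    """Map an input-image point (v, w) to reference coordinates (x, y)."""
    basis = np.array([v, w, v * w, 1.0])
    return basis @ cx, basis @ cy

print(map_point(10.0, 12.0))   # should reproduce the first reference tie point
```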
Image Transform
A particularly important class of 2-D linear transforms, denoted T(u, v), is given by
T(u, v) = Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x, y) · r(x, y, u, v)
where f(x, y) is the input image, r(x, y, u, v) is the forward transformation kernel, and u and v are the transform variables, with u = 0, 1, 2, ..., M-1 and v = 0, 1, ..., N-1.

Given T(u, v), the original image f(x, y) can be recovered using the inverse transformation of T(u, v):
f(x, y) = Σ_{u=0}^{M-1} Σ_{v=0}^{N-1} T(u, v) · s(x, y, u, v)
where s(x, y, u, v) is the inverse transformation kernel, with x = 0, 1, 2, ..., M-1 and y = 0, 1, ..., N-1.
[Slides: the general transform pipeline, and an example of image denoising using a transform.]

2-D Fourier Transform
T(u, v) = Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x, y) · e^{-j2π(ux/M + vy/N)}
f(x, y) = (1/MN) Σ_{u=0}^{M-1} Σ_{v=0}^{N-1} T(u, v) · e^{j2π(ux/M + vy/N)}
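As a small numerical check of the transform pair (an illustration, not part of the slides; the image size and the use of NumPy are assumptions), the forward 2-D DFT computed directly from the double-sum definition matches NumPy's fft2, and the inverse transform recovers the original image:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((8, 8))            # tiny test image; size chosen only for speed
M, N = f.shape

# Forward 2-D DFT straight from the definition:
# T(u, v) = sum_x sum_y f(x, y) * exp(-j*2*pi*(u*x/M + v*y/N))
x = np.arange(M).reshape(M, 1)
y = np.arange(N).reshape(1, N)
T = np.zeros((M, N), dtype=complex)
for u in range(M):
    for v in range(N):
        T[u, v] = np.sum(f * np.exp(-2j * np.pi * (u * x / M + v * y / N)))

# The direct double sum agrees with the library FFT.
print(np.allclose(T, np.fft.fft2(f)))   # True

# Inverse transform: f(x, y) = (1/(M*N)) * sum_u sum_v T(u, v) * exp(+j*2*pi*(ux/M + vy/N))
f_rec = np.real(np.fft.ifft2(T))
print(np.allclose(f_rec, f))            # True
```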