02_CV2425_Human and Computer Vision_LR.pdf
Human Vision and Computer Vision
Dr. Eng. Laksmita Rahadianti, Muhammad Febrian Rachmadi, Ph.D., Dr. Dina Chahyati, Prof. Dr. Aniati M. Arymurthy
CSCE604133 Computer Vision, Fakultas Ilmu Komputer, Universitas Indonesia

Acknowledgements
These slides are created with reference to:
Computer Vision: Algorithms and Applications, 2nd ed., Richard Szeliski, https://szeliski.org/Book/
Digital Image Processing, Gonzalez and Woods, 3rd ed., 2008.
Course slides for CSCE604133 Image Processing – Faculty of Computer Science, Universitas Indonesia
Introduction to Computer Vision, Cornell Tech, https://www.cs.cornell.edu/courses/cs5670/2024sp/lectures/lectures.html
Computer Vision, University of Washington, https://courses.cs.washington.edu/courses/cse576/08sp/

The 3 R's of Computer Vision
Malik, Jitendra, et al. "The three R's of computer vision: Recognition, reconstruction and reorganization." Pattern Recognition Letters 72 (2016): 4-14.
Recognition, reconstruction, and reorganization (or registration). Human vision can do these tasks effortlessly.

3D Scene vs 2D Image
Forward process: models the physical process from 3D scenes (movement, light) that is projected onto a 2D image.
Inverse process: taking the 2D image and extracting the information, the 3D scene, and other properties.
Image courtesy of Alina Fiene, Unsplash.

Image Acquisition
Image acquisition requires 3 components: a light source, a sensor, and a scene.

The Light Source
Let's look at the light source first.

Newton's Experiment
Conclusion 1: the white light was composed of a mixture of all colours in the spectrum.
Conclusion 2: the spectral colours were in fact the basic components (monochromatic lights) of the white light.
Conclusion 3: all the colours in the spectrum can be reunited to form the original white light again (by focusing the components back through a reversed prism).

Light as an Electromagnetic Wave
Light is a wave, with wavelength (λ / lambda) and frequency f. The visible portion of the electromagnetic spectrum spans roughly 400 nm to 700 nm.

Light / Light Sources / Illuminants
Monochromatic light: at one wavelength only, from coherent light sources such as lasers.
Chromatic light (visible light): the visible spectrum (λ ≈ 400-700 nm), e.g. sunlight and most light bulbs.
Why are the illuminants important?
CIE: the Commission internationale de l'éclairage (International Commission on Illumination) is the international authority on light, illumination, colour, and colour spaces.
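As a quick aside to the electromagnetic-wave slide, wavelength and frequency are related by c = λf. Below is a small illustrative snippet (not from the slides); the wavelengths are simply the ends and middle of the visible band.

```python
# Illustrative only: relate wavelength and frequency for the visible band (c = lambda * f).
C = 3.0e8  # speed of light in m/s (approximate)

def frequency_hz(wavelength_nm: float) -> float:
    """Frequency of light with the given wavelength (in nanometres)."""
    return C / (wavelength_nm * 1e-9)

for wl in (400, 550, 700):  # violet, green, and red ends of the visible spectrum
    print(f"{wl} nm  ->  {frequency_hz(wl):.2e} Hz")
```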
Spectral Power Distribution of Illuminants
Different lights are spectrally different and appear differently.

The Scene
Let's look at the scene next.

Spectral Reflectance of Objects

The Sensor
Let's look at the sensor next.

Human Vision

The Human Eye as a Sensor
The image captured is not a physical image, but a perception: the stimulus is light, the sensor is the eye, and the image is a perception (Gonzalez & Woods, 1992).

The Human Eye
Light enters the eye; an image is captured on the retina; the image is translated into biological signals; the signals are transmitted to the brain; processing happens in the brain; the result is visual perception.

Image Formation on the Retina
The retina has optical sensors to perceive light:
Cones (6-7 million sensors): sensitive to color, resulting in photopic (color) vision.
Rods (75-150 million sensors): create a more general overall image, resulting in scotopic (dim-light) vision.

Image Formation on the Eye
The spectral power distribution of the illuminant (E) and the reflectance coefficient ρ change the light that arrives at the sensor, and the human eye in turn has its own spectral sensitivity (color matching functions) (Gonzalez & Woods, 1992).

Human Visual Perception
The human eye is weird: image capture by the human eye is not that simple! Light enters the eye, an image is captured on the retina, the image is translated into biological signals, the signals are transmitted to the brain, the brain processes them, and then visual perception... ?

Visual Perception in the Brain
The image is a perception based on processing in the brain.

Brightness Adaptation
Mach bands: perceived intensity is not a simple function of actual intensity. (Figure: actual vs. perceived intensity profiles.)

Brightness Adaptation
Simultaneous contrast: a region's perceived brightness does not depend simply on its intensity.

Other Curious Things about the Human Eye
The eye fills in non-existent information or wrongly perceives geometric properties of objects.

What about Color?
Are A and B the same color? (Color constancy.)

What about Motion?

The Human Eye
The human eye becomes the basis on which the digital camera is built. We try to obtain an image similar to the perception we obtain in our eyes. What can we replicate in digital images? What information can we recover from digital images with computer vision? There will be some things that computer vision simply cannot do.
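To make the E, ρ, and spectral-sensitivity relation above concrete, here is a minimal sketch that computes a tristimulus-style sensor response as a discrete sum over sampled wavelengths. All of the spectra below are made-up illustrative numbers, not measured SPDs, reflectances, or cone/camera sensitivities.

```python
import numpy as np

# Hypothetical, coarsely sampled spectra (400-700 nm in 50 nm steps) -- illustrative values only.
wavelengths = np.arange(400, 701, 50)                 # nm
E   = np.array([0.6, 0.8, 1.0, 1.0, 0.9, 0.8, 0.7])   # illuminant SPD E(lambda)
rho = np.array([0.2, 0.3, 0.4, 0.6, 0.7, 0.5, 0.3])   # surface reflectance rho(lambda)
S   = np.array([                                      # sensor sensitivities S_c(lambda), rows = R, G, B
    [0.0, 0.0, 0.1, 0.3, 0.8, 1.0, 0.6],
    [0.1, 0.4, 0.9, 1.0, 0.5, 0.2, 0.0],
    [0.8, 1.0, 0.4, 0.1, 0.0, 0.0, 0.0],
])

# Sensor response: c = sum over lambda of E(lambda) * rho(lambda) * S_c(lambda)
response = S @ (E * rho)
print(dict(zip("RGB", response.round(3))))
```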
Digital Image Acquisition

Digital Cameras as Sensors
The camera has a digital sensor.

Digital Images
A digital image is a matrix where each element is the gray-level intensity f(x, y).
Intensity function f(x, y): x and y are the spatial coordinates on the matrix, and the function value f(x, y) is the intensity level at that location.
The intensity function f(x, y) is obtained through spatial discretization (sampling) and intensity discretization (quantization).

Sampling and Quantization
Sampling: we only have a limited number of "spots" that can be used to store the intensities.
Quantization: we only have a limited number of "values" that can be used to represent the intensities.
(A small quantization sketch appears at the end of this slide group.)

Image Intensity Resolution
How smooth / rough is the division of the intensity level values represented in each pixel? Transforming the continuous signal into discrete intensities in a digital image matrix is quantization. Example: an 8-bit image, where black is 0, mid-gray is 128, and white is 255.

Image Spatial Resolution
How smooth / rough is the division of the grid (rows and columns) of pixels? Transforming the continuous signal into a limited number of values on a grid is digitization / sampling.

Digital Images
The intensity function f(x, y) is discretized in spatial resolution (sampling) and in intensity resolution (quantization); the grid cells are the picture elements (pixels).

Pixel Neighbors
A pixel p(x, y) has 4 horizontal and vertical neighbors, the 4-neighbors N4(p): (x+1, y), (x−1, y), (x, y+1), (x, y−1).

Pixel Neighbors (2)
A pixel p(x, y) also has 4 diagonal neighbors ND(p): (x+1, y+1), (x+1, y−1), (x−1, y+1), (x−1, y−1). Together with N4(p), these form the 8-neighbors N8(p).

Adjacency and Connectivity
4-adjacency: p and q that fulfill V are 4-adjacent if q ∈ N4(p).
8-adjacency: p and q that fulfill V are 8-adjacent if q ∈ N8(p).
m-adjacency: p and q that fulfill V are m-adjacent if q ∈ N4(p), or q ∈ ND(p) and N4(p) ∩ N4(q) has no pixels that fulfill V.
(Example 3×3 image used on the slide: 0 1 1 / 0 1 0 / 0 0 1.)

Connected Components
For V = {1}, we obtain the connected components of the image, using either 4-neighbors or 8-neighbors. Within one connected component, all the pixels make up a connected set. R is a region if R is a connected set. Regions are adjacent if their union forms a connected set; otherwise they are disjoint. (The slide's example grids show two components R1 and R2. Are R1 and R2 adjacent?)
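The connected-components idea on the last slide can be illustrated with a small flood-fill labeling routine. This is a minimal sketch (not the course's reference implementation), using V = {1} and switchable 4- or 8-connectivity.

```python
from collections import deque

def connected_components(img, connectivity=4):
    """Label connected components of a binary image (V = {1}) using BFS flood fill."""
    if connectivity == 4:
        offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # N4(p)
    else:
        offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]  # N8(p)
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for x in range(rows):
        for y in range(cols):
            if img[x][y] == 1 and labels[x][y] == 0:
                current += 1
                labels[x][y] = current
                queue = deque([(x, y)])
                while queue:
                    px, py = queue.popleft()
                    for dx, dy in offsets:
                        qx, qy = px + dx, py + dy
                        if 0 <= qx < rows and 0 <= qy < cols and img[qx][qy] == 1 and labels[qx][qy] == 0:
                            labels[qx][qy] = current
                            queue.append((qx, qy))
    return current, labels

# A diagonal pair of pixels: two components with 4-connectivity, one with 8-connectivity.
img = [[1, 0, 0],
       [0, 1, 1],
       [0, 0, 0]]
print(connected_components(img, 4)[0], connected_components(img, 8)[0])  # -> 2 1
```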
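And here is the small quantization sketch referred to above: it requantizes an 8-bit grayscale ramp to fewer bits per pixel, which is exactly the intensity-resolution reduction described on the quantization slide. The ramp image and the bin-centre mapping are illustrative choices, not a prescribed method.

```python
import numpy as np

def quantize(gray, bits):
    """Requantize an 8-bit grayscale array to the given number of bits per pixel."""
    levels = 2 ** bits
    step = 256 / levels
    # Map each pixel to the centre of its quantization bin, back in the 0-255 range.
    return (np.floor(gray / step) * step + step / 2).astype(np.uint8)

gray = np.arange(256, dtype=np.uint8).reshape(16, 16)  # a simple 16x16 intensity ramp
for b in (8, 4, 1):
    print(b, "bits ->", np.unique(quantize(gray, b)).size, "distinct levels")
```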
Mathematical Operators on Images
Images can be combined with +, −, ×, and /. Example: subtracting two temporal images can be used to detect a change in an area: where nothing changed the difference is 0, and where something changed the difference is non-zero.

Logic Operators on Images
OR, AND, NOT. Example: a masking (AND) operation to separate the object and background of a biomedical image (X-ray AND mask).

Note: Operators on Images
Array operations are pixel-by-pixel (element-wise):
[a11 a12]   [b11 b12]   [a11·b11  a12·b12]
[a21 a22] × [b21 b22] = [a21·b21  a22·b22]
Matrix operations follow the usual matrix product:
[a11 a12]   [b11 b12]   [a11·b11 + a12·b21   a11·b12 + a12·b22]
[a21 a22] × [b21 b22] = [a21·b11 + a22·b21   a21·b12 + a22·b22]
Mathematical operations on images are assumed to be array operations, unless mentioned otherwise.

Distortions in Digital Images
Geometric (spatial) distortions: the result of a change in position and direction due to movement of the person capturing the image or of the object being captured. Could also be a result of a fault in the internal camera sensor. (Source: Ira Hastitu et al., 2002.)

Distortions in Digital Images (2)
Radiometric distortions: due to an incorrect intensity distribution, the result of a different atmospheric / environmental condition (haze, etc.) producing a different gray level in the captured image. Could also be a result of a fault in the internal camera sensor. These can be corrected digitally through filters. (Source: Leaves on a branch, MSU, 1990.)

Image Formation in the Eye vs the Camera
Photo camera: the lens has a fixed focal length; we can focus at various distances by varying the distance between the lens and the imaging plane (the location of the film or chip).
Human eye: the distance between the lens and the imaging region (retina) is fixed; the focal length for proper focus is obtained by varying the shape of the lens.

Camera and World Coordinate System
The camera coordinate system (x, y, z) is aligned with the world coordinate system (X, Y, Z). (Figure: image plane, lens focal point, a world point (X, Y, Z) and its image point (x, y).)

Camera and World Coordinate System (3)
With both axes (camera and world coordinates) aligned, the object in the world and its image on the image plane form similar triangles:
−x / λ = X / (Z − λ)
which gives
x = λX / (λ − Z),  y = λY / (λ − Z)

Homogeneous Coordinate System
Object coordinates in the world system are commonly written in the homogeneous coordinate system. For the Cartesian coordinates (Wc) and homogeneous coordinates (Wh):
Wc = (X, Y, Z)
Wh = (kX, kY, kZ, k)
where k is a non-zero constant, usually k = 1. Thus, how can the Cartesian coordinates Wc (X, Y, Z) be obtained from Wh?
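To answer the question just posed: the Cartesian point is recovered from the homogeneous one by dividing by the last component (the factor k). A minimal sketch, with an arbitrary k purely for illustration:

```python
import numpy as np

def to_homogeneous(w_c, k=1.0):
    """Cartesian (X, Y, Z) -> homogeneous (kX, kY, kZ, k)."""
    return np.append(k * np.asarray(w_c, dtype=float), k)

def to_cartesian(w_h):
    """Homogeneous (kX, kY, kZ, k) -> Cartesian (X, Y, Z): divide by the last component."""
    w_h = np.asarray(w_h, dtype=float)
    return w_h[:-1] / w_h[-1]

w_c = [2.0, 3.0, 5.0]
w_h = to_homogeneous(w_c, k=4.0)   # array([ 8., 12., 20.,  4.])
print(to_cartesian(w_h))           # [2. 3. 5.] -- the original Cartesian point
```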
Geometric Transformations in the Homogeneous Coordinate System
Why? We need a uniform representation for various cameras and setups (the homogeneous representation), we want to be able to compose transformations efficiently, and we want to carry the normalization factor of the coordinates through a series of transformations.

Geometric Transformations in Images
Recall that linear geometric transformations (translation, scale / dilation, rotation) can be expressed through matrix multiplication.

Geometric Transformations in Images
Translation: X' = X + Tx, Y' = Y + Ty
Scale / dilation: X' = Sx·X, Y' = Sy·Y
Rotation (by angle α): X' = X cos α − Y sin α, Y' = X sin α + Y cos α

Transformation Matrices for Geometric Transformations
Translation:
[1 0 0 Tx]
[0 1 0 Ty]
[0 0 1 Tz]
[0 0 0 1 ]
Scale / dilation:
[Sx 0  0  0]
[0  Sy 0  0]
[0  0  Sz 0]
[0  0  0  1]
Rotation (about the x-axis, by angle α):
[1  0      0      0]
[0  cos α  sin α  0]
[0  −sin α cos α  0]
[0  0      0      1]

Perspective Transformation
Perspective transformation matrix:
[1 0 0     0]
[0 1 0     0]
[0 0 1     0]
[0 0 −1/λ  1]
The minus sign means the image is upside down, λ is the lens focal length, and 1/λ is the scale factor. The object coordinates on the camera can be obtained from the object coordinates in the world using the perspective transformation.

Homogeneous Coordinate System on the Camera
The object coordinates in the camera system are Cc and Ch for the Cartesian and homogeneous coordinates, respectively. In the homogeneous coordinate system on the camera:
Ch = P · (kX, kY, kZ, k) = (kX, kY, kZ, −kZ/λ + k)
where P is the perspective transformation matrix above. The Cartesian coordinates on the camera Cc (X, Y, Z) can thus be obtained from the homogeneous coordinates Ch by dividing by the last component.

Basic Camera Mathematical Model
The relation between the camera Cartesian coordinates (x, y, z) and the world coordinates (X, Y, Z) is the basic camera mathematical model. (A small numerical sketch of this projection appears at the end of this slide group.)

Color Image Formation

Pseudocolor Images
Pseudocolor / false color: the color in the image is obtained by manually assigning a color to a certain range of gray-level intensities, for visualization purposes. Why pseudocolor? Human vision can distinguish thousands of colors, but only fewer than a hundred grayscale values.

Color Image Acquisition
Image capture combines the SPD of the illuminant (E), the reflectance coefficient ρ, and the camera's spectral sensitivities / color matching functions. (Figure: camera sensor sensitivities for RGB capture; (a) a Nikon D70 (solid) and Canon 20D (dotted); (b) Fujifilm 3D, left (solid) and right (dotted).)

Digital Color Images
The final image captured is the result of multiplying the illumination E(λ), the reflectance ρ(λ), and the sensor sensitivities S_c(λ), c ∈ {R, G, B}, and summing over wavelengths, resulting in the RGB image.

Color is a Perception / Sensation
Each sensor (eye, camera) will have a specific spectral sensitivity for (R, G, B), so you may not all see the same color. How is color defined / communicated?
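Here is the numerical sketch promised at the camera-model slides: it applies the 4×4 perspective transformation matrix to a homogeneous world point and checks the result against x = λX/(λ − Z), y = λY/(λ − Z). The focal length and the world point are arbitrary illustrative values.

```python
import numpy as np

lam = 50.0  # lens focal length (arbitrary illustrative value)

# Perspective transformation matrix from the slides: identity except for the -1/lambda term.
P = np.array([
    [1, 0, 0,        0],
    [0, 1, 0,        0],
    [0, 0, 1,        0],
    [0, 0, -1 / lam, 1],
])

X, Y, Z = 10.0, 20.0, 200.0            # a world point (homogeneous with k = 1)
w_h = np.array([X, Y, Z, 1.0])
c_h = P @ w_h                          # homogeneous camera coordinates (kX, kY, kZ, -kZ/lam + k)
c_c = c_h[:3] / c_h[3]                 # back to Cartesian: divide by the last component

# The projected image coordinates agree with x = lam*X/(lam - Z), y = lam*Y/(lam - Z).
print(c_c[:2], lam * X / (lam - Z), lam * Y / (lam - Z))
```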
A Bit of Color Trivia
How many colors can humans distinguish? The estimate is more than 1 million unique colors. How do we communicate with each other and describe a certain color perception? Samples? Human vocabulary is very limited. We need some type of system that can order and organize the colors, to facilitate the specification of color, based on attributes that were agreed upon: color order systems / color models.

RGB Color Model
The standard model for color monitors, based on the tri-stimulus theory of color vision. There are so many variables among different systems and hardware that we need a subset of colors that can always be reproduced reliably on any system: safe RGB colors / all-systems-safe colors / safe Web colors / safe browser colors. Because the various color-reproducing systems process colors differently, there are 216 colors that can de facto be reproduced by all systems.

CIELAB Color Space
(CIE 15: Technical Report: Colorimetry, 3rd edition, 2014.) The nonlinear relations for L*, a*, and b* are intended to mimic the nonlinear response of the eye. Components: L* for the lightness, a* and b* for the green–red and blue–yellow components. (A short conversion sketch appears at the end of this slide group.)

HSI (closely related to HSV / HSL)
Hue, Saturation, Intensity.

Many Color Systems Exist
Color systems: CIEXYZ, CIELAB, CIELUV. RGB variants: sRGB, Adobe RGB. Also YIQ, YUV, YCbCr. Different systems may be suitable for different tasks.

Relevant Research

Digital vs Perceived Image
We can try to model human color capture, but there are many aspects we don't yet understand.

When Color Constancy Goes Wrong: Correcting Improperly White-balanced Images
Afifi, Mahmoud, et al. "When color constancy goes wrong: Correcting improperly white-balanced images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
Recall: color constancy / chromatic adaptation. White balance selects a certain color as white and color-corrects accordingly.

When Color Constancy Goes Wrong: Correcting Improperly White-balanced Images (2)

Normal Color Vision
"Normal" color vision is obtained by "normal" spectral sensitivities in human eyes.

Color Vision Deficiency
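The conversion sketch referred to on the CIELAB slide: a minimal XYZ-to-L*a*b* function showing the cube-root nonlinearity that mimics the eye's response. It assumes a D65-style reference white purely for illustration; the appropriate white point depends on the viewing conditions.

```python
# Minimal CIELAB sketch: the cube-root nonlinearity is what mimics the eye's response.
WHITE = (95.047, 100.0, 108.883)  # assumed D65 reference white (2-degree observer)

def _f(t):
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

def xyz_to_lab(x, y, z, white=WHITE):
    xn, yn, zn = white
    fx, fy, fz = _f(x / xn), _f(y / yn), _f(z / zn)
    L = 116 * fy - 16          # lightness
    a = 500 * (fx - fy)        # green (-) to red (+)
    b = 200 * (fy - fz)        # blue (-) to yellow (+)
    return L, a, b

# The reference white itself maps to L* = 100, a* = b* = 0.
print(xyz_to_lab(*WHITE))
```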
Daltonization (after John Dalton, 6 September 1766 – 27 July 1844)
https://ixora.io/projects/colorblindness/daltonization/
A procedure for adapting the colors in an image, or a sequence of images, to improve color perception by a color-deficient viewer: a color correction technique that attempts to adjust colors so that fewer color combinations are confusing to a colorblind person. (Figure: original image vs. the simulated result for red CVD.)

Optimisasi Algoritma Daltonization untuk Citra Berwarna bagi Orang dengan Buta Warna Merah (Optimization of the Daltonization Algorithm for Color Images for People with Red Color Blindness)
M. Irfan Amrullah, Skripsi (undergraduate thesis), Sarjana Ilmu Komputer, Universitas Indonesia (2021).
One step in daltonization is mask creation through binarization; there are various methods.

Color Image Formation is a Spectral Function
The final image captured is a spectral function of the illumination (E), the reflectance (ρ), and the sensitivity of the sensor. We have RGB images because basic cameras have only 3 discrete RGB sensitivities. What if we have more sensors?

Multi/Hyperspectral Images
Sowmya, V., K. P. Soman, and M. Hassaballah. "Hyperspectral image: fundamentals and advances." Recent Advances in Computer Vision. Springer, Cham, 2019. 401-424.
Mishra, P., Asaari, M. S. M., Herrero-Langreo, A., Lohumi, S., Diezma, B., & Scheunders, P. (2017). Close range hyperspectral imaging of plants: A review. Biosystems Engineering, 164, 49-67.
The final image captured is a spectral function of the illumination (E), the reflectance (ρ), and the sensitivity of the sensor: RGB images vs. multispectral images.

Dimensionality Reduction and Visualisation of Hyperspectral Ink Data Using t-SNE
Devassy, Binu Melit, and Sony George. "Dimensionality reduction and visualisation of hyperspectral ink data using t-SNE." Forensic Science International 311 (2020).
These inks are all visually similar and difficult to distinguish; a hyperspectral visualization (here via t-SNE) may be able to differentiate them.

Reduksi Dimensi pada Citra Hyperspectral Tinta Biru untuk K-Means Clustering (Dimensionality Reduction of Blue-Ink Hyperspectral Images for K-Means Clustering)
Nathasya E., Skripsi (undergraduate thesis), Sarjana Ilmu Komputer, Universitas Indonesia (2021).
Blue-ink hyperspectral images (Melit Devassy & George, 2020): 10 classes, 186 bands, wavelengths of roughly 400 nm to 1000 nm at 3.18 nm intervals.

Reduksi Dimensi pada Citra Hyperspectral Tinta Biru untuk K-Means Clustering (2)
Nathasya Eliora, Aruni Yasmin Azizah, Laksmita Rahadianti, 2021.
(Figure: combination of 3 inks, ground truth, Ink 6, Ink 7, Ink 8.) The ink segmentation experiment was not successful using RGB alone. Hyperspectral segmentation using K-Means clustering was compared without dimensionality reduction, with PCA, and with t-SNE.
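Finally, a sketch of the dimensionality-reduction-plus-clustering pipeline discussed in the last two slides. This is not the thesis's actual pipeline or data: it uses randomly generated 186-band "spectra" as a stand-in for the ink cubes, reduces them with PCA, and clusters with K-Means using scikit-learn; sklearn.manifold.TSNE could be substituted for PCA in the same way.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for hyperspectral pixels: 3 synthetic "ink" classes, 186 bands each
# (random data for illustration, not the real ink cubes).
centers = rng.uniform(0.2, 0.8, size=(3, 186))
pixels = np.vstack([c + 0.02 * rng.standard_normal((200, 186)) for c in centers])

# Dimensionality reduction followed by K-Means clustering, as in the segmentation experiments.
reduced = PCA(n_components=10).fit_transform(pixels)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

print(reduced.shape)        # (600, 10): 600 pixels described by 10 principal components
print(np.bincount(labels))  # roughly 200 pixels per cluster
```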