COMP9517 Image Formation PDF
UNSW
2024
Erik Meijering
Summary
This document covers the fundamentals of image formation. It explains the geometry of mapping 3D world coordinates to 2D image coordinates using the pinhole camera model, lenses, projective geometry, and perspective and affine projection, and it notes the radial distortions that lenses introduce. It also relates cameras to human vision and introduces colour spaces (RGB, HSV, YCbCr, L*a*b*), spatial sampling, and quantisation.
Full Transcript
COMP9517 Computer Vision, 2024 Term 2, Week 1
Professor Erik Meijering
Image Formation
Copyright (C) UNSW COMP9517 24T2W1 Image Formation

What is image formation?
Image formation occurs when a sensor registers radiation that has interacted with physical objects.

Geometry of image formation
Mapping world coordinates (3D) to image coordinates (2D):
– Pinhole camera model
– Projective geometry
– Projection matrix

Image formation
Idea 1: Put a piece of film in front of an object.
[Diagram: object, film]
Do we get a reasonable image?
All object points are projected to all points on the film. The resulting image is completely blurred.

Image formation
Idea 2: Add a barrier to block off most of the rays.
[Diagram: object, barrier, film]
– The opening is known as the pinhole or aperture
– Object points are projected to unique points on the film
– This reduces blurring

Image formation
Idea 3: Add a lens to refract all the rays.
[Diagram: object, lens, film]
– Acts like a pinhole but avoids losing light
– Object points are projected to unique points on the film
– This reduces blurring

Pinhole camera model
[Figure from Forsyth: object, pinhole, image plane, virtual image]
$f$ = focal length
$C$ = camera centre

Dimensionality reduction machine
3D world to 2D image

Projection can be tricky…

Projective geometry
Length and area are not preserved.
[Figure from Forsyth, with labels O, A, B, C, A', B', C', and d]

Projective geometry
What is lost? Length and angles are not preserved.
[Figure annotations: "Parallel?", "Perpendicular?"]

Projective geometry
What is preserved? Straight lines are still straight.
[Figure annotations: "Parallel?", "Perpendicular?"]

Vanishing points and lines
Parallel lines in the 3D world intersect in the 2D image at a “vanishing point”.
[Figure: two vanishing points and the vanishing line]
[Photo from Criminisi]
[Figures: examples annotated with one vanishing point, two vanishing points, and two vanishing points plus one at infinity]

Projection mathematics
From world coordinates to image coordinates
World point: $\mathbf{p} = (x, y, z)^\top$
Camera centre: $(0, 0, 0)$
Image plane: $z' = -f$
Image point: $\mathbf{p}' = (x', y')^\top$

If $x = 2$, $y = 3$, $z = 5$, and $f = 2$, what are $x'$ and $y'$?
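
The projection set up above can be checked numerically. The following is a minimal Python sketch, assuming the camera centre is at the origin and the image plane lies at $z' = -f$; the function name `project_pinhole` is hypothetical and does not come from the slides.

```python
def project_pinhole(x, y, z, f):
    """Project the world point (x, y, z) onto the image plane z' = -f.

    Assumes the camera centre is at the origin (0, 0, 0).
    By similar triangles: x' = -f * x / z and y' = -f * y / z.
    """
    if z == 0:
        raise ValueError("z must be non-zero: the point lies in the plane of the camera centre")
    return -f * x / z, -f * y / z


# Values from the question above: x = 2, y = 3, z = 5, f = 2
x_img, y_img = project_pinhole(2, 3, 5, 2)
print(x_img, y_img)  # -0.8 -1.2
```

The negative signs reflect that the image on a plane behind the pinhole is inverted.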
With $x = 2$, $y = 3$, $z = 5$, and $f = 2$:
$x' = -x\,\frac{f}{z} = -2 \cdot \frac{2}{5} = -\frac{4}{5}$
$y' = -y\,\frac{f}{z} = -3 \cdot \frac{2}{5} = -\frac{6}{5}$

Perspective projection
Apparent size of an object depends on its distance: far objects appear smaller.
By similar triangles:
$(x', y', z') = \left(-f\,\frac{x}{z},\; -f\,\frac{y}{z},\; -f\right)$
Ignore the third coordinate and mirror:
$(x', y') = \left(f\,\frac{x}{z},\; f\,\frac{y}{z}\right)$

Affine projection
Suitable when scene depth is small relative to the average distance from the camera.
Let magnification $m = f/z_0$ be a positive constant and all points in the scene have approximately constant distance $z_0$ to the camera.
Leads to weak perspective projection: $(x', y') = (mx, my)$
Orthographic projection when $m = 1$: $(x', y') = (x, y)$

Beyond pinholes: radial distortions
Modern cameras use lenses instead of pinholes.
[Figure panels, axes $x_0$ and $y_0$: No distortion, Barrel distortion, Pincushion distortion]
[Image from Martin Habbecke: barrel distortion corrected]

Comparing with human vision
Cameras imitate the frequency response of the human eye, so it is good to know something about it.
Computer vision probably would not get as much attention if biological vision (especially human vision) had not proven that it is possible to make important judgements from 2D images.
[Figure: The Eye]

Electromagnetic spectrum
[Figure: Normalized responsivity spectra of human cone cells (S, M, L types)]
https://sites.google.com/site/chempendix/em-spectrum

Colour represented by RGB images
[Figure: Red, Green, and Blue channels]

Colour spaces: RGB
Default colour space in vision.
[Figure, source Wikipedia: channel panels R (G=0,B=0), G (R=0,B=0), B (R=0,G=0); labelled points 1,0,0, 0,1,0, and 0,0,1]
Drawback: strongly correlated channels.
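
One reason the RGB channels tend to be correlated is that a change in brightness affects all three channels at once. The sketch below illustrates this on synthetic data only; the surface colour, the brightness range, the noise level, and the helper name `pearson` are all assumptions made for the illustration and are not taken from the slides.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)      # fixed seed so the sketch is repeatable
base = (0.9, 0.6, 0.3)      # hypothetical surface colour (R, G, B)

# Each synthetic pixel is the base colour scaled by a random brightness
# factor, plus a little independent noise per channel (values not clamped).
pixels = []
for _ in range(1000):
    shade = rng.uniform(0.2, 1.0)
    pixels.append(tuple(c * shade + rng.gauss(0, 0.01) for c in base))

r, g, b = zip(*pixels)
print(pearson(r, g), pearson(r, b), pearson(g, b))  # all close to 1
```

Because the shared brightness factor dominates, each pair of channels carries largely the same information, which is the drawback named on the slide.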

Colour spaces: HSV
Intuitive colour space.
[Figure panels: H (S=1,V=1), S (H=1,V=1), V (H=1,S=0)]
Drawback: confounded channels.

Colour spaces: YCbCr
Fast to compute, good for compression, used by TV.
[Figure panels: Y (Cb=0.5,Cr=0.5), Cb (Y=0.5,Cr=0.5), Cr (Y=0.5,Cb=0.5); Cb-Cr planes at Y=0, Y=0.5, and Y=1]

Colour spaces: L*a*b*
“Perceptually uniform” colour space, a.k.a. CIELAB.
[Figure panels: L (a=0, b=0), a (L=65, b=0), b (L=65, a=0)]
Any numerical change corresponds to a similar perceived change in colour: Euclidean distances make sense.

Digital image formation
[Image from Gonzalez & Woods 2018]

Displaying a digital image
[Figure]

Comparing the original and digital image
[Figure]

Digitisation by spatial sampling
Digitisation converts an analog image to a digital image by sampling the image space.
Sampling discretises the coordinates $x$ and $y$:
– Spatial discretisation of a picture function $f(x, y)$
– Typically a rectangular grid of sampling points is used: $x = j\Delta x$, $y = k\Delta y$ for $j = 0 \ldots M-1$, $k = 0 \ldots N-1$
– The $\Delta x$ and $\Delta y$ are called the sampling intervals

Digital colour images
Each channel is a digital image with the same number of rows and columns.
[Figure: the R, G, and B channels shown as grids of numeric pixel values (for example 0.92, 0.93, 0.94, ...), indexed by row and column]

Spatial resolution
Spatial resolution: number of pixels per unit of length.
Example: resolution decreases by one half each time.
[Figure panels: 1/1, 1/2, 1/4, 1/8, 1/16]
Human faces can be recognized at 64 x 64 pixels per face.
Appropriate resolution is essential:
– Too little resolution yields poor recognition
– Too much resolution is slow and wastes memory
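
The repeated halving of resolution described above can be sketched in a few lines of Python. The slides do not say how the lower-resolution images were produced; averaging non-overlapping 2x2 blocks is assumed here as one simple option, and the function name `halve_resolution` and the test pattern are hypothetical.

```python
def halve_resolution(image):
    """Halve the number of rows and columns by averaging 2x2 blocks.

    `image` is a list of equal-length rows of numbers. A trailing odd
    row or column is dropped.
    """
    rows = len(image) // 2
    cols = len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# 8x8 test pattern in which each pixel value equals its column index
image = [[float(c) for c in range(8)] for _ in range(8)]

# Build the sequence 1/1, 1/2, 1/4, 1/8 by halving until one pixel remains
levels = [image]
while len(levels[-1]) > 1:
    levels.append(halve_resolution(levels[-1]))

sizes = [(len(im), len(im[0])) for im in levels]
print(sizes)  # [(8, 8), (4, 4), (2, 2), (1, 1)]
```

Each halving keeps a quarter of the pixels, which is why memory and processing cost fall quickly while fine detail is lost.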

Quantisation
Quantisation digitises the image intensity or amplitude values $f(x, y)$:
– Called intensity or gray-level quantisation
– Gray-level resolution to be chosen per application
For example, 16, 32, 64, ..., 128, 256 levels.
Should be high enough for human perception of shading details. The latter requires about 100 levels for a realistic image.
Should not be higher than necessary to avoid wasting storage.

Quantisation and bits per pixel
Pixel (picture element)
Levels per pixel:
8 bits = $2^{8}$ = 256
12 bits = $2^{12}$ = 4,096
16 bits = $2^{16}$ = 65,536
24 bits = $2^{24}$ = 16,777,216

Further reading on discussed topics
– Chapter 2 of Szeliski
– Chapter 2 of Shapiro and Stockman

Acknowledgements
Several slides from Derek Hoiem, Alexei Efros, Steve Seitz, and David Forsyth.
Some material drawn from referenced and associated online sources.
Image sources credited where possible.

Example exam question
Which one of the following statements about colour spaces is incorrect?
A. The R, G, and B channels of the RGB colour space are often correlated.
B. The H and the S channel of the HSV colour space are confounded.
C. The Y channel of the YCbCr colour space represents the brightness.
D. The a* channel of the L*a*b* colour space is the green-blue component.

Please note that in the final exam, no materials may be taken into the room, and it is not allowed (and in fact not possible) to consult the internet. So make sure you know the course materials well.
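
Returning to the quantisation slides, the relation between bits per pixel and the number of levels can be sketched as follows. This assumes a uniform quantiser and intensities normalised to the range [0, 1], neither of which is stated in the slides; the function name `quantise` is hypothetical.

```python
def quantise(value, bits):
    """Map an intensity in [0, 1] to one of 2**bits integer levels."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    levels = 2 ** bits
    # Scale to [0, levels] and clamp so that 1.0 maps to the top level.
    return min(int(value * levels), levels - 1)


# Number of levels for the bit depths listed on the slide
for bits in (8, 12, 16, 24):
    print(bits, "bits =", 2 ** bits, "levels")

print(quantise(0.5, 8))  # 128
print(quantise(1.0, 8))  # 255
```

With 8 bits the 256 available levels already exceed the roughly 100 levels that the slides give as sufficient for realistic shading.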