AIS412 Lecture 7: Image Registration PDF

Document Details

Uploaded by LushWhale3910

Mustafa Elattar

Tags

image registration, computer vision, image processing, visual recognition

Summary

This document transcribes two lectures: one on image registration, covering the problem definition, mapping functions, spatial transformations, similarity metrics (including mutual information), and applications such as multimodal integration, image mosaics, and 3D model building; and one introducing learning-based vision via bag-of-words image classification and k-means clustering.

Full Transcript


AIS412 Lecture 7: Image Registration
MUSTAFA ELATTAR
*This course material is sourced from Carnegie Mellon University for Computer Vision and Stanford University for the CNN for Visual Recognition course.

Registration Problem Definition
p = (825, 856): pixel location in the first image
q = (912, 632): homologous pixel location in the second image
q = T(p; a): pixel location mapping function with parameters a

Example Mapping Function
Pixel scaling and translation map p = (825, 856) to q = (912, 632).

Registration Problem Definition
Problems:
- Form of the mapping function T
- Unknown mapping parameters a
- Unknown correspondences p, q

Applications: Multimodal Integration
Two or more different sensors view the same region or volume from different viewpoints. (Some specialized sensors have two or more coincident modalities, so registration is not needed.) Different information is prominent in each image, and the images may even have different dimensions: range images vs. intensity images, CT volumes vs. fluoro images.

Example: MR-CT Brain Registration
MR (magnetic resonance) measures water content; CT measures x-ray absorption. Bone is brightest in CT and darkest in MR. Both images are 3D volumes.

MR-CT Registration Results
Superimposed images, with bone structures from the aligned CT shown in green.

Retinal Angiogram and Color Image

Applications: Image Mosaics
Many partially overlapping images, none of which gives a complete view. Goal: "stitch" the images together. Requires a limited camera viewpoint (such as rotation about the optical center) and simple surface geometry (such as a plane or quadric).

Retinal Image Mosaics

Sea-Floor Mosaics
Courtesy Woods Hole Oceanographic Institution

Spherical Mosaics
Images from Sarnoff Corporation

Applications: Building 3D Models
Range scanners store an (x, y, z) measurement at each pixel location. Each "range image" gives a partial view, so the range images must be registered and texture mapped. Applications: reverse engineering, digital architecture and archaeology.

Applications: Change Detection
Images are taken at different times. Following registration, the differences between the images may be indicative of change. Deciding whether the change is really there may be quite difficult.

Other Applications
- Multi-subject registration to develop organ variation atlases, used as the basis for detecting abnormal variations
- Object recognition: alignment of an object model instance with an image of an unknown object
- Industrial inspection: compare a CAD model to an instance of a part to determine errors in the manufacturing process

Steps Toward a Solution
- Analyze the images
- Determine the appropriate image primitives: geometric and intensity
- Determine the transformation model
- Design an initialization technique
- Develop constraints and an error metric on the transformation estimate
- Design a minimization algorithm
- Develop a convergence criterion

Summary: Pervasive Questions
Three questions to consider in approaching any registration problem:
- What intensity information or image structures are consistent between the images to be registered?
- What is the geometric relationship between the image coordinate systems?
- What prior information can be used to constrain the domain of possible transformations?

Spatial Transformations
Rigid, affine, projective, perspective, and global polynomial (spline).

Rigid Transformation
Rotation $R$, translation $t = (t_1, t_2)^T$, and scaling $s = (s_1, s_2)^T$ map $p_1 = (x_1, y_1)^T$ to $p_2 = (x_2, y_2)^T$:

$p_2 = t + s\,R\,p_1, \qquad R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$
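As a concrete illustration of the rigid model above, here is a minimal numpy sketch (not from the slides): the angle and translation values are arbitrary, and the per-axis scaling (s_1, s_2) is reduced to a single isotropic factor for simplicity.

```python
import numpy as np

def rigid_transform(points, theta, t, s=1.0):
    """Apply p2 = t + s R p1 to an (N, 2) array of points.

    theta: rotation angle in radians; t: translation vector of shape (2,);
    s: isotropic scale factor.
    """
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return t + s * points @ R.T   # each row p1 becomes t + s * R @ p1

p1 = np.array([[825.0, 856.0]])   # pixel location in the first image
p2 = rigid_transform(p1, theta=0.1, t=np.array([50.0, -200.0]))
print(p2)                         # homologous location under this (arbitrary) transform
```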
Affine Transformation
Rotation, translation, scale, and shear:

$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}$

Lengths and angles are no longer preserved, but parallel lines are preserved.

Projective Transformation
$(x_p, y_p)$: plane coordinates; $(x_i, y_i)$: image coordinates.

$x_i = \dfrac{a_{11} x_p + a_{12} y_p + a_{13}}{a_{31} x_p + a_{32} y_p + a_{33}}, \qquad y_i = \dfrac{a_{21} x_p + a_{22} y_p + a_{23}}{a_{31} x_p + a_{32} y_p + a_{33}}$

The coefficients $a_{mn}$ come from the equations of the scene and image planes.

Perspective Transformation (Planar Homography)

Perspective Transformation (2)
$(x_o, y_o, z_o)$: world coordinates; $(x_i, y_i)$: image coordinates; $f$: focal length.

$x_i = \dfrac{-f x_o}{z_o - f}, \qquad y_i = \dfrac{-f y_o}{z_o - f}$

A flat plane tilted with respect to the camera requires a projective transformation.

Complex Transformations
Global polynomial transformation (splines).

Registration Process Components
- Feature space
- Search space (the transformation)
- Similarity metric
- Search strategy

Feature Space
Geometric landmarks: points, edges, contours, surfaces, etc. (feature-based). Intensities: raw pixel values (intensity-based).

Image Transformations
In homogeneous coordinates, an input image point is mapped by

$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{pmatrix} \begin{pmatrix} x \\ y \\ w \end{pmatrix}$

$M_{rigid} = \begin{pmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_{affine} = \begin{pmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ 0 & 0 & 1 \end{pmatrix}$

Original shape, rigid transformation, affine transformation.

Similarity Metric
- Absolute difference
- SSD (Sum of Squared Differences)
- Correlation coefficient
- Mutual information / normalized mutual information

Search Strategy
- Powell's direction set method
- Downhill simplex method
- Dynamic programming
- Relaxation matching
- Hierarchical techniques

Multi-modality Brain Image Registration
Intensity-based, 3D/3D, rigid transformation with DOF = 6 (3 translations, 3 rotations), maximization of normalized mutual information, downhill simplex optimization, multi-resolution.

Mutual Information as Similarity Measure
Mutual information measures the statistical dependence between the image intensities of corresponding voxels in both images, which is assumed to be maximal if the images are geometrically aligned.

$I(A,B) = \sum_a \sum_b P_{AB}(a,b) \log \dfrac{P_{AB}(a,b)}{P_A(a)\,P_B(b)} = H(A) + H(B) - H(A,B) = H(A) - H(A \mid B) = H(B) - H(B \mid A)$

Normalized Mutual Information
Extensions of mutual information that compensate for the sensitivity of MI to changes in image overlap.

Maes et al.: $NMI(A,B) = H(A,B) - MI(A,B)$ and $NMI(A,B) = \dfrac{2\,MI(A,B)}{H(A) + H(B)}$

Studholme et al.: $NMI(A,B) = \dfrac{H(A) + H(B)}{H(A,B)}$
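To make the metrics concrete, here is a minimal numpy sketch (not from the slides) of SSD, the correlation coefficient, and histogram-based MI/NMI for two intensity arrays of equal shape; the bin count of 32 is an arbitrary choice.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: lower means more similar."""
    return np.sum((a.astype(float) - b.astype(float)) ** 2)

def correlation_coefficient(a, b):
    """Pearson correlation coefficient of the flattened intensities."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def mutual_information(a, b, bins=32):
    """I(A, B) estimated from the joint intensity histogram, as in the formula above."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal P_A(a)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal P_B(b)
    nz = p_ab > 0                           # skip empty bins to avoid log(0)
    return np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a * p_b)[nz]))

def normalized_mutual_information(a, b, bins=32):
    """Studholme et al.: NMI(A, B) = (H(A) + H(B)) / H(A, B)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    entropy = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return (entropy(p_ab.sum(axis=1)) + entropy(p_ab.sum(axis=0))) / entropy(p_ab)
```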
Geometry Transformation
Image coordinate transform: the features (dimension, voxel size, slice spacing, gantry tilt, orientation) of images acquired from different modalities are not the same. Convert from voxel units (column, row, slice) to millimeter units with the origin in the center of the image volume:

$T(x,y,z) = \begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{00} & a_{01} & a_{02} & 0 \\ a_{10} & a_{11} & a_{12} & 0 \\ a_{20} & a_{21} & a_{22} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}$

Target Image & Template Image
The target image grid (i, j; x, y) and the template image grid (i, j; x', y') are related by a space transform.

Physical Coordinates
Images from the same patient: MRI-T2 (target image, 256 x 256 pixels) and PET (template image, 128 x 128 pixels).

Interpolation
- Nearest neighbor
- Tri-linear interpolation
- Partial-volume interpolation
- Higher-order partial-volume interpolation

Evaluating the Similarity Measure for Each Transformation
The template image is transformed into the target image's space, and the similarity measure is evaluated for each candidate transformation.

Optimization
- Powell's direction set method
- Downhill simplex method
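The pipeline just described (transform, resample with interpolation, evaluate similarity, optimize) can be sketched end to end. This is an illustrative 2D toy under stated assumptions, not the lecture's 3D 6-DOF implementation: it uses SSD rather than NMI for brevity, scipy's affine_transform for resampling, and Powell's direction set method as the search strategy; the array sizes and synthetic shift are arbitrary.

```python
import numpy as np
from scipy.ndimage import affine_transform
from scipy.optimize import minimize

def ssd_cost(params, target, template):
    """Warp the template by a 2D rigid transform, then compare it to the target with SSD."""
    theta, dy, dx = params
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Resample the template on the target grid; order=1 selects (bi)linear
    # interpolation, while order=0 would give nearest-neighbor interpolation.
    warped = affine_transform(template, R, offset=[dy, dx], order=1)
    return np.sum((target - warped) ** 2)

# Synthetic example: the template is a shifted copy of the target.
rng = np.random.default_rng(0)
target = rng.random((64, 64))
template = affine_transform(target, np.eye(2), offset=[2.0, -3.0], order=1)

# Powell's direction set method, as listed above; method="Nelder-Mead"
# would use the downhill simplex instead.
result = minimize(ssd_cost, x0=[0.0, 0.0, 0.0], args=(target, template), method="Powell")
print(result.x)   # estimated (theta, dy, dx)
```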
AIS412 Lecture 8: BoW
MUSTAFA ELATTAR
*This course material is sourced from Carnegie Mellon University for Computer Vision and Stanford University for the CNN for Visual Recognition course.

Overview of today's lecture
- Introduction to learning-based vision
- Image classification
- Bag-of-words
- K-means clustering
- Classification

Course overview
1. Image processing
2. Geometry-based vision
3. Physics-based vision
4. Dealing with motion
5. Learning-based vision (we are starting this part now)

What do we mean by learning-based vision or "semantic vision"?
Labels such as: sky, mountain, trees, building, vendors, people, ground, outdoor, marketplace, city.

Object recognition: is it really so hard?
Find the chair in this image. The output of normalized correlation ("this is a chair") is pretty much garbage: simple template matching is not going to make it. A "popular method is that of template matching, by point to point correlation of a model pattern with the image pattern. These techniques are inadequate for three-dimensional scene analysis for many reasons, such as occlusion, changes in viewing angle, and articulation of parts." (Nevatia & Binford, 1977)

And it can get a lot harder
Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. Journal of Vision, 3(6), 413-422.

Why is this hard?
Variability in camera position, illumination, and shape parameters. Challenges: variable viewpoint (Michelangelo, 1475-1564), variable illumination (image credit: J. Koenderink), scale, deformation, occlusion (Magritte, 1957), background clutter (Kilmeny Niland, 1995), and intra-class variations (Svetlana Lazebnik).

Image Classification: Problem
Data-driven approach:
- Collect a database of images with labels
- Use ML to train an image classifier
- Evaluate the classifier on test images

Bag of words
What object do these parts belong to? Some local features are very informative. Treating an object as a collection of local features (bag-of-features):
- deals well with occlusion
- scale invariant
- rotation invariant

Bag-of-features represents a data item (document, texture, image) as a histogram over features. It is an old idea (e.g., texture recognition and information retrieval). Texture recognition: a histogram over a universal texton dictionary.

Vector Space Model
G. Salton, "Mathematics and Information Retrieval," Journal of Documentation, 1979.

Example word counts for two newspaper snippets (generated with http://www.fodey.com/generators/newspaper/snippet.asp):

             Tartan  robot  CHIMP  CMU  bio  soft  ankle  sensor
Document 1        1      6      2    1    0     0      0       1
Document 2        0      4      0    1    4     5      3       2

A document (datapoint) is a vector of counts over each word (feature): it counts the number of occurrences, just a histogram over words. What is the similarity between two documents? Use any distance you want, but the cosine distance is fast.

But not all words are created equal: TF-IDF (Term Frequency - Inverse Document Frequency) weighs each word by the inverse document frequency, a heuristic that down-weights common terms.
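Here is a minimal numpy sketch (not from the slides) of the count-vector representation above, with cosine similarity and one common TF-IDF variant; with only two documents the IDF term is degenerate, but the mechanics are the same at scale.

```python
import numpy as np

# Word-count vectors for the two snippets in the table above, over the
# vocabulary (Tartan, robot, CHIMP, CMU, bio, soft, ankle, sensor):
d1 = np.array([1, 6, 2, 1, 0, 0, 0, 1], dtype=float)
d2 = np.array([0, 4, 0, 1, 4, 5, 3, 2], dtype=float)

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (1 = identical direction)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(d1, d2))

# One common TF-IDF form: term frequency times log(N / document frequency).
docs = np.stack([d1, d2])
df = np.count_nonzero(docs > 0, axis=0)       # in how many documents each word appears
idf = np.log(len(docs) / df)                  # down-weights common terms
tfidf = docs * idf
print(cosine_similarity(tfidf[0], tfidf[1]))  # words shared by both docs get idf 0 here
```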
Standard BOW pipeline (for image classification)
1. Dictionary learning: learn visual words using clustering
2. Encode: build bag-of-words (BOW) vectors for each image
3. Classify: train and test data using BOWs

Dictionary Learning: Learn Visual Words using clustering
1. Extract features (e.g., SIFT) from images.
2. Learn the visual dictionary (e.g., K-means clustering).

What kinds of features can we extract?
- Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005)
- Interest point detector (Csurka et al., 2004; Fei-Fei & Perona, 2005; Sivic et al., 2005)
- Other methods: random sampling (Vidal-Naquet & Ullman, 2002), segmentation-based patches (Barnard et al., 2003)

Compute the SIFT descriptor on normalized patches [Lowe '99]; detect patches [Mikolajczyk and Schmid '02], [Matas, Chum, Urban & Pajdla '02], [Sivic & Zisserman '03].

How do we learn the dictionary? By clustering the features into a visual vocabulary.

K-means clustering (see the combined sketch after the encoding step below)
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Compute each centroid as the mean of the objects assigned to it (go to 2).
Repeat the previous two steps until no change.

From what data should I learn the dictionary? The dictionary can be learned on a separate training set. Provided the training set is sufficiently representative, the dictionary will be "universal".

Example visual dictionaries: appearance codebooks (source: B. Leibe).

Encode: build Bags-of-Words (BOW) vectors for each image
1. Quantization: each image feature gets associated to a visual word (nearest cluster center).
2. Histogram: count the number of visual word occurrences over the codewords.

Classify: train and test data using BOWs.
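Here is the sketch referenced above, combining dictionary learning (plain k-means, as in the algorithm listed earlier) with BOW encoding. It is a minimal illustration, not the lecture's implementation: the random 128-D vectors stand in for real SIFT descriptors, and k = 50 is an arbitrary vocabulary size.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain k-means: random initial centroids, assign, re-estimate means."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature to the cluster with the nearest centroid
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of the features assigned to it
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids

def bow_encode(features, centroids):
    """Quantize features to the nearest visual word and build the histogram."""
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()                  # normalized BOW vector

# Toy stand-in for SIFT descriptors (128-D) pooled from training images:
rng = np.random.default_rng(1)
train_descriptors = rng.random((1000, 128))
vocab = kmeans(train_descriptors, k=50)
image_descriptors = rng.random((200, 128))    # descriptors from one image
print(bow_encode(image_descriptors, vocab))   # the image's BOW vector
```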
Image features vs. ConvNets
Classic pipeline: feature extraction, then training a classifier that outputs 10 numbers giving scores for classes. With ConvNets, feature extraction and classification are trained together, again producing 10 numbers giving scores for classes.