COMP9517 Computer Vision 2024 Term 3 Week 9 Motion Estimation PDF
Document Details
UNSW Sydney
2024
Erik Meijering
Summary
This document contains lecture slides on computer vision, specifically focusing on motion estimation. It covers topics such as change detection, sparse motion estimation using template matching, and dense motion estimation using optical flow. The slides are part of COMP9517 Computer Vision course for 2024 Term 3, Week 9, taught by Professor Erik Meijering at UNSW Sydney.
Full Transcript
COMP9517 Computer Vision 2024 Term 3 Week 9
Professor Erik Meijering
Motion Estimation

Introduction
Adding the time dimension to image formation
Different nature of higher image dimensions
Different nature of images having the same dimensionality
For this lecture: 3D = 2D+t
Copyright (C) UNSW COMP9517 24T3W9 Motion Estimation 2

Introduction
A changing scene may be observed and analysed via a sequence of images

Introduction
Changes in an image sequence provide features for
– Detecting objects that are moving
– Computing trajectories of moving objects
– Performing motion analysis of moving objects
– Recognising objects based on their behaviours
– Computing the motion of the viewer in the world
– Detecting and recognising activities in a scene

Applications
Motion-based recognition: Human identification based on gait, object detection
Automated surveillance: Monitoring a scene to detect suspicious activities
Video indexing: Automatic annotation and retrieval of videos in databases
Human-computer interaction: Gesture recognition and eye gaze tracking
Traffic monitoring: Real-time gathering of traffic statistics to direct traffic flow
Vehicle navigation: Video-based path planning and obstacle avoidance

Scenarios
Still camera: constant background with
– Single moving object
– Multiple moving objects
Moving camera: relatively constant scene with
– Coherent scene motion
– Single moving object
– Multiple moving objects

Topics
Change detection: using image subtraction to detect changes in scenes
Sparse motion estimation: using template matching to estimate local displacements
Dense motion estimation: using optical flow to compute a dense motion vector field

Change detection
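The image-subtraction idea listed under Topics can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the slides; the toy frames and the threshold value tau are assumptions chosen for the example:

```python
import numpy as np

def change_mask(frame_t, frame_prev, tau=25):
    # Binary mask: 1 where the absolute frame difference exceeds threshold tau
    diff = np.abs(frame_t.astype(np.int16) - frame_prev.astype(np.int16))
    return (diff > tau).astype(np.uint8)

# Toy example: a bright 2x2 "object" moves one pixel to the right
prev = np.zeros((5, 5), dtype=np.uint8)
prev[1:3, 1:3] = 200
curr = np.zeros((5, 5), dtype=np.uint8)
curr[1:3, 2:4] = 200

mask = change_mask(curr, prev)
print(mask.sum())  # 4: only the object's trailing and leading edges change
```

Note that the overlapping interior of the object produces no difference, which is why only the front and rear edges show up in the mask.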
Change detection
Detecting an object moving across a constant background
Front and rear edges of the object advance only a few pixels per frame
Subtracting image I_t from the previous image I_{t-1} reveals the changes
Can be used to detect and localise objects that are moving

Image subtraction
Step 1: Acquire a static background image ("empty" scene)
Performance Evaluation of Tracking and Surveillance (PETS) 2009 Benchmark

Image subtraction
Step 2: Subtract the background image from each subsequent frame

Image subtraction
Step 3: Threshold and process the difference image

Image subtraction
Detected bounding boxes overlaid on input frame

Image subtraction algorithm
Input: Images I_t and I_{t-Δt} (or a model image) and an intensity threshold τ
Output: Binary image I_out and the set of bounding boxes B
1. For all pixels (x, y) in the input images:
   Set I_out(x, y) = 1 if |I_t(x, y) - I_{t-Δt}(x, y)| > τ
   Set I_out(x, y) = 0 otherwise
2. Perform connected components extraction on I_out
3. Remove small regions in I_out, assuming they are noise
4. Perform a closing of I_out using a small disk to fuse neighbouring regions
5. Compute the bounding boxes of all remaining regions of changed pixels
6.
Return I_out(x, y) and the bounding boxes B of regions of changed pixels

Sparse motion estimation

Motion vector
A motion field is an array of 2D motion vectors
A motion vector represents the displacement in the image of a 3D point
– Tail at time t and head at time t + Δt
– Instantaneous velocity estimate at time t
[Figure: example motion fields for zoom out, zoom in, and pan left]

Sparse motion estimation
A sparse motion field can be computed by identifying pairs of points that correspond in two images taken at times t and t + Δt
Assumption: intensities of interesting points and their neighbours remain nearly constant over time
Two steps:
– Detect interesting points at t
– Search corresponding points at t + Δt

Detect interesting points
Image filters
– Canny edge detector
– Hessian ridge detector
– Harris corner detector
– Scale invariant feature transform (SIFT)
– Convolutional neural network (CNN)
Interest operator
– Computes intensity variance in the vertical, horizontal and diagonal directions
– Interest point if the minimum of these four variances exceeds a threshold

Detect interesting points
procedure detect_interesting_points(I, V, w, t) {
  for (r = 0 to MaxRow - 1)
    for (c = 0 to MaxCol - 1)
      if (I[r,c] is a border pixel) continue;  // skip border pixels only
      else if (interest_operator(I, r, c, w) >= t)
        add (r, c) to set V;
}
procedure interest_operator(I, r, c, w) {
  v1 = variance of intensity of horizontal pixels I[r,c-w]...I[r,c+w];
  v2 = variance of intensity of vertical pixels I[r-w,c]...I[r+w,c];
  v3 = variance of intensity of diagonal pixels I[r-w,c-w]...I[r+w,c+w];
  v4 = variance of intensity of diagonal pixels I[r-w,c+w]...I[r+w,c-w];
  return min(v1, v2, v3, v4);
}

Search corresponding points
Given an
interesting point P_i from I_t, take its neighbourhood in I_t as a template T and find the best matching neighbourhood N in I_{t+Δt}, under the assumption that the amount of movement is limited to a search region R
This is also known as template matching
[Figure: template T around P_i in I_t; search region R around P_i in I_{t+Δt}; the best match N at Q_i defines the motion vector]

Similarity measures for template matching
Cross-correlation (to be maximised)
CC(Δx, Δy) = Σ_{(x,y)∈T} I_t(x, y) I_{t+Δt}(x + Δx, y + Δy)
Sum of absolute differences (to be minimised)
SAD(Δx, Δy) = Σ_{(x,y)∈T} |I_t(x, y) - I_{t+Δt}(x + Δx, y + Δy)|
Sum of squared differences (to be minimised)
SSD(Δx, Δy) = Σ_{(x,y)∈T} (I_t(x, y) - I_{t+Δt}(x + Δx, y + Δy))²

Similarity measures for template matching
Mutual information (to be maximised)
MI(A, B) = Σ_a Σ_b P_AB(a, b) log2( P_AB(a, b) / (P_A(a) P_B(b)) )
Subimages to compare: A ∈ I_t and B ∈ I_{t+Δt}
Intensity probabilities: P_A(a) and P_B(b)
Joint intensity probability: P_AB(a, b)

Dense motion estimation

Dense motion estimation assumptions
Properties of the light sources do not vary over time interval Δt
Distance of the object to the camera does not vary over this time interval
Visual appearance of the object does not change over this time interval
Any small neighbourhood N_t(x, y) shifts over some vector v = (Δx, Δy)
⇒ N_t(x, y) = N_{t+Δt}(x + Δx, y + Δy)
These assumptions may not hold exactly in reality, but they nevertheless yield useful dense motion estimation methods and approximations

Spatiotemporal gradient
Taylor series expansion of a function
f(x + Δx) = f(x) + (∂f/∂x) Δx + h.o.t.
(higher order terms)
⇒ f(x + Δx) ≈ f(x) + (∂f/∂x) Δx
Multivariable Taylor series approximation
f(x + Δx, y + Δy, t + Δt) ≈ f(x, y, t) + (∂f/∂x) Δx + (∂f/∂y) Δy + (∂f/∂t) Δt   (1)

Optical flow equation
Using the dense motion estimation assumptions leads to
f(x + Δx, y + Δy, t + Δt) = f(x, y, t)   (2)

Optical flow computation
Combining equations (1) and (2) yields the following constraint
(∂f/∂x) Δx + (∂f/∂y) Δy + (∂f/∂t) Δt = 0
⇒ (∂f/∂x) (Δx/Δt) + (∂f/∂y) (Δy/Δt) + (∂f/∂t) (Δt/Δt) = 0
⇒ (∂f/∂x) v_x + (∂f/∂y) v_y = -(∂f/∂t)
⇒ ∇f · v = -f_t
Velocity or optical flow: v = (v_x, v_y) = (Δx/Δt, Δy/Δt)
Spatial image gradient: ∇f = (f_x, f_y) = (∂f/∂x, ∂f/∂y)
Temporal image derivative: f_t = ∂f/∂t

Optical flow computation
The optical flow constraint equation can be applied at every pixel position
However, it is only one equation, while we have two unknowns (v_x and v_y)
Thus, it does not have a unique solution, and further constraints are required
For example, assume a group of adjacent pixels have the same velocity

Optical flow computation
Example: Lucas-Kanade approach to optical flow
Assume the optical flow equation holds for all n pixels p_i in a neighbourhood:
f_x(p_1) v_x + f_y(p_1) v_y = -f_t(p_1)
f_x(p_2) v_x + f_y(p_2) v_y = -f_t(p_2)
...
f_x(p_n) v_x + f_y(p_n) v_y = -f_t(p_n)
In matrix form A v = b, with rows A_i = [f_x(p_i), f_y(p_i)], v = (v_x, v_y), and b_i = -f_t(p_i)
Least-squares solution: v = (AᵀA)⁻¹ Aᵀ b

Optical flow example
https://www.youtube.com/watch?v=GIUDAZLfYhY
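The Lucas-Kanade least-squares solution can be checked numerically with synthetic derivatives. The neighbourhood size and the ground-truth flow below are assumptions chosen for illustration, not values from the slides:

```python
import numpy as np

# Synthetic spatial derivatives [f_x(p_i), f_y(p_i)] for n = 9 neighbourhood pixels
rng = np.random.default_rng(0)
A = rng.normal(size=(9, 2))

# Pick a known flow v and generate temporal derivatives f_t so that the
# optical flow constraint f_x*v_x + f_y*v_y + f_t = 0 holds at every pixel
v_true = np.array([1.0, -0.5])
f_t = -(A @ v_true)
b = -f_t  # right-hand side of the system A v = b

# Least-squares solution v = (A^T A)^{-1} A^T b
v = np.linalg.solve(A.T @ A, A.T @ b)
print(v)  # recovers the true flow [1.0, -0.5]
```

Solving the 2x2 normal equations with np.linalg.solve is numerically preferable to forming the inverse of AᵀA explicitly, but it computes the same least-squares estimate.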
Further reading on discussed topics
Chapters 8 and 9 of Szeliski 2022
Chapter 9 of Shapiro and Stockman 2001

Acknowledgement
Some images drawn from the above references

Example exam question
Which one of the following statements about motion analysis is incorrect?
A. Detection of moving objects by subtraction of successive images in a video works best if the background is constant.
B. Sparse motion estimation in a video can be done by template matching and minimising the mutual information measure.
C. Dense motion estimation using optical flow assumes that each small neighbourhood remains constant over time.
D. Optical flow provides an equation for each pixel but requires further constraints to solve the equation uniquely.
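As a concrete companion to the sparse motion estimation by template matching covered in this lecture, the following is a minimal SAD-based search over a search region. The image sizes, template location, and shift range are illustrative assumptions:

```python
import numpy as np

def best_match_sad(I_t, I_t1, top, left, size, max_shift):
    # Displacement (dy, dx) minimising the sum of absolute differences
    # between template T in I_t and shifted windows N in I_{t+dt}
    T = I_t[top:top + size, left:left + size].astype(np.int32)
    best_sad, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > I_t1.shape[0] or c + size > I_t1.shape[1]:
                continue  # candidate window falls outside the search image
            N = I_t1[r:r + size, c:c + size].astype(np.int32)
            sad = np.abs(T - N).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_shift = sad, (dy, dx)
    return best_shift

# Toy example: the whole scene shifts by 1 row down and 2 columns right
rng = np.random.default_rng(1)
I_t = rng.integers(0, 255, size=(20, 20)).astype(np.uint8)
I_t1 = np.roll(I_t, shift=(1, 2), axis=(0, 1))
print(best_match_sad(I_t, I_t1, 5, 5, 6, 3))  # (1, 2)
```

Repeating this search at every detected interest point yields the sparse motion field described in the slides; swapping the SAD line for a squared difference or a cross-correlation gives the other similarity measures.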