Computer Vision Introduction


Questions and Answers

Match the historical periods with their characteristic advancements in computer vision:

1960s = Interpretation of synthetic worlds
1980s = Shift toward geometry and increased mathematical rigor
1990s = Face recognition; statistical analysis
2010s = Resurgence of deep learning

Match each computer vision field with its corresponding mathematical foundation:

Radiometry = Physics of light measurement
Optics = Behavior and properties of light
Sensor Design = Engineering of imaging devices
Computer Graphics = Modeling of objects and animation

Match the stage of vision with its description

Scene = The external environment
Image Acquisition = The capture of visual information
Perception = The extraction of meaning from visual data
Image Interpretation = A computer analysing imagery to achieve results comparable to human perception

Match the descriptions to the components in human anatomy:

Retina = Comparable to film inside a camera; consists of nerve tissue that senses light
Macula Lutea = Area providing clearest vision
Fovea Centralis = Area where all the photoreceptors are cones
Iris = Colored annulus with radial muscles

Match the photoreceptor types and animals with their visual characteristics

Rods = Sensitive to low or dim light
Cones = Sensitive to color
Humans = Rely heavily on color vision
Cats = Rely on vision in the dark

Which of the following phenomena are a 'life choice' for a photon?

Absorption = Photon's energy is transferred to atoms in a material
Transparency = Photons pass through a material with little scattering
Refraction = Light bends as it passes from one medium to another
Phosphorescence = Energy is stored for a longer time before re-emission

Match the color attribute to the corresponding term.

Hue = The mean wavelength of the perceived light
Saturation = How pure or mixed the color is
Brightness = Amount of light
Color on a monitor = Based on RGB

Match the computer vision applications to their categories

Finding people in images = Finding objects in images
Reading license plates = Optical character recognition (OCR)
Face unlock on Apple iPhone X = Vision-based biometrics
Robot playing soccer = Vision for robotics

Match the types of pixel values.

Grid = Matrix of intensity values
Pixel values = Range between 0 and 255
Black = The pixel value 0
White = The pixel value 255
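The pairs above can be seen directly in code: a grayscale image is nothing more than a matrix of intensities between 0 (black) and 255 (white). A minimal NumPy sketch (the array values are made up for illustration):

```python
import numpy as np

# A grayscale image is a grid (matrix) of intensity values:
# 0 is black, 255 is white, values in between are shades of gray.
image = np.array([
    [  0,  64, 128],
    [ 64, 128, 192],
    [128, 192, 255],
], dtype=np.uint8)

print(image.min())   # darkest pixel: 0 (black)
print(image.max())   # brightest pixel: 255 (white)
print(image.shape)   # (rows, columns) = (3, 3)
```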

Match the terms with their properties.

3D = The world is 3D and dynamic
Camera = Cameras and computers are cheap
Computer Vision System = Takes inspiration from biological vision systems
Image = An image is worth 1000 words

Match the descriptions to terminology:

Pinhole Camera = A simple camera model; captures a pencil of rays, all passing through a single point
Center of Projection = Another term for 'focal point'
Image Plane = Where the image is formed
Camera Obscura = An early camera whose principle was first described in ancient China

Match the distortion with the definition.

Perspective distortion = Angles are distorted
Non-preservation of length = Size is inversely proportional to distance
Vanishing point = Parallel lines in the world intersect in the image
Vanishing line = All directions in the same plane have vanishing points on the same line

Match the descriptions to homogeneous coordinate concepts:

Homogeneous coordinates = Represent points in a projective space
Use Cases = Make projection calculations easier
Advantage = Transformations can be chained as matrix products
Scaling = Scale invariant

Identify the main focus of the related disciplines.

Pattern recognition = Finding structure in the data
Computer graphics = Modeling objects and animations
Machine learning = Learning from data
Projective geometry = Projecting and viewing geometry

Match the type of Camera Parameters

Calibration = A step in the image acquisition pipeline
Lenses = Part of the camera model
Active research = Vision for robotics
Pinhole Cameras = The simplest camera model

Relate Computer Vision Concepts.

Sensitivity = Sensitivity to error is common in inverse problems
Vision Algorithms = Often error-prone; it is remarkable that humans see so effortlessly
Image processing = A discipline related to computer vision
Machine Learning = Very useful for computer vision

Match the descriptions to Computer Vision applications.

Self-driving Vehicles = Driving point-to-point between cities as well as autonomous flight
Motion capture = Using retro-reflective markers viewed from multiple cameras
Consumer-level applications = Turning overlapping photos into a single seamlessly stitched panorama
Visual authentication = Automatically logging family members onto your home computer

Match terms for Image filtering

Linear Filters and Convolution = A common way to process images using weighted sums of pixels
Pyramids = A multi-resolution technique for image filtering and enhancement
Edge Detection = Built on linear filters and convolution
Image Smoothing = A technique for image filtering and enhancement

Match the timeline year.

1970 = Digital image processing
1980 = Image pyramids
2000 = Face recognition and detection
2010 = Machine learning

Match the computer vision concept.

Linear filters = A common way to process images; implemented via correlation and represented mathematically
Gaussian = A smoothing kernel whose weights decrease away from the center
Computer Vision = Analysis of images to achieve human-like understanding
Machine Learning = Very useful for computer vision

Match each term related to image filtering with its correct definition

Convolution = Process of multiplying pixel values by corresponding kernel weights, with the kernel flipped
Correlation = Similar to convolution, but the kernel is not flipped
Impulse response function = Response of a system to a brief input signal
Kernel = The K × K weight matrix that defines the filter

Which of the following techniques are used to improve image processing while also reducing computational cost?

Reduction of blur = Makes blurred objects sharper
Edge detection = Highlights boundaries in the image
Linear Filters and Convolution = A common way to process images; implemented via correlation and represented mathematically
Separable filtering = Optimizes convolution by breaking a 2D kernel into one-dimensional passes

Match the computer vision milestones with the corresponding decade:

1970s = Progress in interpreting selected images and pictorial structures.
1990s = Focus on face recognition and statistical analysis.
2010s = Resurgence of deep learning and advanced architectures.
1960s = Interpretation of synthetic worlds and basic object recognition.

Match the concepts to their descriptions in computer vision:

Inverse Problem = Recovering unknowns from insufficient information, making vision difficult.
Forward Models = Developed in physics and computer graphics to describe image formation.
3D Modeling = Creating digital representations of environments from overlapping photographs.
Stereo Matching = Creating dense 3D surface models from multiple views of an object.

Match the following descriptions to the related fields of Computer Vision:

Computer Graphics = Focuses on creating images from 3D models.
Machine Learning = Provides algorithms for pattern recognition and intelligent decisions from data.
Digital Image Processing = Deals with image manipulation and enhancement at the pixel level.
Computational Photography = Aims to enhance image capture and generation using computation.

Match the following applications with their descriptions in Computer Vision:

Optical Character Recognition (OCR) = Technology to convert scanned documents or images of text into machine-readable text.
Face Detection = Identifying and locating human faces in digital images.
Motion Capture (Mocap) = Techniques to record and interpret movement, often used for animation.
3D Model Building (Photogrammetry) = Creating 3D models from 2D images, often aerial or drone photographs.

Match the descriptions to the related terms in vision:

Vision = The process of discovering what is present and where it is by looking.
Computer Vision = Analysis of pictures and videos to achieve human-like visual understanding.
Image Acquisition = The process of capturing visual data, either by human eye or camera.
Image Interpretation = The process of analyzing and understanding the acquired visual data.

Match the parts of the Human Eye with their functions:

Retina = The innermost layer of the eye, comparable to camera film, sensing light.
Iris = The colored part of the eye that controls the size of the pupil.
Pupil = The aperture in the center of the iris that allows light to enter the eye.
Lens = Focuses light onto the retina, enabling clear vision at different distances.

Match the photoreceptor types with their characteristics:

Rods = Highly sensitive photoreceptors, responsible for vision in low light conditions and grayscale perception.
Cones = Photoreceptors concentrated in the macula, responsible for color vision and high acuity in bright light.
Macula Lutea = Yellowish central portion of the retina, area of clearest and most distinct vision.
Fovea Centralis = Center of the macula, densely packed with cones, responsible for sharp central vision.

Match the concepts related to the Electromagnetic Spectrum and Vision:

Visible Light = The portion of the electromagnetic spectrum that humans can see, crucial for computer vision.
Wavelength = Determines the color of visible light, ranging approximately from 400nm to 700nm.
Photoreceptors = Cells in the retina that are sensitive to light in the visible spectrum.
Brightness Constancy = The visual system's attempt to discount illumination when interpreting colors.

Match the concepts related to Camera Modeling:

Pinhole Camera = A simple camera model where all light rays pass through a single point.
Focal Length = The distance between the lens (or pinhole) and the image plane, affecting field of view.
Aperture = The opening in the lens that controls the amount of light entering the camera.
Image Plane = The plane where the image is formed in a camera, corresponding to the film or sensor.
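The pinhole model above has a one-line mathematical core: a 3D point (X, Y, Z) projects to (fX/Z, fY/Z) on the image plane, so apparent size is inversely proportional to depth. A minimal sketch (the function name and points are illustrative, not from the lesson):

```python
def project(point3d, f):
    """Pinhole projection: a 3D point (X, Y, Z) maps to the image
    plane at (f*X/Z, f*Y/Z); size is inversely proportional to depth."""
    X, Y, Z = point3d
    return (f * X / Z, f * Y / Z)

# The same object twice as far away projects half as large.
near = project((2.0, 1.0, 5.0), f=1.0)
far = project((2.0, 1.0, 10.0), f=1.0)
print(near)  # (0.4, 0.2)
print(far)   # (0.2, 0.1)
```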

Match the Projection Properties to their descriptions:

Many-to-one Mapping = Points along the same ray in 3D space project to the same point in the 2D image.
Vanishing Point = The point in the image plane where parallel lines in 3D space appear to converge.
Perspective Distortion = The effect where objects appear smaller as their distance from the camera increases.
Projection Line = Lines in 3D space project to lines in the 2D image (unless passing through the focal point).

Match the Camera Parameters with their categories:

Intrinsic Parameters = Parameters internal to the camera, like focal length and principal point.
Extrinsic Parameters = Parameters describing the camera's position and orientation in the world.
Rotation Matrix (R) = Describes the camera's orientation relative to the world coordinate system.
Translation Vector (t) = Describes the camera's position in the world coordinate system.

Match the types of Projection with their characteristics:

Perspective Projection = Objects appear smaller as they are farther away, mimicking human vision and standard cameras.
Orthographic Projection = Parallel lines remain parallel, no perspective distortion, often used in technical drawings.
Scaled Orthographic Projection = Approximation of perspective projection, suitable when object dimensions are small compared to distance.
Affine Camera = Approximation where parallel lines remain parallel and ratios of lengths are preserved.

Match the terms related to Camera Lenses:

Aperture = Controls the amount of light entering the camera and affects depth of field.
Focal Length = Determines the field of view and magnification of the lens.
Depth of Field = The range of distances in a scene that appear acceptably sharp in an image.
Lens Aberrations = Imperfections in lenses that can cause distortions like chromatic and spherical aberration.

Match the Photon's Life Choices with their descriptions:

Absorption = Photon's energy is transferred to the material, causing an increase in temperature.
Diffuse Reflection = Light scatters in many directions from rough surfaces.
Specular Reflection = Light bounces off smooth surfaces at the same angle.
Refraction = Light bends as it passes from one medium to another due to a change in speed.

Match the Color Descriptors with their Psychophysical Correspondence:

Hue = Corresponds to the mean wavelength of light, representing the central color perceived.
Saturation = Corresponds to the variance of the light spectrum, indicating color purity.
Brightness = Corresponds to the area under the light spectrum, representing the intensity of light.
Color Constancy = The ability to perceive the 'true color' of a surface regardless of illumination changes.

Match the Color Spaces with their characteristics:

RGB = Default color space for devices, easy to implement but not perceptually uniform.
HSV = Intuitive color space based on Hue, Saturation, and Value, decoupling color and brightness.
YCbCr = Color space used in TV and video compression, separating luminance and chrominance.
L*a*b* = Perceptually uniform color space, designed to approximate human vision more accurately.

Match the Image Filtering Types with their descriptions:

Linear Filtering = Applying a weighted sum of neighboring pixels to compute a new pixel value.
Non-linear Filtering = Using operations that are not linear combinations of pixel values, like median filtering.
Separable Filtering = Optimizing filtering by breaking down a 2D kernel into two 1D kernels.
Gaussian Filtering = Using a Gaussian kernel for blurring and smoothing, with weights decreasing from the center.
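The separable-filtering pair above can be verified numerically: a Gaussian-like 2D kernel is the outer product of a 1D kernel with itself, so two cheap 1D passes give the same result as one expensive 2D pass. A minimal NumPy sketch (the helper `correlate2d` and the test image are mine, for illustration):

```python
import numpy as np

def correlate2d(img, kernel):
    """Valid-mode 2D correlation: weighted sum of each neighborhood."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 Gaussian-like kernel is separable: it is the outer product
# of the 1D kernel [1, 2, 1]/4 with itself.
k1d = np.array([1.0, 2.0, 1.0]) / 4.0
k2d = np.outer(k1d, k1d)

img = np.arange(25, dtype=float).reshape(5, 5)

# Filtering with the 2D kernel...
full = correlate2d(img, k2d)
# ...equals filtering rows then columns with the 1D kernel.
rows = correlate2d(img, k1d.reshape(1, 3))
sep = correlate2d(rows, k1d.reshape(3, 1))
print(np.allclose(full, sep))  # True
```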

Match the Linear Filters with their applications:

Box Filter = Simple averaging filter for blurring, all weights are equal.
Bilinear Filter = Smoothing filter with weights decreasing linearly from the center, better edge preservation than box filter.
Sobel Filter = Edge detection filter emphasizing horizontal or vertical edges.
Laplacian of Gaussian (LoG) = Filter for edge and blob detection, combining Gaussian smoothing and Laplacian edge detection.

Match the Non-linear Filters with their characteristics:

Median Filter = Replaces each pixel value with the median value of its neighborhood, effective for salt-and-pepper noise.
Bilateral Filter = Edge-preserving smoothing filter, weights pixels based on both spatial proximity and intensity similarity.
Guided Image Filter = Uses a separate 'guide' image to influence the filtering of the target image, enhancing edges.
Morphological Filters = Used in binary image processing for shape manipulation, including dilation and erosion.
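The median-filter pair above is easy to demonstrate: because the median ignores extreme values, isolated salt (255) and pepper (0) pixels vanish while the flat background survives. A minimal NumPy sketch (function name and test image are illustrative assumptions):

```python
import numpy as np

def median_filter(img, size=3):
    """Replace each pixel with the median of its size x size neighborhood
    (borders handled by edge padding)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# A flat gray patch corrupted by one salt (255) and one pepper (0) pixel.
img = np.full((5, 5), 100, dtype=np.uint8)
img[1, 1] = 255
img[3, 3] = 0

clean = median_filter(img)
print(np.all(clean == 100))  # both outliers are removed entirely
```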

Match the concepts related to Fourier Transform:

Frequency Domain = Representation of an image in terms of its frequency components rather than spatial pixels.
Low-pass Filter = Filter that passes low frequencies, used for blurring and smoothing images.
High-pass Filter = Filter that passes high frequencies, used for edge detection and sharpening.
Band-pass Filter = Filter that passes a specific range of frequencies, used for texture analysis and feature extraction.

Match the Image Resizing Techniques with their descriptions:

Upsampling (Interpolation) = Enlarging an image by estimating pixel values in between existing pixels.
Downsampling (Decimation) = Shrinking an image by reducing the number of pixels, often with pre-filtering to avoid aliasing.
Image Pyramids = Multi-resolution representations of images, used for multi-scale analysis and efficient processing.
Bilinear Interpolation = A simple interpolation method using linear interpolation in two directions.
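The bilinear-interpolation pair above ("linear interpolation in two directions") can be made concrete: interpolate along x on the two surrounding rows, then along y between those results. A minimal sketch, with an illustrative 2x2 image of my own choosing:

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear interpolation: linearly interpolate along x on the two
    surrounding rows, then linearly along y between those results."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, img.shape[0] - 1)
    x1 = min(x0 + 1, img.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x1]
    bot = (1 - dx) * img[y1, x0] + dx * img[y1, x1]
    return (1 - dy) * top + dy * bot

img = np.array([[ 0.0, 10.0],
                [20.0, 30.0]])
# The exact center averages all four pixels.
print(bilinear(img, 0.5, 0.5))  # 15.0
```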

Match the Machine Learning types with their learning approach:

Supervised Learning = Learning from labeled input-output pairs to predict outputs for new inputs.
Unsupervised Learning = Discovering patterns and structure in unlabeled data without explicit output labels.
Semi-supervised Learning = Utilizing both a small amount of labeled data and a large amount of unlabeled data for learning.
Reinforcement Learning = Learning through interaction with an environment, receiving rewards or penalties for actions.

Match the Supervised Learning Algorithms with their characteristics:

K-Nearest Neighbors (KNN) = Non-parametric algorithm, classifies based on the majority class of the k-nearest neighbors in the feature space.
Bayesian Classification = Probabilistic approach using Bayes' theorem to calculate posterior probabilities for class membership.
Logistic Regression = Linear model for binary classification, predicting probabilities using a sigmoid function.
Support Vector Machines (SVMs) = Finds optimal hyperplanes to maximize the margin between classes, effective in high-dimensional spaces.
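The KNN description above fits in a few lines: compute distances to all training points, take the k closest, and vote. A minimal NumPy sketch on a toy dataset of my own invention:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]              # indices of k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # majority class

# Two toy clusters: class 0 near the origin, class 1 near (5, 5).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.5, 0.5])))  # 0
print(knn_predict(X, y, np.array([5.5, 5.5])))  # 1
```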

Match the Unsupervised Learning Algorithms with their applications:

K-Means Clustering = Partitions data into k clusters by iteratively assigning data points to the nearest cluster center.
Gaussian Mixture Models (GMMs) = Models data as a mixture of Gaussian distributions, useful for density estimation and soft clustering.
Principal Component Analysis (PCA) = Dimensionality reduction technique finding principal components that capture maximum variance in data.
Manifold Learning = Techniques for dimensionality reduction that assume data lies on a lower-dimensional manifold embedded in a higher-dimensional space.
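The k-means pair above alternates exactly two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. A minimal sketch on two well-separated toy clusters (the data and function are illustrative, not from the lesson):

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """K-means: alternate assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # random init
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
        labels = np.argmin(dists, axis=1)              # assignment step
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)  # update step
    return centers, labels

X = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], dtype=float)
centers, labels = kmeans(X, k=2)
# The first three points end up in one cluster, the last three in the other.
print(labels)
```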

Match the Deep Learning concepts with their descriptions:

Activation Functions = Introduce non-linearity in neural networks, enabling complex representations.
Backpropagation = Algorithm for training neural networks by calculating gradients and updating weights to minimize loss.
Convolutional Neural Networks (CNNs) = Neural networks specialized for image processing, using convolutional layers to learn spatial hierarchies.
Recurrent Neural Networks (RNNs) = Neural networks designed for processing sequential data like video and text, maintaining temporal dependencies.

Match the Regularization Techniques with their purpose:

L2 Regularization (Weight Decay) = Shrinks large weights to prevent overfitting and improve generalization.
L1 Regularization (Lasso) = Can drive some weights to zero, effectively performing feature selection.
Dropout = Randomly sets a proportion of neuron activations to zero during training to reduce overfitting.
Data Augmentation = Increases the size and diversity of the training dataset by applying transformations to existing samples.

Match the Advanced Optimization Algorithms with their features:

Stochastic Gradient Descent (SGD) = Basic optimization algorithm using gradients from individual samples or mini-batches.
Momentum = Speeds up convergence by adding a memory-like effect to gradient descent, averaging past gradients.
AdaGrad = Adapts learning rates for each parameter based on the frequency of features.
RMSProp = Adjusts learning rates by dividing by an exponentially weighted average of squared gradients.
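The SGD and momentum pairs above can be compared on a one-dimensional toy problem, minimizing f(w) = w². A minimal sketch (the learning rate, momentum factor, and iteration count are illustrative assumptions, not tuned values from the lesson):

```python
def grad(w):
    return 2 * w  # gradient of f(w) = w**2

lr = 0.1

# Plain SGD: step directly against the current gradient.
w = 5.0
for _ in range(200):
    w -= lr * grad(w)

# SGD with momentum: accumulate a running sum of past gradients
# so steps in a consistent direction build up speed.
wm, v, beta = 5.0, 0.0, 0.9
for _ in range(200):
    v = beta * v + grad(wm)
    wm -= lr * v

# Both approach the minimum at w = 0.
print(abs(w), abs(wm))
```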

Match the Convolutional Neural Network Architectures with their key innovations:

AlexNet (2012) = Kickstarted deep learning in computer vision, using ReLUs, dropout, and data augmentation.
VGG (2014) = Demonstrated the effectiveness of deep networks with repeated small 3x3 convolutions.
GoogLeNet (2015) = Introduced the Inception module for efficient multi-scale feature extraction.
ResNet (2016) = Enabled training of very deep networks through skip connections, addressing vanishing gradient problems.

Match the concepts related to Visualizing Neural Networks:

Network Weights Visualization = Visualizing the weights of neural network layers to understand learned features.
Activation Visualization = Visualizing the activations of neurons in different layers to see how networks respond to inputs.
Feature Map Analysis = Techniques to understand how different image regions activate network units.
Class Activation Mapping (Grad-CAM) = Visualizing areas in an image that most influence the model's output classification.

Match the Generative Models with their descriptions:

Variational Autoencoders (VAEs) = Generative models creating a probabilistic field to model data distribution, useful for generating diverse samples.
Generative Adversarial Networks (GANs) = Dual-network setup with a generator and discriminator competing to generate realistic data.
Generative Pre-trained Transformer (GPT) = Transformer-based model for text generation and language understanding, adaptable to other modalities.
Boltzmann Machines = Energy-based models used for learning complex probability distributions, predecessors to deep learning.

Match the terms related to Batch Normalization:

Normalization Layer = Layer that normalizes the activations of the previous layer in each mini-batch.
Zero Mean and Unit Variance = The target distribution to which each mini-batch is normalized.
Internal Covariate Shift = Problem reduced by batch normalization, where the distribution of network activations changes during training.
Training Stability = Improved by batch normalization, leading to faster and more reliable convergence.
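The "zero mean and unit variance" pair above is a two-line computation per feature. A minimal sketch of the normalization step only (the learnable scale and shift of a full batch-norm layer are deliberately omitted, and the batch values are made up):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a mini-batch to zero mean and unit variance per feature.
    eps avoids division by zero for constant features."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# Two features on wildly different scales...
batch = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])
out = batch_norm(batch)
# ...end up on the same scale after normalization.
print(out.mean(axis=0))  # ~[0, 0]
print(out.std(axis=0))   # ~[1, 1]
```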

Match the terms related to Decision Trees and Forests:

Decision Tree = A tree-like structure for classification or regression, making decisions based on feature thresholds.
Random Forest = Ensemble of decision trees, improving robustness and generalization through averaging predictions.
Tree Depth = A design parameter controlling the complexity of decision trees, deeper trees can overfit.
Information Gain = Criterion used in decision tree learning to select the best features for splitting nodes.

Match the concepts related to Image Pyramids:

Gaussian Pyramid = Pyramid created by repeated Gaussian smoothing and downsampling, for multi-scale representation.
Laplacian Pyramid = Pyramid storing detail differences between Gaussian pyramid levels, enabling image reconstruction.
Downsampling = Reducing image resolution, often by halving dimensions and applying a low-pass filter.
Upsampling = Increasing image resolution, using interpolation techniques to estimate pixel values.

Match the Loss Functions with their primary application areas:

Cross-entropy Loss = Primarily used for classification tasks, measuring the difference between predicted and true probability distributions.
L2 Loss (Mean Squared Error) = Commonly used for regression tasks, measuring the squared difference between predicted and target values.
L1 Loss (Mean Absolute Error) = Used for regression, more robust to outliers compared to L2 loss.
Perceptual Losses = Used in image synthesis, aiming to match high-level perceptual features between generated and target images.
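The first two loss pairs above have short closed forms: cross-entropy is the negative log probability assigned to the true class, and L2 loss is the mean squared error. A minimal sketch with made-up predictions:

```python
import numpy as np

def cross_entropy(probs, true_class):
    """Classification loss: negative log probability of the true class."""
    return -np.log(probs[true_class])

def l2_loss(pred, target):
    """Regression loss: mean squared error."""
    return np.mean((pred - target) ** 2)

# A confident correct prediction has low cross-entropy loss...
good = cross_entropy(np.array([0.05, 0.9, 0.05]), true_class=1)
# ...and a confident wrong one is penalized heavily.
bad = cross_entropy(np.array([0.9, 0.05, 0.05]), true_class=1)
print(good, bad)

print(l2_loss(np.array([1.0, 2.0]), np.array([1.5, 2.5])))  # 0.25
```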

Match the techniques used in Efficient Nearest Neighbor Search:

FLANN (Fast Library for Approximate Nearest Neighbors) = Specialized library for fast approximate nearest neighbor search.
Faiss = Library optimized for very large-scale similarity search, GPU-enabled for speed.
Randomized k-d Trees = Data structure used for efficient nearest neighbor search in high-dimensional spaces.
Locality-Sensitive Hashing (LSH) = Technique for approximate nearest neighbor search by hashing similar items into the same buckets.

Match the Active Research Topics in Computer Vision with their focus areas:

Object Recognition = Developing algorithms to identify and classify objects in images and videos.
Human Behavior Analysis = Using computer vision to understand and interpret human actions and interactions.
Internet and Computer Vision = Leveraging the vast amount of visual data on the internet for computer vision tasks.
Medical Image Processing = Applying computer vision techniques to analyze and interpret medical images for diagnosis and treatment.

Match the Image Processing Operations with their effects:

Image Smoothing = Reduces noise and sharp details, blurring the image.
Edge Detection = Highlights boundaries between regions with significant intensity changes.
Image Sharpening = Enhances edges and fine details, making the image appear crisper.
Region Segmentation = Divides an image into distinct regions based on properties like color or texture.

Match the Image Filtering techniques with their characteristics:

Linear Filters = Operations where the output pixel value is a linear combination of input pixel values.
Non-linear Filters = Operations that do not rely on linear combinations, often for noise reduction while preserving edges.
Frequency Domain Filtering = Manipulating image frequencies using Fourier Transform to achieve effects like blurring or sharpening.
Spatial Domain Filtering = Applying filters directly to pixel values in the image space.

Match the terms related to Image Homogeneous Coordinates:

Homogeneous Coordinates = A system to represent points in projective space, adding an extra dimension.
Invariant to Scaling = Property of homogeneous coordinates where scaling the coordinates does not change the point.
Point in Cartesian = Represented as a ray in homogeneous coordinates.
Point at Infinity = Represented in homogeneous coordinates with the last component set to zero.
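The scale-invariance pair above can be checked in a few lines: append a 1 to go to homogeneous coordinates, divide by the last component to come back, and observe that any non-zero scaling of the homogeneous vector maps to the same Cartesian point. A minimal sketch (function names are mine):

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1: (x, y) -> (x, y, 1)."""
    return np.append(p, 1.0)

def from_homogeneous(ph):
    """Divide by the last component: (x, y, w) -> (x/w, y/w)."""
    return ph[:-1] / ph[-1]

p = np.array([3.0, 4.0])
ph = to_homogeneous(p)

# Scale invariance: (3, 4, 1) and (15, 20, 5) are the same projective point.
print(from_homogeneous(ph))        # [3. 4.]
print(from_homogeneous(5.0 * ph))  # [3. 4.]
```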

Match the following historical periods with the computer vision task that was most actively researched during that era:

1970s = Interpreting selected images
1980s = Geometric modeling and mathematical rigor
1990s = Face recognition and statistical analysis
2010s = Deep learning

Match the tasks to their description in computer vision:

Object recognition = Identifying specific objects in an image
Machine inspection = Rapid parts inspection for quality assurance
Optical character recognition = Reading handwritten codes on letters
Motion capture = Capturing actors’ movements for computer animation

Match the concepts to the descriptions:

Visual data = 90% of internet traffic
Computer Vision = Machine learning applied to visual data
3D to 2D Conversion = Implies loss of information
Machine Learning = Algorithms that enable the change of computer behavior based on the data

Match the term to the description:

Image filtering = Process of modifying a picture to enhance certain features or remove noise
Computer vision = The analysis of pictures and videos in order to achieve results similar to those of people
Vision = Discovering what is present in the world and where it is by looking
Machine Learning = A scientific discipline concerned with the design and development of algorithms that allow computers to change behavior based on data

Match the following descriptions to the stage in the vision process they describe

Image Acquisition = Eye and Camera
Image Interpretation = Brain and Computer
Shape, Illumination, and Color distribution = Properties to reconstruct
Radiometry, Optics, and Sensor design = Developed with physics

Match the method to the description

Stitching = Turning overlapping photos into a single seamless photo
Morphing = Turning a picture of one person into a picture of another
3D modelling = Converting photos into a 3D model
Exposure bracketing = Merging multiple exposures under strong lighting conditions

Match the concept to the definition

Pinhole camera = Captures a pencil of rays through a single point
Vanishing point = Where parallel lines converge
Focal length = Distance between the lens and the image sensor
Aperture = Opening that constrains the rays of light

Match the lens flaws to their description

Chromatic aberration = Different refractive indices for different wavelengths
Spherical aberration = Lenses do not focus light perfectly
Radial distortion = Caused by imperfect lenses
Vignetting = The edges of an image being darker than the center

Match the component of the eye to its descriptor

Retina = Comparable to the film inside a camera
Macula lutea = The area providing the clearest, most distinct vision
Fovea centralis = An area where all the photoreceptors are cones: there are no rods in the fovea
Iris = Colored annulus with radial muscles

Match the types of light sensitive receptors to their characteristics:

Rods = Highly sensitive, and operate at night
Cones = Operate in high light, and provide color vision
Macula = Cone-concentrated area
Retina = Film inside the eye's camera

Match item to its description:

Light Source = Point, area, and sun are examples
Perception = Brightness can be affected by light, shadows, and source
Texture = Can affect how light interacts with the surface
Color = A surface property that vision tries to reconstruct despite varying illumination

Match light phenomena with their description:

Absorption = When a photon is absorbed by the material
Refraction = Light bends as it passes into water
Transparency = Material allows photons to pass through it with little scattering
Interreflection = Light bounces between multiple surfaces before reaching the viewer

Match each technique with its use:

Image stitching = Creating seamless panoramas
Exposure bracketing = Merging photos taken under strong sunlight
Image morphing = Smoothly turning one picture into another
3D modeling = Converting photos to a 3D model of the object

Match the following steps to the correct order in basic linear filtering

1 = Take a small neighborhood of pixels around a pixel in the image
2 = Multiply their values by corresponding weights
3 = Add them up to produce the value of a pixel in the output image
4 = Repeat for all pixels
x = Any order
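The numbered steps above translate directly into a double loop. A minimal NumPy sketch (the function name, edge padding, and test image are illustrative assumptions):

```python
import numpy as np

def linear_filter(image, weights):
    """Follow the steps above: for every pixel, take its neighborhood,
    multiply by the weights, and sum into the output pixel."""
    k = weights.shape[0] // 2
    padded = np.pad(image, k, mode="edge")  # handle borders by replication
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            neighborhood = padded[i:i + weights.shape[0],
                                  j:j + weights.shape[1]]
            out[i, j] = np.sum(neighborhood * weights)
    return out

# A 3x3 averaging (box) filter leaves a constant image unchanged.
img = np.full((4, 4), 8.0)
box = np.full((3, 3), 1.0 / 9.0)
result = linear_filter(img, box)
print(np.allclose(result, 8.0))  # True
```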

Match the following with their mathematical representation

<p>Linear filter: correlation = $ g(i,j) = \sum_{k,l} f(i + k,j + l) \cdot h(k,l) $ Linear filter: convolution = $ g(i,j) = \sum_{k,l} f(i - k,j - l) \cdot h(k,l) $ Homogeneous conversion = $(x, y) \Rightarrow \begin{bmatrix}x\\y\\1\end{bmatrix}$</p>
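The only difference between the two formulas is the sign of the offsets, which amounts to flipping the kernel in both axes. A small NumPy check (helper names are illustrative, not a library API):

```python
import numpy as np

def correlate2d(f, h):
    """g(i,j) = sum_{k,l} f(i+k, j+l) * h(k,l), 'valid' region only."""
    kh, kw = h.shape
    out = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + kh, j:j + kw] * h)
    return out

def convolve2d(f, h):
    """Convolution is correlation with the kernel flipped in both axes."""
    return correlate2d(f, h[::-1, ::-1])

f = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
h = np.array([[0., 1.],
              [2., 3.]])

corr = correlate2d(f, h)
conv = convolve2d(f, h)
```

For a symmetric kernel (such as a box or Gaussian filter) the two operations give identical results.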

Match the technique that solves the issue

<p>Homogeneous points = How to account for points at infinity? Non-skewed pixels = The angles of the axes may not be perpendicular? Translation (t) = The camera's position in the world? Scaling of the image = Objects farther away appear smaller?</p>

Match the following words with their definition regarding Deep Learning and Computer Vision

<p>Accuracy = The number of correct predictions made by the model Overfitting = When a model learns the training data too well Loss Function = The penalty for incorrect predictions Weights = Store the network's knowledge across all of the neurons.</p>
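These terms can be made concrete with a toy binary classifier. All of the numbers below are made up for illustration:

```python
import numpy as np

labels = np.array([0, 1, 1, 0, 1])            # ground-truth classes
p1     = np.array([0.1, 0.8, 0.4, 0.3, 0.6])  # model's predicted P(class == 1)
preds  = (p1 >= 0.5).astype(int)              # hard predictions

# Accuracy: the fraction of correct predictions made by the model
accuracy = np.mean(preds == labels)

# Cross-entropy loss: the penalty for incorrect (confidently wrong) predictions
loss = -np.mean(labels * np.log(p1) + (1 - labels) * np.log(1 - p1))
```

Overfitting would show up as this loss dropping on the training set while rising on held-out data; the weights are the parameters the training procedure adjusts to reduce the loss.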

Match the types of layers in a Deep Neural Network to their definition

<p>Fully Connected Layer: FC = Uses a dense weight matrix with connections between all inputs and outputs Convolutional Neural Network (CNN) = Replaces dense weight matrices with sparse convolutional kernels Pooling = A technique to reduce the spatial dimensions X = Any other combination</p>
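The dense-versus-sparse contrast can be seen by counting weights in a toy NumPy example (all shapes and values below are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))                 # a tiny 8x8 single-channel "image"

# Fully connected (FC) layer: dense weight matrix, every input -> every output
W = rng.random((10, 64))               # 64 inputs, 10 outputs: 640 weights
fc_out = W @ x.flatten()

# Convolutional layer: one shared 3x3 kernel (only 9 weights), slid over the image
k = rng.random((3, 3))
conv_out = np.array([[np.sum(x[i:i + 3, j:j + 3] * k) for j in range(6)]
                     for i in range(6)])

# Max pooling: reduce spatial dimensions, here 6x6 -> 3x3 using 2x2 blocks
pooled = conv_out.reshape(3, 2, 3, 2).max(axis=(1, 3))
```

The FC layer needs 640 weights for this tiny input; the convolutional layer gets by with 9 because the same kernel is reused at every position.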

Match image analysis and processing techniques to real-world applications.

<p>Edge Detection = Medical image processing Active Vision = The ability to control how the next image will be acquired Object Detection = Self-driving cars Pattern Recognition = Facial recognition</p>

Match methods with components used during their application in transfer learning:

<p>Fine tuning = Weights (adjust pre-trained model weights) Head Replacement = Layers (replace the final layers) Weight Decay = Shrinks weights to prevent overfitting</p>
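The "shrinks weights" effect of weight decay is just an extra term in the gradient update. A one-step sketch with made-up numbers:

```python
import numpy as np

w    = np.array([2.0, -1.0, 0.5])    # current weights
grad = np.array([0.1, -0.2, 0.0])    # gradient of the loss w.r.t. w
lr, decay = 0.1, 0.01                # learning rate and weight-decay factor

# SGD step with L2 weight decay: the decay term pulls every weight toward
# zero, discouraging the large weights associated with overfitting.
w_new = w - lr * (grad + decay * w)
```

Note the third weight shrinks slightly even though its loss gradient is zero; that shrinkage is the decay at work.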

Flashcards

Inverse Problem

Computer vision aims to describe the world that we see in images and reconstruct its properties like shape and illumination.

Machine Learning

A scientific discipline concerned with the design and development of algorithms that allow computers to change behavior based on data.

Computer Vision

Describes the world that we see in one or more images and to reconstruct its properties, such as shape, illumination, and color distributions.

Difference between Machine learning and computer vision

Machine learning usually does not care about how the data is obtained or which sensors produced it, whereas in computer vision we deeply care about obtaining visual data.

Visual Data Dominance

Visual data constitutes the vast majority of internet traffic.

Vision

The process of discovering what is present in the world and where it is by looking.

Imaging Geometry

Analysis of the relationship between images and the geometry of the world from which they are formed.

Computer Vision

The process of describing the world that we see in images and to reconstruct its properties.

Computer vision inverse

Trying to do the inverse, that is, to describe the world that we see in one or more images and to reconstruct its properties.

Retina

The retina is the innermost layer of the eye and is comparable to the film inside of a camera. It is composed of nerve tissue which senses the light entering the eye.

The human eye

The human eye is the organ which gives us sight, allowing us to learn more about the surrounding world than we do with any of the other four senses.

Macula lutea

The small yellowish central portion of the retina. It is the area providing the clearest, most distinct vision.

Fovea centralis

An area where all of the photoreceptors are cones.

The forward models

The forward models that we use in computer vision are usually developed in physics and in computer graphics.

Electromagnetic spectrum

The electromagnetic spectrum, from the lowest to the highest frequency, includes radio waves, infrared radiation, visible light, and ultraviolet radiation.


Why computer vision

The world is 3D and dynamic, cameras and computers are cheap, and an image is worth 1000 words.

active topics of research

A rough timeline of some of the most active topics of research in computer vision.

what is vision

Vision is the process of discovering what is present in the world and where it is by looking.

What is computer vision

Computer Vision is the study of the analysis of pictures and videos in order to achieve results similar to those achieved by people.


interpreting images selected

1970's: interpreting selected images.

Connect a camera to a computer

Minsky's 1966 summer project: connect a camera to a computer and get the machine to describe what it sees.

Study Notes

Module 1

  • Module 1 is an introduction to computer vision and recent advances, in DS-473 Computer Vision.

Weekly Learning Outcomes

  • Understand how computer vision evolved and why it was needed.
  • Understand what computer vision is, why it is needed, and how it compares with related fields.
  • Understand the existing applications of computer vision and its promising research areas.

Contents

  • Background of computer vision
  • A brief history of computer vision
  • What computer vision is
  • Computer vision topics
  • Applications of computer vision
  • Active research topics

Background of Computer Vision

  • Humans perceive the world in three dimensions with ease, naming people in photos and guessing their emotions.
  • Optical illusions tease out the principles of how the visual system works, but a complete solution remains elusive.
  • Researchers in computer vision apply mathematical techniques to recover 3D shapes and appearances from images.
  • Reliable techniques now compute 3D environment models from overlapping photographs.
  • Accurate dense 3D surface models can be created from multiple views of an object using stereo matching.
  • Most individuals and objects can be delineated in photographs, with partial success.
  • Having a computer explain an image with the detail and causal understanding of a two-year-old remains challenging.
  • Vision is difficult because it's an inverse problem, seeking unknowns with insufficient specifying information.
  • Physics-based models, probabilistic models, or machine learning from examples needed to disambiguate solutions.
  • Modeling the complex visual world is harder than modeling vocal tracts that produce spoken sounds.
  • Forward models in computer vision use physics (radiometry, optics, and sensor design) and computer graphics.
  • These models describe how light reflects off surfaces, passes through camera lenses (or the human eye), and is projected onto a flat or curved image plane.
  • Computer vision tries to invert this process: to describe the world seen in images and reconstruct properties such as shape, illumination, and color distributions.
  • Humans and animals do this effortlessly, but computer vision algorithms are prone to error.
  • Underestimating the difficulty is a common mistake by people who have not worked in the field.
  • The misperception of easy vision dates back to early AI, with cognitive parts believed more difficult than perceptual ones.

History of Computer Vision

  • The timeline of active research in computer vision: digital image processing (1970) to vision and language (2020)
  • In 1966, Minsky tasked a first-year student to connect a camera to a computer to describe what it sees.
  • Larry Roberts, the "Father of Computer Vision" wrote his PhD Thesis in 1963

Interpretation of Synthetic Worlds 1960's

  • Larry Roberts invented machine perception of three-dimensional solids.

Interpreting Selected Images 1970's

  • Fischler and Elschlager worked on the representation and matching of pictorial structures in 1973
  • The work involved locating HAIR at (13, 23), L/EDGE at (25, 13), R/EDGE at (25, 28), L/EYE at (22, 16), R/EYE at (22, 23), NOSE at (27, 20), and MOUTH at (29, 19).

ANNs & Rigour 1980's

  • ANNs rose to prominence and then waned, causing a shift towards geometry and increased mathematical rigor

Face Recognition 1990's

  • Face recognition and statistical analysis were in vogue

Data Sets 2000's

  • Broader recognition and large annotated datasets were available & video processing started

Deep Learning 2010's

  • A resurgence of deep learning took place

Autonomous Vehicles 2020's

  • Autonomous vehicles were developed

Robot Uprising 2030's

  • A robot uprising remains a (tongue-in-cheek) open question

What is computer vision?

  • Computer vision involves extracting properties of the 3D world from images.
  • Elements include type/number of traffic scene vehicles, closest obstacle and congestion.

Computer Vision vs. Graphics

  • 3D to 2D implies information loss
  • Unlike Graphics, computer vision requires sensitivity to errors and need for models

Relation to nearby fields

  • Machine learning applied to visual data ≈ computer vision

Reasons computer vision is valuable

  • Images are worth 1000 words
  • Many biological systems rely on vision
  • The world is 3D and dynamic, with cheap cameras/computers

Example

  • An example computer vision task: finding people in images, i.e., deciding which images do and do not contain people

Topics

  • Imaging geometry
  • Camera modelling
  • Image filtering and enhancing
  • Region Segmentation
  • Color
  • Texture
  • Shape analysis

Successful application of vision

  • Face detection in digital cameras automatically focuses (AF) and optimizes exposure (AE).

Real-world applications

  • Optical character recognition (OCR) for postal codes/number plates reading.
  • Rapid inspection for quality assurance with specialized stereo vision.
  • Object recognition for automated checkout lanes.
  • Autonomous package delivery and pallet-carrying "drives".
  • Registering medical imagery for studies of brain morphology.
  • Self-driving cars capable of point-to-point driving, and aerial vehicles capable of autonomous flight.
  • Fully automated 3D model building from both aerial and drone photographs.
  • Tracking feature points in live-action footage to estimate 3D camera motion and scene shape for inserting computer-generated imagery.
  • Such effects require precise matting to insert new elements.
  • Motion capture uses retro-reflective markers/vision-based techniques for computer animation.
  • Surveillance monitors intruders, analyzes highway traffic and monitors pools for drowning victims.
  • Fingerprint recognition/biometrics authenticate access via forensics.

Consumer Level application

  • Photo-based walkthroughs allow in-home navigation via 3D photos.
  • Face detection improves camera focusing and image searching.
  • Visual authentication logs family members into the home computer.
  • Video match move and stabilization inserts 2D pictures or 3D models into videos, or removes video shake.
  • Stitching turns overlapping photos into seamless panoramas.
  • Exposure bracketing merges multiple exposures taken in challenging lighting into a single well-exposed image.
  • Morphing turns a picture of one friend into another.
  • 3D modelling converts one or more snapshots into a 3D model of the subject.

Real world application (state of the art)

  • Earth viewers such as Microsoft's Virtual Earth are built with large-scale 3D modelling

Optical character recognition (OCR)

  • Technology that converts scanned documents to text
  • If you have a scanner, it probably came with OCR software

Face Detection

  • Most digital cameras can detect the faces of people being photographed

Face Analysis & Recognition

  • Analysing and reading the faces and expressions of people

Biometrics

  • It is possible to log in without a password

Sports

  • Cameras are implemented to help make and improve decisions

Recognition

  • Used to pick out objects in supermarkets and on mobile phones

Important Points Computer Vision Focuses on

  • What information should be extracted?
  • How can it be extracted?
  • How should it be represented?
  • How can it be used to achieve the goal?

Active Research Topics

  • Object recognition
  • Human behaviour analysis
  • Internet/computer vision
  • Biometrics/soft biometrics
  • Large-scale 3D reconstruction, and medical image processing.
  • Also vision for robotics

Key principle

  • Vision was assumed to be easy back in the initial days of artificial intelligence.
  • Most believed that the cognitive parts of intelligence were more difficult than the perceptual ones.
  • We now know that this idea was incorrect.

Other information

  • Visual data from Flickr, Facebook, Instagram, and YouTube is expected to grow to roughly 90% of internet traffic

Vision

  • Discovering what is present and where it is by looking.
  • A scene is captured as an image and interpreted by the brain (perception).

Core Elements of Vision

  • It is an inherently ambiguous problem and requires prior knowledge
  • Models are usually developed in physics (radiometry, optics, sensor design) and in computer graphics

Module 2

  • Module 2 is titled Human Vision and Cameras

Weekly Outcomes

  • In order to comprehend computer vision, one must understand how the human visual system works.
  • How does a camera work, and how is an image represented?
  • Projection geometry used in cameras and lenses.

Contents

  • Human vision system
  • Human vision for computer vision
  • Pinhole camera model
  • Cameras and image formation
  • Projection geometry
  • Thin lens

Camera from scratch

  • What do you need to make one?

Human Eye key takeaways

  • The human eye gives us the sense of sight
  • The eye enables interpreting the shapes, colors, and dimensions of objects by processing the light reflected off them, and can detect bright and dim light
  • The retina is like camera film: it is made up of nerve tissue which senses the light coming into the eye.
  • The macula lutea provides the clearest, most distinct vision.
  • The fovea centralis contains only cones; there are no rods there.

Rods

  • There are approximately 120 million rods
  • More light-sensitive than cones, but not sensitive to colour

Cones

  • There are approximately 6-7 million cones
  • Provide the eye's colour sensitivity; concentrated in the area known as the macula

Electromagnetic spectrum

  • Includes Radio waves, infrared, visible light, UV, Gamma Rays and X-rays.
  • Vision is the process of discovering what is present and where by looking.
  • People do not see with their eyes, but with their brains

Human Vision to Computer Vision

  • Human vision is vastly better at recognition
  • Hints from biology are very useful for computer vision

Feedforward Notes

  • LGN: Lateral Geniculate Nucleus
  • V1: The primary visual cortex
  • V2: Visual area V2
  • IT: Inferior temporal cortex

Models of Four Layers

  • (S1 --> C1 --> S2 --> C2)
  • Model is powerful for object recognition
  • Researchers include: Riesenhuber & Poggio '99; Serre et al. '05, '07; Mutch & Lowe '06

Cameras are used to capture images for

  • Image formation

Image

  • A grid/matrix of intensity values

Pinhole Camera Model

  • Rays travel through a small hole (the aperture).
  • This reduces blurring
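Under the pinhole model, a 3D point (X, Y, Z) in camera coordinates projects to (f·X/Z, f·Y/Z) on the image plane, where f is the focal length. A minimal plain-Python sketch (the function name and the sample points are illustrative):

```python
def project(point, f=1.0):
    """Pinhole projection of a 3D camera-space point onto the image plane."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

# The same point at twice the depth projects to half the image offset,
# which is why objects farther away appear smaller.
near = project((1.0, 2.0, 2.0))
far  = project((1.0, 2.0, 4.0))
```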

Camera Obscura: The pre-camera

  • Known since antiquity in China and Greece.
  • This information is attributed to Larry Seitz
  • The camera obscura was used for tracing

One of the Oldest Surviving Photographs

  • The oldest surviving photograph can be traced back to Joseph Niépce, 1826
  • The exposure took 8 hours; the photograph is stored at UT Austin
  • He teamed up with Daguerre and they eventually created Daguerreotypes

Dimensionality reduction machine (3d to 2d)

  • It loses angles and distances but reduces the scene to two dimensions

Projection Properties

  • Parallel lines converge at a vanishing point
  • Each direction in space has its own vanishing point, except for directions parallel to the image plane, which remain parallel in the image

Homogeneous Coordinates

  • Coordinate scaling
  • Scale invariance: kX/kW = X/W and kY/kW = Y/W, so [kX, kY, kW] represents the same point as [X, Y, W]

Basic Geometry in Homogeneous Coordinates

  • The line equation is ax + by + c = 0
  • Append 1 to a pixel coordinate (x, y) to get its homogeneous coordinate [x, y, 1]
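These conventions are easy to exercise in NumPy. A small sketch (helper names are illustrative) showing scale invariance and the point-on-line test:

```python
import numpy as np

def to_homogeneous(p):
    """(x, y) -> [x, y, 1]"""
    return np.array([p[0], p[1], 1.0])

def from_homogeneous(h):
    """[x, y, w] -> (x/w, y/w); w = 0 would encode a point at infinity."""
    return (h[0] / h[2], h[1] / h[2])

p = to_homogeneous((3.0, 4.0))
scaled = from_homogeneous(5.0 * p)     # k*[x, y, w] is the same 2D point

# A line a*x + b*y + c = 0 is the vector [a, b, c]; a homogeneous point
# lies on the line exactly when their dot product is zero.
line = np.array([1.0, -1.0, 1.0])      # x - y + 1 = 0; (3, 4) satisfies it
residual = np.dot(line, p)
```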

Current view to the art

  • Current 3D models of the Earth can be viewed in Microsoft's Virtual Earth

Optical character recognition (OCR)

  • Most people with a scanner probably have OCR software

Face detection

  • Many new digital cameras from companies like Canon, Sony, and Fuji can detect faces

Vision-Based Biometrics

  • The famous Afghan girl was identified by her iris patterns

Vision-based interaction with games

  • Is being implemented to put faces on avatars

Real world examples

  • Mobileye vision system
  • Used by 70% of car manufacturers

Computer vision in sports

  • Hawk-Eye is implemented to help make and improve decisions.

Vision in space

  • NASA's Mars Exploration Rover Spirit captured a westward view from the top of the low plateau where it spent the closing months of 2007

Medical imaging

  • Medical imaging has developed 3D imaging through MRI/CT scans
  • Surgeons and doctors are guided by these images during procedures

Key notes for Computer vision

  • Related Disciplines include: image processing, pattern recognition, photogrammetry, computer graphics, artificial intelligence, machine learning, projective geometry, control theory

Light and Color

  • There exist myriad consumer-level applications, such as things you can do with your own personal photos and video
    • There is stitching to turn overlapping photos into a panorama, bracketing to merge multiple exposures, morphing to blend images, and 3D modelling of people and objects.

A photon's life choices include

  • Absorption, diffusion, reflection, transparency, refraction, fluorescence, subsurface scattering, phosphorescence, and interreflection

The Human Eye

  • Can be thought of like a camera.
  • The iris is the coloured part that regulates the amount of light entering
  • The retina contains photoreceptor cells, the rods and cones, which detect light

Physiology of Color Vision

  • There are 3 kinds of cones
  • Tasks include detection/localization and scene labelling, as well as language generation (e.g., describing a shopping-market scene).
  • Visual data can often come from Flickr, Facebook, Instagram, and YouTube.

Background on electro-magnetic spectrum

  • Source: Guodong Guo.

Vision for computer vision

  • Computer vision is better with hints from biology

Feedforward processing

  • LGN: the lateral geniculate nucleus
  • V1: the primary visual cortex

Current State of the Art

  • Earth viewers (3D modeling)
  • Microsoft is behind the virtual earth

Computer Vision Focuses On

  • What information should be extracted?
  • How can it be extracted?
  • How should it be represented?
  • How can it be used to achieve the goal?

History Of Computer Vision

  • A rough timeline of how vision has developed

1960

  • Minsky hired a first-year undergraduate student to solve a summer problem

1970's

  • The representation and matching of pictorial structures was studied in 1973

1980's

  • ANNs rose and waned; the field shifted towards geometry and increased mathematical rigor

1990's

  • Face recognition and statistical analysis were developed and became popular

2000's

  • Broader recognition and large annotated data sets were available

2010's

  • Resurgence of deep learning

2020's

  • Autonomous Vehicles now exist

2030's

  • The open question now: a robot uprising?

Computer Vision

  • Vision is the process of discovering what is present in the world and where it is by looking.
  • Computer vision studies pictures and videos with the goal of achieving results similar to those achieved by people.
  • An argument can be made that an image is worth 1000 words.

More facts

  • The world is 3d and dynamic
  • Cameras and computers are cheap

Computer Vision is useful for

  • finding shapes and colour
  • Linear filters are used for linear smoothing and edge detection

The History Of Research Computer Vision

  • A rough timeline of some of the most active topics of research

Misc

  • Humans do not see with their eyes, but with their brains
  • Machine learning does not care how the data is obtained (or about sensors), but computer vision does.
  • Fischler and Elschlager studied the representation and matching of pictorial structures

Types

  • Face detection
  • Object/face recognition
  • Biometrics

Module 3: Light and Colour

Weekly Outcomes:

  • An understanding of light, colour, reflection, and absorption in nature is key for vision.
  • Understanding how an image is represented.
  • What a pixel represents, and which colour representations are used for vision

The Contents

  • Light, colour, reflection, and absorption.
  • Understanding the human eye
  • What a pixel is and how images are represented
  • Understanding pixel colour, brightness, and intensity.

Key points

  • When using computer vision, perception can be ambiguous.

The Bottom Line

  • Use reading materials such as those by Richard Szeliski and Jean Ponce.

Szeliski

Electromagnetic Spectrum

  • The electromagnetic spectrum, from the lowest to the highest frequency, includes radio waves (commercial radio and television), microwaves (radar), infrared, visible light, ultraviolet radiation, X-rays, and gamma rays.

Human Vision System

  • Visual Fields: people do not "see" with their eyes, but with their brains.
  • LGN: the lateral geniculate nucleus; V1: the primary visual cortex; V2: visual area V2; IT: inferior temporal cortex.

Computer Vision 101

  • The human visual system is vastly better at recognition than any computer system, so hints from biology may be very useful.

1970 - PRESENT

  • Digital image processing / block-world labeling / generalized cylinders / pattern recognition / stereo correspondence / intrinsic images / optical flow / structure from motion / image pyramids / shape from shading, texture, and focus
  • Physically-based modeling / regularization / Markov random fields / Kalman filters / 3D range data processing / projective invariants
  • Factorization / physics-based vision / graph cuts / particle filtering / energy-based segmentation
  • Face recognition and detection
  • Image-based modeling and rendering
  • Texture synthesis and inpainting
  • Computational photography
  • Feature-based recognition
  • Category recognition
  • Machine learning
  • Modeling and tracking humans
  • Semantic segmentation
  • SLAM and VIO
  • deep learning
  • vision and language

What is Optical Character Recognition (OCR)?

  • Technology to convert scanned documents to text
  • If you have a scanner, it most likely came with OCR software.

Face detection

  • Most digital cameras can now detect faces

Active Research Topics

  • Object recognition
  • Human behaviour analysis
  • Internet computer vision, biometrics and soft biometrics, medical image processing, and vision for robotics.

Vision Based Interactions

  • Using Digimask to put faces on avatars

Note on light and colour

  • There are many variables and parameters that need to be accounted for in regards to the light and colour with computer programming.

Module 4: Image Transformation and Filtering

Learning Outcomes:

  • Develop a comprehensive knowledge of filtering.
  • Master image transformation and filtering techniques

Contents

  • Techniques of image transformation and filtering

  • Linearly separable filtering techniques

  • Advanced filtering

Important Tips to remember

  • Always refer back to Chapter 3
  • It helps with image processing

Key Notes

  • An image is a function f(x, y)
  • Linear filtering with a kernel h is called convolution
  • A filter is separable if it factors into a 1D row filter and a 1D column filter, which is cheaper to apply
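A separable kernel is one whose 2D weights are the outer product of a column vector and a row vector, so one 2D filtering pass can be replaced by two cheaper 1D passes. A small NumPy check using the 3x3 "tent" kernel as an example:

```python
import numpy as np

# The 3x3 tent (triangle) kernel is separable: it is the outer product
# of the 1D filter [1, 2, 1] with itself.
col = np.array([[1.0], [2.0], [1.0]])
row = np.array([[1.0, 2.0, 1.0]])
kernel2d = col @ row

# Filtering with col then row costs 3 + 3 = 6 multiplies per pixel,
# versus 9 for applying the full 2D kernel directly.
```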

Filters

Impulse response image, cross-correlation output, and convolution output

  • The convolution output equals the cross-correlation output with the kernel flipped in both axes

More info

  • A point in the world and its image are related point to point

Linear assumptions

  • Pixels are assumed square with no skew, which leads to a simpler imaging model

Module

  • The camera has a location in space.
  • Vanishing points, lines, and shapes in that space
  • help with image understanding

Main notes and Topics for section 2

  • You need a pinhole camera and the geometry that applies to the image
  • The aperture helps form the image from rays of light

Notes to the student on this module

  • Human vision provides models that computer vision should mirror at these different points

What it implies

  • 3D-to-2D projection implies information loss, so vision systems must be sensitive to errors.

Machine learning is then applied to the resulting data sets

Module 5

Deep Learning

  • Machine learning marks a transition from traditional techniques
  • It covers supervised learning (with labelled data) and unsupervised learning (without labelled data).
  • It also covers algorithms and practical applications of deep learning, where the data sets are used for image recognition.

Types:

  • Non-linear
  • ANN

Key Note

  • A major focus for machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data.

Chapter - 5: Machine Learning

  • Can happen with linear discriminant analysis

  • It may also require quadratic discriminant analysis

Extra Notes:

  • Supervised learning: with labelled data. Unsupervised learning: without labelled data.
