Questions and Answers
In the context of computer vision, what is the primary objective?
- To simulate human emotions through artificial intelligence.
- To enhance the processing speed of computer hardware.
- To create complex algorithms for data storage.
- To develop systems capable of interpreting and understanding visual information. (correct)
What should students do in the tutorials for this course?
- Memorize all the definitions provided in the slides.
- Prepare summaries of the assigned readings.
- Solve assignments, ask relevant questions, and engage with the material. (correct)
- Transcribe lecture notes for later study.
Which programming language is primarily used for assignments in this computer vision course?
- Python along with TensorFlow (correct)
- C++
- MATLAB
- Java
What is the minimum percentage of total points a student needs to pass the assignment portion of the course?
What is the format of the final exam in this course?
Which action is explicitly prohibited in this course regarding AI use?
What was the Mark I Perceptron primarily composed of, as described in the content?
According to the content, what did Hubel and Wiesel's research in 1959 reveal about the visual cortex?
Within the models of optics, which category considers the wave nature of light, particularly phenomena like interference and diffraction?
Light is described as part of what broader spectrum?
What fundamental concept describes light's ability to exhibit both wave-like and particle-like properties?
If an electron is accelerated using a voltage of 100kV, which of the listed constants is necessary to compute its De Broglie wavelength?
What phenomenon explains the bending of light waves around obstacles or through narrow openings?
Which of the models is most suitable for designing lenses and mirrors in optical instruments, without needing to consider wave effects?
In the context of wave interference, what aspect of an electromagnetic wave is relevant when considering the combined effect of multiple waves at a specific point in time?
Given the formula for the De Broglie wavelength $\lambda = h/p$, where $h$ is Planck's constant and $p$ is momentum, how would an increase in a particle's momentum affect its De Broglie wavelength?
What is a significant drawback of traditional object detection methods that deep learning-based approaches aim to overcome?
Which of the following best describes the output of a typical CNN-based object detection model?
In the context of object detection, what does the 'sliding window' approach primarily involve?
What is the primary purpose of converting fully connected layers into convolutional layers in the context of CNNs for object detection?
What is a significant drawback of using a CNN with a sliding window approach for object detection, particularly before optimizations like Fast R-CNN?
In the context of R-CNN, what is the role of the selective search algorithm?
What is a major performance bottleneck in the original R-CNN object detection algorithm?
How does Fast R-CNN improve upon the original R-CNN in terms of processing the image with a convolutional neural network?
What is the primary purpose of filtering features in image processing?
In the context of image processing, what does a grid (matrix) of intensity values typically represent?
What does the convolution operation achieve in image processing?
In Fourier Transform, what does the frequency index 'k' represent?
What is cross-correlation primarily used for in image processing?
How does the Fourier Transform decompose a function?
What is the significance of Euler's formula in the context of Fourier Transform?
If a signal completes approximately 4.17 cycles in 1 second, according to the material, what is its period?
In the context of Wasserstein Distance, what does the term $c(x, y)$ typically represent?
When comparing images using pixel intensity histograms and Wasserstein Distance (WD), under what circumstances might it be suitable to use a formula derived under the Gaussian assumption?
What are the two primary components that contribute to the overall transport cost in the context of Wasserstein Distance when applied to Gaussian distributions?
What key characteristic differentiates Jensen-Shannon divergence from Kullback-Leibler divergence?
In unsupervised learning, which technique focuses on discovering the underlying structure of data by reducing the number of variables while preserving essential information?
Which of the following unsupervised learning methods is specifically designed to group similar data points into clusters?
What is a defining characteristic of self-supervised learning that distinguishes it from traditional supervised learning?
Consider a scenario where you need to compare the movement patterns of pixels in two video sequences. Which distance metric would be most appropriate?
Which of the following scenarios would likely result in K-Means clustering performing suboptimally?
In image segmentation using K-means, what is the primary goal?
Which unsupervised learning task aims to reduce the number of variables in a dataset while preserving essential information?
What is a key characteristic of self-supervised learning?
What is the primary function of the encoder in an autoencoder?
What is a major limitation of K-Means clustering that requires careful consideration before its application?
In the context of automatically labeling images using an autoencoder, what serves as the 'label' for each image?
What is the purpose of back-propagation in training an autoencoder?
Flashcards
Computer Vision (CV)
Building artificial systems to process, perceive, and reason about visual data.
Mark I Perceptron
A 3-layered perceptron network with sensory, association, and response units.
S-units
First layer: contains 400 photocells arranged in a 20x20 grid, named sensory units or the input retina.
A-units
R-units
Simple cells
Light
Optics Models
Wave Optics
De Broglie Wavelength
Wave-Particle Duality
Interference
Diffraction
Fraunhofer Diffraction
Image Representation
Cross-correlation
Convolution (Filtering)
Filtering Features
Fourier Transform
Euler's Formula
Frequency
f(tn)
Inefficiency in Object Detection
Sensitivity to Variations
CNN Output for Object Detection
CNN with Sliding Window
CNN Sliding Window Optimization
Region Proposal (Selective Search)
R-CNN Weakness
Fast R-CNN Improvement
Euclidean Distance
Mahalanobis Distance
Wasserstein Distance
Kullback-Leibler Divergence
Jensen-Shannon Divergence
Clustering
Dimensionality Reduction
Generative Models
K-means Clustering
K-Means Assumption
K-Means Limitations
Image Segmentation
Self-Supervised Learning
Study Notes
Learning Goals for Lecture 1
- Computer vision aims to build artificial systems that process, perceive, and reason about visual data
- Computer vision (CV) also helps in understanding human perception
- Computer Vision intersects with Artificial Intelligence and Machine Learning; this course focuses on deep learning
- The lecture covers the history of computer vision, its main focus areas, and the focus of this course
General Course Information
- The course includes 2 lectures and 2 tutorials per week.
- Lecturers of the course are Bojana Rosic and Matthias Feinaeugle.
- Tutorials are conducted by Merit Fernhout and Vasos Arnaoutis.
- Lecture format includes slides.
- Tutorials involve solving assignments and asking questions to teachers.
- Programming language for the course is Python with TensorFlow.
- Assignments consist of 4 homeworks.
- Students need at least 60% of the total points across all assignments to pass.
- Exam is a written exam that includes both a theory and a coding component.
- AI bots or automated programs cannot be used for coding/writing reports.
History of Computer Vision
- Rosenblatt created the Mark I Perceptron in 1958, which is a 3 layered perceptron network.
- It had an array of 400 photocells in a 20x20 grid called "sensory units", or "input retina".
- Each sensory unit can connect to up to 40 "association units".
- The hidden layer consisted of 512 perceptrons called "association units".
- The output layer had 8 perceptrons called "response units".
- Hubel and Wiesel, in 1959, studied the visual cortex and identified:
- Simple cells: which respond to specific light orientation.
- Complex cells: which respond to light orientation and movement, even if the bar is not in the same position.
- Hypercomplex cells: which respond to movement and length of the bar.
- Larry Roberts introduced perception of 3D solids in 1963.
- Marr et al. made contributions in the 1970s with a 2 1/2 D sketch and a 3-D model.
- The Hough transform was developed in 1972.
- Canny Edge detection came about in the 1980s.
- The Neocognitron was developed in the 1980s by Kunihiko Fukushima
- It is a hierarchical, multilayered artificial neural network introduced with convolutional and complex cells.
- It had no practical training algorithm.
- Automatix Inc. was founded in January 1980 as the first company to market industrial robots with machine vision.
- In the 1990s, recognition by grouping used Gestalt Principles.
Computer Vision applications using Deep Learning
- AlexNet: demonstrated deep learning applications.
- Deep Dream Generator became available.
- Image acquisition is the first stage of a typical CV pipeline; an example application is source camera identification using multiscale content-independent feature fusion networks
- Data preprocessing is the second, also typical, stage
- Feature extraction is the third phase of CV
- High-level understanding is the final stage of the process
- Image classification maps a specific input to a specific output, e.g. an input image to the labels "cat" or "dog"
- Object detection locates and labels objects such as people, cars, and traffic lights
- Semantic segmentation assigns a class to every pixel, e.g. road, sidewalk, pole, vegetation, building, vehicle, fence, or unlabeled
More Computer Vision tasks
- Semantic segmentation vs. instance segmentation: in instance segmentation, each detected object receives its own unique segmentation mask
- Training-related tasks include zero-shot learning, where attribute features learned during training allow unseen classes to be recognized, and image description
- Other tasks include image generation (e.g. online deepfake makers), 3D vision (object reconstruction), 2D recognition, 3D localization, and 3D voxel patterns
Syllabus Overview
- This is an overview of the syllabus topics
- Introduction to Computer Vision
- Basics of Optics
- Convolutional Deep Neural Networks
- Recurrent Deep Neural Networks and Hybrid Architectures
- Detection and Segmentation
- Zero-Shot and Few-Shot Learning
- Autoencoders and Generative Neural Networks
- 3D Vision and Design
Course Conclusions
- Computer Vision is a combination of:
- CV (Computer Vision)
- ML (Machine Learning)
- Project
- RL (presumably Reinforcement Learning)
Conclusions of the Course
- definition of computer vision (CV)
- relationship of CV to AI
- evolution of CV
- types of CV tasks
- The course will start with the basic idea of image acquisition
Learning Goals for Lecture 2
- This lecture covers: image representation, feature extraction, statistical and similarity features, and filtering
Classification Motivation for Feature Selection
- Classification is a task where a model categorizes input data into predefined classes.
- Statistical features, structural features, transformation features, filtering-based features, and learning-based features, can inform the model
Qualities of Effective Features
- Linear Separability: Features enabling easy separation of classes, ideal for unsupervised learning.
- Distinctiveness and Compactness: A minimal set of features effectively differentiating classes.
- Invariance: Features that remain stable under common transformations, like scaling, rotation, and illumination changes.
- Robustness: Features resilient to noise, distortions, and minor input data variations.
Feature Extraction Methods
- Statistical Features: Extracted from pixel intensity properties (mean, variance, skewness, GLCM).
- Structural Features: Based on geometric properties such as edges, contours, and object relationships.
- Transformation Features: Derived from transformed image representations like Fourier and Wavelet Transforms, or PCA.
- Filtering-Based Features: Obtained via convolution with filters (Gabor, edge detectors).
- Learning-Based Features: Learned automatically using machine learning or deep learning.
Pixels as Image Units
- A pixel, derived from "picture element", is the smallest programmable color unit on a display or in an image.
- Resolution is determined by the number of pixels; PPI (pixels per inch) refers to image or display resolution, while DPI (dots per inch) refers to printed output. The terms are often used interchangeably, although a printer may use more than one dot to reproduce a single pixel.
- Each pixel in the image comprises subpixels that emit red, green, and blue (RGB) light
- Varying intensities of red, green, and blue light combine using an additive color mixing method; the perceived color is formed by the human visual system
Bit Depth and Color Representation
- Bits per pixel (bpp) indicates the number of bits representing a pixel's color or grayscale value, determining the range of representable colors/shades.
- 1 bpp: Represents only two colors, black or white.
- 8 bpp: Represents 256 shades or colors; used for grayscale or indexed-color images.
- 24 bpp: Standard for full RGB color, with 8 bits each for red, green, and blue.
- Higher bpp improves color representation and image quality.
- An image is represented as a grid (matrix) of intensity values (a small example follows below)
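As a minimal illustration (not from the slides), the following NumPy sketch shows an 8 bpp grayscale image as a matrix of intensity values and a 24 bpp RGB pixel reduced to a single grayscale value; the luma weights are a common convention assumed here, not necessarily the one used in the course.

```python
import numpy as np

# A tiny 4x4 grayscale "image": each entry is an 8-bit intensity (0 = black, 255 = white).
img = np.array([[  0,  64, 128, 255],
                [ 32,  96, 160, 224],
                [ 16,  80, 144, 208],
                [  8,  72, 136, 200]], dtype=np.uint8)

print(img.shape)   # (4, 4): height x width
print(img.dtype)   # uint8: 8 bits per pixel, so 2**8 = 256 representable shades

# A 24 bpp RGB pixel uses 8 bits per channel; a common (assumed) luma weighting
# converts it to one grayscale intensity.
r, g, b = 200, 30, 60
gray = 0.299 * r + 0.587 * g + 0.114 * b
print(round(gray))  # single grayscale value for this pixel
```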
Statistical Feature Analysis
- Images can be classified based on higher order moments and distributions.
- Mean is a measure of central tendency and provides insights into the average pixel intensity.
- The formula is $\mu_I = \frac{1}{N}\sum_{x}\sum_{y} I(x, y)$
- Standard deviation is $\sigma = \sqrt{\frac{1}{N}\sum_{x}\sum_{y} \left(I(x, y) - \mu_I\right)^2}$
- This metric provides insight into the spread or variability of pixel intensities around the mean
- Skewness is $\gamma = \frac{1}{N\sigma^3}\sum_{x}\sum_{y} \left(I(x, y) - \mu_I\right)^3$ and kurtosis is $\kappa = \frac{1}{N\sigma^4}\sum_{x}\sum_{y} \left(I(x, y) - \mu_I\right)^4 - 3$. These are normalized sums of pixel intensity deviations from the mean across the image. Different data sets yield different distributions (Girls, Opera, VLSI), see figure. A small code sketch follows.
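A small NumPy sketch of these moment formulas; the random image is only a hypothetical stand-in for real data.

```python
import numpy as np

def intensity_moments(img):
    """Mean, standard deviation, skewness, and (excess) kurtosis of pixel intensities."""
    I = img.astype(np.float64)
    n = I.size
    mu = I.sum() / n                                       # mean intensity
    sigma = np.sqrt(((I - mu) ** 2).sum() / n)             # standard deviation
    gamma = ((I - mu) ** 3).sum() / (n * sigma ** 3)       # skewness
    kappa = ((I - mu) ** 4).sum() / (n * sigma ** 4) - 3   # excess kurtosis
    return mu, sigma, gamma, kappa

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))                  # stand-in for a grayscale image
print(intensity_moments(img))
```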
Statistical Analysis of Univariate Features
- Histograms illustrate color composition and pixel counts for each color.
- Autocorrelation measures a signal's similarity to a time-delayed version of itself.
- Image autocorrelation involves comparing the similarity of the image with itself after shifting it by certain amounts in each dimension
- Its goal is to detect repetitive pixel structures
- Cross-correlation: for two images, the cross-correlation is defined as
- $C(x, y) = \sum_{i=0}^{M-1}\sum_{j=0}^{N-1} I(x + i,\, y + j) \cdot T(i, j)$
- It is used to give an assessment of the similarity between two images
Cross Correlation & padding
Padding adds values around the border of the original matrix so that the algorithm can work with more data near the edges, or to increase precision by supplying more peripheral inputs (as sketched below).
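For illustration, zero-padding with NumPy might look like this (a sketch, not course code):

```python
import numpy as np

I = np.array([[1, 2],
              [3, 4]])

# One row/column of zeros on every side, so a filter can also be applied at the border.
print(np.pad(I, pad_width=1, mode='constant', constant_values=0))
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]
```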
Cross-Correlation as Convolution
- The previous formula describes cross-correlation; the template $T$ is also known as the kernel
- It can be turned into a convolution by flipping the kernel, as in the formula
- $T_{\text{flipped}}(i, j) = T(M - 1 - i,\, N - 1 - j)$
- For symmetric filters, the flipped kernel looks about the same as the original
- The general formula $(I * T)(x, y) = \sum_{i}\sum_{j} I(x + i,\, y + j) \cdot T_{\text{flipped}}(i, j)$ is known as convolution (a numerical check follows below)
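A quick way to check this relationship numerically is with SciPy (assuming `scipy` is available; this is a sketch, not part of the course material): cross-correlation with a template equals convolution with the flipped template.

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

I = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
T = np.array([[1., 2., 0.],
              [0., 3., 1.],
              [4., 0., 2.]])                   # arbitrary 3x3 template (kernel)

cross = correlate2d(I, T, mode='valid')                    # cross-correlation with T
conv_flipped = convolve2d(I, T[::-1, ::-1], mode='valid')  # convolution with the flipped T

print(np.allclose(cross, conv_flipped))        # True: correlation(I, T) == convolution(I, T_flipped)
```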
Statistical features:
- The cross product of the second two terms gives:
- There are steps involved in applying the cross product to the terms
- In particular, the cross-correlation is closely related to the impulse response of a linear time-invariant system
General linear time-invariant (LTI) systems
- The general expression can be further reduced
- The output is obtained from the system's response to an impulse input
- The kernel can contain: (A) an instantaneous term, (B) a memory term, (C) a premonition term, or (D) a combined memory/premonition term
How Computer Vision Works through Statistical feature extraction
- The function can be visualized by plotting the original and shifted signals, with time and period on the x-axis and the function values on the y-axis (see figure)
Lecture 3 Takeaways
- AI was not covered in this lecture
- Basic properties of waves were discussed
- It includes some components that make up modern optics, such as scalar diffraction and electromagnetic radiation
- A look at De Broglie wavelengths as used in optics (see the sketch after this list)
- The wave function as an example of a modern optical description of light
- Interference
- Amplitude of an electromagnetic wave at a specific time $t_0$
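A hypothetical back-of-the-envelope calculation of the De Broglie wavelength for an electron accelerated through 100 kV, using the non-relativistic momentum $p = \sqrt{2 m_e e V}$ (a simplification; the relativistic correction shortens the result slightly):

```python
import math

# Physical constants (SI units)
h = 6.626e-34       # Planck's constant [J*s]
m_e = 9.109e-31     # electron rest mass [kg]
q_e = 1.602e-19     # elementary charge [C]

V = 100e3           # accelerating voltage [V]

# Non-relativistic momentum from the kinetic energy q_e * V = p^2 / (2 m_e)
p = math.sqrt(2 * m_e * q_e * V)
lam = h / p         # De Broglie wavelength: lambda = h / p

print(f"lambda = {lam:.2e} m")   # about 3.9e-12 m; a relativistic correction gives ~3.7e-12 m
```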
Lecture 3: Fraunhofer Diffraction
- Discusses diffraction of a 650 nm laser
- Discusses several quantities used in this diffraction setup
- Includes refraction and reflection
- Finally looks at what could not be covered, including:
- Light and vision
- Propagation of light
- A few of these topics in general
- Several further topics of discussion are mentioned
- Links to further reading are provided at the end
Learning Goals for Lecture 4
- Transforming features
- Relationship to convolution
- Re-cap: MLP network
- Extend: Convolutional NN network
- Different forms of CNN
Re-Cap of Last 2 Lectures
- Images are a grid of intensity values
- These values typically range from 0 to 255 and are stored in one byte per pixel
- Recap of cross-correlation: the formula is presented again, with emphasis on its use for determining similarity
- Convolution is then defined, with attention to whether or not the kernel is flipped
- This is an overview of statistical image data, including filtering techniques
Image Transformation
- Discusses a basic transformation from wavelength to frequency
- A transformation reveals more than the raw intensity or strength values alone
Fourier transform
- Several formulas are presented, e.g. $F(\omega) = \int f(x)\, e^{-j\omega x}\, dx$
- Also looks at the real and imaginary parts of functions
- Related operations are then discussed, such as the statistical autocorrelation feature; padding with zeros is illustrated with an example (a small example of the transform follows below)
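A small NumPy sketch (not from the slides) connecting the frequency index $k$ of the discrete Fourier transform to physical frequency and period, using a signal with roughly 4.17 cycles per second as in the quiz question above:

```python
import numpy as np

fs = 100                                    # sampling rate [Hz]
t = np.arange(0, 1.0, 1 / fs)               # 1 second of samples, t_n = n / fs
f0 = 4.17                                   # the signal completes ~4.17 cycles per second
x = np.sin(2 * np.pi * f0 * t)

X = np.fft.fft(x)                           # discrete Fourier transform
freqs = np.fft.fftfreq(len(x), d=1 / fs)    # frequency [Hz] associated with each index k

k = np.argmax(np.abs(X[: len(x) // 2]))     # index of the dominant positive-frequency bin
print(f"dominant bin: {freqs[k]:.1f} Hz")   # 4.0 Hz (nearest 1 Hz bin to 4.17 Hz)
print(f"true period:  {1 / f0:.2f} s")      # 1 / 4.17 is about 0.24 s
```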
Autocorrelation functions
- The autocorrelation function measures the similarity of a signal to a shifted version of itself
- $R_f(\tau) = \sum_{t} f(t) \cdot f(t + \tau)$
- It is used to measure the similarity between the original signal and the shifted one
- The autocorrelation of a periodic signal has the same period as the signal
- Statistical features, autocorrelation: $R(\tau_x, \tau_y) = \sum_{x}\sum_{y} I(x, y) \cdot I(x + \tau_x,\, y + \tau_y)$. The autocorrelation function compares the similarity of the image with itself after shifting it by certain amounts in both the x- and y-directions.
- The goal is the detection of repetitive pixel structures
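A minimal NumPy sketch of the image autocorrelation formula above, using a hypothetical stripe pattern to show that the autocorrelation peaks at the repetition period:

```python
import numpy as np

def autocorrelation(I, tau_x, tau_y):
    """R(tau_x, tau_y) = sum_x sum_y I(x, y) * I(x + tau_x, y + tau_y), without padding."""
    H, W = I.shape
    shifted = I[tau_y:, tau_x:]                 # I shifted by (tau_x, tau_y)
    original = I[:H - tau_y, :W - tau_x]        # overlapping part of the original image
    return float((original * shifted).sum())

# A stripe pattern that repeats every 4 pixels in the x-direction.
I = np.tile(np.array([1., 1., 0., 0.]), (8, 4))   # 8 x 16 image

for tau in range(8):
    print(tau, autocorrelation(I, tau, 0))        # peaks at tau = 0 and tau = 4 (the period)
```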
Back-Propagation and Convolution
- We have a few equations
- Now to choose the filters: we do not know what the filters should look like for any given problem, so the filters are obtained automatically. This is done through backpropagation (see the sketch below).
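As a toy illustration of filters being obtained automatically through backpropagation (a TensorFlow sketch with a made-up target filter, not an assignment solution): a single convolution kernel is treated as a trainable variable and updated by gradient descent until its output matches that of a known edge filter.

```python
import tensorflow as tf

# One 3x3 filter, initialized randomly; backpropagation will adjust it.
kernel = tf.Variable(tf.random.normal([3, 3, 1, 1]))           # (h, w, in_ch, out_ch)

# Hypothetical "true" filter (a vertical edge detector) that we try to recover from data.
true_kernel = tf.constant([[1., 0., -1.],
                           [2., 0., -2.],
                           [1., 0., -1.]])[:, :, None, None]

x = tf.random.normal([32, 8, 8, 1])                            # batch of 8x8 single-channel images
y = tf.nn.conv2d(x, true_kernel, strides=1, padding='SAME')    # target responses

opt = tf.keras.optimizers.SGD(learning_rate=0.05)
for step in range(200):
    with tf.GradientTape() as tape:
        pred = tf.nn.conv2d(x, kernel, strides=1, padding='SAME')
        loss = tf.reduce_mean((pred - y) ** 2)                 # mismatch between the outputs
    grads = tape.gradient(loss, [kernel])
    opt.apply_gradients(zip(grads, [kernel]))                  # the backpropagation update

print(tf.squeeze(kernel).numpy().round(1))                     # approaches the edge-detector filter
```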
AI and learning of images
- Learning involves passing a new image to an encoder
- The image, for example of a car, is transformed into a classification
- Feature extraction is used to turn the data points into an output
Overview of Convolutions
- Convolution layers are then used to:
- Discover patterns
- Filter the image
- Reduce input size
- There is no clearly defined way of knowing which filters to use; intermixed with machine/deep learning, networks are created whose layers pass features from one stage to the next
Convolution NN (CNN)
- INPUT, FEATURE LEARNING, and CLASSIFICATION: the network classifies the data by learning its own feature representation through these stages
- The feature maps are the outputs of the convolution layers in a CNN (a minimal example follows)
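A minimal Keras sketch of this INPUT → FEATURE LEARNING → CLASSIFICATION structure (the layer sizes and the 10-class output are arbitrary assumptions, not the course architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# INPUT -> FEATURE LEARNING (convolution + pooling) -> CLASSIFICATION (flatten + dense)
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),            # input stage: 32x32 RGB images (assumed size)
    layers.Conv2D(16, 3, activation='relu'),    # feature learning: 16 learned 3x3 filters
    layers.MaxPooling2D(),                      # reduce the spatial size of the feature maps
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),                           # classification stage
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),     # assumed 10 output classes
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```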
Learning goals for Lecture 5
- Time dependent signal approximation
- Meaning of MLP network
- Extension to residual net
- Extension to recurrent NN
- Hybrid architectures
Re-cap
- An image is a grid (matrix) of intensity values
- The autocorrelation formula is presented: $R_f(\tau) = \sum_{i} f[i] \cdot f[i + \tau]$
- $\tau$ is considered a discrete lag or shift (an integer)
- Convolution is used with zero-padding, so that values at the border are mixed with the padded zeros
- Here is the main formula: $(I * T)(x, y) = \sum_{i=0}^{M-1}\sum_{j=0}^{N-1} I(x + i,\, y + j) \cdot T_{\text{flipped}}(i, j)$
Description
This course introduces the fundamentals of computer vision. The key objectives, assignments, and exam formats are outlined. Rules regarding AI usage and important historical context are provided.