Questions and Answers
Which of the following best describes the primary purpose of OpenCV?
- Database management
- Computer vision tasks (correct)
- Web development
- Operating system development
NumPy must be installed separately before installing OpenCV to ensure proper functioning of matrix operations.
False
To read an image file into OpenCV, the function cv.______() should be used.
imread
What is the common cause of a '-215: Assertion failed' error when working with media files in OpenCV?
OpenCV cannot find the specified media file, typically because the file path is incorrect (or a video has run out of frames).
Match the cv.flip() flip codes with their corresponding actions:
0 = flip vertically (over the x-axis); 1 = flip horizontally (over the y-axis); -1 = flip both vertically and horizontally
Which blurring technique is most effective at reducing 'salt and pepper' noise while preserving edges to a reasonable extent, compared to averaging or Gaussian blur?
Median blur
Explain the fundamental difference between 'contours' and 'edges' in the context of image processing, as discussed in the material.
Contours are the boundaries of objects: lines or curves joining the continuous points along an object's boundary, useful for shape analysis and object detection. Edges are points of sharp change in pixel intensity. The two often coincide visually, but from a mathematical point of view they are different things.
In cv.GaussianBlur(), the parameter that directly controls the standard deviation of the Gaussian kernel in the X direction is known as ______.
sigmaX
To convert a grayscale image directly to HSV color space in OpenCV, which of the following sequences of conversions is required?
Grayscale to BGR first, then BGR to HSV (there is no direct grayscale-to-HSV conversion).
Bilateral blur utilizes a kernel size parameter, similar to averaging, Gaussian, and median blur, to define the area of pixels considered for blurring.
False (bilateral filtering uses a diameter of the pixel neighborhood rather than a kernel size).
Flashcards
What is OpenCV?
A computer vision library supporting Python, C++, and Java, used to extract insights from images and videos using deep learning.
pip install opencv-contrib-python
Installs the main OpenCV package with contribution modules, along with NumPy for scientific computing.
What does cv.imread() do?
Reads an image from the specified path into a matrix of pixels.
Function of cv.imshow()
Displays an image in a new window, taking the window name and the image matrix as parameters.
What does capture.read() do?
Reads a video frame by frame, returning each frame and a boolean indicating whether the read succeeded.
What is Rescaling?
Modifying the height and width of an image or video, usually downscaling, to reduce computational strain.
Creating a blank image
Use np.zeros() with the desired size, e.g. blank = np.zeros((500, 500, 3), dtype='uint8').
Drawing shapes on images
Use cv.rectangle(), cv.circle(), and cv.line(), passing the image, the relevant points, a color, and a thickness.
Converting to grayscale
Use cv.cvtColor() with the cv.COLOR_BGR2GRAY color code.
What is Translation?
Shifting an image along the x and y axes (up, down, left, or right) by applying a translation matrix with cv.warpAffine().
Study Notes
Course Overview
- Introduction to Python and OpenCV for image and video processing.
- Covers basics like reading/manipulating media files, image transformations, drawing shapes, and adding text.
- Progresses to advanced topics: color spaces, bitwise operations, masking, histograms, edge detection, and thresholding.
- Concludes with face detection, face recognition, and a deep computer vision model for classifying Simpsons characters.
- All materials will be available on GitHub.
Introduction to OpenCV
- OpenCV is a computer vision library available in Python, C++, and Java.
- Computer vision uses deep learning to extract insights from images and videos.
- Python 3.7 or higher is required.
- NumPy is a scientific computing package in Python used for matrix and array manipulations.
Package Installation
- pip install opencv-contrib-python: installs the main OpenCV package with the contribution modules.
- NumPy is automatically installed alongside OpenCV and facilitates scientific computing, especially matrix- and array-related tasks.
- pip install caer: installs a package of utility functions designed to speed up computer vision workflows; it is used later in the course, specifically when building the deep computer vision model.
Reading Images in OpenCV
- Use cv.imread() to read an image: img = cv.imread('photos/cat.jpg').
- It takes the image path as input and returns the image as a matrix of pixels.
- Use cv.imshow() to display an image in a new window; it requires the window name and the image matrix as parameters: cv.imshow('Cat', img).
- cv.waitKey(0) is a keyboard binding function that waits indefinitely for a key press.
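A minimal sketch of the read-and-display flow described above; the file path is the example used in these notes, and the None check is an extra safeguard tied to the error discussed later:

```python
import cv2 as cv

# Read the image from disk into a matrix of pixels (BGR channel order)
img = cv.imread('photos/cat.jpg')
if img is None:
    raise FileNotFoundError("Couldn't read photos/cat.jpg -- check the path")

# Display it in a named window and wait indefinitely for a key press
cv.imshow('Cat', img)
cv.waitKey(0)
cv.destroyAllWindows()
```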
Handling Large Images
- OpenCV may struggle with displaying images larger than the monitor's resolution.
- There isn't a built-in way to deal with this.
- Resizing and rescaling techniques can help mitigate the issue; they are covered in the next video.
Reading Videos in OpenCV
- Read videos using cv.VideoCapture(): capture = cv.VideoCapture('videos/dog.mp4').
- It accepts an integer (0, 1, 2, etc.) for webcams or the path to a video file.
- 0 typically references the default webcam.
- Reading videos requires a loop to process frames sequentially.
Video Processing Loop
- capture.read() reads the video frame by frame, returning a boolean indicating success and the frame itself.
- cv.imshow() displays each frame in a window: cv.imshow('Video', frame).
- cv.waitKey(20) waits 20 milliseconds for a key press; cv.waitKey(20) & 0xFF == ord('d') checks whether the 'd' key was pressed to break out of the loop.
- capture.release() releases the capture device.
- cv.destroyAllWindows() closes all OpenCV windows.
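Putting the loop together as described (the video path is the example from the notes):

```python
import cv2 as cv

capture = cv.VideoCapture('videos/dog.mp4')  # or 0 for the default webcam

while True:
    isTrue, frame = capture.read()       # success flag + the frame itself
    if not isTrue:                       # no more frames, or the file wasn't found
        break

    cv.imshow('Video', frame)

    # Wait 20 ms between frames; press 'd' to break out of the loop early
    if cv.waitKey(20) & 0xFF == ord('d'):
        break

capture.release()
cv.destroyAllWindows()
```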
Error Handling
- A '(-215: Assertion failed)' error usually indicates that OpenCV can't find the specified media file.
- This error can occur when a video runs out of frames or if an incorrect file path is provided.
Resizing and Rescaling Frames
- Resizing and rescaling reduce computational strain associated with large media files.
- Rescaling involves modifying the height and width of a video or image.
- It's generally recommended to downscale to values smaller than the original dimensions.
Rescaling Function
- rescale_frame(frame, scale=0.75) rescales a frame by a given factor.
- It calculates the new width and height from the scale factor: width = int(frame.shape[1] * scale) and height = int(frame.shape[0] * scale).
- It uses cv.resize() to resize the frame to dimensions = (width, height) with cv.INTER_AREA interpolation, as sketched below.
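A sketch of the rescaling function built from the steps above:

```python
import cv2 as cv

def rescale_frame(frame, scale=0.75):
    # frame.shape[1] is the width, frame.shape[0] is the height
    width = int(frame.shape[1] * scale)
    height = int(frame.shape[0] * scale)
    dimensions = (width, height)
    # INTER_AREA interpolation suits downscaling
    return cv.resize(frame, dimensions, interpolation=cv.INTER_AREA)
```

It works on still images and on individual video frames alike, e.g. frame_resized = rescale_frame(frame, scale=0.5).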
Alternative Resizing Method
- capture.set(propertyId, value) alters video capture properties.
- Property IDs 3 and 4 correspond to the frame width and height, respectively.
- change_res(width, height) sets the width and height using capture.set(3, width) and capture.set(4, height).
- This only works for live video feeds.
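A sketch of the live-feed variant; the notes describe change_res(width, height) using a capture object from the surrounding scope, so here the capture is passed in explicitly to keep the example self-contained:

```python
import cv2 as cv

def change_res(capture, width, height):
    # Property IDs 3 and 4 are the frame width and height.
    # Only works for live video feeds (e.g. a webcam), not for video files.
    capture.set(3, width)
    capture.set(4, height)

capture = cv.VideoCapture(0)   # default webcam
change_res(capture, 640, 480)
```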
Drawing on Images
- Use np.zeros() to create a blank image: blank = np.zeros((500, 500, 3), dtype='uint8').
- (500, 500) is the image size, 3 represents the number of color channels (BGR), and uint8 is the data type.
- Painting an image involves setting pixel values: blank[:] = 0, 255, 0 paints the entire image green.
- Specifying a range of pixels allows coloring specific regions: blank[200:300, 300:400] = 0, 0, 255 creates a red square.
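The blank-image examples from the notes, collected into one runnable snippet:

```python
import cv2 as cv
import numpy as np

# 500x500 image, 3 color channels (BGR), 8-bit unsigned integers, all zeros (black)
blank = np.zeros((500, 500, 3), dtype='uint8')

blank[:] = 0, 255, 0                    # paint the whole image green (BGR order)
blank[200:300, 300:400] = 0, 0, 255     # paint a region red -> a red square

cv.imshow('Blank', blank)
cv.waitKey(0)
```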
Drawing a Rectangle
- Use cv.rectangle() to draw a rectangle: cv.rectangle(blank, (0, 0), (250, 250), (0, 255, 0), thickness=2).
- Parameters include the image, starting point, ending point, color, and thickness.
- To fill the rectangle, set thickness=cv.FILLED or thickness=-1.
- The starting point is the top-left corner; the ending point is the bottom-right corner.
- Use img.shape[1] // 2 and img.shape[0] // 2 as the ending point to draw a rectangle reaching the center of the image.
Drawing Shapes and Text on Images with OpenCV
- cv.rectangle() draws rectangles, requiring the image, two points (top-left and bottom-right corners), a color in BGR format, and a thickness; a negative thickness fills the rectangle.
- cv.circle() draws circles, needing the image, center coordinates, radius, color, and thickness; a negative thickness fills the circle.
- cv.line() draws lines, requiring the image, start point, end point, color, and thickness.
- cv.putText() writes text on images, taking the image, the text, an origin (the bottom-left corner of the text), a font face (e.g., cv.FONT_HERSHEY_TRIPLEX), a font scale, a color, and a thickness.
- The origin specifies the starting point for the text.
- OpenCV has built-in fonts such as Hershey Triplex.
- Adjust the origin or margins to handle text that goes off-screen.
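A sketch that exercises each drawing call listed above on a blank canvas; the specific coordinates, colors, and text are illustrative only:

```python
import cv2 as cv
import numpy as np

blank = np.zeros((500, 500, 3), dtype='uint8')

# Rectangle from the top-left corner to the image center, filled green
cv.rectangle(blank, (0, 0), (blank.shape[1] // 2, blank.shape[0] // 2),
             (0, 255, 0), thickness=cv.FILLED)

# Red circle outline centered in the image, radius 40, 3 px thick
cv.circle(blank, (blank.shape[1] // 2, blank.shape[0] // 2), 40,
          (0, 0, 255), thickness=3)

# White line, 2 px thick
cv.line(blank, (250, 250), (450, 450), (255, 255, 255), thickness=2)

# Text whose origin (bottom-left corner of the text) is (50, 450)
cv.putText(blank, 'Hello', (50, 450), cv.FONT_HERSHEY_TRIPLEX,
           1.0, (255, 255, 255), thickness=2)

cv.imshow('Shapes and Text', blank)
cv.waitKey(0)
```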
Basic Functions in OpenCV
- Converting an image to grayscale is done using cv.cvtColor() with the cv.COLOR_BGR2GRAY color code; grayscale shows the intensity distribution of pixels rather than color.
- Blurring an image reduces noise; Gaussian blur is a common technique in computer vision.
- cv.GaussianBlur() requires the image, a kernel size (a tuple of odd numbers such as (3, 3) or (7, 7)), and cv.BORDER_DEFAULT.
- A larger kernel size increases blurring.
- An edge cascade identifies edges; the Canny edge detector is a multi-step process involving blurring and gradient computations.
- cv.Canny() takes the image and two threshold values as arguments.
- Blurring the image before finding edges reduces the number of edges found.
- Dilation expands edges using cv.dilate(), which takes the structuring element (such as the Canny edges), a kernel size, and the number of iterations.
- Erosion shrinks dilated edges using cv.erode(), which requires the dilated image, a kernel size, and the number of iterations; it can roughly restore the original edge cascade if the parameters match those of the dilation.
- Resizing images uses cv.resize() with the image and the destination size.
- The default interpolation is cv.INTER_AREA (good for shrinking).
- Use cv.INTER_LINEAR or cv.INTER_CUBIC when enlarging images for better quality; cv.INTER_CUBIC is the slowest but yields the best-looking images.
- Cropping images uses array slicing on the pixel values in the image array.
Image Transformations in OpenCV
- Translation shifts an image along the x and y axes: up, down, left, right, or a combination of these.
- It requires constructing a translation matrix using NumPy.
- Negative x values translate the image to the left, and negative y values translate it up.
- Positive x values translate the image to the right, and positive y values translate it down.
- Use cv.warpAffine() to apply the transformation.
- Rotation rotates an image around a specified point.
- cv.getRotationMatrix2D() creates the rotation matrix, taking the rotation point, the angle, and a scale.
- The rotation point defaults to the image center if none is specified.
- cv.warpAffine() applies the rotation.
- Positive angles rotate the image counterclockwise, while negative angles rotate it clockwise.
- Resizing adjusts the dimensions of an image to enlarge or reduce it.
- cv.resize() requires the image and the desired destination size.
- The interpolation method (cv.INTER_AREA, cv.INTER_LINEAR, cv.INTER_CUBIC) affects the final image quality; cv.INTER_CUBIC produces the best quality when enlarging.
- Cropping selects a portion of the image using array slicing on pixel ranges.
Image Enlarging
- Use cv.INTER_LINEAR or cv.INTER_CUBIC interpolation to enlarge images.
- cv.INTER_CUBIC is slower, but the resulting image is of higher quality.
Flipping Images
- Done with the cv.flip() function.
- It requires an image and a flip code as input.
- Flip code 0: flips the image vertically (over the x-axis).
- Flip code 1: flips the image horizontally (over the y-axis).
- Flip code -1: flips the image both vertically and horizontally.
Cropping Images
- Achieved through array slicing.
- Specify the pixel ranges for height and width to define the cropped region.
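Flipping and cropping in one short sketch (the image path and crop range are just examples):

```python
import cv2 as cv

img = cv.imread('photos/cat.jpg')

flip_vertical = cv.flip(img, 0)     # over the x-axis
flip_horizontal = cv.flip(img, 1)   # over the y-axis
flip_both = cv.flip(img, -1)        # both axes

# Cropping via array slicing: rows (height range) first, then columns (width range)
cropped = img[200:400, 300:400]
```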
Image Transformations Covered
- Translation
- Rotation
- Resizing
- Flipping
- Cropping
Contours
- Contours are basically the boundaries of objects.
- A line or curve that joins the continuous points along the boundary of an object
- Contours are useful tools in shape analysis, object detection, and recognition.
- From a mathematical point of view, contours and edges are different things.
Finding Contours
- Involves converting the image to grayscale.
- Then grabbing the edges of the image using the Canny edge detector.
- Then using the cv.findContours() method.
- cv.findContours() looks at the structuring element (the edges) and returns two values: the contours and the hierarchies.
- Contours is essentially a Python list of all the coordinates of the contours found in the image.
- Hierarchies refers to the hierarchical representation of the contours.
- cv.RETR_EXTERNAL retrieves only the external (outermost) contours.
- cv.RETR_TREE returns all the contours in their hierarchical representation.
- cv.CHAIN_APPROX_NONE does no approximation and returns all the contour points.
- cv.CHAIN_APPROX_SIMPLE compresses the contour points, returning only the ones that make the most sense (for a straight line, just its two endpoints).
- Blurring the image before finding edges reduces the number of contours detected.
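A sketch of the pipeline above; the image path and threshold values are illustrative:

```python
import cv2 as cv

img = cv.imread('photos/cats.jpg')

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
blur = cv.GaussianBlur(gray, (5, 5), cv.BORDER_DEFAULT)   # blurring first -> fewer contours
canny = cv.Canny(blur, 125, 175)

# RETR_EXTERNAL keeps only the outermost contours (RETR_TREE would keep the hierarchy);
# CHAIN_APPROX_SIMPLE compresses the points, CHAIN_APPROX_NONE would keep them all
contours, hierarchies = cv.findContours(canny, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
print(f'{len(contours)} contour(s) found')
```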
Threshold
- Thresholding is another way to find contours, instead of Canny.
- Threshold essentially looks at an image and tries to binarize it.
- If a pixel's intensity is below the threshold (e.g., 125), it is set to 0 (black).
- If it is above the threshold, it is set to 255 (white).
- Thresholding attempts to binarize an image: it converts the image into a binary form in which each pixel is either 0 (black) or 255 (white).
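A thresholding variant of the same idea (125 and 255 are the values used in the notes):

```python
import cv2 as cv

img = cv.imread('photos/cats.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Pixels with intensity above 125 become 255 (white); the rest become 0 (black)
ret, thresh = cv.threshold(gray, 125, 255, cv.THRESH_BINARY)

# The binarized image can be handed to cv.findContours instead of the Canny edges
contours, hierarchies = cv.findContours(thresh, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
```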
Visualizing Contours
- Contours can be visualized by drawing them over an image.
- Use the cv.drawContours() method for this.
- cv.drawContours() takes the image to draw over, the contours (as a list), a contour index (-1 to draw all of them), a color, and a thickness as inputs.
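A short sketch drawing the found contours onto a blank canvas; the path and color are illustrative:

```python
import cv2 as cv
import numpy as np

img = cv.imread('photos/cats.jpg')
blank = np.zeros(img.shape, dtype='uint8')        # same size as the image, all black

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
canny = cv.Canny(gray, 125, 175)
contours, hierarchies = cv.findContours(canny, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

# -1 as the contour index draws every contour in the list, here in red, 1 px thick
cv.drawContours(blank, contours, -1, (0, 0, 255), thickness=1)
cv.imshow('Contours Drawn', blank)
cv.waitKey(0)
```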
Color Spaces
- A color space is a system for representing an array of pixel colors.
- RGB is a color space, grayscale is a color space, and there are other color spaces such as HSV, LAB, and many more.
Converting to Grayscale
- Done with gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY).
- Pass in the image and specify the color code cv.COLOR_BGR2GRAY, since we're converting from the BGR image format to grayscale.
HSV
- HSV stands for Hue, Saturation, Value and is loosely based on how humans think about and perceive color.
- Convert from BGR to HSV with hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV), passing in the img variable and the cv.COLOR_BGR2HSV color code.
LAB
- lab = cv.cvtColor(img, cv.COLOR_BGR2LAB): pass in img and the cv.COLOR_BGR2LAB color code.
BGR vs RGB
- OpenCV reads images in BGR format (Blue, Green, Red).
- Outside of OpenCV, the RGB format is more commonly used.
- Displaying a BGR image in a library expecting RGB can lead to color inversions.
Converting Between BGR and RGB
- Use cv.cvtColor() with the appropriate color code (cv.COLOR_BGR2RGB or cv.COLOR_RGB2BGR).
- Be mindful of color inversions when working with different libraries.
Color Space Conversion Limitations
- Grayscale images cannot be directly converted to HSV.
- Convert grayscale to BGR first, then BGR to HSV
HSV to BGR Conversion
- Converting from HSV to BGR uses cv.cvtColor() in OpenCV.
- The color code used for the conversion is cv.COLOR_HSV2BGR.
Color Space Conversions
- From BGR, images can be converted to grayscale, HSV, LAB, and RGB.
- There's no direct method to convert grayscale to LAB.
- To convert grayscale to LAB, convert grayscale to BGR first, then BGR to LAB, as in the sketch below.
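The conversions above in one sketch, including the two-step grayscale route (the image path is illustrative):

```python
import cv2 as cv

img = cv.imread('photos/park.jpg')

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
lab = cv.cvtColor(img, cv.COLOR_BGR2LAB)
rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)

# No direct grayscale -> HSV or grayscale -> LAB code: go through BGR first
gray_bgr = cv.cvtColor(gray, cv.COLOR_GRAY2BGR)
gray_hsv = cv.cvtColor(gray_bgr, cv.COLOR_BGR2HSV)
gray_lab = cv.cvtColor(gray_bgr, cv.COLOR_BGR2LAB)

# And back again, e.g. HSV -> BGR
hsv_bgr = cv.cvtColor(hsv, cv.COLOR_HSV2BGR)
```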
Splitting and Merging Color Channels
- A color image consists of multiple channels (red, green, and blue in BGR or RGB images).
- OpenCV allows splitting an image into its respective color channels (e.g., BGR into blue, green, and red components).
- cv.split() splits an image into its blue, green, and red channels.
- The shapes of the blue, green, and red components do not show a 3 for the channel dimension, because each component has only a single channel.
- The components are displayed as grayscale images showing the pixel intensity distribution of that channel.
- Lighter regions show a higher concentration of that channel's pixel values.
- Darker regions represent little or none of that channel in the region.
- The additional element in the shape tuple represents the number of color channels; grayscale (single-channel) images have a channel count of one.
- cv.merge() merges the color channels back together by passing in a list of the channels.
- To display the actual color of each channel, use NumPy to create a blank image with the same height and width but no color channels, then merge it with the individual channel.
- The red, green, and blue channels can be merged to get back the original image.
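A sketch of splitting and merging; merging one channel with blank (all-zero) channels shows its actual color:

```python
import cv2 as cv
import numpy as np

img = cv.imread('photos/park.jpg')

b, g, r = cv.split(img)             # each channel is single-channel
print(img.shape, b.shape)           # (H, W, 3) vs (H, W): no channel dimension

merged = cv.merge([b, g, r])        # reconstructs the original image

# Blank single-channel canvas with the image's height and width
blank = np.zeros(img.shape[:2], dtype='uint8')
blue = cv.merge([b, blank, blank])
green = cv.merge([blank, g, blank])
red = cv.merge([blank, blank, r])
```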
Smoothing and Blurring Techniques
- Smoothing and blurring are used to reduce noise in images caused by camera sensors or lighting issues.
- A kernel (or window) is drawn over a portion of the image, and an operation is applied to the pixels inside that window.
- The kernel size is the number of rows and columns in the kernel.
- The blur is applied to the middle pixel as a result of the surrounding pixels.
Averaging Blur
- Averaging blur computes the pixel intensity of the center pixel as the average of the surrounding pixel intensities.
- cv.blur() is used to apply an averaging blur.
- The source image and the kernel size are passed in.
- Higher kernel sizes result in more blur.
Gaussian Blur
- Gaussian blur is similar to averaging, but each surrounding pixel is given a particular weight.
- The average of the products of those weights gives the value of the center pixel.
- Gaussian blur results in less blurring than averaging but is considered more natural.
- cv.GaussianBlur() applies the Gaussian blur; it needs the source image, a kernel size, and sigmaX (the standard deviation in the X direction).
Median Blur
- Median blurring finds the median of the surrounding pixels instead of the average.
- It is more effective at reducing noise, especially salt-and-pepper noise.
- It is commonly used in advanced computer vision projects that need substantial noise reduction.
- Avoid high kernel sizes, or you end up with a "washed-out, smudged" version of the image.
- cv.medianBlur() is the relevant function.
- The kernel size is passed as a single integer, since OpenCV automatically uses it for both the height and the width.
Bilateral Blurring
- Bilateral blurring applies blurring while retaining the edges in the image.
- It does this by checking whether the blurring would reduce edges or not.
- cv.bilateralFilter() is used to apply bilateral blurring.
- Its parameters include the image, the diameter of the pixel neighborhood, sigmaColor (the color sigma), and sigmaSpace (the space sigma).
- sigmaColor: a larger value means more colors in the neighborhood are considered.
- sigmaSpace: larger values mean pixels further from the center influence the calculation.
- Higher sigmaSpace values can cause the image to look blurred and smudged.
- The diameter is the diameter of the pixel neighborhood, not a kernel size.
Bitwise Operators
- Four basic bitwise operators: AND, OR, XOR, and NOT.
- Used in image processing, especially with masks.
- Operate in a binary manner: a pixel is turned off if it has a value of zero, and is turned on if it has a value of one.
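A sketch of the four operators applied to two simple masks (a filled rectangle and a filled circle), which makes it easy to see which pixels each operator keeps:

```python
import cv2 as cv
import numpy as np

blank = np.zeros((400, 400), dtype='uint8')
rectangle = cv.rectangle(blank.copy(), (30, 30), (370, 370), 255, -1)   # filled white rectangle
circle = cv.circle(blank.copy(), (200, 200), 200, 255, -1)              # filled white circle

bitwise_and = cv.bitwise_and(rectangle, circle)   # pixels on in BOTH images
bitwise_or = cv.bitwise_or(rectangle, circle)     # pixels on in EITHER image
bitwise_xor = cv.bitwise_xor(rectangle, circle)   # pixels on in exactly one image
bitwise_not = cv.bitwise_not(rectangle)           # flips on/off pixels
```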