Questions and Answers
Which of the following best describes the primary purpose of OpenCV?
- Database management
- Computer vision tasks (correct)
- Web development
- Operating system development
NumPy must be installed separately before installing OpenCV to ensure proper functioning of matrix operations.
False
To read an image file into OpenCV, the function cv.______() should be used.
imread
What is the common cause of a '-215: Assertion failed' error when working with media files in OpenCV?
OpenCV cannot find the specified media file, typically because the file path is incorrect (or a video has run out of frames).
Match the cv.flip() flip codes with their corresponding actions:
0 = flip vertically (over the x-axis); 1 = flip horizontally (over the y-axis); -1 = flip both vertically and horizontally
Which blurring technique is most effective at reducing 'salt and pepper' noise while preserving edges to a reasonable extent, compared to averaging or Gaussian blur?
Median blur
Explain the fundamental difference between 'contours' and 'edges' in the context of image processing, as discussed in the material.
Contours are the boundaries of objects: lines or curves joining the continuous points along an object's boundary, useful for shape analysis and object detection. Edges are points of sharp change in pixel intensity. The two often coincide visually, but from a mathematical point of view they are different things.
In cv.GaussianBlur(), the parameter that directly controls the standard deviation of the Gaussian kernel in the X direction is known as ______.
sigmaX
To convert a grayscale image directly to HSV color space in OpenCV, which of the following sequences of conversions is required?
Grayscale to BGR first, then BGR to HSV (there is no direct grayscale-to-HSV conversion).
Bilateral blur utilizes a kernel size parameter, similar to averaging, Gaussian, and median blur, to define the area of pixels considered for blurring.
False (bilateral filtering uses a diameter of the pixel neighborhood rather than a kernel size).
Flashcards
What is OpenCV?
A computer vision library supporting Python, C++, and Java, used to extract insights from images and videos using deep learning.
pip install opencv-contrib-python
Installs the main OpenCV package with contribution modules, along with NumPy for scientific computing.
What does cv.imread() do?
Reads an image from the specified path into a matrix of pixels.
Function of cv.imshow()
Displays an image in a new window, taking the window name and the image matrix as parameters.
What does capture.read() do?
Reads a video frame by frame, returning each frame and a boolean indicating whether the read succeeded.
What is Rescaling?
Modifying the height and width of an image or video, usually downscaling, to reduce computational strain.
Creating a blank image
Use np.zeros() with the desired size, e.g. blank = np.zeros((500, 500, 3), dtype='uint8').
Drawing shapes on images
Use cv.rectangle(), cv.circle(), and cv.line(), passing the image, the relevant points, a color, and a thickness.
Converting to grayscale
Use cv.cvtColor() with the cv.COLOR_BGR2GRAY color code.
What is Translation?
Shifting an image along the x and y axes (up, down, left, or right) by applying a translation matrix with cv.warpAffine().
Study Notes
Course Overview
- Introduction to Python and OpenCV for image and video processing.
- Covers basics like reading/manipulating media files, image transformations, drawing shapes, and adding text.
- Progresses to advanced topics: color spaces, bitwise operations, masking, histograms, edge detection, and thresholding.
- Concludes with face detection, face recognition, and a deep computer vision model for classifying Simpsons characters.
- All materials will be available on GitHub.
Introduction to OpenCV
- OpenCV is a computer vision library available in Python, C++, and Java.
- Computer vision uses deep learning to extract insights from images and videos.
- Python 3.7 or higher is required.
- NumPy is a scientific computing package in Python used for matrix and array manipulations.
Package Installation
- pip install opencv-contrib-python: installs the main OpenCV package with the contribution modules.
- NumPy is automatically installed alongside OpenCV and facilitates scientific computing, especially matrix- and array-related tasks.
- pip install caer: installs a package of utility functions designed to speed up computer vision workflows; it is used later in the course, specifically when building the deep computer vision model.
Reading Images in OpenCV
- Use cv.imread() to read an image: img = cv.imread('photos/cat.jpg').
- It takes the image path as input and returns the image as a matrix of pixels.
- Use cv.imshow() to display an image in a new window; it requires the window name and the image matrix as parameters: cv.imshow('Cat', img).
- cv.waitKey(0) is a keyboard binding function that waits indefinitely for a key press.
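A minimal sketch of the read-and-display flow described above; the file path is the example used in these notes, and the None check is an extra safeguard tied to the error discussed later:

```python
import cv2 as cv

# Read the image from disk into a matrix of pixels (BGR channel order)
img = cv.imread('photos/cat.jpg')
if img is None:
    raise FileNotFoundError("Couldn't read photos/cat.jpg -- check the path")

# Display it in a named window and wait indefinitely for a key press
cv.imshow('Cat', img)
cv.waitKey(0)
cv.destroyAllWindows()
```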
Handling Large Images
- OpenCV may struggle with displaying images larger than the monitor's resolution.
- There isn't a built-in way to deal with this.
- Resizing and rescaling techniques can help mitigate the issue; they are covered in the next video.
Reading Videos in OpenCV
- Read videos using cv.VideoCapture(): capture = cv.VideoCapture('videos/dog.mp4').
- It accepts an integer (0, 1, 2, etc.) for webcams or the path to a video file.
- 0 typically references the default webcam.
- Reading videos requires a loop to process frames sequentially.
Video Processing Loop
- capture.read() reads the video frame by frame, returning a boolean indicating success and the frame itself.
- cv.imshow() displays each frame in a window: cv.imshow('Video', frame).
- cv.waitKey(20) waits 20 milliseconds for a key press; cv.waitKey(20) & 0xFF == ord('d') checks whether the 'd' key was pressed to break out of the loop.
- capture.release() releases the capture device.
- cv.destroyAllWindows() closes all OpenCV windows.
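Putting the loop together as described (the video path is the example from the notes):

```python
import cv2 as cv

capture = cv.VideoCapture('videos/dog.mp4')  # or 0 for the default webcam

while True:
    isTrue, frame = capture.read()       # success flag + the frame itself
    if not isTrue:                       # no more frames, or the file wasn't found
        break

    cv.imshow('Video', frame)

    # Wait 20 ms between frames; press 'd' to break out of the loop early
    if cv.waitKey(20) & 0xFF == ord('d'):
        break

capture.release()
cv.destroyAllWindows()
```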
Error Handling
- A '(-215: Assertion failed)' error usually indicates that OpenCV can't find the specified media file.
- This error can occur when a video runs out of frames or if an incorrect file path is provided.
Resizing and Rescaling Frames
- Resizing and rescaling reduce computational strain associated with large media files.
- Rescaling involves modifying the height and width of a video or image.
- It's generally recommended to downscale to values smaller than the original dimensions.
Rescaling Function
- rescale_frame(frame, scale=0.75) rescales a frame by a given factor.
- It calculates the new width and height from the scale factor: width = int(frame.shape[1] * scale) and height = int(frame.shape[0] * scale).
- It uses cv.resize() to resize the frame to dimensions = (width, height) with cv.INTER_AREA interpolation, as sketched below.
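A sketch of the rescaling function built from the steps above:

```python
import cv2 as cv

def rescale_frame(frame, scale=0.75):
    # frame.shape[1] is the width, frame.shape[0] is the height
    width = int(frame.shape[1] * scale)
    height = int(frame.shape[0] * scale)
    dimensions = (width, height)
    # INTER_AREA interpolation suits downscaling
    return cv.resize(frame, dimensions, interpolation=cv.INTER_AREA)
```

It works on still images and on individual video frames alike, e.g. frame_resized = rescale_frame(frame, scale=0.5).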
Alternative Resizing Method
- capture.set(propertyId, value) alters video capture properties.
- Property IDs 3 and 4 correspond to the frame width and height, respectively.
- change_res(width, height) sets the width and height using capture.set(3, width) and capture.set(4, height).
- This only works for live video feeds.
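A sketch of the live-feed variant; the notes describe change_res(width, height) using a capture object from the surrounding scope, so here the capture is passed in explicitly to keep the example self-contained:

```python
import cv2 as cv

def change_res(capture, width, height):
    # Property IDs 3 and 4 are the frame width and height.
    # Only works for live video feeds (e.g. a webcam), not for video files.
    capture.set(3, width)
    capture.set(4, height)

capture = cv.VideoCapture(0)   # default webcam
change_res(capture, 640, 480)
```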
Drawing on Images
- Use np.zeros() to create a blank image: blank = np.zeros((500, 500, 3), dtype='uint8').
- (500, 500) is the image size, 3 represents the number of color channels (BGR), and uint8 is the data type.
- Painting an image involves setting pixel values: blank[:] = 0, 255, 0 paints the entire image green.
- Specifying a range of pixels allows coloring specific regions: blank[200:300, 300:400] = 0, 0, 255 creates a red square.
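The blank-image examples from the notes, collected into one runnable snippet:

```python
import cv2 as cv
import numpy as np

# 500x500 image, 3 color channels (BGR), 8-bit unsigned integers, all zeros (black)
blank = np.zeros((500, 500, 3), dtype='uint8')

blank[:] = 0, 255, 0                    # paint the whole image green (BGR order)
blank[200:300, 300:400] = 0, 0, 255     # paint a region red -> a red square

cv.imshow('Blank', blank)
cv.waitKey(0)
```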
Drawing a Rectangle
- Use cv.rectangle() to draw a rectangle: cv.rectangle(blank, (0, 0), (250, 250), (0, 255, 0), thickness=2).
- Parameters include the image, starting point, ending point, color, and thickness.
- To fill the rectangle, set thickness=cv.FILLED or thickness=-1.
- The starting point is the top-left corner; the ending point is the bottom-right corner.
- Use img.shape[1] // 2 and img.shape[0] // 2 as the ending point to draw a rectangle reaching the center of the image.
Drawing Shapes and Text on Images with OpenCV
- cv.rectangle() draws rectangles, requiring the image, two points (top-left and bottom-right corners), a color in BGR format, and a thickness; a negative thickness fills the rectangle.
- cv.circle() draws circles, needing the image, center coordinates, radius, color, and thickness; a negative thickness fills the circle.
- cv.line() draws lines, requiring the image, start point, end point, color, and thickness.
- cv.putText() writes text on images, taking the image, the text, an origin (the bottom-left corner of the text), a font face (e.g., cv.FONT_HERSHEY_TRIPLEX), a font scale, a color, and a thickness.
- The origin specifies the starting point for the text.
- OpenCV has built-in fonts such as Hershey Triplex.
- Adjust the origin or margins to handle text that goes off-screen.
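A sketch that exercises each drawing call listed above on a blank canvas; the specific coordinates, colors, and text are illustrative only:

```python
import cv2 as cv
import numpy as np

blank = np.zeros((500, 500, 3), dtype='uint8')

# Rectangle from the top-left corner to the image center, filled green
cv.rectangle(blank, (0, 0), (blank.shape[1] // 2, blank.shape[0] // 2),
             (0, 255, 0), thickness=cv.FILLED)

# Red circle outline centered in the image, radius 40, 3 px thick
cv.circle(blank, (blank.shape[1] // 2, blank.shape[0] // 2), 40,
          (0, 0, 255), thickness=3)

# White line, 2 px thick
cv.line(blank, (250, 250), (450, 450), (255, 255, 255), thickness=2)

# Text whose origin (bottom-left corner of the text) is (50, 450)
cv.putText(blank, 'Hello', (50, 450), cv.FONT_HERSHEY_TRIPLEX,
           1.0, (255, 255, 255), thickness=2)

cv.imshow('Shapes and Text', blank)
cv.waitKey(0)
```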
Basic Functions in OpenCV
- Converting an image to grayscale is done using cv.cvtColor() with the cv.COLOR_BGR2GRAY color code; grayscale shows the intensity distribution of pixels rather than color.
- Blurring an image reduces noise; Gaussian blur is a common technique in computer vision.
- cv.GaussianBlur() requires the image, a kernel size (a tuple of odd numbers such as (3, 3) or (7, 7)), and cv.BORDER_DEFAULT.
- A larger kernel size increases blurring.
- An edge cascade identifies edges; the Canny edge detector is a multi-step process involving blurring and gradient computations.
- cv.Canny() takes the image and two threshold values as arguments.
- Blurring the image before finding edges reduces the number of edges found.
- Dilation expands edges using cv.dilate(), which takes the structuring element (such as the Canny edges), a kernel size, and the number of iterations.
- Erosion shrinks dilated edges using cv.erode(), which requires the dilated image, a kernel size, and the number of iterations; it can roughly restore the original edge cascade if the parameters match those of the dilation.
- Resizing images uses cv.resize() with the image and the destination size.
- The default interpolation is cv.INTER_AREA (good for shrinking).
- Use cv.INTER_LINEAR or cv.INTER_CUBIC when enlarging images for better quality; cv.INTER_CUBIC is the slowest but yields the best-looking images.
- Cropping images uses array slicing on the pixel values in the image array.
Image Transformations in OpenCV
- Translation shifts an image along the x and y axes: up, down, left, right, or a combination of these.
- It requires constructing a translation matrix using NumPy.
- Negative x values translate the image to the left, and negative y values translate it up.
- Positive x values translate the image to the right, and positive y values translate it down.
- Use cv.warpAffine() to apply the transformation.
- Rotation rotates an image around a specified point.
- cv.getRotationMatrix2D() creates the rotation matrix, taking the rotation point, the angle, and a scale.
- The rotation point defaults to the image center if none is specified.
- cv.warpAffine() applies the rotation.
- Positive angles rotate the image counterclockwise, while negative angles rotate it clockwise.
- Resizing adjusts the dimensions of an image to enlarge or reduce it.
- cv.resize() requires the image and the desired destination size.
- The interpolation method (cv.INTER_AREA, cv.INTER_LINEAR, cv.INTER_CUBIC) affects the final image quality; cv.INTER_CUBIC produces the best quality when enlarging.
- Cropping selects a portion of the image using array slicing on pixel ranges.
Image Enlarging
- Use cv.INTER_LINEAR or cv.INTER_CUBIC interpolation to enlarge images.
- cv.INTER_CUBIC is slower, but the resulting image is of higher quality.
Flipping Images
- Done with the cv.flip() function.
- It requires an image and a flip code as input.
- Flip code 0: flips the image vertically (over the x-axis).
- Flip code 1: flips the image horizontally (over the y-axis).
- Flip code -1: flips the image both vertically and horizontally.
Cropping Images
- Achieved through array slicing.
- Specify the pixel ranges for height and width to define the cropped region.
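Flipping and cropping in one short sketch (the image path and crop range are just examples):

```python
import cv2 as cv

img = cv.imread('photos/cat.jpg')

flip_vertical = cv.flip(img, 0)     # over the x-axis
flip_horizontal = cv.flip(img, 1)   # over the y-axis
flip_both = cv.flip(img, -1)        # both axes

# Cropping via array slicing: rows (height range) first, then columns (width range)
cropped = img[200:400, 300:400]
```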
Image Transformations Covered
- Translation
- Rotation
- Resizing
- Flipping
- Cropping
Contours
- Contours are basically the boundaries of objects.
- A line or curve that joins the continuous points along the boundary of an object
- Contours are useful tools in shape analysis, object detection, and recognition.
- From a mathematical point of view, contours and edges are different things.
Finding Contours
- Involves converting the image to grayscale.
- Then grabbing the edges of the image using the Canny edge detector.
- Then using the cv.findContours() method.
- cv.findContours() looks at the structuring element (the edges) and returns two values: the contours and the hierarchies.
- Contours is essentially a Python list of all the coordinates of the contours found in the image.
- Hierarchies refers to the hierarchical representation of the contours.
- cv.RETR_EXTERNAL retrieves only the external (outermost) contours.
- cv.RETR_TREE returns all the contours in their hierarchical representation.
- cv.CHAIN_APPROX_NONE does no approximation and returns all the contour points.
- cv.CHAIN_APPROX_SIMPLE compresses the contour points, returning only the ones that make the most sense (for a straight line, just its two endpoints).
- Blurring the image before finding edges reduces the number of contours detected.
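A sketch of the pipeline above; the image path and threshold values are illustrative:

```python
import cv2 as cv

img = cv.imread('photos/cats.jpg')

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
blur = cv.GaussianBlur(gray, (5, 5), cv.BORDER_DEFAULT)   # blurring first -> fewer contours
canny = cv.Canny(blur, 125, 175)

# RETR_EXTERNAL keeps only the outermost contours (RETR_TREE would keep the hierarchy);
# CHAIN_APPROX_SIMPLE compresses the points, CHAIN_APPROX_NONE would keep them all
contours, hierarchies = cv.findContours(canny, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
print(f'{len(contours)} contour(s) found')
```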
Threshold
- Thresholding is another way to find contours, instead of Canny.
- Threshold essentially looks at an image and tries to binarize it.
- If a pixel's intensity is below the threshold (e.g., 125), it is set to 0 (black).
- If it is above the threshold, it is set to 255 (white).
- Thresholding attempts to binarize an image: it converts the image into a binary form in which each pixel is either 0 (black) or 255 (white).
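A thresholding variant of the same idea (125 and 255 are the values used in the notes):

```python
import cv2 as cv

img = cv.imread('photos/cats.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Pixels with intensity above 125 become 255 (white); the rest become 0 (black)
ret, thresh = cv.threshold(gray, 125, 255, cv.THRESH_BINARY)

# The binarized image can be handed to cv.findContours instead of the Canny edges
contours, hierarchies = cv.findContours(thresh, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
```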
Visualizing Contours
- Contours can be visualized by drawing them over an image.
- Use the cv.drawContours() method for this.
- cv.drawContours() takes the image to draw over, the contours (as a list), a contour index (-1 to draw all of them), a color, and a thickness as inputs.
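A short sketch drawing the found contours onto a blank canvas; the path and color are illustrative:

```python
import cv2 as cv
import numpy as np

img = cv.imread('photos/cats.jpg')
blank = np.zeros(img.shape, dtype='uint8')        # same size as the image, all black

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
canny = cv.Canny(gray, 125, 175)
contours, hierarchies = cv.findContours(canny, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

# -1 as the contour index draws every contour in the list, here in red, 1 px thick
cv.drawContours(blank, contours, -1, (0, 0, 255), thickness=1)
cv.imshow('Contours Drawn', blank)
cv.waitKey(0)
```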
Color Spaces
- A color space is a system for representing an array of pixel colors.
- RGB is a color space, grayscale is a color space, and there are other color spaces such as HSV, LAB, and many more.
Converting to Grayscale
- Done with gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY).
- Pass in the image and specify the color code cv.COLOR_BGR2GRAY, since we're converting from the BGR image format to grayscale.
HSV
- HSV stands for Hue, Saturation, Value and is loosely based on how humans think about and perceive color.
- Convert from BGR to HSV with hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV), passing in the img variable and the cv.COLOR_BGR2HSV color code.
LAB
- lab = cv.cvtColor(img, cv.COLOR_BGR2LAB): pass in img and the cv.COLOR_BGR2LAB color code.
BGR vs RGB
- OpenCV reads images in BGR format (Blue, Green, Red).
- Outside of OpenCV, the RGB format is more commonly used.
- Displaying a BGR image in a library expecting RGB can lead to color inversions.
Converting Between BGR and RGB
- Use cv.cvtColor() with the appropriate color code (cv.COLOR_BGR2RGB or cv.COLOR_RGB2BGR).
- Be mindful of color inversions when working with different libraries.
Color Space Conversion Limitations
- Grayscale images cannot be directly converted to HSV.
- Convert grayscale to BGR first, then BGR to HSV
HSV to BGR Conversion
- Converting from HSV to BGR uses cv.cvtColor() in OpenCV.
- The color code used for the conversion is cv.COLOR_HSV2BGR.
Color Space Conversions
- From BGR, images can be converted to grayscale, HSV, LAB, and RGB.
- There's no direct method to convert grayscale to LAB.
- To convert grayscale to LAB, convert grayscale to BGR first, then BGR to LAB, as in the sketch below.
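The conversions above in one sketch, including the two-step grayscale route (the image path is illustrative):

```python
import cv2 as cv

img = cv.imread('photos/park.jpg')

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
lab = cv.cvtColor(img, cv.COLOR_BGR2LAB)
rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)

# No direct grayscale -> HSV or grayscale -> LAB code: go through BGR first
gray_bgr = cv.cvtColor(gray, cv.COLOR_GRAY2BGR)
gray_hsv = cv.cvtColor(gray_bgr, cv.COLOR_BGR2HSV)
gray_lab = cv.cvtColor(gray_bgr, cv.COLOR_BGR2LAB)

# And back again, e.g. HSV -> BGR
hsv_bgr = cv.cvtColor(hsv, cv.COLOR_HSV2BGR)
```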
Splitting and Merging Color Channels
- A color image consists of multiple channels (red, green, and blue in BGR or RGB images).
- OpenCV allows splitting an image into its respective color channels (e.g., BGR into blue, green, and red components).
- cv.split() splits an image into its blue, green, and red channels.
- The shapes of the blue, green, and red components do not show a 3 for the channel dimension, because each component has only a single channel.
- The components are displayed as grayscale images showing the pixel intensity distribution of that channel.
- Lighter regions show a higher concentration of that channel's pixel values.
- Darker regions represent little or none of that channel in the region.
- The additional element in the shape tuple represents the number of color channels; grayscale (single-channel) images have a channel count of one.
- cv.merge() merges the color channels back together by passing in a list of the channels.
- To display the actual color of each channel, use NumPy to create a blank image with the same height and width but no color channels, then merge it with the individual channel.
- The red, green, and blue channels can be merged to get back the original image.
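A sketch of splitting and merging; merging one channel with blank (all-zero) channels shows its actual color:

```python
import cv2 as cv
import numpy as np

img = cv.imread('photos/park.jpg')

b, g, r = cv.split(img)             # each channel is single-channel
print(img.shape, b.shape)           # (H, W, 3) vs (H, W): no channel dimension

merged = cv.merge([b, g, r])        # reconstructs the original image

# Blank single-channel canvas with the image's height and width
blank = np.zeros(img.shape[:2], dtype='uint8')
blue = cv.merge([b, blank, blank])
green = cv.merge([blank, g, blank])
red = cv.merge([blank, blank, r])
```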
Smoothing and Blurring Techniques
- Smoothing and blurring are used to reduce noise in images caused by camera sensors or lighting issues.
- A kernel (or window) is drawn over a portion of the image, and an operation is applied to the pixels inside that window.
- The kernel size is the number of rows and columns in the kernel.
- The blur is applied to the middle pixel as a result of the surrounding pixels.
Averaging Blur
- Averaging blur computes the pixel intensity of the center pixel as the average of the surrounding pixel intensities.
- cv.blur() is used to apply an averaging blur.
- The source image and the kernel size are passed in.
- Higher kernel sizes result in more blur.
Gaussian Blur
- Gaussian blur is similar to averaging, but each surrounding pixel is given a particular weight.
- The average of the products of those weights gives the value of the center pixel.
- Gaussian blur results in less blurring than averaging but is considered more natural.
- cv.GaussianBlur() applies the Gaussian blur; it needs the source image, a kernel size, and sigmaX (the standard deviation in the X direction).
Median Blur
- Median blurring finds the median of the surrounding pixels instead of the average.
- It is more effective at reducing noise, especially salt-and-pepper noise.
- It is commonly used in advanced computer vision projects that need substantial noise reduction.
- Avoid high kernel sizes, or you end up with a "washed-out, smudged" version of the image.
- cv.medianBlur() is the relevant function.
- The kernel size is passed as a single integer, since OpenCV automatically uses it for both the height and the width.
Bilateral Blurring
- Bilateral blurring applies blurring while retaining the edges in the image.
- It does this by checking whether the blurring would reduce edges or not.
- cv.bilateralFilter() is used to apply bilateral blurring.
- Its parameters include the image, the diameter of the pixel neighborhood, sigmaColor (the color sigma), and sigmaSpace (the space sigma).
- sigmaColor: a larger value means more colors in the neighborhood are considered.
- sigmaSpace: larger values mean pixels further from the center influence the calculation.
- Higher sigmaSpace values can cause the image to look blurred and smudged.
- The diameter is the diameter of the pixel neighborhood, not a kernel size.
Bitwise Operators
- Four basic bitwise operators: AND, OR, XOR, and NOT.
- Used in image processing, especially with masks.
- Operate in a binary manner: a pixel is turned off if it has a value of zero, and is turned on if it has a value of one.
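A sketch of the four operators applied to two simple masks (a filled rectangle and a filled circle), which makes it easy to see which pixels each operator keeps:

```python
import cv2 as cv
import numpy as np

blank = np.zeros((400, 400), dtype='uint8')
rectangle = cv.rectangle(blank.copy(), (30, 30), (370, 370), 255, -1)   # filled white rectangle
circle = cv.circle(blank.copy(), (200, 200), 200, 255, -1)              # filled white circle

bitwise_and = cv.bitwise_and(rectangle, circle)   # pixels on in BOTH images
bitwise_or = cv.bitwise_or(rectangle, circle)     # pixels on in EITHER image
bitwise_xor = cv.bitwise_xor(rectangle, circle)   # pixels on in exactly one image
bitwise_not = cv.bitwise_not(rectangle)           # flips on/off pixels
```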