Introduction to OpenCV for Computer Vision

Prepared by: Mark O. Montances

Understanding Image Processing vs Computer Vision

Scope
▪ Image Processing: A subset of signal processing, focused specifically on images. Typically involves pixel-level manipulations to enhance, transform, or analyze the visual data.
▪ Computer Vision: A broader field that builds on image processing but goes beyond it. Aims to replicate human visual understanding and decision-making.

Purpose
▪ Image Processing: Improve image quality or extract basic information for further processing. Usually doesn't involve "understanding" the content of the image.
▪ Computer Vision: Extract semantic, meaningful information to perform tasks like detection, recognition, and prediction.

Examples
▪ Image Processing: Noise removal. Color corrections. Basic segmentation.
▪ Computer Vision: Object detection (e.g., YOLO). Pose estimation (e.g., MediaPipe). Optical Character Recognition (OCR). Scene understanding.

Computer Vision is BIGGER
▪ Image Processing is often a necessary step in Computer Vision pipelines.
▪ Nowadays, CV integrates techniques from image processing but extends them with machine learning and deep learning for higher-level tasks.

Why OpenCV?
▪ Popularity of OpenCV:
  ▪ Open-source, large community, cross-platform.
  ▪ Supports real-time applications with C++ optimization.
  ▪ Extensive library: image processing, video handling, ML integration.
▪ When to Choose OpenCV:
  ▪ Real-time applications.
  ▪ Need for performance and advanced computer vision tasks.

Basic Image Handling in OpenCV

    import cv2  # Import OpenCV

    img = cv2.imread('image.jpg')       # Load the image
    cv2.imshow('Display Window', img)   # Show the image
    cv2.waitKey(0)                      # Wait for a key press to close the window
    cv2.destroyAllWindows()             # Close all windows when done

https://sl.bing.net/jFrkFOP09jU

Basic Image Handling in OpenCV

    import cv2  # Import OpenCV
    import matplotlib.pyplot as plt

    img = cv2.imread('image.jpg')  # Load the image
    plt.imshow(img)
    plt.show()

cv2.imshow() vs plt.imshow()

What happened?
▪ OpenCV stores image channels in BGR order by default (Blue-Green-Red), while most other libraries, such as matplotlib, and common image standards use RGB (Red-Green-Blue).

    img = cv2.imread('image.jpg')  # Loaded as BGR
    plt.imshow(img)                # Interpreted as RGB, so red and blue appear swapped
    plt.show()

Simple Image Manipulations
Convert BGR to RGB

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(img_rgb)  # matplotlib expects RGB; cv2.imshow() expects BGR and would swap the colors again
    plt.show()

Simple Image Manipulations
Convert BGR to Grayscale

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imshow('Grayscale Image', gray)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Simple Image Manipulations
Thresholding (Grayscale to Binary)

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Pixels above 127 become 255 (white); all others become 0 (black)
    _, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    cv2.imshow('Thresholded Image', thresh)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Syntax Comparison - scikit-image vs OpenCV

Basic Drawing Functions in OpenCV

Why Learn Basic Shapes and Text?
Drawing shapes like rectangles, circles, and lines is essential for tasks like:
✓ Annotating images.
✓ Drawing bounding boxes for object detection results.
✓ Visualizing points, landmarks, or regions of interest.

Drawing a Rectangle (Bounding Box)

    # image is a BGR image, e.g. one loaded with cv2.imread()
    start_point = (50, 100)   # Top-left corner
    end_point = (1000, 200)   # Bottom-right corner
    color = (0, 255, 0)       # Green in BGR
    thickness = 2             # Line thickness (-1 for a filled rectangle)
    cv2.rectangle(image, start_point, end_point, color, thickness)
    cv2.imshow('Rectangle Example', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Drawing a Circle

    center = (900, 500)   # Center of the circle
    radius = 150
    color = (255, 0, 0)   # Blue in BGR
    thickness = 2         # Line thickness
    cv2.circle(image, center, radius, color, thickness)

Drawing a Line

    start_point = (200, 400)
    end_point = (1000, 100)
    color = (0, 0, 255)   # Red in BGR
    thickness = 3
    cv2.line(image, start_point, end_point, color, thickness)

Adding Text

    text = "Hello World"
    position = (160, 140)            # Bottom-left corner of the text
    font = cv2.FONT_HERSHEY_SIMPLEX  # Font type
    font_scale = 5                   # Font size
    color = (255, 255, 255)          # White in BGR
    thickness = 4                    # Thickness of the text strokes
    cv2.putText(image, text, position, font, font_scale, color, thickness)

Recall: how videos are formed
[Figure: animation frames numbered 1 to 8, shown in sequence.]
Reference: https://www.fg-a.com/wild-west-clipart.html
Combine multiple images to create motion.
Frames per Second (FPS): Number of images shown per second.
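To make the FPS definition concrete, here is a minimal sketch in plain Python. The helper names frame_delay_ms and clip_duration_s are hypothetical, chosen for illustration, and are not part of OpenCV. The sketch assumes a constant frame rate: the delay between frames is 1000 / FPS milliseconds, and a clip lasts frame_count / FPS seconds.

```python
def frame_delay_ms(fps: float) -> int:
    """Delay between consecutive frames, in whole milliseconds (at least 1)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # One second is 1000 ms, shared evenly among the frames shown in it.
    return max(1, round(1000 / fps))


def clip_duration_s(frame_count: int, fps: float) -> float:
    """Playback time in seconds for frame_count frames at a constant fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


# Example: an 8-frame animation played at an assumed 4 frames per second.
print(frame_delay_ms(4))      # delay between frames, in ms
print(clip_duration_s(8, 4))  # total duration, in seconds
```

A value like frame_delay_ms(30) could be passed to cv2.waitKey() when playing back a recorded video, so that frames are shown at roughly the intended rate. This is only an approximation, since decoding and display also take time.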
Sample Code: Live Video Feed Processing

    cap = cv2.VideoCapture(0)  # Access webcam
    while True:
        ret, frame = cap.read()
        cv2.imshow('Webcam Feed', frame)
        if cv2.waitKey(1) == ord('q'):  # ASCII code of 'q' is 113 (0b01110001)
            break
    cap.release()
    cv2.destroyAllWindows()

Sample Code: Live Video Feed Processing

    cap = cv2.VideoCapture(0)  # Access webcam
    while True:
        ret, frame = cap.read()
        cv2.imshow('Webcam Feed', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):  # Keep only the low 8 bits of the key code
            break
    cap.release()
    cv2.destroyAllWindows()

Sample Code: Live Video Feed Processing

    cap = cv2.VideoCapture(0)  # Access webcam
    while True:
        ret, frame = cap.read()
        if ret:  # ret is False when no frame could be read
            cv2.imshow('Webcam Feed', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        else:
            break  # or continue, or set a maximum count of failures
    cap.release()
    cv2.destroyAllWindows()

Sample Code: Live Video Feed Processing

    # cap is the cv2.VideoCapture(0) object opened as in the previous examples
    while True:
        ret, frame = cap.read()
        if ret:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            cv2.imshow('Grayscale Feed', gray)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        else:
            break  # or continue, or set a maximum count of failures
    cap.release()
    cv2.destroyAllWindows()

Try this!

    import cv2

    cap = cv2.VideoCapture(0)  # Access webcam
    while True:
        ret, frame = cap.read()
        if ret:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            cv2.imshow('Grayscale Feed', gray)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        else:
            break  # or continue, or set a maximum count of failures
    cap.release()
    cv2.destroyAllWindows()

Typical Computer Vision Tasks

Object Detection
▪ What it does: Identifies and localizes objects in an image or video.
▪ Use cases: Autonomous vehicles, surveillance, retail inventory management.
▪ Algorithms/Libraries: YOLO, SSD, Faster R-CNN, Detectron2.
https://www.augmentedstartups.com/blog/how-to-implement-object-detection-using-deep-learning-a-step-by-step-guide

Object Tracking
▪ What it does: Tracks the motion of objects across frames in a video.
▪ Use cases: Sports analytics, surveillance, traffic monitoring.
▪ Algorithms/Libraries: DeepSORT, ByteTrack, OpenCV Tracking API.
https://www.youtube.com/watch?v=PalIIAfgX88

Facial Recognition
▪ What it does: Identifies or verifies a person based on facial features.
▪ Use cases: Authentication, attendance systems, social media tagging.
▪ Algorithms/Libraries: Dlib, FaceNet, OpenCV, DeepFace.
https://images.app.goo.gl/qi9i1BLmmju3Y2mH7

Pose Estimation
▪ What it does: Determines body or object poses by detecting keypoints (e.g., joints).
▪ Use cases: Fitness tracking, gesture recognition, sports analytics.
▪ Algorithms/Libraries: MediaPipe, OpenPose, AlphaPose.
https://images.app.goo.gl/LCDXt2iqgsgmv3t29

Optical Character Recognition (OCR)
▪ What it does: Extracts and converts text from images to machine-readable formats.
▪ Use cases: Document digitization, license plate recognition, text-based analytics.
▪ Algorithms/Libraries: EasyOCR, Tesseract, PaddleOCR, Google Vision API.
https://www.plugger.ai/blog/what-is-optical-character-recognition-ocr-the-definite-guide

Image Classification
▪ What it does: Categorizes an image into predefined classes.
▪ Use cases: Product identification, wildlife monitoring, spam detection.
▪ Algorithms/Libraries: ResNet, EfficientNet, MobileNet, VGG.
https://images.app.goo.gl/PwNAGqy2TirQGisQ6

Image Segmentation
▪ What it does: Divides an image into regions or objects by assigning each pixel to a class.
▪ Use cases: Medical imaging, scene understanding, autonomous navigation.
▪ Algorithms/Libraries: Mask R-CNN, U-Net, DeepLab.
https://images.app.goo.gl/tAEc4JKyfy91PD3QA
https://images.app.goo.gl/1La5VkjK2BBqXhBf7

Anomaly Detection
▪ What it does: Identifies unusual patterns or outliers in visual data.
▪ Use cases: Industrial defect detection, security surveillance, fraud detection.
▪ Algorithms/Libraries: Autoencoders, Isolation Forest, PyTorch-based models.
https://images.app.goo.gl/diUuHkZmQrcY9vQV7

Video Analysis
▪ What it does: Processes and interprets videos for insights like action recognition or event detection.
▪ Use cases: Behavior analysis, sports video analytics, event monitoring.
▪ Algorithms/Libraries: OpenCV, PyTorchVideo, MMAction2.

Questions?
