Computer Vision Lecture PDF
Ebenezer Owusu
Summary
This lecture provides an overview of computer vision, detailing its history, applications, and workings. The lecture covers topics from early attempts at enabling machines to see to modern techniques for object recognition.
Full Transcript
COMPUTER VISION
Ebenezer Owusu

What is Computer Vision?

Computer vision is a field of computer science that focuses on enabling computers to identify and understand objects and people in images and videos. Like other types of AI, computer vision seeks to perform and automate tasks that replicate human capabilities. In this case, it seeks to replicate both the way humans see and the way humans make sense of what they see. Its range of practical applications makes computer vision a central component of many modern innovations and solutions. Computer vision can be run in the cloud or on premises.

The History of Computer Vision

Scientists and engineers have been trying to develop ways for machines to see and understand visual data for about 60 years. Experimentation began in 1959, when neurophysiologists showed a cat an array of images and attempted to correlate them with responses in its brain. They discovered that it responded first to hard edges or lines, which suggested that image processing starts with simple shapes such as straight edges.

Then, the first computer image scanning technology was developed, enabling computers to digitize and acquire images. Another milestone was reached in 1963, when computers were able to transform two-dimensional images into three-dimensional forms. In the 1960s, AI emerged as an academic field of study, which also marked the beginning of the AI quest to solve the human vision problem.

1974 saw the introduction of optical character recognition (OCR) technology, which could recognize text printed in any font or typeface. Similarly, intelligent character recognition (ICR) could decipher handwritten text using neural networks. Since then, OCR and ICR have found their way into document and invoice processing, vehicle plate recognition, mobile payments, machine translation and other common applications.
In 1982, neuroscientist David Marr established that vision works hierarchically and introduced algorithms for machines to detect edges, corners, curves and similar basic shapes. Concurrently, computer scientist Kunihiko Fukushima developed a network of cells that could recognize patterns. The network, called the Neocognitron, included convolutional layers in a neural network.

By 2000, the focus of study was on object recognition, and by 2001, the first real-time face recognition applications appeared. Standardization of how visual data sets are tagged and annotated emerged through the 2000s. In 2010, the ImageNet data set became available. It contains millions of tagged images across a thousand object classes and provides a foundation for the CNNs and deep learning models used today. In 2012, a team from the University of Toronto entered a CNN into an image recognition contest. The model, called AlexNet, significantly reduced the error rate for image recognition. Since this breakthrough, error rates have fallen to just a few percent.

[Figure: Sample images from some popular databases]

How does computer vision work?

Computer vision is a technique that extracts information from visual data, such as images and videos. Computer vision is meant to work much as human eyes and the brain do together, but this raises what is probably one of the biggest open questions for IT professionals: how does the human brain operate and solve visual object recognition?

At a certain level, computer vision is all about pattern recognition, which involves training machine systems to understand visual data such as images and videos. First, a vast amount of labelled visual data is provided to the machine to train it. This labelled data enables the machine to analyze the patterns across all the data points and relate them to the labels. For example, suppose we provide visual data consisting of millions of dog images.
In that case, the computer learns from this data: it analyzes each photo for shapes, the distances between shapes, colors and so on, identifies the patterns that are characteristic of dogs, and generates a model. As a result, this computer vision model can now detect whether a given input image contains a dog or not.

Tasks associated with Computer Vision

Although computer vision is used in many fields, a few tasks are common to most computer vision systems. These tasks are given below:

Object classification: Object classification is a computer vision technique used to classify an image, for example by whether it contains a dog, a person's face, or a banana. It analyzes the visual content (videos and images) and assigns the object to a defined category. In other words, image classification lets us predict the class of an object present in an image.

Object identification/detection: Object identification or detection uses image classification to identify and locate the objects in an image or video. With this technique, the system can count the objects in a given image or scene and determine their location and labels. For example, in a given image, one dog, one cat, and one duck can be detected and classified using object detection.

Object verification: The system processes videos, finds the objects based on search criteria, and tracks their movement.

Object landmark detection: The system defines the key points for the given object in the image data.

Image segmentation: Image segmentation does not only detect the classes present in an image, as image classification does; it classifies each pixel of the image to specify which object it belongs to. It tries to determine the role of each pixel in the image.

Object recognition: The system recognizes the object's location with respect to the image.
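To make the difference between these tasks more concrete, here is a minimal Python sketch. It is not a vision model: the per-pixel labels are written by hand in a toy 6x8 grid, and the names SEGMENTATION_MASK, LABELS, classify, detect and pixel_counts are all invented for this illustration. It only shows what kind of answer classification, detection and segmentation would each give for the dog, cat and duck example.

```python
# Toy 6x8 "image" in which every pixel already carries a class label.
# ASSUMPTION: in a real system a trained model would predict these labels;
# here they are hand-written so the example runs with no extra libraries.
BACKGROUND = "."
LABELS = {"D": "dog", "C": "cat", "K": "duck"}  # hypothetical label codes

SEGMENTATION_MASK = [
    "........",
    ".DD.....",
    ".DD..CC.",
    ".....CC.",
    "..K.....",
    "..K.....",
]


def classify(mask):
    """Classification: which classes appear anywhere in the image."""
    return sorted({LABELS[p] for row in mask for p in row if p != BACKGROUND})


def detect(mask):
    """Detection: one box (top, left, bottom, right) per class.

    Simplification: assumes at most one object per class, which holds
    for the toy mask above.
    """
    boxes = {}
    for r, row in enumerate(mask):
        for c, p in enumerate(row):
            if p == BACKGROUND:
                continue
            name = LABELS[p]
            top, left, bottom, right = boxes.get(name, (r, c, r, c))
            boxes[name] = (min(top, r), min(left, c),
                           max(bottom, r), max(right, c))
    return boxes


def pixel_counts(mask):
    """Segmentation: how many pixels are assigned to each class."""
    counts = {}
    for row in mask:
        for p in row:
            if p != BACKGROUND:
                counts[LABELS[p]] = counts.get(LABELS[p], 0) + 1
    return counts


print(classify(SEGMENTATION_MASK))      # ['cat', 'dog', 'duck']
print(detect(SEGMENTATION_MASK)["dog"])  # (1, 1, 2, 2)
print(pixel_counts(SEGMENTATION_MASK))   # {'dog': 4, 'cat': 4, 'duck': 2}
```

Classification answers only "what is in the image", detection adds "where" as one box per object, and segmentation keeps a label for every pixel. In practice each of these outputs would come from a trained model rather than from a hand-labelled grid.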