2 Understanding Data Interpretation by Machines

In my opinion, machine learning, the application and science of algorithms that make sense of data, is the most exciting field of all the computer sciences! We are living in an age where data comes in abundance; using self-learning algorithms from the field of machine learning, we can turn this data into knowledge. Thanks to the many powerful open source libraries that have been developed in recent years, there has probably never been a better time to break into the machine learning field and learn how to utilize powerful algorithms to spot patterns in data and make predictions about future events.

Building intelligent machines to transform data into knowledge

In this age of modern technology, there is one resource that we have in abundance: a large amount of structured and unstructured data. In the second half of the 20th century, machine learning evolved as a subfield of artificial intelligence (AI) involving self-learning algorithms that derive knowledge from data in order to make predictions. Instead of requiring humans to manually derive rules and build models from analyzing large amounts of data, machine learning offers a more efficient alternative for capturing the knowledge in data to gradually improve the performance of predictive models and make data-driven decisions.

Not only is machine learning becoming increasingly important in computer science research, but it is also playing an ever-greater role in our everyday lives. Thanks to machine learning, we enjoy robust email spam filters, convenient text and voice recognition software, reliable web search engines, and challenging chess-playing programs. Hopefully soon, we will add safe and efficient self-driving cars to this list.
Also, notable progress has been made in medical applications; for example, researchers demonstrated that deep learning models can detect skin cancer with near-human accuracy (https://www.nature.com/articles/nature21056). Another milestone was recently achieved by researchers at DeepMind, who used deep learning to predict 3D protein structures, outperforming physics-based approaches for the first time (https://deepmind.com/blog/alphafold/).

Machine learning terminology

Machine learning is a vast field and also very interdisciplinary, as it brings together many scientists from other areas of research. As it happens, many terms and concepts have been rediscovered or redefined and may already be familiar to you but appear under different names. For your convenience, the following list gives a selection of commonly used terms and their synonyms that you may find useful when reading machine learning literature in general:

Training example: A row in a table representing the dataset and synonymous with an observation, record, instance, or sample (in most contexts, sample refers to a collection of training examples).
Training: Model fitting; for parametric models, similar to parameter estimation.
Feature, abbrev. x: A column in a data table or data (design) matrix. Synonymous with predictor, variable, input, attribute, or covariate.
Target, abbrev. y: Synonymous with outcome, output, response variable, dependent variable, (class) label, and ground truth.
Loss function: Often used synonymously with a cost function. Sometimes the loss function is also called an error function. In some literature, the term “loss” refers to the loss measured for a single data point, and the cost is a measurement that computes the loss (averaged or summed) over the entire dataset.

A roadmap for building machine learning systems

In previous sections, we discussed the basic concepts of machine learning and the three different types of learning.
In this section, we will discuss the other important parts of a machine learning system accompanying the learning algorithm. The following diagram shows a typical workflow for using machine learning in predictive modeling, which we will discuss in the following subsections:

Preprocessing – getting data into shape

Let’s begin by discussing the roadmap for building machine learning systems. Raw data rarely comes in the form and shape that is necessary for the optimal performance of a learning algorithm. Thus, the preprocessing of the data is one of the most crucial steps in any machine learning application.

If we take the Iris flower dataset from the previous section as an example, we can think of the raw data as a series of flower images from which we want to extract meaningful features. Useful features could be the color, hue, and intensity of the flowers, or the height, length, and width of the flowers.

Many machine learning algorithms also require that the selected features are on the same scale for optimal performance, which is often achieved by transforming the features to the range [0, 1] or to a standard normal distribution with zero mean and unit variance.

Some of the selected features may be highly correlated and therefore redundant to a certain degree. In those cases, dimensionality reduction techniques are useful for compressing the features onto a lower dimensional subspace. Reducing the dimensionality of our feature space has the advantage that less storage space is required, and the learning algorithm can run much faster. In certain cases, dimensionality reduction can also improve the predictive performance of a model if the dataset contains a large number of irrelevant features (or noise); that is, if the dataset has a low signal-to-noise ratio.
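The two feature-scaling options mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the petal-length values are hypothetical and chosen purely for illustration. The scaling parameters are fitted first and applied in a separate step, so that the same parameters could later be reused on other data.

```python
from statistics import mean, pstdev


def fit_min_max(values):
    """Return the (min, max) bounds needed to rescale a feature to [0, 1]."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Rescale values to [0, 1] using previously fitted bounds."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def fit_standardize(values):
    """Return the (mean, population standard deviation) of a feature."""
    return mean(values), pstdev(values)


def apply_standardize(values, mu, sigma):
    """Shift and scale values to zero mean and unit variance."""
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical feature column (for example, petal lengths in cm).
train_petal_length = [1.0, 2.0, 3.0, 4.0, 5.0]

lo, hi = fit_min_max(train_petal_length)
scaled = apply_min_max(train_petal_length, lo, hi)

mu, sigma = fit_standardize(train_petal_length)
standardized = apply_standardize(train_petal_length, mu, sigma)

print(scaled)
print(standardized)
```

Separating the "fit" step from the "apply" step is a design choice of this sketch, not something the text prescribes; it simply makes it easy to reuse one set of scaling parameters on several datasets.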
To determine whether our machine learning algorithm not only performs well on the training dataset but also generalizes well to new data, we also want to randomly divide the dataset into a separate training and test dataset. We use the training dataset to train and optimize our machine learning model, while we keep the test dataset until the very end to evaluate the final model.

Training and selecting a predictive model

Many different machine learning algorithms have been developed to solve different problem tasks. An important point that can be summarized from David Wolpert’s famous No free lunch theorems is that we can’t get learning “for free” (The Lack of A Priori Distinctions Between Learning Algorithms, D.H. Wolpert, 1996; No free lunch theorems for optimization, D.H. Wolpert and W.G. Macready, 1997). We can relate this concept to the popular saying, “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail” (Abraham Maslow, 1966). For example, each classification algorithm has its inherent biases, and no single classification model enjoys superiority if we don’t make any assumptions about the task. In practice, it is therefore essential to compare at least a handful of different algorithms in order to train and select the best performing model.

But before we can compare different models, we first have to decide upon a metric to measure performance. One commonly used metric is classification accuracy, which is defined as the proportion of correctly classified instances.

One legitimate question to ask is this: how do we know which model performs well on the final test dataset and real-world data if we don’t use this test dataset for the model selection, but keep it for the final model evaluation? In order to address the issue embedded in this question, different techniques summarized as “cross-validation” can be used.
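As a rough illustration of the random train/test split and the accuracy metric described above, here is a minimal standard-library sketch. The function names, the 30% test fraction, the fixed seed, and the toy labels are all assumptions made for the example.

```python
import random


def train_test_split(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Proportion of correctly classified instances."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical dataset of ten examples, identified here only by an index.
examples = list(range(10))
train, test = train_test_split(examples, test_fraction=0.3)

# Hypothetical true labels and predictions for four test instances.
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(len(train), len(test), acc)
```

The test portion is set aside once and not touched while models are being compared, which matches the idea of keeping the test dataset until the very end.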
In cross-validation, we further divide a dataset into training and validation subsets in order to estimate the generalization performance of the model.

Finally, we also cannot expect that the default parameters of the different learning algorithms provided by software libraries are optimal for our specific problem task. Therefore, we will make frequent use of hyperparameter optimization techniques that help us to fine-tune the performance of our model. We can think of those hyperparameters as parameters that are not learned from the data but represent the knobs of a model that we can turn to improve its performance. This will become much clearer in later chapters when we see actual examples.

Evaluating models and predicting unseen data instances

After we have selected a model that has been fitted on the training dataset, we can use the test dataset to estimate how well it performs on this unseen data, that is, to estimate the so-called generalization error. If we are satisfied with its performance, we can now use this model to predict new, future data. It is important to note that the parameters for the previously mentioned procedures, such as feature scaling and dimensionality reduction, are solely obtained from the training dataset, and the same parameters are later reapplied to transform the test dataset, as well as any new data instances. Otherwise, the performance measured on the test data may be overly optimistic.

Customer Analytics

Customer analytics is a process in which we use data on customer behavior to inform the most important business decisions, using market segmentation and predictive analytics. Market segmentation is the process of dividing the user base into subgroups based on their behavior and other shared characteristics. This helps companies provide customized products for each user segment.
The result of this kind of analysis helps the company grow its business effectively and make more profit. There are many advantages. Companies can use the results generated by market segmentation and predictive models for direct marketing, site selection, customer acquisition, and customer relationship management. In short, with the help of customer analytics, the company can decide on the most effective marketing strategy as well as growth strategy. The company can achieve great results with a limited amount of marketing expenditure.

Customer analytics includes various methods. You can refer to the names of these methods in the following diagram:

Figure 2.1: Variety of methods for customer analytics

Customer segmentation

In this section, we will cover customer segmentation in detail. Earlier, I gave only a brief introduction to customer segmentation so that you could get a feel for the term. Here, we will learn a lot more about customer segmentation, which will help us later when we build the customer segmentation analysis.

As mentioned earlier, customer segmentation is a process where we divide the consumer base of the company into subgroups. We need to generate the subgroups by using some specific characteristics so that the company sells more products with less marketing expenditure. Before moving forward, we need to understand the basics. For example, what do I mean by customer base? What do I mean by segment? How do we generate the consumer subgroup? What are the characteristics that we consider while we are segmenting the consumers? Let’s answer these questions one by one.

Basically, the consumer base of any company consists of two types of consumers:

Existing consumers
Potential consumers

Generally, we need to categorize our consumer base into subgroups.
These subgroups are called segments. We need to create the groups in such a way that each subgroup of customers has some shared characteristics. In order to explain how to generate the subgroups, let me give you an example.

Suppose a company is selling baby products. Then, it needs to come up with a consumer segment (consumer subgroup) that includes the consumers who want to buy baby products. We can build the first segment (subgroup) with the help of a simple criterion: we will include consumers who have one baby in their family and bought a baby product in the last month. Now, suppose the company launches a baby product that is costly or premium. In that case, we can further divide the first subgroup by monthly income and socio-economic status. Based on these new criteria, we can generate the second subgroup of consumers. The company will target the consumers of the second subgroup for the costly and premium products, and for general products, the company will target consumers who are part of the first subgroup.

When we have different segments, we can design a customized marketing strategy as well as customized products that suit the customers of a particular segment. This segment-wise marketing will help the company sell more products with lower marketing expenses. Thus, the company will make more profit. This is the main reason why companies use customer segmentation analysis nowadays. Customer segmentation is used in domains such as retail, finance, and customer relationship management (CRM)-based products. I have provided a list of the basic features that can be considered during the segmentation. You can refer to them in the following screenshot:

Figure 2.2: List of basic features used in customer segmentation

You may wonder how companies are making marketing strategies based on the customer segmentation analysis.
The answer is that companies use the STP approach to firm up their marketing strategy. What is the STP approach? First of all, STP stands for Segmentation-Targeting-Positioning. In this approach, there are three stages. The points that we handle in each stage are explained as follows:

Segmentation: In this stage, we create segments of our customer base using their profile characteristics as well as the features provided in the preceding figure. Once the segmentation is firm, we move on to the next stage.
Targeting: In this stage, marketing teams evaluate segments and try to understand which kind of product is suited to which particular segment(s). The team performs this exercise for each segment, and finally, the team designs customized products that will attract the customers of one or many segments. They will also select which product should be offered to which segment.
Positioning: This is the last stage of the STP process. In this stage, companies study the market opportunity and what their product is offering to the customer. The marketing team should come up with a unique selling proposition. Here, the team also tries to understand how a particular segment perceives the products, brand, or service. This is a way for companies to determine how to best position their offering. The marketing and product teams create a value proposition that clearly explains how their offering is better than that of any competitor. Lastly, the companies start their campaign, presenting this value proposition in such a way that the consumer base will be happy about what they are getting.

I have summarized all the preceding points in the following diagram:

Figure 2.3: Summarization of the STP approach

We have covered most of the basic parts of customer segmentation.
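The rule-based segments from the baby-products example earlier in this section can be sketched in code. This is only an illustration under stated assumptions: the `Customer` fields, the customer records, and the income threshold are hypothetical, and "one baby in their family" is read here as "at least one baby".

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    babies_in_family: int
    bought_baby_product_last_month: bool
    monthly_income: float


# Hypothetical cut-off for targeting premium products.
PREMIUM_INCOME_THRESHOLD = 5000.0


def in_first_segment(c):
    """First segment: has a baby and bought a baby product last month."""
    return c.babies_in_family >= 1 and c.bought_baby_product_last_month


def in_second_segment(c):
    """Second segment: first-segment members with a higher monthly income."""
    return in_first_segment(c) and c.monthly_income >= PREMIUM_INCOME_THRESHOLD


# Hypothetical consumer base.
customers = [
    Customer("Asha", 1, True, 6500.0),
    Customer("Ben", 1, True, 3200.0),
    Customer("Chen", 0, False, 8000.0),
    Customer("Dana", 1, False, 7000.0),
]

first_segment = [c.name for c in customers if in_first_segment(c)]
second_segment = [c.name for c in customers if in_second_segment(c)]
print(first_segment, second_segment)
```

Hand-written rules like these only cover the simple criteria in the example; socio-economic status is left out of the sketch because the text does not say how it would be measured.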
Now it’s time to move on to the problem statement.

Introducing the problem statement

As you know, customer segmentation helps companies retain existing customers as well as acquire new potential customers. Based on the segmentation, companies can create customized products for a particular customer segment, but so far, we don’t know how to generate the segments. This is the point that we will focus on in this chapter. You need to learn how to create customer segmentation. There are many domains for which we can build customer segmentation, such as e-commerce, travel, finance, telecom, and so on. Here, we will focus only on the e-commerce domain.

Here is a detailed explanation of the problem statement, input, and output for the e-commerce customer segmentation application that we will be building:

Problem statement: The goal of our customer segmentation application is to come up with a solution for the given questions:
° Can we categorize the customers in a particular segment based on their buying patterns?
° Can we predict which kind of items they will buy in future based on their segmentation?
Input: We will be using e-commerce data that contains the list of purchases in 1 year for 4,000 customers.
Output: The first goal is to categorize our consumer base into appropriate customer segments. The second goal is to predict the purchases for the current year and the next year based on the customers’ first purchase.

You may wonder how we can achieve a prediction about the upcoming purchases using segmentation. Well, let me tell you how segmentation helps us! We don’t know the purchase pattern of a new customer, but we know the customer profile. We also know which product the customer has bought. So, we can put the customer into one of the segments where all other customers have purchased similar items and share a similar kind of profile. Let me give you an example.
Say a person has bought a Harry Potter book and that person lives in the UK. The customer is in the 13-22 age group. If we have already generated a customer segment that satisfies these characteristics, then we will put this new customer in that particular subgroup. We will derive the list of items that the customer may buy in future. We will also offer similar services that other customers in the subgroup have.

The approach that we will be using to develop customer segmentation for the e-commerce domain can also be used in other domains, but data points (features) will differ for each domain. Later on in the chapter, we will discuss what kind of data points you may consider for other domains, such as travel, finance, and so on. I will provide the list of data points for other domains that will help you build the customer segmentation application from scratch.

Deep learning

Deep learning software imitates the network of neurons in a brain. It is a subset of ML and is called deep learning because it uses deep neural networks. A neural network is an architecture of layers stacked on top of each other.

The word “deep” in deep learning refers to the many layers of neurons that help learn various data representations. In deep learning, the machine uses different layers to learn from the given data. The number of layers in the model represents its depth.

Deep learning algorithms are built through connected layers:

The input layer is the first layer.
The output layer is the last layer.
The multiple layers in between the input and output layers are known as hidden layers.

The word “deep” means that the network joins neurons in more than two layers. Each hidden layer consists of connected neurons. Each neuron processes the input signal it receives from the layer before it and then propagates the result onward.
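How a neuron in a hidden layer might process and propagate its input can be sketched as follows. The sketch assumes the common form of a weighted sum plus a bias passed through a sigmoid activation; the text does not fix a particular activation function, and all weights, biases, and inputs below are hypothetical values.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def layer_output(inputs, weight_rows, biases):
    """Apply every neuron of a layer to the same input vector."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical input vector from the input layer.
x = [1.0, 2.0]

# Hidden layer with two neurons (one row of weights per neuron).
hidden = layer_output(x, [[0.5, -0.25], [0.0, 0.0]], [0.0, 0.0])

# Output layer with a single neuron that receives the hidden layer's output.
out = neuron_output(hidden, [1.0, 1.0], 0.0)
print(hidden, out)
```

Stacking more calls to `layer_output` between the input and the output is what adds depth in this sketch.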
The strength of the signal transmitted to the next layer of neurons depends on the weight, bias, and activation function.

The network consumes a lot of input data and processes it through multiple layers. The network can learn increasingly complex features of the data at each layer.

Deep neural networks can provide state-of-the-art accuracy in many tasks, from object detection to speech recognition. They can learn automatically, without predefined knowledge explicitly coded by the programmer.

Each layer signifies a deeper level of knowledge. Compared with a two-layer neural network, a four-layer neural network can learn more complex features.

Learning is divided into two stages. The first stage involves nonlinearly transforming the input and creating a statistical model as output. The second stage aims at improving the model with a mathematical method known as the derivative. The neural network repeats these two stages hundreds to thousands of times until it reaches a tolerable accuracy level. Each repetition of these two stages is called an iteration.

Why is Deep Learning Important?

Deep learning is a powerful tool to convert prediction into actionable results. It is good at pattern discovery (unsupervised learning) and knowledge-based prediction. Big data is the driving force of deep learning. When the two are combined, organizations can achieve unprecedented results in productivity, sales, management, and innovation.

Deep learning can outperform traditional methods. Its algorithms are reported to be more accurate than traditional machine learning algorithms: 41% more in image classification, 27% more in facial recognition, and 25% more in voice recognition.

Limitations of Deep Learning

Data labeling

Almost all recent AI models are trained through “supervised learning.” This means that humans must label and classify the underlying data, a task that can be quite large and error-prone.
Companies that develop self-driving car technology are hiring hundreds of people to manually annotate hours of video from prototype vehicles to help train these systems.

Get a huge training dataset

It has been shown that in some cases, simple deep learning techniques such as convolutional neural networks (CNNs) can imitate expert knowledge in medicine and other fields. However, the current wave of machine learning requires training data sets to be not only labeled but also sufficiently broad and general.

Deep learning methods require thousands of observations to make the model relatively good at classification tasks, and in some cases, millions of observations to perform at the human level. Not surprisingly, deep learning is prevalent among large technology companies. They use big data to accumulate petabytes of data, which enables them to create impressive and highly accurate deep learning models.

The Interpretation Problem

Large and complex models may be difficult to interpret from a human perspective. This is one of the reasons for the slow acceptance of certain AI tools in applied fields. In areas where interpretability is useful or indeed required, AI may be slow to be adopted. As AI applications expand, regulatory requirements may also call for more interpretable AI models.

Applications of DL

Computer Vision: Computer vision uses deep neural network methods to solve challenging problems such as facial recognition, augmented reality, gesture recognition, and image classification. Advances in image recognition technology have made it easy to search or automatically organize photo collections without identification tags. It is also possible to restore old black-and-white images and generate artificial video with precise lip sync.
Machine Translation: Advanced neural network algorithms are used to synthesize text into various sounds and languages.
Recent stats show that digital assistants’ speech recognition functions work better than before, and customers have almost tripled their use of speech interfaces. In addition, automatic text generation produces custom text and translates foreign languages; examples include Gmail auto-complete and automatic translation.
Social Network Filtering: The extensive use of social media platforms creates exciting potential for using neural network models to develop network representations.
Healthcare: Deep learning methods can derive correlations by analyzing millions of data points, enabling clinicians to keep up to date with research and treatment possibilities. The healthcare industry will be reshaped by deep learning’s computer-aided detection and diagnosis in areas such as drug discovery, precision medicine, and imaging analytics and diagnostics.
Gaming: Deep learning (DL) dynamically renders finer graphic details in games and helps game developers focus on effective storytelling rather than game graphics. The development of human-like bots is one of the promising DL tools to enrich the gaming experience.
Self-Driving Cars/Vehicles: A self-driving car uses various cameras and sensors to perceive its surroundings and move safely with little or no human input. The Society of Automotive Engineers (SAE) defines six levels of driving automation, numbered 0 to 5, of which level 0 means no automation at all:

0 => No automation
1 => Driver assistance
2 => Partial automation
3 => Conditional automation
4 => High automation
5 => Full automation

Presently, self-driving cars are at Level 3 technology. Leading companies and major automakers, such as Tesla, Nissan, and Google’s Waymo, have developed various self-driving technologies.

Challenges in Deep Learning (DL)

Training Data: Deep learning can give accurate predictions with the right quantity and quality of training data.
Building a DL model involves time-consuming tasks, such as collecting and labeling training data.
Effective learning and teaching: Machines need thousands of examples, whereas humans can learn from a handful of examples. We cannot provide all possible labeled samples in the problem space to the DL model. Therefore, the DL model has to generalize or interpolate in order to classify data not included in its original data set.
Understand the context: Deep learning cannot understand context well because it lacks pure perception, so image classification ≠ scene understanding.
Black box problem: Neural networks (NN) do not rely on pre-established rules. They find patterns and correlations without revealing the reasons for them. Human users want to understand how the system makes decisions, because decisions carry potential liability in the financial and medical fields.
Large models and complexity: The most advanced deep learning models can easily reach several gigabytes in size and keep growing. The number of parameters grows with the amount of information absorbed by the neural network. Neural network complexity is exploding to tackle increasingly complex challenges.
Infrastructure: Most useful deep learning problems activate the entire neural network for each batch, blowing up the computational cost. Therefore, the model often has to be loaded and scaled across multiple machines.

Life Cycle of a Deep Learning Project

Deep learning projects are iterative in nature and consist of the tasks given below. It is advisable to start with smaller models and few data points and to increase the complexity over time.

Project Planning: Project planning includes defining goals, metrics, and baselines.
Data Collection and Labeling: In this step, data is collected, stored, and labeled using various tools.
Model Building: Model building activities include training, testing, and debugging.
Deployment and Monitoring: Once the model meets the requirements, it is deployed and monitored using the desired interface.

AI Vs. ML Vs. DL

Artificial intelligence can be described in terms of three nested concepts:

Artificial Intelligence (AI): Artificial intelligence (AI) is the branch of computer science that emphasizes the development of intelligent machines that think and work like humans. AI-powered computers have started simulating the human brain’s work styles, sensations, actions, interactions, perceptions, and cognitive abilities. However, all these developments are either at the primary or intermediate level. It was once believed that human intelligence could be precisely described and that machines could simulate it with AI. Examples of AI applications include speech recognition, face recognition, biometric systems, security cameras, and surveillance equipment.

Machine Learning (ML): A field of computer science that uses statistical techniques to enable computer systems to “learn” from data (for example, to gradually improve their performance on specific tasks) without explicit programming. Machine learning (ML) is a subset of AI and an approach to achieving AI. Examples include medical diagnosis, image processing, prediction, classification, learning associations, and regression.

Deep Learning (DL): “Deep learning (DL) is an algorithm that has no theoretical limitations of what it can learn; the more data you provide, and the more computational time you provide, the better the effect.” Geoffrey Hinton

Deep learning is part of an extended family of machine learning methods based on learning data representations instead of task-specific algorithms.
Deep learning, which is a subset of machine learning, is used to extract useful patterns from data. Deep learning models make predictions independently of human intervention. This method aims to give artificial intelligence the power to teach computers to perform tasks and to gain the ability to understand anything; automated driving is one example.

The three concentric levels above describe DL as a subset of ML, which is in turn a subset of AI. Therefore, AI is the all-encompassing concept in which the other two concepts are rooted. ML thrived later, and now DL is promising to advance AI to another level.

Neural Networks

Neural networks are a series of algorithms designed to identify potential relationships in a set of data by mimicking the way the human brain operates. In other words, a neural network is a collection of computational units called neurons, connected in different layers. These networks transform the data until it can be classified as output. Each neuron multiplies each incoming value by a certain weight, adds the result to the other values entering the same neuron, adjusts the sum by the neuron’s bias, and then uses the activation function to normalize the output.

Classification of Neural Networks

Shallow neural network: A shallow neural network has only one hidden layer between input and output.
Deep neural network: A deep neural network has multiple layers. For example, Google’s GoogLeNet model for image recognition has 22 layers. Nowadays, deep learning is used in driverless cars, mobile phones, the Google search engine, fraud detection, TVs, and many other applications.
Feed-forward neural networks: This is the simplest type of artificial neural network. In these networks, the information flows in only one direction, forward.
That is, the information flow starts at the input layer, goes through the “hidden” layers, and ends at the output layer. The network does not have a loop.
Recurrent neural networks (RNNs): An RNN is a multi-layer neural network that can store information in context nodes, thus enabling it to learn data sequences and output numbers or other sequences. In simple words, it is an artificial neural network whose connections between neurons include loops. RNNs are well-suited for processing sequences of inputs.

For example, an RNN can be used to predict the next word in a sentence: “Do you want a...?”

The RNN neuron receives a signal pointing to the beginning of the sentence.
The network gets the word “Do” as an input and produces a vector of numbers. This vector is fed back to the neuron to give the network a memory. This stage helps the network remember that it received “Do” and that it received it first.
The network proceeds similarly with the next words. It takes the words “you” and “want.” After receiving each word, the state of the neuron is updated.
The concluding stage occurs after receiving the word “a.” The neural network assigns each English word a probability that it can be used to complete the sentence. A well-trained RNN may assign higher probabilities to “cafe,” “drink,” “hamburger,” etc.

Common uses of RNN

Help securities traders to generate analytic reports
Detect abnormalities in financial statement contracts
Detect fraudulent credit-card transactions
Provide captions for images
Power chatbots

RNNs are commonly used when practitioners are working with time-series data or sequences (e.g., audio recordings or text).

Convolutional neural networks (CNN)

A CNN is a multi-layer neural network with a unique architecture designed to extract increasingly complex features of the data at each layer to determine the output. CNNs are very suitable for perception tasks.
A CNN is usually used when the data set is unstructured (such as images) and practitioners need to extract information from it. For example, suppose the goal is to predict an image caption:
The CNN receives an image of, say, a cat. In computer terms, this image is a collection of pixels: usually one layer for a grayscale picture and three layers for a color picture.
During feature learning (i.e., in the hidden layers), the network identifies distinctive features, for example the cat’s tail and ears.
Once the network has thoroughly learned how to recognize a picture, it can provide a probability for each label it knows. The label with the highest probability is the network’s prediction.

Reinforcement Learning (RL)
Reinforcement learning (RL) is an ML technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences. The agent is trained by receiving virtual “rewards” or “punishments.” Google’s DeepMind used reinforcement learning to defeat human champions at Go. Reinforcement learning also improves the experience in video games by providing smarter bots.
Both supervised learning and reinforcement learning use a mapping between input and output. In supervised learning, however, the feedback provided to the agent is the correct set of actions for performing the task, whereas reinforcement learning uses rewards and punishments as signals for positive and negative behavior.
Reinforcement learning also has a different goal from unsupervised learning. The goal of unsupervised learning is to find similarities and differences between data points, while the goal of reinforcement learning is to find a suitable action model that maximizes the agent’s total cumulative reward. The following figure shows the basic ideas and elements involved in the reinforcement learning model.
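To make the reward-and-punishment loop described above more concrete, here is a minimal sketch of tabular Q-learning, one common RL algorithm, in plain Python. The five-state corridor environment, the reward values, and the hyperparameters (alpha, gamma, epsilon) are illustrative assumptions and are not taken from any system mentioned in this chapter.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True      # "reward" for reaching the goal
    return next_state, -0.01, False       # small "punishment" for each extra step


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise take the best known action.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action (-1 or +1) for every non-terminal state."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


q_table = train()
print(greedy_policy(q_table))
```

After training, the greedy policy should move right in every state, because that path collects the goal reward with the fewest step penalties. The agent is never told the correct action; it only sees the rewards, which is the difference from supervised learning noted above.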
Some of the most famous algorithms are:
Q-learning
Deep Q network
State-Action-Reward-State-Action (SARSA)
Deep Deterministic Policy Gradient (DDPG)

AI Use cases
Finance: The financial technology field has begun to use AI to save time, reduce costs, and add value. Deep learning is changing the lending industry by enabling more robust credit scoring. Credit decision-makers can use AI in credit lending applications to weigh the characteristics and capacity of applicants, enabling faster and more accurate risk assessments. Underwrite.ai is a fintech company that provides AI solutions for credit companies. It uses AI to determine which applicants are more likely to repay a loan. Their approach radically outperforms traditional methods.
HR: Under Armour, a sportswear company, has revolutionized its recruitment methods and modernized the job seeker experience with the help of AI. The company received 30,000 resumes a month, which made screening and interviewing a challenge. Slow hiring and onboarding affected Under Armour’s staffing and operations. Under Armour partnered with HireVue, a provider of AI solutions for HR, for both on-demand and live interviews. The results were impressive: time to fill for its retail stores decreased by 35%, and the quality of hires improved.
Marketing: Artificial intelligence is a valuable tool for meeting the challenges of customer service management and personalization. AI techniques such as improved speech recognition and call routing give call-center customers a more seamless experience. Deep-learning audio analysis enables systems to assess a customer’s tone. If the customer responds poorly to an AI chatbot, the system can reroute the conversation to a human operator who takes over the issue.
Artificial intelligence is also widely used in other industries.

AI key concepts, libraries & framework
Understanding key AI concepts, libraries, and frameworks is crucial for running and managing AI projects. It helps project managers and leaders to understand AI libraries and frameworks at least at a high level. The following are some of the many popular and widely used terms and AI libraries.

TensorFlow
TensorFlow is an open-source deep-learning library developed by Google that is used to perform complex numerical operations and other tasks needed to build deep learning models. Its architecture allows easy deployment of computations across multiple platforms such as CPUs and GPUs. A product of the Google Brain team, it is used in Google Photos, Google Search, and Google Cloud Speech.

TensorFlow characteristics
It includes efficient C++ implementations of machine learning operations and supports custom C++ operations.
High-level APIs such as TF Layers, Keras, and Pretty Tensor can run on top of TensorFlow.
It provides a simple API, TF-Slim (tensorflow.contrib.slim), for simple training routines.
It runs on operating systems such as Windows, Linux, and macOS, and on mobile operating systems such as Android and iOS.
It provides several optimization nodes to search for the parameters that minimize a cost function.
It automatically computes the gradient of the cost function. This is called autodiff (automatic differentiation).
It provides a visualization tool called TensorBoard, where you can view the computation graph, learning curves, and more.
It provides a Python API called TF.Learn (tensorflow.contrib.learn) that trains a neural network in a few lines of code.

Types of APIs
The two main APIs provided by TensorFlow are:
1. TensorFlow Core API: low-level machine learning development
2. Higher-level APIs: more compact APIs, such as tf.layers or tf.contrib.learn (TensorFlow 1.x names; the contrib module was removed in TensorFlow 2.0)

TensorFlow Core API
TensorFlow Core provides complete programming control. It is best suited to machine learning researchers and others who need fine-grained control over their models.

Higher-level API
The higher-level APIs are built on TensorFlow Core. They are easier to learn and use than TensorFlow Core, and they make repetitive tasks easier and more consistent among different users. Higher-level APIs such as tf.contrib.learn help you manage data sets, estimators, training, and inference.

What are Tensors?
Tensors, or multidimensional data arrays, are the inputs and outputs of TensorFlow. A tensor is a set of values shaped into an n-dimensional array, with a static type and dynamic dimensions. Tensors represent physical entities mathematically, characterized by a magnitude and multiple directions. They usually contain floating-point numbers, but they can also carry strings in the form of byte arrays. They have data types and can be passed between the nodes of the computation graph. NumPy is a Python library for numerical computation.

Tensor Ranks
In TensorFlow, tensors are described by a unit of dimensionality called rank. Tensor rank is not the same as matrix rank. The rank of a tensor (also referred to as its order, degree, or n-dimension) is its number of dimensions. A tensor can be described by its rank, its shape, and its dimension number. The shape of the tensor determines its rank.
Other libraries offer nearly the same deep learning capabilities, but Google’s TensorFlow has proven to be a scalable, clean, flexible, and efficient library. Backed by Google, it has risen to the top of developers’ choices.

TensorFlow benefits
The most significant benefit TensorFlow provides for machine learning development is abstraction.
Developers usually do not have to handle low-level details and can focus on the overall logic of the application. They can set aside the hassle of implementing the algorithms themselves, and they need not work out how to connect the output of one function to the input of another. TensorFlow takes care of these details behind the scenes.
TensorFlow also offers conveniences for developers who need to debug and introspect TensorFlow applications. The eager execution mode lets you evaluate and modify each graph operation separately and transparently, instead of constructing the entire graph as a single opaque object and evaluating it all at once. The TensorBoard visualization suite lets you inspect and profile the way graphs run through an interactive, web-based dashboard.
TensorFlow has also gained many advantages from its association with Google. Google has not only driven the rapid development of the project but has also created many significant offerings around TensorFlow that make it easier to deploy and use: the TPU silicon mentioned earlier, for accelerated performance in Google’s cloud; an online hub for sharing models created with the framework; browser- and mobile-friendly versions of the framework; and more.
One caveat: some details of TensorFlow’s implementation make it difficult to obtain fully deterministic model-training results for certain training jobs. Sometimes a model trained on one system differs slightly from a model trained on another, even when both are given exactly the same data. The reasons are subtle, for example how and where random numbers are seeded, or certain non-deterministic behaviors when using GPUs. That said, these problems can be worked around, and the TensorFlow team is considering more controls to affect determinism in a workflow.

TensorFlow vs. the competition
TensorFlow competes with many other machine learning frameworks.
PyTorch, CNTK, and MXNet are three important frameworks that address many of the same needs.
PyTorch: Besides being built with Python, PyTorch has many other similarities to TensorFlow, including hardware-accelerated components under the hood, a highly interactive development model that allows design-as-you-go work, and many useful components already included. PyTorch is usually the better choice for rapid development of projects that need to be up and running in a short time, but TensorFlow wins out for larger projects and more complex workflows.
CNTK: CNTK, the Microsoft Cognitive Toolkit, uses a graph structure to describe data flow, as TensorFlow does, but it focuses on creating deep learning neural networks. CNTK handles many neural network jobs faster and has a broader set of APIs (Python, C++, C#, Java). However, CNTK is currently not as easy to learn or deploy as TensorFlow.
Apache MXNet: Apache MXNet, adopted by Amazon as the main deep learning framework on AWS, can scale almost linearly across multiple GPUs and multiple machines. It also supports a broad range of language APIs (Python, C++, Scala, R, JavaScript, Julia, Perl, and Go), although its native APIs are not as good as TensorFlow’s.

TensorFlow is most suitable for:
Large datasets
High performance
Functionality
Object detection

Keras
Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK, or Theano. It is designed for rapid experimentation with deep neural networks and focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of the ONEIROS (Open-ended Neural Electronic Intelligent Robot Operating System) project. Its author and maintainer is the Google engineer François Chollet. Chollet is also the author of the Xception deep neural network model.
Keras’ Features
Neural networks API
Easy and fast prototyping
Convolutional networks support
Recurrent networks support
Runs on GPU

Keras Alternatives
PyTorch
PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, and developed mainly by Facebook’s AI Research lab (FAIR).
TensorFlow
TensorFlow is an open-source software library for numerical computation using data flow graphs. The nodes in the graph represent mathematical operations, and the edges represent the multidimensional data arrays (tensors) communicated between them. TensorFlow’s flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
Scikit-learn
Scikit-learn is a Python module for machine learning, built on SciPy and distributed under the 3-Clause BSD license.
ML Kit
ML Kit brings Google’s machine learning expertise to mobile developers in a powerful and easy-to-use package.
CUDA
CUDA is a parallel computing platform and programming model developed by Nvidia. It lets developers use the GPU for the parallelizable part of a computation, speeding up computationally intensive applications.

Keras is most suitable for:
Rapid prototyping
Small datasets
Multiple back-end support

Time series
A time series is an arrangement of statistical data in chronological order. A time series expresses the relationship between two variables, one of which is time. Mathematically, it is given by
y = f(t)
where “y” is the value of the phenomenon at any given time “t.” Thus “y” can be taken as a function of “t.”

Time series components:
Trend: A long-term increase or decrease in the data.
Seasonality: Short-term, regular fluctuations in the pattern caused by seasonal factors.
Cyclicity: Variations that occur at irregular intervals because of specific circumstances.
Irregularity: Instability due to random factors that do not repeat in the pattern.
The primary concern of time series analysis is to study the net effect of these components on the movement of the time series, and also to study the components independently.

Time series uses:
Time series are of great importance in business and policymaking. A time series is used:
1. To observe the past behavior of the phenomenon under consideration.
2. To compare current trends with past or expected trends, which clearly shows growth or decline.
3. In forecasting and policy planning by various organizations.
4. To understand the business cycle through cyclic changes.
5. To exploit seasonal variations, which are useful for businesses and retailers because they earn more in certain seasons. For example, a clothes seller will make more profit by selling woolen clothes in winter and silk clothes in summer.

Time Series Application:
Time series analysis can be used for a variety of purposes, such as:
Stock market analysis
Economic forecasting
Inventory studies
Budgetary analysis
Census analysis
Yield projection
Sales forecasting
and more.

Time Series Modeling
Time series modeling works with time-based data (years, days, hours, minutes) to derive hidden insights that support informed decisions. Time series models are very useful when you have serially correlated data. Most companies use time-series data to analyze next year’s sales, website visits, competitive position, and so on.
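As a rough illustration of the trend and seasonality components described above, the following Python sketch builds a synthetic series and then separates the two components with a centered moving average. The data (hypothetical daily sales with a weekly pattern), the helper names, and the choice of an additive model are assumptions made for this example; the irregular component is left out so the result is exact.

```python
def centered_moving_average(values, window):
    """Average each point with its neighbours; window must be odd."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


def seasonal_pattern(values, trend, period):
    """Average the detrended values at each position within the period."""
    half = (len(values) - len(trend)) // 2   # points lost at each end by the average
    buckets = [[] for _ in range(period)]
    for offset, level in enumerate(trend):
        t = offset + half
        buckets[t % period].append(values[t] - level)
    return [sum(b) / len(b) for b in buckets]


# Hypothetical daily sales: an upward trend plus a repeating weekly pattern.
WEEKLY = [3, 1, -2, -4, -1, 0, 3]                   # sums to zero over one week
series = [100 + 2 * t + WEEKLY[t % 7] for t in range(28)]

trend = centered_moving_average(series, window=7)   # the weekly effect averages out
season = seasonal_pattern(series, trend, period=7)

print(trend[:3])
print(season)
```

Because the window covers exactly one full week, the seasonal effect cancels inside each average and only the trend remains. Subtracting that trend and averaging by weekday then recovers the weekly pattern. On real data the leftover residual would correspond to the irregular component.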
A few of the time series models are:

ARIMA Model
ARIMA (Autoregressive Integrated Moving Average) is a form of regression analysis that measures the strength of one dependent variable relative to other changing variables. The model is used to forecast moves in financial markets by examining the differences between values in the series instead of the actual values. ARIMA can be broken down into three components:
AR (Autoregression): uses the dependent relationship between an observation and a number of lagged observations.
I (Integrated): differences the raw observations to make the time series stationary.
MA (Moving Average): uses the dependency between an observation and the residual errors.
Each component is specified as a parameter (conventionally p, d, and q), and an integer is substituted for each parameter to indicate the type of ARIMA model used.

ARIMA and Stationarity
In a stationary model, the data is consistent over time. The ARIMA model makes the data stationary through differencing. For instance, most economic data reflects a trend, and differencing the data removes the trend to make it stationary.
Below is an example of index values analyzed month to month. The plot shows an upward trend, so the data is non-stationary. The ARIMA model can make the data stationary, analyze it, and forecast from it:
Source: ElegantJ BI

Autoregressive Model (AR)
The autoregressive (AR) model derives a behavioral pattern from past data to forecast the future. It is useful when the values in a time series are correlated with each other. The model is based on a linear regression of the current values of the series on previous values of the same series.
The following is an example using the Google stock price from February 7, 2005, to July 7, 2005, with an n value of 105. Analyze the data to identify the AR model.
The figure below shows the relationship between stock price and time.
Source: https://morioh.com/p/5d5c6c01f9e6
These values are closely related to each other, indicating that an AR model is needed. The following figure shows the autocorrelation of the data:
Source: PennState
You can generate a lag-1 price variable and plot the price against it in a scatter plot:
Source: PennState
A moderate linear pattern can be observed, which indicates that a first-order AR model is suitable.

Moving Average Model (MA)
The moving average model is used to model a univariate time series. The model specifies that the output variable depends linearly on the current and past values of a random error term. In other words, its regression uses past forecast errors instead of the previous values of the series itself.
A related but distinct tool is the moving average of prices used in charting, which helps reduce “noise” in prices. If the moving average slopes upward on the chart, the price has been rising. If it slopes downward, the price is falling, and if it moves sideways, the price is probably trading in a range. In an uptrend, a 50-, 100-, or 200-day moving average may act as a support level from which the price rebounds.

Natural Language Processing (NLP)
Natural language processing (NLP) is a branch of AI that helps computers understand, interpret, and manipulate human language. NLP lets developers organize and structure knowledge to perform tasks such as translation, summarization, named entity recognition, relationship extraction, speech recognition, and topic segmentation. NLP is a way for computers to analyze, understand, and derive meaning from human languages such as English, Spanish, and Mandarin. For example, a robot can perform tasks according to instructions given in natural language.
An NLP system’s input and output can be:
Speech
Written text

NLP Techniques and Tools
Syntactic and semantic analysis are the two main techniques used in natural language processing.
Syntax is the arrangement of words in a sentence so that they make grammatical sense. NLP uses syntax to assess the meaning of language based on grammar rules. Syntax techniques include parsing (grammatical analysis of a sentence), word segmentation (dividing a large section of text into units), sentence breaking (placing sentence boundaries in large texts), morphological segmentation (dividing words into their smallest meaningful units, morphemes), and stemming (reducing inflected words to their root form).
Semantics concerns the use and meaning of words. NLP applies algorithms to understand the meaning and structure of sentences. Semantic techniques include word sense disambiguation (deriving the meaning of a word from its context), named entity recognition (determining which words can be categorized into groups), and natural language generation (using a database to determine the semantics behind words).
The current approach to NLP is based on deep learning, a type of AI that examines and uses patterns in data to improve a program’s understanding. Deep learning models require large amounts of labeled data to train on and to identify relevant correlations, and assembling such large data sets is one of the main obstacles in current NLP.
Earlier approaches to NLP were rule-based: a simpler machine learning algorithm was told which words and phrases to look for in the text and was given specific responses for when those phrases appeared. Deep learning is a more flexible and intuitive approach in which the algorithm learns to identify the speaker’s intent from many examples, much as a child learns human language.
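Three of the syntax techniques named above (sentence breaking, word segmentation, and stemming) can be sketched with nothing but the Python standard library. This is a deliberately naive, rule-based illustration: the regular expressions and the tiny suffix list are assumptions made for this example, and the stemmer is far cruder than those shipped with real NLP libraries.

```python
import re

SUFFIXES = ("ing", "ed", "es", "s")   # hypothetical, deliberately tiny rule set


def split_sentences(text):
    """Sentence breaking: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Word segmentation: lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def stem(word):
    """Naive stemming: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "The cats are chasing mice. She walked home!"
for sentence in split_sentences(text):
    print([stem(token) for token in tokenize(sentence)])
```

With this rule set, “cats” becomes “cat” and “walked” becomes “walk,” but “chasing” becomes “chas.” That kind of error is why rule-based pipelines need many hand-written exceptions, and it is part of the motivation for the learned approaches described above.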
NLP Tools
Three commonly used NLP tools are NLTK, Gensim, and Intel NLP Architect. NLTK (Natural Language Toolkit) is an open-source Python module with data sets and tutorials. Gensim is a Python library for topic modeling and document indexing. Intel NLP Architect is another Python library, for deep learning topologies and techniques.

Components of NLP
A natural language processing system has two components:
A. Natural Language Understanding (NLU)
NLU is used to work out the meaning of a text, which requires understanding the nature and structure of each word. It has to resolve several kinds of ambiguity:
i. Lexical ambiguity: a word has multiple meanings.
ii. Syntactic ambiguity: a sentence has multiple parse trees.
iii. Semantic ambiguity: a sentence has multiple meanings.
iv. Anaphoric ambiguity: a phrase or word that was mentioned previously takes on a different meaning.
B. Natural Language Generation (NLG)
NLG is the process of producing meaningful phrases and sentences in natural language from an internal representation. This process involves:
Text planning: Retrieve the relevant content from a knowledge base.
Sentence planning: Choose the required words and set the tone of the sentence.
Text realization: Map the sentence plan into sentence structure.

Future of NLP
Human-level natural language processing is among the hardest problems in AI. Solving it is almost equivalent to solving the central artificial intelligence problem of making computers as intelligent as humans.
With the help of NLP, future computers or machines will be able to learn from online information and apply it in the real world. However, a lot of work is still required.
The Natural Language Toolkit (NLTK) will become more effective when combined with natural language generation, and computers will become more capable of receiving and providing useful, resource-rich information or data.
With the help of NLP, support for invisible UI, smarter search, intelligence from unstructured information, intelligent chatbots, and more are becoming a reality.
The idea of an invisible or zero user interface relies on direct interaction between user and machine, through voice, text, or a combination of the two.
Applied to search, the same capability that lets chatbots understand customer requests enables a “search like you speak” feature (much as you would query Alexa), without having to focus on topics or keywords. Google has added NLP to Google Drive so that users can search for documents and content using conversational language.

Natural language vs. Computer Language

Natural Language Processing: Terminologies
a. Phonology: The study of how sounds are organized.
b. Morphology: The study of how words are constructed from primitive meaningful units.
c. Morpheme: A primitive unit of meaning in a language.
d. Syntax: The arrangement of words to make a sentence. It also involves determining the structural role of words in sentences and phrases.
e. Semantics: The meaning of words, and how words combine into meaningful phrases and sentences.
f. Pragmatics: The use and understanding of sentences in different situations, and how the situation affects the interpretation of a sentence.
g. World knowledge: General knowledge about the world.

Steps in NLP
Natural language processing usually includes five steps:
a. Lexical Analysis
Lexical analysis identifies and analyzes the structure of words. A language’s lexicon is the collection of words and phrases in it.
b. Syntactic Analysis (Parsing)
Parsing analyzes the words in a sentence for grammar and arranges them in a way that shows the relationships between them.
c. Semantic Analysis
The purpose of semantic analysis is to draw the exact meaning, or dictionary meaning, from the text. The job of the semantic analyzer is to check whether the text is meaningful.
d. Discourse Integration
The meaning of a sentence depends on the context set by the previous sentence. It also shapes the meaning of the sentence that immediately follows.
e. Pragmatic Analysis
What was said is re-interpreted for what it actually means. This involves deriving those aspects of language that require real-world knowledge.

Use Cases of NLP
In simple terms, NLP is the automated handling of natural human language, such as speech or text. Although the idea itself is fascinating, the real value comes from the use cases.
NLP can help with many tasks, and its fields of application keep growing. Here are a few examples:
NLP facilitates the recognition and prediction of illnesses based on electronic health records and the patient’s own words. It can help detect health conditions ranging from cardiovascular disease to depression and even schizophrenia. For instance, Amazon Comprehend Medical is a service that uses NLP to extract disease conditions, medications, and treatment outcomes from clinical trial reports, patient notes, and other electronic health records.
Organizations can determine what customers are saying about a service or product by identifying and extracting information from sources such as social media. This sentiment analysis can reveal a great deal about customers’ choices and what drives their decisions.
An inventor at IBM built a cognitive assistant that works like a personalized search engine: it learns all about you and then reminds you of a name, a song, or anything else you cannot recall at the moment you need it.
Companies such as Yahoo and Google use NLP to filter and classify email by analyzing the text of messages flowing through their servers and blocking spam before it even reaches your inbox.
To help identify fake news, the NLP group at MIT has developed a system that determines whether a news source is accurate or politically biased, and thus whether the source can be trusted.
Apple’s Siri and Amazon’s Alexa are examples of smart voice-driven interfaces that use NLP to respond to voice prompts and perform various tasks, such as finding a specific store, telling us the weather forecast, suggesting the best route to the office, or turning on the lights at home.
Insight into what is happening and what people are talking about can be very valuable to financial traders. NLP is used to track news, reports, and comments about possible mergers between companies, all of which can then be fed into a trading algorithm to generate profits. Remember: buy the rumor, sell the news.
NLP is also used in the search and selection phases of talent recruitment, to identify the skills of potential hires and even to spot prospective candidates before they become active on the job market.
LegalMation developed a platform, powered by IBM Watson NLP technology, to automate routine litigation tasks and help legal teams save time, cut costs, and shift their strategic focus.
NLP is growing particularly fast in the healthcare industry. As medical institutions increasingly adopt electronic health records, the technology is improving care delivery and disease diagnosis while reducing costs. Improved clinical documentation means patients can be better understood and can benefit from better healthcare. The goal should be to optimize their experience, and several organizations are already working on this.

Advantages of NLP
Users can ask questions about any topic and get an answer within seconds.
An NLP system answers questions in natural language.
An NLP system provides exact answers to questions, without unnecessary or unwanted information.
The accuracy of the answers increases with the amount of relevant information provided in the question.
NLP allows computers to communicate with humans in their own language and scales other language-related tasks.
It allows more language-based data to be processed than a human could handle, without fatigue and in an unbiased, consistent way.
NLP helps structure highly unstructured data sources.

Disadvantages of NLP
Complex query language: if the question is poorly worded or ambiguous, the system may not be able to provide the correct answer.
A system built for a single, specific task cannot adapt to new domains and problems because of its limited functions.
An NLP system may lack a user interface with features that let users interact further with the system.

Computer Vision (CV)
Prof. Fei-Fei Li defines computer vision as “a subset of mainstream artificial intelligence that deals with the science of making computers or machines visually enabled, i.e., they can analyze and understand an image.”
Computer vision (CV) is a process (and a branch of computer science) that involves capturing, processing, and analyzing real-world photos and videos so that machines can extract meaningful, contextual information from the physical world.

How Does Computer Vision Work?
Computer vision uses machine learning, and deep learning in particular, to learn from data sets of annotated images that show the object of interest in each image. By being fed thousands or millions of photos that have been labeled for supervised training, a computer vision system learns to recognize patterns in visual data.
A simple example is finding the edges in a photograph. You apply a kernel that approximates the derivative of the pixel brightness values and then set a threshold: wherever the derivative is high, there is an edge.
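The edge-finding recipe just described (a derivative kernel followed by a threshold) can be sketched in a few lines of plain Python. The tiny 3x6 grayscale image, the kernel [-1, 0, 1], and the threshold of 50 are illustrative assumptions; real systems work on full-size images and usually rely on optimized libraries instead of nested lists.

```python
def horizontal_gradient(image):
    """Apply the kernel [-1, 0, 1] along each row (a discrete derivative)."""
    return [
        [row[x + 1] - row[x - 1] for x in range(1, len(row) - 1)]
        for row in image
    ]


def detect_edges(image, threshold):
    """Mark pixels whose brightness change exceeds the threshold."""
    return [
        [1 if abs(g) > threshold else 0 for g in grad_row]
        for grad_row in horizontal_gradient(image)
    ]


# Hypothetical 3x6 grayscale image: dark on the left, bright on the right.
image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]

edges = detect_edges(image, threshold=50)
for row in edges:
    print(row)
```

Each output row is two pixels narrower than the input because the kernel needs a neighbor on both sides. The 1s mark the columns where brightness jumps from 10 to 200, which is the vertical edge in this image.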
Learning from labeled images requires various software techniques and algorithms that enable computers to find patterns in all the elements related to the labels and to make accurate predictions. Computer vision works most effectively when image processing is combined with machine learning. It enables pattern recognition, shape detection, and similar tasks.

How is Computer Vision Different from Image Processing?
Both are parts of AI technology and are used to process data and build models. Computer vision differs from image processing in that it extracts high-level information from photos or videos.
In computer vision, an image or video is the input, and the goal is to understand the image and its content, including being able to infer something about it. Computer vision uses image processing algorithms to solve some of its tasks.
Computer vision and image processing are distinct. Computer vision deals with studying the picture: finding the various components within it, finding the edges, and so on. It is a higher-level form of image processing in which the input is an image but the output is not an image; it is an interpretation of the image.
Image processing tasks include filtering, edge detection, noise removal, and color processing. In image processing, you receive an image as input and produce another image as output, which may then be used to train a computer vision system.
The main difference between computer vision and image processing is the goal, not the techniques used. For example, if the goal is to improve image quality for later use, it is called image processing. If the goal is to see as people do, detect defects, recognize objects, or automate driving, it is called computer vision.

Applications of Computer Vision
Core computer vision concepts are already being built into everyday products.

CV in Self-Driving Cars
Computer vision permits self-driving vehicles to make sense of their surroundings.
Cameras capture video from different angles around the vehicle and feed it to the computer vision software, which then processes the images in real time to find the edges of roads, read traffic signs, and detect other cars, objects, and pedestrians. The self-driving vehicle can then steer on streets and highways, avoiding obstacles and safely driving its passengers to their destination.

CV in Facial Recognition

Computer vision also performs a crucial function in facial recognition applications, the technology that allows computers to match photographs of people’s faces to their identities. Computer vision algorithms identify facial features in pictures and compare them with databases of face profiles. Consumer devices use facial recognition to verify the identity of their owners. Social media apps use facial recognition to find and tag users. Law enforcement agencies also depend upon facial recognition technology to pick out criminals in video feeds.

CV in Augmented Reality and Mixed Reality

Computer vision plays a crucial role in augmented and mixed reality, the technology that permits computing devices like smartphones, tablets, and smart glasses to superimpose and embed virtual items on real-world imagery. Using computer vision, AR equipment detects objects in the real world to decide where on a device’s display to place a digital object. For instance, computer vision algorithms can help AR applications locate planes like tabletops, walls, and floors by establishing depth and dimensions, and then place virtual objects in the physical world.

CV in Healthcare

Health-tech has seen some excellent benefits from computer vision. Computer vision algorithms can help automate tasks such as finding symptoms in X-ray and MRI scans or detecting cancerous moles in skin photos.
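The distinction drawn earlier between image processing (image in, image out) and computer vision (image in, interpretation out) can be made concrete with a small sketch. The example below is only an illustration in plain Python: the function names `horizontal_gradient` and `describe_image`, the `threshold` parameter, and the tiny synthetic image are all assumptions made for this sketch, not part of any library, and real systems would use far more robust methods.

```python
def horizontal_gradient(image):
    """Image processing step: takes an image, returns another image.

    Each output pixel is the absolute difference between its right and
    left neighbours, so vertical edges show up as bright pixels.
    Border columns are left at 0.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(1, width - 1):
            edges[r][c] = abs(image[r][c + 1] - image[r][c - 1])
    return edges


def describe_image(image, threshold=100):
    """Toy computer vision step: takes an image, returns an interpretation.

    It reuses the image processing step above, then decides whether the
    picture contains a vertical edge and in which column it starts.
    The threshold (average edge strength per row) is a hypothetical choice.
    """
    edges = horizontal_gradient(image)
    width = len(edges[0])
    # Total edge strength of each column.
    column_strength = [sum(row[c] for row in edges) for c in range(width)]
    strongest = max(range(width), key=column_strength.__getitem__)
    has_edge = column_strength[strongest] >= threshold * len(image)
    return {
        "has_vertical_edge": has_edge,
        "edge_column": strongest if has_edge else None,
    }


# Synthetic 4x6 grayscale image: dark left half (0), bright right half (255).
image = [[0, 0, 0, 255, 255, 255] for _ in range(4)]

edges = horizontal_gradient(image)   # output is still an image
result = describe_image(image)       # output is a description of the image
print(edges[0])
print(result)
```

Note that the second function relies on the first: this mirrors the point that computer vision uses image processing algorithms to solve some of its tasks, while the goal of each step differs.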
Challenges of Computer Vision

Inventing a machine that sees as a human does is a deceptively tricky assignment, not just because it is hard to make computer systems do it, but because we do not fully understand how human vision works. Studying biological vision requires an understanding of the perception organs, like the eyes, and also of how that perception is interpreted within the brain. Much progress has been made, both in charting the process and in discovering the tricks and shortcuts used by the system. But as with any other brain-related study, there is a long way to go.

Source: Mike Tamir

Many well-known computer vision applications involve identifying things in photographs. For an object in a picture, computer vision involves:

Classification: What broad category does the object belong to?
Identification: Which type of a given object is it?
Verification: Is the object in the photograph?
Detection: Where is the object located in the photograph?
Landmark Detection: What are the key points of the object?
Segmentation: Which pixels belong to the object?
Recognition: What objects are there, and where are they?

Beyond recognition, other methods of analysis include:

Video motion analysis, which estimates the speed of objects in a video, or of the camera itself.
Image segmentation, in which algorithms partition images into multiple sets of regions.
Scene reconstruction, which creates a 3D model of a scene from input images or videos.
Image restoration, in which noise (like blurring) is removed from photos using ML-based filters.

Any other application that involves understanding pixels through software can safely be labeled as computer vision.

How to Choose the Model

Models need to be selected carefully to get correct results. To select a model, clear objectives must be known, such as “What are you forecasting? What are the success parameters? What is the forecast horizon?”
The next step is to determine whether the dataset is stationary (its statistical properties stay steady over time) or non-stationary. This helps in identifying the correct forecasting model.

Source: Datalytyx

This process supports accurate analysis, because predictions build on the statistical properties observed in past data.

PyTorch (Python Library)

PyTorch is an open-source machine learning library based on the Torch library, used for applications such as Computer Vision (CV) and Natural Language Processing (NLP). It was mainly developed by Facebook’s AI Research Lab (FAIR). It provides dynamic computational graphs, which are useful for models such as Recurrent Neural Networks (RNNs). It is one of the most preferred libraries among ML users, as it makes complex architectures convenient to build. PyTorch is reasonably easier to learn than other deep learning frameworks because its syntax and usage are similar to conventional Python programming. PyTorch is a deep learning research platform that provides maximum flexibility and speed. It can also serve as a replacement for NumPy that uses the power of GPUs. PyTorch’s documentation is also well-organized, which is valuable for beginners.

Features of PyTorch

The features of PyTorch are as follows:
Simple Interface
Hybrid Frontend
Distributed Training
Native ONNX Support
C++ Frontend
Cloud Partners

Popular PyTorch Projects

CheXNet: Uses deep learning to perform radiologist-level pneumonia detection on chest X-rays. (https://stanfordmlgroup.github.io/projects/chexnet/)
Pyro: Pyro, which uses PyTorch as its backend, is a universal probabilistic programming language (PPL) written in Python. (https://pyro.ai/)
Horizon: Horizon is a platform for applied reinforcement learning (Applied RL). (https://horizonrl.com)

Pros and Cons

Pros
Easier to learn and use.
Lots of software modules are available.
Easy to integrate custom layer types, and executes on GPU.

Cons
No commercial support, as it is open source.
You usually write your own training code (less plug and play).

TensorFlow (Python Library)

TensorFlow was created by Google and is often positioned as an alternative to Theano. It is an open-source artificial intelligence (AI) library that uses data flow graphs to build models. Dataflow is a programming model used for parallel computing. TensorFlow allows developers to create large-scale neural networks with many layers. As a symbolic math library, it is also used for machine learning applications such as neural networks. It is a comprehensive end-to-end ML platform, so if you want hands-on experience in ML, this is a good place to start. Developed by the Google Brain team, it is used in products such as Google Photos, Google Search, and Google Cloud Speech. It is primarily used for classification, perception, understanding, discovery, prediction, and creation.

Popular TensorFlow Projects

Magenta: Magenta is an open-source Python library powered by TensorFlow. The Magenta library has utilities for manipulating source data, mostly images and music. This data is used to train machine learning models and generate new content from these models. (https://magenta.tensorflow.org/)
Sonnet: Built on top of TensorFlow, Sonnet is a library used to build complex neural networks. (https://sonnet.dev/)
Ludwig: Ludwig is a toolbox for training and testing deep learning models without writing code. (https://uber.github.io/ludwig/)

Pros and Cons

Pros
Python API that works with NumPy (a large collection of high-level mathematical functions)
Computational graph abstraction, similar to Theano
Faster compilation than Theano
Built-in TensorBoard (dashboard) for ML metric visualization
Support for data and model parallelism

Cons
Slower and less user-friendly than other frameworks like Keras
Much bulkier than Torch, as it has both high-level and low-level APIs
Limited pre-trained models available
No commercial support

CNTK

CNTK, or “Computational Network Toolkit”, is Microsoft’s open-source deep learning framework. It is a system for describing, training, and executing computational networks. The library contains feedforward deep neural networks (DNN), convolutional networks, and recurrent networks. It provides a Python API on top of C++ code and is released under the permissive MIT license, which allows commercial use.

DSSTNE (Deep Scalable Sparse Tensor Network Engine)

Like other companies, Amazon has its own set of libraries and APIs that can be used for machine learning and deep learning. DSSTNE was released after TensorFlow and CNTK, and it does not appear to have strong backing from Amazon within AWS. Written mainly in C++, DSSTNE appears to be fast, although it has not attracted as large a community as other libraries.

Keras (Python Library)

Keras, one of the best Python libraries, is an open-source neural network library written in Python. It is used for deep learning and can run on top of TensorFlow, Microsoft Cognitive Toolkit, and Theano, acting as a wrapper for them; it can also be used from R. It is designed for fast experimentation with deep neural networks, emphasizing user-friendliness, modularity, and scalability.
Pros and Cons

Pros
It is an intuitive API inspired by Torch.
It works with Theano, TensorFlow, and Deeplearning4j backends (a CNTK backend is also available).
It is a fast-growing framework.
It is widely treated as a standard Python API for neural networks.

Cons
No commercial support, as it is open source.

Key features of Keras:
Neural networks API
Easy and fast prototyping
Convolutional networks support
Recurrent networks support
Runs on GPU

LightGBM (Python Library)

LightGBM is a gradient boosting framework that uses tree-based learning algorithms, and it is one of the most popular machine learning libraries for building models from decision trees. Its implementation is both fast and efficient. It supports parallel and GPU-based learning and can handle large-scale data.

Scikit-learn

Scikit-learn is a free Python machine learning library. It has a variety of algorithms, such as support vector machines, random forests, and k-nearest neighbors. It interoperates with Python numerical and scientific libraries such as NumPy and SciPy.

SystemML

Apache SystemML is a flexible machine learning framework that can scale automatically to Spark and Hadoop clusters. Its salient feature is the customizability of algorithms through R-like and Python-like languages. It performs various AI tasks, including descriptive statistics, classification, clustering, regression, matrix decomposition, and survival analysis. It also supports support vector machines. However, deep learning capabilities on GPUs, such as importing and running neural network architectures and pre-trained models, are still under development in SystemML. Therefore, it cannot yet cover the whole AI development process on its own.

OpenCV

OpenCV (Open Source Computer Vision Library) is a library of programming functions for real-time computer vision. It was developed by Intel and later supported by Willow Garage and Itseez.
It is a cross-platform library and can be used free of charge under the open-source BSD license, which makes it a valuable library for educational and commercial use. The OpenCV software library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art (SoTA) computer vision (CV) and ML algorithms. OpenCV functions include face detection and recognition, object recognition and tracking, detection of human actions and movement in video, camera movement tracking, 3D model extraction, image stitching, image search, red-eye removal, and video watermarking and overlay, among others. It has C++, C, Python, and Java interfaces and supports Windows, Linux, Mac OS, iOS, and Android. OpenCV was designed with a focus on real-time applications and computational efficiency.

Point Cloud Library

The Point Cloud Library, or PCL, is an open-source library of algorithms for point cloud processing and 3D geometry processing tasks, such as those in 3D computer vision. The library contains algorithms for feature estimation, surface reconstruction, 3D registration, model fitting, and segmentation. It includes many state-of-the-art algorithms for filtering, feature estimation, surface reconstruction, and segmentation.

ROS (Robot Operating System)

ROS is robotics middleware. Although ROS is not an operating system (OS), it provides services designed for heterogeneous computer clusters, such as hardware abstraction, implementation of common functions, low-level device control, message passing between processes, and package management. It takes input from various sources to make decisions. ROS is therefore central to many of the requirements of robot development.

MATLAB

MATLAB, developed by MathWorks, is a multi-paradigm numerical computing environment and proprietary programming language. It uses familiar mathematical notation and can generate various graphic elements, including charts and graphs.
MATLAB allows matrix manipulations, algorithm implementation, plotting of functions and data, user interface creation, and interfacing with programs written in other languages. It is a high-performance language for technical computation. It integrates calculation, visualization, and programming in an easy-to-use environment, where problems and solutions are expressed in familiar mathematical symbols. It is also widely used in data analysis, deep learning algorithms, computer vision, and signal processing. Typical uses include mathematics and computation.

CUDA (Compute Unified Device Architecture)

Graphics processing units (GPUs) that support CUDA are dedicated hardware for complex calculations on massive data sets and for image processing. CUDA itself is a parallel computing platform (PCP) and application programming interface (API) model created by Nvidia. It enables devel