Data Science Workshop PDF
Document Details
Uploaded by ModestVanadium
St. Joseph's Degree & PG College
Summary
This document is a workshop presentation on data science, machine learning, and artificial intelligence. It covers various concepts and applications within these fields, including illustrative examples and explanations.
Full Transcript
METAM IT SOLUTIONS
WELCOME TO THE DATA SCIENCE WORKSHOP

WHAT IS DATA SCIENCE
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. It is related to data mining, deep learning and big data. Data science is a concept that unifies statistics, data analysis, machine learning, domain knowledge and their related methods in order to understand and analyze actual phenomena with data.

ARTIFICIAL INTELLIGENCE, MACHINE LEARNING AND DEEP LEARNING

ARTIFICIAL INTELLIGENCE
In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals. Examples:
1. Computer chess games (1994).
2. Chatbots (2017, Google).
3. The game 2048.

WHAT IS MACHINE LEARNING
Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as examples, direct experience or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow the computer to learn automatically, without human intervention or assistance, and to adjust its actions accordingly.

DEEP LEARNING
Deep learning is an artificial intelligence function that imitates the working of the human brain in processing data and creating patterns for use in decision making. It is loosely modelled on the neurons of the brain, an example of biomimicry: just as the aeroplane was inspired by the bird, deep learning is inspired by the brain.

HOW TO LEARN DATA SCIENCE

MACHINE LEARNING
What is machine learning? What are the types of machine learning?
What are the applications of machine learning? What are the algorithms?

WHAT IS MACHINE LEARNING
If some data is given to you and you apply it to a machine in some sequence, i.e. in the form of algorithms, so that those algorithms produce new data, that is nothing but "machine learning".

STEPS OF MACHINE LEARNING
1. Pre-process the data.
2. Model the data.
3. Fix the algorithm.
4. Sort the data.
5. Training phase.
6. Testing phase.

DATA-DRIVEN TECHNOLOGY

SUPERVISED MACHINE LEARNING
Supervised learning can apply what has been learned in the past to new data, using labelled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. The system is able to provide targets for any new input after sufficient training. The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly.

Contd.
Topics: basic concepts; decision tree induction; evaluation of classifiers; rule induction; classification using association rules; naïve Bayesian classification; naïve Bayes for text classification; support vector machines; k-nearest neighbour; ensemble methods (bagging and boosting); summary.

Contd.
Like humans, a system learns from past experiences. A computer does not have experiences; a computer system learns from data, which represents the "past experiences" of an application domain. Our focus: learn a target function that can be used to predict the values of a discrete class attribute, for example approved or not approved, or high risk or low risk. The task is commonly called supervised learning, classification or inductive learning.

DATA AND GOAL
Data: a set of data records (also called examples, instances or cases) described by k attributes: A1, A2, ..., Ak.
A class: each example is labelled with a pre-defined class.
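The labelled-data setup above, together with the training/testing split and the accuracy measure, can be sketched in a few lines of plain Python. This is a minimal illustration only: the dataset values and the 1-nearest-neighbour rule (one of the classifiers listed above) are chosen for illustration, not taken from the workshop.

```python
# Minimal sketch of supervised learning: records described by attributes,
# each labelled with a pre-defined class; a 1-nearest-neighbour rule is
# "trained" on labelled examples and tested on unseen cases.
# All values below are made up for illustration.

def nearest_neighbour(train, x):
    """Predict the class of x as the class of the closest training record."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    attrs, label = min(train, key=lambda rec: dist(rec[0], x))
    return label

# Training data: ((A1, A2), class), e.g. (income, debt) -> risk level
train = [((1.0, 8.0), "high risk"), ((2.0, 7.0), "high risk"),
         ((8.0, 1.0), "low risk"),  ((9.0, 2.0), "low risk")]

# Unseen test cases with their true classes
test = [((1.5, 7.5), "high risk"), ((8.5, 1.5), "low risk")]

correct = sum(nearest_neighbour(train, x) == y for x, y in test)
accuracy = correct / len(test)   # number correct / total test cases
print(accuracy)
```

On this toy data both test cases are classified correctly, so the accuracy is 1.0.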
Goal: to learn a classification model from the data that can be used to predict the classes of new (future or test) cases/instances.

Contd.
Learning (training): learn the model using the training data.
Testing: test the model using unseen test data to assess the model's accuracy.
Accuracy = (number of correct classifications) / (total number of test cases)

SUPERVISED LEARNING
Name      Gender   Output
Soujanya  Female   Yes
Mahesh    Male     Yes
Kalpana   Male     No
Example of supervised learning: a salary chart plotted against experience.

UNSUPERVISED LEARNING
The algorithms are allowed to classify, label and/or group the data points contained within the data sets without any external guidance in performing that task. In other words, unsupervised learning allows the system to identify patterns within data sets on its own.

REINFORCEMENT LEARNING
Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

INTRODUCTION TO PYTHON
What is a Python module? How do we create a Python module? How do we use a Python module? What are the built-in modules in Python?

WHAT IS A PYTHON MODULE
A module is simply a file containing Python code. It may contain functions, classes etc.

HOW TO CREATE A PYTHON MODULE
Simply writing Python code in a file is how we create a module. Example:

def add(a, b):
    return a + b

def sub(a, b):
    return a - b

def prod(a, b):
    return a * b

def div(a, b):
    return a / b

HOW TO IMPORT PYTHON LIBRARIES
We use the import keyword to incorporate a module into our program.

STEPS TO INSTALL LIBRARIES IN PYTHON
1. Open cmd.
2. Go to your Python location.
3. Type python.
4. Type quit().
5. Type python -m pip install <package-name>.

BUILT-IN MODULES IN PYTHON
Built-in modules are written in C and interpreted by the Python interpreter. Example:

import sys
a = sys.builtin_module_names
print(a)

Typing help('modules') in the Python console will give you the list of all the built-in modules in Python.

TOP TEN PYTHON LIBRARIES
1. TensorFlow. 2. Scikit-learn. 3. NumPy. 4. Keras. 5. PyTorch. 6. LightGBM. 7. Eli5. 8. SciPy. 9. Pandas. 10. Anaconda.

NAIVE BAYES ALGORITHM
What is the naive Bayes algorithm? What are the applications of the naive Bayes algorithm? Where do we use the naive Bayes algorithm? How do we create the naive Bayes algorithm?

WHAT IS THE NAIVE BAYES ALGORITHM
A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer, is known as an "algorithm". The naive Bayes algorithm is an algorithm that uses Bayes' theorem to classify objects. Naive Bayes classifiers assume strong, or naive, independence between the attributes of a data point.

FORMULA FOR THE NAIVE BAYES ALGORITHM
P(c|x) = P(x|c) * P(c) / P(x)
P(c|x) = posterior probability.
P(x|c) = likelihood.
P(c) = class prior probability.
P(x) = predictor prior probability.

APPLICATIONS OF THE NAIVE BAYES ALGORITHM
Real-time prediction: as naive Bayes is super fast, it can be used for making predictions in real time.
Multi-class prediction: this algorithm can predict the posterior probability of multiple classes of the target variable.
Text classification / spam filtering / sentiment analysis: naive Bayes classifiers are mostly used in text classification (due to their good results in multi-class problems and the independence assumption) and have a higher success rate compared to other algorithms. As a result, naive Bayes is widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiment).

WHERE IS NAIVE BAYES USED
Naive Bayes is the most straightforward and fastest classification algorithm, and it is suitable for large chunks of data. Naive Bayes uses a similar method to predict the probability of different classes based on various attributes.
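The formula P(c|x) = P(x|c) * P(c) / P(x) can be checked numerically. Here is a minimal sketch in the spam-filtering setting mentioned above; all counts are made up for illustration.

```python
# Sketch of Bayes' rule P(c|x) = P(x|c) * P(c) / P(x) on hypothetical counts:
# 100 e-mails, 30 of them spam; the word "offer" appears in 15 spam
# messages and in 5 non-spam messages.
total, spam = 100, 30
offer_in_spam, offer_in_ham = 15, 5

p_c = spam / total                            # class prior P(spam) = 0.30
p_x_given_c = offer_in_spam / spam            # likelihood P("offer" | spam) = 0.50
p_x = (offer_in_spam + offer_in_ham) / total  # predictor prior P("offer") = 0.20

p_c_given_x = p_x_given_c * p_c / p_x         # posterior P(spam | "offer")
print(p_c_given_x)  # approximately 0.75
```

With these counts, an e-mail containing "offer" is classified as spam with posterior probability 0.75.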
This algorithm is successfully used in various applications such as spam filtering, text classification and sentiment analysis.

HOW THE NAIVE BAYES ALGORITHM WORKS
Let's understand the working of naive Bayes through an example of weather conditions and playing sport. You need to calculate the probability of playing sport; that is, you need to classify whether the player will play or not based on the weather condition.

Steps of the naive Bayes algorithm:
STEP 1: Calculate the prior probability for the given class labels.
STEP 2: Find the likelihood probability for each class.
STEP 3: Put these values into the Bayes formula and calculate the posterior probability.
STEP 4: See which class has the higher probability; the input belongs to the higher-probability class.

EXAMPLE OF THE NAIVE BAYES ALGORITHM
A frequency table of the weather data helps you calculate the prior probabilities and likelihoods.

Contd.
Now suppose you want to calculate the probability of playing when the weather is sunny.
Probability of playing: P(yes|sunny) = P(sunny|yes) * P(yes) / P(sunny)
Calculate the prior probabilities: P(sunny) = 5/14 = 0.36; P(yes) = 9/14 = 0.64.
Calculate the likelihood: P(sunny|yes) = 3/9 = 0.33.
Put these values into the formula: P(yes|sunny) = (0.33 * 0.64) / 0.36 = 0.60.

LINEAR REGRESSION
Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable. This form of analysis estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable. Linear regression fits a straight line or surface that minimizes the discrepancies between predicted and actual output values. There are simple linear regression calculators that use a "least squares" method to discover the best-fit line for a set of paired data.
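The "least squares" best-fit line mentioned above can be computed directly with the closed-form formulas. A minimal sketch on made-up paired data (the points lie exactly on y = 2x + 1, so the recovered coefficients should be slope 2 and intercept 1):

```python
# Ordinary least squares for y = m*x + b on a set of paired data.
# Data values are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x)
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x   # the best-fit line passes through the mean point

print(m, b)  # 2.0 1.0
```

The same fit is what R, MATLAB, sklearn and Excel compute under the hood for simple linear regression.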
You then estimate the value of Y (the dependent variable) from X (the independent variable). You can perform linear regression in a variety of programs and environments, including: R linear regression, MATLAB linear regression, sklearn linear regression, linear regression in Python, and Excel linear regression.

REAL-LIFE EXAMPLES
Some real-world examples of regression analysis include predicting the price of a house given its features, predicting the impact of SAT/GRE scores on college admissions, predicting sales based on input parameters, predicting the weather, etc. Regression is a statistical method used in finance, investing and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).

DECISION TREE
A decision tree is a non-parametric supervised learning algorithm that is used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes and leaf nodes. During training, the decision tree algorithm selects the best attribute on which to split the data, based on a metric such as entropy or Gini impurity, which measures the level of impurity or randomness in the subsets. The goal is to find the attribute that maximizes the information gain, i.e. the reduction in impurity, after the split.

How to choose the best attribute at each node
While there are multiple ways to select the best attribute at each node, two methods, information gain and Gini impurity, act as popular splitting criteria for decision tree models. They help to evaluate the quality of each test condition and how well it will be able to classify samples into classes.

Entropy and information gain
It is difficult to explain information gain without first discussing entropy. Entropy is a concept that stems from information theory and measures the impurity of the sample values.
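Entropy and information gain can be computed in a few lines. A minimal sketch with a hypothetical 10-sample node and a perfectly separating split (the data is made up for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels (an impurity measure)."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# Parent node: 10 samples, 5 "yes" / 5 "no" -> maximally impure, entropy 1.0
parent = ["yes"] * 5 + ["no"] * 5

# A candidate split that produces two pure children -> entropy 0 in each
left, right = ["yes"] * 5, ["no"] * 5

# Information gain = parent entropy - weighted average of child entropies
weighted_child = (len(left) / len(parent)) * entropy(left) \
               + (len(right) / len(parent)) * entropy(right)
gain = entropy(parent) - weighted_child
print(gain)  # 1.0
```

A split with zero gain (children as mixed as the parent) would score 0.0; the decision tree picks the attribute whose split scores highest.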
K-MEANS
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. Better Euclidean solutions can be found using, for instance, k-medians and k-medoids.

How does the k-means algorithm work?
The working of the k-means algorithm is explained in the steps below:
Step 1: Select the number K to decide the number of clusters.
Step 2: Select K random points as centroids (they need not come from the input dataset).
Step 3: Assign each data point to its closest centroid, which forms the predefined K clusters.

Contd.
Step 4: Calculate the variance and place a new centroid for each cluster.
Step 5: Repeat step 3, i.e. reassign each data point to the new closest centroid of its cluster.
Step 6: If any reassignment occurred, go to step 4; otherwise go to FINISH.
Step 7: The model is ready.

REAL-LIFE EXAMPLES
Clustering is used to identify groups of similar objects in datasets with two or more variable quantities. In practice, this data may be collected from marketing, biomedical or geospatial databases, among many other places. Real-life examples include spam detection, sentiment analysis, scorecard prediction of exams, etc. For example, a query for "movie" can return web pages grouped into categories such as reviews, trailers, stars and theaters. An example of cluster analysis would be a company using this technique to find new markets to target.
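The k-means steps listed above can be sketched in plain Python on a made-up one-dimensional dataset (the points, K = 2 and the starting centroids are all assumptions for illustration):

```python
# One-dimensional k-means following the steps above: assign each point to
# the nearest centroid (step 3), recompute each centroid as the mean of its
# cluster (step 4), and repeat until assignments stop changing (step 6).
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids = [1.0, 9.0]          # step 2: initial centroids (made up)

assignment = None
while True:
    # steps 3/5: index of the closest centroid for every point
    new_assignment = [min(range(len(centroids)),
                          key=lambda j: (p - centroids[j]) ** 2)
                      for p in points]
    if new_assignment == assignment:   # step 6: no reassignment -> finished
        break
    assignment = new_assignment
    # step 4: move each centroid to the mean of its cluster
    for j in range(len(centroids)):
        cluster = [p for p, a in zip(points, assignment) if a == j]
        if cluster:
            centroids[j] = sum(cluster) / len(cluster)

print(centroids)  # [1.5, 9.5]
```

The two centroids settle on the means of the two obvious groups, illustrating step 7 ("the model is ready").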
The company could collect data on potential customers' income, recent home purchases and location.

WHAT IS NLP
NLP is an interdisciplinary field that uses computational methods to:
o Investigate the properties of written human language and model the cognitive mechanisms underlying the understanding and production of written language.
o Develop novel practical applications involving the intelligent processing of written human language by computer.
NLP plays a big part in machine learning techniques:
o automating the construction and adaptation of machine dictionaries;
o modelling human agents' desires and beliefs (an essential component of NLP that is closer to AI).
We will focus on two main types of NLP:
o Human-computer dialogue systems
o Machine translation

HUMAN-COMPUTER DIALOGUE SYSTEMS
Usually the computer models a human dialogue participant. It should be able:
o to converse in a similar linguistic style;
o to discuss the topic;
o hopefully, to teach.

DIALOGUE SYSTEMS
Current capabilities of dialogue systems include simple voice communication with machines:
o Personal computers
o Interactive answering machines
o Voice dialing of mobile telephones
o Vehicle systems
These systems can access online as well as stored information, and work is under way to improve them. The intended end result of human-computer dialogue systems is seamless spoken interaction between a computer and a human. This would be a major component of making an AI that can pass the Turing test, and of enabling a computer to function as a teacher.

MACHINE TRANSLATION
Machine translation is important for:
o accessing information in a foreign language;
o communicating with speakers of other languages.
The majority of documents on the world wide web are in languages other than English.

Future of machine translation
Goal: to be able to flawlessly translate between languages. Linking human-computer dialogue with machine translation would let a person speak in one language to a computer that translates for another person, for example in translated video chat.

PROBLEMS
Computers can't deal with ambiguity, syntactic
irregularity, multiple word meanings and the influence of context. For example: "Time flies like an arrow. Fruit flies like a banana." Accurate translation requires an understanding of the text, the situation, and a lot of facts about the world in general, for example: "The box is in the pen."

A well-known mistranslated sign described a restaurant (the Chinese text, 餐厅, means "dining hall"). In the process of making the sign, the producers tried to translate the Chinese text into English with a machine translation system, but the software didn't work and produced the error message "Translation Server Error". The software's user didn't know English and thought the error message was the translation.

Contd.
Phonetics and phonology: the study of language sounds.
Orthography: the study of language conventions for punctuation, text mark-up and encoding.
Morphology: the study of the meaningful components of words.
Syntax: the study of structural relationships among words.
Lexical semantics: the study of word meaning.
Compositional semantics: the study of the meaning of sentences.
Pragmatics: the study of the use of language to accomplish goals.
Discourse conventions: the study of the conventions of dialogue.

THANK YOU!!!