Math for Data Science PDF
Summary
This document introduces the concept of data science and its related subjects such as math and statistics. It discusses the stages of data analytics maturity, and emphasizes the importance of business understanding in data science projects. It also introduces data collection techniques and typical tasks within data science.
Full Transcript
Math for Data Science

Lesson 1 [Mathematics & Statistics for Data Science]

Data Science
· Our goal: competitiveness.
· Collect & use data about customers (audience), what's trending and the market, to discover insights (patterns) that can help us improve ourselves.

What is Data Analytics?
· A systematic analysis of data for discovery, interpretation & communication of meaningful patterns.
· Focuses on explaining why something happened & what will happen in the future.
· It involves computer skills, math & statistics.

Stages of Data Analytics Maturity [DDPP]
· Descriptive
· Diagnostic
· Predictive
· Prescriptive
The stages lead towards decision making.

Data science always begins with Business Understanding.

Data Science Lifecycle [DCPEMU]
· Define
· Collect
· Process
· Extract
· Model
· Use

Collect
· After you've set your goal, the next step is to collect data.
· Traditionally: obtain data explicitly through questionnaires and surveys.
· Data can be:
  - Structured (customer records, sales transactions)
  - Unstructured (images, videos, text messages)

Process [FT]
· Idea: make data more understandable for machine & human users.
· Typical tasks:
  - Filtering: cleaning up noise & stuff irrelevant to your question
  - Transformation: making your data more meaningful

Extract [SAD]
· A very important task that is often neglected.
· Involves exploring & extracting important elements from data to reduce complexity.
· Typical tasks:
  - Selection: choose which features to use
  - Aggregation: combining two or more features to form a new one
  - Decomposition: performing math/statistical techniques to convert data into something more meaningful

Model [CCR]
· Apply statistical/machine learning techniques to get the data to answer your question.
· 3 common ways:
  - Clustering
  - Classification
  - Regression

Use
· Use the result of our hard work to meet our goal.
· Usage: in visualization, or use the result to automate tasks like:
· Recommend content to users
· Monitor & improve customer engagement
· Alert important activities/events
· Take action on behalf of users

Lesson 2 [Basic Tools for Data Science]
(check the Lesson 2 PowerPoint)

Python Libraries [MONTPP]
· Matplotlib: MATLAB-style plotting
· NumPy: matrices & numbers
· Pandas: data analysis
· OpenCV, Pillow: images
· TensorFlow, PyTorch: machine learning
· os, pathlib: file system

Excel Formulas [CLASTL]
· Common formulas: sum, addition, subtraction, multiplication, division
· Logical functions: AND, OR, NOT, IF
· Average functions: mean (average), median, mode
· Statistical functions: min, max, quartile, standard deviation, correlation
· Text functions: LEFT, RIGHT, LEN, CONCAT
· Lookup functions: vertical/horizontal lookup to get the data

Lesson 3 [Basic Probability Theory]
· Basic idea of statistical methods: make inferences about a population by studying a relatively small sample chosen from it.
· A population is the entire collection of objects/outcomes about which information is sought.
· A sample is a subset of a population, containing the objects/outcomes that are actually observed.

Preliminary Concepts
· To understand probability, start by understanding the terms sample space & event.
· The set of all possible outcomes of an experiment is called the sample space for the experiment.
· A subset of a sample space is called an event.
· $\cup$ = union: everything in either event
· $\cap$ = intersection: what the events have in common

Probabilities
· Given any experiment & any event A:
· The expression $P(A)$ denotes the probability that event A occurs.
· $P(A)$ is the proportion of times that event A would occur in the long run, if the experiment were to be repeated over & over again.

Axioms of Probability
· Let $S$ be a sample space. Then $P(S) = 1$.
· For any event A, $0 \le P(A) \le 1$.
· If A & B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$.

[Venn diagram example: (10 + 3 + 5); (10 + 3 + 5 + 4 + 13) = 35]

Addition Rule
· If A & B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$.
· More generally, let A & B be any events; then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Counting
Permutation
· A permutation is an ordering of a collection of objects.
· The number of permutations of $n$ objects is $n!$.
· The number of permutations of $r$ objects chosen from a group of $n$ objects is given by $P(n, r) = {}_nP_r = \dfrac{n!}{(n - r)!}$.
· $0! = 1$

Combination
· A combination is a selection of a distinct group of objects, without regard to order.
· The number of combinations of $k$ objects chosen from a group of $n$ objects is given by $\dbinom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$.

Conditional Probability
· A probability that is based on part of a sample space.
· Let A & B be events with $P(B) \ne 0$. The conditional probability of A given B, denoted $P(A \mid B)$, is: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.

Independence
· Knowledge that one event has occurred does not change the probability that another event occurs.
· In this case, the events are said to be independent.
· If $P(A) \ne 0$ and $P(B) \ne 0$, then A & B are independent if $P(B \mid A) = P(B)$ and, equivalently, $P(A \mid B) = P(A)$.

Bayes' Theorem
· Provides a formula that allows us to calculate one of the conditional probabilities if we know the other one.
· By definition of conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
· We can substitute $P(B \mid A)\,P(A)$ for $P(A \cap B)$: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$.

Lesson 4 [Descriptive Statistics]

Statistical Measures [MMMSRV]
· Mean: central tendency (average) of the values in a dataset.
· Median: middle value in an ordered list of the dataset.
· Mode: value that appears the most in a dataset.
· Standard deviation: average distance of a value from the mean value.
· Range: difference between the highest & lowest number.
· Variance: average squared distance from the mean.

Types of Data Series [DIF]
1. Individual Series
2. Discrete Series
3. Frequency Distribution Series
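As a rough sketch of the statistical measures listed above (mean, median, mode, standard deviation, range, variance), the snippet below computes each one with Python's standard `statistics` module. The data values and variable names are hypothetical and chosen only for illustration; the notes do not prescribe a tool for this step.

```python
import statistics

# Hypothetical individual series, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # central tendency (average)
median = statistics.median(data)      # middle value of the ordered data
mode = statistics.mode(data)          # value that appears the most
value_range = max(data) - min(data)   # highest minus lowest number

# Population versions divide by n; sample versions divide by n - 1.
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)

print("mean, median, mode, range:", mean, median, mode, value_range)
print("population variance, sd:", pop_variance, pop_std)
print("sample variance, sd:", sample_variance, sample_std)
```

The population and sample versions differ only in the divisor, which is why the sample standard deviation comes out slightly larger on the same data.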
Descriptive Statistics for Individual Series
· Remember: raw data − mean gives the deviation of each value.
· $\mu$ = population mean
· $\bar{x}$ = sample mean
· $\sigma$ = population standard deviation
· $s$ = sample standard deviation

Descriptive Statistics for Frequency Distribution Series

Graphical Representation for Numeric Data
· Box plots: diagram that shows how far the extreme values are from most of the data, or how close values are to them.
· Histogram: representation of the distribution of numerical data.

Box Plots (Whisker Plots)
· Based on the five-number summary of a dataset (minimum, first quartile, median, third quartile & maximum).
· Suitable for comparing multiple datasets.
· Easy to interpret.
· Interquartile Range (IQR): difference between the third quartile & first quartile. Also represents the middle 50% of the total data.
  - $IQR = Q_3 - Q_1$
· Outliers: data that are significantly different from most of the other values.
  - Outliers = greater than $Q_3 + (1.5 \times IQR)$ or less than $Q_1 - (1.5 \times IQR)$.

Box Plot Distribution
· Symmetric/normal distribution: the distance from the median to both the minimum & maximum values is equal.
· Positively skewed: distance from the median to the maximum is greater than distance from the median to the minimum.
· Negatively skewed: distance from the median to the minimum is greater than distance from the median to the maximum.

Histogram
· Similar to bar charts, but not the same.
· Represents the distribution of numerical data.
  - Class intervals & the frequency of each class
· Easier to interpret than box plots for large datasets.
· Suitable for displaying the shape & spread of a distribution.

Histogram Distribution [MBSTENDC]
· Normal Distribution: a symmetric, bell-shaped curve; data symmetrical around the average.
· Skewed Distribution: more data concentrated on one side, with a tail extending to the other side (for example, to the right).
· Bimodal Distribution: two distinct peaks representing different data groups/processes within the dataset.
· Multimodal (Plateau) Distribution: contains several peaks, indicating a dataset with multiple processes/sources of data.
· Edge Peak Distribution: similar to normal but with an additional large peak at one tail, often due to data grouping in histogram construction.
· Comb Distribution: exhibits alternating high & low bars, typically caused by data rounding/interval issues in histogram construction.
· Truncated (Heart-Cut) Distribution: similar to normal with its tails missing, often due to selective sampling/quality control processes.
· Dog Food Distribution: shows a lack of data around the mean, with concentrations near the upper ...

When to Use Box Plots or Histograms
· Box plots are useful for comparing multiple datasets & displaying the spread & skewness of the data.
· Histograms are better for displaying the distribution of data & determining the shape & spread of a distribution.

Limitations of Box Plots & Histograms
· Box plots can be less detailed than histograms & may not show the full distribution of the data.
· Histograms can be affected by the choice of bin size & may not accurately represent the underlying distribution if the bins are too large or too small.
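The box-plot quantities from this lesson (five-number summary, IQR, and the 1.5 × IQR outlier rule) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data are hypothetical, the helper names `five_number_summary` and `iqr_outliers` are made up for this example, and the quartiles use the `inclusive` method of `statistics.quantiles`, which is only one of several quartile conventions and may differ slightly from what Excel or a textbook gives.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # Quartile conventions differ between tools; 'inclusive' is one common choice.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


def iqr_outliers(values):
    """Return the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical data with one value far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
summary = five_number_summary(data)
outliers = iqr_outliers(data)
print("five-number summary:", summary)
print("outliers:", outliers)
```

Comparing the distance from the median to the minimum with the distance from the median to the maximum in `summary` gives the same skewness reading that the box plot shows visually.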
Lesson 5 [Probability Distribution]

Random Variables
· A random variable assigns a numerical value to each outcome in a sample space.
· There are 2 types: discrete & continuous.
  - A discrete random variable is one whose possible values can be ordered, & there are gaps between adjacent values.
  - The possible values of a continuous random variable always contain an interval, i.e. all the points between some two numbers.

Probability Distribution (Discrete)
· A list of the possible values of a discrete random variable X, along with the probability of each, provides a complete description of the population from which X is drawn.
· This is known as the probability distribution (of X).
· The probability distribution of a discrete random variable X is the function $p(x) = P(X = x)$.
· A cumulative distribution function specifies the probability that X is less than/equal to a given value, i.e. $F(x) = P(X \le x)$.

Probability Distribution (Continuous)
· A random variable is continuous if its probabilities are given by areas under a curve.
· The curve is called the probability density function (or probability distribution) for the random variable.

Cumulative Distribution Function
· Let X be a continuous random variable with probability density function $f(x)$. The cumulative distribution function of X is: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.

Bernoulli Trial
· An experiment that can result in one of 2 outcomes, for example "success" & "failure".
  - The probability of "success" is denoted by $p$.
  - The probability of "failure" is therefore $1 - p$.
· Such a trial is called a Bernoulli trial with success probability $p$.

Bernoulli Distribution
· For any Bernoulli trial, we define a random variable X: if the experiment results in success, then $X = 1$; otherwise $X = 0$.
· It follows that X is a discrete random variable, with probability distribution $p(x)$ defined by:
  - $p(0) = P(X = 0) = 1 - p$
  - $p(1) = P(X = 1) = p$
· The random variable X is said to have the Bernoulli distribution.

N Bernoulli Trials
· In practice, we might take several samples from a very large lot & count the number of "successes" among them.
· This amounts to conducting several independent Bernoulli trials & counting the number of successes.
· The number of successes is then a random variable, which is said to have a binomial distribution.

Binomial Distribution
· If a total of $n$ Bernoulli trials are conducted, and:
  - The trials are independent
  - Each trial has the same success probability $p$
  - X is the number of successes in the $n$ trials
· then X has the binomial distribution with parameters $n$ & $p$.

Normal Distribution
· The normal distribution (also called the Gaussian distribution) is by far the most commonly used distribution in statistics.
· This distribution provides a good model for many, although not all, continuous populations.
· The probability density function of a normal random variable with mean $\mu$ & variance $\sigma^2$ is given by: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$.

The z-score
· Dealing with normal populations, we often convert from the units in which the population items were originally measured to standard units.
· If $x$ is an item sampled from a normal population with mean $\mu$ & variance $\sigma^2$, the standard unit equivalent of $x$ is the number $z$, where $z = \dfrac{x - \mu}{\sigma}$, i.e. (number − mean) / sd.
· The number $z$ is called the z-score.
· Example: $x = 10.8$, $\mu = 10$, $\sigma = 1.3$, so $z = \dfrac{10.8 - 10}{1.3} \approx 0.62$.

Central Limit Theorem
· If we draw a large enough sample from a population, then the distribution of the sample mean is approximately normal, no matter what population the sample was drawn from.
· Allows us to compute probabilities for sample means using the z table, even though the population from which the sample was drawn is not normal.
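The z-score formula and the Central Limit Theorem above can be checked with a small simulation. The sketch below is illustrative only: the function names, the choice of an exponential population (mean 1, standard deviation 1, clearly not normal), the sample size of 50, and the 2000 repetitions are all assumptions made for this example. The z-score call reuses the numbers from the worked example in the notes.

```python
import math
import random
import statistics


def z_score(x, mean, sd):
    """Standard-unit equivalent of x: (number - mean) / sd."""
    return (x - mean) / sd


def sample_means(rng, sample_size, repetitions):
    """Means of repeated samples from an exponential(1) population (mean 1, sd 1)."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(repetitions)
    ]


# z-score for the example in the notes: x = 10.8, mean = 10, sd = 1.3.
z = z_score(10.8, 10, 1.3)

# Central Limit Theorem check on a skewed, non-normal population.
rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(rng, sample_size=50, repetitions=2000)
center = statistics.fmean(means)         # should be close to the population mean, 1
spread = statistics.stdev(means)         # should be close to sigma / sqrt(n)
expected_spread = 1.0 / math.sqrt(50)

# If the sample means are roughly normal, about 68% fall within one
# standard error of the population mean.
within_one = sum(
    abs(z_score(m, 1.0, expected_spread)) <= 1 for m in means
) / len(means)

print("z:", round(z, 3))
print("center:", center, "spread:", spread, "expected:", expected_spread)
print("share within one standard error:", within_one)
```

Increasing `sample_size` should pull the share within one standard error closer to the normal value and shrink `spread`, which is the behaviour the theorem describes.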
Jointly Discrete
· If X & Y are jointly discrete random variables, the joint probability distribution of X & Y is: $p(x, y) = P(X = x \text{ and } Y = y)$.
· Marginal probability distributions of X & of Y:
  - $p_X(x) = P(X = x) = \sum_{y} p(x, y)$
  - $p_Y(y) = P(Y = y) = \sum_{x} p(x, y)$

Jointly Continuous
· If X & Y are jointly continuous random variables, the joint probability distribution of X & Y is a joint probability density function $f(x, y)$, with $P(a \le X \le b \text{ and } c \le Y \le d) = \int_{a}^{b}\int_{c}^{d} f(x, y)\,dy\,dx$.
· Marginal probability distributions of X & of Y:
  - $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
  - $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$

Point Estimators
· In general, a quantity calculated from data is called a statistic.
· A statistic that is used to estimate an unknown constant, or parameter, is called a point estimator/point estimate.
· For example, if $X_1, \dots, X_n$ is a random sample from a population:
  - The sample mean $\bar{X}$ is often used to estimate the population mean $\mu$.
  - The sample variance $s^2$ is often used to estimate the population variance $\sigma^2$.

Confidence Interval
· Let $X_1, \dots, X_n$ be a large ($n > 30$) random sample from a population with mean $\mu$ & standard deviation $\sigma$, so that $\bar{X}$ is approximately normal. Then a level $100(1 - \alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
· When the value of $\sigma$ is unknown, it can be replaced with the sample standard deviation $s$.

Hypothesis Testing
· A test to determine how certain we can be about a hypothesis.
· A hypothesis test can be about:
  - Population mean
  - Population proportion
  - Difference between two means/proportions
  - Paired data
  - Chi-square
  - Variance

Null vs Alternative Hypothesis
· Two possibilities:
  - The population mean is actually greater than/equal to 100, & the sample mean is lower only because of random variation from the population mean.
  - The population mean is actually less than 100, & the sample mean reflects this fact (i.e. emissions are really reduced).
· Null hypothesis ($H_0$): the effect indicated by the sample is due only to random variation between the sample & the population.
· Alternative hypothesis ($H_1$): the effect indicated by the sample is real, in that it accurately represents the whole population.

Steps of Hypothesis Testing
· Define the null & alternative hypotheses, $H_0$ & $H_1$.
· Assume $H_0$ to be true.
· Compute a test statistic. A test statistic is a statistic used to assess the strength of evidence against $H_0$.
· Compute the P-value of the test statistic. The P-value is the probability that the test statistic would have a value whose disagreement with $H_0$ is as great as/greater than that actually observed.
· State a conclusion about the strength of evidence against $H_0$.

Lesson 6 [Exploratory Data Analysis]

What is Data?
· A collection of data objects & their attributes.
· An attribute is a property or characteristic of an object, e.g. eye color of a person, temperature.
· A collection of attributes describes an object.
· Objects are also known as records, samples/instances.

Attribute Properties [MOAD]
· Distinctness: $=$, $\ne$
· Order: $<$, $>$
· Addition: $+$, $-$
· Multiplication: $\times$, $/$

Properties of Attributes [ROIN]
· Nominal: distinctness
· Ordinal: distinctness & order
· Interval: distinctness, order & addition
· Ratio: all 4 properties

Discrete vs Continuous
· Discrete Attributes
  - Has only a finite/countably infinite set of values
  - Examples: zip codes, counts, or a set of words
  - Often represented as integer variables
· Continuous Attributes
  - Has real numbers as attribute values
  - Examples: temperature, height or weight
  - Typically represented as floating-point variables

Types of Dataset [ROG]
· Record Data Set: data matrix, document data, transaction data
· Graph Data Set: World Wide Web, molecular structures
· Ordered Data Set: spatial data, temporal data, sequential data, genetic sequence data