MATM111 Reviewer - Statistics PDF
Summary
This document provides a review of MATM111, covering topics in statistics such as measures of central tendency (mean, median, mode), measures of variability, normal distribution, and correlation. The document includes examples and formulas. It's designed to help students prepare for the course or exam.
MATM111 COVERAGE
- Introduction to Statistics
- Measures of Central Tendency
- Measures of Relative Position
- Measures of Variability/Dispersion
- Normal Distribution
- Correlation

Mathematics in the Modern World

Lesson 1: Introduction to Statistics

Statistics
- From the Latin word "status", which means state.
- Since ancient times, statistics has been used by state leaders to determine how much tax to levy on their subjects and how many soldiers would be needed in an expected war.
- Under capitalism, not only state leaders but also capitalists are interested in statistical surveys, which has increased the demand for data processing for purposes such as insurance.
- DATA: in statistics, data are always the result of experiment, observation, investigation or other means. They often appear as numerical figures and are then evaluated to turn them into useful knowledge.
- A branch of mathematics that deals with collecting, analyzing and interpreting data.
- A branch of mathematics that transforms data into useful information.

Types of Statistics

Descriptive Statistics
- Collecting, summarizing and describing data.
- Deals with the collection and presentation of data and the collection of summarizing values to describe its group characteristics.
- The most common summarizing values are the measures of central tendency and variation, such as the mean, median and mode.

Inferential Statistics
- Drawing conclusions and/or making decisions concerning a population based on the sample data.
- Deals with predictions and inferences based on the analysis and interpretation of the results of the information gathered by the statistician.
- Some of the common statistical tools of inferential statistics are the t-test, z-test, correlation and ANOVA.

TWO MAJOR GROUPINGS IN TERMS OF VARIABLES / DATA

What is a Variable?
- A characteristic or attribute associated with the population being studied.
- Variables are further classified as categorical (qualitative) or numerical (quantitative).
Quantitative Data (numerical)
- Data that can be measured with numbers, such as distance, duration, length and speed.

Types of Quantitative Data
1. Discrete variables
   - Values obtained by counting.
   - Whole numbers only, no decimals.
   - Ex. Number of students
2. Continuous variables
   - Values obtained by measuring. They cannot all be put into a list because they can take any value in some interval of real numbers.
   - Ex. Length, height, weight in kg, average speed

Qualitative Data (categorical)
- Non-numerical data that is usually textual and descriptive, like "mostly satisfied", "brown eyes", "female", "yes/no".

Scales of Measurement
- Subdivided into four categories. When drawing inferences from a random sample, the type of measurement scale must be carefully chosen.

Qualitative Data

A. Nominal
- Classifies elements into two or more categories or classes. The numbers indicate that the elements are different, but not according to order or magnitude.
- Ex. Color, Gender, Country, Religion

B. Ordinal
- A scale that ranks individuals in terms of the degree to which they possess a characteristic of interest.
- Ex. Agree and disagree, Military Ranks, Top 1-10, Educational attainment, Miss Universe Rank

Quantitative Data

C. Interval
- In addition to ordering scores from high to low, it establishes a uniform unit in the scale so that any equal distance between two scores is of equal magnitude.
- There is no absolute zero in this scale.
- Ex. Temperature, IQ, Likert scale

D. Ratio
- In addition to being an interval scale, it also has an absolute zero.
- Ex. Weight, Length, Height, Rates, Speed of a car

Population and Sample

Population
- Defined as groups of people, animals, places, things or ideas to which any conclusions based on characteristics of a sample will be applied.
- The measurable quality is called a PARAMETER.
- The population is a complete set.
- Reports are a true representation of opinion.
- It contains all members of a specified group.

Sample
- A subgroup of the population.
- The measurable quality is called a STATISTIC.
- The sample is a subset of the population.
- Reports have a margin of error and confidence interval.
- It is a subset that represents the entire population.

Parameter
- A numerical measure that describes a characteristic of the population.
- Symbols: $\mu$ ("mu"), $\sigma$ ("sigma").

Statistic
- A numerical measure that describes a characteristic of the sample.

Measures of Central Tendency

The purpose of a measure of central tendency (also called a measure of centre or central location) is to describe a whole set of data with a single value that represents the middle or centre of its distribution. In other words, it is a way to describe the center of a data set.
- Mean (arithmetic mean or average)
- Median
- Mode

It lets us know what is normal or "average" for a set of data. It allows the comparison of one data set to another, as well as one piece of data to the entire data set. It also condenses the data set down to one representative value, which is useful when working with large amounts of data.

Mean ($\bar{x}$)
- The sum of all the values in the observation or dataset divided by the total number of observations.
- Most stable measure.
- Also known as the arithmetic average.
- Used for both continuous and discrete numeric data. It cannot be computed for categorical data, as the values cannot be summed.
- Includes every value in the distribution, so the mean is influenced by outliers (numbers that are much higher or much lower than the rest of the data set) and skewed (asymmetric) distributions.
- Applicable to ratio and interval data.

Formula:

$$\bar{x} = \frac{\sum x}{n}$$

where:
$\bar{x}$ = mean
$\sum$ = summation
$x$ = data
$n$ = total number of data

Median ($\tilde{x}$)
- Value that divides ranked data points into halves.
- Midpoint.
- Considered the physical middle point of a distribution because it is located at the center position when the values are arranged in ascending or descending order, which in turn divides the distribution in half (50% of observations lie on either side of the median value).
- If a distribution has an odd number of observations, the median value is the middle value.
- If it has an even number, the median value is the mean or average of the two middle values.
- Usually the preferred measure of central tendency when the distribution is not symmetrical, because it is less affected by outliers and skewed data than the mean.
- Cannot be identified for categorical nominal data, as it cannot be logically ordered.
- Widely used for ordinal types of information.
- Arrange the given data in ascending or descending order, then find the value in the center.

Mode ($\hat{x}$)
- The mode can be found for both numerical and categorical (non-numerical) data. It is the most commonly occurring value in a distribution.
- There can be more than one mode for the same distribution of data (unimodal, bimodal, trimodal or multimodal), which limits the ability of the mode to describe the center of the distribution.
- In some cases the distribution may have no mode at all (i.e. if all values are different). In such a case, it may be better to use the median or mean, or to group the data into appropriate intervals and find the modal class.
- Simply find the most recurring value.

EXAMPLE: Solve for the mean, median and mode of the following data set.

1. 24, 33, 18, 40, 29, 37, 19, 25, 32, 39, 44, 40, 35

SOLUTION (n = 13):

Step 1: Arrange the data first, in either ascending or descending order.

18, 19, 24, 25, 29, 32, 33, 35, 37, 39, 40, 40, 44

Mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{18 + 19 + 24 + 25 + 29 + 32 + 33 + 35 + 37 + 39 + 40 + 40 + 44}{13} = \frac{415}{13} \approx 31.92$$

Median: $\tilde{x} = 33$ (the 7th of the 13 arranged values).
Note: if there are two middle values, add them and then divide by 2.

Mode: $\hat{x} = 40$ (unimodal).
Note: the mode is whichever number is repeated. If one value repeats, the data set is unimodal; if 2, bimodal; if 3, trimodal; if 4 or more, multimodal.

Measures of Relative Position

Measures of relative position are conversions of values, usually standardized test scores, to show where a given value stands in relation to other values of the same grouping.

3 types of QUANTILES: Quartiles; Deciles; Percentiles.

Quartile
$Q_k$ = indicated quartile
k = quartile location (1, 2, 3)
n = number of data values or observations

Decile
$D_k$ = indicated decile
k = decile location (1, 2, ..., 9)
n = number of data values or observations

Percentile
$P_k$ = indicated percentile
k = percentile location (1, 2, ..., 99)
n = number of data values or observations

Note: $Q_2 = D_5 = P_{50}$ = MEDIAN

Example: Given the following data: 87, 95, 77, 82, 90, 89, 78, 85 and 90. Find each of the following.
a. 1st Quartile
b. Second Quartile
c. Fifth Decile
d. Seventh Decile
e. Fifty-First Percentile

Solution: Arrange the data first (n = 9).

77, 78, 82, 85, 87, 89, 90, 90, 95

The 2nd quartile and the 5th decile are both equal to the median, 87. [The rest of the worked solution is not legible in the transcript.]

Normal Distribution

4. [Question not legible in the transcript.] 50% - Area: 0.5 - 0.4236 = 0.0764; 0.0764 x 100 = 7.64%

5. Area to the left of z = 1.28
Area = 0.3997
50% + Area: 0.5 + 0.3997 = 0.8997; 0.8997 x 100 = 89.97%
Note: here the 50% and the area are added because the shaded region is large. Look at the graph: the shaded area covers more than half of the curve, so the percentage is large.

6. Area between z = -2.5 and z = 1.45
Area for z = -2.5: 0.4938
Area for z = 1.45: 0.4265
Add: 0.4938 + 0.4265 = 0.9203; 0.9203 x 100 = 92.03%

Example: There are 20 students who took an entrance exam at a certain university. The average score of the students is 80 and the standard deviation is 10.

A. Find the percentage of students whose scores are:
1. From 50 to 80
2. From 80 to 95
3. Below 75

Given: $\mu = 80$, $\sigma = 10$

$$z = \frac{x - \mu}{\sigma}$$

1. From 50 to 80
z = (50 - 80) / 10 = -3
Area = 0.4987; 0.4987 x 100 = 49.87%

2. From 80 to 95
z = (80 - 80) / 10 = 0, Area = 0
z = (95 - 80) / 10 = 1.5, Area = 0.4332
0.4332 x 100 = 43.32%

3. Below 75
z = (75 - 80) / 10 = -0.5
Area = 0.1915
Since there is only one z-value, apply the 50%: 0.5 - 0.1915 = 0.3085; 0.3085 x 100 = 30.85%

Correlation and Regression Analysis

Correlation Analysis
- A statistical tool used to determine the degree to which two variables (x and y) are related.
- Its goal is to see the strength and the nature/direction of the relationship between two variables.
- Correlation exists between two variables when one of them is related to the other in some way.
- The degree of relationship or association (co-variation) between two variables.
- Scatterplot diagram: a graphical representation of the strength or degree of association between two variables. It applies the rectangular coordinate system to locate the coordinates of the two variables being investigated.

[Figure: scatterplots illustrating no correlation (r = 0) and perfect correlation (r = 1 and r = -1)]

Pearson Product-Moment Correlation Coefficient (PEARSON R)
- The simple correlation coefficient (r), also known as Pearson's r or the product-moment correlation coefficient, measures the strength of the relationship between variables.
- A technique commonly used to determine the relationship between two sets of data. It is applicable when the data to be compared are measured on an interval or ratio scale.
- $-1 \le r \le 1$
- A parametric tool that can be used to determine the relationship or association between two variables. First derived by the British statistician Karl Pearson.
- Pearson's correlation coefficient r indicates how far all the data points are from the line of best fit.

Example:

[Table: Day, Temperature (X), No. of sales (Y), XY, X², Y² for n = 7 days. The individual rows are garbled in the transcript; the column totals are $\sum x = 225$, $\sum y = 799$, $\sum xy = 25753$, $\sum x^2 = 7261$, $\sum y^2 = 91383$.]

SOLUTION

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{7(25753) - (225)(799)}{\sqrt{\left[7(7261) - (225)^2\right]\left[7(91383) - (799)^2\right]}} = \frac{496}{\sqrt{(202)(1280)}} \approx 0.98$$

Hypothesis Testing

Hypothesis
- A statement of belief used in the evaluation of population values.
- Hypothesis testing is a decision-making process for evaluating claims about a population.

2 Types of Hypothesis:
1. Null hypothesis (Ho): a statistical hypothesis that assumes the observation is due to a chance factor. Ho: $\mu_1 = \mu_2$, which shows that there is no difference between the two population means (or parameters).
2. Alternative hypothesis (Ha): the opposite of the null hypothesis. It shows that the observation is the result of a real effect. It states that there is a difference between the two population means (or parameters).

Example (continuation of the first example):

STEP 1:
Ho: there is no significant relationship between temperature and sales.
Ha: there is a significant relationship between temperature and sales.

STEP 2: Level of significance $\alpha = 0.05$ (always given as 0.05).

Degrees of freedom: df = n - 2 = 7 - 2 = 5
Critical value (from the table of critical values of r, df = 5, $\alpha = 0.05$): 0.754
Decision rule: if the computed r is less than the critical value, do not reject Ho; if it is greater than or equal to the critical value, reject Ho.
Computed value from the previous example: r = 0.98

Therefore, we reject the null hypothesis. Thus, there is a significant relationship between temperature and sales.

Spearman Rank
- When the entries in a set of data are ranks, Spearman's rank correlation coefficient $\rho$ (also known as Spearman's rho) is used in hypothesis testing.

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d = difference between the two ranks of each entry and n = number of paired ranks.

Example: Ten instructors were rated by third- and fourth-year students on their "mastery of the subject matter" and the results were tabulated. What is the Spearman rho value for the data?

[The ranking table and the worked solution are not legible in the transcript.]

Regression Analysis
- Regression analysis is a statistical tool concerned with predicting the value of one variable from the known values of another.
- Uses a variable (x) to predict some outcome variable (y).
- Tells you how the values of y change as a function of changes in the values of x.
- Correlation describes the strength of a linear relationship between two variables.
- Linear means "straight line".
- Regression tells us how to draw the straight line described by the correlation.
- Calculates the "best-fit" line for a certain set of data.
- Regression describes the regression line mathematically, so its equation uses an intercept and a slope.

Example:

[Table: Year (x) and No. of enrolment (y). The individual rows are garbled in the transcript.]

SOLUTION

Get a and b:

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \qquad b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

With a = 679.7 and b = 196.9, the predicted enrolment for year 5 is:

$$y = a + bx = 679.7 + (196.9)(5) = 1664.2$$
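
The mean, median and mode example in the Measures of Central Tendency section can be checked with a few lines of Python. This is only a sketch using the standard library `statistics` module; the variable names are illustrative and not part of the reviewer.

```python
from statistics import mean, median, multimode

# Data set from the worked example (13 values).
scores = [24, 33, 18, 40, 29, 37, 19, 25, 32, 39, 44, 40, 35]

sample_mean = mean(scores)        # sum of the values divided by n
sample_median = median(scores)    # middle value of the sorted data
sample_modes = multimode(scores)  # list of the most frequent value(s)

print(round(sample_mean, 2))  # 31.92
print(sample_median)          # 33
print(sample_modes)           # [40]
```

`multimode` returns a list, so a bimodal or trimodal data set would simply show two or three values instead of one.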
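
For the quartile, decile and percentile example, the position formulas are not legible in the transcript. The sketch below assumes one commonly taught convention for ungrouped data, position = k(n + 1) / parts, with linear interpolation between neighbouring values. Both the convention and the function name `quantile` are assumptions, so the answers may differ slightly from those obtained with another textbook rule.

```python
def quantile(data, k, parts):
    """Return the k-th quantile when the data is split into `parts` equal parts.

    Assumption: position = k(n + 1) / parts, with linear interpolation.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n + 1) / parts   # 1-based position in the sorted data
    lower = int(position)            # whole part of the position
    fraction = position - lower      # how far to move toward the next value
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


scores = [87, 95, 77, 82, 90, 89, 78, 85, 90]

q1 = quantile(scores, 1, 4)      # 1st quartile
q2 = quantile(scores, 2, 4)      # 2nd quartile
d5 = quantile(scores, 5, 10)     # 5th decile
d7 = quantile(scores, 7, 10)     # 7th decile
p51 = quantile(scores, 51, 100)  # 51st percentile

print(q1, q2, d5, d7, round(p51, 2))
```

Under this convention Q1 = 80, Q2 = D5 = 87 (the median, as the note Q2 = D5 = P50 = median requires), D7 = 90 and P51 is about 87.2.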
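
The entrance exam example (mean 80, standard deviation 10) reads its areas from a z-table. As a cross-check, the same percentages can be computed with `statistics.NormalDist` from the Python standard library. The helper name `percent_between` is made up for this sketch.

```python
from statistics import NormalDist

# Normal model of the exam scores from the example: mean 80, standard deviation 10.
exam = NormalDist(mu=80, sigma=10)


def percent_between(dist, low, high):
    """Percentage of the distribution that lies between low and high."""
    return (dist.cdf(high) - dist.cdf(low)) * 100


from_50_to_80 = percent_between(exam, 50, 80)
from_80_to_95 = percent_between(exam, 80, 95)
below_75 = exam.cdf(75) * 100

print(round(from_50_to_80, 2))  # 49.87
print(round(from_80_to_95, 2))  # 43.32
print(round(below_75, 2))       # 30.85
```

The cumulative function `cdf` gives the area to the left of a score directly, so the "50% plus area" and "50% minus area" steps of the table method are handled automatically.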
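
The Pearson r example and its hypothesis test can be sketched as follows. The table rows are garbled in the transcript, so the (temperature, sales) pairs below are a reconstruction that matches the totals used in the solution (sum of x = 225, sum of y = 799, sum of xy = 25753); the order of the days is an assumption. The critical value 0.754 is taken from a standard table of critical values of r for df = 5 and a 0.05 level of significance.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation using the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Reconstructed pairs (assumption: row order is not recoverable from the source).
temperature = [36, 34, 32, 32, 31, 30, 30]
sales = [123, 120, 115, 112, 111, 110, 108]

r = pearson_r(temperature, sales)
critical_r = 0.754  # df = n - 2 = 5, alpha = 0.05 (standard table value)
reject_null = abs(r) >= critical_r

print(round(r, 2))   # 0.98
print(reject_null)   # True
```

Because the computed r is larger than the critical value, the sketch reaches the same decision as the worked example: reject the null hypothesis.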
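
The ranking table for the Spearman example is not recoverable from the transcript, so the sketch below uses made-up ranks for ten instructors purely to show how the formula rho = 1 - 6(sum of d squared) / (n(n squared - 1)) is applied. The rank lists and the function name are hypothetical, and this form of the formula assumes there are no tied ranks.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho from two lists of ranks (assumes no tied ranks)."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical ranks, NOT the data from the reviewer's example.
third_year_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fourth_year_ranks = [2, 1, 4, 3, 5, 7, 6, 8, 10, 9]

rho = spearman_rho(third_year_ranks, fourth_year_ranks)
print(round(rho, 4))
```

For these made-up ranks the sum of squared differences is 8, so rho = 1 - 48/990, a strong positive agreement between the two groups of raters.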
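
The intercept and slope formulas from the regression solution can be written as a small function. The enrolment table is not legible in the transcript, so the fit below runs on a tiny made-up data set; only the final prediction reuses the values a = 679.7 and b = 196.9 that appear in the worked solution. The function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


def predict(a, b, x):
    """Value of the regression line y = a + b*x at x."""
    return a + b * x


# Hypothetical data lying exactly on y = 1 + 2x, used only to exercise the formulas.
a_demo, b_demo = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a_demo, b_demo)  # 1.0 2.0

# Prediction from the worked solution: a = 679.7, b = 196.9, year x = 5.
enrolment_year_5 = predict(679.7, 196.9, 5)
print(round(enrolment_year_5, 1))  # 1664.2
```

Keeping the fit and the prediction in separate functions mirrors the two steps of the worked solution: first get a and b, then substitute x into y = a + bx.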