🎧 New: AI-Generated Podcasts Turn your study notes into engaging audio conversations. Learn more

Univariate Analyses to Z scores PART 1.pdf

Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...

Full Transcript

MARKETING DATA COLLECTION & ANALYSIS Renaud RenaudLunardo, Lunardo,PhD, PhD,HDR HDR...

MARKETING DATA COLLECTION & ANALYSIS Renaud RenaudLunardo, Lunardo,PhD, PhD,HDR HDR Office: Office:1421 1436 Mail: Mail:[email protected] [email protected] TWO KINDS OF STATISTICS DESCRIPTIVE statistics INFERENTIAL statistics To SUMMARIZE / SIMPLIFY sample’s Are used to MAKE characteristics GENERALIZATIONS (when the information is about the population based available among the whole on the sample’s population characteristics ex: Loyalty cards) SYMBOLS Symbols for Population and Sample Statistics VARIABLE POPULATION SAMPLE Mean μ X Proportion π p Variance σ² s² Standard Deviation σ s Size N n Standard Error of the Mean x Sx Standard Error of the Proportion p Sp X − X −X Standardized Variate (z)  S Coefficient of variation  S  X = TYPES OF DATA CATEGORICAL NUMERICAL Types of data which Data that is measurable, may be divided into can be ordered in either groups ascending or descending order, and the distance between each value is meaningful Ex: Grape variety, Ex: Robert Parker rating, social class, gender, scales, ratios (such as … price, time, height, weight, amount,... ) TYPES OF DATA CATEGORICAL NUMERICAL Types of data which Data that is measurable, may be divided into can be ordered in either QUALITATIVE groups QUANTITATIVE ascending or descending order, and the distance VARIABLE VARIABLE between each value is meaningful Ex: Grape variety, Ex: Robert Parker rating, social class, gender, scales, ratios (such as … price, time, height, weight, amount,... ) EXAMPLES Which Champagne brand do you On a scale from 1 (not like it at prefer: all) to 10 (very much like it), ❑ Veuve Cliquot please rate your evaluation of the ❑ Perrier Jouet following wine: ❑ Taittinger ❑ Pommery 1 ------------------------------ 10 ❑ Other QUALITATIVE QUANTITATIVE VARIABLE VARIABLE VARIABLES NOMINAL variable : a list of categories to which objects can be classified → qualitative rather than quantitative. Religious preference, race, and sex are all examples of such variables. ORDINAL variable : scale that assigns values to objects based on their ranking with respect to one another. → measurements with ordinal scales are ordered in the sense that higher numbers represent higher values, but the intervals between the numbers are not necessarily equal. RATIOS (Mass, length, or time) and Likert/Osgood scales (from 1 to 10 for example) are examples of scales. Univariate analyses 1/ CENTRAL TENDENCY ANALYZING CATEGORICAL DATA The frequency is the number of times the Example: category appears in the data set (n times). The relative frequency is the proportion of « 321 customers (58%) the time that the category appears in the prefer Wine A over data set (% of times). Wine B » ANALYZING NUMERICAL DATA The SAMPLE MEAN of a numerical sample x1, x2, …, xn denoted by x bar represents the average, or the norm. It is calculated using: sum of all observations in the sample x1 +... + xn  x x= = = number of observations in the sample n n The population mean, denoted by μ is the average of all x values in the entire population. The MEDIAN is the middle value which divides the sample (or population) into two equal parts. Graphically, the median is the value on the measurement axis that separates the histogram into two parts with 50% of the area under each part of the curve. QUESTION Q: Why (sometimes) use the median instead of the mean? A: For one very good reason. The median is insensitive to extreme scores, whereas the mean is not. Sometimes, there are just too many extreme scores that would skew, or significantly distort, what is actually a central point in the set or distribution of scores. DISTRIBUTIONS OF VARIABLES One characterization relates to the number of peaks, or MODES, the mode being the most frequent value. It is said to be unimodal if it has a single peak, bimodal if it has two peaks, and multimodal if it has more than two peaks. WHEN DISTRIBUTION MATTERS When the histogram is SYMMETRIC, the When the histogram has a LONGER MEAN and the MEDIAN are EQUAL. LOWER (OR UPPER) TAIL, the few outlying values in the tail pull the MEAN down (or up), so it generally lies BELOW THE MEDIAN. OUTLIERS OUTLIERS / EXTREME VALUES An observation that is substantially different from the other observations on one or more characteristics. At issue is its representativeness of the population. Problematic outliers are not representative of the population, are counter to the objectives of the analysis, and can seriously distort statistical tests. 2/ VARIABILITY THE NOTION OF VARIANCE Measures how far a set of numbers is spread out. → A small VARIANCE indicates that the data tends to be very close to the mean and hence to each other (see red curve) → A high VARIANCE indicates that the data is very spread out around the mean and from each other (see blue curve). THE NOTION OF VARIANCE The n deviations from the sample mean are the differences : (x1 − x ),(x2 − x ),...,(xn − x ) with  (x-x ) = 0 A customary way to prevent negative and positive deviations from counteracting one another is to square them before combining: s² =  ( x − x)² 1 =  ( x − x)² n n STANDARD DEVIATION The STANDARD DEVIATION of a variable (σx) describes VARIABILITY in the distribution (how much the curve spreads out around the mean). → When small, observed values of x tend to be close to the mean (little variability). → When large, there is more variability. In sum, it’s the AVERAGE DISTANCE FROM THE MEAN. To find σx, you have to calculate the square root of the variance: 1 x = n  ( xi − x)² 19 Z SCORES PERCEIVED QUALITY OF THE WINE WTP 1 10 1 5 1 36,75 2 7 2 51,45 3 8 3 58,8 4 4 4 29,4 5 2 5 14,7 6 10 6 73,5 7 4 7 29,4 8 6 8 44,1 9 7 9 51,45 10 7 10 51,45 Z SCORES We often use a z score to compute probabilities : Tells us « how many standard deviations x− we are away from z= the mean »  A z score can be interpreted as giving the distance of an x value from the mean in units of the standard deviation. Example: a score of 1.4 corresponds to an x value that is 1.4 standard deviations above the mean, and a z score of -2.1 corresponds to an x value that is 2.1 standard deviations below the mean. THE TABLE OF THE NORMAL DISTRIBUTION

Use Quizgecko on...
Browser
Browser