M9 - Ch01 Statistics Fundamentals PDF
Hong Kong Metropolitan University, Lee Shau Kee School of Business and Administration
Andrew Yam
Summary
This document is an introduction to statistical fundamentals, covering topics such as the definition of statistics, population, sample, parameters, and statistics. It also differentiates between descriptive and inferential statistics, with examples using weight of chip box and iPhone users. The document further discusses different sampling methods and types of data (quantitative and qualitative).
## BUS 2000BEF Integrated Business Foundation

### Module 9

### Decision Making Skills

### Statistics Fundamentals

##### Hong Kong Metropolitan University

##### Lee Shau Kee School of Business and Administration

#### DM Module - Statistics Fundamentals | Andrew Yam

## Topics

* Statistics (An Introduction)
* Collection of Data and Sample Design
* Descriptive Statistics

## Statistics (An Introduction)

### Objectives

* Define statistics
* Define population, sample, parameters and statistics
* Distinguish between descriptive and inferential statistics

## What is Statistics?

* Statistics is the study of the collection, organization, presentation and interpretation (analysis) of data
* As a whole, we collect data, then process and analyze them to gain a deeper understanding of a given context and so make informed decisions
* Business statistics: data analysis that aids business decision making

## Some Key Terms in Statistics

### Population

* All items or individuals of interest in a survey
* A descriptive measure of a characteristic of a population is called a parameter
* Commonly used population parameters: population mean $(\mu)$ and population proportion $(p)$

### Sample

* A set of data drawn from the population in order to estimate the parameter
* A descriptive measure of the sample is called a statistic
* Commonly used sample statistics: sample mean $(\bar{x})$ and sample proportion $(\hat{p})$

## Example: Weight of Chip Box

* A chocolate manufacturer produces chocolate chips that are packed in boxes. The net weight printed on each box is 300 g. The supervisor of the company wishes to ensure that this specification does not violate the Trade Descriptions Ordinance. She randomly selects 450 boxes and weighs each of them; their mean weight is found to be 305 g.
* Population: all the boxes of chocolate chips produced
* Parameter of concern: the mean weight of all the boxes of chocolate chips produced $(\mu)$
* Sample: the 450 boxes of chocolate chips randomly selected
* Statistic to be studied: the sample mean weight $(\bar{x} = 305\text{ g})$

## Example: iPhone Users

* The marketing manager of a phone case manufacturer wishes to find out whether the iPhone is the most popular choice among smartphone owners. He randomly selects 500 smartphone owners and finds that 325 of them use an iPhone.
* Population: all smartphone owners
* Parameter of concern: the proportion of iPhone users within the population $(p)$
* Sample: the 500 smartphone owners randomly selected
* Statistic to be studied: the sample proportion of iPhone users, $\hat{p} = 325/500 = 0.65 = 65\%$

## Population Parameter and Sample Statistics

* Sampling draws a sample from the population, whose parameter is unknown
* A statistic is calculated from the sample
* Inference uses the sample statistic to draw conclusions about the population parameter

| Population parameters | Sample statistics |
|---|---|
| Pop. mean $(\mu)$ | Sample mean $(\bar{x})$ |
| Pop. proportion $(p)$ | Sample proportion $(\hat{p})$ |
| Pop. size $(N)$ | Sample size $(n)$ |
| etc. | etc. |

## Activity 1.1

* The production manager of a manufacturer producing badminton shuttlecocks claims that the mean weight of the classic series shuttlecock produced is 5 grams. In five randomly selected tubes (a total of 60 shuttlecocks), the average weight of the shuttlecocks is found to be 4.83 grams.
  * a) Identify the population and the sample.
  * b) Does the value 5 grams refer to a parameter or a statistic?
  * c) Does the value 4.83 grams refer to a parameter or a statistic?

## Statistical Methods

### Descriptive Statistics

* Concerned with summarizing and describing information from the data collected
* Relates to data organization and display

### Inferential Statistics

* Concerned with estimating, predicting and making inferences about a population parameter based on the information reflected in a sample statistic
* Relates to generalizing from the data to a wider scope

## Descriptive Statistics

### Involves

* Collecting data
* Summarizing data
* Presenting data

### Purpose

* To organise data and dig out useful but often hidden information

## Inferential Statistics

### Involves

* Estimation, confidence intervals
* Hypothesis testing
* Performing statistical inference using the above methods

### Purpose

* To understand the characteristic of interest in a population, which is represented by a specific value (the parameter), by analyzing the corresponding value (the statistic) obtained from the sample collected

## Statistical Inference

* By following an appropriate randomization process, the sample obtained is considered to be representative of (similar to) the population
* The sample mean $\bar{x}$ will then be a good estimate of the population mean $\mu$
* That is, we use $\bar{x}$ to 'guess' or 'infer' what $\mu$ is
* The whole process of drawing conclusions about a characteristic (a parameter) of the population based on relevant sampled information (statistics) is called statistical inference
* Use $\bar{x}$ to estimate $\mu$.

## Statistical Inference (cont'd)

* On other occasions, we are interested in another population parameter, the population proportion $p$
* In the iPhone users example, the survey results would be interpreted to mean that the population proportion of iPhone users is about 65%. This is again statistical inference (inferential statistics).
* Use $\hat{p}$ to estimate $p$.

## Activity 1.2

* Identify each of the following studies as relating to either descriptive statistics or inferential statistics
  * a) Compute the average diameter of a sample of 25 screws produced by a production process
  * b) Compute the average diameter of a sample of 25 screws produced by a production process to deduce whether the average diameter of all the screws produced by the process is up to standard.
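As a minimal sketch of the point estimate used in the iPhone users example earlier in these notes, the Python below computes a sample proportion from the survey counts given there. The function name `sample_proportion` is hypothetical and chosen only for illustration; the counts 325 and 500 come from the example.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return the sample proportion p-hat = successes / n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return successes / n


# iPhone example from the notes: 325 of the 500 sampled owners use an iPhone.
p_hat = sample_proportion(325, 500)

# p_hat is a statistic (it describes the sample); it serves as an
# estimate of the unknown population proportion p.
print(f"p-hat = {p_hat:.2f}")
```

The value printed describes only the 500 sampled owners. Treating it as an estimate of the proportion among all smartphone owners is the inferential step.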
## Collection of Data and Sample Design

### Why collect data?

* To obtain input to a research study
* To measure performance or quality
* To assist in formulating decision alternatives
* There are many more reasons besides these

## Sampling Design

### Why Sample?

1. Destructive nature of some tests
   * In some processes of quality control
2. Adequate accuracy and reliability
   * If the sample is a fair representation of the population
3. Pragmatic reasons
   * Time, budget, manpower, etc.
   * Feasibility

## Sampling Frame

* Some sampling methods require that each item or member of the population under consideration is known and identifiable. A device or list that supports this identification is called a sampling frame.
  * Examples: sales records, personnel records
* However, some populations may have no sampling frame
  * Examples: a department store's customers, owners of the iPhone XS once it is released

## Types of Samples

| Non-Probability Samples | Probability Samples |
|---|---|
| Judgment | Simple Random |
| Quota | Systematic |
| Chunk | Stratified |
| | Cluster |

## Probability Sampling

* Uses some form of random selection so that each observation or subject in the population has a known chance of being drawn (an equal chance in simple random sampling)

### Common random sampling methods:

* Simple random sampling
* Stratified random sampling
* Systematic sampling
* Cluster sampling

## Sources of Data

| Primary | Secondary |
|---|---|
| Experiment | Published data |
| Survey | |
| Observation | |

## Sources of Data (cont'd)

### Primary data

* First-hand data collected by the researcher or the organization itself
* Usually obtained through surveys or by observation

### Secondary data

* Data collected and published by others
* Sourced from the government or other parties or organizations

## Types of Data

| Quantitative | Qualitative |
|---|---|
| Numerical data that can be counted or measured | Categorical data that can be observed but not measured |
| Discrete (examples: number of people, number of defects) | Nominal (examples: family name, sex) |
| Continuous (examples: elapsed time, weight, height) | Ordinal (examples: rating, income level) |

## Quantitative Data

### Discrete quantitative data

* Data usually result from counting
* For example, the number of LEGO figures owned
* Possible values: 0, 1, 2, 3, ...

### Continuous quantitative data

* Data usually result from measuring
* For example, the lap time of a racing car
* Possible values: 1:06.088, 1:05.098, ...

Both types of data can be used for arithmetic computation.

## Qualitative Data

### Nominal

* Only for classification of observations; no particular order can be identified
* For example, colour (red, green, blue and yellow)
* Cannot be used for arithmetic calculation

## Qualitative Data (cont'd)

### Ordinal (Ranked)

* For classification of observations whose categories have a meaningful order
* The ordering allows the categories to be ranked, so limited calculation on the ranks is possible
* For example, strongly agree, agree, neutral, disagree, strongly disagree (Likert scale)

## Types of Data (cont'd)

### Time Series Data

* Data values that are recorded at 'regular' successive intervals of time
* Example: tracking the price of a particular stock every minute within a three-hour interval

### Cross-sectional Data

* Data values that are measured at the same or approximately the same point in time
* Example: the closing prices of 20 randomly selected stocks on a particular trading day

## Example: Real-time Stock Quote

| Name/Symbol | SHHK/AH | Last |
|---|---|---|
| HK & CHINA GAS | | 15.740 |
| HSBC HOLDINGS | | 68.150 |
| C SUCCESS FIN | | 4.980 |
| DATANG RENEW | | 1.160 |
| CLP HOLDINGS | | 66.700 |
| CHINA MOBILE | | 99.550 |
| CKH HOLDINGS | | 112.700 |

## Descriptive Statistics

* Presenting data
* Summarizing data

## Descriptive Statistics (cont'd)

### Why do we need descriptive statistics?

| Quarterly expense on in-app purchase of a mobile game (in $100) | |
|---|---|
| 42.19 | 8.41 |
| 38.45 | 70.48 |
| 29.23 | 92.88 |
| 89.35 | 3.2 |
| 118.04 | 75.71 |
| 110.46 | 6.93 |
| 0 | 10.05 |
| 72.88 | 99.03 |
| 83.05 | 29.24 |
| 95.73 | 15.21 |
| 103.15 | 11.27 |
| 94.52 | 72.02 |
| 26.84 | 7.74 |
| 93.93 | 5.04 |
| 90.26 | 33.4 |
| 72.78 | 9.44 |
| 101.36 | 2.67 |
| 104.8 | 4.69 |
| 74.01 | 41.38 |
| 56.01 | 45.77 |
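To illustrate why a raw table like the one above is hard to read, the following Python sketch condenses the 40 listed values into a few summary figures. It is only an illustration: the variable names are made up here, and the entry that appears as "110. 46" in the table is assumed to be 110.46.

```python
from statistics import mean

# The 40 quarterly expense figures from the table above (in $100).
# Assumption: the entry printed as "110. 46" is read as 110.46.
expenses = [
    42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0, 72.88, 83.05, 95.73,
    103.15, 94.52, 26.84, 93.93, 90.26, 72.78, 101.36, 104.8, 74.01, 56.01,
    8.41, 70.48, 92.88, 3.2, 75.71, 6.93, 10.05, 99.03, 29.24, 15.21,
    11.27, 72.02, 7.74, 5.04, 33.4, 9.44, 2.67, 4.69, 41.38, 45.77,
]

n = len(expenses)            # number of observations
smallest = min(expenses)     # smallest observation
largest = max(expenses)      # largest observation
average = mean(expenses)     # arithmetic mean of the listed values

print(f"n = {n}")
print(f"smallest = {smallest}, largest = {largest}")
print(f"range = {largest - smallest:.2f}")
print(f"mean = {average:.2f}")
```

Four numbers already say more about the spending pattern than scanning forty cells, which is the motivation for the graphical and numerical methods that follow.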
## Descriptive Statistics (cont'd)

### Descriptive statistics involves

* Arrangement of data
* Summarization of data
* Presentation of data
* Enabling meaningful interpretation so as to provide information for decision making

### Descriptive statistical methods

* Graphical techniques
* Numerical descriptive measures
* The methods presented can be applied to either the entire population or a sample

## Presenting Qualitative Data

* Frequency distribution
* Bar charts and pie charts
* Line chart

## Example: Soft Drink Sales

* A random sample of 20 soft drink purchases made in a convenience store is given as follows:

Classic Coke, Lemon Tea, Distilled Water, Classic Coke, Light Coke, Classic Coke, Light Coke, Classic Coke, Distilled Water, Classic Coke, Distilled Water, Light Coke, Classic Coke, Classic Coke, Distilled Water, Classic Coke, Lemon Tea, Distilled Water, Light Coke, Classic Coke

## Frequency Distribution

* A tabular summary of data showing the frequency of items in each of the categories or classes
* Relative frequency and percentage frequency distributions

| Soft drink | Frequency | Relative frequency | Percentage frequency |
|---|---|---|---|
| Classic Coke | 9 | 9/20 = 0.45 | 9/20 x 100% = 45% |
| Lemon Tea | 2 | 2/20 = 0.10 | 2/20 x 100% = 10% |
| Distilled Water | 5 | 5/20 = 0.25 | 5/20 x 100% = 25% |
| Light Coke | 4 | 4/20 = 0.20 | 4/20 x 100% = 20% |
| Total | 20 | 1.00 | 100% |

## Bar Chart and Pie Chart

### Bar chart

* Depicts each category of data as a rectangular bar summarizing the frequency, relative frequency or percentage frequency distribution

### Pie chart

* Presents the relative frequency or percentage frequency distribution of a data set

## Line Chart

* Plot points to represent the frequencies of the categories involved
* Join the points with straight lines
* If the x-axis is associated with time, the chart is known as a time-series chart

## Presenting Quantitative Data

* Raw data and grouped data
* Frequency distribution
* Histograms
* Cumulative distributions (ogives)
* Scatter diagrams

## Raw Data and Grouped Data

* Raw data refers to data collected before any further processing is performed
* When processing a large number of observations, it becomes necessary to condense the data into appropriate class groupings. Data processed in this way are called grouped data
* Grouping data can reveal the shape of the data
* Grouped data are thus used more often, for simpler treatment and easier interpretation

## Tabulating Numerical Data: Frequency Distribution

1. Determine the number of classes
   * Rule of thumb: a frequency distribution should have at least 5 classes but no more than 15
2. Determine the width of the classes
   * Approx. class width = (largest data value - smallest data value) / number of classes
   * Class limits: lower and upper limit, non-overlapping
3. Determine the class frequency
   * Count the number of values that fall within each of the classes

| # of observations | # of classes |
|---|---|
| Less than 50 | 5-7 |
| 50 - 200 | 7-9 |
| 200 - 500 | 9-10 |
| 500 - 1,000 | 10-11 |

## Tabulating Numerical Data: Frequency Distribution (cont'd)

### Collect data

| Bill |
|---|
| 42.19 |
| 38.45 |
| 29.23 |
| 89.35 |
| 118.04 |
| 110.46 |
| 0.00 |
| 72.88 |
| 83.05 |
| ... |

(There are 200 data points)

### Prepare a frequency distribution

1. How many classes to use? Eight
2. Class width?
   * Largest observation: 119.63; smallest observation: 0
   * Range = largest observation - smallest observation
   * Class width = range / number of classes = (119.63 - 0) / 8 = 14.95, rounded up to 15
3. Class frequency?

## Example: Long-distance Telephone Bills

* Frequency distribution of 200 long-distance telephone bills

| Class Limits | Frequency |
|---|---|
| 0 but less than or equal to 15 | 71 |
| 15 but less than or equal to 30 | 37 |
| 30 but less than or equal to 45 | 13 |
| 45 but less than or equal to 60 | 9 |
| 60 but less than or equal to 75 | 10 |
| 75 but less than or equal to 90 | 18 |
| 90 but less than or equal to 105 | 28 |
| 105 but less than or equal to 120 | 14 |
| Total | 200 |

## Histogram

* Similar to a bar chart, but a histogram is used to depict quantitative data
* The widths of the rectangular bars are constructed with reference to the boundaries of each class
* No gaps between bars

## Using Excel to Construct Histogram

| Bill | Frequency |
|---|---|
| 15 | 71 |
| 30 | 37 |
| 45 | 13 |
| 60 | 9 |
| 75 | 10 |
| 90 | 18 |
| 105 | 28 |
| 120 | 14 |
| More | 0 |

## Relative Frequency

* It is sometimes preferable to show the relative frequency (proportion) of observations falling into each class, rather than the frequency itself.
* Class relative frequency = class frequency / total number of observations
* Relative frequencies should be used when comparing histograms from two or more samples that contain different numbers of observations.

## Ogives

* An ogive is also known as a cumulative relative frequency polygon

| Class* | Frequency | Cumulative frequency | Cum. relative frequency |
|---|---|---|---|
| 0-15 | 71 | 71 | 71/200 = .355 |
| 15-30 | 37 | 108 (= 71 + 37) | 108/200 = .540 |
| 30-45 | 13 | 121 (= 108 + 13) | 121/200 = .605 |
| 45-60 | 9 | 130 (= 121 + 9) | 130/200 = .650 |
| 60-75 | 10 | 140 | 140/200 = .700 |
| 75-90 | 18 | 158 | 158/200 = .790 |
| 90-105 | 28 | 186 | 186/200 = .930 |
| 105-120 | 14 | 200 | 200/200 = 1.000 |

\* Classes contain observations greater than their lower limits and less than or equal to their upper limits

## Descriptive Statistics: Summarizing Data

## Measures of Central Tendency

* Measures of central tendency (or central location)

| Arithmetic Mean | Median | Mode |
|---|---|---|
| $\bar{x} = \frac{\sum_{i=1}^n X_i}{n}$ | Midpoint of ranked values | Most frequently observed value |

## Measures of Central Tendency (cont'd)

* A measure of central location is used to pinpoint the center of a set of data values. It is a typical value that represents the whole set of data
* There are three commonly used measures of central location: mean, mode and median
* The mean is the most commonly used

## Summation Notation: Σ

* A symbol representing a sum of terms indexed by a variable $i$ whose value runs from 1 to $n$:
  * $\sum_{i=1}^n X_i = X_1 + X_2 + X_3 + \dots + X_n$
* Examples
  * $\sum_{i=2}^{6} X_i = X_2 + X_3 + X_4 + X_5 + X_6$
  * $\sum_{i=1}^{4} i\,Y_i^2 = Y_1^2 + 2Y_2^2 + 3Y_3^2 + 4Y_4^2$

## Mean

* The mean is the most popular and useful measure of central location
* Mean = sum of the measurements / number of measurements

### Population mean

* $\mu = \frac{\sum_{i=1}^N X_i}{N}$

### Sample mean

* $\bar{x} = \frac{\sum_{i=1}^n X_i}{n}$

## Example: Calculation of Mean

* Mean of a sample of six measurements 7, 3, 9, -2, 4, 6:
  * $\bar{x} = \frac{\sum_{i=1}^6 X_i}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$
* Suppose the telephone bills example represents a population of measurements. The population mean is:
  * $\mu = \frac{\sum_{i=1}^{200} X_i}{200} = \frac{42.19 + 38.45 + \dots + 45.77}{200} = 43.59$

## Median

* The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude
* In an ordered array, the median is the middle number (50% above, 50% below)

## Finding the Median

* Location of the median:
  * Median position = (n + 1)/2 position in the ordered data
  * If the number of values is odd, the median is the middle number.
  * If the number of values is even, the median is the average of the two middle numbers.
* Note that (n + 1)/2 is not the value of the median, only the position of the median in the ranked data.

## Mode

* The mode of a set of measurements is the value that occurs most frequently.
* A set of data may have one mode (or modal class), two or more modes, or no mode.
* Not affected by outliers.

## Quartiles

* Quartiles split the ranked data into 4 segments with an equal number of values per segment.
* The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
* The second quartile, Q2, is the same as the median (50% are smaller, 50% are larger).
* Only 25% of the observations are greater than the third quartile, Q3.
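The three measures of central location described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names are made up here, the inputs are assumed to be non-empty, and `modes` follows the notes' convention that data in which every value occurs only once has no mode. The six measurements are the ones used in the mean example; the second list is invented purely to show a two-mode case.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the measurements / number of measurements."""
    return sum(values) / len(values)


def median(values):
    """Median located at the (n + 1)/2 position of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position, may end in .5
    lower = ordered[int(position) - 1]
    if position == int(position):        # odd n: a single middle value
        return lower
    upper = ordered[int(position)]       # even n: average the two middle values
    return (lower + upper) / 2


def modes(values):
    """All most-frequent values; an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                     # every value occurs once
        return []
    return [value for value, count in counts.items() if count == highest]


measurements = [7, 3, 9, -2, 4, 6]       # sample from the mean example
print(mean(measurements))                # 4.5
print(median(measurements))              # 5.0
print(modes(measurements))               # []

illustrative = [2, 4, 4, 5, 7, 7]        # made-up data with two modes
print(modes(illustrative))               # [4, 7]
```

For the six measurements the ordered data are -2, 3, 4, 6, 7, 9, so the median position is (6 + 1)/2 = 3.5 and the median is the average of the 3rd and 4th values.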
## Finding the Quartiles

* Locations of the quartiles:
  * First quartile position: Q1 = (n + 1)/4
  * Second quartile position: Q2 = (n + 1)/2 (the median position)
  * Third quartile position: Q3 = 3(n + 1)/4
* where n is the number of observed values.

## Quartiles (cont'd)

* Example
  * Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
  * Q1 is in the (9 + 1)/4 = 2.5 position of the ranked data, so Q1 = (12 + 13)/2 = 12.5
  * Q2 is in the (9 + 1)/2 = 5th position of the ranked data, so Q2 = median = 16
  * Q3 is in the 3(9 + 1)/4 = 7.5 position of the ranked data, so Q3 = (18 + 21)/2 = 19.5

## Quartiles (cont'd)

* Example
  * Data: 11 12 13 16 16 17 18 21 22 25 (n = 10)
  * Q1 is in the (10 + 1)/4 = 2.75 position, rounded to the 3rd ranked value, so Q1 = 13
  * Q2 is in the (10 + 1)/2 = 5.5 position of the ranked data, so Q2 = median = (16 + 17)/2 = 16.5
  * Q3 is in the 3(10 + 1)/4 = 8.25 position, rounded to the 8th ranked value, so Q3 = 21

## Measures of Variation (Dispersion)

* Measures of variation give information on the spread or variability of the data values.
* Measures of variation:
  * Range
  * Interquartile Range
  * Variance
  * Standard Deviation
  * Coefficient of Variation

## Need of Measuring Variation

* Consider the heights of the players on the following two basketball teams.
* The two teams have the same mean (75), the same mode (76) and the same median.
* The heights of the players on Team II vary much more than those on Team I
  * $\sigma_I = 2.449$
  * $\sigma_{II} = 5.586$

## Range

* Simplest measure of variation
* Difference between the largest and smallest values of a set of data
* Range = $X_{largest} - X_{smallest}$

## Disadvantages of Range

* Ignores the way in which the data are distributed
* Sensitive to outliers

## Interquartile Range (IQR)

* Eliminates some high- and low-valued observations and calculates the range from the remaining values.
* Especially useful when there are outliers.
* Found by: interquartile range = 3rd quartile - 1st quartile = Q3 - Q1

## Boxplot: Exploratory Data Analysis

* A graphical display of data using the 5-number summary
* Minimum, Q1, Median, Q3, Maximum

## Boxplot (cont'd)

* A boxplot can be drawn either horizontally or vertically
* Outliers can be detected and shown
* Example: Association Football Average Player Salaries (2013)

## Shapes of Distribution

* Several typical shape characteristics
  * Symmetry
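The quartile-position rule, the interquartile range and the 5-number summary covered in the notes above can be sketched together in Python. This is a sketch under stated assumptions: the helper names are hypothetical, the data are assumed to contain at least three values, and the handling of fractional positions mirrors the two worked quartile examples (a position ending in .5 averages the two neighbouring values, any other fraction is rounded to the nearest rank). Statistical software often uses different interpolation conventions, so its quartiles may differ slightly.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position in ordered data."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:                       # exact rank
        return ordered[whole - 1]
    if fraction == 0.5:                     # halfway: average the neighbours
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[round(position) - 1]     # otherwise: nearest rank


def quartiles(values):
    """Return (Q1, Q2, Q3) using the positions k(n + 1)/4 for k = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum, as used for a boxplot."""
    q1, q2, q3 = quartiles(values)
    return (min(values), q1, q2, q3, max(values))


data9 = [11, 12, 13, 16, 16, 17, 18, 21, 22]
data10 = [11, 12, 13, 16, 16, 17, 18, 21, 22, 25]

q1, q2, q3 = quartiles(data9)
iqr = q3 - q1                               # interquartile range = Q3 - Q1

print(quartiles(data9))                     # (12.5, 16, 19.5)
print(quartiles(data10))                    # (13, 16.5, 21)
print(iqr)                                  # 7.0
print(five_number_summary(data9))           # (11, 12.5, 16, 19.5, 22)
```

Both data sets are the ordered arrays from the quartile examples, so the printed quartiles can be checked against the hand calculations there.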