Business Statistics Handouts PDF
University of St. La Salle
Leonares, S. R.
Summary
These handouts provide an introduction to business statistics, focusing on measures of central tendency and variability for qualitative and quantitative data. They cover concepts such as measures, parameters, and statistics, and include several worked examples.
Full Transcript
UNIVERSITY OF ST. LA SALLE
Yu An Log College of Business and Accountancy
BSTAT – BUSINESS STATISTICS
First Semester, AY 2020–2021

HANDOUTS 3
MEASURES OF CENTRAL TENDENCY & VARIABILITY

Recall: Statistics involves a body of techniques and procedures dealing with the collection, organization, analysis, interpretation, and presentation of information that can be stated numerically. Summarizing data involves using statistical tools and procedures appropriate for answering a research problem or objective.

The following terms need to be differentiated:
Measure – a numerical representation of a particular characteristic (variable of the study) of the group being studied
Parameter – a measure calculated from the population; usually represented by letters of the Greek alphabet
Statistic – a measure calculated from the sample; usually represented by letters of the English alphabet

Summaries of QUALITATIVE DATA:
Qualitative data are summarized using the following measures:
o proportions (also called relative frequencies)
o percentages
For example, the variable sex is coded as M = 0, F = 1.
Remark: Since "sex" is a qualitative variable and the codes 0 and 1 represent nominal data, it is not appropriate to treat them as numbers with values. Applying arithmetic operations such as addition and division to get an "average sex" makes no sense for a qualitative variable. Rather, use the proportion (or percentage) of males (or females) in the group. Say, "Two out of 10 students are male," or "Twenty percent of the students are male."

Summaries of QUANTITATIVE DATA:
Quantitative data are usually summarized in terms of the center and spread of the distribution. The center of the distribution can be identified using an appropriate measure of central tendency or location.

LEONARES, S. R.
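For the qualitative summary above (sex coded M = 0, F = 1), the right computation is a count of each code divided by the group size, not an average of the codes. A minimal Python sketch, assuming a hypothetical list of ten coded students chosen to match the "two out of 10" statement; the names `sex_codes`, `LABELS`, and `proportions` are illustrative only:

```python
from collections import Counter

# Hypothetical coded data for 10 students, using the handout's coding: M = 0, F = 1.
# The codes are nominal labels, so we count them instead of averaging them.
sex_codes = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]
LABELS = {0: "male", 1: "female"}

def proportions(codes):
    """Return the proportion (relative frequency) of each category."""
    counts = Counter(codes)
    n = len(codes)
    return {LABELS[code]: count / n for code, count in counts.items()}

props = proportions(sex_codes)
for category, p in sorted(props.items()):
    # Percentage is simply the proportion times 100.
    print(f"{category}: proportion {p:.2f}, percentage {p * 100:.0f}%")
```

Swapping the coding (say F = 0, M = 1) would change an "average sex" but would leave these proportions untouched, which is the point of the remark.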
MEASURES OF CENTRAL TENDENCY OR LOCATION (AVERAGES)
A measure of central tendency or location is a representative value of the data set: the value around which most of the data points are found.

(ARITHMETIC) MEAN
o computed by summing all the data values in the sample or population and dividing the sum by the number of observations (usually referred to as the "average")
o the most important measure of the center of the distribution if the distribution is symmetric
o data must be at least interval
o the most stable measure of location, especially for large data sets
o when n is small, the mean is very sensitive to extreme values

Differentiate between the population and sample means by their symbols:
Population Mean: $\mu = \frac{\sum x_i}{N}$, where $x_i$ is the ith score or observation and $N$ is the number of observations in the population (the parameter is $\mu$, the Greek letter "mu")
Sample Mean: $\bar{x} = \frac{\sum x_i}{n}$, where $x_i$ is the ith score or observation and $n$ is the number of observations in the sample (the statistic is $\bar{x}$, read as "x-bar")

Why differentiate between $\mu$ and $\bar{x}$? If the research procedure is a population study, then a population symbol (parameter) must be used; if it is a sample study, then a sample symbol (statistic) must be used. This distinction becomes very important in inferential statistics. That is why it is important to determine at the beginning of the research process whether you will be doing a population or a sample study, since it determines whether the notation for parameters or for statistics is used.

Example 1: During a particular summer month, the eight salespeople in an appliance store sold the following number of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, 11. Considering this month as the statistical population of interest, the mean number of units sold is
$\mu = \frac{\sum x_i}{N} = \frac{84}{8} = 10.5$ central a/c units
Why $\mu$? Because the problem states that the month should be considered the statistical population of interest.
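The mean formula can be checked with a short Python sketch. The arithmetic for $\mu$ and $\bar{x}$ is identical; only the symbol (parameter or statistic) depends on whether the data are a population or a sample. The function name `arithmetic_mean` is a hypothetical choice for illustration (Python's standard `statistics.mean` does the same job):

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations.

    Gives mu when `values` is a population (divide by N) and x-bar when it is
    a sample (divide by n); the computation itself is the same.
    """
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Example 1: units sold by the eight salespeople, treated as a population.
units_sold = [8, 11, 5, 14, 8, 11, 16, 11]
mu = arithmetic_mean(units_sold)
print(f"mu = {sum(units_sold)}/{len(units_sold)} = {mu}")  # mu = 84/8 = 10.5
```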
WEIGHTED MEAN
o also called the weighted average
o an arithmetic mean in which each value is weighted according to its importance in the overall group
o the formulas for the population and sample weighted means are identical:
$\mu_w = \frac{\sum wX}{\sum w}$ or $\bar{X}_w = \frac{\sum wX}{\sum w}$
Operationally, each value in the group (X) is multiplied by the appropriate weight factor (w), and the products are then summed and divided by the sum of the weights.

Example 2: In a multiproduct company, the profit margins for the company's four product lines during the past fiscal year were: line A, 4.2 percent; line B, 5.5 percent; line C, 7.4 percent; and line D, 10.1 percent. The unweighted mean profit margin is
$\mu = \frac{\sum X}{N} = \frac{27.2}{4} = 6.80\%$
However, unless the four product lines are equal in sales, this unweighted average is incorrect. Given the sales totals in the following table, which are not all equal, the weighted mean correctly describes the overall average.

Product Line | Profit Margin, X (%) | Sales, w (Php) | wX
A            | 4.2                  | 30,000,000     | 126,000,000
B            | 5.5                  | 20,000,000     | 110,000,000
C            | 7.4                  |  5,000,000     |  37,000,000
D            | 10.1                 |  3,000,000     |  30,300,000
Total        |                      | 58,000,000     | 303,300,000

Hence, the weighted mean profit margin is
$\mu_w = \frac{\sum wX}{\sum w} = \frac{303{,}300{,}000}{58{,}000{,}000} \approx 5.23\%$

Remark: The weighted mean is used in computing final grades when the subjects do not carry equal numbers of units. Each grade is multiplied by the number of units of the subject, and the sum of the (grade × no. of units) products is divided by the total number of units taken.
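The "multiply, sum, divide by the sum of the weights" procedure can be sketched in Python using the Example 2 figures. The function name `weighted_mean` is hypothetical; it is a plain implementation of the formula above, not a library call:

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, sum the products, divide by the sum of weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Example 2: profit margins (%) for lines A, B, C, D, weighted by sales (Php).
margins = [4.2, 5.5, 7.4, 10.1]
sales = [30_000_000, 20_000_000, 5_000_000, 3_000_000]

unweighted = sum(margins) / len(margins)
weighted = weighted_mean(margins, sales)
print(f"unweighted mean: {unweighted:.2f}%")  # unweighted mean: 6.80%
print(f"weighted mean:   {weighted:.2f}%")    # weighted mean:   5.23%
```

The same function covers the grade remark: pass the grades as `values` and the units of each subject as `weights`.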
MEDIAN
o center of an array (arrangement of the data from lowest to highest)
o divides the array into two equal parts
o useful for summarizing skewed distributions because it is not sensitive to extreme values
o equal to the mean for symmetric distributions
o data must be at least ordinal
o if N (or n) is odd, the median is the middle number of the array
o if N (or n) is even, the median is the mean of the two middle values
Population Median: $\tilde{\mu}$ (read "mu-tilde")
Sample Median: $\tilde{x}$ (read "x-tilde")

Example 3: The eight salespeople described in Example 1 sold the following number of central air-conditioning units, in ascending order: 5, 8, 8, 11, 11, 11, 14, 16. Find the median.
Array: 5, 8, 8, 11, 11, 11, 14, 16
$\tilde{\mu} = \frac{11 + 11}{2} = 11$ central a/c units
Since the number of data values is even (N = 8), the median is the mean of the two middle values, which are the fourth and fifth values in the ordered group. Both of these values equal 11 in this case, so adding the two 11's and dividing by 2 gives a median of 11. Note that there is an equal number of data points below and above the median (5, 8, 8, 11 are below; 11, 11, 14, 16 are above).

Example 4: The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, and 3.4 seconds. Calculate the median.
First form the array: 2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3
Since there are 9 data values (odd), there is only one middle value, the fifth:
$\tilde{x} = 3.1$ seconds
Why $\tilde{x}$? Because the problem specifically identifies the group as a random sample.
NOTE: When the problem does not specifically indicate whether the group involved is a sample or a population, treat the data set as a sample.

Recall Example 1: During a particular summer month, the eight salespeople in an appliance store sold the following number of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, 11.
Considering this month as the statistical population of interest,
a. the mean number of units sold is $\mu = \frac{\sum x_i}{N} = \frac{84}{8} = 10.5$ central a/c units
b. the median value from Example 3 is $\tilde{\mu} = \frac{11 + 11}{2} = 11$ central a/c units

[Dot plot of the eight values on an axis from 5 to 16: the mean and median are relatively close to each other.]

The mean and the median would be considered good representatives of the data set since they are located in the center of the distribution (where the points are).

What if, instead of 16, the highest value is 160? Then the last point of the dot plot would be very far from the rest of the points. An extreme value like this is also called an outlier.

Solution with the outlier, 160:
Array: 5, 8, 8, 11, 11, 11, 14, 160
Then:
$\mu = \frac{\sum x_i}{N} = \frac{228}{8} = 28.5$ central a/c units
$\tilde{\mu} = \frac{11 + 11}{2} = 11$ central a/c units

The resulting value of the mean is not found at the center of where the points are (28.5 is far from the majority of the points), while the median remains the same. The value of the mean is affected by extreme values in the distribution; hence, it should not be used to represent the distribution if the shape is skewed. That is why one condition for its use as a representative value is that the shape must be symmetric. On the other hand, the median has not changed, because only the middle value (if n is odd) or the mean of the two middle values (if n is even) is used; the extreme value does not enter the computation of the median. Therefore, the median is a better representative value if the shape of the distribution is skewed.
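The array-and-middle-value procedure of Examples 3 and 4, and the outlier comparison above, can be reproduced with a minimal Python sketch. The helper names `mean` and `median` are hypothetical (Python's standard `statistics` module also provides `mean` and `median`):

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the array, or the mean of the two middle values."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    array = sorted(values)                    # form the array (lowest to highest)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:                            # odd n: a single middle value
        return array[mid]
    return (array[mid - 1] + array[mid]) / 2  # even n: mean of the two middle values

original = [8, 11, 5, 14, 8, 11, 16, 11]
with_outlier = [8, 11, 5, 14, 8, 11, 160, 11]  # 16 replaced by 160
reaction_times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
# original: mean = 10.5, median = 11.0
# with outlier: mean = 28.5, median = 11.0
print(f"reaction times: median = {median(reaction_times)} s")  # reaction times: median = 3.1 s
```

Only the mean moves when 16 becomes 160, because the median reads just the two middle positions of the sorted array.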
MODE
o value in the data set which has the highest frequency (occurs most often)
o can be applied to any measurement level
o may not exist (the data set has no mode if all the values occur with the same frequency)
o may not be unique, if it exists (a data set may have more than one value sharing the same highest frequency)
o related to the concept of a peak or peaks in the frequency distribution (unimodal – one peak; bimodal – two peaks; etc.)
Population Mode: Mo
Sample Mode: mo

Example 5: The eight salespeople described in Example 1 sold the following number of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, and 11. Find the mode.
Mo = 11 central air-conditioning units

Example 6: The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, and 3.4 seconds. Find the mode.
Since all values occur only once (they have the same frequency), this distribution has no mode; we say that the mode does not exist. This is different from saying that the mode is 0 (why?).

RELATIONSHIP BETWEEN THE MEAN AND THE MEDIAN:
Note that the shape of the distribution is important in choosing the most appropriate measure of central tendency (and in choosing other measures and tests as well). Hence, when there is no graph on which to base the shape, compare the mean and median values to determine it:
a. symmetric distribution: mean = median
b. positively skewed distribution: mean > median
c. negatively skewed distribution: mean < median
...
If SK > 0 => positively skewed
If SK < 0 => negatively skewed
If SK = 0 => symmetric
Rule of thumb (Bulmer, 1979): If SK is
o less than −1 or greater than +1, the distribution is highly skewed;
o between −1 and −½ or between +½ and +1, the distribution is moderately skewed;
o between −½ and +½, the distribution is approximately symmetric.
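The mode rules and the shape checks above can be sketched in Python. This is a minimal sketch under stated assumptions: the names `modes`, `shape_from_mean_median`, and `bulmer_label` are hypothetical; `bulmer_label` takes SK as an input rather than computing it, since the SK formula is not reproduced in this section; and assigning the boundary values ±½ and ±1 to the less-skewed category is an assumption.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in ascending order.

    An empty list means the mode does not exist (all values share the same
    frequency), which is not the same as a mode of 0.
    """
    if not values:
        return []
    counts = Counter(values)
    if len(set(counts.values())) == 1:  # every value occurs equally often
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

def shape_from_mean_median(mean, median):
    """Judge the shape by comparing the mean with the median."""
    if mean > median:
        return "positively skewed"
    if mean < median:
        return "negatively skewed"
    return "symmetric"

def bulmer_label(sk):
    """Bulmer's (1979) rule of thumb for a given skewness value SK.

    Assumption: exactly +/-0.5 and +/-1 fall in the less-skewed category.
    """
    if sk < -1 or sk > 1:
        return "highly skewed"
    if sk < -0.5 or sk > 0.5:
        return "moderately skewed"
    return "approximately symmetric"

units_sold = [8, 11, 5, 14, 8, 11, 16, 11]
reaction_times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 3.4]
print(modes(units_sold))                 # [11]
print(modes(reaction_times))             # []
print(shape_from_mean_median(28.5, 11))  # positively skewed
```

The last line uses the mean and median of the outlier data set (160 in place of 16).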
EMPIRICAL RULE
When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the approximate percentage of data values that lie within a specified number of standard deviations of the mean, that is,
o approximately 68% of the data values will be within 1 standard deviation of the mean: $\mu \pm 1\sigma = (\mu - 1\sigma, \mu + 1\sigma)$
o approximately 95% of the data values will be within 2 standard deviations of the mean: $\mu \pm 2\sigma = (\mu - 2\sigma, \mu + 2\sigma)$
o approximately 99.7% of the data values will be within 3 standard deviations of the mean: $\mu \pm 3\sigma = (\mu - 3\sigma, \mu + 3\sigma)$

Remarks on the bell-shaped curve (also called the normal curve):
1. The horizontal axis can go much lower than $\mu - 4\sigma$ and much higher than $\mu + 4\sigma$.
2. The total area under the curve and above the horizontal axis is 1, or 100%.
3. Since the curve is symmetric, the percentages between points on the x-axis that are equally distant from the mean are equal (see the figure above).
4. The 0.15% on the left of the figure is the area from $\mu - 3\sigma$ and below; the 0.15% on the right of the figure is the area from $\mu + 3\sigma$ and above.

Example: Liquid detergent cartons are filled automatically on a production line. Filling weights frequently have a bell-shaped distribution. If the mean filling weight is 16.00 ounces and the standard deviation is 0.25 ounces, use the empirical rule to draw conclusions about the distribution of filling weights.
$\mu = 16.00$ oz; $\sigma = 0.25$ oz
$\mu \pm 1\sigma$: $16.00 \pm 0.25 = (16.00 - 0.25, 16.00 + 0.25) = (15.75, 16.25)$
Approximately 68% of the liquid detergent cartons have filling weights between 15.75 oz and 16.25 oz.
$\mu \pm 2\sigma$: $16.00 \pm 2(0.25) = 16.00 \pm 0.50 = (15.50, 16.50)$
Approximately 95% of the liquid detergent cartons have filling weights between 15.50 oz and 16.50 oz.
$\mu \pm 3\sigma$: $16.00 \pm 3(0.25) = 16.00 \pm 0.75 = (15.25, 16.75)$
Approximately 99.7% of the liquid detergent cartons have filling weights between 15.25 oz and 16.75 oz.

EXERCISES
1. A goal of management is to help their company earn as much as possible relative to the capital invested.
One measure of success is return on equity, the ratio of net income to stockholders' equity. Shown here are return on equity percentages for 25 companies. Find the range, variance, and standard deviation.
9.0 19.6 22.9 41.6 11.4 15.8 52.7 17.3 12.3 5.1 17.3 31.1 9.6 8.6 11.2 12.8 12.2 14.5 9.2 16.6 5.0 30.3 14.7 19.2 6.2
2. During a 30-day period, the daily numbers of cars rented by a car rental company were as follows:
7 10 6 7 9 4 7 9 9 8 5 5 7 8 4 6 9 7 12 7 9 10 4 7 5 9 8 9 5 7
Find the range, variance, and standard deviation.
3. A manufacturing firm regularly places orders with two different suppliers, A and B. The following data are the numbers of days required to fill orders for these suppliers.
Supplier A: 11 10 9 10 11 11 10 11 10 10
Supplier B: 8 10 13 7 10 11 10 7 15 12
Determine which supplier provides the more consistent and reliable delivery times, using the range and standard deviation. Since you are comparing the two, why is it enough to use the standard deviation instead of computing the coefficient of variation?
4. A production department uses a sampling procedure to test the quality of newly produced items. The department employs the following decision rule at an inspection station: if a sample of 14 items has a variance of more than 0.005, the production line must be shut down for repairs. Suppose the following data have been collected:
3.43 3.45 3.43 3.48 3.52 3.50 3.39 3.48 3.41 3.38 3.49 3.45 3.51 3.50
Should the production line be shut down? Why or why not?
5. Two friends want to take a summer holiday before going to college in the autumn. They are looking for somewhere with plenty of clubs where they can party all night. Unfortunately, they have left it rather late to book, and there are only two resorts, Medlena and Bistry, available within their budget.
When they ask about the ages of the holiday-makers at these resorts, their travel agent says the only thing he can tell them is that the mean age of people going to Medlena is 19, whereas the mean age of visitors to Bistry is 22. Just as they are about to book holidays in Medlena because it seems to attract the sort of young crowd they want to be with, the travel agent says, 'I've got some more figures: the standard deviation of the ages of visitors to Medlena is 8 and the standard deviation of the ages of visitors to Bistry is 2.' Should they change their minds on the basis of this new information, and if so, why?
6. Many national academic achievement and aptitude tests, such as the SAT, report standardized test scores with the mean for the normative group used to establish scoring standards converted to 500 with a standard deviation of 100. Suppose that the distribution of scores for such a test is known to be approximately normally distributed. Determine the approximate percentage of reported scores that would be
a. between 400 and 600
b. between 500 and 700
c. greater than 700
d. less than 200
Hint: Draw the bell-shaped curve and substitute the values of $\mu$ and $\sigma$ on the horizontal axis.
7. An SAT test taker (refer to #6) got a score of 625. What is his standard score?
8. The same student (in #7) got the same score (625) on a different test, the mean of which is 450 and the standard deviation 150. In which test did this student fare better?
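Returning to the liquid detergent example in the EMPIRICAL RULE section, the interval arithmetic $\mu \pm k\sigma$ can be sketched in Python. The coverage figures are the rule's approximations for bell-shaped data, not exact probabilities; the names `EMPIRICAL_COVERAGE` and `empirical_rule_intervals` are hypothetical:

```python
# Approximate percentage of bell-shaped data within mu +/- k*sigma (empirical rule).
EMPIRICAL_COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_rule_intervals(mu, sigma):
    """Return {k: (lower, upper, approx_percent)} for k = 1, 2, 3."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return {
        k: (mu - k * sigma, mu + k * sigma, pct)
        for k, pct in EMPIRICAL_COVERAGE.items()
    }

# Liquid detergent example: mu = 16.00 oz, sigma = 0.25 oz.
intervals = empirical_rule_intervals(16.00, 0.25)
for k, (lower, upper, pct) in intervals.items():
    print(f"mu +/- {k} sigma: ({lower:.2f}, {upper:.2f}) oz, about {pct}% of cartons")
```

Reading the tail areas (for instance, the 0.15% beyond $\mu + 3\sigma$) still requires the symmetry argument from the remarks on the bell-shaped curve; the sketch only produces the interval endpoints.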