Business Statistics Handouts PDF
University of St. La Salle
S. R. Leonardes, PhD
Summary
These handouts cover business statistics for the first semester of the 2020-2021 academic year at the University of St. La Salle. They provide an introduction to the statistical concepts required for research and to evaluate information presented in research articles.
UNIVERSITY OF ST. LA SALLE
College of Business and Accountancy
BSTAT – BUSINESS STATISTICS
First Semester, AY 2020 – 2021

HANDOUTS 1
STATISTICS IN THE RESEARCH PROCESS

"Statistics can be fun or at least they don't need to be feared." Many folks have trouble believing this premise. Often, individuals walk into their first statistics class experiencing emotions ranging from slight anxiety to borderline panic. It is important to remember, however, that the basic mathematical concepts required to understand introductory statistics are not prohibitive for any university student. The key to doing well in any statistics course can be summarized by two words, "KEEP UP!" If you do not understand a concept, reread the material, do the practice questions, and do not be afraid to ask your professor for clarification or help. This is important because the material discussed four weeks from today will be based on material discussed today. If you keep on top of the material and relax a little bit, you might even find you enjoy this introduction to basic measurements and statistics.

Why Study Statistics?

"Why do I need to learn statistics?" or "What future benefit can I get from a statistics class?" There are five primary reasons to study statistics:

The first reason is to be able to effectively conduct research. Without the use of statistics, it would be very difficult to make decisions based on data collected from a research project. Statistics provides us with a tool with which to make an educated decision. A second point about research should be made. It is extremely important for a researcher to know what statistics they want to use before they collect their data. Otherwise, data might be collected that cannot be interpreted. Unfortunately, when this happens it results in a loss of data, time, and money. Although you may never plan to be involved in research, research may find its way into your life.
Certainly, if you decide to continue your education and work on a master's or doctoral degree, involvement in research will result from that decision. Secondly, more and more workplaces are conducting internal research or are part of broader research studies. Thus, you may find yourself assigned to one of these studies.

The second reason to study statistics is to be able to read journals. Most technical journals you will read contain some form of statistics. Usually, you will find these statistics in something called the results section. Without an understanding of statistics, the information contained in this section will be meaningless. An understanding of basic statistics will provide you with the fundamental skills necessary to read and evaluate most results sections. The ability to extract meaning from journal articles and the ability to critically evaluate research from a statistical perspective are fundamental skills that will enhance your knowledge and understanding in related coursework.

S. R. LEONARES, PHD

The third reason is to further develop critical and analytic thinking skills. The study of statistics will serve to enhance and further develop these skills. To do well in statistics, one must develop and use formal logical thinking abilities that are both high level and creative.

The fourth reason to study statistics is to be an informed consumer. Like any other tool, statistics can be used or misused. Yes, it is true that some individuals do actively lie and mislead with statistics. More often, however, well-meaning individuals unintentionally report erroneous statistical conclusions. If you know some of the basic statistical concepts, you will be in a better position to evaluate the information you have been given.

The fifth reason to have a working knowledge of statistics is to know when you need to hire a statistician. Conducting research is time-consuming and expensive.
If you are in over your statistical head, it does not make sense to risk an entire project by attempting to compute the data analyses yourself. It is very easy to compute incomplete or inappropriate statistical analyses of one's data. It is also important to have enough statistical savvy to be able to discuss your project and the data analyses you want computed with the statistician you hire. In other words, you want to be able to make sure that your statistician is on the right track. (https://universalteacher.com/1/reasons-for-conducting-research/)

Statistics are part of our everyday life. Science fiction author H. G. Wells in 1903 stated, "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." Wells was quite prophetic, as the ability to think and reason about statistical information is not a luxury in today's information and technological age. Anyone who lacks fundamental statistical literacy, reasoning, and thinking skills may find they are unprepared to meet the needs of future employers or to navigate information presented in the news and media. On a most basic level, all one needs to do is open a newspaper, turn on the TV, examine the baseball box scores, or even just read a bank statement to see statistics in use on a daily basis. Statistics in and of themselves are not anxiety producing. The idea of statistics is often anxiety provoking simply because it is a tool with which we are unfamiliar.

-------------------------------------------------------------------------------------------------------------------------------------------

STATISTICS

Defined:

A branch of science which deals with the collection, organization, presentation, analysis, and interpretation of data.

A body of techniques and procedures dealing with the collection, organization, analysis, interpretation, and presentation of information that can be stated numerically.
The backbone of (Quantitative) Research

Two Branches of Statistics

Descriptive statistics are used to organize or summarize a particular set of measurements. These deal with organizing and summarizing observations so that they are easier to comprehend. The census of households conducted by the Philippine Statistics Authority every five years represents an example of how descriptive statistics are generated. The information that is gathered concerning gender, race, income, etc. is compiled to describe the population of the Philippines at a given point in time. Collection, Organization, Presentation, and Analysis are part of descriptive statistics.

Inferential statistics use data gathered from a sample to make inferences or generate conclusions about the larger population from which the sample was drawn. Opinion polls and television ratings systems represent some uses of inferential statistics. For example, a limited number of people are polled during an election and then this information is used to describe voters as a whole. Interpretation falls under Inferential Statistics.

Example: Suppose we wanted to know the level of job satisfaction nurses experience working on various units within a particular hospital (e.g., psychiatric, cardiac care, obstetrics, etc.). The first thing we would need to do is collect some data. We might have all the nurses on a particular day complete a job satisfaction questionnaire. We could ask such questions as "On a scale of 1 (not satisfied) to 10 (highly satisfied), how satisfied are you with your job?". We might examine employee turnover rates for each unit during the past year. We also could examine absentee records for a two-month period, as decreased job satisfaction is correlated with higher absenteeism. Once we have collected the data, we would then organize it. In this case, we would organize it by nursing unit.
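The organizing step, and the simple analysis that follows it, can be sketched in Python using the absenteeism figures (in days) tabulated for this example. This is only a minimal illustration: the record layout and the names `records`, `organize_by_unit`, and `mean_by_unit` are assumptions made here, not part of the handout.

```python
from collections import defaultdict
from statistics import mean

# One (unit, days absent) record per nurse, in the order the data were collected.
# The figures are the ones tabulated in this example.
records = [
    ("Psychiatric", 3), ("Cardiac Care", 8), ("Obstetrics", 4),
    ("Psychiatric", 6), ("Cardiac Care", 9), ("Obstetrics", 4),
    ("Psychiatric", 4), ("Cardiac Care", 10), ("Obstetrics", 3),
    ("Psychiatric", 7), ("Cardiac Care", 8), ("Obstetrics", 5),
    ("Psychiatric", 5), ("Cardiac Care", 10), ("Obstetrics", 4),
]


def organize_by_unit(records):
    """Organization: group the raw records by nursing unit."""
    by_unit = defaultdict(list)
    for unit, days in records:
        by_unit[unit].append(days)
    return dict(by_unit)


def mean_by_unit(by_unit):
    """Analysis (descriptive): mean days absent for each unit."""
    return {unit: mean(days) for unit, days in by_unit.items()}


by_unit = organize_by_unit(records)
unit_means = mean_by_unit(by_unit)

for unit, unit_mean in unit_means.items():
    print(f"{unit}: mean days absent = {unit_mean}")
```

Grouping first and summarizing second keeps the organization step separate from the analysis step, which mirrors the order in which the handout lists them.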
Absenteeism Data by Unit (in Days)

           Psychiatric   Cardiac Care   Obstetrics
                3              8             4
                6              9             4
                4             10             3
                7              8             5
                5             10             4
Mean =          5              9             4

Thus far, we have collected our data and we have organized it by hospital unit. You will also notice from the table above that we have performed a simple analysis. We found the mean (you probably know it by the name "average") absenteeism rate for each unit (descriptive statistics). Next, we would interpret our data (inferential statistics). We could take the information gained from our nursing satisfaction study and make inferences about all hospital nurses. We might infer, and therefore conclude, that cardiac care nurses as a group are less satisfied with their jobs, as indicated by the high absenteeism rate.

This course will be discussed in light of the role of statistics in the research process, particularly in quantitative research.

Statistics in the Research Process:

“Research is a procedure for carefully finding accurate solutions to important and relevant questions by the use of scientific method of gathering and interpreting information. Doing research is a multi-dimensional skill. Carrying out successful research must exceed the bounds of printed paper, and leap out to influence opinions and opinion shapers.” (https://universalteacher.com/1/reasons-for-conducting-research/)

The Research Process (from the standpoint of Statistics):

Formulate the research problem (this could be your general or specific objective)

   S – Specific
   M – Measurable
   A – Attainable
   R – Realistic
   T – Time-bound

Remarks: A research objective that is SMART sets a very good road map for the conduct of research:
   - The scope/population is delineated, hence it can be determined beforehand whether a census (gathering data from the whole population) or a survey (gathering data from a sample) will be conducted
   - The subjects (sources of information) are identified, hence the appropriate method of data collection can be determined
   - The kind of information needed to answer the problem/objective is known at the beginning of the study
   - The type of objective is known, hence the appropriate descriptive and/or inferential statistical tools are anticipated

Define the population of the study
   o Population – all subjects under investigation; the set of all elements of interest in a particular study
   o Sample – a subset of the population
Notes:
   a. In order to identify the population of the study, ask the question, “Who/What are going to provide the information needed to answer the research problem?”
   b. The population of the study need not consist of a human population.

Identify the variable/s of the study
   o Variable – measurable characteristic or attribute of the subject that is the focus of the study and that can take on different values
Notes:
   a. In order to determine the variable/s of the study, ask the question, “What information is needed from each subject (element of the population) in order to answer the research problem?”
   b. A research problem or specific objective may involve one or more variables.
   c.
It would be a good practice to determine the variable/s of each stated specific objective so as not to miss any information needed from each subject.

Example:
Problem: What is the mean weekly household food expense of a USLS BStat student for the first semester of AY 2020 – 2021?
Population of study: All USLS BStat students for the first semester, AY 2020 – 2021

Question: How will the description and scope of the population be affected if each of the following is omitted?
   a. USLS
   b. BStat
   c. first semester
   d. AY 2020-2021

Variable/s: weekly household food expense (only one piece of information is needed from each USLS BStat student for the first semester of AY 2020-2021)

Remarks:
1) Identifying the variable/s early in the study eliminates the possibility of
   a. missing it when eventually formulating the instrument or
   b. including variables in the instrument that are not necessary in answering the problem/objectives
2) Identifying the variable/s enables the researcher to determine the type of variable/s and the level/s of scale of the data that will be collected. These, in turn, determine the types of analysis and interpretation that will be applied to generate the needed results.

:
:

(Anticipated) Conclusion (think ahead as to what the answer to the research problem/specific objective will look like):
The mean weekly household food expense of a USLS BStat student for the first semester of AY 2020-2021 is _______.
Notes:
   a. This will help you to anticipate that you need to compute the mean of the weekly household food expense values that you have collected from all members of the population.
   b.
More importantly, your conclusion should be consistent with the statement of your problem/specific objectives; that is, since the problem is about the population under study, the conclusion should also be about the population under study.
   - This is not a problem if a census is conducted: the conclusion is straightforward, as in the example above.
   - However, if only a sample was taken from the population for the study, the conclusion should never be about the sample; it should still be about the population. Hence, its form will be quite different from the anticipated conclusion in the example above (inferential statistics can provide a template for specific types of objectives).

CLASSIFICATION OF RESEARCH OBJECTIVES/GOALS:

Each stated objective can be differentiated according to the following classification. This will guide the researcher in anticipating the type of analysis and interpretation that the objective requires.

Analytic goals: directed toward finding out from the data one or more of the following attributes or characteristics of the group being studied:

1. Central tendency – general characteristic of the group
   Examples:
   a. To determine the mean weekly allowance of USLS College Freshmen for the first semester, AY 2020 – 2021.
   b. To determine the percentage of USLS College students who prefer a Samsung over a Vivo cellphone for the first semester, AY 2020-2021.

2. Variance in the group – how individual members of the group vary from the average characteristic of the group
   Examples:
   a. To determine the age range of the students in this class.
   b. To determine if the final grades in this class are similar.

3. Difference within the group/between groups – whether or not subgroups of the group/two separate groups being studied are different or similar on certain traits investigated (special case: comparison between/among two or more groups with regard to a particular variable)
   Examples:
   a. To compare the mean number
of Coke Sakto bottles consumed in July 2020 between the male and female USLS students.
   b. To determine if there is a significant difference in the mean number of text messages sent in a day among the students from the five different colleges of USLS for the first semester, AY 2020-2021.

4. Relationships within the group – whether a relationship exists between certain variables covered in the study
   Examples:
   a. To establish if there is a significant relationship between choice of cellphone brand and the college a USLS student belongs to for the first semester, AY 2020-2021.
   b. To determine if relationship status and final grades in Statistics are independent for the first semester, AY 2020-2021.

5. Prediction – establishing a mathematical/statistical model to predict future outcomes
   Examples:
   a. What factors influence a graduate’s ability to land a job within one year after graduation?
   b. What is the estimated sales of a particular restaurant for next week if the present conditions hold?

Types of Analysis:
1. Descriptive – limited to the description of the particular group being studied; a conclusion cannot be applied to cases outside the study group
2. Inferential – application of the findings or conclusions from a small group to a large group from which the smaller group was drawn

To summarize, the following diagram shows the aspects of statistics involved in a research process, depending on the scope of the study:

   Population study        Sample study
                           Sampling
   Collection              Collection
   Organization            Organization
   Presentation            Presentation
   Analysis                Analysis
                           Interpretation
        Conclusion (always about the population)

AVOID either of two possible procedural errors:
1. You did a population study but you used inferential statistics to arrive at the conclusion.
2. You did a sample study but you did not use inferential statistics to arrive at the conclusion.
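The summary diagram and the two procedural errors above can be written down as a small consistency check. The sketch below is only an illustration: the names `STEPS`, `steps_for`, and `procedural_error` are hypothetical, and the step lists are transcribed from the diagram on the assumption that Sampling and Interpretation belong to the sample study only.

```python
# Steps per study scope, transcribed from the summary diagram.
# Assumption: Sampling and Interpretation apply only to a sample study.
STEPS = {
    "population": ["Collection", "Organization", "Presentation",
                   "Analysis", "Conclusion"],
    "sample": ["Sampling", "Collection", "Organization", "Presentation",
               "Analysis", "Interpretation", "Conclusion"],
}


def steps_for(scope):
    """Return the ordered steps for a 'population' or a 'sample' study."""
    if scope not in STEPS:
        raise ValueError(f"unknown study scope: {scope!r}")
    return list(STEPS[scope])


def procedural_error(scope, used_inferential):
    """Describe the procedural error, or return None if the plan is consistent."""
    steps_for(scope)  # validates the scope
    if scope == "population" and used_inferential:
        return "Population study, but inferential statistics were used."
    if scope == "sample" and not used_inferential:
        return "Sample study, but inferential statistics were not used."
    return None


for scope, used in [("population", False), ("population", True),
                    ("sample", True), ("sample", False)]:
    print(scope, used, "->", procedural_error(scope, used) or "consistent")
```

In both scopes the last step is a conclusion about the population; the check only flags whether the route taken to that conclusion matches the scope of the study.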
Remember, inferential statistics is applied only in order to generate conclusions about the population BASED ON SAMPLE DATA.

TYPES OF VARIABLES (inherent characteristic of the variable; does not change)

1. Qualitative/Categorical
   - Attributes are in terms of categories or levels: the descriptions that you give a variable that help to explain how variables should be measured, manipulated and/or controlled.

Examples:

   Variable                               Categories/levels
   1. Sex                                 categories   - Male
                                                       - Female
   2. Religion                            categories   - Roman Catholic
                                                       - Protestant
                                                       - Iglesia ni Cristo
                                                       - Islam
                                                       - Others, please specify _______
   3. Importance of university to         levels       - strongly agree
      getting a good job                               - agree
                                                       - neither agree nor disagree
                                                       - disagree
                                                       - strongly disagree

Notes:
1. Categories vs levels
   Categories – do not have/possess an intrinsic order; they are all considered equal
   Levels – possess intrinsic or inherent order from one “category” to the next
2. Categories/levels should be
   a. exhaustive – should cover all possible answers (oftentimes, the use of “Others, please specify” serves the purpose of including all possibilities, especially those categories with small frequencies). This will prevent the respondent from being confused about what answer to tick (✓) or mark with an x when his or her desired response is not among the given options.
   b. mutually exclusive – the categories should not overlap, in order to ensure that the respondents provide only one answer. This will prevent the respondent from being confused as to which category to tick (✓) or mark with an x if there is more than one possible answer. This holds true even for multiple response questions.

2. Quantitative/Numerical
   - The variable has numerical properties, which are the values by which the said variable can be measured, manipulated and/or controlled
   - Attributes are in terms of counts (discrete) or measurements (continuous)

Distinctions/Types of quantitative variables:
a.
Discrete Variable
   - uses the process of counting to generate data
   - values of attributes are in terms of whole numbers only
   Examples:
   a. Number of t-shirts owned
   b. Number of pocketbooks read

b. Continuous Variable
   - uses the process of measuring to generate data (with the use of a measuring instrument)
   - values of attributes may have fractional or decimal parts
   Examples:
   a. Weight of a package
   b. Volume of water
   c. Temperature

Note: for continuous variables, it is important to append the unit of measurement since the result may have a different value depending on the unit.
Example:
   - For discrete variables, the value of a number remains the same regardless of the variable: 5 chairs vs 5 students (the value of 5 is the same for both)
   - For continuous variables, the value of a number depends on the unit of measurement, even if the same variable is being measured: 5 inches vs 5 feet (a length measuring 5 inches is shorter than a length measuring 5 feet)

READ: http://dissertation.laerd.com/types-of-variables.php

FUNCTIONS OF VARIABLES
   - Not an intrinsic property of the variable; it depends on the role of the variable in a study
   - Important if the investigation is about cause and effect

Distinctions:
a. Independent Variable
   - sometimes called an experimental or predictor variable
   - a variable that is being manipulated in an experiment in order to observe the effect this has on a dependent variable
   - what the researcher (or nature) manipulates: a treatment or program or cause
b.
Dependent Variable
   - sometimes called an outcome variable
   - a variable that is dependent on one or more independent variables
   - what is affected by the independent variable: the effects or outcomes

Example:
   Study/Problem: the effects of a new educational program on student achievement
   Independent variable - the program
   Dependent variables - measures of achievement

Note: a variable may function as an independent variable in one study and a dependent variable in another.

MEASUREMENT AND MEASUREMENT SCALES

What is Measurement?

Definition: Measurement – the process of assigning numbers to observations or observed characteristics

Normally, when people hear the term measurement, they think in terms of measuring the length of something (e.g., the length of a piece of wood) or measuring a quantity of something (e.g., a cup of coffee). This represents a limited use of the term measurement. In statistics, the term measurement is used more broadly and is more appropriately termed scales of measurement. Scales of measurement refer to ways in which variables/numbers are defined and categorized. Each scale of measurement has certain properties which in turn determine the appropriateness of certain statistical analyses. The four scales of measurement are nominal, ordinal, interval, and ratio.

1.
Nominal Scale
   - Consists of numbers which indicate categories purely for classification or identification purposes
   - The numbers serve as codes only; any number can be used to represent a category as long as no two categories share the same number
   - The numbers do not indicate order among the categories
   - The numbers have no numeric properties; hence, the four fundamental operations (addition, subtraction, multiplication, division) cannot be applied to the numbers in the nominal scale
   - The categories are mutually exclusive (the observations cannot fall into more than one category)
   - The categories are exhaustive (there must be enough categories for all the observations)

   Example: Sex:   Male = 1
                   Female = 2
   Remarks:
   a. assigning the number 2 to Female does not imply that females are “better” than males
   b. these numbers cannot be arithmetically manipulated, for example, to get the “average sex”

2. Ordinal Scale
   - Possesses rank order characteristics
   - The categories must still be mutually exclusive and exhaustive, but they also indicate the order of magnitude of some variable
   - The numbers serve as codes but must now be assigned in consecutive order, indicating degree or level (for example: lowest to highest, most preferred to least preferred, etc.)

   Example: Likert item response:
      Strongly agree = 1
      Agree = 2
      Neither agree nor disagree = 3
      Disagree = 4
      Strongly disagree = 5
   Remarks:
   a. Although the numbers are arranged in consecutive order, it cannot be assumed that the differences between two consecutive numbers are the same anywhere in the scale. For example, the degree of difference of “1” in responses between strongly agree (1) and agree (2) is not necessarily the same as that between disagree (4) and strongly disagree (5).
   b.
Fundamentally, these scales do not represent a measurable quantity; for this reason, arithmetic operations on the numbers are, strictly speaking, not applicable.

   Example: Likert-type items (such as "On a scale of 1 to 10, with one being no pain and ten being high pain, how much pain are you in today?") also represent ordinal data. An individual may respond 8 to this question and be in less pain than someone else who responded 5. A person may not be in exactly half as much pain if they responded 4 than if they responded 8. All we know from this data is that an individual who responds 6 is in less pain than if they responded 8 and in more pain than if they responded 4. Therefore, Likert-type items only represent a rank ordering.

REMEMBER:
a. Nominal and Ordinal scale data are basically categories/levels converted to numeric codes.
b. Qualitative variables generate either nominal (categories) or ordinal (levels) scale data.

3. Interval Scale
   - Has all the properties of the ordinal scale
   - A scale that represents quantity and has equal units
   - A given interval (distance) between scores has the same meaning anywhere on the scale
   - Provides information about how much better one value is compared with another
   - Zero does not represent the absolute lowest value; it is simply an additional point of measurement and does not indicate the absence of the property being measured

   Examples:
   a. temperature measured on the Celsius scale
      - Temperature is defined as the measure of the warmth or coldness of an object or substance with reference to some standard value
      - Water boils at 100 °C and freezes at 0 °C (ice is cold to the touch)
      - However, 0 °C does not imply complete absence of heat, since there are substances colder than ice (dry ice, liquid nitrogen), so 0 °C is not the absolute lowest value on the Celsius thermometer
   b.
score on a test
      - A test measures knowledge gained by a student about the topic
      - A score of 0 does not imply complete absence of knowledge gained by a student about the topic

4. Ratio Scale
   - Possesses all the characteristics of the interval scale (represents quantity and has equality of units)
   - The most informative scale, as it conveys the order of values, the size of the differences between them, and their ratios
   - Allows comparison of intervals or differences
   - Has a true or absolute zero point (no numbers exist below zero, i.e., there are no negative numbers)
   - The ratio of two values is meaningful because the zero point characteristic makes it relevant or meaningful to say, “one object has twice the length of the other” or “is twice as long.”

   Examples:
   a. Very often, physical measures will represent ratio data (for example, height and weight). If one is measuring the length of a piece of wood in centimeters, there is quantity, there are equal units, and the measure cannot go below zero centimeters. A negative length is not possible.
   b. Cost of today’s lunch
   c. Length of time of a full-length movie

REMEMBER:
a. Interval and Ratio scale data possess inherently numeric characteristics.
b. Quantitative variables generate either interval or ratio scale data.

The table below will help clarify the fundamental differences between the four scales of measurement:

              Indicates     Indicates Direction   Indicates Amount   Absolute
              Difference    of Difference         of Difference      Zero
   Nominal        X
   Ordinal        X                X
   Interval       X                X                     X
   Ratio          X                X                     X               X

You will notice in the above table that only the ratio scale meets the criteria for all four properties of scales of measurement.

-------------------------------------------------------------------------------------------------------------------------------------------

EXERCISES

1. Indicate whether each of the following examples refers to a population or to a sample.
   a. A group of 25 customers selected to taste a new soft drink
   b.
Salaries of all CEOs in the pharmaceutical industry
   c. Customer satisfaction ratings of all clients of a local bank
   d. Monthly phone expenses of selected Globe subscribers

2. Indicate whether the following are qualitative (QL), quantitative discrete (QD) or quantitative continuous (QC) variables and the corresponding level of measurement of the data generated for each variable.
   a. Brand of jeans you prefer
   b. Ratio of current assets to current liabilities
   c. Number of text messages received per day
   d. Rating of the management skills of a company president
   e. Number of banks in the municipalities and cities of Negros Occidental
   f. Ranking of professional tennis players
   g. Scores of freshmen college students on an attitude towards math scale
   h. Effectiveness of a drug for headache, measured in minutes
   i. Earnings per share
   j. Number of leaves
   k. Weekly allowance
   l. Distance of the student’s house from school
   m. Color of the hair
   n. Zip code

3. Identify the level of measurement of the following variables.
   a. Age
   b. Place of birth
   c. Number of children in the family
   d. Grade in Math 1
   e. Height (in cm.)
   f. Favorite TV show
   g. Shoe size
   h. High school GPA
   i. Family monthly income
   j. Travel time (in minutes) from USLS to residence

4. A researcher measures two individuals and then uses the resulting scores to make a statement comparing the two individuals. For each of the following statements, identify the scale of measurement (nominal, ordinal, interval, ratio) that the researcher used.
   a. I can only say that the two individuals are different.
   b. I can say that one individual scored 6 points higher than the other.
   c. I can say that one individual scored higher than the other, but I cannot specify how much higher.
   d. I can say that the score for one individual is twice as large as the score for the other individual.

5. A firm is interested in testing the advertising effectiveness of a new television commercial.
As part of the test, the commercial is shown on a 6:30 PM local news program in Bacolod City. Two days later, a market research firm conducts a telephone survey to obtain information on recall rates (percentage of viewers who recall seeing the commercial) and impressions of the commercial.
   a. What is the population for this study?
      _________________________________________________________________________
   b. What is the sample for this study?
      _________________________________________________________________________
   c. Why would a sample be used in this situation? Explain.
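As a closing recap of the measurement scales covered in this handout, the short Python sketch below illustrates two of the remarks made there: that nominal codes cannot meaningfully be averaged, and that ratios are meaningful only on a scale with a true zero. It is an illustration under stated assumptions, not part of the original handout: the five coded responses and the two pairs of measurements are hypothetical, and the Fahrenheit conversion is brought in here only to show what happens when the unit of an interval scale changes.

```python
from statistics import mean, mode

# Hypothetical responses coded with the handout's scheme: Male = 1, Female = 2.
SEX_LABELS = {1: "Male", 2: "Female"}
sex_codes = [1, 2, 2, 1, 2]

# The arithmetic runs, but the "average sex" depends entirely on the arbitrary codes.
average_code = mean(sex_codes)
# The most frequent category is a summary that does not depend on the codes chosen.
most_common_sex = SEX_LABELS[mode(sex_codes)]


def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


def cm_to_inches(cm):
    """Convert a length in centimeters to inches."""
    return cm / 2.54


# Interval scale (hypothetical temperatures): the ratio changes with the unit,
# so "twice as hot" is not a meaningful statement.
warm_c, cool_c = 20.0, 10.0
celsius_ratio = warm_c / cool_c
fahrenheit_ratio = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio scale (hypothetical lengths): the ratio is the same in any unit,
# so "twice as long" is a meaningful statement.
long_cm, short_cm = 20.0, 10.0
cm_ratio = long_cm / short_cm
inch_ratio = cm_to_inches(long_cm) / cm_to_inches(short_cm)

print(f"Mean of the sex codes: {average_code} (no interpretation)")
print(f"Most common category: {most_common_sex}")
print(f"Temperature ratio: {celsius_ratio:.2f} in Celsius, {fahrenheit_ratio:.2f} in Fahrenheit")
print(f"Length ratio: {cm_ratio:.2f} in centimeters, {inch_ratio:.2f} in inches")
```

The temperature ratio shifts because zero on the Celsius scale is only an additional point of measurement, while zero centimeters marks the true absence of length.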