BST 312 Chapter 1: Introduction to Statistics

SOME BASIC CONCEPTS

Statistics: the science of conducting studies to collect, organize, present (summarize), analyze, and draw conclusions from data.

Collect data → Organize data → Present data → Analyze data → Draw conclusions

WHY DO WE NEED TO KNOW ABOUT STATISTICS?

- Statistics helps simplify complex data so that they are easier to understand.
- Tables and figures let us represent things in their true form.
- Without a statistical study, our ideas would be vague and indefinite.
- With the help of statistics, we can frame sound, well-informed strategies and policies.
- Statistics helps in shaping future policies.
- The future is uncertain, but statistics helps us make reasonable estimates about many phenomena by collecting and analyzing data from the past.

DESCRIPTIVE VS. INFERENTIAL STATISTICS

Descriptive Statistics: consists of collecting, organizing, and summarizing data in a meaningful way that an interested reader can easily understand.

Inferential Statistics: consists of generalizing from samples to populations, performing estimation and hypothesis tests, determining relationships among variables, and making predictions. Inferential statistics uses probability (the chance of an event occurring).

Note: The main difference between these branches is that in inferential statistics the results are generalized to the whole population, while in descriptive statistics the results remain limited to the sample or population being studied.

INTRODUCTION

Biostatistics: When the data are obtained from the biological sciences and medicine, we use the term "biostatistics". Biostatistics is about information: how it is obtained, how it is analyzed, and how it is interpreted.

The objective of the course is to learn:
(1) How to organize, summarize, and describe data. - (Descriptive Statistics)
(2) How to reach decisions about a large body of data by examining only a small part of the data.
- (Inferential Statistics)

WHICH BRANCH OF STATISTICS IS USED IN THESE STATEMENTS?

- Based on Ibrahim's electric bill for last year, he expects to pay SAR 200 per month this year. (Inferential statistics)
- Last year's total attendance at Long Run High School's football games was 8235. (Descriptive statistics)
- According to Saudi Arabia's Vision 2030, non-oil government revenue is expected to increase from SAR 163 billion to SAR 1 trillion. (Inferential statistics)

EXERCISE

Try it yourself: which branch of statistics is used in these statements?
- Generalizing from a sample to a population in a meaningful way.
- The average life span in China is 80 years.
- By 2040 at least 3.5 billion people will run short of water.
- The median household income for people aged 25–34 is $35,888.

SOME BASIC CONCEPTS

Data: the values (measurements or observations) that the variables can assume. A collection of data is a data set, and each value in the data set is called a datum (data value). Data are the raw material of statistics.

There are two types of data:
1. Quantitative data (numbers: weights, ages, …).
2. Qualitative data (words or attributes: nationalities, occupations, …).

DATA COLLECTION SOURCES

A) Primary Data: "first-hand information" collected by an investigator.
- It is collected for the first time.
- It is original and generally more reliable.
- Example: a population census conducted by the government every 10 years.

B) Secondary Data: "second-hand information".
- It is not collected originally but obtained from already published or unpublished sources.

EXAMPLES OF SOURCES OF DATA

1. Routinely kept records.
2. Surveys.
3. Experiments.
4. External sources
(published reports, data banks, …).

TERMINOLOGY

Population: the largest collection of entities (such as persons, animals, or cells) in which we have an interest at a particular time.
- Example: If we are interested in the weights of the students enrolled in the College of Engineering at PSMCHS, then our population consists of the weights of all of these students, and our variable of interest is the weight.

Population Size (N): the number of elements in the population, denoted by N.

Sample: a subset (a part or fraction) of the population.
- Example: Suppose that we are interested in studying the characteristics of the weights of the students enrolled in the College of Engineering at PSMCHS. If we randomly select 50 of these students and measure their weights, then the weights of these 50 students form our sample.

Sample Size (n): the number of elements in the sample, denoted by n.

Parameter: a descriptive measure computed from the data of a population.
- Usually represented by Greek letters, such as the population average µ or the population standard deviation σ.
- The values of parameters are generally unknown.
- We are interested in knowing the true values of the parameters.

Statistic: a descriptive measure computed from the data of a sample.
- Usually represented by Roman letters, such as the sample average x̄ or the sample standard deviation s.

Population → Parameter (µ = ?): inferential statistics are used to make educated guesses about the population parameter (µ).
Sample → Statistic (x̄ = 28): descriptive statistics are used to calculate x̄ from the sample data.

Example
Identify each of the following data sets as either a population or a sample:
a. The grade point averages (GPAs) of all students at a college.
b. The GPAs of a randomly selected group of students on a college campus.
c. The gender of every second customer who enters a movie theater.
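The relationship between a parameter and a statistic described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 2000 weights is randomly generated (hypothetical values, not real PSMCHS data), and the names `population`, `sample`, `mu`, and `x_bar` are introduced just for this example. In practice µ is usually unknown; it can be computed here only because the whole population is simulated.

```python
import random
import statistics

random.seed(312)  # fixed seed so the sketch is reproducible

# Hypothetical population: weights (kg) of N = 2000 students (simulated, not real data)
population = [random.gauss(70, 10) for _ in range(2000)]
N = len(population)

# Parameter: computed from every element of the population
mu = statistics.mean(population)

# Sample: n = 50 students selected at random, without replacement
sample = random.sample(population, 50)
n = len(sample)

# Statistic: computed from the sample only; used to estimate mu
x_bar = statistics.mean(sample)

print(f"N = {N}, mu = {mu:.2f}")
print(f"n = {n}, x_bar = {x_bar:.2f}")
```

Running the sketch with a different seed gives a different sample and therefore a different x̄, while µ stays fixed for a given population. That variability is what inferential statistics has to account for.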
Example
A researcher suspects that there is a relationship between students' absences and their final grades. To check this, students are selected from the science track, and their scores and numbers of absences are recorded. Answer the following questions:
a. What are the variables under study?
b. What are the population and the sample in this study?

TERMINOLOGY

Variable: a characteristic that can take on different values in different people, animals, or things.

Random variable: a variable whose values are determined by chance.

Examples of variables: number of patients, height, gender, and educational level.

TYPES OF VARIABLES

(1) Quantitative Variables: A quantitative variable is a characteristic that can be measured. The values of a quantitative variable are numbers indicating how much or how many of something (information regarding AMOUNTS).
Examples: family size, number of patients, weight, …

TYPES OF QUANTITATIVE VARIABLES

(a) Discrete Variables: There are jumps or gaps between the values.
Examples: family size (x = 1, 2, 3, …), number of patients (x = 0, 1, 2, 3, …).
(b) Continuous Variables: There are no gaps between the values. A continuous variable can take any value within a certain interval of values.

TYPES OF VARIABLES

(2) Qualitative Variables: The values of a qualitative variable are words or attributes indicating the category to which an element belongs.
Examples: blood type, nationality, student grades, educational level.

TYPES OF QUALITATIVE VARIABLES

(a) Nominal Qualitative Variables: A nominal variable classifies the observations into mutually exclusive, collectively exhaustive, non-ranked categories. The values of a nominal variable are names or attributes that cannot be ordered, sorted, or ranked.
Examples: blood type (O, AB, A, B), nationality (Saudi, Egyptian, British, …), sex (male, female).

(b) Ordinal Qualitative Variables: An ordinal variable classifies the observations into mutually exclusive, collectively exhaustive, ranked categories. The values of an ordinal variable are categories that can be ordered, sorted, or ranked by some criterion.
Examples: educational level (elementary, intermediate, …), student grade (A, B, C, D, F), military rank.

EXAMPLES

Choose the correct answer:
1. The natural hair color of 20 randomly selected fashion models is what type of data?
   a) qualitative b) continuous c) interval d) discrete
2. The ages of 20 randomly selected fashion models are what type of data?
   a) discrete b) quantitative c) interval d) qualitative
3. The amount of gasoline put into a car at a gas station is what type of data?
   a) quantitative b) nominal c) interval d) qualitative

EXAMPLES

Try it yourself:
1. The height of the Khalifa Tower building is an example of ……… data.
   a) quantitative b) nominal c) interval d) qualitative
2. Colors of baseball caps in a store are what type of data?
   a) discrete b) continuous c) interval d) qualitative
3. The time it takes to cut a lawn is what type of data?
   a) discrete b) continuous c) interval d) qualitative

CATEGORICAL VARIABLE

Organization: observations that have the same attributes are placed in the same category.
A dichotomous variable is a type of categorical variable that can take on only two values, such as:
- dead or alive
- yes or no
- female or male

MEASUREMENT SCALES

Variables can also be classified by how they are categorized, counted, or measured. The four levels of measurement are nominal, ordinal, interval, and ratio.
Qualitative: nominal scale, ordinal scale
Quantitative: interval scale, ratio scale

NOMINAL SCALE

Nominal scales categorize data into distinct categories or labels. These categories do not have any inherent order or numerical value.
No mathematical computations can be made at this level.
Example: eye color (blue, brown, green), gender, marital status, zip codes, area of residence, scientific major field (statistics, mathematics, computers, …).

ORDINAL SCALE

Consists of qualitative observations that not only differ from category to category but can also be ranked according to some criterion.
Example: Likert scales (extreme pain, moderate pain, mild pain, no pain), student grades (A+, A, B+, B, …), rating scales (poor, good, excellent), judging (first place, second place, etc.), ranking of players, etc.

INTERVAL SCALE

Numeric, ranked data where precise differences exist but there is no meaningful zero.
- It is not only possible to order the measurements; the distance between any two measurements is also known.
- Truly quantitative.
- No true zero: the zero point is arbitrary.
Example: temperature (in °C or °F), IQ, calendar dates, etc.

RATIO SCALE

Numeric, ranked data where precise differences exist and there is a true zero.
- Equality of ratios as well as equality of intervals can be determined.
- The zero point represents the absolute absence of the characteristic being measured, so statements such as "one value is twice as much as the other" make sense.
- Highest level of measurement.
Example: total cholesterol, weight, height, age, time, etc.
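The difference between the interval and ratio scales can be checked numerically. The sketch below uses temperature: the Celsius scale has an arbitrary zero (interval scale), whereas the Kelvin scale has a true zero (ratio scale). The helper name `celsius_to_kelvin` and the two temperatures are illustrative assumptions, not part of the course material.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius reading (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

t1_c, t2_c = 10.0, 20.0  # two hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale: both scales agree
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are meaningful only when the scale has a true zero
ratio_c = t2_c / t1_c  # equals 2, yet 20 degrees C is not "twice as hot" as 10 degrees C
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)  # roughly 1.035

print(f"difference: {diff_c:.1f} C = {diff_k:.1f} K")
print(f"ratio in Celsius: {ratio_c:.3f}, ratio in Kelvin: {ratio_k:.3f}")
```

The two differences match, but the two ratios do not. This is why "twice as much" statements are reserved for ratio-scale variables such as weight or age.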
EXAMPLES OF LEVELS OF MEASUREMENT

Nominal: zip code, Gamma ID, religious affiliation, hair color, country dial code, blood types, nationality, scientific major field (statistics, mathematics, etc.).
Ordinal: student letter grades, academic degree, player ranking, job degree, English levels, service quality rating, military rank, position in a race.
Interval: temperature, IQ test, SAT score, dress size, shoe size, International English Language Testing System (IELTS), Test of English as a Foreign Language (TOEFL), calendar dates.
Ratio: age, weight, number of sales, salary, distance, time, weekly food spending, number of emails received in a week.

EXAMPLES

- Weights of selected cell phones is an example of the ………… level of measurement.
  a) nominal b) ordinal c) interval d) ratio
- Categories of magazines in a physician's office (sports, women's, health, men's, news) is an example of the ………… level of measurement.
  a) nominal b) ordinal c) interval d) ratio
- Rankings of golfers in a … is an example of the ………… level of measurement.
  a) nominal b) ordinal c) interval d) ratio

HOW WELL ARE YOU PAYING ATTENTION?

A study wishes to assess birth characteristics in a population. For each of the following variables, give the appropriate measurement scale or type:
A. discrete B. continuous C. ordinal D. nominal E. dichotomous
a.____ Birthweight in grams
b.____ Birthweight classified as low, medium, high
c.____ Type of delivery classified as cesarean, natural

UNDERSTANDING THE MEANING OF DATA

Summarization techniques involve:
- Frequency distributions
- Descriptive measures

SUMMARY TABLES FOR QUALITATIVE DATA

A first step in organizing data is the preparation of an ordered array. An ordered array is a listing of the values in order of magnitude, from the smallest to the largest value.
Example: The following values are the ages of subjects who participated in a study on smoking cessation:
55 46 58 54 52 69 40 65 53 58

GROUPED DATA: THE FREQUENCY DISTRIBUTION

To group a set of observations, we select a suitable set of contiguous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals. These intervals are called "class intervals".

EXAMPLE

The following table gives the hemoglobin level (g/dL) of a sample of 50 men.

We wish to summarize these data using the following class intervals:
13.0 – 13.9, 14.0 – 14.9, 15.0 – 15.9, 16.0 – 16.9, 17.0 – 17.9, 18.0 – 18.9

Solution:
Variable = X = hemoglobin level (continuous, quantitative)
Sample size = n = 50
Max = 18.3
Min = 13.5

The grouped frequency distribution gives, for the hemoglobin levels of the 50 men, the frequency (the count of observations) in each class interval.

The relative frequency of a value is the proportion of observations in the distribution at that value: relative frequency = frequency / n.

Class Interval   Frequency   Relative frequency
13.0 – 13.9          3             0.06
14.0 – 14.9          5             0.10
15.0 – 15.9         15             0.30
16.0 – 16.9         16             0.32
17.0 – 17.9         10             0.20
18.0 – 18.9          1             0.02

A cumulative frequency distribution is a tabulation of the frequency of all measurements at or below a given score.

Class Interval   Frequency   Relative frequency   Cumulative frequency
13.0 – 13.9          3             0.06                    3
14.0 – 14.9          5             0.10                    8
15.0 – 15.9         15             0.30                   23
16.0 – 16.9         16             0.32                   39
17.0 – 17.9         10             0.20                   49
18.0 – 18.9          1             0.02                   50

A cumulative relative frequency distribution is a tabulation of the relative frequencies of all measurements at or below a given score.

Class Interval   Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
13.0 – 13.9          3             0.06                    3                       0.06
14.0 – 14.9          5             0.10                    8                       0.16
15.0 – 15.9         15             0.30                   23                       0.46
16.0 – 16.9         16             0.32                   39                       0.78
17.0 – 17.9         10             0.20                   49                       0.98
18.0 – 18.9          1             0.02                   50                       1.00

From frequencies:
- The number of people whose hemoglobin levels are between 17.0 and 17.9 = ____

From cumulative frequencies:
- The number of people whose hemoglobin levels are less than or equal to 15.9 = ____
- The number of people whose hemoglobin levels are less than or equal to 17.9 = ____

Question?
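The columns of the hemoglobin frequency table can be recomputed from the class frequencies alone. The following is a minimal sketch that takes the six frequencies from the example as given; the variable names are illustrative assumptions, and the raw 50 measurements are not used because only the grouped counts are needed.

```python
from itertools import accumulate

# Class intervals and frequencies taken from the hemoglobin example (n = 50)
intervals = ["13.0-13.9", "14.0-14.9", "15.0-15.9",
             "16.0-16.9", "17.0-17.9", "18.0-18.9"]
frequencies = [3, 5, 15, 16, 10, 1]

n = sum(frequencies)                                # sample size
relative = [f / n for f in frequencies]             # relative frequency = frequency / n
cumulative = list(accumulate(frequencies))          # running total of the frequencies
cumulative_relative = [c / n for c in cumulative]   # cumulative frequency / n

# Print the full table, one class interval per row
print(f"{'Class interval':<15}{'Freq':>6}{'Rel':>8}{'Cum':>6}{'Cum rel':>9}")
for row in zip(intervals, frequencies, relative, cumulative, cumulative_relative):
    print(f"{row[0]:<15}{row[1]:>6}{row[2]:>8.2f}{row[3]:>6}{row[4]:>9.2f}")
```

Individual entries can then be read by position. For example, `frequencies[4]` is the count in the class 17.0 to 17.9, and `cumulative[2]` is the number of observations at or below 15.9.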
