BST 312 Chapter 1 PDF

# BST 312

## Chapter 1

## SOME BASIC CONCEPTS:

### Statistics:

The science of conducting studies to collect, organize, present (summarize), analyze, and draw conclusions from data.

| | | | | |
| :---------- | :---------- | :------- | :-------- | :------- |
| Collect data | Organize data | Present data | Analyze data | Draw conclusions |

## WHY DO WE NEED TO KNOW ABOUT STATISTICS?

* Statistics helps simplify complex data so that it becomes understandable.
* We can represent things in their true form with the help of tables and figures. Without a statistical study, our ideas would be vague and indefinite.
* With the help of statistics we can frame favorable and intelligent strategies and policies.
* Statistics helps in shaping future policies.
* The future is uncertain, but statistics helps us make sound estimates about many phenomena by collecting and analyzing data from the past.

## DESCRIPTIVE VS. INFERENTIAL STATISTICS

### Descriptive Statistics:

Consists of collecting, organizing, and summarizing data in a meaningful way that makes it easy for an interested reader to understand.

### Inferential Statistics:

Consists of generalizing from samples to populations, performing estimation and tests of hypotheses, determining the relationships among variables, and making predictions. Inferential statistics uses probability (the chance of an event occurring).

**Note:** The main difference between these branches is that in inferential statistics the results are generalized to the whole population, while in descriptive statistics the results remain limited to the sample or population being studied.

## INTRODUCTION

**Biostatistics:** When the data are obtained from the biological sciences and medicine, we use the term "biostatistics". Biostatistics is about information: how it is obtained, how it is analyzed, and how it is interpreted.

The objective of the course is to learn:

1. **How to organize, summarize, and describe data.** (Descriptive Statistics)
2. **How to reach decisions about a large body of data by examining only a small part of the data.** (Inferential Statistics)

## WHICH BRANCH OF STATISTICS IS USED IN THESE STATEMENTS?

* Based on Ibrahim's electric bill for last year, he expects that he will be paying SAR 200 each month this year. **Inferential statistics**
* Last year's total attendance at Long Run High School's football games was 8235. **Descriptive statistics**
* According to Saudi Arabia's Vision 2030, non-oil government revenue is expected to increase from SAR 163 billion to SAR 1 trillion. **Inferential statistics**

## EXERCISE

**Try it by yourself**

**Which branch of statistics is used in these statements?**

1. Generalizing from a sample to a population in a meaningful way.
2. The average life expectancy in China is 80 years.
3. By 2040 at least 3.5 billion people will run short of water.
4. The median household income for people aged 25–34 is $35,888.

## SOME BASIC CONCEPTS:

### Data:

Data are the values (measurements or observations) that the variables can assume. A collection of data is a data set, and each value in the data set is called a datum (data value). Data are the raw material of statistics.

There are two types of data:

1. **Quantitative data** (numbers: weights, ages, ...)
2. **Qualitative data** (words or attributes: nationalities, occupations, ...)

## DATA COLLECTION SOURCES

**A) Primary Data:** means "first-hand information" collected by an investigator.

* It is collected for the first time.
* It is original and more reliable.
* For example, a population census conducted by the government every 10 years.

**B) Secondary Data**

* Secondary data refers to "second-hand information".
* These data are not collected originally but are obtained from already published or unpublished sources.

## EXAMPLES OF SOURCES OF DATA:

1. Routinely kept records.
2. Surveys.
3. Experiments.
4. External sources.
(published repo

## TERMINOLOGY

* **Population:** The largest collection of entities (such as persons, animals, or cells) in which we have an interest at a particular time.
  * Example: If we are interested in the weights of students enrolled in the college of engineering at PSMCHS, then our population consists of the weights of all of these students, and our variable of interest is the weight.
* **Population Size (N):** The number of elements in the population is called the population size and is denoted by N.
* **Sample:** A subset (part or fraction) of the population.
  * Example: Suppose that we are interested in studying the characteristics of the weights of the students enrolled in the college of engineering at PSMCHS. If we randomly select 50 students from the college of engineering at PSMCHS and measure their weights, then the weights of these 50 students form our sample.
* **Sample Size (n):** The number of elements in the sample is called the sample size and is denoted by n.
* **Parameter:** A descriptive measure computed from the data in a population. Usually represented by Greek letters, such as the population average μ or the population standard deviation σ.
  * Values of the parameters are unknown in general.
  * We are interested in knowing the true values of the parameters.
* **Statistic:** A descriptive measure computed from the data in a sample. Usually represented by Roman letters, such as the sample average x̄ or the sample standard deviation S.
  * Values of statistics are known in general.
  * Since parameters are unknown, statistics are used to approximate (estimate) parameters.

**Variable:** A characteristic that can take on different values in different people, animals, or things.

**Random variable:** A variable whose values are determined by chance.

**Examples of variables:** Number of patients, Height, Gender, and Educational Level

## TYPES OF VARIABLES

* **1. Quantitative Variables:** A quantitative variable is a characteristic that can be measured. The values of a quantitative variable are numbers indicating how much or how many of something, that is, information regarding amounts. Example: Family size, No. of patients, Weight, and Height
* **2. Qualitative Variables:** The values of a qualitative variable are words or attributes indicating to which category an element belongs. Example: Blood type, Nationality, Students' grades, Educational level

## TYPES OF QUANTITATIVE VARIABLES:

* **(a) Discrete Variables:** There are jumps or gaps between the values. Example: Family size (x = 1, 2, 3, ...), Number of patients (x = 0, 1, 2, 3, ...)
* **(b) Continuous Variables:** There are no gaps between the values. A continuous variable can have any value within a certain interval of values. Example: Height (140 < x < 190), Blood sugar level (10 < x < 15)

## TYPES OF QUALITATIVE VARIABLES:

* **(a) Nominal Qualitative Variables:** A nominal variable classifies the observations into mutually exclusive categories that have no ranking. The values of a nominal variable are names or attributes that cannot be ordered, sorted, or ranked. Example: Blood type (O, AB, A, B), Nationality (Saudi, Egyptian, British, ...), Sex (male, female)
* **(b) Ordinal Qualitative Variables:** An ordinal variable classifies the observations into mutually exclusive categories that can be ranked. The values of an ordinal variable are categories that can be ordered, sorted, or ranked by some criterion. Example: Educational level (elementary, intermediate, ...), Students' grades (A, B, C, D, F), Military rank

## CATEGORICAL VARIABLE

Organization: observations that have the same attributes are in the same category.
**A dichotomous variable** is a type of categorical variable that can take on only two values, such as:

* dead or alive
* yes or no
* female or male

## MEASUREMENT SCALES

* Variables can also be classified by how they are categorized, counted, or measured.
* The four levels of measurement are: nominal, ordinal, interval, and ratio.

| Scale | Type of variable |
| :------------------ | :------- |
| NOMINAL SCALE | Qualitative |
| ORDINAL SCALE | Qualitative |
| INTERVAL SCALE | Quantitative |
| RATIO SCALE | Quantitative |

## NOMINAL SCALE

Nominal scales categorize data into distinct categories or labels. These categories don't have any inherent order or numerical value. No mathematical computations can be made at this level.

**Example:** Eye color (Blue, Brown, Green), Gender, marital status, zip codes, area of residence, Scientific major field (statistics, mathematics, computers, ...)

## ORDINAL SCALE

Consists of qualitative observations that are not only different from category to category but can also be ranked according to some criterion.

**Example:** Likert scales (Extreme Pain, Moderate Pain, Mild Pain, No Pain), Students' grades (A+, A, B+, B, ...), rating scale (poor, good, excellent), judging (first place, second place, etc.), ranking of players, etc.

## INTERVAL SCALE

Numeric, ranked data in which precise differences between values exist, but there is no meaningful zero. It is possible not only to order the measurements; the distance between any two measurements is also known.

* Truly quantitative
* No true zero: the zero point is arbitrary

**Example:** temperature, IQ, calendar dates, etc.

## RATIO SCALE

Numeric, ranked data in which precise differences between values exist and there is a true zero. This scale is characterized by the fact that equality of ratios as well as equality of intervals may be determined.

* The zero point represents the absolute absence of the characteristic being measured (statements such as "one number is twice as much as the other" make sense).
* Highest level of measurement

**Example:** total cholesterol, weight, height, age, time, etc.

## EXAMPLES OF LEVELS OF MEASUREMENT

| Nominal | Ordinal | Interval | Ratio |
| :--- | :--- | :--- | :--- |
| Zip Code | Student letter grades | Temperature | Age |
| Gamma ID | Academic degree | IQ Test | Weight |
| Religious affiliation | Player ranking | SAT Score | Number of sales |
| Hair color | English levels | Dress size | Salary |
| Country dial code | Job degree | Shoe size | Distance |
| Blood types | Service quality rating | International English Language Testing System (IELTS) | Time |
| Nationality | Military rank | Test of English as a Foreign Language (TOEFL) | Weekly food spending |
| Scientific major field (statistics, mathematics, etc.) | Position in a race | Calendar dates | Number of emails received in a week |

## EXAMPLES

**Weights of selected cell phones is an example of the ......... level of measurement.**

a) nominal b) ordinal c) interval d) ratio

**Categories of magazines in a physician's office (sports, women's, health, men's, news) is an example of the ......... level of measurement.**

a) nominal b) ordinal c) interval d) ratio

**Rankings of golfers in a ... is an example of the ......... level of measurement.**

a) nominal b) ordinal c) interval d) ratio

**Temperatures inside 10 pizza ovens is an example of the ......... level of measurement.**

a) nominal b) ordinal c) interval d) ratio

## HOW WELL ARE YOU PAYING ATTENTION?

A study wishes to assess birth characteristics in a population. For the following variables, describe the appropriate measurement scale or type:

* a. Birthweight in grams
* b. Birthweight classified as low, medium, high
* c. Type of delivery classified as cesarean, natural

Choices:

* A. discrete
* B. continuous
* C. ordinal
* D. nominal
* E. dichotomous

## UNDERSTANDING THE MEANING OF DATA

Summarization techniques involve:

* **Frequency distributions**
* **Descriptive measures**

## SUMMARY TABLES FOR QUALITATIVE DATA

A first step in organizing data is the preparation of an ordered array. An ordered array is a listing of the values in order of magnitude from the smallest to the largest value.

**Example:** The following values represent a list of ages of subjects who participated in a study on smoking cessation:

55 46 58 54 52 69 40 65 53 58

The ordered array is:

40 46 52 53 54 55 58 58 65 69

## GROUPED DATA: THE FREQUENCY DISTRIBUTION:

To group a set of observations, we select a suitable set of contiguous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals. These intervals are called "class intervals".

## EXAMPLE:

The following table gives the hemoglobin level (g/dl) of a sample of 50 men.

| 17.0 | 17.7 | 15.9 | 15.2 | 16.2 | 17.1 | 15.7 | 17.3 | 13.5 | 16.3 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 14.6 | 15.8 | 15.3 | 16.4 | 13.7 | 16.2 | 16.4 | 16.1 | 17.0 | 15.9 |
| 14.0 | 16.2 | 16.4 | 14.9 | 17.8 | 16.1 | 15.5 | 18.3 | 15.8 | 16.7 |
| 15.9 | 15.3 | 13.9 | 16.8 | 15.9 | 16.3 | 17.4 | 15.0 | 17.5 | 16.1 |
| 14.2 | 16.1 | 15.7 | 15.1 | 17.4 | 16.5 | 14.4 | 16.3 | 17.3 | 15.8 |

## EXAMPLE CONT.

We wish to summarize these data using the following class intervals:

* 13.0-13.9, 14.0-14.9, 15.0-15.9,
* 16.0-16.9, 17.0-17.9, 18.0-18.9

## EXAMPLE CONT.

Solution:

* Variable = X = hemoglobin level (continuous, quantitative)
* Sample size = n = 50
* Max = 18.3
* Min = 13.5

## EXAMPLE CONT.
| Class Interval | Tally | Frequency |
| :------------ | :---- | :-------- |
| 13.0-13.9 | /// | 3 |
| 14.0-14.9 | ///// | 5 |
| 15.0-15.9 | ///// ///// ///// | 15 |
| 16.0-16.9 | ///// ///// ///// / | 16 |
| 17.0-17.9 | ///// ///// | 10 |
| 18.0-18.9 | / | 1 |

## EXAMPLE CONT.

The grouped frequency distribution for the 50 men is:

| Class Interval | Frequency (no. of men) |
| :------------ | :-------- |
| 13.0-13.9 | 3 |
| 14.0-14.9 | 5 |
| 15.0-15.9 | 15 |
| 16.0-16.9 | 16 |
| 17.0-17.9 | 10 |
| 18.0-18.9 | 1 |
| **Total** | **n=50** |

**Frequency:** the count in each category.

## EXAMPLE CONT.

The relative frequency of a value is the proportion of observations in the distribution at that value, that is, the frequency divided by n. For the first class interval, 3/50 = 0.06.

| Class Interval | Relative frequency |
| :------------ | :----------------- |
| 13.0-13.9 | 0.06 |
| 14.0-14.9 | 0.10 |
| 15.0-15.9 | 0.30 |
| 16.0-16.9 | 0.32 |
| 17.0-17.9 | 0.20 |
| 18.0-18.9 | 0.02 |

## EXAMPLE CONT.

A cumulative frequency distribution is a tabulation of the frequency of all measurements at or below a given score.

| Class Interval | Frequency | Relative frequency | Cumulative frequency | Cumulative relative frequency |
| :------------ | :-------- | :----------------- | :-------------------- | :-------------------------- |
| 13.0-13.9 | 3 | 0.06 | 3 | 0.06 |
| 14.0-14.9 | 5 | 0.10 | 8 | 0.16 |
| 15.0-15.9 | 15 | 0.30 | 23 | 0.46 |
| 16.0-16.9 | 16 | 0.32 | 39 | 0.78 |
| 17.0-17.9 | 10 | 0.20 | 49 | 0.98 |
| 18.0-18.9 | 1 | 0.02 | 50 | 1.00 |

## EXAMPLE CONT.
A cumulative relative frequency distribution is a tabulation of the relative frequencies of all measurements at or below a given score.

| Class Interval | Frequency | Relative frequency | Cumulative frequency | Cumulative relative frequency |
| :------------ | :-------- | :----------------- | :-------------------- | :-------------------------- |
| 13.0-13.9 | 3 | 0.06 | 3 | 0.06 |
| 14.0-14.9 | 5 | 0.10 | 8 | 0.16 |
| 15.0-15.9 | 15 | 0.30 | 23 | 0.46 |
| 16.0-16.9 | 16 | 0.32 | 39 | 0.78 |
| 17.0-17.9 | 10 | 0.20 | 49 | 0.98 |
| 18.0-18.9 | 1 | 0.02 | 50 | 1.00 |

**From frequencies:**

* The number of people whose hemoglobin levels are between 17.0 and 17.9 = 10

**From cumulative frequencies:**

* The number of people whose hemoglobin levels are less than or equal to 15.9 = 23
* The number of people whose hemoglobin levels are less than or equal to 17.9 = 49

## Question?
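
As a cross-check on the hemoglobin example, the frequency table can be rebuilt from the 50 raw values with a short Python sketch. This is only an illustration and not part of the course material: the variable names (`hemoglobin`, `counts`, `rows`) are made up here, and the code assumes that each class interval k.0-k.9 can be identified by the whole-number part k of the measurement.

```python
import math
from collections import Counter

# Hemoglobin levels (g/dl) of the 50 men, copied from the example table.
hemoglobin = [
    17.0, 17.7, 15.9, 15.2, 16.2, 17.1, 15.7, 17.3, 13.5, 16.3,
    14.6, 15.8, 15.3, 16.4, 13.7, 16.2, 16.4, 16.1, 17.0, 15.9,
    14.0, 16.2, 16.4, 14.9, 17.8, 16.1, 15.5, 18.3, 15.8, 16.7,
    15.9, 15.3, 13.9, 16.8, 15.9, 16.3, 17.4, 15.0, 17.5, 16.1,
    14.2, 16.1, 15.7, 15.1, 17.4, 16.5, 14.4, 16.3, 17.3, 15.8,
]

n = len(hemoglobin)  # sample size

# Assumption: a value such as 15.7 belongs to the class 15.0-15.9,
# so the class is determined by the whole-number part of the value.
counts = Counter(math.floor(x) for x in hemoglobin)

rows = []
cumulative = 0
for lower in range(13, 19):  # classes 13.0-13.9 up to 18.0-18.9
    frequency = counts[lower]
    cumulative += frequency
    rows.append((
        f"{lower}.0-{lower}.9",  # class interval label
        frequency,               # frequency
        frequency / n,           # relative frequency
        cumulative,              # cumulative frequency
        cumulative / n,          # cumulative relative frequency
    ))

print(f"{'Class':<11}{'Freq':>6}{'Rel':>7}{'Cum':>6}{'CumRel':>8}")
for label, freq, rel, cum, cum_rel in rows:
    print(f"{label:<11}{freq:>6}{rel:>7.2f}{cum:>6}{cum_rel:>8.2f}")
```

The printed columns should match the table in the example. Reading the cumulative column at the 15.0-15.9 and 17.0-17.9 rows gives the counts of men at or below 15.9 and 17.9.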
