Biostatistics Lecture Notes (PDF)
Summary
These lecture notes cover an introduction to biostatistics, explaining fundamental concepts and examples. The notes introduce data, data sets, parameters, and different sampling techniques. This includes various examples covering descriptive and inferential statistics.
Full Transcript
Course: Biostatistics Lecture No: [ 2 ] Chapter: [ 1 ] Introduction to Statistics Section: [ 1.1 ] An Overview of Statistics

What is DATA?
Data: Consists of information coming from observations, counts, measurements, or responses.
Example: According to a survey, more than 7 in 10 Americans say a nursing career is a prestigious occupation.
Example: “Social media consumes kids today as well, as more score their first social media accounts at an average age of 11.4 years old.”

What is STATISTICS?
Statistics: The science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

Data Sets
Population: The collection of all outcomes, responses, measurements, or counts that are of interest.
Sample: A subset, or part, of the population.

Example: Identifying Data Sets
In a recent survey, 834 employees in the United States were asked if they thought their jobs were highly stressful. Of the 834 respondents, 517 said yes.
1. Identify the population and the sample.
Population: the responses of all employees in the U.S.
Sample: the responses of the 834 employees in the survey.
2. Describe the sample data set.
The data set consists of 517 YES’s and 317 NO’s.

Parameter and Statistic
Parameter: A numerical description of a population characteristic. Example: the average age of all people in Jordan.
Statistic: A numerical description of a sample characteristic. Example: the average age of people from a sample of three cities.

Parameter and Statistic
Example: Decide whether each number describes a population parameter or a sample statistic.

A survey of several hundred collegiate student-athletes in the United States found that, during the season of their sport, the average time spent on athletics by student-athletes is 50 hours per week. Because the average of 50 hours per week is based on a subset of the population, it is a sample statistic.
The freshman class at a university has an average SAT math score of 514. Because the average SAT math score of 514 is based on the entire freshman class, it is a population parameter.

In a random check of several hundred retail stores, the Food and Drug Administration found that 34% of the stores were not storing fish at the proper temperature. Because 34% is based on a subset of the population, it is a sample statistic.

Branches of Statistics
Descriptive Statistics: Involves organizing, summarizing, and displaying data. Examples: tables, charts, averages.
Inferential Statistics: Involves using sample data to draw conclusions about a population.

Branches of Statistics
Example: A study of 2560 U.S. adults found that, of adults not using the Internet, 23% are from households earning less than $30000 annually, as shown in the figure. For this study:
1. Identify the population and the sample.
2. Then determine which part of the study represents the descriptive branch of statistics.
3. What conclusions might be drawn from the study using inferential statistics?

Answer to 1: The population consists of the responses of all U.S. adults, and the sample consists of the responses of the 2560 U.S. adults in the study.
Answer to 2: The descriptive branch of statistics involves the statement that 23% of U.S. adults not using the Internet are from households earning less than $30000 annually.

Branches of Statistics
Example (continued): A study of 2560 U.S.
adults found that, of adults not using the Internet, 23% are from households earning less than $30000 annually, as shown in the figure.
3. What conclusions might be drawn from the study using inferential statistics?
A possible inference drawn from the study is that lower-income households cannot afford access to the Internet.

Course: Biostatistics Lecture No: Chapter: Introduction to Statistics Section: [1.2] Data Classification

Types of Data
Qualitative Data: Consists of attributes, labels, or nonnumerical entries. Examples: major, place of birth, eye color.
Quantitative Data: Numerical measurements or counts. Examples: age, weight, temperature.

Course: Biostatistics Lecture No: Chapter: Introduction to Statistics Section: [1.3] Data Collection and Experimental Design

DESIGN OF A STATISTICAL STUDY
The goal of every statistical study is to collect data and then use the data to make a decision. Before interpreting the results of a study, you should be familiar with how to design a statistical study.

Designing a Statistical Study
1. Identify the variable(s) of interest (the focus) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population.
3. Collect the data.
4. Describe the data, using descriptive statistics techniques.
5. Interpret the data and make decisions about the population using inferential statistics.
6. Identify any possible errors.

Categories of a Statistical Study
Observational Study: A researcher does not influence the responses.
Experiment: A researcher deliberately applies a treatment before observing the responses.
Example: An observational study was performed in which researchers observed and recorded the mouthing behavior on nonfood objects of children up to three years old.

DESIGN OF A STATISTICAL STUDY
Categories of a Statistical Study: Experiment
A researcher deliberately applies a treatment before observing the responses. A treatment is applied to part of a population, called a treatment group, and responses are observed. Another part of the population may be used as a control group, in which no treatment is applied. (The subjects in both groups are called experimental units.) In many cases, subjects in the control group are given a placebo, which is a harmless, fake treatment that is made to look like the real treatment.

It is a good idea to use the same number of subjects for each group.

Example: An experiment was performed in which diabetics took cinnamon extract daily while a control group took none. After 40 days, the diabetics who took the cinnamon reduced their risk of heart disease while the control group experienced no change.

Example: Observational Study or an Experiment?
Researchers study the effect of vitamin D3 supplementation among patients with antibody deficiency or frequent respiratory tract infections. To perform the study, 70 patients receive 4000 IU of vitamin D3 daily for a year. Another group of 70 patients receive a placebo daily for one year.
Because the study applies a treatment (vitamin D3) to the subjects, the study is an experiment.

Example: Observational Study or an Experiment?
Researchers conduct a study to find the U.S. public approval rating of the U.S. president. To perform the study, researchers call 1500 U.S. residents and ask them whether they approve or disapprove of the job being done by the president.
Because the study does not attempt to influence the responses of the subjects (there is no treatment), the study is an observational study.

DATA COLLECTION
Simulation:
Uses a mathematical or physical model to reproduce the conditions of a situation or process.
Often involves the use of computers.
Allows you to study situations that are impractical or even dangerous to create in real life.
Often saves time and money.
Example: Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.

Survey:
An investigation of one or more characteristics of a population.
Surveys are carried out on people by asking them questions.
Commonly done by interview, Internet, phone, or mail.
In designing a survey, it is important to word the questions so that they do not lead to biased results, which are not representative of a population.

EXPERIMENTAL DESIGN
To produce meaningful unbiased results, experiments should be carefully designed and executed. Three key elements of a well-designed experiment are:
1. Control
2. Randomization
3. Replication

EXPERIMENTAL DESIGN: Control
Because experimental results can be ruined by a variety of factors, being able to control these influential factors is important. One such factor is a confounding variable. A confounding variable occurs when an experimenter cannot tell the difference between the effects of different factors on the variable.
Example: A coffee shop owner remodels her shop at the same time a nearby mall has its grand opening. If business at the coffee shop increases, it cannot be determined whether it is because of the remodeling or the new mall.

Another factor that can affect experimental results is the placebo effect. The placebo effect occurs when a subject reacts favorably to a placebo when in fact the subject has been given a fake treatment. To help control or minimize the placebo effect, a technique called blinding can be used.
Blinding is a technique where the subject does not know whether he or she is receiving a treatment or a placebo. In a double-blind experiment, neither the subject nor the experimenter knows whether the subject is receiving a treatment or a placebo.

EXPERIMENTAL DESIGN: Randomization
Randomization is a process of randomly assigning subjects to different treatment groups.
Randomized block design: Divide subjects with similar characteristics into blocks, and then within each block, randomly assign subjects to treatment groups.
Example: An experimenter who is testing the effects of a new weight loss drink may first divide the subjects into age categories and then, within each age group, randomly assign subjects to either the treatment group or the control group.

EXPERIMENTAL DESIGN: Replication
Replication is the repetition of an experiment under the same or similar conditions. Sample size, which is the number of subjects in a study, is another important part of experimental design.
Example: Suppose an experiment is designed to test a vaccine against a strain of influenza. In the experiment, 10,000 people are given the vaccine and another 10,000 people are given a placebo. Because of the sample size, the effectiveness of the vaccine would most likely be observed. But if the subjects in the experiment are not selected so that the two groups are similar (according to age and gender), the results are of less value.

EXPERIMENTAL DESIGN
Example: A company wants to test the effectiveness of a new gum developed to help people quit smoking. Identify a potential problem with the given experimental design and suggest a way to improve it.
Problem with the following design: The sample size being used is not large enough.
Design: The company identifies ten adults who are heavy smokers. Five of the subjects are given the new gum and the other five subjects are given a placebo.
After two months, the subjects are evaluated, and it is found that the five subjects using the new gum have quit smoking.
Solution: The experiment must be replicated.

EXPERIMENTAL DESIGN
Example: A company wants to test the effectiveness of a new gum developed to help people quit smoking. Identify a potential problem with the given experimental design and suggest a way to improve it.
Design: The company identifies one thousand adults who are heavy smokers. The subjects are divided into blocks according to gender. After two months, the female group has a significant number of subjects who have quit smoking.
Problem: The groups are not similar. The new gum may have a greater effect on women than men, or vice versa.
Solution: The subjects must be randomly assigned to the treatment group or the control group.

Sampling Techniques
A census is a count or measure of an entire population. Taking a census provides complete information, but it is often costly and difficult to perform.
A sampling is a count or measure of part of a population and is more commonly used in statistical studies. To collect unbiased data, a researcher must ensure that the sample is representative of the population.
Even with the best methods of sampling, a sampling error may occur. A sampling error is the difference between the results of a sample and those of the population.

Sampling Techniques
A random sample is one in which every member of the population has an equal chance of being selected.
A simple random sample is a sample in which every possible sample of the same size has the same chance of being selected. One way to collect a simple random sample is to assign a different number to each member of the population and then use a random number table, a calculator, or computer software to generate random numbers.

Sampling Techniques
When you choose members of a sample, you should decide whether it is acceptable to have the same population member selected more than once.
If it is acceptable, then the sampling process is said to be with replacement. If it is not acceptable, then the sampling process is said to be without replacement.

There are several other commonly used sampling techniques. Each has advantages and disadvantages.
1. Stratified Sample
2. Cluster Sample
3. Systematic Sample
4. Convenience Sample

Sampling Techniques: Stratified Sample
Divide a population into groups (strata) and select a random sample from each group.
Example: To collect a stratified sample of the number of people who live in Amman households, you could divide the households into socioeconomic levels and then randomly select households from each level.

Sampling Techniques: Cluster Sample
Divide the population into groups (clusters) and select all of the members in one or more, but not all, of the clusters.
In the Amman example, you could divide the households into clusters according to zones, then select all the households in one or more, but not all, zones.

Sampling Techniques: Systematic Sample
Choose a starting value at random. Then choose every $k$th member of the population.
In the Amman example, you could assign a different number to each household, randomly choose a starting number, then select every 100th household.

Sampling Techniques: Convenience Sample
Choose only members of the population that are easy to get. Often leads to biased studies (not recommended).

Sampling Techniques
Example: Identifying Sampling Techniques
You are doing a study to determine the opinion of students at your school regarding stem cell research. Identify the sampling technique used in each scenario.
Scenario 1: You divide the student population with respect to majors and randomly select and question some students in each major.
Scenario 2: You assign each student a number and generate random numbers. You then question each student whose number is randomly selected.
Scenario 3: You select students who are in your biology class.
Answers (in the order the scenarios are listed): Stratified Sample, Simple Random Sample, Convenience Sample.

Course: Biostatistics Lecture No: Chapter: Descriptive Statistics Section: [2.3] Measures of Central Tendency

Where You Have Been
You learned that there are many ways to collect data. Usually, researchers must work with sample data in order to analyze populations. Occasionally, it is possible to collect all the data for a given population.

Where You Are Going
You will review some ways to organize and describe data sets. The goal is to make the data easier to understand by describing trends, averages, and variations.

MEAN
A measure of central tendency is a value that represents a typical, or central, entry of a data set. The three most used measures of central tendency are
1. the mean,
2. the median,
3. and the mode.

MEAN
The mean (average) of a data set is the sum of the data entries ($\sum x$) divided by the number of entries ($N$ or $n$). To find the mean of a data set, use one of these formulas:
Population Mean: $\mu = \frac{\sum x}{N}$
Sample Mean: $\bar{x} = \frac{\sum x}{n}$

MEAN
Example: The weights (in pounds) for a sample of adults before starting a weight-loss study are listed. What is the mean weight of the adults?
274 235 223 268 290 285 235
$\bar{x} = \frac{\sum x}{n} = \frac{274 + 235 + 223 + 268 + 290 + 285 + 235}{7} = \frac{1810}{7} \approx 258.57$
The mean weight of the adults is about 258.6 pounds.

MEAN
Advantage of using the mean: The mean is a reliable measure because it considers every entry of a data set.
Disadvantage of using the mean: It is greatly affected by outliers (an outlier is a data entry that is far removed from the other entries in the data set).

MEAN OF GROUPED DATA [WEIGHTED MEAN]
Example: Consider the following data (20 entries, of which the first ten are listed here):
1 4 3 2 3 2 2 1 3 4
Adding all 20 entries one at a time is tedious:
$\bar{x} = \frac{1 + 4 + 3 + \cdots + 1 + 2 + 3}{20}$ ?!
Remaining ten entries of the data: 3 2 1 4 3 1 1 2 1 3
Grouping equal entries into a frequency table gives the mean as a weighted mean, where $f$ is the frequency of each value $x$:
$\bar{x} = \frac{\sum x \cdot f}{\sum f}$

x    Frequency (f)
1    6
2    5
3    6
4    3
     Total: 20

$\bar{x} = \frac{1 \times 6 + 2 \times 5 + 3 \times 6 + 4 \times 3}{20} = \frac{6 + 10 + 18 + 12}{20} = \frac{46}{20} = 2.3$

Course: Biostatistics Lecture No: Chapter: Descriptive Statistics Section: [2.4] Measures of Variation

RANGE
In this section, you will learn different ways to measure the variation (or spread) of a data set. The simplest measure is the range of the set.
Range = (Maximum data entry) − (Minimum data entry)
The range has the advantage of being easy to compute. Its disadvantage is that it uses only two entries from the data set.

VARIANCE AND STANDARD DEVIATION
The variance and the standard deviation are two measures of variation that use all the entries in a data set. Before you learn about these measures of variation, you need to know what is meant by the deviation of an entry in a data set.
The deviation of an entry $x$ in a population data set is the difference between the entry and the mean $\mu$ of the data set.
[Deviation of $x$] $= x - \mu$
The deviations always sum to zero: $\sum (x - \mu) = 0$

VARIANCE AND STANDARD DEVIATION
Example: The average of the data is $\mu = 8$.

x     x − μ
11    3
7     −1
3     −5
13    5
6     −2
      Σ = 0

VARIANCE AND STANDARD DEVIATION
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$

VARIANCE AND STANDARD DEVIATION
Example: Find the population variance of the following data.
$N = 8$
$\mu = \frac{14 + 12 + 6 + 13 + 2 + 11 + 9 + 5}{8} = \frac{72}{8} = 9$

x     (x − μ)²
14    25
12    9
6     9
13    16
2     49
11    4
9     0
5     16
      Σ = 128

Variance: $\sigma^2 = \frac{128}{8} = 16$
Standard Deviation: $\sigma = \sqrt{16} = 4$

VARIANCE AND STANDARD DEVIATION
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Sample Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$

VARIANCE AND STANDARD DEVIATION
Example: Find the sample variance of the following data.
$n = 8$

x     x²
10    100
5     25
3     9
6     36
8     64
2     4
4     16
14    196
Σ = 52    Σ = 450

Variance: $s^2 = \frac{450 - \frac{52^2}{8}}{8 - 1} = \frac{450 - 338}{7} = \frac{112}{7} = 16$
Standard Deviation: $s = \sqrt{16} = 4$

VARIANCE AND STANDARD DEVIATION
NOTES
The standard deviation measures the variation of the data set about the mean and has the same units of measure as the data set.
The standard deviation is always greater than or equal to 0.
When $\sigma = 0$, the data set has no variation, and all entries have the same value.
As the entries get farther from the mean (that is, more spread out), the value of $\sigma$ increases.

EMPIRICAL RULE (or 68 – 95 – 99.7 RULE)
For data sets with distributions that are approximately symmetric and bell-shaped, the standard deviation has these characteristics.
About 68% of the data lie within one standard deviation of the mean.
About 95% of the data lie within two standard deviations of the mean.
About 99.7% of the data lie within three standard deviations of the mean.
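
As a minimal sketch (not part of the original notes), the variance and standard deviation formulas and the empirical-rule intervals can be checked in Python. The function names are illustrative assumptions. The data are the two eight-entry examples from these notes, and the intervals are computed for a hypothetical bell-shaped data set with mean 9 and standard deviation 4.

```python
import math
import statistics


def population_variance(data):
    """Definition form: sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance_shortcut(data):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    return (total_sq - total ** 2 / n) / (n - 1)


def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of the data,
    assuming the distribution is roughly symmetric and bell-shaped."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Example data copied from the notes.
population_data = [14, 12, 6, 13, 2, 11, 9, 5]
sample_data = [10, 5, 3, 6, 8, 2, 4, 14]

pop_var = population_variance(population_data)
samp_var = sample_variance_shortcut(sample_data)

# Cross-check against the standard library implementations.
assert math.isclose(pop_var, statistics.pvariance(population_data))
assert math.isclose(samp_var, statistics.variance(sample_data))

print(pop_var, math.sqrt(pop_var))    # 16.0 4.0
print(samp_var, math.sqrt(samp_var))  # 16.0 4.0

# Hypothetical bell-shaped data set with mean 9 and standard deviation 4.
print(empirical_rule_intervals(9, 4))  # {1: (5, 13), 2: (1, 17), 3: (-3, 21)}
```

The population function divides by N while the sample function divides by n − 1. That divisor is the only difference between the two variance formulas in the notes, and the shortcut form avoids computing each deviation separately.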