Biostat Compilation PDF
Summary
This document provides a concise introduction to the field of biostatistics. It covers definitions, various types of statistics (descriptive and inferential), and the use of different types of statistics in the field.
Biostat
Introduction to Biostatistics

What is statistics?
1. In the plural sense, statistics is an aggregate or collection of numerical facts or figures. Examples: vital statistics (numerical data on marriages, births, divorces, etc.) and social statistics (numerical data on health, education, crime, etc.). In the plural sense, statistics also refers to numerical measures obtained from a sample. Example: the sample mean.
2. In the singular sense, statistics is the science of collecting, organizing, presenting, analyzing and interpreting numerical data.
Statistics is used in almost all fields of study, such as natural science, social science, engineering, medicine and agriculture.

Importance and applications of statistics
Statistical methods can be applied to any situation that involves uncertainty.

Uses of statistics
1. It helps to present data in a definite and precise form.
2. It helps with data reduction.
3. It helps to forecast the future.
4. It helps to identify the relationship between two or more variables.
5. It helps to formulate and test hypotheses.
6. It helps to formulate policy and make decisions.

Limitations of statistics
1. Statistics are numerically expressed; qualitative statements are not statistics.
2. Statistics are aggregates of facts; single, isolated figures are not statistics.
3. Statistical data are only approximately, not mathematically, correct.

Generally, the field of statistics can be divided into two:
1. Mathematical statistics: studies and develops statistical theory and methods in the abstract.
2. Applied statistics: applies statistical methods to solve real problems involving randomly generated data, and develops new statistical methodology motivated by real problems.

Biostatistics is a branch of applied statistics. It is the application of statistical methods and tools to data derived from biology, the health sciences and medicine.

Uses of biostatistics
- Providing methods of organizing data
- Assessment of health status
- Health program evaluation
- Resource allocation
- Magnitude of association: strong versus weak association between exposure and outcome
- Assessing risk factors
- Evaluation of a new vaccine or drug: How effective is the vaccine (drug)? Is the effect due to chance or some bias?
- Drawing of inferences: information from sample to population

Classification of statistics
Based on how data are used, statistics can be classified into two:

A. Descriptive statistics
It is concerned only with describing the data, without going any further. It consists of the collection, organization, summarization and presentation of data. Data can be described using graphs, charts and tables.
Example: suppose that the results of a sample of 6 students were 45, 60, 72, 80, 85 and 93.
- Half of them scored below 75.
- The average score of the 6 students is 72.5.
- The range of the 6 scores is 48.

B. Inferential statistics
It is concerned with drawing conclusions (that is, making inferences) about a population based on sample data. It consists of:
- performing estimations and hypothesis tests,
- determining relationships among variables,
- making predictions, and
- generalizing from sample to population.
It is important because statistical data usually arise from a sample. Inferential statistics uses probability, i.e., the chance of an event occurring.
Example: suppose that the results of a sample of 6 students were 45, 60, 72, 80, 85 and 93. If the instructor declares, based on this sample result, that the average score of the whole class is 72.5, this is inferential statistics.

Definition of some basic terms
Population: the complete collection of individuals, objects or measurements under investigation for a given objective.
Target population: a collection of items that have something in common and about which we wish to draw conclusions at a particular time.
Study population: the specific population from which data are collected.
Sample: a subset of a study population about which information is actually obtained.
Census: a total count or complete enumeration of a certain population.

Example: In a study of the prevalence of HIV among orphan children in Addis Ababa, a random sample of orphan children in Lideta Kifle Ketema was included.
- Target population: all orphan children in Addis Ababa
- Study population: all orphan children in Lideta Kifle Ketema
- Sample: the orphan children in Lideta Kifle Ketema who were included in the study

Stages in statistical investigation
There are five stages or steps in any statistical investigation:
1. Collection of data: the process of measuring, gathering and assembling the raw data upon which the statistical investigation is to be based. Careful planning is essential before collecting the data.
2. Organization of data: summarization of data in some meaningful way, e.g. in table form. It helps to give a clear understanding of the information gathered and includes editing, coding and tabulation of data.
3. Presentation of data: the process of re-organization, classification, compilation and summarization of data to present it in a meaningful form, using tables, diagrams or graphs. It gives an overall view of what the data actually look like and facilitates further statistical analysis.
4. Analysis of data: the process of extracting relevant information from the summarized data, mainly through elementary mathematical operations such as measures of central tendency and measures of variation.
5. Inference: drawing conclusions from the data collected and analyzed. A valid conclusion must be drawn on the basis of the analysis. This is a difficult task that requires a high degree of skill and experience; statistical techniques based on probability theory are required.
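As a small illustration of the analysis stage (stage 4), the following sketch recomputes the descriptive summary of the six student scores used in the descriptive statistics example. The function name `describe` is a hypothetical helper introduced here for illustration; only Python's standard `statistics` module is assumed.

```python
from statistics import mean, median

# The six student scores from the descriptive statistics example.
scores = [45, 60, 72, 80, 85, 93]


def describe(values):
    """Return the mean, median and range of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),            # middle value of the sorted data
        "range": max(values) - min(values),  # largest minus smallest
    }


summary = describe(scores)
print(summary)
```

The median of these scores lies between 72 and 80, which agrees with the statement that half of the students scored below 75.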
Data Handling

Variable, data and information
A variable is an attribute or characteristic of an item or object that can take different values.
Data are the values, measurements or observations that the variables can assume. Data are raw, unorganized facts that need to be processed.
When data are processed, organized, structured or presented in a given context so as to make them useful, the result is called information.
Biostatistics (statistics) is the tool that converts data into information.

Types of variable
It is helpful to divide variables into different types, as different statistical methods are applicable to each. Based on their nature, variables can be classified as:
A. Qualitative variables: non-numeric variables that cannot be measured in quantitative form. Examples: place of birth, stage of breast cancer, degree of pain.
B. Quantitative variables: numerical variables that can be measured and expressed numerically. Examples: patient's age, patient's weight.

Quantitative variables can be classified into two:
- A discrete variable is a variable that can take only countable values. Examples: number of daily admissions to a hospital; number of students in different classes.
- A continuous variable can assume an infinite number of values between any two specific values. Examples: height, weight, age.

Scales of measurement
According to the scale of measurement, data can be classified as nominal, ordinal, interval and ratio data.
1. Nominal scale: uses names, labels or symbols to assign each measurement to one of a limited number of categories that cannot be ordered. Arithmetic operations and order comparisons are not meaningful. Examples: blood type, sex, race, marital status.
2. Ordinal scale: assigns each measurement to one of a limited number of categories that are ranked in a graded order. Examples: patient status (improved or unimproved), cancer stages, rating scales (e.g. excellent, very good, ...).
3. Interval scale: assigns a numerical value to each measurement.
- It allows comparison of the difference between two objects.
- Meaningful addition and subtraction of scale values are possible.
- There is no true zero.
Examples: temperature, IQ level of a person.
4. Ratio scale: assigns a numerical value to each measurement. There exists a true zero point. Examples: height, weight, blood pressure.

The degree of precision in measuring increases from nominal to ordinal to interval to ratio.

Variables can also be classified into two broad categories:
Dependent variable
- Also called the response, outcome or predicted variable.
- It is the focus of the research.
- It is affected by other (independent) variables.
Independent variable
- Also called the explanatory, predictor, covariate or factor variable.
- It affects the outcome variable.
Example: Global burden of non-communicable diseases (NCDs) and risk factors. NCDs are by far the leading cause of death in the Region, representing 62% of all annual deaths.
- The outcome variable is the burden of NCD.
- NCD risk factors include tobacco, physical inactivity, obesity and unhealthy diet.

On the basis of the role of time, data can be classified as:
1. Cross-sectional data: observations taken at one point in time.
2. Time series data: observations taken over a series of periods, usually at equal intervals (weekly, monthly, quarterly, yearly, etc.).

Methods of data collection
Not every aggregate of numbers can be called statistical data. An aggregate of numbers is statistical data when the numbers are:
- comparable,
- meaningful, and
- collected for a well-defined objective.
Before identifying methods of data collection we need to identify the actual sources of data. According to the source of the data, there are two methods of data collection:
1. Primary method: data are measured or collected by the investigator or the user directly from the source.
2. Secondary method: data are gathered or compiled from published and unpublished sources or files.
These include journals, reports, and publications of government, professional and research organizations. E.g. CSA: Census, DHS.

Methods of collecting primary data
Primary data collection methods are broadly categorized as follows.

1. Qualitative data collection methods: these play an important role in impact evaluation by providing useful information to understand the processes behind observed results and to assess changes in people's perceptions of their well-being. These methods are characterized by the following attributes:
- They tend to be open-ended and have less structured protocols.
- They rely more heavily on interactive interviews.
- They use triangulation to increase the credibility of their findings (researchers use multiple data collection methods to check the authenticity of their results).
- Generally, their findings are not generalizable to any specific population.

The qualitative methods most commonly used in evaluation can be classified into four broad categories:
i. In-depth interview
ii. Document review
iii. Observation methods
iv. Focus group discussion

i. The in-depth interview is a useful qualitative data collection technique that can be used for a variety of purposes, including needs assessment, program refinement, issue identification and strategic planning. It is most appropriate for situations in which you want to ask open-ended questions that elicit depth of information from relatively few people.
ii. Document review is a way of collecting data by reviewing existing documents.
iii. The observation method is one of the most common methods of qualitative data collection. It requires that the researcher become a participant in the culture or context being observed.
iv. A focus group discussion is a group interview of approximately six to twelve people who share similar characteristics or common interests.

2. Quantitative data collection methods: these rely on random sampling and structured data collection instruments that fit diverse experiences into predetermined response categories.
They produce results that are easy to summarize, compare and generalize. Typical quantitative data gathering strategies include:
1. Administering surveys with closed-ended questions
2. Experiments (clinical trials)

1. Survey method
Data can be collected in a variety of ways. One of the most common is the survey. The most common ways of administering surveys with closed-ended questions are:
- face-to-face interviews,
- telephone interviews,
- Computer Assisted Personal Interviewing (CAPI).

Telephone surveys
They have an advantage over personal interview surveys in that they are less costly. Their drawback is that some people in the population do not have phones or will not answer when the calls are made; hence, not all people have a chance of being surveyed. Also, many people now have unlisted numbers and cell phones, so they cannot be surveyed.

Personal interview surveys
They have the advantage of obtaining in-depth responses to questions from the person being interviewed. One disadvantage is that interviewers must be trained in asking questions and recording responses, which makes this method more costly than the other survey methods. An interviewer may also be biased in selecting respondents.

Computer Assisted Personal Interviewing (CAPI)
It is a form of personal interviewing that uses a laptop, tablet or hand-held computer to enter the information directly into the database. In this method the questionnaires are uploaded onto the computer system.
Advantages:
- It saves time in processing the data.
- It saves the interviewer from carrying around hundreds of questionnaires.

Mailed questionnaire
It is a self-report data collection instrument filled out by research participants. Researchers use questionnaires to obtain information about the thoughts, feelings, attitudes, beliefs, values, perceptions, personality, behavioral intentions and other relevant characteristics of research participants.
Mailed questionnaires can be used to cover a wider geographic area than telephone surveys or personal interviews, since they are less expensive to conduct. Their drawbacks include a low number of responses and inappropriate answers to questions; some people may also have difficulty reading or understanding the questions.

Questionnaire design principles
- Questions should be clear and unambiguous.
- Questions should not be too difficult to answer.
- Responses should be mutually exclusive if only one option can be chosen.
- Avoid leading questions.
- Avoid biased scaling.
- Avoid overly complicated questions.
- Avoid very similar questions.

Questions can be:
1. Open-ended questions: questions that invite free-ranging responses. Such responses are extremely useful for obtaining a deep understanding of the respondents' views and behavior. They are suited only to qualitative and small quantitative surveys.
2. Closed-ended questions: questions that invite a response that fits into a predefined answer.

Ethical considerations
The main ethical considerations include:
- All respondents have provided informed consent.
- All respondents know how the information will be used, why it is being collected, and by whom.
- All are guaranteed that their participation will not affect their safety or security.
- When interviewing children, always seek permission from their parents or other ...

Secondary data sources
To use secondary data, consider the following points:
- Does it cover the correct geographical location?
- Is it current (not too out of date)?
- If you are going to combine it with other data, are the data the same (for example, units, time, etc.)?
- If you are going to compare it with other data, are you comparing like with like?
Generally, when the source is secondary data, check that:
- The purpose for which the data were collected is compatible with the present problem.
- The nature and classification of the data are appropriate to our problem.
- There are no biases or misreporting in the published data.
Note: Data that are primary for one user may be secondary for another.

Methods of data presentation
Before discussing the presentation of data we need to know about data cleaning and editing. For many reasons a data set may be dirty or incorrect. Data cleaning is the process of cleaning the data set so that the data are appropriate for analysis. It:
- removes major errors,
- removes inconsistencies, which are most likely to occur when multiple sources of data are stored in one data set,
- makes the data set more efficient, more reliable and more accurate.

Common data presentation methods are:
1. Tabular method
2. Diagrammatic method
3. Graphic method

Tabular method / frequency distributions
Definitions:
Frequency: the number of observations belonging to a given value, group or category.
Frequency distribution: the organization of raw data in table form using classes and frequencies.
There are three basic types of frequency distribution:
1. Categorical frequency distribution
2. Ungrouped frequency distribution
3. Grouped frequency distribution

Categorical frequency distribution
Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g. marital status.
Steps to construct a categorical frequency distribution:
Step 1: Make a table as shown.

Class (1)    Tally (2)    Frequency (3)    Percent (4)

Step 2: Tally the data and place the result in column (2).
Step 3: Count the tallies and place the result in column (3).
Step 4: Find the percentage of values in each class using

$$\% = \frac{f}{n} \times 100$$

where f = frequency of the class and n = total number of values.
Step 5: Find the totals for columns (3) and (4).

Example: A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Combining all steps, one can construct the following frequency distribution.
Class (1)    Tally (2)    Frequency (3)    Percent (4)
M            //// /       6                24
S            //// //      7                28
D            //// //      7                28
W            ////         5                20

Ungrouped frequency distribution
It is a table of all the potential raw score values that could possibly occur in the data, along with the number of times each actually occurred. It is often constructed for small data sets or for data on a discrete variable.
Example: The following data show a sample of birth weights from 20 consecutive deliveries at Black Lion Hospital.

80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85

Construct an ungrouped frequency distribution.
Solution:
Step 1: Make a table.
Step 2: Tally the data.
Step 3: Compute the frequency.

Birth weight    Tally    Frequency
60              //       2
62              /        1
63              /        1
65              /        1
70              ////     4
74              /        1
75              /        1
76              //       2
80              ///      3
85              ///      3
90              /        1

Grouped frequency distribution
Basic terms:
Class limits: the lowest and the highest values that can be included in the class.
Lower class limit: a value such that no lower value can fall into that class.
Upper class limit: a value such that no higher value can fall into that class.
Unit of measurement (U): the distance between two possible consecutive measures.
- If there is no decimal point, then U = 1.
- If there is one digit after the decimal point, then U = 0.1.
- If there are two digits after the decimal point, then U = 0.01.
Class boundaries: limits that maintain continuity between any two classes; they avoid gaps between consecutive classes.
Open class interval (open-ended class): a class interval that has either no upper class limit or no lower class limit.
Class mark (C.M.): the midpoint of the class interval.

$$CM = \frac{\text{lower class limit} + \text{upper class limit}}{2} \quad \text{or} \quad CM = \frac{\text{lower class boundary} + \text{upper class boundary}}{2}$$

Relative frequency distribution (R.F.): shows the relative concentration of items in a class in relation to the total frequency. It gives the proportion or the percentage of cases in each group.

$$R.F. = \frac{\text{class frequency}}{\text{total frequency}}$$

Cumulative frequency distribution: tells us how often the values fall below or above a given class.
There are two types of cumulative frequency distribution.
1. "Less than" cumulative F.D.: obtained by adding the frequencies of all the preceding classes to the frequency of that class.
2. "More than" cumulative F.D.: obtained by adding the frequencies of all the succeeding classes to the frequency of that class.

Steps for constructing a grouped frequency distribution
Step 1. Find the largest and smallest values.
Step 2. Compute the range: R = Maximum - Minimum.
Step 3. Select the number of classes k using Sturges' formula, rounding up:

$$k = 1 + 3.32 \log_{10} n$$

Step 4. Find the class width by dividing the range by the number of classes and rounding up, not off.
Step 5. Pick a suitable starting point less than or equal to the minimum value. The starting point is the lower class limit of the first class. Keep adding the class width to this lower class limit to find the next lower class limit, and so on.
Step 6. Find the upper limit of the first class by subtracting U from the lower limit of the second class. Then keep adding the class width to this upper limit to find the rest of the upper limits.
Step 7. Find the boundaries by subtracting U/2 from the lower limits and adding U/2 to the upper limits.
Step 8. Tally the data.
Step 9. Find the frequencies.
Step 10. Find the cumulative frequencies.
Step 11. If necessary, find the relative frequencies and/or relative cumulative frequencies.

Example: Construct a grouped frequency distribution for the following patients' age data.

11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27

Solution:
Step 1: Find the highest and the lowest values: H = 39, L = 6.
Step 2: Find the range: R = H - L = 39 - 6 = 33.
Step 3: Select the number of classes using Sturges' formula: k = 1 + 3.32 log(20) = 5.32, so k = 6 (rounding up).
Step 4: Find the class width: w = R/k = 33/6 = 5.5, so w = 6 (rounding up).
Step 5: Select the starting point; let it be the minimum observation. Then 6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits. The first upper class limit = 12 - U = 12 - 1 = 11, so 11, 17, 23, 29, 35, 41 are the upper class limits. Combining step 5 and step 6, one can construct the following classes.

Class limits
6 - 11
12 - 17
18 - 23
24 - 29
30 - 35
36 - 41

Step 7: Find the class boundaries. E.g. for class 1:
Lower class boundary = 6 - U/2 = 5.5
Upper class boundary = 11 + U/2 = 11.5
Then keep adding w to both boundaries to obtain the rest of the boundaries. Doing so gives the following classes.

Class boundaries
5.5 - 11.5
11.5 - 17.5
17.5 - 23.5
23.5 - 29.5
29.5 - 35.5
35.5 - 41.5

Step 8: Tally the data.
Step 9: Write the numeric values of the tallies in the frequency column.
Step 10: Find the cumulative frequencies.
Step 11: Find the relative frequencies and/or relative cumulative frequencies.
The complete frequency distribution is as follows:

Class limit   Class boundary   Class mark   Tally      Freq.   Cf (less than)   Cf (more than)   rf     rcf (less than)
6 - 11        5.5 - 11.5       8.5          //         2       2                20               0.10   0.10
12 - 17       11.5 - 17.5      14.5         //         2       4                18               0.10   0.20
18 - 23       17.5 - 23.5      20.5         //// //    7       11               16               0.35   0.55
24 - 29       23.5 - 29.5      26.5         ////       4       15               9                0.20   0.75
30 - 35       29.5 - 35.5      32.5         ///        3       18               5                0.15   0.90
36 - 41       35.5 - 41.5      38.5         //         2       20               2                0.10   1.00

Exercise: 25 people were given a blood test to determine their blood types. The following data were obtained; construct an appropriate frequency distribution.

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O A B

Diagrammatic and graphic presentation of data
These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
- They have greater attraction.
- They facilitate comparison.
- They are easily understandable.

Diagrams
Diagrams are appropriate for presenting qualitative data. The two most commonly used diagrammatic presentations are pie charts and bar charts.
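Looking back at the grouped frequency distribution for the patients' age data, the eleven construction steps can be sketched in code. This is a minimal sketch, assuming whole-number data (U = 1) and a starting point equal to the minimum observation, as in the worked example. The function name `grouped_frequency` and the dictionary keys are hypothetical names chosen for illustration, and the guard that widens the classes when the range divides exactly by k is an added assumption, not part of the original steps.

```python
import math

# Patients' age data from the worked example (whole numbers, so U = 1).
ages = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]


def grouped_frequency(data, unit=1):
    """Build a grouped frequency table for whole-number data."""
    n = len(data)
    k = math.ceil(1 + 3.32 * math.log10(n))   # Step 3: Sturges' formula, rounded up
    data_range = max(data) - min(data)        # Step 2: range
    width = math.ceil(data_range / k)         # Step 4: class width, rounded up
    if width * k <= data_range:               # assumption: widen if the maximum
        width += 1                            # would otherwise fall outside
    rows, cum = [], 0
    lower = min(data)                         # Step 5: start at the minimum
    for _ in range(k):
        upper = lower + width - unit          # Step 6: upper class limit
        freq = sum(lower <= x <= upper for x in data)
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - unit / 2, upper + unit / 2),  # Step 7
            "mark": (lower + upper) / 2,
            "freq": freq,
            "cf_less": cum + freq,            # "less than" cumulative frequency
            "cf_more": n - cum,               # "more than" cumulative frequency
            "rf": freq / n,                   # relative frequency
        })
        cum += freq
        lower += width
    return rows


table = grouped_frequency(ages)
for row in table:
    print(row)
```

Each printed row corresponds to one line of the complete frequency distribution table above, apart from the tally marks and the relative cumulative frequency.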
Pie chart: A pie chart is a circle that is divided into sections according to the percentage of frequencies in each category.
Example:

Class    Frequency   Percent   Degree
Men      2500        25        90
Women    2000        20        72
Girls    4000        40        144
Boys     1500        15        54

Bar charts
A bar chart is a set of bars (narrow rectangles) representing some magnitude over time or space. There are different types of bar chart. The most common are:
- Simple bar chart
- Component or sub-divided bar chart
- Multiple bar chart

Simple bar chart
In a simple bar chart, each bar represents one and only one figure. It is a one-dimensional diagram in which the bar represents the whole of the magnitude. The height or length of each bar indicates the size (frequency) of the figure represented.
Example: Construct a bar chart for the following data.

Distribution of patients in hospital by source of referral
Source of referral       No. of patients   Relative freq.
Other hospital           97                5.1
General practitioner     769               40.3
Out-patient department   623               32.7
Casualty                 256               13.4
Other                    161               8.5
Total                    1 906             100.0

[Figure: simple bar chart, "Distribution of patients in hospital X by source of referral, 1999"; x-axis: source of referral (Other hospital, GP, OPD, Casualty, Other); y-axis: number of patients]
[Figure: simple bar chart of number of students by department (Math., Stat., Physics, Chemistry)]

Component (sub-divided) bar chart
A component bar chart is used to represent two categorical variables at a time. In this type of bar chart the total is sub-divided into components according to their size.
Example: consider the education level and expenditure given below.

Expenditure (in millions, for health care)
Education          1978-80   1980-81   1981-82
Primary            60        80        40
Secondary          40        60        60
Higher education   20        40        20
Total              120       180       120

[Figure: component (sub-divided) bar chart of expenditure by year, with each bar divided into primary, secondary and higher education]
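Returning to the pie chart example, the percent and degree columns follow directly from the frequencies: each class receives the share f/n of 100 percent and of the 360 degrees of the circle. The short sketch below recomputes them; the function name `pie_sections` is a hypothetical helper introduced for illustration.

```python
# Frequencies from the pie chart example.
counts = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}


def pie_sections(freqs):
    """Map each class to its (percent, degrees) share of the circle."""
    total = sum(freqs.values())
    return {
        label: (100 * f / total, 360 * f / total)
        for label, f in freqs.items()
    }


sections = pie_sections(counts)
for label, (percent, degrees) in sections.items():
    print(f"{label}: {percent:.0f}% -> {degrees:.0f} degrees")
```

The degrees always add up to 360, which is a quick check that no class was left out.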
Multiple bar charts
These allow comparison between different parts. The bars for each part are drawn side by side.

[Figure: multiple bar chart of the expenditure data, with bars for primary, secondary and higher education drawn side by side for each year]

Graphical data presentation
Graphical data presentation methods are used for presenting quantitative (continuous) variables. They include:
- Histogram
- Frequency polygon
- Line graph
- Scatter plot
- Box plot
- Cumulative frequency curve (O-give), etc.

Histogram
A histogram is the graph of the frequency distribution of a continuous measurement variable. To construct a histogram, we draw the class boundaries on the horizontal axis and the frequencies on the vertical axis. Non-overlapping intervals that cover all of the data values must be used. The width of each bar represents the class width, while the height of the bar is proportional to the corresponding frequency. The bars are adjacent to each other.
Example: Distribution of the age of women at the time of marriage

Age group   15-19   20-24   25-29   30-34   35-39   40-44   45-49
Number      11      36      28      13      7       3       2

[Figure: histogram, "Age of women at the time of marriage"; x-axis: age group (class boundaries 14.5-19.5 to 44.5-49.5); y-axis: number of women]

Frequency polygon
To draw a frequency polygon, we plot each class mark (midpoint) against its frequency and connect the points with straight lines. The class marks are plotted along the x-axis and the frequencies along the y-axis. It is useful for comparing two or more frequency distributions by drawing them on the same diagram.

[Figure: frequency polygon, "Age of women at the time of marriage"; x-axis: age (class marks 12 to 47); y-axis: number of women]

Cumulative frequency curve (O-give)
"Less than" O-give: upper class boundaries are plotted against the "less than" cumulative frequencies.
"More than" O-give: lower class boundaries are plotted against the "more than" cumulative frequencies.
Example: Heart rate of patients admitted to hospital Y, 1998

Heart rate     No. of patients (f)   Cumulative frequency,      Cumulative frequency,
                                     less than type (LCF)       more than type (MCF)
54.5-59.5      1                     1                          54
59.5-64.5      5                     6                          53
64.5-69.5      3                     9                          48
69.5-74.5      5                     14                         45
74.5-79.5      11                    25                         40
79.5-84.5      16                    41                         29
84.5-89.5      5                     46                         13
89.5-94.5      5                     51                         8
94.5-99.5      2                     53                         3
99.5-104.5     1                     54                         1

[Figure: less than and more than O-gives, "Heart rate of patients admitted to hospital Y, 1998"; x-axis: heart rate (54.5 to 104.5); y-axis: cumulative frequency]

The line graph
The line graph is especially useful for studying a variable over the passage of time. The time, in weeks, months or years, is marked along the horizontal axis, and the value of the quantity being studied is marked on the vertical axis. The distance of each plotted point above the base line indicates its numerical value. The points are joined by a line. The line graph is suitable for depicting the trend of a series over a long period.
E.g. Malaria parasite rates, Ethiopia (1967-1979 E.C.), as given below.

Year (E.C.)   Parasite rate (slide positivity rate)
1967          2.62
1968          2.73
1969          0.51
1970          3.00
1971          1.29
1972          1.22
1973          4.14
1974          5.17
1975          1.68
1976          2.48
1977          2.14
1978          3.83
1979          1.96

[Figure: line graph, "Malaria parasite rate obtained from seasonal blood survey results, Ethiopia, 1967-79 Eth. cal."; x-axis: year; y-axis: parasite rate]

N.B. Parasite rate = (number of persons infected with malaria divided by the total number of persons examined) x 100.

Measures of Central Tendency
The tendency of statistical data to concentrate at certain values is called the "central tendency". The various methods of determining the actual value at which the data tend to concentrate are called measures of central tendency, or averages. A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. It is sometimes called a measure of location.
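Before moving on, the two cumulative frequency columns of the heart-rate table in the O-give example can be checked with a few lines of code. This is only a sketch: the list `freqs` holds the class frequencies from that table, and `itertools.accumulate` from the Python standard library produces the running totals.

```python
from itertools import accumulate

# Class frequencies from the heart-rate table (54.5-59.5 up to 99.5-104.5).
freqs = [1, 5, 3, 5, 11, 16, 5, 5, 2, 1]

# "Less than" type: running total from the first class downwards.
less_than = list(accumulate(freqs))

# "More than" type: running total from the last class upwards.
more_than = list(accumulate(reversed(freqs)))[::-1]

print(less_than)
print(more_than)
```

For the O-gives, `less_than` would be plotted against the upper class boundaries and `more_than` against the lower class boundaries.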
Objectives of measures of central tendency
- To comprehend the data easily.
- To facilitate comparison.
- To make further statistical analysis possible.

Characteristics of a good measure of central tendency
1. It should be based on all the observations.
2. It should not be affected by extreme values.
3. It should be little affected by sampling fluctuation.
4. It should be as close to the maximum number of values as possible.
5. It should have a definite value.
6. It should not be subject to complicated and tedious calculations.

Types of measures of central tendency
There are different measures of central tendency; each has its advantages and disadvantages.
- The mean
- The mode
- The median
There are different types of mean, namely:
- Arithmetic mean
- Geometric mean
- Harmonic mean
- Weighted mean

The arithmetic mean (simple mean)
Definition: the arithmetic mean is the sum of all observations divided by the number of observations. The mean of $X_1, X_2, X_3, \dots, X_n$ is denoted by A.M. or $\bar{X}$ and is given as follows.

1. General formula for raw data:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Example: Find the mean of the following data: 52, 75, 40, 70, 43, 40, 65, 35 and 48.

2. For an ungrouped frequency distribution:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$

where $X_i$ is the i-th class observation, $f_i$ is its frequency and k is the number of classes.
Example: Find the mean of the following sample data.

Value ($X_i$)       1   2   3    4    5    6    7
Frequency ($f_i$)   5   9   12   17   14   10   6

3. For a grouped frequency distribution:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i \, C.M_i}{\sum_{i=1}^{k} f_i}$$

where $C.M_i$ is the i-th class mark, $f_i$ is the frequency of the i-th class and k is the number of classes.
Example: Calculate the mean for the following age distribution.
Class     Frequency
6 - 10    35
11 - 15   23
16 - 20   15
21 - 25   12
26 - 30   9
31 - 35   6

Solution:
First find the class marks, then find the product of each frequency and class mark. Since the distribution is grouped:

Class     $f_i$   $C.M_i$   $C.M_i f_i$
6 - 10    35      8         280
11 - 15   23      13        299
16 - 20   15      18        270
21 - 25   12      23        276
26 - 30   9       28        252
31 - 35   6       33        198
Total     100               1575

$$\bar{X} = \frac{\sum_{i=1}^{6} f_i \, C.M_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$

Combined mean
If $\bar{X}_1$ is the mean of $n_1$ observations and $\bar{X}_2$ is the mean of $n_2$ observations, then the mean of all the observations in both groups, often called the combined mean, is given by:

$$\bar{X}_c = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$$

Example: In a class there are 30 females and 70 males. If the females' average mark in an examination is 60 and the males' average mark is 72, find the mean for the entire class.
Answer: $\bar{X}_c = \frac{30(60) + 70(72)}{100} = 68.4$

Correct mean: If a wrong figure has been used when calculating the mean, the correct mean can be obtained without repeating the whole process using:

$$\text{Correct mean} = \text{wrong mean} + \frac{\text{correct value} - \text{wrong value}}{n}$$

where n is the total number of observations.
Example: The average weight of 10 students was calculated to be 65 kg. Later it was discovered that one weight had been misread as 40 kg instead of 80 kg. Calculate the correct average weight.
Correct mean = 65 + (80 - 40)/10 = 69

Note:
- If a constant k is added to (or subtracted from) every observation, the new mean is the old mean plus (or minus) k.
- If every observation is multiplied (or divided) by a constant k, the new mean is k times (or 1/k times) the old mean.

Properties of the mean
- It is easy to calculate and understand (simple).
- It is unique for a set of data.
- It is based on all the observations.
- It is highly affected by extreme values.
- It cannot be calculated for open-ended classes.

Mode
The mode of a set of data is defined as the value with the highest frequency. It is another measure of central tendency and is sometimes used to describe the center of a set of data. There are many situations in which the mean and median fail to show the true characteristics of a set of data.
Example: most common size of shoes; most common hairstyle;
most common color of shoes, etc.

[Figure: example frequency plots with the mode marked; vertical axis N]

Example: Data are 1, 2, 3, 4, 4, 4, 4, 5, 5, 6. The mode is 4; the distribution is "uni-modal".
Example: Data are 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8. There are two modes, 2 and 5; the distribution is said to be "bi-modal".
Example: Data are 2.62, 2.75, 2.76, 2.86, 3.05, 3.12. Every value occurs only once, so there is no mode.

Mode for grouped data
$$\hat{X} = L_{mo} + w \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right)$$
where
$\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$w$ = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
Note: The modal class is the class with the highest frequency.

Example: The following is the distribution of the age of 105 patients selected at random from a certain hospital. Calculate the mode of the distribution.

Age      No. of patients
5-15     8
15-25    12
25-35    17
35-45    29
45-55    31
55-65    5
65-75    3

Solution: 45-55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$, $w = 10$, $f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$
$\Delta_1 = f_{mo} - f_1 = 2$, $\Delta_2 = f_{mo} - f_2 = 26$
$$\hat{X} = 45 + 10 \left( \frac{2}{2 + 26} \right) = 45.71$$

Properties of the mode
- It is easy to calculate and understand.
- It is not affected by extreme values.
- It can be calculated for distributions with open-ended classes.
- It can be used for qualitative data.
- Often its value is not unique.
- The main drawback of the mode is that sometimes it does not exist.

The Median
In a distribution, the median is the value of the variable which divides it into two equal halves after the data are arranged in ascending or descending order. It is the middlemost value in the sense that the number of values less than the median is equal to the number of values greater than it. If $X_1, X_2, \dots, X_n$ are the observations, then the numbers arranged in ascending order are written $X_{[1]}, X_{[2]}, \dots, X_{[n]}$. The median is denoted by $\tilde{X}$.

Median for ungrouped data
$$\tilde{X} = \begin{cases} X_{[(n+1)/2]}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left( X_{[n/2]} + X_{[(n/2)+1]} \right), & \text{if } n \text{ is even} \end{cases}$$
Example: Find the median of the following numbers. a.
6, 5, 2, 8, 9, 4. b. 2, 1, 3, 5, 8.
Solution:
a. In ascending order the data are 2, 4, 5, 6, 8, 9; n = 6 is even, so $\tilde{X} = \frac{1}{2}(5 + 6) = 5.5$
b. In ascending order the data are 1, 2, 3, 5, 8; n = 5 is odd, so $\tilde{X} = 3$

Median for grouped data
$$\tilde{X} = L_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - c \right)$$
where
$L_{med}$ = lower class boundary of the median class
$w$ = the size of the median class
$n$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the median class
$f_{med}$ = the frequency of the median class
Remark: The median class is the class with the smallest cumulative frequency (less than type) greater than or equal to $n/2$.

Example: Find the median of the following distribution.

Class    Frequency   Cum. freq. (less than type)
40-44    7           7
45-49    10          17
50-54    22          39
55-59    15          54
60-64    12          66
65-69    6           72
70-74    3           75

Solution: $n/2 = 37.5$, so the median class is 50-54, with $L_{med} = 49.5$, $w = 5$, $c = 17$ and $f_{med} = 22$.
$$\tilde{X} = L_{med} + \frac{w}{f_{med}} \left( \frac{n}{2} - c \right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$$

Properties of the median
- There is only one median for a given set of data (uniqueness).
- The median is easy to calculate.
- The median is a positional average and hence is insensitive to very large or very small values.
- The median can be calculated even in the case of open-ended intervals.
- It is determined mainly by the middle points and is less sensitive to the remaining data points (weakness).

Quantiles
Quantiles are values which divide a data set, arranged in order of magnitude, into a certain number of equal parts. Some of these quantiles are quartiles, deciles and percentiles.
I. Quartiles: values which divide the data set into four equal parts, denoted by Q1, Q2 and Q3. The first quartile Q1 is also called the lower quartile and the third quartile Q3 is the upper quartile. The second quartile Q2 is the median.
II. Deciles: values which divide the data into ten equal parts, denoted by D1, D2, ..., D9. The fifth decile D5 is the median.
III. Percentiles: values which divide the data into one hundred equal parts, denoted by P1, P2, ..., P99. The fiftieth percentile P50 is the median.
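The grouped-data formulas for the mean, mode and median can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical, and it assumes equal class widths, a unique modal class, and that class boundaries (not class limits) are passed in.

```python
def grouped_mean(class_marks, freqs):
    # Sum of f_i * C.M_i divided by the sum of f_i.
    return sum(f * m for f, m in zip(freqs, class_marks)) / sum(freqs)


def grouped_mode(lower_bounds, width, freqs):
    # Modal class = class with the highest frequency (first one if tied).
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - f_prev  # Delta_1 = f_mo - f_1
    d2 = freqs[i] - f_next  # Delta_2 = f_mo - f_2
    return lower_bounds[i] + width * d1 / (d1 + d2)


def grouped_median(lower_bounds, width, freqs):
    n = sum(freqs)
    cum = 0  # less-than cumulative frequency before the current class
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return lb + width / f * (n / 2 - cum)
        cum += f


# Age distribution from the grouped-mean example (class marks 8, 13, ..., 33).
mean_age = grouped_mean([8, 13, 18, 23, 28, 33], [35, 23, 15, 12, 9, 6])

# Age of 105 patients from the grouped-mode example (boundaries 5, 15, ..., 65).
mode_age = grouped_mode([5, 15, 25, 35, 45, 55, 65], 10, [8, 12, 17, 29, 31, 5, 3])

# Grouped-median example: classes 40-44, ..., 70-74 have boundaries 39.5, 44.5, ...
median_value = grouped_median(
    [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5], 5, [7, 10, 22, 15, 12, 6, 3]
)

print(mean_age, round(mode_age, 2), round(median_value, 2))
```

Under these assumptions the three results agree with the worked examples (15.75, 45.71 and 54.16).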
Measures of variation, Skewness and Kurtosis

Measure of Dispersion
The MCT alone is not enough to give a clear idea about the distribution of the data. Two or more sets may have the same mean and/or median but be quite different in variation. Thus, to have a clear picture of the data, we need a measure of dispersion (variability, scatter) among the observations in the set. It helps to know the variability of a given set of numerical values.

Consider the following three datasets:
Dataset 1: 7, 7, 7, 7, 7, 7; mean = 7, s.d = 0
Dataset 2: 6, 7, 7, 7, 7, 8; mean = 7, s.d = 0.63
Dataset 3: 3, 2, 7, 8, 9, 13; mean = 7, s.d = 4.05

Other synonymous terms:
- "Measure of Variation"
- "Measure of Spread"
- "Measure of Scatter"
Commonly used measures of dispersion (variation) are: range, variance, standard deviation and coefficient of variation.

Range
The range is the difference between the largest and the smallest numbers in the data:
$$R = L - S$$
where L = largest observation and S = smallest observation.
- It is a quick and dirty measure of variability; because the range is greatly affected by extreme scores, it may give a distorted picture of the scores.
- It cannot be computed for open-ended data.
- It fluctuates highly from sample to sample.
The following two distributions have the same range, 13, yet appear to differ greatly in the amount of variability. [Figure: two distributions with the same range, 13] For this reason, among others, the range is not the most important measure of variability.

Range for grouped data: If the data are given in the form of a continuous frequency distribution, the range is computed as:
$$R = UCB_L - LCB_F$$
where $UCB_L$ is the upper class boundary of the last class and $LCB_F$ is the lower class boundary of the first class.

Relative Range (RR): It is also sometimes called the coefficient of range and is given by:
$$RR = \frac{L - S}{L + S} = \frac{R}{L + S}$$
Example: If the range and relative range of a series are 4 and 0.25 respectively, then what is the value of: a.
Smallest observation
Solution:
$R = 4 \Rightarrow L - S = 4$ .......... (1)
$RR = 0.25 \Rightarrow \frac{4}{L + S} = 0.25 \Rightarrow L + S = 16$ .......... (2)
Solving (1) and (2) at the same time, one obtains L = 10 and S = 6.

Variance and Standard deviation
They are the most important and widely used measures of variability. They are calculated based on all observations.

Types of variance
1. Population variance
$$\sigma^2 = \frac{1}{N} \sum (X_i - \mu)^2, \quad i = 1, 2, \dots, N$$
For the case of a frequency distribution it is expressed as
$$\sigma^2 = \frac{1}{N} \sum f_i (X_i - \mu)^2, \quad i = 1, 2, \dots, k$$
2. Sample variance
$$S^2 = \frac{1}{n - 1} \sum (X_i - \bar{X})^2, \quad i = 1, 2, \dots, n$$
For the case of a frequency distribution
$$S^2 = \frac{1}{n - 1} \sum f_i (X_i - \bar{X})^2, \quad i = 1, 2, \dots, k$$

Types of standard deviation
1. Population standard deviation: $\sigma = \sqrt{\sigma^2}$
2. Sample standard deviation: $s = \sqrt{S^2}$

Example: For the data 24, 25, 29, 29, 30, 31 (mean = 168/6 = 28), find the sample variance and standard deviation. We can proceed as follows:

Value minus mean    Difference (Xi − X̄)    Difference squared (Xi − X̄)²
24 − 28             −4                      16
25 − 28             −3                      9
29 − 28             1                       1
29 − 28             1                       1
30 − 28             2                       4
31 − 28             3                       9
168 − 168 = 0       0                       40

Variance $= \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{40}{5} = 8$
Standard deviation $= \sqrt{8} = 2.83$
The variance is the mean of the squared differences between the observations and the mean. The standard deviation is the square root of the variance.

Properties of variance
1. The main drawback of variance is that its unit is squared, which makes it difficult to interpret.
2. Variance gives more weight to extreme values than to those near the mean value, because the differences are squared.
3. Variance will be zero for a distribution in which all observations are equal.
4. The greater the differences among the values, the greater the variance, and vice versa.

Properties of the standard deviation
1. It is the best measure of dispersion, as it overcomes the difficulties of the variance.
2. It possesses the characteristics of variance except No. 1.
3. If the units of measurement of the variables of two series are not the same (e.g.
one in gm and the other in cm), then their variability cannot be compared by comparing the values of the standard deviations.

Coefficient of variation (CV)
When two data sets have different units, or their means differ sufficiently in size, the CV should be used as a measure of dispersion.
- It is the best measure to compare the variability of two series or sets of observations.
- Data with a smaller coefficient of variation are considered more consistent.
- CV is the ratio of the SD to the mean, multiplied by 100%:
$$CV = \frac{S}{\bar{X}} \times 100\%$$
- CV is a relative measure free from the unit of measurement.

Example: One patient's blood pressure, measured daily over several weeks, averaged 182 with a standard deviation of 12.6, while that of another patient averaged 124 with a standard deviation of 9.4. Which patient's blood pressure is relatively more variable?
Solution:
$CV_1 = \frac{12.6}{182} \times 100\% = 6.92\%$
$CV_2 = \frac{9.4}{124} \times 100\% = 7.58\%$
The blood pressure of the second patient is relatively more variable, or less consistent.

Standard score (Z-score)
A standard score is a measure that describes the relative position of a single score in the entire distribution of scores in terms of the mean and standard deviation. It also gives the number of standard deviations a particular observation lies above or below the mean. If X is a measurement from a distribution with mean $\bar{X}$ and standard deviation S, then its value in standard units is
$$Z = \frac{X - \bar{X}}{S}$$

Example: Compare the performance of the following two students.

Candidate   Marks in Biostatistics   Marks in Environmental   Total
A           84                       75                       159
B           74                       85                       159

The average mark for Biostatistics is 60 with a standard deviation of 13, and that for Environmental is 50 with a standard deviation of 11. Whose performance is better, A or B?
Solution:
Z-scores for A: Biostatistics $= \frac{84 - 60}{13} = 1.846$; Environmental $= \frac{75 - 50}{11} = 2.273$
Total Z-score for A = 1.846 + 2.273 = 4.119
Z-scores for B: Biostatistics $= \frac{74 - 60}{13} = 1.077$; Environmental $= \frac{85 - 50}{11} = 3.182$
Total Z-score for B = 1.077 + 3.182 = 4.259
Since B's total Z-score is higher, the performance of B is better than that of A.
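The variance, coefficient of variation and Z-score calculations above can be reproduced in a few lines. This is a minimal Python sketch rather than anything from the original slides; the function names are hypothetical, and the sample variance uses the n − 1 divisor as in the worked example.

```python
from math import sqrt


def sample_variance(data):
    # Sum of squared deviations from the mean, divided by n - 1.
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def coefficient_of_variation(mean, sd):
    # SD as a percentage of the mean (unit-free).
    return sd / mean * 100


def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd


# Variance example: 24, 25, 29, 29, 30, 31.
values = [24, 25, 29, 29, 30, 31]
variance = sample_variance(values)
std_dev = sqrt(variance)

# Blood pressure example: compare relative variability of the two patients.
cv_first = coefficient_of_variation(182, 12.6)
cv_second = coefficient_of_variation(124, 9.4)

# Two-student example: total Z-score over Biostatistics and Environmental.
total_z_a = z_score(84, 60, 13) + z_score(75, 50, 11)
total_z_b = z_score(74, 60, 13) + z_score(85, 50, 11)

print(variance, round(std_dev, 2))
print(round(cv_first, 2), round(cv_second, 2))
print(round(total_z_a, 3), round(total_z_b, 3))
```

The second patient has the larger CV and candidate B has the larger total Z-score, in line with the conclusions in the text.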
Moments
Moments supply information about the shape of a distribution. Moments can be calculated:
1. around the origin (called raw moments)
2. around the mean (called central moments)
3. around any arbitrary origin
They are referred to as the first, second, third, fourth, etc. moments.
1. The $r$th raw moment (about the origin, or zero) for n observations is defined as
$$m'_r = \frac{\sum_{i=1}^{n} X_i^r}{n}$$
If r = 1, it is the simple arithmetic mean; this is called the first moment.
2. The $r$th central sample moment is defined as
$$m_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n}$$
Exercise: calculate the first four central and raw moments for the sample data 1, 2, 3,, 5, 2, 8, 4 and 7.

Skewness
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. If extremely low or extremely high observations are present in a distribution, then the mean tends to shift towards those scores.
- In a symmetrical (or bell-shaped) distribution, mean = median = mode.
- In a positively skewed distribution, mode < median < mean.
- In a negatively skewed distribution, mean < median < mode.
Remarks:
- In a positively skewed distribution, smaller observations are more frequent than larger observations, i.e. the majority of the observations have a value below the average and the distribution has a long tail in the positive direction.
- In a negatively skewed distribution, smaller observations are less frequent than larger observations, i.e. the majority of the observations have a value above the average.

Kurtosis
Kurtosis gives an idea of the degree of flatness or peakedness of a distribution relative to the peakedness of a normal curve. The normal curve is called "mesokurtic". If a distribution is more peaked than the normal curve, it is called "leptokurtic". If a distribution is more flat-topped than the normal curve, it is called "platykurtic". The coefficient of kurtosis, denoted here by $\beta_2$, is given by:
$$\beta_2 = \frac{m_4}{m_2^2}$$
- If $\beta_2 > 3$, then the curve is leptokurtic (more peaked).
- If $\beta_2 < 3$, then the curve is platykurtic (flat topped).
- If $\beta_2 = 3$, then the curve is mesokurtic (normal).
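As an illustration of how moments feed into the kurtosis classification, here is a minimal Python sketch. It is not from the original slides: the function names and the sample data are hypothetical, both kinds of moment are assumed to use the divisor n, and the coefficient of kurtosis is assumed to be the common moment ratio m4 / m2², which is consistent with the threshold of 3 used above.

```python
def raw_moment(data, r):
    # r-th moment about the origin; r = 1 gives the arithmetic mean.
    return sum(x ** r for x in data) / len(data)


def central_moment(data, r):
    # r-th moment about the mean (divisor n, an assumption of this sketch).
    mean = raw_moment(data, 1)
    return sum((x - mean) ** r for x in data) / len(data)


def kurtosis_coefficient(data):
    # Assumed definition: fourth central moment over squared second central moment.
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def classify_kurtosis(beta2):
    if beta2 > 3:
        return "leptokurtic"
    if beta2 < 3:
        return "platykurtic"
    return "mesokurtic"


sample = [1, 2, 3, 4, 5]  # hypothetical data, chosen only for easy arithmetic
beta2 = kurtosis_coefficient(sample)
print(raw_moment(sample, 1), central_moment(sample, 2), beta2, classify_kurtosis(beta2))
```

For this evenly spread sample the first central moment is zero, the second is 2, and the coefficient falls below 3, so the sketch labels it platykurtic.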
Probability and Probability Distribution

Elementary Probability
- Probability as a general concept can be defined as the chance of an event occurring.
- It is a measure of the degree of chance, or the likelihood of occurrence, of an uncertain event.
- It is a quantitative measure of uncertainty.
- The theory of probability provides the foundation for statistical inference.
- The concept is not new to health workers and is frequently encountered in everyday communication. E.g. we may hear a physician say ...

Definitions of some probability terms
Experiment: Any process of observation or measurement, or any process which generates well-defined outcomes.
Outcome: The result of a single trial of a random experiment.
Sample space: The set of all possible outcomes of a probability experiment.
Event: A subset of the sample space.
Certain event: An event which is sure to occur.
Impossible event: An event which can't occur.
Compound event: An event consisting of more than one sample point (element).
Equally likely events: Events which have an equal chance of occurrence.
Mutually exclusive events: Two events which cannot happen at the same time, i.e. $A \cap B = \emptyset$ and $P(A \cap B) = 0$.
Independent events: Two events are independent if the occurrence of one does not affect the probability of occurrence of the other.
Dependent events: Two events are dependent if the first event affects the probability of occurrence of the second.

Counting rules
In order to calculate probabilities, we have to know the number of elements in the event and in the sample space. To determine the number of outcomes, one can use several rules of counting:
- The addition rule
- The multiplication rule
- The permutation rule

Addition rule: If a certain activity can be done in n1 different ways by the first individual, the same activity can be done in n2 different ways by the 2nd individual, ..., and in nk different ways by the kth individual, then in general it can be done in n1 + n2 + … + nk different ways.
Example: Suppose there are 3 lists of computer projects and a student can choose ...

Multiplication rule: If a choice consists of k steps, of which the first can be made in n1 ways, the second in n2 ways, …, and the kth in nk ways, then the whole choice can be made in n1 × n2 × … × nk ways.
Example: Distribution of blood types. There are four blood types: A, B, AB and O. Blood can also be Rh+ or Rh-. Finally, a blood donor can be classified as either male or female. How many different classification categories are there?
Solution: Since there are 4 possibilities for blood type, 2 possibilities for Rh factor, and 2 possibilities for the gender of the donor, there are 4 × 2 × 2 = 16 different classification categories.

Permutation: An arrangement of n objects in a specified order is called a permutation of the objects.
The number of permutations of n distinct objects taken all together is $n!$, where
$$n! = n(n - 1)(n - 2) \cdots (1)$$
The arrangement of n objects in a specified order taking r objects at a time is called the permutation of n objects taken r at a time. It is written as $_nP_r$ and the formula is
$$_nP_r = \frac{n!}{(n - r)!}$$
The number of permutations of n objects in which $k_1$ are alike, $k_2$ are alike, etc. is
$$\frac{n!}{k_1! \, k_2! \cdots}$$
Exercise
1. Suppose we have the letters A, B, C, D.
a. How many permutations are there taking all four letters at a time?
b. How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word "CORRECTION"?

Combination: A selection of r objects from a set of n objects without regard to the order of selection is called a combination.
Example: Given the letters A, B, C and D, list the permutations and combinations for selecting two letters.
Solution:

Permutations         Combinations
AB  BA  CA  DA       AB  CD
AC  BC  CB  DB       AC  BD
AD  BD  CD  DC       AD  BC

Combination rule: The number of combinations of r objects selected from n objects is denoted by $_nC_r$ or $\binom{n}{r}$ and is given by
$$\binom{n}{r} = \frac{n!}{(n - r)! \, r!}$$
Example: Among 15 clocks there are two defectives. In how many ways can an inspector choose three of the clocks for inspection if: A.
There is no restriction. B. None of the defective clocks is included. C. Only one of the defective clocks is included. D. Both of the defective clocks are included.

Approaches to measuring probability
There are three different conceptual approaches to the study of probability theory. These are:
- The classical approach.
- The relative frequency or empirical approach.
- The axiomatic approach.

The classical approach
Definition: If event A can occur in n different ways out of a total of N possible ways, all of which are equally likely, then the probability of event A is defined as
$$P(A) = \frac{\text{No. of cases favorable to } A}{\text{exhaustive No. of cases}} = \frac{n}{N}$$
This approach is used when:
- All outcomes are equally likely.
- The total number of outcomes is finite, say N.
Example: In a given basket there are 3 yellow, 4 black and 3 white capsules. What is the probability that a randomly selected capsule is black?
Solution: Let A be the event of drawing a black capsule.
$$P(A) = \frac{\text{No. of cases favorable to } A}{\text{exhaustive No. of cases}} = \frac{4}{10} = 0.4$$

Shortcomings of the classical approach
This approach is not applicable when:
- The total number of outcomes is infinite.
- The outcomes are not equally likely.

The relative frequency (empirical) approach (based on repeatability of events)
This is based on the relative frequencies of outcomes belonging to an event. The probability of an event A is the proportion of outcomes favorable to A in the long run, when the experiment is repeated under the same conditions:
$$P(A) = \lim_{N \to \infty} \frac{N_A}{N}$$
where $N_A$ is the number of times A occurs in N repetitions.
Example: Records show that 60 out of 100,000 painkillers produced are defective. What is the probability that a newly produced painkiller is defective?
Solution: Let A be the event that the newly produced painkiller is defective.
$$P(A) = \lim_{N \to \infty} \frac{N_A}{N} = \frac{60}{100{,}000} = 0.0006$$

Axiomatic approach:

Example: The probability that a person gets affected by disease X is 0.08, affected by disease Y is 0.05, and affected by both diseases is 0.02, in a given community.
Then, find the probability that a randomly selected person from this community is affected by either disease X or Y.
Solution: Let A be the event that a person gets affected by disease X, and B be the event that a person gets affected by disease Y. We are given that P(A) = 0.08, P(B) = 0.05 and $P(A \cap B) = 0.02$.
P(getting either X or Y) $= P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.08 + 0.05 - 0.02 = 0.11$

Conditional probability of an event
The conditional probability of an event A given that B has already occurred, denoted by $P(A \mid B)$, is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$$
Remark:
(1) $P(A' \mid B) = 1 - P(A \mid B)$
(2) $P(B' \mid A) = 1 - P(B \mid A)$

Conditional probability
Conditional events: If the occurrence of one event has an effect on the next occurrence of the other event, then the two events are conditional or dependent events.
Example:

               Male   Female   Total
Right handed   38     42       80
Left handed    12     8        20
Total          50     50       100

1. What is the probability that a person is left-handed given that he is a male? P(LH | M) = 12/50 = 0.24
2. What is the probability that a person is female given that he/she is right-handed? P(F | RH) = 42/80 = 0.525
3. What is the probability of being left-handed? P(LH) = 20/100 = 0.20

Example: In a certain community, 36 percent of the families own a dog, 22 percent of the families that own a dog also own a cat, and 30 percent of the families own a cat. A family is selected at random.
(a) Compute the probability that the family owns both a cat and a dog.
(b) Compute the probability that the family owns a dog, given that it owns a cat.
Solution: Let D be the event that the family owns a dog and C the event that it owns a cat, so P(D) = 0.36, P(C | D) = 0.22 and P(C) = 0.30.
(a) $P(D \cap C) = P(C \mid D) \, P(D) = 0.22 \times 0.36 = 0.0792$
(b) $P(D \mid C) = \frac{P(D \cap C)}{P(C)} = \frac{0.0792}{0.30} = 0.264$

Independent Events
Often there are two events such that the occurrence or non-occurrence of one does not have any effect on the occurrence or non-occurrence of the other. If events A and B are independent, then
P(B | A) = P(B)
P(A | B) = P(A)
Example: A box contains four black and six white capsules. What is the probability of getting two black capsules in drawing one after the other under the following conditions? a. The first capsule drawn is not replaced b.
The first capsule drawn is replaced
Solution: Let A = the first drawn capsule is black and B = the second drawn capsule is black. Required: $P(A \cap B)$.
a. $P(A \cap B) = P(B \mid A) \cdot P(A) = \frac{3}{9} \times \frac{4}{10} = \frac{2}{15}$
b. $P(A \cap B) = P(A) \cdot P(B) = \frac{4}{10} \times \frac{4}{10} = \frac{4}{25}$

Probability distribution
A random variable (r.v) is a variable whose values are determined by chance; equivalently, it is a numerical valued function defined on the sample space. Random variables are usually denoted by capital letters.
Let E be an experiment and S be its sample space. Then a r.v X is a function that assigns a real number X(s) to every element s in S.

Types of random variable
1. Discrete random variables: variables which can assume only a specific number of values. They have values that can be counted.
Example: Flip a coin three times and let X be the r.v representing the number of heads in the three tosses. X = {0, 1, 2, 3} are the possible values of X.
Examples of discrete r.v:
1. Dead/alive.
2. Number of car accidents per week.
3. Number of patients.
4. Number of bacteria per two cubic centimeters of water.
2. Continuous random variables: variables that can assume all values between any two given values.
Examples:
1. Weight of patients at a hospital.
2. Height of students.
3. Lifetime of light bulbs.

Definition: A probability distribution consists of the values that a random variable can assume and the corresponding probabilities of those values.

Discrete probability distribution of a r.v
It is usually called a probability mass function (pmf).
Example: The number of patients seen in a hospital in any given hour is a random variable represented by X. The probability distribution for X is:

x      10    11    12    13    14
P(x)   0.4   0.2   0.2   0.1   0.1

Find the probability that in a given hour:
a. exactly 14 patients arrive: P(X = 14) = 0.1
b. at least 12 patients arrive: P(X ≥ 12) = 0.2 + 0.1 + 0.1 = 0.4
c. at most 11 patients arrive: P(X ≤ 11) = 0.4 + 0.2 = 0.6

A function can serve as the pmf of a discrete random variable X if and only if its values, p(x), satisfy the following conditions.
1. $p(x) \geq 0$ for all x in S.
2.
$\sum_{x} p(x) = 1$
Example: Check whether the following function can serve as a pmf of a discrete r.v. P(x) =

Continuous probability distribution of a r.v
It is usually called a probability density function (pdf). A function can serve as the pdf of a continuous random variable X if its values, f(x), satisfy the following conditions.
1. $f(x) \geq 0$ for all x.
2. $\int_{-\infty}^{\infty} f(x) \, dx = 1$
Remark: For a continuous r.v, the probability at a point is always zero. If X is continuous and $a < b$, then
$$P(a \leq X \leq b) = P(a < X \leq b) = P(a \leq X < b) = P(a < X < b) = \int_a^b f(x) \, dx$$
Example: Suppose that the r.v X is continuous with the pdf of
a. Check that is a pdf.
b. Find

Mean and variance of a random variable
1. Mean and variance of a discrete r.v
I. If X is a discrete r.v with pmf $p(x)$, then the expected value or the mean of X, denoted by $E(X)$, is given by:
$$E(X) = \sum_{x} x \, p(x)$$
Example 1: The number of patients seen in a hospital in any given hour is a random variable represented by X. The probability distribution for X is:

x      10    11    12    13    14
P(x)   0.4   0.2   0.2   0.1   0.1

Find the expected value of X.
II. The variance of a discrete r.v, denoted by $V(X)$, is defined as
$$V(X) = E(X^2) - [E(X)]^2, \quad \text{where } E(X^2) = \sum_{x} x^2 \, p(x)$$
Example 2: Find the variance of the variable defined in Example 1.

2. Mean and variance of a continuous r.v
I. If X is a continuous r.v with pdf $f(x)$, then the expected value or the mean of X, denoted by $E(X)$, is given by:
$$E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
Example 3: Suppose that the r.v X is continuous with the pdf of
II. The variance of a continuous r.v, denoted by $V(X)$, is defined as
$$V(X) = E(X^2) - [E(X)]^2, \quad \text{where } E(X^2) = \int_{-\infty}^{\infty} x^2 \, f(x) \, dx$$
Example 4: Find the variance of the variable defined in Example 3.

Common Discrete Probability Distributions
1. Binomial Distribution
Assumptions of a binomial distribution:
- The experiment consists of n identical trials.
- Each trial has only one of two possible mutually exclusive outcomes, a success or a failure.
- The probability of each outcome does not change from trial to trial.
- The trials are independent, thus we must sample with replacement.
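The binomial assumptions and the mean/variance definitions for a discrete random variable can be made concrete with a small enumeration. The following Python sketch is illustrative only and not part of the original slides: the function names are hypothetical, the expected value is taken as the sum of x·p(x) and the variance as E(X²) − [E(X)]² (the usual definitions, assumed here), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def pmf_by_enumeration(n, p):
    # List every sequence of n independent success (1) / failure (0) trials
    # with constant success probability p, and add up the probability of
    # the sequences that contain x successes.
    q = 1 - p
    pmf = {x: Fraction(0) for x in range(n + 1)}
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        pmf[successes] += p ** successes * q ** (n - successes)
    return pmf


def mean_and_variance(pmf):
    # E(X) = sum of x * p(x);  V(X) = E(X^2) - E(X)^2.
    mean = sum(x * px for x, px in pmf.items())
    second_moment = sum(x * x * px for x, px in pmf.items())
    return mean, second_moment - mean ** 2


# Number of heads in three tosses of a fair coin.
coin = pmf_by_enumeration(3, Fraction(1, 2))
coin_mean, coin_var = mean_and_variance(coin)

# Hourly patient arrivals from Example 1 (x = 10, ..., 14).
hospital = {
    10: Fraction(4, 10),
    11: Fraction(2, 10),
    12: Fraction(2, 10),
    13: Fraction(1, 10),
    14: Fraction(1, 10),
}
hospital_mean, hospital_var = mean_and_variance(hospital)

print(coin, coin_mean, coin_var)
print(float(hospital_mean), float(hospital_var))
```

Enumeration is only practical for small n because the number of sequences doubles with every extra trial; it is used here because it follows the assumptions directly rather than relying on a closed-form formula.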
The probability distribution of the binomial random variable X, the number of successes in n independent trials, is:
$$f(x) = P(X = x) = \binom{n}{x} p^x q^{n - x}, \quad x = 0, 1, 2, \dots, n$$
where $q = 1 - p$ and $\binom{n}{x}$ is the number of combinations of n distinct objects taken x at a time:
$$\binom{n}{x} = \frac{n!}{x! \, (n - x)!}, \quad x! = x(x - 1)(x - 2) \cdots (1)$$
Note: 0! = 1
The parameters of the binomial distribution are n and p.
$$E(X) = \mu = np, \quad \text{var}(X) = \sigma^2 = np(1 - p)$$

Example 1: What is the probability of getting three heads by tossing a fair coin four times?
Solution: Let X be the number of heads in tossing a fair coin four times. Then n = 4, p = q = 0.5 and
$$P(X = 3) = \binom{4}{3} (0.5)^3 (0.5)^1 = 4 \times 0.0625 = 0.25$$

Exercise: Suppose that an examination consists of six true and false questions, and assume that a student has no knowledge of the subject matter. The probability that the student will guess the correct answer to the first question is 30%. Likewise, the probability of guessing each of the remaining questions correctly is also 30%. What is the probability of getting
a. more than three correct answers?
b. at least two correct answers?
c. at most three correct answers?
d. less than five correct answers?
e. Find the expected value and variance of the number of correct answers.

Exercise: Suppose that in a certain malarious area, past experience indicates that the probability that a person with a high fever will be positive for malaria is 0.7. Consider 3 randomly selected patients (with high fever) in the same area. What is the probability that
A. No patient will be positive for malaria?
B. Exactly one patient will be positive for malaria?
C. At least two patients will be positive for malaria?

2. Poisson distribution
- The Poisson distribution depends only on the average number of occurrences per unit of time or space.
- The Poisson distribution is used as a distribution of rare events, such as:
- Number of misprints.
- Natural disasters like earthquakes.
- Accidents.
- Arrivals.
A random variable X is said to have a Poisson distribution if its probability distribution is given by:
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$
where λ is the average number of occurrences.

Example: If 1.6 accidents can be expected on any given day, what is the probability that there will be
a) 3 accidents on any given day?
b) at least 1 accident on a given day?
Solution: Let X = the number of accidents, so λ = 1.6.
a) $P(X = 3) = \frac{1.6^3 \, e^{-1.6}}{3!} = 0.1378$
b) $P(X \geq 1) = 1 - P(X = 0) = 1 - e^{-1.6} = 0.7981$

The mean (µ) and variance ($\sigma^2$) of the Poisson distribution are equal, and both are equal to λ.

Example: In a study of suicides, the monthly distribution of adolescent suicides in an area over a ten-year interval closely followed a Poisson distribution with parameter λ = 2.75. Find the probability that a randomly selected month will be one in which three adolescent suicides occurred.
$$P(X = 3) = \frac{e^{-2.75} \times 2.75^3}{3!} = 0.2216$$

Example: Given λ = 2, the average number of items per sample, and assuming that the number of items follows a Poisson distribution, find the probability that the next sample taken will contain:
a. One or fewer items?
Answer: $P(X \leq 1) = P(X = 0) + P(X = 1) = e^{-2} + 2e^{-2} = 0.406$

For systolic blood pressure X, normally distributed with mean μ = 120 mm Hg and standard deviation σ = 10 mm Hg:
1. $P(X > 130) = P\left( \frac{X - \mu}{\sigma} > \frac{130 - 120}{10} \right) = P(Z > 1) = 0.1587$
About 15.9% of normal healthy individuals have a systolic blood pressure greater than 130 mm Hg.
2. P(100