Probability and Statistics Syllabus PDF
Document Details
Aabed Mohammed
Summary
This is a syllabus for a Probability and Statistics course. The course covers an introduction to statistics and variables, frequency distributions, data description, probability, discrete probability distributions, the normal distribution, correlation, and regression. The course materials include concepts and examples.
Full Transcript
Course name: Probability and Statistics
Course code and number: FET201
Credit hours: 3
Textbook: Elementary Statistics: A Step by Step Approach, 8th Edition, by Allan Bluman, McGraw-Hill.
Instructor: Associate Professor Dr. Aabed Mohammed
E-mail: [email protected]

Syllabus

1- Introduction: Definition of statistics, types of statistics, population, sample, variables and types of variables, boundaries of a continuous variable.
2- Frequency distribution and graphs: categorical frequency distribution, grouped frequency distribution, ungrouped frequency distribution, histogram, frequency polygon, ogive, stem and leaf plots. Other types of graphs (pie graph, bar graph, Pareto chart, time series graph).
3- Data description: measures of central tendency, measures of variation, measures of position.
4- Probability: Basic concepts: probability experiment, outcome, sample space, event, tree diagram. Probability of an event, complement of an event, mutually exclusive events. Addition rule, multiplication rules, conditional probability.
5- Discrete probability distributions: Probability distributions. Mean, variance, standard deviation, and expectation. The binomial distribution.
6- The Normal Distribution: Properties of the normal distribution. The standard normal distribution. Applications of the normal distribution.
7- Correlation and Regression: Correlation, scatter plot, linear correlation coefficient, levels of correlation. Regression, equation of regression.

1- Introduction and Basic Concepts

Statistics: the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
A population: consists of all subjects (human or otherwise) that are being studied.
Example: All students who registered in the university last year.
A sample: a group of subjects selected from a population.
Example: A group of students who registered in the department of IT.

Types of statistics

Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
Example: "The average age of the students is 14 years."
Inferential statistics consists of generalizing from samples to populations, hypothesis testing, determining relationships among variables, and making predictions.
Example: "The relationship between smoking and lung cancer."
Inferential statistics uses probability; that is, probability is a tool of inferential statistics.

The Variable and its classifications

A variable: a characteristic or attribute that can assume different values.
Data: the values that the variables can assume.
Data set: a collection of data values. Each value in the data set is called a data value or a datum.

Types of data

Most data can be put into the following categories: qualitative and quantitative.
1- Qualitative data are often called categorical data and are generally described by words or letters. For instance:
Color: black, dark brown, light brown, blonde, gray, or red.
Blood type: A, B, O, or AB.
Major: IT, IS, Mechatronics, Biomedical, …
2- Quantitative data are always numbers and are the result of counting or measuring. For example: number of students, age, height, weight, temperature, etc.

Quantitative variables can be divided into two types:
Discrete variables: assume values that can be counted. For example: the number of children in a family, the number of students in a classroom, the number of phone calls you receive on each day of the week.
Continuous variables: result from measurements. For example: lengths, weights, or times.

Types of variables
Qualitative (categorical)
Quantitative (numerical): continuous or discrete

Example
Determine the correct data type (quantitative or qualitative). Indicate whether quantitative data are continuous or discrete.
a. the number of pairs of shoes you own
b. the type of car you drive
c. the distance from your home to the nearest grocery store
d. the number of classes you take per school year
e. the type of calculator you use
f. weights of sumo wrestlers
g. number of correct answers on a quiz
h. IQ (intelligence quotient)

Solution
a. quantitative discrete
b. qualitative, or categorical
c. quantitative continuous
d. quantitative discrete
e. qualitative, or categorical
f. quantitative continuous
g. quantitative discrete
h. quantitative continuous

The boundaries of a continuous variable

The boundaries of a continuous variable are given to one additional decimal place and always end with the digit 5. For example, a recorded value of 15 has boundaries 14.5 and 15.5.
Exercise: Give the boundaries of each value.
a. 36 inches
b. 105.4 miles
c. 72.6 tons
d. 5.27 centimeters
e. 5 ounces

2- Frequency Distribution

Data collected in original form are called raw data. A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Categorical frequency distributions

Example 2-1. Twenty-five army inductees were given a blood test to determine their blood type. Construct a frequency distribution for the data. [Data set not reproduced.]

Relative frequency    Percent
5/25 = 0.20           20
7/25 = 0.28           28
9/25 = 0.36           36
4/25 = 0.16           16
Total = 1

Grouped Frequency Distribution

Example: The following data represent the record high temperatures for each of the 50 states. Construct a grouped frequency distribution for the data using 7 classes. [Data set not reproduced.]

Solution
Determine the classes.
Determine the lowest value (L), L = 100, and the highest value (H), H = 134.
Find the range (R): Range = highest value - lowest value, so R = H - L = 134 - 100 = 34.
Find the class width: Class width = Range / number of classes = 34/7 ≈ 4.9, which is rounded up to 5.
Rounding rule: Always round up if there is a remainder.

Constructing a Grouped Frequency Distribution

For convenience, we will choose the lowest data value, 100, for the first lower class limit. The subsequent lower class limits are found by adding the width to the previous lower class limits.
Class limits
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
The first upper class limit is one less than the next lower class limit. The subsequent upper class limits are found by adding the width to the previous upper class limits.

Constructing a Grouped Frequency Distribution

Exercise: Construct a relative frequency and a percent frequency distribution for this example.

The class width from the frequency distribution table

Class width = lower (or upper) class limit of one class - lower (or upper) class limit of the preceding class
or
Class width = (upper class limit - lower class limit of the same class) + 1
or
Class width = upper class boundary - lower class boundary of the same class

The class midpoint Xm

Xm = (lower limit + upper limit) / 2 = (lower boundary + upper boundary) / 2
Xm of any class = Xm of the preceding class + the class width
Exercise: Find the midpoints for the classes in the previous example.

Rules for Classes in Grouped Frequency Distributions

1. There should be 5-20 classes.
2. The classes must be mutually exclusive.
3. The classes must be continuous.
4. The classes must be exhaustive.
5. The classes must be equal in width (except in open-ended distributions).

Cumulative Frequency

A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).
Cumulative frequencies: 0, 2, 10, 28, 41, 48, 49, 50

Ungrouped Frequency Distribution

When the range of the data values is relatively small, a frequency distribution can be constructed using single data values for each class. This type of distribution is called an ungrouped frequency distribution.

Example
The data shown here represent the number of miles per gallon (mpg) that 30 selected four-wheel-drive sports utility vehicles obtained in city driving. Construct a frequency distribution. [Data set not reproduced.]

Solution
Step 1 Determine the classes. Determine the lowest value (L), L = 12, and the highest value (H), H = 19. Find the range (R): R = H - L = 19 - 12 = 7.
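
The class-width, class-limit, boundary, midpoint, and cumulative-frequency rules above can be sketched in Python for the record high temperature example. This is only an illustrative sketch: the variable names are assumptions, the frequencies are copied from the example's class table rather than tallied from the raw data, and the half-unit boundaries assume whole-number data.

```python
import math

# Values taken from the record high temperature example.
lowest, highest, num_classes = 100, 134, 7
frequencies = [2, 8, 18, 13, 7, 1, 1]  # one frequency per class

# Range and class width; the width is rounded up when there is a remainder.
data_range = highest - lowest
width = math.ceil(data_range / num_classes)

# Lower limits start at the lowest value and increase by the class width.
# For whole-number data, each upper limit is one less than the next lower limit.
lower_limits = [lowest + i * width for i in range(num_classes)]
upper_limits = [lower + width - 1 for lower in lower_limits]

# Boundaries extend half a unit beyond the limits (assumes whole-number data).
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in zip(lower_limits, upper_limits)]

# Midpoint Xm = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Cumulative frequency: running total of the class frequencies.
cumulative = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative.append(running_total)

# Relative frequency: class frequency divided by the total frequency.
total = sum(frequencies)
relative = [f / total for f in frequencies]

for lo, hi, (lb, ub), xm, f, cf, rf in zip(
    lower_limits, upper_limits, boundaries, midpoints,
    frequencies, cumulative, relative,
):
    print(f"{lo}-{hi}  {lb}-{ub}  Xm={xm}  f={f}  cf={cf}  rf={rf:.2f}")
```

Multiplying each relative frequency by 100 gives the percent frequency asked for in the exercise.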

Cumulative Frequency
[Table not reproduced.]

2-2 Graphs

The three most common graphs in research:
1. Histogram
2. Frequency polygon
3. Cumulative frequency polygon (ogive)

1- Histograms

The histogram is a graph that displays the data by using contiguous (unless the frequency of a class is 0) vertical bars of various heights to represent the frequencies of the classes.
Steps
1: Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is always the vertical axis.
2: Represent the class boundaries on the x axis and the frequency on the y axis.
3: Using the frequencies as the heights, draw vertical bars for each class.

Example 2-4
Construct a histogram to represent the data for the record high temperatures for each of the 50 states (see Example 2–2 for the data).

Class limits    Frequency
100 - 104       2
105 - 109       8
110 - 114       18
115 - 119       13
120 - 124       7
125 - 129       1
130 - 134       1

Histograms

Histograms use class boundaries and frequencies of the classes.

Class limits    Class boundaries    Frequency
100 - 104       99.5 - 104.5        2
105 - 109       104.5 - 109.5       8
110 - 114       109.5 - 114.5       18
115 - 119       114.5 - 119.5       13
120 - 124       119.5 - 124.5       7
125 - 129       124.5 - 129.5       1
130 - 134       129.5 - 134.5       1

Frequency Polygon

The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the class midpoints. The frequencies are represented by the heights of the points.
Steps
1: Draw and label the x and y axes.
2: Represent the class midpoints on the x axis.
3: Choose a suitable scale for the frequencies, and label it on the y axis.
4: Connect adjacent points with line segments. Draw a line back to the x axis at the beginning and end of the graph, at the same distance that the previous and next midpoints would be located.

Example 2-5
Construct a frequency polygon to represent the data for the record high temperatures for each of the 50 states.

Class limits    Frequency
100 - 104       2
105 - 109       8
110 - 114       18
115 - 119       13
120 - 124       7
125 - 129       1
130 - 134       1

Frequency Polygons

Frequency polygons use class midpoints and frequencies of the classes.

Class limits    Class midpoints    Frequency
100 - 104       102                2
105 - 109       107                8
110 - 114       112                18
115 - 119       117                13
120 - 124       122                7
125 - 129       127                1
130 - 134       132                1

An Ogive (Cumulative Frequency Polygon)

The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
Steps
1: Draw and label the x and y axes.
2: Represent the class boundaries on the x axis.
3: Choose a suitable scale for the cumulative frequencies, and label it on the y axis.
4: Plot the points and then draw the bars or lines.

Example 2-6
Construct an ogive to represent the data for the record high temperatures for each of the 50 states (see Example 2–2 for the data).

Class limits    Frequency
100 - 104       2
105 - 109       8
110 - 114       18
115 - 119       13
120 - 124       7
125 - 129       1
130 - 134       1

Solution
Ogives use upper class boundaries and cumulative frequencies of the classes.

Class limits    Class boundaries    Frequency    Cumulative frequency
100 - 104       99.5 - 104.5        2            2
105 - 109       104.5 - 109.5       8            10
110 - 114       109.5 - 114.5       18           28
115 - 119       114.5 - 119.5       13           41
120 - 124       119.5 - 124.5       7            48
125 - 129       124.5 - 129.5       1            49
130 - 134       129.5 - 134.5       1            50

Class boundaries    Cumulative frequency
Less than 99.5      0
Less than 104.5     2
Less than 109.5     10
Less than 114.5     28
Less than 119.5     41
Less than 124.5     48
Less than 129.5     49
Less than 134.5     50

2.2 Relative Frequency Graphs

If proportions are used instead of frequencies, the graphs are called relative frequency graphs. Relative frequency graphs are used when the proportion of data values that fall into a given class is more important than the actual number of data values that fall into that class.

Example 2-7 (page 57)
Construct a histogram, frequency polygon, and ogive using relative frequencies for the distribution (shown here) of the miles that 20 randomly selected runners ran during a given week.

Class boundaries    Frequency
5.5 - 10.5          1
10.5 - 15.5         2
15.5 - 20.5         3
20.5 - 25.5         5
25.5 - 30.5         4
30.5 - 35.5         3
35.5 - 40.5         2

Histograms

The following is a frequency distribution of miles run per week by 20 selected runners. Divide each frequency by the total frequency to get the relative frequency.

Class boundaries    Frequency    Relative frequency
5.5 - 10.5          1            1/20 = 0.05
10.5 - 15.5         2            2/20 = 0.10
15.5 - 20.5         3            3/20 = 0.15
20.5 - 25.5         5            5/20 = 0.25
25.5 - 30.5         4            4/20 = 0.20
30.5 - 35.5         3            3/20 = 0.15
35.5 - 40.5         2            2/20 = 0.10
                    Σf = 20      Σrf = 1.00

Use the class boundaries and the relative frequencies of the classes.

Frequency Polygons

The following is a frequency distribution of miles run per week by 20 selected runners.

Class boundaries    Class midpoints    Relative frequency
5.5 - 10.5          8                  0.05
10.5 - 15.5         13                 0.10
15.5 - 20.5         18                 0.15
20.5 - 25.5         23                 0.25
25.5 - 30.5         28                 0.20
30.5 - 35.5         33                 0.15
35.5 - 40.5         38                 0.10

Use the class midpoints and the relative frequencies of the classes.

Ogives

The following is a frequency distribution of miles run per week by 20 selected runners.

Class boundaries    Frequency    Cumulative frequency    Cum. rel. frequency
5.5 - 10.5          1            1                       1/20 = 0.05
10.5 - 15.5         2            3                       3/20 = 0.15
15.5 - 20.5         3            6                       6/20 = 0.30
20.5 - 25.5         5            11                      11/20 = 0.55
25.5 - 30.5         4            15                      15/20 = 0.75
30.5 - 35.5         3            18                      18/20 = 0.90
35.5 - 40.5         2            20                      20/20 = 1.00
                    Σf = 20

Ogives use upper class boundaries and cumulative frequencies of the classes.

Class boundaries    Cum. rel. frequency
Less than 5.5       0
Less than 10.5      0.05
Less than 15.5      0.15
Less than 20.5      0.30
Less than 25.5      0.55
Less than 30.5      0.75
Less than 35.5      0.90
Less than 40.5      1.00

Use the upper class boundaries and the cumulative relative frequencies.

Shapes of Distributions
[Figures not reproduced.]

Other Types of Graphs

Stem and Leaf Plots

A stem and leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes. It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphic form.

Example
At an outpatient testing center, the number of cardiograms performed each day for 20 days is shown. Construct a stem and leaf plot for the data.

25 31 20 32 13
14 43 2 57 23
36 32 33 32 44
32 52 44 51 45

Solution
Step 1 Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57
Step 2 Separate the data according to the first digit, as shown.
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57

Stem and leaf plot
Stem    Leaf
0       2
1       3 4
2       0 3 5
3       1 2 2 2 2 3 6
4       3 4 4 5
5       1 2 7

Example
An insurance company researcher conducted a survey on the number of car thefts in a large city for a period of 30 days last summer. The raw data are shown. Construct a stem and leaf plot by using classes 50–54, 55–59, 60–64, 65–69, 70–74, and 75–79.

52 62 51 50 69 58
77 66 53 57 75 56
55 67 73 79 59 68
65 72 57 51 63 69
75 65 53 78 66 55

Solution
Step 1 Arrange the data in order. [Ordered data not reproduced.]
Step 2 Separate the data according to the classes. [Stem and leaf plot not reproduced.]

The Pie Graph

Pie graphs are used extensively in statistics. The purpose of the pie graph is to show the relationship of the parts to the whole. The pie graph is used to represent a nominal or categorical variable. A pie graph is a circle that is divided into sections according to the percentage of frequencies in each category of the distribution.

Example: Construct a pie graph showing the blood types of the army inductees described in Example 2–1. The frequency distribution is repeated here. [Table not reproduced.]
Step 3 Using a protractor, graph each section and write its name and corresponding percentage, as shown in the following figure. [Figure not reproduced.]

Example
The average amounts spent by college freshmen for school items are shown. Construct a pie graph.

Electronics/computers    $728
Dorm items               $344
Clothing                 $141
Shoes                    $72

Solution: Convert each frequency to degrees and to a percent, where f is the frequency of the category and n is the total (here n = 1285).
Degrees = (f / n) × 360
Percent = (f / n) × 100

Category       Degrees                   Percent
Electronics    (728/1285) × 360 ≈ 204    (728/1285) × 100 ≈ 56%
Dorm items     (344/1285) × 360 ≈ 96     (344/1285) × 100 ≈ 27%
Clothing       (141/1285) × 360 ≈ 40     (141/1285) × 100 ≈ 11%
Shoes          (72/1285) × 360 ≈ 20      (72/1285) × 100 ≈ 6%

Step 3 Using a protractor, graph each section and write its name and corresponding percentage, as shown in the following figure. [Figure not reproduced.]

Bar Graphs

When the data are qualitative or categorical, bar graphs can be used to represent the data. A bar graph can be drawn using either horizontal or vertical bars. A bar graph represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data.
Example: [Figures not reproduced.] (Bluman, Chapter 2)

Pareto Charts

A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
Example: [Data not reproduced.]

Solution
Step 1 Arrange the data from the largest to smallest according to frequency.
Step 2 Draw and label the x and y axes.
Step 3 Draw the bars corresponding to the frequencies.

The graph shows that the number of homeless people is about the same for Atlanta and Chicago and a lot less for Baltimore and St. Louis.

The Time Series Graph

When data are collected over a period of time, they can be represented by a time series graph.

Example
The number of homicides that occurred in the workplace for the years 2003 to 2008 is shown. Draw a time series graph for the data. [Data not reproduced.]

Solution
Step 1 Draw and label the x and y axes.
Step 2 Label the x axis for years and the y axis for the number.
Step 3 Plot each point according to the table.
Step 4 Draw line segments connecting adjacent points.
69 There was a slight decrease in the years ’04, ’05, and ’06, compared to ’03, and again an increase in ’07. The largest decrease occurred in ’08. 70 Histograms Histograms use class boundaries and frequencies of the classes. Class Class Frequency Limits Boundaries 100 - 104 99.5 - 104.5 2 105 - 109 104.5 - 109.5 8 110 - 114 109.5 - 114.5 18 115 - 119 114.5 - 119.5 13 120 - 124 119.5 - 124.5 7 125 - 129 124.5 - 129.5 1 130 - 134 129.5 - 134.5 1 28 Histograms Histograms use class boundaries and frequencies of the classes. 29 Frequency Polygon The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the class midpoints. The frequencies are represented by the heights of the points. Steps 1: Draw and label the x and y axes. 2: Represent the midpoint, on the x axis. 3: Choose a suitable scale for the frequencies, and label it on the y axis. 4: Connect adjacent points with line segments. Draw a line back to the x axis at the beginning and end of the graph, at the same distance that the previous and next midpoints would be located. 30 Example 2-5 Construct a frequency polygon to represent the data for the record high temperatures for each of the 50 states. Class Frequency Limits 100 - 104 2 105 - 109 8 110 - 114 18 115 - 119 13 120 - 124 7 125 - 129 1 130 - 134 1 31 Frequency Polygons Frequency polygons use class midpoints and frequencies of the classes. Class Class Frequency Limits Midpoints 100 - 104 102 2 105 - 109 107 8 110 - 114 112 18 115 - 119 117 13 120 - 124 122 7 125 - 129 127 1 130 - 134 132 1 32 Frequency Polygons Frequency polygons use class midpoints and frequencies of the classes. 33 An Ogive (Cumulative Frequency Polygon The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution. steps 1: Draw and label the x and y axes. 
2: Represent the class boundaries on the x axis 3: Choose a suitable scale cumulative frequencies, and label it on the y axis. 4: Plot the points and then draw the bars or lines. 34 Example 2-6 Construct an ogive to represent the data for the record high temperatures for each of the 50 states (see Example 2–2 for the data). Class Frequency Limits 100 - 104 2 105 - 109 8 110 - 114 18 115 - 119 13 120 - 124 7 125 - 129 1 130 - 134 1 35 Solution Ogives use upper class boundaries and cumulative frequencies of the classes. Class Class Cumulative Frequency Limits Boundaries Frequency 100 - 104 99.5 - 104.5 2 2 105 - 109 104.5 - 109.5 8 10 110 - 114 109.5 - 114.5 18 28 115 - 119 114.5 - 119.5 13 41 120 - 124 119.5 - 124.5 7 48 125 - 129 124.5 - 129.5 1 49 130 - 134 129.5 - 134.5 1 50 36 Ogives Ogives use upper class boundaries and cumulative frequencies of the classes. Cumulative Class Boundaries Frequency Less than 99.5 0 Less than 104.5 2 Less than 109.5 10 Less than 114.5 28 Less than 119.5 41 Less than 124.5 48 Less than 129.5 49 Less than 134.5 50 37 An ogive (Cumulative Frequency Polygon) 38 Ogives Ogives use upper class boundaries and cumulative frequencies of the classes. 39 2.2 Relative Frequency Graphs If proportions are used instead of frequencies, the graphs are called relative frequency graphs. Relative frequency graphs are used when the proportion of data values that fall into a given class is more important than the actual number of data values that fall into that class. 40 Example 2-7 Page #57 Construct a histogram, frequency polygon, and ogive using relative frequencies for the distribution (shown here) of the miles that 20 randomly selected runners ran during a given week. Class Frequency Boundaries 5.5 - 10.5 1 10.5 - 15.5 2 15.5 - 20.5 3 20.5 - 25.5 5 25.5 - 30.5 4 30.5 - 35.5 3 35.5 - 40.5 2 41 Histograms The following is a frequency distribution of miles run per week by 20 selected runners. 
Divide each Class Relative Frequency frequency by Boundaries Frequency the total 5.5 - 10.5 1 frequency to 1/20 = 0.05 10.5 - 15.5 2 get the 2/20 = 0.10 15.5 - 20.5 3 relative 3/20 = 0.15 20.5 - 25.5 5 frequency. 5/20 = 0.25 25.5 - 30.5 4 4/20 = 0.20 30.5 - 35.5 3 3/20 = 0.15 35.5 - 40.5 2 2/20 = 0.10 f = 20 rf = 1.00 42 Histograms Use the class boundaries and the relative frequencies of the classes. 43 Frequency Polygons The following is a frequency distribution of miles run per week by 20 selected runners. Class Class Relative Boundaries Midpoints Frequency 5.5 - 10.5 8 0.05 10.5 - 15.5 13 0.10 15.5 - 20.5 18 0.15 20.5 - 25.5 23 0.25 25.5 - 30.5 28 0.20 30.5 - 35.5 33 0.15 35.5 - 40.5 38 0.10 44 Frequency Polygons Use the class midpoints and the relative frequencies of the classes. 45 Ogives The following is a frequency distribution of miles run per week by 20 selected runners. Class Cumulative Cum. Rel. Frequency Boundaries Frequency Frequency 5.5 - 10.5 1 1 1/20 = 0.05 10.5 - 15.5 2 3 3/20 = 0.15 15.5 - 20.5 3 6 6/20 = 0.30 20.5 - 25.5 5 11 11/20 = 0.55 25.5 - 30.5 4 15 15/20 = 0.75 30.5 - 35.5 3 18 18/20 = 0.90 35.5 - 40.5 2 20 20/20 = 1.00 f = 20 46 Ogives Ogives use upper class boundaries and cumulative frequencies of the classes. Cum. Rel. Class Boundaries Frequency Less than 5.5 0 Less than 10.5 0.05 Less than 15.5 0.15 Less than 20.5 0.30 Less than 25.5 0.55 Less than 30.5 0.75 Less than 35.5 0.90 Less than 40.5 1.00 47 Ogives Use the upper class boundaries and the cumulative relative frequencies. 48 Shapes of Distributions 49 Shapes of Distributions 50 Other Types of Graphs Stem and Leaf Plots A stem and leaf plots is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes. It has the advantage over grouped frequency distribution of retaining the actual data while showing them in graphic form. 
Example
At an outpatient testing center, the number of cardiograms performed each day for 20 days is shown. Construct a stem and leaf plot for the data.

25  31  20  32  13  14  43   2  57  23
36  32  33  32  44  32  52  44  51  45

Solution
Step 1 Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57

Step 2 Separate the data according to the first digit, as shown.
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57

Stem and Leaf Plot
Stem    Leaf
0       2
1       3 4
2       0 3 5
3       1 2 2 2 2 3 6
4       3 4 4 5
5       1 2 7

Example
An insurance company researcher conducted a survey on the number of car thefts in a large city for a period of 30 days last summer. The raw data are shown. Construct a stem and leaf plot by using classes 50–54, 55–59, 60–64, 65–69, 70–74, and 75–79.

52  62  51  50  69  58  77  66  53  57
75  56  55  67  73  79  59  68  65  72
57  51  63  69  75  65  53  78  66  55

Solution
Step 1 Arrange the data in order:
50, 51, 51, 52, 53, 53, 55, 55, 56, 57, 57, 58, 59, 62, 63, 65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 75, 75, 77, 78, 79

Step 2 Separate the data according to the classes.
50, 51, 51, 52, 53, 53
55, 55, 56, 57, 57, 58, 59
62, 63
65, 65, 66, 66, 67, 68, 69, 69
72, 73
75, 75, 77, 78, 79

Stem and Leaf Plot
Stem    Leaf
5       0 1 1 2 3 3
5       5 5 6 7 7 8 9
6       2 3
6       5 5 6 6 7 8 9 9
7       2 3
7       5 5 7 8 9

The Pie Graph
Pie graphs are used extensively in statistics. The purpose of the pie graph is to show the relationship of the parts to the whole. The pie graph is used to represent a nominal or categorical variable. A pie graph is a circle that is divided into sections according to the percentage of frequencies in each category of the distribution.

Example: Construct a pie graph showing the blood types of the army inductees described in Example 2–1. The frequency distribution is repeated here.
Step 3 Using a protractor, graph each section and write its name and corresponding percentage, as shown in the following figure.

Example
The average amounts spent by college freshmen for school items are shown. Construct a pie graph.
Electronics/computers    $728
Dorm items               $344
Clothing                 $141
Shoes                    $72

Solution:
Convert each frequency to degrees and to a percent, where f is the frequency of the category and n = 728 + 344 + 141 + 72 = 1285 is the total:

\[ \text{Degrees} = \frac{f}{n} \cdot 360^\circ \qquad \text{Percent} = \frac{f}{n} \cdot 100\% \]

Category                 Degrees                     Percent
Electronics/computers    728/1285 · 360° ≈ 204°      728/1285 · 100% ≈ 56.7%
Dorm items               344/1285 · 360° ≈ 96°       344/1285 · 100% ≈ 26.8%
Clothing                 141/1285 · 360° ≈ 40°       141/1285 · 100% ≈ 11.0%
Shoes                     72/1285 · 360° ≈ 20°        72/1285 · 100% ≈ 5.6%

Step 3 Using a protractor, graph each section and write its name and corresponding percentage, as shown in the following figure.

Bar Graphs
When the data are qualitative or categorical, bar graphs can be used to represent the data. A bar graph represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data.
Example: Bluman, Chapter 2

Pareto Charts
A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
Example:
Solution
Step 1 Arrange the data from the largest to smallest according to frequency.
Step 2 Draw and label the x and y axes.
Step 3 Draw the bars corresponding to the frequencies.
The graph shows that the number of homeless people is about the same for Atlanta and Chicago and a lot less for Baltimore and St. Louis.

The Time Series Graph
When data are collected over a period of time, they can be represented by a time series graph.
Example
The number of homicides that occurred in the workplace for the years 2003 to 2008 is shown. Draw a time series graph for the data.
Solution
Step 1 Draw and label the x and y axes.
Step 2 Label the x axis for years and the y axis for the number.
Step 3 Plot each point according to the table.
Step 4 Draw line segments connecting adjacent points.
There was a slight decrease in the years ’04, ’05, and ’06, compared to ’03, and again an increase in ’07. The largest decrease occurred in ’08.
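Several calculations earlier in this section share one step: dividing a frequency f by the total n. Relative frequencies are f/n, pie graph percentages are f/n · 100, and pie graph angles are f/n · 360. The Python sketch below recomputes these quantities for the runners' distribution (Example 2-7) and for the college freshmen spending data. The function names `proportions` and `cumulative` are assumptions chosen for this sketch, and rounding to whole degrees and to one decimal place of a percent is likewise a choice made here.

```python
def proportions(freqs):
    """Return f/n for each frequency f, where n is the sum of all frequencies."""
    n = sum(freqs)
    return [f / n for f in freqs]


def cumulative(freqs):
    """Return the running totals of the frequencies."""
    running, totals = 0, []
    for f in freqs:
        running += f
        totals.append(running)
    return totals


# Miles run per week by 20 runners: one frequency per class.
runner_freqs = [1, 2, 3, 5, 4, 3, 2]
n_runners = sum(runner_freqs)
rel_freqs = proportions(runner_freqs)
# Divide the cumulative counts by n rather than summing rounded decimals.
cum_rel_freqs = [c / n_runners for c in cumulative(runner_freqs)]

# Average amounts (in dollars) spent by college freshmen on school items.
spending = {
    "Electronics/computers": 728,
    "Dorm items": 344,
    "Clothing": 141,
    "Shoes": 72,
}
shares = proportions(list(spending.values()))
degrees = {item: round(p * 360) for item, p in zip(spending, shares)}
percents = {item: round(p * 100, 1) for item, p in zip(spending, shares)}

print("relative:", [round(r, 2) for r in rel_freqs])
print("cumulative relative:", [round(c, 2) for c in cum_rel_freqs])
for item in spending:
    print(f"{item}: {degrees[item]} degrees, {percents[item]}%")
```

The rounded angles add up to 360 degrees for these data, so the four sections fill the circle exactly. Rounded percentages need not add up to exactly 100.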