Statistics for Computer Programmers (MATH1236) - Niagara College PDF

Course title: Statistics for Computer Programmers
Code: MATH1236
Instructor's name: Harpreet Kaur
Session number: 1
niagaracollegetoronto.ca

MATH1236 Statistics for Computer Programmers
Lesson week 1

Course Information

Course Title: Statistics for Computer Programmers
Course Code: MATH1236
Requisites/Restrictions:
Credit Value: 3
Delivery Type: Onsite
Program Coordinator: Ridhi Patel
Program Coordinator Contact: Riddhi.patel@niagaracollegetoronto.ca
Developed by: Harpreet Kaur
Approved by: Subagini Manivannan

Code: MATH1236
Program: NCT Computer Programming
Semester: 4
Term: W24
Credit Value: 3
Group: CP4.1
Course Name: Statistics for Computer Programmer
Campus: 22 College
Delivery: In-Campus
Credits: 3
Hours: 42
Start Date: 12-Dec-24
End Date: 19-April-24
Days: Friday
Start Time: 18:30:00
End Time: 21:30:00

Agenda

Week 1
Topic: Familiarize with the course outline and TLP; introduction to the learning objectives and outcomes; familiarize with the evaluation methods and weightage; statistics, types of statistics, levels of measurement, design of experiments, pictorial representation, graphing, frequency distribution.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: 1. Demonstrate knowledge of statistical language. 2. Categorize data by type and level of measurement. 3. Design experiments. Without computer software, organize and represent data with frequency distributions.

Learning Outcomes

Upon successful completion of this course, the student has reliably demonstrated the ability to:
1. Present data with appropriate statistical language, both with and without computer software.
2. Analyze univariate data.
3. Analyze bivariate data, and interpret the linear relationship between the two variables.
4. Determine probabilities by use of classical and empirical probabilities.
5. Evaluate probabilities and statistics for discrete probability distributions.
6. Perform statistical calculations by use of the normal distribution.
7. Predict confidence intervals and required sample sizes.

Introduction to the Learning Objectives and Outcomes

TLP

Week 1
Topic: Familiarize with the course outline and TLP; introduction to the learning objectives and outcomes; familiarize with the evaluation methods and weightage; statistics, types of statistics, levels of measurement, design of experiments, pictorial representation.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: 1. Demonstrate knowledge of statistical language. 2. Categorize data by type and level of measurement. 3. Design experiments. Without computer software, organize and represent data with frequency distributions.

Week 2
Topic: Graphing, frequency distribution, stem and leaf plot, measures of location, measures of dispersion, measures of relative standing, using Excel to create and calculate them.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: With and without computer software, organize and represent data with frequency distributions, histograms, and stem and leaf plots. 1. Summarize data by use of measures of central tendency and variation, and verify calculations with software.

Week 3
Topic: Measures of position and relative standing, empirical rule, z-score, using Excel for distribution and measures of position, introduction to R-Studio.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: 2. Apply the empirical rule for a normal distribution. 3. Use z-scores and quartiles to identify the position of data in a data set. 4. Use exploratory data analysis to discover various aspects of data.

Week 4
Test 1, based on Week 1, 2 and 3 contents.

Week 5
Topic: Descriptive statistics by R-Studio, correlation coefficient, data analysis, scatter plot for bivariate data by Excel for correlation coefficients.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: With and without computer software, organize and represent data with frequency distributions, histograms, and stem and leaf plots. 1. Import data to software for analysis and representation. 2. Develop a scatter plot for bivariate data. 3. Calculate and interpret correlation coefficients. 4. Test and interpret the significance of a correlation for data.

Week 6
Topic: Least squares regression line, coefficient of determination, distribution, measures of correlation coefficient and regression line, and prediction by R-Studio.
Assessments and activities: Quiz 1.
Learning objectives: 5. Calculate the least squares regression line. 6. Use the regression equation and statistical software to perform statistical predictions.

Week 7
Test 2 (midterm), based on Weeks 5 and 6.

Week 8
Reading Week (no classes).

Week 9
Topic: Probability, rules of addition and multiplication, conditional probability.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: 1. Establish sample spaces and use them to determine the probability of an event. 2. Calculate probabilities of compound events using the addition and multiplication rules. 3. Calculate the conditional probability of an event.

Week 10
Topic: Discrete distribution, mean, variance and standard deviation, binomial probability; calculation by R-Studio.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: 1. Construct a probability distribution for a discrete random variable. 2. Compute mean, variance, and standard deviation for a discrete random variable. 3. Determine probabilities in a binomial experiment. 4. Verify calculations through software.

Week 11
Topic: Standard normal distribution, variable transformation and extrapolation of data.
Assessments and activities: Quiz 2.
Learning objectives: 1. Compute the area under the standard normal distribution for any given z value. 2. Compute probabilities by use of the standard normal variable transformation. 3. Extrapolate data values for given percentages or probabilities.

Week 12
Test 3, based on contents of Weeks 9 and 10.

Week 13
Topic: Confidence interval for small and large samples, with and without standard deviation for a population.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: 1. Construct the confidence interval for a population mean for large samples. 2. Calculate the sample size necessary to establish a confidence interval for a mean. 3. Construct the confidence interval for a population mean for small samples.

Week 14
Topic: Confidence interval for small and large samples, with and without standard deviation for a population, with Excel and R-Studio.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: 1. Construct the confidence interval for a population mean for large samples. 2. Calculate the sample size necessary to establish a confidence interval for a mean. 3. Construct the confidence interval for a population mean for small samples.

Week 15
Test 4, based on Weeks 11, 13 and 14.

Familiarize with the Evaluation Methods and Weightage

Evaluation Details

Quizzes: 12.00% of final grade
  Quiz 1 (6%) - Week 6
  Quiz 2 (6%) - Week 11
Tests: 88.00% of final grade
  Test 1 (22%) - Week 4
  Test 2 (22%) - Week 7
  Test 3 (22%) - Week 12
  Test 4 (22%) - Week 15
Total: 100%

Software Requirement

Microsoft Excel and R-Studio
You will build an understanding of both R-Studio and Microsoft Excel, and of their ability to assist in statistical calculations. You will learn how to input statistical data into the software as well as how to interpret the results in an applied context.

Statistics

The term statistics can refer to numerical facts such as averages, medians, percents, and index numbers that help us understand a variety of business and economic situations.
Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.

Population, Samples and Processes

Engineers and scientists are constantly exposed to collections of facts, or data, both in their professional capacities and in everyday activities. The discipline of statistics provides methods for organizing and summarizing data and for drawing conclusions based on information contained in the data.

An investigation will typically focus on a well-defined collection of objects constituting a population of interest. In one study, the population might consist of all gelatin capsules of a particular type produced during a specified period. Another investigation might involve the population consisting of all individuals who received a B.Sci. in engineering during the most recent academic year.

When desired information is available for all objects in the population, we have what is called a census. Constraints on time, money, and other scarce resources usually make a census impractical or infeasible. Instead, a subset of the population (a sample) is selected in some prescribed manner.

Data and Data Sets

❖ Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
❖ All the data collected in a particular study are referred to as the data set for the study.

Elements, Variables, and Observations

◼ Elements are the entities on which data are collected.
◼ A variable is a characteristic of interest for the elements.
◼ The set of measurements obtained for a particular element is called an observation.
◼ A data set with n elements contains n observations.
◼ The total number of data values in a complete data set is the number of elements multiplied by the number of variables.
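The definitions above can be made concrete with a small sketch in Python. It uses the five-company data set tabulated on the next slide; the dictionary keys (`exchange`, `annual_sales_musd`, `earn_per_share`) are hypothetical names chosen for illustration and do not come from the slides.

```python
# Each dict is one observation: the set of measurements for one element (a company).
# Key names are illustrative assumptions, not part of the course material.
data_set = [
    {"company": "Dataram", "exchange": "NQ", "annual_sales_musd": 73.10, "earn_per_share": 0.86},
    {"company": "EnergySouth", "exchange": "N", "annual_sales_musd": 74.00, "earn_per_share": 1.67},
    {"company": "Keystone", "exchange": "N", "annual_sales_musd": 365.70, "earn_per_share": 0.86},
    {"company": "LandCare", "exchange": "NQ", "annual_sales_musd": 111.40, "earn_per_share": 0.33},
    {"company": "Psychemedics", "exchange": "N", "annual_sales_musd": 17.60, "earn_per_share": 0.13},
]

# The variables are the characteristics recorded for every element.
variables = ["exchange", "annual_sales_musd", "earn_per_share"]

n_elements = len(data_set)        # one element per company
n_observations = len(data_set)    # n elements give n observations
n_data_values = n_elements * len(variables)

print(f"elements: {n_elements}")
print(f"observations: {n_observations}")
print(f"data values: {n_elements} x {len(variables)} = {n_data_values}")
```

With five elements and three variables, the complete data set holds 5 x 3 = 15 data values.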
Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ               73.10               0.86
EnergySouth    N                74.00               1.67
Keystone       N                365.70              0.86
LandCare       NQ               111.40              0.33
Psychemedics   N                17.60               0.13

Variables: Stock Exchange, Annual Sales ($M), Earn/Share ($). Data set: the complete table.

Variable

A variable is any characteristic whose value may change from one object to another in the population. We shall initially denote variables by lowercase letters from the end of our alphabet. Examples include:
x = brand of calculator owned by a student
y = number of visits to a particular Web site during a specified period
z = braking distance of an automobile under specified conditions

Univariate Data

Data result from making observations either on a single variable or simultaneously on two or more variables. A univariate data set consists of observations on a single variable. For example, we might determine the type of transmission, automatic (A) or manual (M), on each of ten automobiles recently purchased at a certain dealership, resulting in the categorical data set

M A A A M A A M A A

The following sample of lifetimes (hours) of brand D batteries put to a certain use is a numerical univariate data set:

5.6 5.1 6.2 6.0 5.8 6.5 5.8 5.5

Bivariate Data

We have bivariate data when observations are made on each of two variables. Our data set might consist of a (height, weight) pair for each basketball player on a team, with the first observation as (72, 168), the second as (75, 212), and so on. If an engineer determines the value of both x = component lifetime and y = reason for component failure, the resulting data set is bivariate with one variable numerical and the other categorical.

Multivariate Data

Multivariate data arise when observations are made on more than one variable (so bivariate is a special case of multivariate). For example, a research physician might determine the systolic blood pressure, diastolic blood pressure, and serum cholesterol level for each patient participating in a study. Each observation would be a triple of numbers, such as (120, 80, 146).

In many multivariate data sets, some variables are numerical and others are categorical. Thus the annual automobile issue of Consumer Reports gives values of such variables as type of vehicle (small, sporty, compact, mid-size, large), city fuel efficiency (mpg), highway fuel efficiency (mpg), drivetrain type (rear wheel, front wheel, four wheel), and so on.

Branches of Statistics

An investigator who has collected data may wish simply to summarize and describe important features of the data. This entails using methods from descriptive statistics. Some of these methods are graphical in nature; the construction of histograms, boxplots, and scatter plots are primary examples. Other descriptive methods involve calculation of numerical summary measures, such as means, standard deviations, and correlation coefficients. The wide availability of statistical computer software packages has made these tasks much easier to carry out than they used to be.

Computers are much more efficient than human beings at calculation and the creation of pictures (once they have received appropriate instructions from the user!). This means that the investigator doesn't have to expend much effort on "grunt work" and will have more time to study the data and extract important messages. We are using R-Studio to investigate many topics in this course.

Example 1

Charity is a big business in the United States. The Web site charitynavigator.com gives information on roughly 5500 charitable organizations, and there are many smaller charities that fly below the navigator's radar screen.
Some charities operate very efficiently, with fundraising and administrative expenses that are only a small percentage of total expenses, whereas others spend a high percentage of what they take in on such activities.

Here is data on fundraising expenses as a percentage of total expenditures for a random sample of 60 charities:

 6.1  12.6  34.7   1.6  18.8   2.2   3.0   2.2   5.6   3.8
 2.2   3.1   1.3   1.1  14.1   4.0  21.0   6.1   1.3  20.4
 7.5   3.9  10.1   8.1  19.5   5.2  12.0  15.8  10.4   5.2
 6.4  10.8  83.1   3.6   6.2   6.3  16.3  12.7   1.3   0.8
 8.8   5.1   3.7  26.3   6.0  48.0   8.2  11.7   7.2   3.9
15.3  16.6   8.8  12.0   4.7  14.7   6.4  17.0   2.5  16.2

Without any organization, it is difficult to get a sense of the data's most prominent features: what a typical (i.e. representative) value might be, whether values are highly concentrated about a typical value or quite dispersed, whether there are any gaps in the data, what fraction of the values are less than 20%, and so on.

Figure 1 shows what is called a stem-and-leaf display as well as a histogram.

[Figure 1: A Minitab stem-and-leaf display (tenths digit truncated) and histogram for the charity fundraising percentage data]

Branches of Statistics

Clearly a substantial majority of the charities in the sample spend less than 20% on fundraising, and only a few percentages might be viewed as beyond the bounds of sensible practice.

Having obtained a sample from a population, an investigator would frequently like to use sample information to draw some type of conclusion (make an inference of some sort) about the population. That is, the sample is a means to an end rather than an end in itself. Techniques for generalizing from a sample to a population are gathered within the branch of our discipline called inferential statistics.
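Returning to the charity data of Example 1, the descriptive summary can be sketched in a few lines of Python. This is a minimal illustration rather than the Minitab output referred to in Figure 1: it builds a stem-and-leaf display with the tenths digit truncated (stem = tens digit, leaf = ones digit) and computes the fraction of charities spending less than 20% on fundraising. The function name `stem_and_leaf` is a hypothetical helper introduced here.

```python
from collections import defaultdict

# Fundraising expenses as a percentage of total expenditures (60 charities).
fundraising_pct = [
    6.1, 12.6, 34.7, 1.6, 18.8, 2.2, 3.0, 2.2, 5.6, 3.8,
    2.2, 3.1, 1.3, 1.1, 14.1, 4.0, 21.0, 6.1, 1.3, 20.4,
    7.5, 3.9, 10.1, 8.1, 19.5, 5.2, 12.0, 15.8, 10.4, 5.2,
    6.4, 10.8, 83.1, 3.6, 6.2, 6.3, 16.3, 12.7, 1.3, 0.8,
    8.8, 5.1, 3.7, 26.3, 6.0, 48.0, 8.2, 11.7, 7.2, 3.9,
    15.3, 16.6, 8.8, 12.0, 4.7, 14.7, 6.4, 17.0, 2.5, 16.2,
]


def stem_and_leaf(values):
    """Return {stem: sorted leaves}, with stem = tens digit and leaf = ones digit."""
    stems = defaultdict(list)
    for value in values:
        whole = int(value)  # truncate (do not round) the tenths digit
        stems[whole // 10].append(whole % 10)
    return {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}


display = stem_and_leaf(fundraising_pct)

# Print every stem between the smallest and largest so that gaps stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")

below_20 = sum(1 for value in fundraising_pct if value < 20)
fraction_below_20 = below_20 / len(fundraising_pct)
print(f"{below_20} of {len(fundraising_pct)} charities are below 20% "
      f"({fraction_below_20:.0%})")
```

Printing the empty stems (5, 6 and 7) makes the gap between the bulk of the data and the extreme value 83.1 visible, which is one of the features the text says is hard to see in the raw list.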
Scope of Modern Statistics

These days statistical methodology is employed by investigators in virtually all disciplines, including such areas as:
◼ molecular biology (analysis of microarray data)
◼ ecology (describing quantitatively how individuals in various animal and plant populations are spatially distributed)
◼ materials engineering (studying properties of various treatments to retard corrosion)
◼ marketing (developing market surveys and strategies for marketing new products)
◼ public health (identifying sources of diseases and ways to treat them)
◼ civil engineering (assessing the effects of stress on structural elements and the impacts of traffic flows on communities)

Levels of Measurement

Levels of measurement include:
◼ Nominal
◼ Ordinal
◼ Interval
◼ Ratio

The scale determines the amount of information contained in the data. The scale indicates the data summarization and statistical analyses that are most appropriate.

Nominal
Data are labels or names used to identify an attribute of the element. A nonnumeric label or numeric code may be used.
Example: Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

Ordinal
The data have the properties of nominal data and the order or rank of the data is meaningful. A nonnumeric label or numeric code may be used.
Example: Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).
Interval
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.
Example: Melissa has an SAT score of 1985, while Kevin has an SAT score of 1880. Melissa scored 105 points more than Kevin.

Ratio
The data have all the properties of interval data and the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale. This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.
Example: Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.

Categorical and Quantitative Data

Data can be further classified as being categorical or quantitative. The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative. In general, there are more alternatives for statistical analysis when the data are quantitative.

Categorical Data
◼ Labels or names used to identify an attribute of each element
◼ Often referred to as qualitative data
◼ Use either the nominal or ordinal scale of measurement
◼ Can be either numeric or nonnumeric
◼ Appropriate statistical analyses are rather limited

Quantitative Data
◼ Quantitative data indicate how many or how much: discrete, if measuring how many; continuous, if measuring how much
◼ Quantitative data are always numeric.
◼ Ordinary arithmetic operations are meaningful for quantitative data.

Level of Measurement

Data
  Categorical
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio

Design Experiment

What is Experimental Design?

An experimental design is a detailed plan for collecting and using data to identify causal relationships.
Through careful planning, the design of experiments allows your data collection efforts to have a reasonable chance of detecting effects and testing hypotheses that answer your research questions.

An experiment is a data collection procedure that occurs in controlled conditions to identify and understand causal relationships between variables. Researchers can use many potential designs. The ultimate choice depends on their research question, resources, goals, and constraints. In some fields of study, researchers refer to experimental design as the design of experiments (DOE). The two terms are synonymous.

At its most basic level, an experiment involves researchers manipulating at least one independent variable (also called a factor or input) under controlled conditions and measuring the changes in the dependent variable (the outcome). An effective experimental design develops a systematic procedure that increases the ability to draw meaningful conclusions from the data and reduces the interference of other variables that the researchers aren't studying.

Ultimately, the design of experiments helps ensure that your procedures and data will evaluate your research question effectively. Without an experimental design, you might waste your efforts in a process that, for many potential reasons, can't answer your research question. In short, it helps you trust your results.

Design of Experiments: Goals & Settings

Experiments occur in many settings, including psychology, the social sciences, medicine, physics, engineering, and the industrial and service sectors. Typically, experimental goals are to discover a previously unknown effect, confirm a known effect, or test a hypothesis. Effects represent causal relationships between variables. For example, in a medical experiment, does the new medicine cause an improvement in health outcomes? If so, the medicine has a causal effect on the outcome.
An experimental design's focus depends on the subject area and can include the following goals:
◼ Understanding the relationships between variables.
◼ Identifying the variables that have the largest impact on the outcomes.
◼ Finding the input variable settings that produce an optimal result.

For example, psychologists have conducted experiments to understand how conformity affects decision-making. Sociologists have performed experiments to determine whether ethnicity affects the public reaction to staged bike thefts. These experiments map out the causal relationships between variables, and their primary goal is to understand the role of various factors.

Conversely, in a manufacturing environment, the researchers might use an experimental design to find the factors that most effectively improve their product's strength, identify the optimal manufacturing settings, and do all that while accounting for various constraints. In short, a manufacturer's goal is often to use experiments to improve their products cost-effectively. In a medical experiment, the goal might be to quantify the medicine's effect and find the optimum dosage.

Developing an Experimental Design

Developing an experimental design involves planning that maximizes the potential to collect data that is both trustworthy and able to detect causal relationships. Specifically, these studies aim to see effects when they exist in the population the researchers are studying, preferentially favor causal effects, isolate each factor's true effect from potential confounders, and produce conclusions that you can generalize to the real world.

To accomplish these goals, experimental designs carefully manage data validity and reliability, and internal and external experimental validity. When your experiment is valid and reliable, you can expect your procedures and data to produce trustworthy results.
An excellent experimental design involves the following:
◼ Lots of preplanning.
◼ Developing experimental treatments.
◼ Determining how to assign subjects to treatment groups.

Pictorial and Tabular Methods in Descriptive Statistics

Type of Charts

Descriptive statistics can be divided into two general subject areas. In this section, we consider representing a data set using visual techniques. Many visual techniques may already be familiar to you: frequency tables, tally sheets, histograms, pie charts, bar graphs, and scatter diagrams.

A statistics professor collects information about the classification of her students as freshmen, sophomores, juniors, or seniors. The data she collects are summarized in the pie chart. What type of data does this graph show?

Type of Student   Percentage
Freshman          65%
Sophomore         20%
Junior            10%
Senior            5%

The registrar at State University keeps records of the number of credit hours students complete each semester. The data he collects are summarized in the histogram. The class boundaries are 10 to less than 13, 13 to less than 16, 16 to less than 19, 19 to less than 22, and 22 to less than 25.

Credit Hours   Number of Students
10 – 13        250
13 – 16        578
16 – 19        727
19 – 22        620
22 – 25        258

Omitting Categories/Missing Data

The table displays Ethnicity of Students but is missing the "Other/Unknown" category. This category contains people who did not feel they fit into any of the ethnicity categories or declined to respond. Notice that the frequencies do not add up to the total number of students. In this situation, create a bar graph and not a pie chart.
Table: Ethnicity of Students at De Anza College, Fall Term 2007 (Census Day)

                   Frequency              Percent
Asian              8,794                  36.1%
Black              1,412                  5.8%
Filipino           1,298                  5.3%
Hispanic           4,180                  17.1%
Native American    146                    0.6%
Pacific Islander   236                    1.0%
White              5,978                  24.5%
TOTAL              22,044 out of 24,382   90.4% out of 100%

Frequency

Data value   Frequency
2            3
3            5
4            3
5            6
6            2
7            1

Table: Frequency Table of Student Work Hours

A frequency is the number of times a value of the data occurs. According to the table, there are three students who work two hours, five students who work three hours, and so on. The sum of the values in the frequency column, 20, represents the total number of students included in the sample.

Relative Frequency

A relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes. To find the relative frequencies, divide each frequency by the total number of students in the sample, in this case 20. Relative frequencies can be written as fractions, percents, or decimals.

Data value   Frequency   Relative frequency
2            3           3/20 or 0.15
3            5           5/20 or 0.25
4            3           3/20 or 0.15
5            6           6/20 or 0.30
6            2           2/20 or 0.10
7            1           1/20 or 0.05

Table: Frequency Table of Student Work Hours with Relative Frequencies

The sum of the values in the relative frequency column of the table is 20/20, or 1.

Cumulative Relative Frequency

Cumulative relative frequency is the accumulation of the previous relative frequencies. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row, as shown in the table.

Data value   Frequency   Relative frequency   Cumulative relative frequency
2            3           3/20 or 0.15         0.15
3            5           5/20 or 0.25         0.15 + 0.25 = 0.40
4            3           3/20 or 0.15         0.40 + 0.15 = 0.55
5            6           6/20 or 0.30         0.55 + 0.30 = 0.85
6            2           2/20 or 0.10         0.85 + 0.10 = 0.95
7            1           1/20 or 0.05         0.95 + 0.05 = 1.00

Table: Frequency Table of Student Work Hours with Relative and Cumulative Relative Frequencies

The last entry of the cumulative relative frequency column is one, indicating that one hundred percent of the data has been accumulated.

NOTE: Because of rounding, the relative frequency column may not always sum to one, and the last entry in the cumulative relative frequency column may not be one. However, they each should be close to one.

All Together

The table represents the heights, in inches, of a sample of 100 male semiprofessional soccer players.

Heights (inches)   Frequency   Relative frequency   Cumulative relative frequency
59.95–61.95        5           5/100 = 0.05         0.05
61.95–63.95        3           3/100 = 0.03         0.05 + 0.03 = 0.08
63.95–65.95        15          15/100 = 0.15        0.08 + 0.15 = 0.23
65.95–67.95        40          40/100 = 0.40        0.23 + 0.40 = 0.63
67.95–69.95        17          17/100 = 0.17        0.63 + 0.17 = 0.80
69.95–71.95        12          12/100 = 0.12        0.80 + 0.12 = 0.92
71.95–73.95        7           7/100 = 0.07         0.92 + 0.07 = 0.99
73.95–75.95        1           1/100 = 0.01         0.99 + 0.01 = 1.00
                   Total = 100 Total = 1.00

Table: Frequency Table of Soccer Player Height

The data in this table have been grouped into the following intervals:
◼ 59.95 to 61.95 inches
◼ 61.95 to 63.95 inches
◼ 63.95 to 65.95 inches
◼ 65.95 to 67.95 inches
◼ 67.95 to 69.95 inches
◼ 69.95 to 71.95 inches
◼ 71.95 to 73.95 inches
◼ 73.95 to 75.95 inches

In this sample, there are five players whose heights fall within the interval 59.95–61.95 inches, three players whose heights fall within the interval 61.95–63.95 inches, 15 players whose heights fall within the interval 63.95–65.95 inches, 40 players whose heights fall within the interval 65.95–67.95 inches, 17 players whose heights fall within the interval 67.95–69.95 inches, 12 players whose heights fall within the interval 69.95–71.95 inches, seven players whose heights fall within the interval 71.95–73.95 inches, and one player whose height falls within the interval 73.95–75.95 inches. All heights fall between the endpoints of an interval and not at the endpoints.

Use the heights of the 100 male semiprofessional soccer players in the Frequency Table of Soccer Player Height. Fill in the blanks with your answers.
a. The percentage of heights that are from 67.95 to 71.95 inches is: ____.
b. The percentage of heights that are from 67.95 to 73.95 inches is: ____.
c. The percentage of heights that are more than 65.95 inches is: ____.
d. The number of players in the sample who are between 61.95 and 71.95 inches tall is: ____.
e. What kind of data are the heights?

Tabular Methods in Descriptive Statistics

Constructing Frequency Distribution

EXAMPLE: Barry Bonds of the San Francisco Giants established a new single-season Major League Baseball home run record by hitting 75 home runs during the 2001 season. Listed below is the sorted distance of each of the 75 home runs.

320 320 347 350 360 360 360 361 365 370
370 375 375 375 375 380 380 380 380 380
380 390 390 390 394 396 400 400 400 400
405 410 410 410 410 410 410 410 410 410
410 410 411 415 415 416 417 417 420 420
420 420 420 420 420 420 429 430 430 430
430 430 435 435 436 440 440 440 440 440
450 480 488 490 496

STEP 1: Decide the number of classes using the "2 to the k rule".
Here n is 75. $2^6 = 64$, which is less than 75; $2^7 = 128$, which is more than 75. So the recommended number of classes is 7: $k = 7$.

STEP 2: Determine the class interval.
$i \ge \frac{\text{Maximum value} - \text{Minimum value}}{k}$
$i \ge \frac{496 - 320}{7} = 25.142$
We round up to a convenient number, the next higher multiple of 10, i.e. 30.

STEP 3: Set the individual class limits.
It is possible to start at 300: 300 up to 330, 330 up to 360, 360 up to 390, and so on. Starting points such as 310 (310 – 340) or 315 (315 – 345) are also possible.

STEP 4: Tally the distances into the classes.

CLASS INTERVAL   FREQUENCY
300 – 330
330 – 360
360 – 390
390 – 420
420 – 450
450 – 480
480 – 510

STEP 5: Count the number of items in each class.

CLASS INTERVAL   FREQUENCY
300 – 330        2
330 – 360        2
360 – 390        17
390 – 420        27
420 – 450        22
450 – 480        1
480 – 510        4

CLASS INTERVAL   FREQUENCY   CUMULATIVE FREQUENCY   RELATIVE FREQUENCY   CUMULATIVE RELATIVE FREQUENCY
300 – 330        2           2                      2/75 = 0.03          0.03
330 – 360        2           4                      2/75 = 0.03          0.06
360 – 390        17          21                     17/75 = 0.23         0.29
390 – 420        27          48                     27/75 = 0.36         0.65
420 – 450        22          70                     22/75 = 0.29         0.94
450 – 480        1           71                     1/75 = 0.01          0.95
480 – 510        4           75                     4/75 = 0.05          1.00
TOTAL            75                                 1

Constructing Histogram

HISTOGRAM: a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

[Figure: Histogram of the home run distances. Vertical axis: frequency (0 to 30); horizontal axis: class intervals 300-330 through 480-510.]

Constructing Frequency Polygon

FREQUENCY POLYGON: also shows the shape of a distribution and is similar to a histogram.
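The five steps of the home run example can be retraced in code. The following is a minimal Python sketch, not the Excel or R-Studio workflow used in the course. It assumes classes of the form "lower limit up to, but not including, upper limit" and takes 300 as the first lower limit, as chosen in Step 3; all variable names are illustrative. Cumulative relative frequencies are computed here from the exact counts, so a few entries (for example 21/75 = 0.28) differ slightly from a table built by adding already-rounded relative frequencies.

```python
import math

# Sorted home run distances from the example (75 values).
distances = [
    320, 320, 347, 350, 360, 360, 360, 361, 365, 370,
    370, 375, 375, 375, 375, 380, 380, 380, 380, 380,
    380, 390, 390, 390, 394, 396, 400, 400, 400, 400,
    405, 410, 410, 410, 410, 410, 410, 410, 410, 410,
    410, 410, 411, 415, 415, 416, 417, 417, 420, 420,
    420, 420, 420, 420, 420, 420, 429, 430, 430, 430,
    430, 430, 435, 435, 436, 440, 440, 440, 440, 440,
    450, 480, 488, 490, 496,
]
n = len(distances)

# Step 1: "2 to the k" rule, the smallest k with 2**k greater than n.
k = 1
while 2 ** k <= n:
    k += 1

# Step 2: class interval i >= (max - min) / k, rounded up to a multiple of 10.
min_width = (max(distances) - min(distances)) / k
width = math.ceil(min_width / 10) * 10

# Step 3: class limits, starting at 300 (the starting point chosen in the example).
start = 300
classes = [(start + j * width, start + (j + 1) * width) for j in range(k)]

# Steps 4 and 5: tally each distance into its class (lower limit included,
# upper limit excluded) and count.
frequencies = [0] * k
for d in distances:
    frequencies[(d - start) // width] += 1

# Cumulative frequency, relative frequency, cumulative relative frequency.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

print(f"n = {n}, k = {k}, minimum width = {min_width:.3f}, width used = {width}")
print("class      freq  cum  rel   cum rel")
for (low, high), f, c in zip(classes, frequencies, cumulative):
    print(f"{low}-{high}  {f:>4} {c:>4}  {f / n:.2f}  {c / n:.2f}")
```

Changing `start` to 310 or 315 reproduces the alternative class limits mentioned in Step 3; the frequencies then change accordingly.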
[Figure: Frequency polygon of the home run distances. Vertical axis: frequency (0 to 30); horizontal axis: class intervals 300-330 through 480-510.]

Constructing Cumulative Frequency Polygon

[Figure: Cumulative frequency polygon of the home run distances. Vertical axis: cumulative frequency (0 to 80); horizontal axis: distance, class intervals 300-330 through 480-510.]

Summary

Week 1
✓ Explore LMS (Canvas) features; learn to communicate with instructors and peers through Canvas.
✓ Familiarize with the TLP: introduction to the topics and their learning objectives and outcomes
✓ Familiarize with the evaluation methods and weightage
✓ Software required
✓ Statistics, Type of Statistics, Level of Measurement, Design Experiment
✓ Pictorial Representation
✓ Constructing Frequency Distribution

Any Questions

Next

Week 2
Topic: Stem and leaf plot, measures of location, measures of dispersion, measure of relative standing, using Excel to create and calculate them.
Assessments and activities: Lecture, ppt. and practice questions.
Learning objectives: With and without computer software, organize and represent data with frequency distributions, histograms, and stem and leaf plots. 1. Summarize data by use of measures of central tendency and variation, and verify calculations with software.
