Introduction to Basic Statistics PDF

BASIC STATISTICS

What is Statistics?

Statistics is a branch of applied mathematics that involves the collection, description, and analysis of quantitative data, and the inference of conclusions from it.

Descriptive and Inferential Statistics

The two major areas of statistics are descriptive statistics, which describes the properties of sample and population data, and inferential statistics, which uses those properties to test hypotheses and draw conclusions. Descriptive statistics include the mean (average), variance, skewness, and kurtosis. Inferential statistics include linear regression analysis, analysis of variance (ANOVA), logit/probit models, and null hypothesis testing.

Descriptive Statistics

Descriptive statistics mostly focus on the central tendency, variability, and distribution of sample data.

Central tendency is an estimate of the typical element of a sample or population for the characteristic being measured. It includes descriptive statistics such as the mean, median, and mode.

Variability refers to a set of statistics that show how much the elements of a sample or population differ along the characteristics measured. It includes metrics such as range, variance, and standard deviation.

Distribution refers to the overall "shape" of the data, which can be depicted on a chart such as a histogram or a dot plot, and includes properties such as the probability distribution function, skewness, and kurtosis.

Descriptive statistics can also describe differences between observed characteristics of the elements of a data set. They help us understand the collective properties of the elements of a data sample and form the basis for testing hypotheses and making predictions using inferential statistics.
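As a minimal sketch of the central tendency and variability measures named above, the following uses Python's standard `statistics` module. The `describe` helper and the `sample` values are invented for illustration and do not come from the slides.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Variability
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # square root of the sample variance
    }


# Hypothetical sample data, chosen only to keep the arithmetic easy to follow.
sample = [2, 4, 4, 5, 7, 9, 11]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the mean is 6, the median is 5, the mode is 4, the range is 9, and the sample variance is 10. If the data were a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` would be the corresponding calls.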
Inferential Statistics

Inferential statistics are tools that statisticians use to draw conclusions about the characteristics of a population from the characteristics of a sample, and to determine how reliable those conclusions are. Based on the sample size and distribution, statisticians can calculate the probability that sample statistics, which measure the central tendency, variability, distribution, and relationships between characteristics within a data sample, provide an accurate picture of the corresponding parameters of the whole population from which the sample is drawn.

Inferential statistics are used to make generalizations about large groups, such as estimating average demand for a product by surveying a sample of consumers' buying habits, or to attempt to predict future events, such as projecting the future return of a security or asset class based on returns in a sample period.

Regression analysis is a widely used technique of statistical inference for determining the strength and nature of the relationship (the correlation) between a dependent variable and one or more explanatory (independent) variables. The output of a regression model is often analyzed for statistical significance, which is the claim that a result generated by testing or experimentation is unlikely to have occurred randomly or by chance and is instead likely attributable to a specific cause elucidated by the data.

Understanding Statistical Data

Statistics is built on variables. A variable is a characteristic or attribute of an item that can be counted or measured. For example, a car can have variables such as make, model, year, mileage, color, or condition. By combining the variables across a set of data, such as the colors of all cars in a given parking lot, statistics allows us to better understand trends and outcomes.
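To make the parking-lot idea concrete, here is a small sketch that treats each car as a record of variables and combines one variable across the whole data set. The `cars` records and the `value_shares` helper are hypothetical, made up for this illustration.

```python
from collections import Counter

# Hypothetical parking lot: each record holds several variables for one car.
cars = [
    {"make": "Toyota", "year": 2015, "mileage": 60000, "color": "red"},
    {"make": "Ford", "year": 2018, "mileage": 42000, "color": "blue"},
    {"make": "Honda", "year": 2020, "mileage": 15000, "color": "red"},
    {"make": "Toyota", "year": 2012, "mileage": 98000, "color": "white"},
]


def value_shares(records, variable):
    """Return the share of records that take each value of one variable."""
    counts = Counter(record[variable] for record in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}


color_shares = value_shares(cars, "color")
print(color_shares)
```

The same helper works for any variable in the records, for example `value_shares(cars, "make")`.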
There are two main types of variables. First, qualitative variables are specific attributes that are often non-numeric. Many of the variables in the car example are qualitative. Other examples of qualitative variables in statistics are gender, eye color, or city of birth. Qualitative data is most often used to determine what percentage of outcomes falls into each category of a qualitative variable. Qualitative analysis often does not rely on numbers. For example, determining what percentage of women own a business analyzes qualitative data.

The second type is quantitative variables. Quantitative variables are studied numerically and carry meaning only when paired with a non-numerical descriptor. As with quantitative analysis, this information is rooted in numbers. In the car example above, the mileage driven is a quantitative variable, but the number 60,000 holds no value unless it is understood to be the total number of miles driven.

Quantitative variables can be further broken into two categories. First, discrete variables can take only certain values, with gaps between the possible values. The number of points scored in a football game is a discrete variable because:
✓ There can be no decimals, and
✓ It is impossible for a team to score only one point

Statistics also makes use of continuous quantitative variables. These values run along a scale. Whereas discrete values are restricted, continuous variables can be measured to decimal precision. Any value within the possible limits can be obtained when measuring the height of football players, and the heights can be measured down to 1/16th of an inch, if not further.

Statistical Levels of Measurement

There are several resulting levels of measurement after analyzing variables and outcomes.
Statistics can quantify outcomes in four ways.

Nominal-level Measurement

There's no numerical or quantitative value, and qualities are not ranked. Nominal-level measurements are instead simply labels or categories assigned to other variables. It's easiest to think of nominal-level measurements as non-numerical facts about a variable.

Example: The name of the President elected in 2020 was Joseph Robinette Biden, Jr.

Ordinal-level Measurement

Outcomes can be arranged in an order, but all data values have the same value or weight. Although numerical, ordinal-level measurements can't be subtracted from each other in statistics because only the position of the data point matters. Ordinal levels are often incorporated into nonparametric statistics and compared against the total variable group.

Example: American Fred Kerley was the 2nd fastest man at the 2020 Tokyo Olympics based on 100-meter sprint times.

Interval-level Measurement

Outcomes can be arranged in order, and differences between data values may now have meaning. Two data points are often used to compare the passing of time or changing conditions within a data set. There is often no "starting point" for the range of data values, and calendar dates or temperatures may not have a meaningful intrinsic zero value.

Example: Inflation hit 8.6% in May 2022. The last time inflation was this high was in December 1981.

Ratio-level Measurement

Outcomes can be arranged in order and differences between data values have meaning. In addition, there's a starting point or "zero value" that gives further meaning to a statistical value. The ratio between data values has meaning, including each value's distance from zero.

Example: The lowest meteorological temperature recorded was -128.6 degrees Fahrenheit in Antarctica. Note that a Fahrenheit reading is strictly interval-level, since 0 degrees Fahrenheit is an arbitrary point on the scale; quantities such as mileage or height, where zero means none of the quantity, are clearer ratio-level cases.
Statistics Sampling Techniques

It is often not possible to gather data from every member of a population. Statistics relies instead on different sampling techniques to create a representative subset of the population that's easier to analyze. There are several primary types of sampling.

Simple Random Sampling

Simple random sampling calls for every member within the population to have an equal chance of being selected for analysis. The entire population is used as the basis for sampling, and any random generator based on chance can select the sample items. For example, 100 individuals are lined up and 10 are chosen at random.

Systematic Sampling

Systematic sampling calls for a random sample as well, but its technique is slightly modified to make it easier to conduct. A single random number is generated and individuals are then selected at a specified regular interval until the sample size is complete. For example, 100 individuals are lined up and numbered. The 7th individual is selected for the sample, followed by every subsequent 9th individual, until 10 sample items have been selected.

Stratified Sampling

Stratified sampling calls for more control over your sample. The population is divided into subgroups based on similar characteristics. Then you calculate how many people from each subgroup would represent the entire population. For example, 100 individuals are grouped by gender and race. Then a sample from each subgroup is taken in proportion to that subgroup's share of the population.

A public secondary school wishes to assess the students’ views of the quality of service of specific offices under student services.
The population of 2000 students consists of:

Year Level     Population (N)    Sample Size (n)
First Year     550
Second Year    500
Third Year     500
Fourth Year    450
TOTAL          N = 2000          n = 200

Using the stratified random sampling procedure, determine the number of samples from each level based on a sample size of 200.

Cluster Sampling

Cluster sampling calls for subgroups as well, but each subgroup should be representative of the population. Entire subgroups are randomly selected instead of randomly selecting individuals within a subgroup.
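The sampling techniques in this section can be sketched in a few lines of Python. This is an illustrative sketch, not a prescribed procedure: the function names, the `seed` parameter, and the numbered `people` list are assumptions introduced here. The proportional allocation uses the usual formula n_h = N_h × n / N, applied to the year-level figures from the table above.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, start, step, n):
    """Take the member at 1-based position `start`, then every `step`-th member, up to n."""
    return population[start - 1::step][:n]


def proportional_allocation(strata_sizes, n):
    """Split a total sample of n across strata in proportion to their sizes.

    Rounding can make the parts sum to slightly more or less than n
    when the shares are not whole numbers.
    """
    total = sum(strata_sizes.values())
    return {name: round(size * n / total) for name, size in strata_sizes.items()}


def cluster_sample(clusters, k, seed=None):
    """Randomly choose k whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    return [member for name in chosen for member in clusters[name]]


# 100 individuals, lined up and numbered 1..100.
people = list(range(1, 101))

print(simple_random_sample(people, 10, seed=1))

# The 7th individual, then every subsequent 9th, until 10 are selected.
print(systematic_sample(people, start=7, step=9, n=10))
# -> [7, 16, 25, 34, 43, 52, 61, 70, 79, 88]

# Year-level populations from the school example, total sample size 200.
year_levels = {"First Year": 550, "Second Year": 500, "Third Year": 500, "Fourth Year": 450}
print(proportional_allocation(year_levels, 200))
# -> {'First Year': 55, 'Second Year': 50, 'Third Year': 50, 'Fourth Year': 45}
```

In the school example each year level contributes 10% of its students (200 / 2000), which gives 55, 50, 50, and 45 students, summing to the required 200.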
