Statistics and Probability PDF
Summary
This document provides an introduction to statistics and probability, discussing different data types, variable categories, and examples of statistical application. It covers both qualitative and quantitative data, discrete and continuous variables, and includes examples of statistical use cases.
Full Transcript
Statistics and Probability: Intro

Statistics and probability are essential because these branches of science lay the foundation of all data science domains, i.e., artificial intelligence, machine learning algorithms, and deep learning. Statistics and probability are the foundations on which the concepts behind these trending technologies are built.

Data Types

Data is a set of numbers or documents stored on a computer. By definition, data is "facts and statistics collected for reference or analysis." Data is important for generating the insights that lead to better business decisions, and it is a pervasive element in everyday interactions: each click on your computer or mobile device generates data. This generated data provides the raw material for analysis and supports better decisions in business.

Data is divided into two major subcategories: qualitative data and quantitative data.

Qualitative data: This type of data is observed subjectively and deals with characteristics and descriptors that cannot be easily measured. It is further divided into nominal and ordinal data.
- Nominal data is data without any order or ranking. Gender is an example of nominal data: there is no ranking among male, female, or other, no 1, 2, 3, 4 or any sort of ordering. Race is another example of nominal data.
- Ordinal data is an ordered series of information. Example: customer ratings in a restaurant (good, average, bad). Any data that carries some sequence or order used to rate things is ordinal data.

Quantitative data: Quantitative data concerns measurable things expressed in numbers. It is further divided into discrete and continuous data.
- Discrete data has a limited number of possible values. Example: the number of students in a class; a class cannot have an unlimited number of students.
- Continuous data has an unlimited number of possible values, e.g., the weight of a person: the weight can be 50 kg, 50.1 kg, 50.001 kg, 50.0001 kg, 50.023 kg, and so on. There is an unlimited number of possible values, so this is an example of continuous data.

Variable Types

What are variables? Variables are storage units for data. There are different types of variables, i.e., discrete variables and continuous variables.
- Discrete variable: Also known as a categorical variable; it can hold values of different categories. Example: message type (spam, non-spam).
- Continuous variable: A variable storing an unlimited number of values. Example: the weight of a person.

When data is associated with a variable, the variable becomes either discrete or continuous. Variables are also classified as dependent or independent:
- Dependent variable: its values depend on other variables.
- Independent variable: its values do not depend on other variables.
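As a quick illustrative sketch of these categories in code, the values below (gender labels, ratings, a class size, a weight) are assumptions for illustration, not data from this document:

```python
# A minimal sketch of the data/variable types described above.
import pandas as pd

# Nominal: categories with no inherent order
gender = pd.Categorical(["male", "female", "other"], ordered=False)

# Ordinal: categories with a meaningful order
ratings = pd.Categorical(["bad", "average", "good"],
                         categories=["bad", "average", "good"], ordered=True)

# Discrete: a countable, limited number of values
students_in_class = 42            # an integer count

# Continuous: unlimited possible values within a range
weight_kg = 50.023                # a real-valued measurement

print(gender.ordered, ratings.ordered)   # False True
```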
Statistics

What is statistics? Definition: "Statistics is an area of applied mathematics which is concerned with data collection, analysis, interpretation, and presentation." Statistical methods are used to visualize data and to collect data for the interpretation of information; they are the methods by which complex problems are solved with data.

Example: to treat cancer patients, an organization has created a new drug. What process should be followed to verify the effectiveness of the drug? This can be solved using a statistical approach: design a test that can confirm the effectiveness of the drug. This is a common problem that statistics can solve.

Another example: the latest sales data is made available to you, and your team leader has to work out a strategy to improve the company's business, identify the areas with the best scope for improvement, and prepare a report for management. This problem involves a lot of data analysis: you have to look at the variables that are dragging the business down and at the variables that are improving performance and growing the business. The basic idea behind the analysis is to use statistical techniques to figure out the relationships between the different variables or components of the business.

Types of Statistics

- Descriptive statistics: summarizes and describes the features of a dataset. This includes measures such as mean, median, mode, standard deviation, and variance.
- Inferential statistics: makes inferences and predictions about a population based on a sample.

Descriptive statistics

The method used to describe and understand the features of a specific dataset by giving a summary of the data is called descriptive statistics. It focuses mainly on the characteristics of the data and on graphical summaries.

Example: measuring student uniform sizes in an institute. An institute wants to procure and distribute uniforms to its students, so it needs to study the average uniform size in the institute. It takes the measurements of all the students and, from all the measurement data, derives small, average, and large sizes, so that the vendor can supply uniforms in three sizes instead of tailoring each student's uniform separately. This illustrates the concept of descriptive statistics.

There are four major types of descriptive statistics, illustrated in the sketch after this list:
- Measures of frequency: count, percent, frequency, etc.
- Measures of central tendency: mean, median, and mode.
- Measures of dispersion or variation: range, variance, standard deviation, etc.
- Measures of position: percentile ranks and quartile ranks.
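The following sketch computes one measure of each of the four kinds on a made-up sample of uniform sizes, using only Python's standard library; the numbers are illustrative assumptions:

```python
import statistics
from collections import Counter

sizes = [34, 36, 36, 38, 38, 38, 40, 42, 42, 44]   # assumed uniform sizes

# Measures of frequency
freq = Counter(sizes)                         # e.g. size 38 appears 3 times

# Measures of central tendency
mean   = statistics.mean(sizes)
median = statistics.median(sizes)
mode   = statistics.mode(sizes)

# Measures of dispersion or variation
rng      = max(sizes) - min(sizes)
variance = statistics.variance(sizes)         # sample variance
stdev    = statistics.stdev(sizes)            # sample standard deviation

# Measures of position: quartile cut points
q1, q2, q3 = statistics.quantiles(sizes, n=4)

print(freq, mean, median, mode, rng, round(stdev, 2), (q1, q2, q3))
```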
Inferential statistics

The method that makes inferences and predictions about a population based on sample datasets is inferential statistics. It allows you to infer population parameters from sample data; it generalizes to large datasets, applies probability, and describes situations or phenomena based on hypothetical extensions of the sample.

Example: inferring uniform sizes from a sample of students. In the uniform example above, inferential statistics would take a sample of the class, i.e., a few students rather than the entire class, who are already categorized into the measurement classes small, average, and large. From this sample you build a statistical model and extend it to the entire population of the class.

Key Terminologies in Statistics

Population and sample:
- Population: the set of individuals, objects, or events whose properties are to be analyzed.
- Sample: a subset of the population that represents the entire population. The sample must be chosen so that it represents the whole population, not just a part of it.

The Importance of Sampling in Statistics

Sampling helps deduce statistical knowledge about a population. Example: surveying pizza-eating habits in the USA. With the USA's population in the hundreds of millions and still growing, it is difficult to survey every individual; it might be possible, but it would take forever, and it is not reasonable to go door to door asking people about their pizza-eating habits. That is why sampling is used: a sample study draws inferences about the entire population. Sampling lets you study the entire population without examining every individual, and the statistical analysis of the sample represents an inference about the total population.

Sampling Techniques and Probability

Sampling techniques are of two types: probability sampling and non-probability sampling.

Probability Sampling

Samples chosen from a large population using the theory of probability are known as probability samples. The types of probability sampling, sketched in code at the end of this section, are:
1) Simple random sampling: each member of the population has an equal chance of being selected; every individual or object in the population has an equal chance of being part of the sample. Methods: random number generation, the lottery method. Example: selecting 10 students from a class of 50 by drawing names from a hat.
2) Systematic sampling: every n-th record is chosen from the population to be part of the sample. As shown in the figure, every second person in each group is chosen as a sample. Method: choose a starting point at random, then pick every k-th element. Example: surveying every 10th customer entering a store.
3) Stratified sampling: divide the population into strata and randomly sample from each stratum. A stratum is a subset of the population that shares at least one common characteristic; for example, a population can be divided into men and women by gender, and each gender forms a stratum. The random sample is drawn after the strata are created. Purpose: ensures representation of all subgroups. Example: sampling students from different grades in a school.
4) Cluster sampling: researchers divide a population into smaller groups known as clusters, then randomly select among these clusters to form a sample. Cluster sampling is a method of probability sampling often used to study large populations, particularly those that are widely geographically dispersed. For example, an organization surveying smartphone performance across India can divide the country's population into cities (clusters), select the cities with the highest population, and then filter for those using mobile devices. Purpose: reduces costs and increases efficiency. Example: selecting entire classrooms in a school for a survey.

Non-Probability Sampling

Definition: not every member has a known or equal chance of being selected. Types: convenience sampling, judgmental sampling, quota sampling, and snowball sampling.
- Convenience sampling: selecting the individuals who are easiest to reach. Example: surveying people at a mall.
- Judgmental sampling: selecting individuals based on the researcher's judgment. Example: choosing experts in a field for a survey.
- Quota sampling: ensuring certain characteristics are represented in the sample by setting quotas for specific subgroups. Example: ensuring 50% male and 50% female respondents in a survey.
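Here is a minimal sketch of the four probability-sampling methods on an assumed toy population of 50 numbered students; the strata and cluster definitions are invented for illustration:

```python
import random

population = list(range(1, 51))

# 1) Simple random sampling: every member has an equal chance
simple = random.sample(population, k=10)

# 2) Systematic sampling: random start, then every k-th member
k = 5
start = random.randrange(k)
systematic = population[start::k]

# 3) Stratified sampling: sample within each stratum (here: odd/even IDs)
strata = {"odd":  [x for x in population if x % 2],
          "even": [x for x in population if not x % 2]}
stratified = [x for group in strata.values()
              for x in random.sample(group, k=5)]

# 4) Cluster sampling: split into clusters, pick whole clusters at random
clusters = [population[i:i + 10] for i in range(0, 50, 10)]
chosen_clusters = random.sample(clusters, k=2)
cluster_sample = [x for c in chosen_clusters for x in c]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```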
Information Gain and Entropy

Information gain and entropy are involved in many machine learning topics, such as decision trees and random forests.
- Entropy is a measure of the uncertainty present in data.
- Information gain indicates how much information a particular feature or variable gives us about the final outcome: it is the difference in entropy before and after splitting the dataset on an attribute A.

For example, suppose you have to predict whether a match can be played given the weather conditions. The target variable "play" takes the value "yes" or "no": a "no" prediction means the weather conditions are not good enough to play, and a "yes" prediction means the game can be played. To solve such a problem, we use a decision tree.

Decision tree: picture an inverted tree in which each internal node tests an attribute and each branch denotes a decision, arranged so that an outcome is reached at the end of each path.

Example training set. The predictor variables are outlook, humidity, and wind; the target variable, the one we need to predict, is play.

Day  Outlook   Humidity  Wind    Play
1    Sunny     High      Weak    No
2    Sunny     High      Strong  No
3    Overcast  High      Weak    Yes
4    Rain      High      Weak    Yes
5    Rain      Normal    Weak    Yes
6    Rain      Normal    Strong  No
7    Overcast  Normal    Strong  Yes
8    Sunny     High      Weak    No
9    Sunny     Normal    Weak    Yes
10   Rain      Normal    Weak    Yes
11   Sunny     Normal    Strong  Yes
12   Overcast  High      Strong  Yes
13   Overcast  Normal    Weak    Yes
14   Rain      High      Strong  No

Prediction task: on day 15 (Rain, High, Weak), will John play tennis?

There are 14 observations, of which 9 result in a yes, meaning that out of 14 days the match can be played on only 9 days. On days 1, 2, 8, 9, and 11 the outlook is sunny, so we can cluster the dataset by outlook: when it is sunny, we have two yes's and three no's; when the outlook is overcast (cloudy), all four observations are yes's, meaning that on the 4 overcast days the game can always be played; and when it is rainy, we have three yes's and two no's. The "yes" term of the dataset's entropy works out as

-(9/14) × log2(9/14) = -0.6428 × (-0.6375) = 0.4097

The root node (the topmost node in a decision tree) should hold the most significant variable. The outlook node has three branches coming out of it, sunny, overcast, and rainy, since outlook can take exactly these three values.

These three values are assigned to intermediate branch nodes, and for each branch we compute the possibility that play equals "yes". The sunny and rainy branches mix yes and no, so they give impure outputs; the overcast (cloudy) branch results in a 100% pure subset, a definite and certain output. This is exactly what entropy measures: impurity or uncertainty. The lower the uncertainty or entropy of a variable, the more significant that variable is; overcast, with no impurity in its subset, is a perfectly pure subset.
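The sketch below recomputes the dataset entropy and the information gain of the outlook split from the counts in the training table; the helper names are mine:

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy of a yes/no split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

# Whole dataset: 9 yes, 5 no
H_S = entropy(9, 5)                      # ≈ 0.940

# Subsets after splitting on outlook (counts from the table)
branches = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}
n = 14
H_after = sum((p + q) / n * entropy(p, q) for p, q in branches.values())

info_gain = H_S - H_after                # ≈ 0.247
print(round(H_S, 3), round(H_after, 3), round(info_gain, 3))
```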
Decision tree prediction: following the tree from the outlook root node for day 15 (Outlook = Rain, Humidity = High, Wind = Weak) yields the prediction Play = Yes, so John will play tennis.

Another example: calculate the entropy of an entire dataset S. The dataset below contains 8 instances, with 4 instances of "Oak" and 4 instances of "Pine", so the entropy of S is

H(S) = -(4/8) log2(4/8) - (4/8) log2(4/8) = 1

#  Density  Grain  Hardness  Class
1  Heavy    Small  Hard      Oak
2  Heavy    Large  Hard      Oak
3  Heavy    Small  Hard      Oak
4  Light    Large  Soft      Oak
5  Light    Large  Hard      Pine
6  Heavy    Small  Soft      Pine
7  Heavy    Large  Soft      Pine
8  Heavy    Small  Soft      Pine

Splitting on Hardness partitions the instances into a Hard subset (rows 1, 2, 3, 5) and a Soft subset (rows 4, 6, 7, 8).

Confusion matrix

The confusion matrix is a matrix often used to describe the performance of a model. It is used with classifiers or classification models to calculate the accuracy and performance of the classifier by comparing actual results with predicted results.

Assessing classification performance: the performance of classifiers can be summarised by means of a table known as a contingency table or confusion matrix. Each row refers to the actual classes as recorded in the test set, and each column to the classes as predicted by the classifier. For instance, the first row might state that the test set contains 50 positives, 30 of which were correctly predicted and 20 incorrectly. The last column and the last row give the marginals (i.e., column and row sums).

- True positive (TP): the given class is spam and the classifier correctly predicted it as spam.
- False negative (FN): the given class is spam but the classifier incorrectly predicted it as non-spam.
- False positive (FP): the given class is non-spam but the classifier incorrectly predicted it as spam.
- True negative (TN): the given class is non-spam and the classifier correctly predicted it as non-spam.

Correctly classified positives and negatives are referred to as true positives and true negatives, respectively. Incorrectly classified positives are, perhaps somewhat confusingly, called false negatives; similarly, misclassified negatives are called false positives.

Examples: an email classification problem with spam and non-spam classes whose dataset contains 100 examples, 65 spam and 35 non-spam; or data about 165 patients, of whom 105 have a disease and the remaining 60 do not.

From a contingency table we can calculate a range of performance indicators. Accuracy represents the number of correctly classified data instances over the total number of data instances.

Two worked examples:
- Pregnancy test: there are 8 pregnant and 8 not-pregnant women. The model classified 6 pregnant women (true positives) and 5 not-pregnant women (true negatives) correctly, but predicted 3 not-pregnant women as pregnant (false positives) and 2 pregnant women as not pregnant (false negatives).
- Fruit classifier: we had 9 apples and 10 strawberries, but the model identified only 6 apples (true positives) and 8 strawberries (true negatives) correctly; moreover, it predicted 2 strawberries as apples (false positives) and 3 apples as strawberries (false negatives).
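A small sketch tying the definitions together: it tallies TP, TN, FP, and FN from actual-versus-predicted label lists (a made-up spam example) and checks the pregnancy example's accuracy of 11/16:

```python
from collections import Counter

# Made-up actual vs. predicted spam labels, for illustration only
actual    = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]

counts = Counter(zip(actual, predicted))
tp = counts[("spam", "spam")]   # spam correctly flagged as spam
tn = counts[("ham", "ham")]     # non-spam correctly passed
fp = counts[("ham", "spam")]    # non-spam wrongly flagged (false positive)
fn = counts[("spam", "ham")]    # spam missed (false negative)

def accuracy(tp, tn, fp, fn):
    """Correctly classified instances over all instances."""
    return (tp + tn) / (tp + tn + fp + fn)

print(tp, tn, fp, fn, round(accuracy(tp, tn, fp, fn), 3))  # 2 2 1 1 0.667
print(accuracy(6, 5, 3, 2))                                # pregnancy: 0.6875
```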
Probability Theory

Probability is a mathematical method used for statistical analysis; probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events. Probability makes use of statistics, and statistics makes use of probability.

Probability is the measure of how likely an event is to occur. It is the ratio of desired outcomes to total outcomes. A probability is always a value between zero and one, possibly a decimal such as 0.5, 0.52, 0.7, or 0.9, and the probabilities of all outcomes of an experiment always sum to one.

Sample space for rolling a die: rolling a die gives 6 possible outcomes, i.e., the faces 1, 2, 3, 4, 5, and 6. Each face has probability 1/6 of appearing; for example, the probability of getting a 5 is 1/6. The probabilities of all six faces sum to 1.

Terminology in probability theory:
- A random experiment is an experiment or process whose outcomes cannot be predicted with certainty; probability lets us predict the outcome with some degree of confidence.
- A sample space is the entire set of possible outcomes of a random experiment.
- An event is one or more outcomes of an experiment.

In the dice example, finding the probability of getting a 2 is the random experiment; the sample space is the set of all possible outcomes, the faces 1 through 6; and the event is getting a 2 when the die is rolled.

Disjoint and non-disjoint events: disjoint events have no common outcome; for example, a single card drawn from a deck cannot be both a king and a queen, it is either a king or a queen. Non-disjoint events have common outcomes; for example, a student can score 100 marks in statistics and 100 marks in probability.

Probability Types

There are three important types of probability: marginal, joint, and conditional probability.
- Marginal probability: the probability of an event occurring unconditional on any other event is known as marginal (or unconditional) probability; it is the probability of an individual random variable.
- Joint probability: a measure of two events happening at the same time.
- Conditional probability: if the probability of an event or outcome is based on the occurrence of a previous event or outcome, it is a conditional probability. The conditional probability of an event B is the probability that B will occur given that an event A has already occurred: P(B|A) = P(A and B) / P(A).
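A sketch of all three probability types using fair dice, with exact fractions from the standard library; the event choices (double six, "sum is 8 given the first die shows 3") are my examples:

```python
from fractions import Fraction
from itertools import product

# Marginal probability: one fair die
p_face = Fraction(1, 6)
assert 6 * p_face == 1                 # probabilities of all outcomes sum to 1

# Joint probability: two independent rolls both showing 6
p_double_six = p_face * p_face         # 1/36

# Conditional probability: P(sum = 8 | first die = 3) via P(A and B)/P(A)
outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
A = [o for o in outcomes if o[0] == 3]            # first die shows 3
A_and_B = [o for o in A if sum(o) == 8]           # ...and the sum is 8
p_B_given_A = (Fraction(len(A_and_B), len(outcomes))
               / Fraction(len(A), len(outcomes)))

print(p_face, p_double_six, p_B_given_A)          # 1/6 1/36 1/6
```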
Probability Distribution Function (PDF)

A probability distribution function describes how probabilities are distributed over the values of a random variable. It provides a way to visualize and understand the likelihood of different outcomes.
- Random variable: something that can take on different values, each with a certain probability. For example, when you roll a die, the outcome (1 through 6) is a random variable.
- Probability: a measure of how likely a particular outcome is. For example, the probability of rolling a 3 on a fair six-sided die is 1/6.
- Probability distribution: tells us how the probabilities are spread over all possible outcomes of the random variable, showing which outcomes are more likely and which are less likely.

Discrete example: rolling a die, where each outcome (1, 2, 3, 4, 5, 6) has a probability of 1/6. Continuous example: heights of people, where the normal distribution shows the probability of different heights; its graph is known as the bell curve because of its shape. Understanding distribution functions helps in predicting and analyzing the likelihood of various outcomes in different scenarios.

There are three main topics here: the probability density function, the normal distribution, and the central limit theorem.

1) Probability density function (PDF)

The probability density function is concerned with the relative likelihood of a continuous random variable taking on a given value; the area under it gives the probability that the variable lies between a range a and b. We are finding the probability of a continuous random variable over a specified range.

A PDF graph typically has a curve plotted against probability density, with the x-axis representing the possible outcomes and the y-axis the probability density of each outcome. The graph helps identify the mean, median, mode, and other key statistics of the distribution. In probability theory, a PDF is used to define the probability of the random variable falling within a distinct range of values. A function is said to be a probability density function if it represents a continuous probability distribution. The major difference between the probability density function (PDF) and the probability mass function (PMF) is that the PDF describes continuous probability distributions whereas the PMF describes discrete probability distributions.

The two conditions for a valid probability density function:
1. f(x) must be non-negative for all values of the random variable.
2. The area underneath f(x) must be equal to 1.

Three important properties of a PDF (checked in the sketch below):
- Property 1: the graph of a PDF is continuous over its range, because we are finding the probability that a continuous variable lies between a and b.
- Property 2: the area bounded by the curve of a density function and the x-axis is equal to 1; the area below the curve denotes total probability, which must lie between 0 and 1.
- Property 3: the probability that the random variable assumes a value between a and b is equal to the area under the PDF bounded by a and b. The probability value is given by the area of that region of the graph: it is the probability of finding the value of a continuous random variable between a and b.
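Since the densities here are described rather than plotted, this sketch checks the two validity conditions and Property 3 for the standard normal density; the erf-based CDF is a standard identity I am assuming, not something from the text:

```python
from math import exp, pi, sqrt, erf

def f(x):
    """Standard normal probability density function."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    """Standard normal CDF (area under f up to x), via erf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Condition 1: f(x) is non-negative everywhere (spot check)
assert all(f(x) >= 0 for x in range(-5, 6))

# Condition 2 / Property 2: total area under the curve is 1
print(round(Phi(10) - Phi(-10), 6))        # ≈ 1.0

# Property 3: P(a < X < b) is the area under f between a and b
a, b = -1, 1
print(round(Phi(b) - Phi(a), 4))           # ≈ 0.6827
```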
2) Normal distribution

The normal distribution is a probability distribution that is symmetric about the mean: data near the mean occurs more frequently than data far from the mean. The data around the mean represents the whole dataset, so a sample of data taken around the mean can represent the entire dataset. Like the generic PDF picture, the normal distribution appears as a bell curve. Two factors characterize it: the mean of the population and the standard deviation. The mean fixes the location of the center of the graph, while the standard deviation determines the height and width of the curve: if the standard deviation is large, the curve is short and wide, and if the standard deviation is small, the curve is tall and narrow.

3) Central limit theorem

If we take a large population and draw many samples from it, the mean of all the samples will be almost equal to the mean of the entire population, and the distribution of those sample means will itself be approximately normal. If you compare the mean of each sample, it will be almost equal to the mean of the population. How closely the distribution of sample means resembles the normal distribution depends on two main factors: the number of sample points considered and the shape of the underlying population (which depends on its standard deviation and mean). The central limit theorem states that the sample means will be normally distributed around the mean of the actual population, and the approximation holds well only for sufficiently large samples: as the sample size grows, the spread of the sample means shrinks (the standard error scales as σ/√n).
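A minimal simulation of the central limit theorem under assumed sizes (a uniform, non-normal population of 100,000 values; 1,000 samples of size 50):

```python
import random
import statistics

population = [random.uniform(0, 100) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Means of many independent samples drawn from the population
sample_means = [statistics.mean(random.sample(population, 50))
                for _ in range(1_000)]

# The mean of the sample means is close to the population mean,
# and the sample means themselves are approximately normal.
print(round(pop_mean, 2), round(statistics.mean(sample_means), 2))
```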
Bayes' Theorem

The naive Bayes algorithm is a supervised learning classification algorithm used for applications such as Gmail's spam filtering: the Spam folder in Gmail is populated by machine learning, and the algorithm used there is naive Bayes. Bayes' theorem shows the relation between a conditional probability and its inverse. Mathematically, it is stated as:

P(A|B) = P(B|A) · P(A) / P(B)

where:
- P(A|B) is the probability of event A occurring given that event B has occurred; the term on the left-hand side is known as the posterior.
- P(B|A) is the probability of event B occurring given that event A has occurred; this term is referred to as the likelihood.
- P(A) is the prior, the probability of event A before the evidence B is observed.
- P(B) is the marginal probability of event B.

The conditional probability P(B|A) means that the value of B depends on A: only once we know A can we predict B. For example, if A stands for the words "bonus" and "lottery" appearing in a mail, then P(spam | bonus, lottery) predicts spam based on those words. The posterior probability is the probability of the event after the evidence B is observed; the prior probability is the probability of the event before B is observed. Bayes' theorem thus estimates the posterior probability of an event based on prior knowledge and observed evidence.
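A toy sketch of Bayes' theorem for the spam example; the prior and likelihood numbers are invented for illustration:

```python
# Posterior P(spam | word) from assumed prior and likelihood values.
p_spam = 0.20                      # prior P(A): fraction of mail that is spam
p_word_given_spam = 0.50           # likelihood P(B|A): word appears in spam
p_word_given_ham = 0.05            # P(B | not A): word appears in non-spam

# Marginal P(B) by the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

posterior = p_word_given_spam * p_spam / p_word
print(round(posterior, 3))         # P(spam | word) ≈ 0.714
```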
Inferential Statistics

Inferential statistics deals with forming inferences and predictions about a population based on a sample of data taken from that population. One way to draw an inference from a sample is point estimation, which uses the sample data to compute a single value that serves as the best estimate of an unknown population parameter. For example, to estimate the mean of a huge population, draw a sample of the population and compute the sample mean; the sample mean is then used to estimate the population mean. This is a point estimate.

The two main terms in point estimation are the estimator and the estimate. The estimator is the function of the sample used to compute the estimate, e.g., the formula that calculates the sample mean; the realized value of the estimator is the estimate. There are four common ways to find estimates; the first is the method of moments.

Estimation

Estimation is the process used to make inferences, from a sample, about an unknown population parameter, e.g., the height of students in a class or the pass percentage over the past five years. There are two types of estimates:
- Point estimates: when the estimate is a single number, it is called a point estimate.
- Confidence interval estimates: when the estimate is a range of scores, it is called an interval estimate.

The best-known method of the second kind is interval estimation: instead of committing to a single value, you build a range of values in which the population parameter (for example, the population mean) is expected to lie. The interval estimate is more robust because you have not predicted a single point but an interval within which the value is likely to fall. For example, saying "it takes 30 minutes to reach the school" is point estimation, but saying "it will take between 50 minutes and an hour to reach the school" is interval estimation.

Estimation gives rise to two important statistical terms: the confidence interval and the margin of error.
- The confidence interval is the measure of your confidence that the estimated interval contains the population parameter (the population mean or another parameter). Statisticians use the confidence interval to describe the amount of uncertainty associated with a sample estimate of a population parameter.
- The margin of error is the greatest possible distance between the point estimate and the value of the parameter being estimated, for a given level of confidence; it is the allowed deviation from the actual point estimate. It is calculated as

E = z_c · σ / √n

where z_c is the critical value for the chosen confidence level, σ is the standard deviation, and n is the sample size.

The steps involved in constructing a confidence interval are as follows:
1. Identify a sample statistic that will be used to estimate the population parameter, for example the sample mean.
2. Select a confidence level; the confidence level describes the uncertainty of the sampling method.
3. Find the margin of error.
4. Specify the confidence interval: a confidence interval (CI) is an interval of numbers believed to contain the parameter value.

If the confidence level is selected as 95%, the significance level is 100% − 95% = 5%, and the corresponding critical value is z_c = 1.96.

Exercises (worked in the sketch below):
1. A sample of 97 females has a standard deviation of 64.9 and a mean cholesterol level of 261.75. Calculate the confidence interval for the mean cholesterol level of the female population at a 95% confidence level.
2. A tree bears hundreds of apples; 46 apples are randomly chosen, and the sample mean and standard deviation are found to be 86 and 6.2 respectively. Determine whether the apples are big enough.
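A worked sketch of the two exercises using the margin-of-error formula above; the text gives no size threshold for the apples, so the sketch only computes the 95% confidence interval for their mean size:

```python
from math import sqrt

def confidence_interval(mean, sd, n, z=1.96):
    """95% CI: mean ± z · sd / sqrt(n), per the formula above."""
    margin = z * sd / sqrt(n)
    return round(margin, 2), (round(mean - margin, 2), round(mean + margin, 2))

# Exercise 1: cholesterol (n = 97, sd = 64.9, mean = 261.75)
print(confidence_interval(261.75, 64.9, 97))
# margin ≈ 12.92, CI ≈ (248.83, 274.67)

# Exercise 2: apples (n = 46, sd = 6.2, mean = 86)
print(confidence_interval(86, 6.2, 46))
# margin ≈ 1.79, CI ≈ (84.21, 87.79)
```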