LBOLYTIC NOTES.docx
**INTRODUCTION TO STATISTICS**

**STATISTICS**

- A science that deals with the collection, organization, presentation, analysis, and interpretation of data

**PURPOSE OF STATISTICS**

- To provide information
- To provide comparisons
- To help discern relationships
- To aid in decision making
- To estimate unknown quantities
- To justify claims or assertions
- To predict future outcomes

**BRANCHES OF STATISTICS**

**DESCRIPTIVE STATISTICS** - Consists of methods concerned with the collection, organization, summarization, and presentation of a set of data

**INFERENTIAL STATISTICS** - Consists of methods concerned with making predictions or inferences about an entire population based on information provided by a sample

**POPULATION & SAMPLE**

**Population** - The totality of all the elements or entities about which you want to obtain information

**Sample** - A subset of the population

**CENSUS & SURVEY**

**Census** - the process of collecting information from the whole population

**Survey** - the process of collecting information from a sample

**PARAMETER & STATISTIC**

**Parameter** - a summary or numerical measure used to describe a population

**Statistic** - a summary or numerical measure used to describe a sample

**CONSTANT** - a characteristic or property of a population or sample that is the same for all members, making them similar to each other

**VARIABLES** - any characteristic or information that is measurable or observable on every element of the population or sample

**QUALITATIVE** (Categorical Variables) - variables that indicate **what kind** of a given characteristic an individual, object, or event possesses
**QUANTITATIVE** (Numerical Variables) - variables that indicate **how much** of a given characteristic an individual, object, or event possesses

**TYPES OF QUANTITATIVE VARIABLES**

**Discrete Variables** - variables whose values are obtained through the process of counting

**Continuous Variables** - variables whose values are obtained through the process of measuring

**Dependent** - a variable which is affected by another variable

**Independent** - a variable which affects the dependent variable

**SCALES OF MEASUREMENT OF VARIABLES**

**Nominal**

- Variables whose values are simply labels, names, or categories without any explicit or implicit ordering of the labels
- Lowest level of measurement; also known as the categorical scale

**Ordinal**

- Variables whose values are labels, names, or categories with an implied ordering
- Ranking can be done on the data
- The distance between two labels cannot be determined

**Interval**

- Variables whose values can be ordered and for which the distance between any two values is of known size
- Always numeric, with no true zero point

**Ratio**

- Variables whose values have all the properties of the interval scale and for which the ratio of two values is meaningful
- Has a true zero point
- Highest level of measurement

**DATA PRESENTATION**

**TEXTUAL** - Data are presented in paragraph form. This involves enumerating important characteristics, emphasizing significant figures, and identifying the important features of the data.

**TABULAR** - Data are presented in tables.

**FREQUENCY DISTRIBUTION TABLE** - A tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.

**STEPS IN CONSTRUCTING AN FDT**

![](media/image2.png)

**Class Size / Class Width** - The difference between the upper (or lower) class limits of consecutive classes. All classes should have the same class width.

**Lower Class Limit** - The least value that can belong to a class.
**Upper Class Limit** - The greatest value that can belong to a class.

**Class Boundaries (CB)** - The numbers that separate classes without forming gaps between them.

**Class Mark / Midpoint (CM)** - The middle value of each class. To find the class midpoint, average the lower and upper class limits.

**Relative Frequency (RF)** - Obtained by dividing the frequency of the given class by the total number of observations.

**PRESENTATION OF DATA (GRAPHICAL)**

Types of Graphs:

1. Pie chart / circle graph - any data; the most popular type
2. Bar graph
   - Bar chart (with gaps between bars) - discrete data
   - Histogram (no gaps between bars) - continuous data
3. Line graph
   - Frequency polygon - continuous data
   - Ogive - Base: class interval; Height: cumulative frequency

**SAMPLING TECHNIQUES**

**Population** - a set which includes all measurements of interest to the researcher

**Sample** - a subset of the population

Why Sampling?

- It is often impractical or impossible to study the whole population

**TYPES OF SAMPLING**

**Probability sampling** - Each member of the population is given an equal (or at least known) chance of being included in the sample.

**Non-probability sampling** - Members of the population do not have an equal or known chance of being included in the sample.

**PROBABILITY VS NON-PROBABILITY**

**Probability sampling**

1. You have a complete sampling frame
2. You can select a random sample from your population
3. You can generalize your results from a random sample
4. Can be more expensive and time-consuming

**Non-probability sampling**

1. Used when there isn't an exhaustive population list available
2. Not random
3. Can be effective when trying to generate ideas and get feedback
4. More convenient and less costly

**NON-PROBABILITY SAMPLING**

**Convenience Sampling** - The researcher uses subjects that are readily available or includes only people who are easy to reach.
**Purposive Sampling** - The researcher looks for predefined groups that will serve as samples.

**PROBABILITY SAMPLING**

**Simple Random Sampling (SRS)** - All members of the population have an equal chance of being included in the sample.

**Stratified Sampling** - This technique is used when the population can be subdivided into several smaller groups (strata); SRS is then applied to get samples from each stratum.

**Cluster Sampling** - This technique randomly chooses clusters (groups) instead of individuals.

**Systematic Sampling** - Selects every *k*th member of the population, with the starting point determined at random.

**SAMPLE SIZE (n)** - In research, a larger sample generally gives more reliable results: the opinion of 1,000 people is usually a better guide than the opinion of 100 people, provided both samples are selected in the same sound way.

![](media/image4.png)

![](media/image6.png)

**HYPOTHESIS TESTING**

**Hypothesis**

- an assumption about the population parameter
- an educated guess about the population parameter

**Hypothesis Testing:** The process of making an inference or generalization about population parameters based on the results of a study on samples.

**Statistical Hypothesis:** A guess or prediction made by the researcher regarding the possible outcome of the study.

TYPES OF STATISTICAL HYPOTHESIS

**Null Hypothesis (Ho):**

- The hypothesis the researcher usually hopes to reject
- Always contains the "=" sign

**Alternative Hypothesis (Ha):**

- Challenges Ho
- Never contains the "=" sign
- Uses "<", ">", or "≠"
- Generally represents the idea which the researcher wants to prove

Level of Significance, **α**, and the Rejection Region

**α = 0.05** means the researcher accepts a 5% risk of rejecting Ho when it is actually true (equivalently, a 95% confidence level).

**α = 0.01** means the researcher accepts a 1% risk of rejecting Ho when it is actually true (equivalently, a 99% confidence level).

**TYPES OF HYPOTHESIS TESTS**

1. **One-tailed left directional test:** used if Ha uses the "<" symbol
2. **One-tailed right directional test:** used if Ha uses the ">" symbol
3.
   **Two-tailed test:** non-directional; used if Ha uses the "≠" symbol

![](media/image8.png)

![](media/image10.png)

**CRITERION** (Zc is the computed z value; Zt is the tabular, or critical, value, taken here as a positive number)

**One-tailed test (right directional):** "Reject Ho if Zc ≥ Zt"

**One-tailed test (left directional):** "Reject Ho if Zc ≤ -Zt"

**Two-tailed test (both sides):** "Reject Ho if Zc ≥ Zt or Zc ≤ -Zt", that is, if |Zc| ≥ Zt

![](media/image12.png)

EXAMPLES:

**F-TEST (ANOVA)**

- The F-test is a parametric test used to compare the means of two or more groups of independent samples. It is also known as the Analysis of Variance (ANOVA).
- Three kinds of analysis of variance:
  - One-way analysis of variance - only 1 variable (factor) involved
  - Two-way analysis of variance - 2 variables involved, the column and the row variables; used to know if there are significant differences between and among columns and rows
  - Three-way analysis of variance - 3 variables involved

**Why use the F-test?**

- To find out if there is a significant difference between and among the means of two or more independent groups.

**When to use the F-test?**

- When the data are normally distributed and the level of measurement is interval or ratio (as with the t-test and z-test)

![](media/image14.png)

Compute the following to construct the ANOVA table:

1. **TSS** - the total sum of squares minus CF, the correction factor
2. **BSS** - the between sum of squares minus the CF
3.
   **WSS** - the within sum of squares, which is the difference TSS minus BSS

- The **Mean Squares Between (MSB)** is equal to *BSS/df*, using the between-groups *df*
- The **Mean Squares Within (MSW)** is equal to *WSS/df*, using the within-groups *df*
- To get the **F-computed value**, divide *MSB* by *MSW*
- The F-computed value must be compared with the F-tabular value at a given level of significance, with the corresponding *df*s of BSS and WSS

If **F-computed value > F-tabular value**, reject (*disconfirm*) the null hypothesis in favor of the research hypothesis. This means there is a significant difference between and among the means of the different groups.

**PEARSON R CORRELATION**

- An index of the linear relationship between two variables
- x = independent variable, y = dependent variable
- The value of *r* ranges from -1 through 0 to +1. If *r = +1 or -1, there is a perfect correlation*; if *r = 0, there is no linear correlation between x and y.*
- If the trend of the line graph is **going upward**, the value of *r* is positive. This indicates that as the value of *x* increases the value of *y* also increases, *x* and *y* being positively correlated.
- If the trend of the line graph is **going downward**, the value of *r* is negative. This indicates that as the value of *x* increases the corresponding value of *y* decreases, *x* and *y* being negatively correlated.
- If the trend of the **line graph cannot be established as either upward or downward**, then *r = 0*, indicating that there is no correlation between the *x* and *y* variables.

**Why Use Pearson r?**

- To analyze whether a relationship exists between two variables. *If a relationship exists between x and y, we can determine the extent to which x accounts for y using the **coefficient of determination**, which is equal to the square of r multiplied by 100%*.
  This tells us how much of the variation in the dependent variable is explained by the independent variable, or how much *y* depends on *x*. It expresses the degree of relationship between *x* and *y*, which cannot be seen in other statistical tests of relationship.
- It is a more powerful test of relationship than nonparametric tests.

**When do we use r, the Pearson Product Moment Coefficient of Correlation?**

- **The value of r ranges from +1 through zero to -1. There is a perfect positive correlation if r = +1; likewise there is a perfect negative correlation if r = -1. However, if r = 0 then there is no correlation between the two variables x and y.**
- **Positive correlation: as x increases y also increases, or vice versa**
- **Negative correlation: as x decreases y increases, or vice versa**

![](media/image16.png)

**SIMPLE LINEAR REGRESSION ANALYSIS**

- Predicts the value of *y* given the value of *x*.

**WHEN TO USE?**

- When there is a relationship between the *x* and *y* variables.
- The data should be normally distributed, with the level of measurement expressed as interval or ratio data.

**WHY USE?**

- We are interested in predicting the value of *y*, the dependent variable. This is used for forecasting and prediction.

![](media/image18.png)

**MULTIPLE REGRESSION ANALYSIS**

Multiple Regression Analysis (MRA) is used to predict the dependent variable *y* given the independent variables x~s~. Aside from predictions, we can also see the relationship between the dependent variable and the different independent variables.

**WHEN DO WE USE MRA?**

- We use MRA when predicting the dependent variable *y* with 2 or more independent variables x~s~.
- We want to know if a relationship exists between the dependent variable and the independent variables.
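The regression formulas in these notes live in images that are not reproduced here, so the following is only a minimal sketch of ordinary least squares, assuming the usual normal-equations approach. It handles one predictor (simple linear regression) and several predictors (MRA) with the same code. The function names (`solve`, `fit`, `predict`, `r_squared`) and all data values are hypothetical, chosen for illustration.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Predicted y for one observation (a tuple of x values)."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def r_squared(coefs, rows, ys):
    """Coefficient of determination: share of the variation in y explained by the model."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coefs, r)) ** 2 for r, y in zip(rows, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, one predictor: x = hours studied, y = quiz score.
hours = [(1,), (2,), (3,), (4,), (5,)]
scores = [2, 4, 5, 4, 5]
simple = fit(hours, scores)
print("simple regression coefficients:", simple)
print("r squared x 100%:", 100 * r_squared(simple, hours, scores))

# Hypothetical data, two predictors: x1 = hours studied, x2 = hours slept.
study_sleep = [(1, 6), (2, 8), (3, 5), (4, 7), (5, 8)]
multi = fit(study_sleep, scores)
print("multiple regression coefficients:", multi)
print("predicted score for (3, 7):", predict(multi, (3, 7)))
```

With a single predictor, the value returned by `r_squared` equals the square of Pearson's r, which is why the notes describe the coefficient of determination as r squared multiplied by 100%.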
**Why do we use MRA?**

MRA is used because we want to know the extent of influence that the independent variables have on the dependent variable, given by the coefficient of determination (r^2 x 100%), and whether the correlation is positive or negative (+/-).

**CHI-SQUARE TEST**

- This is a test of the difference between the observed and expected frequencies. The chi-square is considered a unique test due to its 3 functions, which are as follows:
  - Test of goodness-of-fit
  - Test of homogeneity (uniformity)
  - Test of independence (association)

**Test of Goodness of Fit**

![](media/image20.png)

**Test of Homogeneity (Uniformity)**

This test is concerned with two or more samples, with only one criterion variable. It is used to determine if two or more populations are homogeneous.

**When to use:** The test of homogeneity is used when we test for a significant difference between 2 or more groups.

![](media/image22.png)

**Test of Independence (Association)**

The test of independence is different from the test of homogeneity. The sample used in this test consists of members randomly drawn from the same population. This test is used to look into whether the measures taken on two criterion variables are independent of or associated with one another in a given population, using such variables as level of education and income, performance in class and IQ, etc.

![](media/image24.png)

![](media/image26.png)
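The chi-square formulas above survive only as image placeholders, so here is a minimal sketch that assumes the standard statistic: the sum over all cells of (observed - expected)^2 / expected. The function names and all counts are hypothetical. The critical values 3.841 (df = 1) and 11.070 (df = 5) are the usual tabular values at α = 0.05.

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: compare observed counts with expected counts, df = categories - 1."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


def chi_square_independence(table):
    """Independence (or homogeneity) test on a table of observed counts.

    Expected count for a cell = row total * column total / grand total.
    df = (rows - 1) * (columns - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical goodness-of-fit data: 60 rolls of a die, 10 expected per face.
rolls = [8, 12, 10, 9, 11, 10]
gof_chi2, gof_df = chi_square_gof(rolls, [10] * 6)
print(gof_chi2, gof_df, "reject Ho" if gof_chi2 > 11.070 else "do not reject Ho")

# Hypothetical 2 x 2 table: rows = two education levels, columns = two income groups.
table = [[30, 10], [20, 40]]
chi2, df, expected = chi_square_independence(table)
print(round(chi2, 3), df, "reject Ho" if chi2 > 3.841 else "do not reject Ho")
```

In the die example the computed value (1.0) is well below the tabular value, so Ho is not rejected. In the 2 x 2 example the computed value (about 16.667) exceeds 3.841, so Ho is rejected and the two variables are taken to be associated.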