Stat 111 Reviewer PDF
Summary
This document provides an introduction to sampling methods, covering probability sampling (e.g., simple random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling) and non-probability sampling (e.g., haphazard/convenience sampling, judgment/purposive sampling, quota sampling). It also discusses how data is presented.
Full Transcript
**INTRODUCTION TO SAMPLING**

**\*Sample Survey** is a method of systematically gathering information on a segment of the population, such as individuals, families, wildlife, farms, business firms, and unions of workers, for the purpose of inferring quantitative descriptors of the attributes of the population. The fraction of the population being studied is called a sample.

**\*Probability Sampling** If data are to be used to make decisions about a population, then how the data are collected is critical. For sample data to provide reliable information about a population of interest, the sample must be representative of that population. Selecting the sample by chance helps make it representative.

**Probability Sampling** is a method of selecting a sample wherein each element in the population has a known, nonzero chance of being included in the sample; otherwise, it is non-probability sampling.

**Note: *Probability samples are meant to ensure that the segment taken is representative of the entire population.***

**Basic Types:**
**1.** Simple Random Sampling
**2.** Stratified Sampling
**3.** Systematic Sampling
**4.** Cluster Sampling
**5.** Multistage Sampling

**1. Simple Random Sampling** a probability sampling method wherein all possible subsets consisting of ***n*** elements selected from the ***N*** elements of the population have the same chance of selection.

**Types of SRS:**
**a.** *Simple Random Sampling Without Replacement (SRSWOR)*: a member of the population cannot be selected more than once.
**b.** *Simple Random Sampling With Replacement (SRSWR)*: a member of the population can be selected more than once.

**2. Stratified Sampling** a probability sampling method wherein we divide the population into non-overlapping sub-populations called strata, and then select one sample from each stratum. The sample consists of all the samples in the different strata.

**3. Systematic Sampling** a probability sampling method wherein the selection of the first element is at random and the selection of the other elements in the sample is systematic, by subsequently taking every ***k***th element from the random start, where ***k*** is the sampling interval.

**4. Cluster Sampling** a probability sampling method wherein we divide the population into non-overlapping groups called clusters, consisting of one or more elements, and then select a sample of cluster(s). The sample consists of all the elements in the selected cluster(s).

**5. Multistage Sampling** a probability sampling method wherein the sample is selected in two or more stages, for example by first selecting clusters and then sampling elements within each selected cluster. It is a valuable tool for researchers and statisticians who need to sample large, geographically dispersed populations.

**\*Non-probability Sampling** is a method where the selection of sample units is not based on a known probability. This means that some elements of the population have an unknown, or possibly zero, chance of being selected.

**Basic Types:**
**1.** Haphazard/Convenience Sampling
**2.** Judgment/Purposive Sampling
**3.** Quota Sampling

**1. Haphazard/Convenience Sampling** This involves selecting individuals who are readily available or easy to access.

**2. Judgment/Purposive Sampling** This involves selecting individuals based on the researcher's judgment or belief that they are representative of the population.

**3. Quota Sampling** This involves setting quotas for different categories of individuals (e.g., age, gender, race) and then selecting individuals to meet those quotas.

**PRESENTATION OF DATA**

After data collection and analysis, a researcher needs to present the results of his or her study based on the specific research objectives attained.
The challenge here is how to present these results in a way that facilitates understanding and gives even those who are unfamiliar with the data insight into what the researcher has done and wants to impart.

**\*Textual presentation** of data incorporates important figures in a paragraph of text. It suits a limited amount of information that we want our readers to know and be interested in. Through a series of statements, we rely on the power of words to share vital information.

*In a textual presentation we perform the following tasks:*
1\. We choose not to present all the available figures but only the important ones.
2\. We impart the meaning or implications of the selected important figures/summary statistics.
3\. We facilitate understanding of data presented in tables and charts by clarifying and/or emphasizing important results and pointing out the implications.

***Important Notes:***
1\. Do not start a sentence with a figure.
*Incorrect: 50 households joined the project.*
*Correct: Fifty households joined the project.*
2\. Spell out figures from one to nine. Write figures 10 and above as Arabic numerals.
*The community had more than 10 COVID-19 cases reported but only three of them were hospitalized.*
3\. Use figures for two or more numbers which are put together or juxtaposed, as well as for numbers in a series.
*The vote on the motion was 91 to 9.*
*Among the 30 respondents, 15 agree, 12 disagree, and 3 cannot decide.*

**\*Tabular presentation** of data arranges figures in a systematic manner in rows and columns. We use tables not only to describe the data at hand but also to compare and even to show relationships between two or more variables or characteristics of interest. Our tables can contain frequency distributions, proportions, percentages, and other summary measures such as totals and averages.

Tables should be simple and easy to understand. In particular, we arrange our figures or summary measures in the rows and columns of our table, which must have appropriate labels. We should decide on the most appropriate summary measures to include in the table.

**Basic Types:**
**1.** Leader Work
**2.** Text Tabulation
**3.** Formal Statistical Table

**1. Leader Work** has the simplest layout, as it contains no table title or column headings and has no table borders. This type of table is incorporated within a paragraph of text by presenting one or two columns of figures as supporting data in the textual presentation.

**2. Text Tabulation** In contrast to the leader work, the text tabulation already has column headings and table borders, making it easier to understand than the leader work. However, like the leader work, it still has no table title or table number. Thus, we still need to introduce this table to our readers to ensure that they can fully comprehend the information supplied.

**3. Formal Statistical Table** The parts of a table that are lacking in the leader work and text tabulation are present in the formal statistical table. It has all the essential parts of a table. Since it is a stand-alone table, it can be easily understood by the reader even without a descriptive text or introductory statement.

**Heading** consists of the table number, title, and head note. It is located on top of the table of figures.
***Table number*** is the number that identifies the position of the table in a sequence.
***Table title*** states in telegraphic form the subject (what), data classification (how classified), place (where), and period covered (when) by the figures in the table.
***Head note*** appears below the title but above the top cross rule of the table and provides additional information about the table.

**Box head** consists of spanner heads and column heads.
***Spanner head*** is a caption or label describing two or more column heads.
***Column head*** is a label that describes the figures in a column.
***Panel*** is a set of column heads under the same spanner head.

**Stub** consists of the row captions, center head, and stub head. It is located at the left side of the table.
***Row caption*** is a label that describes the figures in a row.
***Center head*** is a label describing a set of row captions.
***Stub head*** is a caption or label that describes all of the center heads and row captions. It is located at the first row.
***Block*** is a set of row captions under the same center head.

***Field*** is the collection of figures in the table.
***Line*** is a row of figures.
***Column*** is a column of figures.
***Cell*** contains the figure in the intersection of a row caption and a column heading.

***Footnote*** is a descriptive statement qualifying or explaining the information presented in, or omitted from, specific cells, columns, or lines. It is located at the bottom of the table.
***Specific footnote*** is a "*keyed*" statement which qualifies, describes, or explains the information presented in a specific cell, line, or column.
***General footnote*** is a statement which qualifies the table as a whole; it is introduced by the word *"Note"* followed by a colon.
***Source note*** gives the name of the agency/institution/entity that collected the data and is also located at the bottom of the table.

**Classification**
***Quantitative*** classification is used to compare groups formed through counting or measuring.
***Qualitative*** classification is used to compare the summarized data in the different categorical labels of a qualitative variable.
***Chronological*** classification is used to discover trends over time.
***Geographical*** classification is used to compare the summarized data in the different locations, places, or other geographic subdivisions.
**\*Graphical presentation** of data portrays numerical figures or relationships among variables in pictorial form. Graphs can capture the important features of a data set at a glance, as long as the chart chosen is the most appropriate visual presentation for the data.

**Basic Types:**
**1.** Line Chart
**2.** Column Chart
**3.** Horizontal Bar Chart
**4.** Pie Chart
**5.** Pictograph
**6.** Statistical Map

**Uses:**
**1.** At the outset, we can use it to get the attention of our readers, especially those who are scared of numbers.
**2.** It can exhibit possible associations among the variables, facilitate the comparison of different groups, and reveal trends over time.
**3.** It can be used to support the conclusions that we make in our study.
**4.** It can be used to influence others to follow the recommendations that we make in our study.

**1. Line Chart** The line chart is known to be the oldest, simplest, most familiar, and most widely used among the statistical charts. It uses the first quadrant of the coordinate system for the data presentation. We place the quantitative variable of interest on the vertical axis and the time unit on the horizontal axis. The line chart is primarily the choice for emphasizing movement rather than actual amount.

**Types:**
1\. The ***simple line chart*** is used to show the movement of a time series (i.e., data collected at regular intervals of time) in a given time period.
2\. The ***multiple line chart*** is used to compare two or more time series on the same chart.

**Good Practices**
**1.** If the time unit is in months, the grouping of months may be by calendar or fiscal years.
**2.** We position the scale figures between the ticks or grid lines on the horizontal axis. That is, we plot the point at the middle of the space of the unit of time.
**3.** The ratio of the height of the vertical axis to the length of the horizontal axis should be 2:3 or 3:4 to convey an accurate picture.

**2. Column Chart** The column chart or vertical bar chart is primarily used to compare numerical values of a given variable over a period of time. These values, either absolute or percent, are represented by the height of the column. Thus, the column chart emphasizes the differences in magnitude rather than the movement of the values across time.

**Guidelines in constructing a column chart**
1\. When we have time series data, we arrange the columns in chronological order, starting with the earliest date, along the horizontal axis. We position each column directly above the time label it represents. The height of the column corresponds to the value of the variable at that time label.
2\. We need to ensure that our columns are just right: not too wide or too narrow.
3\. Like the line chart, the vertical scale should always start with zero. However, this time, it will not be possible to put a break on the vertical scale.
4\. We can use horizontal grid lines to aid in the reading of the heights of the columns.
5\. For a single time series, we should use only one color for all the columns. For two or more time series, we can use different colors, shadings, or patterns for the different series.

***Simple Column Chart*** The simple column chart is used if there is only one time series/variable studied across time and our objective is to show the increase or decrease in the amount or number of the variable of interest in a historical perspective.

***Grouped Column Chart*** The grouped column chart is used if there are two (2) or more time series to be compared.

***Subdivided Column Chart*** This chart is used to show the component parts of a series of values, either a measure of total number/count or percentage. Thus, each column is subdivided into two or more parts, depending on the number of variables being measured across time.
***Net Deviation Column Chart*** The purpose of using this chart is to show increases and decreases, gains and losses, and positive and negative numbers of one time series over a period of time. Thus, the vertical axis will show positive, zero, and negative values of the variable of interest.

**3. Horizontal Bar Chart** The horizontal bar chart is the simplest form of chart comparing data at a specified point in time (e.g., a particular year, school year, academic year, month, quarter, semester, and the like), in contrast to the vertical bar or column chart that compares data at different points in time. It is especially suited for comparing qualitative categories (e.g., degree programs, learning modalities, diseases, etc.).

**Guidelines:**
1\. The bars should not be too wide, too narrow, too long, or too short.
2\. Arranging the bars according to length, either in decreasing or increasing order, helps in the comparison.
3\. We should always choose appropriate colors or patterns for the bars.
4\. We should always show the zero point or start with zero in the x-axis.
5\. We always provide a scale label in the x-axis, which should be short and easy to understand.
6\. We use vertical grid lines for horizontal bar charts since the scale figures are in the x-axis.
7\. We place the legend at the right side of the chart, center part, to guide the reader in identifying the subgroups being compared in each group/category.

**4. Pie Chart** The pie chart is a circular diagram that is divided into sections to show the composition of a whole for a particular period. The size/area of each section indicates the proportion of the corresponding component to the total. Thus, a percentage distribution is presented that completely covers all the components of a whole. The sum of the percentages representing the sizes of the sections must then be 100%.

**5. Pictograph** The pictograph is a type of chart that makes use of picture symbols or images to convey the meaning of statistical information. The symbols or images that are used are pertinent to the data being presented. Thus, it is one of the easiest ways to represent statistical data.

**6. Statistical Map** The statistical map is used to present geographical statistics. That is, this type of chart shows statistical data in geographic areas (e.g., barangays, cities, districts, municipalities, provinces, and countries) represented through their actual maps. This choice of chart is made whenever location or geographic distribution is of prime importance to fully comprehend and appreciate the significance of the data.

**Types**
***1. Shaded or Cross-Hatched Map*** This type of map makes use of shading patterns that indicate the degree/extent of magnitude of the variable of interest in the areas covered.
***2. Dot Map*** This type of map gives either the location of an entity or the numerical measurement of a variable in a certain geographic area.

**FREQUENCY DISTRIBUTION**

**Raw Data**: collected data that have not been organized numerically.
**Array**: an arrangement of raw numerical data in descending or ascending order of magnitude.
**Frequency**: the number of times each particular characteristic occurs.
**Frequency distribution**: a tabular arrangement of data by classes together with the corresponding class frequencies.

**Two types of frequency distribution**
**1. Simple frequency distribution**: contains the observed values with their corresponding frequencies.
**2. Grouped frequency distribution**: the condensed version of the simple frequency distribution, where the values are grouped into class intervals.

**Terms:**
**1. Class interval:** ex: 60 − 62; 63 − 65; 66 − 68
**2. Class frequency:** ex: 5, 18, 42
**3. Class limits:** the lower limit (LL) and upper limit (UL)
**4. Class mark ($X_i$):** the midpoint of the class interval or class limits, i.e., $X_i = (LL + UL) \div 2$
**5. Class boundaries:** the true class limits; they remove the discontinuity between classes
**6. Class size/class width:** the difference between two successive lower limits or two successive upper limits, or between the upper and lower class boundaries of a class

Rules for defining class boundaries
a. For limits which are whole numbers: ±0.5, ex: $LL_{CB} = LL - 0.5$ and $UL_{CB} = UL + 0.5$
b. For limits with one decimal place: ±0.05
c. For limits with two decimal places: ±0.005

**Construction of a Frequency Distribution with Equal Class Sizes**

Steps in constructing a frequency distribution with equal class sizes (using Example 1):
**1.** Determine the range **(R)**: $R = HOV - LOV = 90 - 25 = 65$
**2.** Determine the number of classes **($k$)** using Sturges' approximation, where $n$ = no. of observations: $k = 1 + 3.3222 \log n = 1 + 3.3222 \log 30 = 1 + 3.3222(1.477) = 5.91 \approx 6$
**3.** Determine the class size **(C)**
**4.** Determine the Lower Limit (LL) and Upper Limit (UL) of the first class interval: $LL = LOV$; $UL = LL + C - 1$ for whole-number data, or $UL = LL + C - 0.1$ for data with one decimal place
**5.** Enumerate the class intervals.
**6.** Tally the observations to determine the class frequencies.

Relative Frequency and Relative Frequency Percentage
**Relative Frequency** is the class frequency divided by the total number of observations.
**Relative Frequency Percentage** is the relative frequency multiplied by 100.

Construction of the Less Than and Greater Than Cumulative Frequencies
The **Less Than Cumulative Frequency Distribution (<CFD)** shows the number of observations with values smaller than or equal to the upper class boundary.
The **Greater Than Cumulative Frequency Distribution (>CFD)** shows the number of observations with values larger than or equal to the lower class boundary.

**HISTOGRAM AND FREQUENCY CURVE**

Graphical Presentation of Frequency Distribution

5 general rules for constructing frequency charts
**1.** Label either class boundaries or class marks along the horizontal axis.
**2.** The horizontal scale need only include the range of the observed values and one extra interval at each end, if possible.
**3.** The vertical axis height should be approximately ¾ the length of the horizontal axis.
**4.** The vertical scale must always include zero.
**5.** Plot the frequency of each class along the vertical axis above the class mark of the corresponding class.

*Note: Demonstrate frequency histogram and polygon.*

**\*Frequency polygon**
▪ A graph formed by connecting the midpoints (class marks) of the class intervals using straight lines
▪ The endpoints are dropped to the X-axis to form a closed figure (polygon)
▪ The class marks are scaled on the X-axis and the frequencies on the Y-axis

**Standard Types of Distribution**
**1.** Symmetrical bell-shaped distribution
**2.** Positively skewed distribution
**3.** Negatively skewed distribution
**4.** J-shaped distribution
**5.** Reversed J-shaped distribution
**6.** U-shaped distribution

**SUMMATION NOTATION**

![](media/image2.png)

where:
$i$ = index indicating the placement of the value
1 = lower limit of the summation
$N$ = upper limit of the summation
$X_i$ = summands

Rules of Summation
**1.** $\sum X$ → indicates the sum of the $X$s; $\sum Y$ → indicates the sum of the $Y$s
**2.** $XY$ → expression representing the product of variables $X$ and $Y$; $\sum XY$ → sum the products of $X$ and $Y$
**3.** $X^2$ → squared value of a score; $\sum X^2$ → sum the squared values
**4.** $X + Y$ → two variables $X$ and $Y$ are added together; $\sum (X + Y)$ → sum the sums of $X$ and $Y$
**5.** When a constant $c$ is added to every value, it is necessary to use parentheses: $\sum (X + c)$
**6.** If every value is multiplied by a constant $c$, the sum is represented by the expression $\sum cX$
**7.** If a constant $c$ is to be added $n$ times, the expression is $\sum_{i=1}^{n} c = nc$
**8.** If $a$ and $b$ are constants, then ![](media/image4.png)
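
The summation rules above can be checked numerically. The sketch below is only an illustration: the lists `X` and `Y` and the constant `c` are hypothetical values, not data from this reviewer. Each variable corresponds to one of the expressions in rules 1 to 7, and the printed comparisons rely on standard identities such as $\sum (X + c) = \sum X + nc$ and $\sum cX = c \sum X$.

```python
# Hypothetical paired observations and constant, used only for illustration.
X = [2, 4, 6, 8]
Y = [1, 3, 5, 7]
c = 10
n = len(X)

sum_x = sum(X)                                    # rule 1: sum of the Xs
sum_y = sum(Y)                                    # rule 1: sum of the Ys
sum_xy = sum(x * y for x, y in zip(X, Y))         # rule 2: sum of the products XY
sum_x_sq = sum(x ** 2 for x in X)                 # rule 3: sum of the squared values
sq_of_sum = sum_x ** 2                            # square of the sum, a different quantity
sum_x_plus_y = sum(x + y for x, y in zip(X, Y))   # rule 4: sum of (X + Y)
sum_x_plus_c = sum(x + c for x in X)              # rule 5: sum of (X + c)
sum_cx = sum(c * x for x in X)                    # rule 6: sum of cX
sum_c = sum(c for _ in range(n))                  # rule 7: c added n times

print(sum_x_plus_y == sum_x + sum_y)   # sum of sums equals the sum of the two totals
print(sum_x_plus_c == sum_x + n * c)   # adding c to every value adds n*c to the total
print(sum_cx == c * sum_x)             # a constant factor can be taken outside the sum
print(sum_c == n * c)                  # a constant summed n times equals n*c
print(sum_x_sq, sq_of_sum)             # sum of squares vs. square of the sum
print(sum_xy, sum_x * sum_y)           # sum of products vs. product of sums
```

The last two lines print different numbers in each pair, which is why the order of squaring (or multiplying) and summing matters when reading an expression such as $\sum X^2$ or $\sum XY$.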
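
Returning to the frequency distribution section, the construction steps (range, Sturges' approximation, class size, class limits, tally) together with the class marks, class boundaries, relative frequencies, and cumulative frequencies can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the reviewer's own worked example: the `scores` list is hypothetical (the data of Example 1 are not reproduced in the text; the values were chosen only so that $n = 30$, LOV = 25, and HOV = 90), the function name `grouped_frequency` is invented for illustration, and the class size is assumed to be $R / k$ rounded up to a whole number.

```python
import math

def grouped_frequency(data):
    """Grouped frequency distribution for whole-number data (illustrative sketch)."""
    n = len(data)
    lov, hov = min(data), max(data)
    r = hov - lov                                # Step 1: range R = HOV - LOV
    k = round(1 + 3.3222 * math.log10(n))        # Step 2: Sturges' approximation
    c = max(1, math.ceil(r / k))                 # Step 3: class size (assumption: R / k rounded up)

    rows = []
    ll = lov                                     # Step 4: first lower limit is the LOV
    while ll <= hov:                             # Step 5: enumerate the class intervals
        ul = ll + c - 1                          # UL = LL + C - 1 for whole numbers
        f = sum(1 for x in data if ll <= x <= ul)    # Step 6: tally the class frequency
        rows.append({
            "limits": (ll, ul),
            "boundaries": (ll - 0.5, ul + 0.5),  # whole-number limits: +/- 0.5
            "mark": (ll + ul) / 2,               # class mark = (LL + UL) / 2
            "f": f,
            "rf": f / n,                         # relative frequency
        })
        ll += c

    running = 0
    for row in rows:                             # less-than cumulative frequency
        running += row["f"]
        row["less_cf"] = running
    running = 0
    for row in reversed(rows):                   # greater-than cumulative frequency
        running += row["f"]
        row["greater_cf"] = running
    return k, c, rows

# Hypothetical scores (n = 30, LOV = 25, HOV = 90).
scores = [25, 31, 34, 38, 41, 44, 45, 47, 49, 52,
          53, 55, 56, 58, 60, 61, 62, 64, 65, 67,
          69, 70, 72, 74, 76, 78, 81, 84, 87, 90]

k, c, table = grouped_frequency(scores)
for row in table:
    print(row["limits"], row["mark"], row["f"],
          round(100 * row["rf"], 1), row["less_cf"], row["greater_cf"])
```

With other data the number of intervals actually produced can differ slightly from $k$, because the loop keeps adding classes until the highest observed value is covered.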