6th Year Maths Statistics 1 Notes PDF

Summary

These are class notes for a 6th-year maths class on statistics, covering various topics, including data types, collecting data, and sampling techniques.

Full Transcript


6^th^ Year Maths Statistics 1

2024/2025 Ms Thorp

Student Name: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

**Statistics**

1. **Types of Data**

**(b) Categorical Data**

This is data which fits into a group or category. It is generally described in words, such as colour, favourite sport, country of birth etc. It does not have numbers. Categorical data can be either nominal or ordinal.

![](media/image3.png)

\(c) **Numerical Data**

This is data that can be counted or measured. It can be represented with a number. Numerical data can be either discrete or continuous.

\(d) **Univariate and Bivariate Data**

Univariate Data: This is when one item of data is collected on its own. For example,

- hours studied per week
- distance from school
- height

Bivariate Data: This is data that contains 2 items of information. For example,

- hours studied per week and marks in exam
- height and age
- age of car and its price

2. **Collecting Data**

\(a) Data can be collected in different ways:

Primary data is collected by the person going to use it (survey, experiment etc).

Secondary data is collected by somebody else (from the internet, census, published survey etc).

![](media/image5.png)

\(b) Methods of collecting data:

Method 1: Experiment - carry out a scientific experiment to determine the effect of something.

Method 2: Observation - monitor the behaviour of things like people, traffic, patterns in nature.

Method 3: Questionnaire - an interview in which people are asked questions. This can be done in person, online or over the phone.
- A carefully designed list of questions is used to gather information and opinions from people.
- It is important that the questions:
    - are clear about what information they want
    - are short and simple
    - provide response boxes where possible, since we don't want 1000 different answers from 1000 different people
    - avoid leading questions
    - avoid personal questions

![A questionnaire with a question mark](media/image7.png)

\(c) Control Group

If we wish to investigate a new drug, we will test its effect on people. So, we take a sample of people and divide them in two. Half the sample are given the drug and the other half are not. This second half are given an inactive drug, or placebo, i.e. they think they are taking the real drug. We can then see what effect the drug is really having by comparing the two groups. The group that are not given the drug are called the control group.

\(d) Designed Experiments

Examples:

- tossing a coin three times
- throwing a dice five times
- recording the side effects of a new drug

In these experiments we have

1. a control variable (or explanatory variable) - this variable will be controlled
2. a response variable - this is the variable we will use to measure the effects of the experiment

3. **Sampling**

\(a) Definitions:

**Sampling** is a way of getting information on the whole population. We don't usually want to survey everyone in the whole population, so we just survey a "sample".

**Population -** the entire group that we are interested in

**Census -** a survey of the entire population

**Sample -** a smaller group selected from the population

**Survey -** collecting information from our sample group

Example: I would like to do a survey on the opinions of all the students in the school on the new mobile phone pouches. I pick 50 students at random to ask.
- Population = 800 students
- Sample = 50 students chosen

![A diagram of people in a circle](media/image9.png)

\(b) Bias in Sampling

It is important that the sample is picked fairly and randomly. We do not want a biased sample. There are a number of ways a sample can be biased:

1. The sample may not represent the population, for example:
    - if you conduct a survey on religion outside a church
    - if you conduct a survey about fitness outside a gym
    - if you wanted opinions on the school canteen and only asked first years
2. Failure to respond to a survey, for example:
    - a number of people may not respond to your survey and this may skew your results; people who are very busy may not answer the survey
3. Dishonest answers to questions:
    - you must try to design questions that will encourage people to give an honest answer

**(c)** Types of Sampling: How can we pick the sample?

1. **Random Sampling**: Each person has an equal chance of being selected. This can be done by picking names out of a hat, or by assigning a number to each person and randomly picking numbers.
2. **Systematic Sampling**: Each person is given a number. After a random starting point, a fixed, periodic interval is used to select the sample, such as selecting every 5th person in the list.
3. **Stratified Sampling**: The population is divided into subgroups (strata) based on a characteristic (e.g., age, sex, income), and samples are randomly taken from each subgroup, proportionally.
4. **Cluster Sampling**: The population is divided into clusters (groups), and some clusters are chosen randomly. Then all elements in the chosen clusters are sampled.
5. **Convenience Sampling**: Samples are chosen based on availability or ease of access, though this method can introduce bias. It is the lazy approach. You might pick the first 20 people you meet!
6. **Quota Sampling**: The population is divided into distinct groups based on characteristics the researcher is interested in.
   For example, men and women. The interviewer is told how many people are needed from each group and can decide for themselves who to select from within each quota. It does not involve randomization.

4. **Measure of Location (The 3 M's)**

a. **Mean (Average)**

Mean $= \frac{\sum x}{n}$, where

- ∑x = the sum of the numbers
- n = number of numbers

x      1   2   3
------ --- --- ---
f(x)   4   1   2

marks       1-5   6-10   11-15
----------- ----- ------ -------
frequency   4     8      3

b. **Mode**

c. **Median**

d. **Which Average to use....**

No. of letters   3   4   5   6   7
---------------- --- --- --- --- ---
Frequency        3   4   9   5   2

i. Find the mode

ii. Find the median

e. **Related sets of Data**

Data          Mean   Mode   Notice anything...
------------- ------ ------ --------------------
1,2,5,2,1,1

5. **Measure of Spread (Variability)**

1. Range

Range = Maximum Value − Minimum Value

Example 10: Find the range of the following data: 3, 7, 8, 12, 15.

Note: The range only looks at the maximum and minimum values!

2. Interquartile Range

- Q1 = First Quartile = value ¼ of the way into the data set = the $\frac{1}{4}(n + 1)^{\text{th}}$ value
- Q3 = Third Quartile = value ¾ of the way into the data set = the $\frac{3}{4}(n + 1)^{\text{th}}$ value
- Interquartile Range = Q3 − Q1

3. **Standard Deviation,** $\sigma$

a. Standard deviation shows the amount of variation there is from the mean.

n = number of data values

$\mu$ or $\overline{x}$ = mean

$\sigma = \sqrt{\frac{\sum (x - \overline{x})^{2}}{n}}$

$\sigma = \sqrt{\frac{\sum \text{differences}^{2}}{\text{number of numbers}}}$

Example 13: Find the standard deviation of 2, 3, 4, 7 (without a calculator).

Now try with a calculator:

Data          Mean   Standard Deviation   Notice anything...
------------- ------ -------------------- --------------------
4,7,8,10,11

\(c) **Empirical Rule**

The **Empirical Rule** is a statistical rule that applies to normal distributions (bell-shaped curves). It states that for a normally distributed data set:

- **68%** of the data falls within **1 standard deviation** of the mean.
- **95%** of the data falls within **2 standard deviations** of the mean.
- **99.7%** of the data falls within **3 standard deviations** of the mean.

**(d)** $\sigma$ **for Frequency Distribution**

Example 16: Find the standard deviation for the following data

x      1   2   3   4   5   6
------ --- --- --- --- --- ---
f(x)   9   9   6   4   7   3

- This can also be done on the calculator, by turning on the frequency.
- Follow the instructions below and then try doing this example again, with a calculator.

![A screenshot of a computer](media/image14.png)

- If the frequency table is "grouped", use the mid-interval values.

6. **Stem and Leaf Diagrams (Stemplots)**

- Shows a lovely picture/shape of the distribution
- Shows the original data
- Good for small sets of data
- Can compare 2 data sets with a back-to-back stemplot
- Include a key to show how to read the stem and leaf combined
- Rewrite with the leaves in ascending order to complete the diagram

i. Draw a stem and leaf diagram for the following data: 58, 65, 40, 59, 68, 63, 81, 76, 63, 57

ii. Find the mode

iii. Find the median

iv. Find the range

A **back-to-back stem-and-leaf plot** is useful when comparing two sets of data. Let's say we have two sets of scores from two different classes:

Example 20:

Class A: 43, 46, 51, 55, 62, 63, 66, 70, 73, 75, 78, 80, 85

Class B: 41, 45, 52, 54, 61, 63, 65, 67, 71, 72, 74, 77, 81

Construct a stem and leaf plot, using a common stem!

Stem and Leaf Questions:

![A math problem with numbers](media/image112.png)

![A table with numbers and text](media/image114.png)

7. **Percentiles**

**Percentiles** tell you what percentage of the data falls below a certain value. For example, if a score is in the 75th percentile, it means 75% of the data is below that score. (It does not mean that you scored 75% in the test.) Percentiles are very useful when dealing with large data sets.

**Example:** In a test Michael scored **80**. The class had the following scores:

55, 60, 65, 70, 75, 80, 85, 90, 95, 100

Find Michael's percentile:

1. **Count** the number of scores below 80: There are **5** scores below 80 (55, 60, 65, 70, 75).
2. **Total number of scores**: There are 10 scores in total.
3. The formula for percentile is:

   $\text{Percentile} = \frac{\text{number of scores below}}{\text{total number of scores}} \times 100 = \frac{5}{10} \times 100 = 50$

So, if you scored 80, you are in the **50th percentile**, meaning you scored higher than 50% of the class.

8. **Z-Scores (Standard Scores)**

$z = \frac{x - \mu}{\sigma}$

x = score/value

$\mu$ = mean

$\sigma$ = standard deviation

Example 21:

Subject   Saoirse   Libby   $\mu$   $\sigma$
--------- --------- ------- ------- ----------
French    75        50      60      10
Science   65        40      50      5

Calculate the z-score for both students, in each subject. Which grade was the best overall?

Z-Score Questions

![A white paper with black text](media/image162.png)

![A screenshot of a paper](media/image164.png)

9. **Histograms**

Marks             0-20   20-40   40-60   60-80   80-100
----------------- ------ ------- ------- ------- --------
No. of Students   8      21      8       10      24

i. Draw a histogram

ii. What is the modal class?

iii. What interval does the median fall in?

iv. What is the value of the median mark?

10. **Shapes of Distributions**

b. Right-Skewed (Positively Skewed):

- Shape: Tail is longer on the right side.
- Most data points are clustered to the left, with a long tail on the right.
- The mean is greater than the median.
- Example:

c. Left-Skewed (Negatively Skewed):

- Shape: Tail is longer on the left side.
- Most data points are clustered to the right, with a long tail on the left.
- The mean is less than the median.
- Example:

d. Uniform Distribution

- Shape: Rectangular or flat.
- Every outcome has the same frequency.
- Example: Rolling a fair die, where each outcome (1 to 6) has equal probability.
- Key Features: No peak; constant probability.

e. Bimodal Distribution

- Shape: Two peaks.
- Data has two distinct modes (peaks) or groupings.
- A distribution can also be multi-modal.

f. Reverse J-Shape

g. Standard Deviation and shapes of distributions

11. **The Normal Distribution**

a. This distribution is the most widely used in statistics. Many natural phenomena follow a normal distribution, for example height, weight, blood pressure, and IQ scores.

- Can be called a bell-shaped curve
- Has an axis of symmetry
- Mean = mode = median

Draw 2 normal distributions on the axes below, with the same mean and different standard deviations.

Draw 2 normal distributions on the axes below, with different means and the same standard deviation.

b. Empirical Rule

- 68% of values in a normal distribution fall within $\pm 1\sigma$ of the mean
- 95% of values in a normal distribution fall within $\pm 2\sigma$ of the mean
- 99.7% of values in a normal distribution fall within $\pm 3\sigma$ of the mean

Example 23: 100 marks from a test are normally distributed, with a mean of 60 and standard deviation of 6.
Find

\(i) the % of marks between 48-72

\(ii) the % of marks between 60-72

\(iii) if 1000 people take the test, how many would get less than 54 marks

Questions on Normal Distribution:

![A diagram of normal distribution](media/image293.png)

![A white paper with black text](media/image295.png)

**12. The Standard Normal Distribution**

\(a) This is the same as the normal distribution, except we use Z-scores on the x-axis instead of $\mu$ and $\sigma$. A Z-score of 1 means you were 1$\sigma$ above the mean.

\(b) The area under a standard normal curve represents **probabilities** or **proportions** of data points within a certain range of values in a normally distributed data set. The area under the curve between two values represents the **probability** that a randomly selected data point will fall within that range.

- The area between -1 and 1 represents roughly 68% of the total area, since the probability of finding a data point within 1 standard deviation of the mean is about **68%** and we know Area = Frequency.
- The area under the entire curve = total frequency = 100%, so P(under the curve) = 1
- The shaded area here represents P(z ≤ 2)
- The shaded area here represents P(z ≤ 1.2)

Note 2: The tables do not cover negative z-scores. Use the fact that the distribution is symmetric to help find these probabilities.

![](media/image342.png)

![](media/image344.png)

**13. Scatter Plot/Diagram**

\(a) A scatter plot/graph is used to plot 2 variables, one on the x-axis and one on the y-axis. The graph can then help to identify a pattern/relationship between the 2 variables.

Example 30: Draw a scatter plot for the following data and comment on the relationship.

The closer the points are to a straight line, the stronger the linear relationship.
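One way to put a number on "how close to a straight line" the points are is the correlation coefficient r. The sketch below is only an illustration: the data are made up (they are not taken from these notes), the function name `correlation` is our own, and the formula used is the standard Pearson formula rather than the calculator method.

```python
from math import sqrt

# Hypothetical data (NOT from the notes): hours studied and exam marks.
hours = [1, 2, 3, 4, 5, 6]
marks = [35, 45, 50, 62, 70, 78]


def correlation(xs, ys):
    """Pearson correlation coefficient r for two paired lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of the differences from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared differences from each mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(hours, marks)
print(round(r, 3))
```

A value of r near 1 or -1 means the points lie close to a straight line; a value near 0 means there is little or no linear pattern.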
Example 31:

![A graph of a car](media/image372.png)

**14. Correlation**

Correlation is a measure of the strength of a relationship between 2 variables (x and y).

Watch out for outliers: these can strengthen or hide a correlation!

**15. Correlation Coefficient**

This is a number, from -1 to 1, indicating the strength of the correlation/relationship. We use the letter r to represent the correlation coefficient. The closer to -1 or 1 the number is, the closer to a straight line the points are.

To find r on a calculator:

Example 32:

![A graph chart with lines and dots](media/image415.png)

Example 33:

![A paper with a grid and a grid with text](media/image437.png)

**16. Causal Relationships and Correlation**

If a change in one variable causes a change in another variable, there is a causal relationship between them. For example:

- Temperature and ice-cream sales
- Study time and grades achieved
- Age of car and resale price

An example of a relationship that is NOT causal would be:

Example 34:

**17. Line of Best Fit**

A line drawn as close as possible to the scatter points, with roughly the same number of points on either side, is called the line of best fit. The line of best fit shows the general trend of the relationship. It is usually drawn by eye and includes $\left( \overline{x},\overline{y} \right)$.
To form the equation of the line of best fit you will need

- the slope of the line
- one point on the line

and then write the equation in the form y = mx + c.

Example 35:

![A close-up of a paper](media/image504.png)

Example 36:
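Separately from the numbered examples, the steps for forming the equation of a line of best fit (find the slope, use one point, write y = mx + c) can be sketched in code. Everything here is an assumption for illustration: the car data are made up, the second point `other` stands in for a point read off a line drawn by eye, and the helper names are our own.

```python
# Hypothetical data (NOT from the notes): age of car (years) and price (thousands of euro).
ages = [1, 2, 3, 4, 5]
prices = [20, 17, 15, 12, 11]


def mean_point(xs, ys):
    """Return (x-bar, y-bar), the point the line of best fit should pass through."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def slope_between(p, q):
    """Slope m of the line through points p and q."""
    return (q[1] - p[1]) / (q[0] - p[0])


def intercept(point, m):
    """Rearrange y = mx + c to get c = y - mx, using a known point on the line."""
    x0, y0 = point
    return y0 - m * x0


centre = mean_point(ages, prices)
# Assumed: a second point read off a line that was drawn by eye through the mean point.
other = (5, 10.5)

m = slope_between(centre, other)
c = intercept(centre, m)
print(f"y = {m:.2f}x + {c:.2f}")
```

A different line drawn by eye would give a slightly different second point, and so a slightly different slope and intercept. Using the mean point as the "one point on the line" keeps the line anchored to the centre of the data.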
