Lecture 6 (1) - Probability and Statistics - Benha University 2021 PDF

Document Details

Uploaded by BeautifulTonalism3025

Benha University

2021

Dr. Ahmed Hagag

Tags

probability and statistics descriptive statistics data analysis mathematics

Summary

These are lecture notes on probability and statistics, specifically covering descriptive statistics, population and samples. The document was created by Dr. Ahmed Hagag in 2021 and is from Benha University.

Full Transcript

BS111 Probability and Statistics, Lecture 12
Dr. Ahmed Hagag, Faculty of Computers and Artificial Intelligence, Benha University, 2021
© Ahmed Hagag, Probability and Statistics

Ch 4: Descriptive Statistics

Data Description (1/2)

Population and Sample in a Statistical Study.
Types of Data.
Representation of Data.
Numerical Summaries of Quantitative Data.
Measures of Centrality (Central Tendency/Location).
Measures of Variability (Dispersion/Spread).
Measures of Shape (Skewness, Kurtosis).
Measures of Relative Position (Percentiles, Quartiles, IQR, Coefficient of Variation).

Data Description (2/2)

Stem-and-Leaf Diagrams.
Box-Whisker Plot (Box Plot).
Time Sequence Plots.
Scatter Diagrams.

Population and Sample (1/3)

Definitions (1/9): A population is a collection of all elements that possess a characteristic of interest. Populations can be finite or infinite. A population whose elements are all easily countable may be considered finite, and a population whose elements are not easily countable may be considered infinite.

Definitions (2/9): A portion of a population selected for study is called a sample.

[Figure: a sample drawn from a population]

Definitions (3/9): The first type of sample is the biased sample, in which one or more parts of the population are favored over others. Two kinds are considered: the convenience sample and the voluntary response sample.

Definitions (4/9): Convenience Sample: includes only the elements of the population that are easy to reach.
[Figure: a convenience sample taken from the easily reached part of the population]

Definitions (5/9): Voluntary Response Sample: consists of people who have chosen to include themselves (e.g., in a survey, people with a strong interest in the survey topic are the ones most likely to respond).

Definitions (6/9): To avoid bias, we want our sample to be random. Such samples are called unbiased samples. Three kinds are considered: Simple Random Sample (SRS), Stratified Random Sample, and Clustering (Multistage) Random Sample.

Definitions (7/9): Simple Random Sample (SRS): a sample in which each element of the population has the same chance of being included.

[Figure: a sample selected from the population using random numbers (RN)]

Definitions (8/9): Stratified Random Sample: the elements of the population are divided into groups of similar type (each group is called a stratum). An SRS is then taken within each stratum, and the results are combined to form the sample.

[Figure: population, grouped population, and an SRS within each group forming the sample]

Definitions (9/9): Clustering (Multistage) Random Sample: the elements of the population are divided into different clusters. An SRS is taken to select one cluster, and then an SRS is taken again within the selected cluster to select some items.
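
The three random sampling schemes defined above can be sketched in R with the base function sample(). This is only an illustration: the population of 20 elements, the two strata "A" and "B", the four clusters, the sample sizes, and the seed are all hypothetical choices, not values from the lecture.

```r
set.seed(1)  # assumed seed, only so that the draws are repeatable

# Hypothetical population: 20 elements, 2 strata, 4 clusters
population <- data.frame(
  id      = 1:20,
  stratum = rep(c("A", "B"), each = 10),
  cluster = rep(1:4, each = 5)
)

# Simple random sample: each element has the same chance of selection
srs <- population[sample(nrow(population), 6), ]

# Stratified random sample: an SRS of 3 within each stratum, then combined
by_stratum <- split(population, population$stratum)
stratified <- do.call(
  rbind,
  lapply(by_stratum, function(g) g[sample(nrow(g), 3), ])
)

# Clustering (multistage): an SRS selects one cluster,
# then a second SRS selects items inside that cluster
chosen_cluster <- sample(unique(population$cluster), 1)
in_cluster     <- population[population$cluster == chosen_cluster, ]
cluster_sample <- in_cluster[sample(nrow(in_cluster), 3), ]
```

The stratified sample always contains elements from every stratum, while the cluster sample contains elements from a single cluster only.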
[Figure: population, clustered population, SRS to select the cluster, and SRS within the cluster forming the sample]

Population and Sample (2/3)

Probability and Inferential Statistics:

[Figure: Probability and Statistical Inference]

Population and Sample (3/3)

Statistics: Statistics concerns data: their collection, analysis, and interpretation. Descriptive statistics concerns the summarization of data: we have a data set and we would like to describe it in multiple ways. Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

Types of Data (1/6)

Types of data:
Quantitative: represent numerical quantities.
Qualitative: do not represent numerical quantities (e.g., gender, color, ...).
Logical: True/False.
Missing: data that should be there but are not.
Other types.

Quantitative data are either discrete or continuous.
Discrete: take values in a finite or countably infinite set of numbers. Examples: counts, number of arrivals, number of cars.
Continuous: take values in an interval of numbers. Examples: height, weight, length, time.

Types of Data (2/6)

Discrete Quantitative Data: Example: 12 people were asked how many cars they own, and the results were recorded as follows:
2, 0, 4, 2, 2, 3, 2, 2, 4, 2, 2, 2 car(s)
Level of measurement: numerical.

Types of Data (3/6)

Continuous Quantitative Data: Example: 10 people were asked about their height, and the results were recorded as follows:
175, 184, 160, 170, 173, 170, 175, 165, 171, 176 cm
Level of measurement: numerical.

Types of Data (4/6)

Qualitative Data: Example: 11 people were asked about their gender, and the results were recorded as follows:
M, F, M, M, M, F, F, M, M, M, M
Level of measurement: nominal.
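
As a rough illustration of how these three kinds of data might be stored in R, the sketch below enters the lecture's example data as vectors. Storing the qualitative data as a factor, and checking for whole numbers as a sign of discreteness, are choices made here for illustration and are not something the slides prescribe.

```r
# Discrete quantitative data: number of cars owned (counts)
cars <- c(2, 0, 4, 2, 2, 3, 2, 2, 4, 2, 2, 2)

# Continuous quantitative data: heights in cm
heights <- c(175, 184, 160, 170, 173, 170, 175, 165, 171, 176)

# Qualitative (nominal) data: stored here as a factor
gender <- factor(c("M", "F", "M", "M", "M", "F", "F", "M", "M", "M", "M"))

is.numeric(cars)            # TRUE: the values are numerical quantities
all(cars == round(cars))    # TRUE: whole-number counts
is.numeric(heights)         # TRUE
is.factor(gender)           # TRUE: categories, not quantities
levels(gender)              # "F" "M"
```

Note that recorded heights can look like whole numbers too; height is continuous because the underlying quantity can take any value in an interval, not because of how it was rounded.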
Types of Data (5/6)

Logical Data: The value is either TRUE or FALSE (equivalently, you can use 1 = TRUE, 0 = FALSE).

Example: Using R

> x <- 5:9          # assumed values; the assignment is missing from the transcript
> y <- (x < 7.3)    # assumed comparison, chosen to match the output shown
> y
[1] TRUE TRUE TRUE FALSE FALSE

Types of Data (6/6)

Missing Data: Data that should be there but are not. R reserves the special symbol NA to represent missing data.

Example: Using R

> x <- c(3, 7, NA, 4, 7)    # assumed values, consistent with the outputs shown
> y <- c(5, NA, 1, 2, 2)    # assumed values, consistent with the outputs shown
> x + y
[1]  8 NA NA  6  9
> sum(x, na.rm = TRUE)
[1] 21

Revision Questions

Determine whether each statement is true or false:
1. A sample consists of all subjects that are being studied. (F: that is the population)
2. Descriptive statistics consists of the collection, organization, summarization, and presentation of data. (T)
3. The number of absences per year that a worker has is an example of a continuous variable. (F: discrete)
4. The variable age is an example of a discrete variable. (F: continuous)
5. Data that can be classified according to color are measured as nominal. (T)

Representation of Data (1/8)

Representation of Data (2/8)

Example 1: 11 people were asked about their gender, and the results were recorded as follows:
M, F, M, M, M, F, F, M, M, M, M
The type of this raw data is: qualitative data.

Using a Frequency Table (Grouped Data), called the frequency distribution for the data. Counting the data gives M: 8 and F: 3, so the sample size is n = 11.

[Figure: frequency table for the gender data]

Using Bar Graphs

[Figure: bar graphs of the gender data]

Using Pie Charts
Angle of a slice = (relative frequency of the given class) × 360°
Ex.: angle of the slice for class M = 8/11 × 360° ≈ 261.82°

Representation of Data (3/8)

Example 2: A study of 552 first-year college students asked about their preferences for online resources. One question asked them to pick their favorite. Here are the results:

[Table: favorite online resource, with counts and percentages]

The type of this raw data is: qualitative data.

Using Bar Graphs

[Figure: bar graph of the online resource preferences]

Using Pie Charts
Angle of a slice = (percent) × 3.6°
Ex.: angle of the slice for class Wikipedia = 9.4 × 3.6° = 33.84°

Representation of Data (4/8)

Example 3: 12 people were asked how many cars they own, and the results were recorded as follows:
2, 0, 4, 2, 2, 3, 2, 2, 4, 2, 2, 2 car(s)
The type of this raw data is: discrete quantitative data.

Using a Frequency Table (Grouped Data), called the frequency distribution for the data. Counting the data gives 0: 1, 2: 8, 3: 1, and 4: 2, so the sample size is n = 12.

[Figure: frequency table for the car data]

Dot Plot: A dot plot is one of the simplest graphs.
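
The frequency distributions for the gender and car data, together with the pie-slice angle rule (relative frequency × 360), can be checked with a short R sketch. The helper name freq_table is hypothetical and introduced only for this illustration; table() and data.frame() are base R.

```r
# Hypothetical helper: frequency, relative frequency, and pie-slice angle
freq_table <- function(x) {
  f <- table(x)                       # count of each class
  n <- length(x)                      # sample size
  data.frame(
    class     = names(f),
    frequency = as.vector(f),
    relative  = as.vector(f) / n,
    angle     = as.vector(f) / n * 360,   # slice angle in degrees
    stringsAsFactors = FALSE
  )
}

gender <- c("M", "F", "M", "M", "M", "F", "F", "M", "M", "M", "M")
cars   <- c(2, 0, 4, 2, 2, 3, 2, 2, 4, 2, 2, 2)

gender_tab <- freq_table(gender)
cars_tab   <- freq_table(cars)

print(gender_tab)
print(cars_tab)
```

For the gender data the angle for class M comes out as 8/11 × 360, about 261.82 degrees; the angles in each table sum to 360.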
[Figure: dot plot of the number of cars, horizontal axis from 0 to 4]

Using Bar Graphs

[Figure: bar graphs of the car data]

Using Pie Charts
Angle of a slice = (relative frequency of the given class) × 360°
Ex.: angle of the slice for class 0 = 1/12 × 360° = 30°

Representation of Data (5/8)

Example 4: The following data give the number of defective motors received in 20 different shipments:

[Table: number of defective motors in each of the 20 shipments]

Construct a dot plot for these data.
The type of this raw data is: discrete quantitative data.

[Figure: dot plot of the number of defective motors]

Representation of Data (6/8)

Example 5: 10 people were asked about their height, and the results were recorded as follows:
175, 184, 160, 170, 173, 170, 175, 165, 171, 176 cm
The type of this raw data is: continuous quantitative data.

Because the data are continuous quantitative, we follow these steps:

1. Find the range $R$ of the data, defined as:
$R = \text{largest data point} - \text{smallest data point}$

2. Divide the data set into an appropriate number of classes. The classes are also sometimes called intervals, categories, cells, or bins. The number of classes is $k$. There is no "best" number of classes. However, Sturges's formula is often used:
$k = 1 + 3.322 \log_{10} n$
Or we can use a simpler formula:
$k = \sqrt{n}$
where $n$ is the total number of data points in the data set.

3. Determine the width of the classes:
$\text{Class width} = \frac{\text{Range}}{\text{Number of classes}} = \frac{R}{k}$

4. Finally, prepare the frequency distribution table by assigning each data point to the appropriate class.

Applying these steps to the height data:

1. Range: $R = 184 - 160 = 24$
2. Number of classes: $k = \lfloor 1 + 3.322 \log_{10} 10 \rfloor = \lfloor 4.322 \rfloor = 4$
3. Class width: $R/k = 24/4 = 6$
4. The four classes used to prepare the frequency distribution table are:
[160, 166), [166, 172), [172, 178), [178, 184]

Using a Frequency Table (Grouped Data): assigning each height to its class gives the frequencies 2, 3, 4, and 1, which sum to n = 10.

[Figure: frequency table for the height data]

Histograms: A histogram is a graphical tool consisting of bars placed side by side on a set of intervals (classes, bins, or cells) of equal width. The bars represent the frequency or relative frequency of the classes. The height of each bar is proportional to the frequency or relative frequency of the corresponding class.
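
The four steps for the height data (range, Sturges's formula, class width, frequency table) can be sketched in base R as follows. Treating the classes as closed on the left, with the last class also closed on the right, is an assumption about how the slide intends the boundaries; cut() is used here only as one convenient way to assign points to classes.

```r
heights <- c(175, 184, 160, 170, 173, 170, 175, 165, 171, 176)

n <- length(heights)                      # number of data points
R <- max(heights) - min(heights)          # step 1: range = 184 - 160
k <- floor(1 + 3.322 * log10(n))          # step 2: Sturges's formula, rounded down
width <- R / k                            # step 3: class width

# Class boundaries: 160, 166, 172, 178, 184
breaks <- seq(min(heights), max(heights), by = width)

# Step 4: assign each height to a class.
# Assumption: intervals are [a, b), except the last, which is [a, b].
classes <- cut(heights, breaks = breaks, right = FALSE, include.lowest = TRUE)
freq <- table(classes)

print(freq)
print(freq / n)   # relative frequencies
```

With these boundaries the class frequencies are 2, 3, 4, and 1. The simpler rule k = sqrt(n) would give about 3.16 classes for n = 10, so the two formulas need not agree.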
[Figure: frequency and relative frequency histograms of the height data]

Example: Using R

> heights <- c(175, 184, 160, 170, 173, 170, 175, 165, 171, 176)
> hist(heights, breaks = seq(160, 190, by = 6))
> install.packages("HistogramTools")
> library(HistogramTools)
> PlotRelativeFrequency(hist(heights, breaks = seq(160, 190, by = 6)))

Representation of Data (7/8)

Histograms:
For $\Delta b = 0.25$: $k = \frac{2 - 0}{0.25} = 8$
For $\Delta b = 0.05$: $k = \frac{2 - 0}{0.05} = 40$

[Figure: histograms of the same data with 8 and with 40 classes]

Representation of Data (8/8)

Histograms:

[Figure: histogram over the interval (0, 2]]
[Figure: histogram over the interval (2, 4]]

Revision Questions (1/2)

Given the following frequency table of the colors observed in a sample:

[Table: frequency of each color]

1. What is the type of data? (Qualitative data)
2. What is the sample size? (15)
3. What graph can be used to represent these data? (Bar chart or pie chart)
4. What is the most frequent color? (Blue)

Revision Questions (2/2)

Given the following table:

[Table: class intervals and frequencies, partly blank]

1. What is the type of data? (Continuous quantitative data)
2. Complete the table.
3. The sample size is (30).
4. The table name is (frequency table).
5. The class interval (width) = (5).
6. The most frequent class = [20, 25).
7. The midpoint of the second class = (22.5).
8. The name of the graph that represents this table is (histogram).

Video Lectures

All Lectures: https://www.youtube.com/playlist?list=PLxIvc-MGOs6gW9SgkmoxE5w9vQkID1_r-
Lecture #12: https://www.youtube.com/watch?v=9BPOkSrSOyc&list=PLxIvc-MGOs6gW9SgkmoxE5w9vQkID1_r-&index=14

Dr. Ahmed Hagag [email protected]
