Questions and Answers
Which of the following is an example of qualitative data?
- Height of students
- Emotions of people (correct)
- Weight of construction materials
- Number of cars in a parking lot
Which data level is considered the weakest data measurement technique?
- Ratio
- Nominal (correct)
- Ordinal
- Interval
Why is using a sample often preferred over a census in statistical studies?
- Samples are more economical and time-saving (correct)
- Samples eliminate the need for statistical analysis
- Samples are more complex and scientific
- Samples always provide more accurate results
Which of the following is a disadvantage of using questionnaires for data collection?
In the context of data representation, what is the primary purpose of organizing collected data?
In creating a frequency distribution table, what is the role of the formula $2^k \geq n$?
What is the purpose of calculating the midpoint in a frequency distribution?
What is a key advantage of using pie charts for data presentation?
What is the primary purpose of scatter plots?
Which statistical package is highlighted as an essential tool for organizing and manipulating data?
In statistics, what is the term for examining a small part of a group to infer conclusions about the entire group?
What is inductive statistics primarily concerned with?
What distinguishes a continuous variable from a discrete variable?
What is 'raw data' in the context of frequency distributions?
How is the range of a dataset defined?
What does a smaller range indicate about the data?
Why is squaring the differences from the mean important when calculating variance?
Which of the following is true of standard deviation, compared to variance?
When is the coefficient of variation particularly useful?
In a symmetric distribution, how do the mean, median, and mode relate to each other?
Which of the following is true for a data set skewed to the right?
If the coefficient of skewness is negative, what does this indicate about the distribution?
What is the median of the first half of a dataset equivalent to?
How is the $n^{th}$ percentile of a data set defined?
If a number we get when calculating a decile is not an integer, what should we do?
In set theory, what does $A \cup B$ represent?
What does it mean if two sets, A and B, are disjoint?
In set theory, given a set A within a space S, what is represented by ¬A?
What is a sample space in probability?
What is the range of values for the probability of any event?
If P(E) = 1, what does this imply about event E?
What is the formula for the probability of the complement of an event E?
What defines classical probability?
How is relative frequency probability determined?
What is subjective probability based on?
What does it mean for two events, E and F, to be independent?
If events E and F are mutually exclusive, what is P(E|F)?
What is a permutation?
In the context of combinations, what distinguishes it from permutations?
Flashcards
Descriptive Statistics
Techniques to collect, organise and make sense of data.
Probability
Measures the degree of uncertainty.
Inferential Statistics
Drawing conclusions about a large population from a sample of data.
Qualitative Data
Quantitative Data
Nominal Data Level
Ordinal Data Level
Interval Data Level
Ratio Data Level
Population in Statistics
Sample in Statistics
Census
Observations
Personal Interviews
Telephonic Interviews
Questionnaires
Data Representation
Frequency Distribution Table
Pie Chart
Bar Graphs
Histograms
Time Series Plot
Scatter Plots
Mean
Mode
Median
Range
Variance
Standard Deviation
Coefficient of Variation
Skewness
Percentile
Quartiles
Decile
Set
Empty Set
Subset
Space
Complement
Union of A and B
Intersection of A and B
Study Notes
- Course notes for APPM1022A are intended to complement lectures and other course materials for Introductory Statistics for Construction.
- Students should consult references for additional material and views.
- This material is under development and feedback is appreciated.
Descriptive Statistics
- Statistics comprises three areas: descriptive statistics, probability, and inferential statistics.
- Descriptive statistics uses techniques and measures for collecting, organizing, and making sense of data.
- Descriptive statistics also involves turning data into meaningful information like graphs, charts, tables, and summary numerical measures.
- Probability aids decision-makers in measuring uncertainty.
- Inferential statistics involves drawing conclusions about a large population from a sample of data.
Data Collection
- The main statistical role is providing decision-making methods by obtaining and converting data into useful information.
- Qualitative data provides descriptions, such as emotions or perceptions, and uses explorative methods to gain insights and motivations.
- Quantitative data provides quantifiable data for mathematical calculation or statistical analysis.
- Quantitative data answers questions such as how many, how often, or how much.
Data Levels
- Variables can be defined using four levels: nominal, ordinal, interval, and ratio.
- Nominal: Weakest data measurement technique; names variables without order (e.g., eye color).
- Ordinal: Ranks variables in order without determining the size of the difference between them (e.g., happy, unhappy).
- Interval: Numerical variables with known, equal intervals but no true zero (e.g., time of day).
- Ratio: Numerical variables with equal intervals and a true zero, so that ratios between values are meaningful (e.g., weight).
Methods of Collecting Data
- A population is a collection of all items of interest.
- A sample is a subset of a population, chosen randomly or methodically.
- A sample must be representative of the population.
- The population size is denoted by N; the sample size is denoted by n.
- Census: Measurement of the entire population, used by governments.
- Using a sample instead of a census is more economical, time-saving, and scientifically sound.
- Observations: Gathering data by observing people; this requires trained observers.
- Time and differing perceptions are major considerations when using observation as a data collection method.
Personal and Telephonic Interviews
- Interviews are cheaper than observations.
- Interviews can use closed-ended questions, which are statistically easier to analyze, or open-ended questions, which provide richer responses.
- Interviewer phrasing can influence answers, and sensitive topics may not get honest answers.
- Telephonic interviews rapidly gather data at a lower cost than personal observations.
- The disadvantages include the possibility of a reluctance to answer or easy call termination.
Questionnaires and Contemporary Methods
- Questionnaires are the cheapest data collection method; they eliminate interviewer bias but may have a poor response rate.
- Contemporary methods for data collection include: links, Mentimeter, webcams, Bluetooth, drones, social media monitoring, online tracking, GPS, wireless/web-based technologies, satellites, spacecraft, bar codes, hand waves, facial/voice recognition technologies, and sensing technologies.
- It is very important to consider challenges, errors, and costs when choosing collection methods.
Methods of Representing Data
- Organizing collected data is important in order to give it meaning.
- For example, item sale prices from a hardware store can be organized in a frequency distribution table.
- Categorize data into classes to create a frequency distribution table.
- Determine the number of classes using the formula $2^k \geq n$, where n is the total number of data values; the smallest whole number k that satisfies the inequality is the number of classes.
- Determine the class width: class width = (largest value - smallest value) / number of classes.
- Develop a frequency distribution by counting observations that belong to each class.
- Determine the midpoint of each class with midpoint = (lower class limit + upper class limit) / 2.
- Compute the cumulative frequency by summing the frequency of a class and all preceding classes.
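The steps above can be sketched in Python. This is a minimal illustration, not the course's own procedure: the function name `frequency_table`, the sample prices, rounding the class width up to a whole number, and placing the largest value in the last class are all assumptions made for the example.

```python
import math

def frequency_table(values):
    """Build a frequency distribution table from raw numeric data."""
    n = len(values)

    # Number of classes: smallest k with 2**k >= n.
    k = 1
    while 2 ** k < n:
        k += 1

    # Class width = (largest - smallest) / number of classes.
    # Rounding up to a whole number is an assumption of this sketch.
    low, high = min(values), max(values)
    width = max(1, math.ceil((high - low) / k))

    # Count the observations that fall in each class.
    # The last class also takes the largest value (assumption).
    counts = [0] * k
    for v in values:
        index = min(int((v - low) // width), k - 1)
        counts[index] += 1

    # Midpoint and cumulative frequency for each class.
    table = []
    running_total = 0
    for i, frequency in enumerate(counts):
        lower = low + i * width
        upper = lower + width
        running_total += frequency
        table.append({
            "lower": lower,
            "upper": upper,
            "midpoint": (lower + upper) / 2,
            "frequency": frequency,
            "cumulative": running_total,
        })
    return table

# Hypothetical sale prices (n = 16, so k = 4 because 2**4 = 16).
prices = [12, 15, 18, 21, 22, 25, 27, 30, 31, 33, 35, 38, 40, 44, 47, 52]
for row in frequency_table(prices):
    print(row)
```

With these 16 values the range is 40, so four classes of width 10 are produced, and the last cumulative frequency equals n.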
Presenting Data
- Pie charts are effective in representing data, particularly for budgetary allocations, due to their ease of understanding.
- The total is subdivided, and the pieces are proportional to the amounts they represent.
- Bar graphs, or rod diagrams, use vertical or horizontal bars to represent data along axes.
- Each bar represents a single value, making bar charts straightforward and effective.
- Histograms: Visual representation of a frequency distribution in which the number of observations in each class is represented by the height of the bar.
- Time series plots represent data measured over time and are useful for identifying changes in variables.
- Scatter plots identify relationships between two variables.
Charts
- Ensure data isn't misrepresented through scaling differences or starting axes from non-zero values.
Computer Applications
- Use statistical computer packages such as Microsoft Excel or SPSS, which support techniques such as ANOVA.
- Microsoft Excel is a spreadsheet program used to organize, manipulate, and analyze all kinds of data.
- Microsoft Excel is also versatile in presenting data visually and drawing diagrams.
- It is important both to represent and to understand data using computer applications.
- Media, politicians, and businesses sometimes misrepresent data to persuade the public or to get customers to buy their products.
Descriptive Statistics Introduction
- Statistics involves methods for collecting, organizing, summarizing, presenting, and analyzing data to make reasonable decisions.
- Collecting data on the characteristics of every member of a large group may be impossible or impractical.
- Instead of examining the whole population, one can examine a sample.
- If a sample is representative of a population, analysis of the sample can support important conclusions about the larger population.
- Inductive statistics, or statistical inference, deals with the conditions under which such conclusions are valid. Because an inference cannot be absolutely certain, probability is used in stating the conclusions.
- Descriptive or deductive statistics seeks only to describe and analyse a given group without drawing any conclusions or inferences about a larger group.
Important Statistical Concepts
- A variable is a symbol like X that assumes values from its domain; a constant has only one value.
- A continuous variable can assume any value between two points, whereas a discrete variable cannot.
Discrete and Continuous Variables
- The number of children in a family, which can assume the values 0, 1, 2, ... but not any fractional value, is a discrete variable.
- The age A of an individual, which can be 50 years or 50.8 years depending on measurement accuracy, is a continuous variable.
- The number of children in a family is an example of discrete data; the heights of students are an example of continuous data.
- Measurements usually yield continuous data, whereas enumerations (counts) yield discrete data.
Frequency Distributions
- Raw data: Unorganized numerical data
- Array: An arrangement of raw data in ascending or descending order
- Range: The difference between the largest and smallest data values.
- Frequency distribution: A method of distributing the data into classes (categories) and determining each class's frequency.
- A tabular arrangement of data classes with their class frequencies is a frequency distribution or frequency table.
- The first category in a frequency table is called the first class.
Ungrouped Data
- Data is represented in a list.
Measures of Central Tendencies
- Sample mean: $\bar{x} = \frac{\sum X_i}{n}$
- Sample mode: The most frequently appearing value in a data set; a data set is multi-modal if it has more than one mode.
- Sample median: Arrange the data values in ascending or descending order. If the number of values is odd, the median is the middle value; if it is even, the median is the average of the two middle values.
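As a small illustration of these three measures, the sketch below computes them in Python. The data values and the helper name `sample_median` are hypothetical; `statistics.multimode` is from the Python standard library (3.8 or later) and returns every mode, which matches the multi-modal case mentioned above.

```python
import statistics

def sample_median(values):
    """Median by the rule in the notes: sort, then take the middle value
    (odd count) or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Hypothetical data set with an even number of values.
marks = [4, 7, 7, 9, 10, 11]

mean = sum(marks) / len(marks)        # sum of the values divided by n
median = sample_median(marks)         # average of 7 and 9
modes = statistics.multimode(marks)   # list of the most frequent value(s)

print(mean, median, modes)
```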
Measures of Variability
- Range: The range of a data set, R, is the difference between its largest and smallest values.
- Range formula: $R = X_{max} - X_{min}$
- A smaller range indicates less variability; a larger range indicates more variability.
Variance
- Variance measures how much a set of data points are dispersed around their mean value.
- Population variance ($\sigma^2$) = the sum of squared differences between the observed values and the population mean, divided by the total number of observations: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
- Squaring makes every term non-negative, which is appropriate because dispersion is a distance and cannot be negative.
- Squaring also amplifies the effect of large differences.
- Sample variance ($s^2$) = the sum of squared differences between the observed sample values and the sample mean, divided by the sample size minus one.
- Sample variance formula: $s^2 = \frac{\sum (X_i - \bar{x})^2}{n - 1}$
Standard Deviation
- The standard deviation is the square root of the variance.
- Standard deviation is more meaningful than variance because it is expressed in the same units as the data.
- Sample standard deviation formula: $s = \sqrt{\frac{\sum (X_i - \bar{x})^2}{n - 1}}$
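The variance and standard deviation formulas can be checked with a short Python sketch. The function names and the data are hypothetical; the standard library's `statistics.variance` uses the same n - 1 divisor and is used here only as a cross-check.

```python
import math
import statistics

def population_variance(values):
    """Sum of squared differences from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared differences from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical data: mean is 5 and the squared differences sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))   # 32 / 8
print(sample_variance(data))       # 32 / 7
print(sample_std_dev(data))        # square root of 32 / 7
print(statistics.variance(data))   # library cross-check of the sample variance
```

Dividing by n - 1 rather than n makes the sample variance slightly larger than the population formula applied to the same values.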
Coefficient of Variation
- The coefficient of variation is the standard deviation divided by the mean.
- Another name: relative standard deviation.
- Either the population formula or the sample formula is used, depending on which data are available.
- Comparing the standard deviations of two data sets with different units or very different means can be misleading; comparing their coefficients of variation is meaningful.
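A brief sketch of why the coefficient of variation helps when two data sets use different units. The heights and weights below are made-up values, and `coefficient_of_variation` is a hypothetical helper that applies the sample formula (sample standard deviation divided by sample mean).

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical data in different units with the same standard deviation.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

print(statistics.stdev(heights_cm), statistics.stdev(weights_kg))
print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(weights_kg))
```

Both sets have the same standard deviation, yet the weights vary more relative to their mean, which only the coefficient of variation shows. Because it is a ratio, the coefficient is also unchanged when every value is rescaled, for example from metres to centimetres.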
Coefficient of Skewness
- The median divides the data into two halves, and the distribution can be described as symmetric or asymmetric.
- In a symmetric distribution, the right and left sides are mirror images, and the curve is typically bell-shaped.
- A data set that is not symmetric is asymmetric and may be skewed.
- Skewness can be measured by evaluating how the measures of central tendency relate to one another.
- If the mean, median, and mode match, the distribution is symmetric. If they do not match, the distribution is skewed either to the left or to the right.
- If the distribution has a long tail extending to the right, it is skewed to the right (positively skewed).
Types of Data Sets
- For a positively skewed data set:
- Mean > Mode (Always)
- Median > Mode (Always)
- Mean > Median (Most of the time)
- For a left-skewed distribution, the mean is less than the median.
- Pearson's coefficient measures the strength and direction of skewness using the mean, the mode or median, and the standard deviation.
- The coefficients of skewness are as follows:
- $\alpha_3 = \frac{n}{(n-1)(n-2)\,s^3} \sum (X_i - \bar{x})^3$
- $\alpha_3 = \frac{\bar{x} - \text{mode}}{s}$
- $\alpha_3 = \frac{3(\bar{x} - \text{median})}{s}$
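The three coefficients can be sketched in Python as follows. The function names and data sets are hypothetical, and $s$ is taken to be the sample standard deviation (`statistics.stdev`), which is an assumption about the notation above. The first formula needs at least three data values.

```python
import statistics

def moment_skewness(values):
    """n / ((n - 1)(n - 2) s^3) times the sum of cubed differences from the mean."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    cubed = sum((x - x_bar) ** 3 for x in values)
    return n / ((n - 1) * (n - 2) * s ** 3) * cubed

def pearson_mode_skewness(values):
    """(mean - mode) / standard deviation."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)

def pearson_median_skewness(values):
    """3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

# Hypothetical right-skewed data: one large value pulls the mean above the median and mode.
right_skewed = [1, 2, 2, 3, 10]
# Hypothetical symmetric data.
symmetric = [1, 2, 3, 4, 5]

print(moment_skewness(right_skewed))
print(pearson_mode_skewness(right_skewed))
print(pearson_median_skewness(right_skewed))
print(moment_skewness(symmetric), pearson_median_skewness(symmetric))
```

For the right-skewed set all three coefficients are positive. For the symmetric set the first and third are zero; the mode-based coefficient is omitted there because that set has no repeated value and so no meaningful mode.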
Skewness Implications
- If Mean < Median < Mode, the distribution is negatively skewed (skewed to the left) and the coefficients are negative.
- If the mean, median, and mode are equal, the distribution is symmetric (for example, a normal distribution) and the coefficient is 0.
- When there are few data points, the median is favored as the measure of central tendency.
- Exercise 2.2.1 is an example of how to determine the skewness of a data set.
Measures of Position
- In statistics, percentiles, quartiles, and deciles are used to analyse data and give it meaning.
- They show where a value lies compared to the rest of the data set.
- $P_k = \frac{k \cdot n}{100}$ gives the position of the k-th percentile, where n is the total number of values in the data set arranged in ascending order. A percentile should not be mistaken for a percentage.
- Quartiles split the data (in ascending order) into four parts to give different insights.
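A minimal sketch of locating a percentile with the position formula. The notes do not state what to do when the position is not a whole number, so the rule used here is an assumption: it follows one common textbook convention (round a fractional position up; for a whole position, average that value and the next one). The function names and data are hypothetical.

```python
def percentile_position(k, n):
    """Position of the k-th percentile in n ordered values: k * n / 100."""
    return k * n / 100

def percentile_value(sorted_values, k):
    """Value at the k-th percentile (k is a whole number from 0 to 100).

    ASSUMED convention, not taken from the notes:
    - fractional position: round up to the next whole position;
    - whole position: average the value there and the next value.
    """
    n = len(sorted_values)
    whole, remainder = divmod(k * n, 100)
    if remainder:
        # Rounded-up 1-based position is whole + 1, i.e. 0-based index `whole`.
        return sorted_values[whole]
    if whole == 0:
        return sorted_values[0]
    if whole == n:
        return sorted_values[-1]
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2

# Hypothetical data, already in ascending order (n = 10).
data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

print(percentile_position(25, len(data)))   # 2.5, a fractional position
print(percentile_value(data, 25))           # third value under the assumed rule
print(percentile_value(data, 50))           # average of the 5th and 6th values
```

Quartiles and deciles are particular percentiles (the first quartile is the 25th percentile and the k-th decile is the 10k-th percentile), so the same helper covers them.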
Permutations and Combinations - Basic Principle of Combinatorial Analysis
- A combination is a selection of items or events in which order is disregarded.
- Combinatorial analysis deals with counting the ways in which events can occur.
- $_nP_r = n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}$ is the general formula (4.10) for counting the ordered arrangements (permutations) of r elements chosen from n.
- Note that $_nP_n = n!$.
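The permutation formula can be verified numerically. In this sketch `n_p_r` is a hypothetical helper that multiplies out $n(n-1)\cdots(n-r+1)$ directly; `math.perm`, `math.comb`, and `math.factorial` are standard-library functions (Python 3.8 or later). The relationship between combinations and permutations shown at the end, dividing by $r!$ because order is disregarded, is standard but is not stated in the notes above.

```python
import math

def n_p_r(n, r):
    """Number of ordered arrangements of r items from n: n(n-1)...(n-r+1)."""
    result = 1
    for factor in range(n, n - r, -1):
        result *= factor
    return result

n, r = 5, 2

print(n_p_r(n, r))                                   # product form
print(math.factorial(n) // math.factorial(n - r))    # n! / (n - r)!
print(math.perm(n, r))                               # library equivalent
print(n_p_r(4, 4), math.factorial(4))                # nPn equals n!

# Combinations disregard order, so each selection is counted once
# instead of r! times.
print(math.comb(n, r), n_p_r(n, r) // math.factorial(r))
```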
Grouped Data Measures
- Grouped data is arranged in a frequency table: the distinct values of x are listed in the first column, and the frequency f in the second column indicates how many times each value of x occurs (Section 2.3).
- Mean of grouped data: $\bar{x} = \frac{\sum_i X_i f_i}{\sum_i f_i}$
- Standard deviation: $s^2 = \frac{\sum (X_i - \bar{x})^2}{n - 1}$, where $s$ is the square root of this quantity.
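A short sketch of the grouped-data calculations, using a made-up frequency table. The names `grouped_mean` and `expand` are hypothetical. For the standard deviation, the sketch expands the table back into its individual observations and applies the sample formula to them; treating the sum as running over all n observations is an assumption of this example.

```python
import statistics

def grouped_mean(x_values, frequencies):
    """Sum of x * f divided by the sum of f."""
    total = sum(x * f for x, f in zip(x_values, frequencies))
    return total / sum(frequencies)

def expand(x_values, frequencies):
    """Rebuild the raw observations: each x repeated f times."""
    observations = []
    for x, f in zip(x_values, frequencies):
        observations.extend([x] * f)
    return observations

# Hypothetical frequency table: value x and how often it occurs.
x_values = [1, 2, 3, 4]
frequencies = [2, 3, 4, 1]

mean = grouped_mean(x_values, frequencies)      # (2 + 6 + 12 + 4) / 10
observations = expand(x_values, frequencies)    # n = 10 raw values
std_dev = statistics.stdev(observations)        # sample formula with n - 1

print(mean, std_dev)
```

The grouped mean agrees with the ordinary mean of the expanded observations, which is a quick way to check a hand calculation.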