Descriptive Statistics: Data Collection

Questions and Answers

Which of the following is an example of qualitative data?

  • Height of students
  • Emotions of people (correct)
  • Weight of construction materials
  • Number of cars in a parking lot

Which data level is considered the weakest data measurement technique?

  • Ratio
  • Nominal (correct)
  • Ordinal
  • Interval

Why is using a sample often preferred over a census in statistical studies?

  • Samples are more economical and time-saving (correct)
  • Samples eliminate the need for statistical analysis
  • Samples are more complex and scientific
  • Samples always provide more accurate results

Which of the following is a disadvantage of using questionnaires for data collection?

  • Poor response rate (correct)

In the context of data representation, what is the primary purpose of organizing collected data?

  • To ensure it has meaning and can be interpreted (correct)

In creating a frequency distribution table, what is the role of the formula $2^k \geq n$?

  • To determine the number of classes (correct)

What is the purpose of calculating the midpoint in a frequency distribution?

  • To represent each class with a single value for further calculations (correct)

What is a key advantage of using pie charts for data presentation?

  • Ease of understanding and interpretation (correct)

What is the primary purpose of scatter plots?

  • Identifying relationships between two variables (correct)

Which statistical package is highlighted as an essential tool for organizing and manipulating data?

  • Microsoft Excel (correct)

In statistics, what is the term for examining a small part of a group to infer conclusions about the entire group?

  • Sample (correct)

What is inductive statistics primarily concerned with?

  • Determining conditions under which statistical inference is valid (correct)

What distinguishes a continuous variable from a discrete variable?

  • A continuous variable can theoretically assume any value between two given values. (correct)

What is 'raw data' in the context of frequency distributions?

  • Data that has been collected but not organized numerically (correct)

How is the range of a dataset defined?

  • The difference between the smallest and the largest values (correct)

What does a smaller range indicate about the data?

  • Less variability (correct)

Why is squaring the differences from the mean important when calculating variance?

  • To eliminate negative values and amplify the effect of large differences (correct)

Which of the following is true of standard deviation, compared to variance?

  • Standard deviation is generally more meaningful in most analyses (correct)

When is the coefficient of variation particularly useful?

  • When comparing the standard deviations of two different datasets with different units (correct)

In a symmetric distribution, how do the mean, median, and mode relate to each other?

  • Mean is equal to the median and mode (correct)

Which of the following is true for a data set skewed to the right?

  • Mean is greater than the mode (correct)

If the coefficient of skewness is negative, what does this indicate about the distribution?

  • The distribution is skewed to the left (correct)

What is the median of the first half of a dataset equivalent to?

  • The first quartile ($Q_1$) (correct)

How is the $n^{th}$ percentile of a data set defined?

  • The value at which n percent of the data is below it (correct)

If a number we get when calculating a decile is not an integer, what should we do?

  • Round up to the next integer (correct)

In set theory, what does $A \cup B$ represent?

  • The set of all elements belonging to A or B or both (correct)

What does it mean if two sets, A and B, are disjoint?

  • A and B contain no common elements (correct)

In set theory, given a set A within a space S, what is represented by ¬A?

  • The set of all elements in S that are not elements of A (correct)

What is a sample space in probability?

  • The set of all possible outcomes of a random event (correct)

What is the range of values for the probability of any event?

  • Between 0 and 1 (correct)

If P(E) = 1, what does this imply about event E?

  • E will certainly occur (correct)

What is the formula for the probability of the complement of an event E?

  • 1 - P(E) (correct)

What defines classical probability?

  • Equally likely outcomes (correct)

How is relative frequency probability determined?

  • Dividing the number of times an event occurred by the total number of trials (correct)

What is subjective probability based on?

  • Personal opinions and experiences (correct)

What does it mean for two events, E and F, to be independent?

  • The occurrence of one has no influence on the probability of the other (correct)

If events E and F are mutually exclusive, what is P(E|F)?

  • 0 (correct)

What is a permutation?

  • An ordered arrangement of all objects in a set (correct)

In the context of combinations, what distinguishes it from permutations?

  • Combinations disregard the order of items. (correct)

Flashcards

Descriptive Statistics

Techniques to collect, organise and make sense of data.

Probability

Measures the degree of uncertainty.

Inferential Statistics

Drawing conclusions about a large population from a sample of its data.

Qualitative Data

Data that is descriptive in nature.

Quantitative Data

Data that is quantifiable and used for analysis.

Nominal Data Level

A data measurement technique with no particular order.

Ordinal Data Level

Variables ranked in order, with undefined differences between ranks.

Interval Data Level

Numerical variables with known equal intervals.

Ratio Data Level

Variables with measurable intervals and a true zero point.

Population in Statistics

All items of interest to a decision-maker.

Sample in Statistics

A subset of a population, randomly or methodically selected.

Census

Statistical measurement for the entire population.

Observations

Collecting data by observing subjects.

Personal Interviews

Gathering information by asking targeted questions.

Telephonic Interviews

A rapid, low-cost method of gathering data.

Questionnaires

A cheap method that reduces interviewer bias.

Data Representation

Organizing data so that it has meaning.

Frequency Distribution Table

A table that categorises data into classes.

Pie Chart

A circle divided into slices proportional to the amounts they represent.

Bar Graphs

Uses bars to represent data along axes.

Histograms

Visual frequency distribution representation.

Time Series Plot

Plotting data measured over time.

Scatter Plots

Identifying relationships between 2 variables.

Mean

Sum of the values divided by the number of values; a measure of central position.

Mode

Most frequently appearing value in a dataset.

Median

Middle value when data is ordered.

Range

Difference between the highest and lowest values.

Variance

Dispersion of data points around their mean.

Standard Deviation

Square root of the variance.

Coefficient of Variation

Standard deviation divided by the mean.

Skewness

Describes symmetry of a data set.

Percentile

Value at which 'n' percent of the data is below it.

Quartiles

Divides data into four parts.

Decile

Divides the data into ten parts.

Set

Collection of objects possessing common properties.

Empty Set

Set containing no elements.

Subset

Set A is a subset of set B if every element of A also belongs to B.

Space

Set containing all elements under consideration.

Complement

Elements of the space that are NOT in subset A.

Union of A and B

Elements belonging to A, B or both.

Intersection of A and B

Elements common to both A and B.

Study Notes

  • Course notes for APPM1022A are intended to complement lectures and other course materials for Introductory Statistics for Construction.
  • Students should consult references for additional material and views.
  • This material is under development and feedback is appreciated.

Descriptive Statistics

  • Statistics comprises three areas: descriptive statistics, probability, and inferential statistics.
  • Descriptive statistics uses techniques and measures for collecting, organizing, and making sense of data.
  • Descriptive statistics also involves turning data into meaningful information like graphs, charts, tables, and summary numerical measures.
  • Probability aids decision-makers in measuring uncertainty.
  • Inferential statistics involves drawing conclusions about a large population from a sample of its data.

Data Collection

  • The main role of statistics is to support decision-making by obtaining data and converting it into useful information.
  • Qualitative data provides descriptions, such as emotions or perceptions, and uses explorative methods to gain insights and motivations.
  • Quantitative data provides quantifiable data for mathematical calculation or statistical analysis.
  • Quantitative data answers questions such as how many, how often, or how much.

Data Levels

  • Variables can be defined using four levels: nominal, ordinal, interval, and ratio.
  • Nominal: Weakest data measurement technique; names variables without order (e.g., eye color).
  • Ordinal: Ranks variables in order without determining the difference between them (e.g., happy, unhappy).
  • Interval: Numerical variables with known, equal intervals (e.g., time).
  • Ratio: Variables with measurable intervals and a true zero point (e.g., weight).

Methods of Collecting Data

  • A population is a collection of all items of interest.
  • A sample is a subset of a population, chosen randomly or methodically.
  • A sample must be representative of the population.
  • The population size is denoted by N and the sample size by n.
  • Census: Measurement of the entire population, used by governments.
  • Using a sample instead of a census is more economical, time-saving, and scientifically sound.
  • Observations: Gathering data by observing people; this requires trained observers.
  • Time and differing perceptions are major considerations when using observation to collect data.

Personal and Telephonic Interviews

  • Interviews are cheaper than observations.
  • Interviews can use closed-ended questions, which are statistically easier to analyze, or open-ended questions, which elicit richer responses.
  • Interviewer phrasing can influence answers, and sensitive topics may not get honest answers.
  • Telephonic interviews rapidly gather data at a lower cost than personal observations.
  • Disadvantages include respondents' reluctance to answer and the ease with which they can end the call.

Questionnaires and Contemporary Methods

  • Questionnaires are the cheapest data collection method; they eliminate interviewer bias, but may have a poor response rate.
  • Contemporary methods for data collection include: links, Mentimeter, webcams, Bluetooth, drones, social media monitoring, online tracking, GPS, wireless/web-based technologies, satellites, spacecraft, bar codes, hand waves, facial/voice recognition technologies, and sensing technologies.
  • It is very important to consider challenges, errors, and costs when choosing collection methods.

Methods of Representing Data

  • Organizing collected data is important in order to give it meaning.
  • For example, item sale prices from a hardware store can be organized in a frequency distribution table.
  • Categorize data into classes to create a frequency distribution table.
  • Determine the number of classes using the formula $2^k \geq n$, where n is the total number of data values; the smallest integer k that satisfies the inequality is the number of classes.
  • Determine the class width: class width = (largest value – smallest value) / number of classes.
  • Develop a frequency distribution by counting observations that belong to each class.
  • Determine the midpoint of each class with midpoint = (lower class limit + upper class limit) / 2.
  • Compute the cumulative frequency by summing the frequency of a class and all preceding classes.
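
The steps above can be sketched in Python. This is only an illustration under stated assumptions: the price list is made up, the class width is rounded up to a whole number, and each class is taken to include its lower limit but not its upper limit (except the last class). None of these conventions come from the course notes.

```python
import math

def frequency_table(values):
    """Build (lower, upper, midpoint, frequency, cumulative) rows."""
    n = len(values)
    # Number of classes: the smallest k with 2**k >= n.
    k = 1
    while 2 ** k < n:
        k += 1
    lo, hi = min(values), max(values)
    # Class width = (largest - smallest) / number of classes,
    # rounded up here so the classes cover the whole range (assumption).
    width = math.ceil((hi - lo) / k)
    rows, cumulative = [], 0
    for i in range(k):
        lower = lo + i * width
        upper = lower + width
        is_last = i == k - 1
        # Assumed convention: lower <= v < upper; the last class also
        # accepts a value equal to its upper limit.
        freq = sum(
            1 for v in values
            if lower <= v < upper or (is_last and v == upper)
        )
        cumulative += freq
        midpoint = (lower + upper) / 2
        rows.append((lower, upper, midpoint, freq, cumulative))
    return rows

# Hypothetical sale prices, not taken from the course material.
prices = [12, 15, 18, 21, 22, 25, 27, 30, 31, 35, 38, 40, 44, 47, 50, 52]
table = frequency_table(prices)
for lower, upper, midpoint, freq, cum in table:
    print(f"{lower:>3} to {upper:<3} midpoint={midpoint:<5} f={freq} cum={cum}")
```

With 16 values, $2^4 \geq 16$ gives four classes, and the last cumulative frequency equals the number of observations.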

Presenting Data

  • Pie charts are effective in representing data, particularly for budgetary allocations, due to their ease of understanding.
  • The total is subdivided, and the pieces are proportional to the amounts they represent.
  • Bar graphs, or rod diagrams, use vertical or horizontal bars to represent data along axes.
  • Each bar represents a single value, making bar charts straightforward and effective.
  • Histograms: Visual representation of a frequency distribution in which the number of observations per class is represented by the height of each bar.
  • Time series plots represent data measured over time and are useful for identifying changes in variables.
  • Scatter plots identify relationships between two variables.

Charts

  • Ensure data isn't misrepresented through scaling differences or starting axes from non-zero values.

Computer Applications

  • Use statistical computer packages such as Microsoft Excel or SPSS, which support techniques such as ANOVA.
  • Microsoft Excel is a spreadsheet program used to organize, manipulate, and analyze all kinds of data.
  • Microsoft Excel is also versatile in presenting data visually and drawing diagrams.
  • It is important to both represent and understand data using computer applications.
  • Media, politicians, and businesses sometimes misrepresent data to persuade the public or to get customers to buy their products.

Descriptive Statistics Introduction

  • Statistics involves methods for collecting, organizing, summarizing, presenting, and analyzing data to make reasonable decisions.
  • Collecting data on the characteristics of every member of a large group may be impossible or impractical.
  • Instead of examining the population, one can examine a sample.
  • If a sample represents a population, sample analysis could infer important conclusions about the larger population.
  • Inductive statistics, or statistical inference, deals with the conditions under which such inference is valid; because the conclusions cannot be absolutely certain, they are stated in terms of probability.
  • Descriptive or deductive statistics seeks only to describe and analyse a given group without drawing any conclusions or inferences about a larger group.

Important Statistical Concepts

  • A variable is a symbol like X that assumes values from its domain; a constant has only one value.
  • A continuous variable can assume any value between two given values, whereas a discrete variable cannot.

Discrete Variables

  • The number of children in a family, which can assume the values 0, 1, 2, ... but not fractional values such as 2.5, is a discrete variable.
  • The age A of an individual, which can be 50 years or 50.8 years depending on measurement accuracy, is a continuous variable.
  • The number of children in a family is an example of discrete data; the heights of students are an example of continuous data.
  • Measurements usually yield continuous data, whereas enumerations (counts) yield discrete data.

Frequency Distributions

  • Raw data: Collected data that has not been organized numerically.
  • Array: An arrangement of raw data in ascending or descending order.
  • Range: The difference between the smallest and largest data values.
  • Frequency distributions: A method of distributing the data into classes or categories to determine each class's frequency.
  • A tabular arrangement of data classes with their class frequencies is a frequency distribution or frequency table.
  • The first category in a frequency table is called the first class.

Ungrouped Data

  • Data is represented in a list.

Measures of Central Tendencies

  • Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} X_i}{n}$
  • Sample mode: The most frequently appearing value in a dataset; a dataset is multimodal if it has more than one mode.
  • Sample median: Arrange the data values in ascending or descending order.
  • Sample median: If the number of values is odd, the median is the middle value; if it is even, the median is the average of the two middle values.
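
As a quick illustration of the three measures, the following sketch uses Python's standard `statistics` module on a made-up sample (the data are hypothetical and not from the course notes).

```python
from statistics import mean, median, multimode

# Hypothetical sample with an even number of values.
data = [4, 7, 7, 9, 10, 12]

sample_mean = mean(data)        # sum of the values divided by n
sample_median = median(data)    # even n: average of the two middle values
sample_modes = multimode(data)  # list of every most-frequent value

print("mean:", sample_mean)
print("median:", sample_median)
print("modes:", sample_modes)
```

`multimode` returns a list so that a multimodal dataset reports all of its modes rather than just one.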

Measures of Variability

  • Range: The range of a data set, R, is the difference between its largest and smallest values.
  • Range formula: $R = X_{\max} - X_{\min}$
  • A smaller range indicates less variability; a larger range indicates more variability.

Variance

  • Variance measures how much a set of data points are dispersed around their mean value.
  • Population variance ($\sigma^2$): the sum of squared differences between the observed values and the population mean, divided by the total number of observations.
  • Squaring guarantees non-negative terms; dispersion is a measure of distance, and distance cannot be negative.
  • Squaring also amplifies the effect of large differences.
  • Sample variance ($s^2$): the sum of squared differences between the observed sample values and the sample mean, divided by the number of sample observations minus one.
  • Sample variance formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}$
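
The difference between the two divisors can be seen in a short sketch. The data are made up, and the helper names `population_variance` and `sample_variance` are introduced here purely for illustration; the standard library's `pvariance` and `variance` serve as a cross-check.

```python
from statistics import pvariance, variance

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

pop_var = population_variance(data)
samp_var = sample_variance(data)

print("population variance:", pop_var, pvariance(data))
print("sample variance:", samp_var, variance(data))
```

For the same data the sample variance is always the larger of the two, because its divisor is smaller.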

Standard Deviation

  • The standard deviation is the square root of the variance.
  • Standard deviation is generally more meaningful than variance because it is expressed in the same units as the data.
  • Sample standard deviation formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}}$

Coefficient of Variation

  • The coefficient of variation is the standard deviation divided by the mean.
  • Another name: relative standard deviation.
  • Either the population or the sample formula is used, depending on which data are available.
  • Comparing the standard deviations of two data sets with different units or scales is not meaningful; comparing their coefficients of variation is.
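
A small sketch shows why the coefficient of variation is the better basis for comparison. The two datasets below are made up, and the helper `coefficient_of_variation` is a hypothetical name; it uses the sample standard deviation from the `statistics` module.

```python
from statistics import mean, stdev

# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)

sd_heights = stdev(heights_cm)
sd_weights = stdev(weights_kg)
cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print("standard deviations:", sd_heights, sd_weights)
print("coefficients of variation:", cv_heights, cv_weights)
```

Both datasets have the same standard deviation, yet the weights vary more relative to their mean, which only the unit-free coefficient of variation reveals.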

Coefficient of Skewness

  • The median divides a data set into two halves, and the distribution can be described as symmetric or asymmetric.
  • In a symmetric distribution, the right and left sides are mirror images; a bell-shaped curve is one example.
  • A data set that is not symmetric is asymmetric and may be skewed.
  • Measuring skewness can be done by evaluating how the measures of central tendencies relate to one another.
  • If the mean, median, and mode match, the distribution is symmetrical. If they do not match, the distribution is skewed either to the left or to the right.
  • If the distribution has a long tail extending to the right, it is skewed to the right (positively skewed).

Types of Data Sets

  • For a positively skewed data set:
    • Mean > Mode (Always)
    • Median > Mode (Always)
    • Mean > Median (Most of the time)
  • For a left-skewed distribution, the mean is less than the median.
  • Pearson's coefficient of skewness measures the strength and direction of skewness using the mean, the mode (or median), and the standard deviation.
  • Three coefficients of skewness are as follows:
  • $\alpha_3 = \frac{n}{(n-1)(n-2)\,s^3} \sum_{i=1}^{n} (X_i - \bar{x})^3$
  • $\alpha_3 = \frac{\bar{x} - \text{mode}}{s}$
  • $\alpha_3 = \frac{3(\bar{x} - \text{median})}{s}$
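
The three coefficients can be computed side by side. The sketch below assumes s is the sample standard deviation and uses a made-up, right-skewed sample with a single mode; the function names are hypothetical.

```python
from statistics import mean, median, mode, stdev

def skew_moment(values):
    """n / ((n-1)(n-2) s^3) times the sum of cubed deviations."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)
    cubed = sum((x - x_bar) ** 3 for x in values)
    return n / ((n - 1) * (n - 2) * s ** 3) * cubed

def skew_mode(values):
    """(mean - mode) / standard deviation."""
    return (mean(values) - mode(values)) / stdev(values)

def skew_median(values):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

# Hypothetical sample with a long tail to the right.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]

coefficients = (skew_moment(data), skew_mode(data), skew_median(data))
print(coefficients)
```

All three values come out positive for this sample, consistent with mean > median > mode in a right-skewed data set. The three formulas generally give different magnitudes, so only their signs should be compared.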

Skewness Implications

  • If mean < median < mode, the distribution is negatively skewed (skewed to the left) and the coefficients are negative.
  • If the mean, median, and mode are equal, the distribution is symmetric (for example, a normal distribution) and the coefficient is 0.
  • With few data points, the median is the preferred measure of central tendency.
  • Exercise 2.2.1 is an example of how to determine the skewness of a data set.

Measures of Position

  • In statistics, percentiles, quartiles, and deciles are used to analyse and give meaning to data.
  • They show where a value lies relative to the rest of the data set.
  • The position of the k-th percentile is $P_k = \frac{k \cdot n}{100}$, where n is the total number of values in the data set, arranged in ascending order. A percentile should not be mistaken for a percentage.
  • Quartiles split the data (in ascending order) into four parts to give different insights.
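
A minimal sketch of the position formula follows. It assumes that a non-integer position is rounded up to the next integer (the rule the quiz gives for deciles) and that an integer position is used as is; textbooks differ on the latter case, so treat this as one possible convention. The data and function names are made up.

```python
def percentile_position(k, n):
    """Position k * n / 100 of the k-th percentile, rounded up."""
    return -(-k * n // 100)  # ceiling division on integers

def percentile(values, k):
    """Value at the k-th percentile position (1-based, ascending order)."""
    ordered = sorted(values)
    position = max(1, percentile_position(k, len(ordered)))
    return ordered[position - 1]

# Hypothetical data set with n = 10.
data = [35, 12, 50, 27, 18, 44, 22, 31, 40, 15]

q1 = percentile(data, 25)       # first quartile: position 2.5 rounds up to 3
median_value = percentile(data, 50)
d9 = percentile(data, 90)       # ninth decile

print(q1, median_value, d9)
```

Quartiles and deciles reuse the same function because $Q_1 = P_{25}$ and the ninth decile is $P_{90}$.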

Permutations and Combinations - Basic Principle of Combinatorial Analysis

  • A combination is a selection of items or events in which order is disregarded.
  • Combinatorial analysis includes events.
  • $_nP_r = n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}$ is the general formula (4.10) for counting the ordered arrangements (permutations) of r elements chosen from n.
  • Note that $_nP_n = n!$.
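
The permutation formula can be checked by brute-force enumeration. The sketch also includes the standard combination count $_nC_r = \frac{n!}{r!(n-r)!}$, which does not appear in the notes above and is added here only to contrast ordered and unordered selections; the values of n and r are arbitrary.

```python
import math
from itertools import combinations, permutations

n, r = 5, 3  # choose 3 items from 5 (arbitrary illustration)

# Ordered arrangements: nPr = n! / (n - r)!
n_p_r = math.factorial(n) // math.factorial(n - r)

# Unordered selections: nCr = n! / (r! * (n - r)!)
n_c_r = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Cross-check both counts by listing the arrangements explicitly.
items = "ABCDE"
ordered = list(permutations(items, r))
unordered = list(combinations(items, r))

print(n_p_r, len(ordered))    # order matters
print(n_c_r, len(unordered))  # order is disregarded
print(math.perm(n, n) == math.factorial(n))  # nPn = n!
```

Each unordered selection of r items corresponds to r! ordered arrangements, which is why the permutation count is r! times the combination count.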

Grouped Data Measures

  • Grouped data is arranged in a frequency table in which the distinct values of x are shown in the first column and the frequency f, the number of times each value of x occurs, is shown alongside (section 2.3).
  • Mean of grouped data: $\bar{x} = \frac{\sum_{i} X_i f_i}{\sum_{i} f_i}$
  • Variance of grouped data (the standard deviation is its square root): $s^2 = \frac{\sum_{i} f_i (X_i - \bar{x})^2}{n - 1}$, where $n = \sum_{i} f_i$
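
The grouped formulas can be checked against the ungrouped ones by expanding the table back into a list. This sketch assumes each squared deviation is weighted by its frequency and that n is the sum of the frequencies; the table values and function names are made up.

```python
from statistics import mean, variance

def grouped_mean(xs, fs):
    """Sum of x * f divided by the sum of f."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

def grouped_variance(xs, fs):
    """Frequency-weighted squared deviations divided by n - 1."""
    n = sum(fs)
    x_bar = grouped_mean(xs, fs)
    weighted = sum(f * (x - x_bar) ** 2 for x, f in zip(xs, fs))
    return weighted / (n - 1)

# Hypothetical frequency table: value x occurs f times.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]

g_mean = grouped_mean(values, freqs)
g_var = grouped_variance(values, freqs)
g_sd = g_var ** 0.5

# Expand the table into raw observations to cross-check.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]

print(g_mean, mean(expanded))
print(g_var, variance(expanded))
print(g_sd)
```

The grouped and ungrouped results agree because the frequency table is just a compact way of writing the same observations.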
