Chapter 1 - Introduction to Statistics
Uploaded by BeneficialDallas1114
Erasmus University Rotterdam
Summary
This document provides an introduction to statistics, outlining descriptive and inferential statistics, the nature of data and variables, and classifications of measurement scales. It touches on the importance of statistics in research.
In blue or highlighted: what was said or emphasized by the lecturers

Chapter 1 – Introduction to Statistics

Statistics = “the study of how we describe and make inferences from data” (Sirkin); it is a branch of mathematics used to summarize, analyze and interpret a group of numbers or observations
Two ways of evaluating information:
o Descriptive statistics = applying statistics to organize, summarize and make sense of information. They are typically presented graphically, in tabular form or as summary statistics (single values)
o Inferential statistics = applying statistics to interpret the meaning of information → to answer a question or make an actionable decision
Mark Twain: “There are lies, damned lies and statistics”
Data = measurements or observations that are typically numeric. A datum is a single measurement or observation, usually referred to as a score or raw score
Remembering ISSR: a variable is a measured property of each of the units of analysis (e.g.: age, GDP, household income, annual revenue)

Descriptive statistics
Typically used to quantify the behavior researchers measure
Instead of listing each individual score on an exam, we could summarize all scores by stating the average (mean), middle (median) or most common (mode) score among all individuals, which can be more meaningful

Inferential statistics
An inference is a conclusion reached on the basis of evidence and reasoning
Inferential statistics allow researchers to infer or generalize observations made with samples to the larger population from which they were selected
They help the researcher infer how well statistics in a sample reflect parameters in a population
o Population parameter = a characteristic (usually numeric) that describes a population
o Sample statistic = a characteristic (usually numeric) that describes a sample
The characteristics of interest are typically descriptive statistics (e.g.: the mean)

Research methods and statistics
Science = the study of phenomena, such as behavior, through strict
observation, evaluation, interpretation, and theoretical explanation
Research method (or scientific method) = a set of systematic techniques used to acquire, modify, and integrate knowledge concerning observable and measurable phenomena

Scales of measurement
Scales of measurement = rules for how the properties of numbers can change with different uses; they imply that the extent to which a number is informative depends on how it was used or measured
Scales of measurement are characterized by three properties:
1. Order: does a larger number indicate a greater value than a smaller number?
2. Difference: does subtracting two numbers represent some meaningful value?
3. Ratio: does dividing (or taking the ratio of) two numbers represent some meaningful value?

              Nominal   Ordinal   Interval   Ratio
Order         No        Yes       Yes        Yes
Difference    No        No        Yes        Yes
Ratio         No        No        No         Yes

S. S. Stevens = Harvard psychologist who coined the terms nominal, ordinal, interval and ratio
Nominal scales = measurements in which a number is assigned to represent something or someone; they are often data that have been coded
o Group classifications, no meaningful ranking possible, numerical coding arbitrary
o E.g.: a person’s sex, race, nationality, sexual orientation, hair and eye color, season of birth, marital status, or other demographic or personal information
o Coding (converting a nominal or categorical variable to a numeric value) is useful when entering names of groups for a research study into statistical programs such as SPSS, because it can be easier to enter and analyze data when group names are entered as numbers, not words
Ordinal scales = measurements that convey order or rank alone
o E.g.: finishing order in a competition, education level and rankings
o Indicate that one value is greater than or less than another
o Differences between ranks are unknown/not equal and have no meaning
Interval scales =
measurements that have no true zero and are distributed in equal units (equidistant)
o True zero = when the value of 0 truly indicates nothing on a scale of measurement
o E.g.: rating scales, temperature in degrees Celsius (a temperature of zero does not mean that there is no temperature; it is just an arbitrary zero point)
o Implication of not having a true zero = there is no value that indicates the absence of the phenomenon you are observing, so ratios (proportions) between values are not meaningful (20 °C is not “twice as warm” as 10 °C)
Ratio scales = measurements that have a true zero and are distributed in equal units; they are the most informative scales of measurement
o Counts and measures of length, height, weight and time
o Order, differences and ratios are all informative
o It is meaningful to state that 60 pounds is twice as heavy as 30 pounds

We always first need to know the level of measurement in order to know which statistical techniques we may use for the given variable
Nominal → ordinal → interval → ratio (from least to most informative)
Qualitative variables → quantitative variables

Types of variables for which data are measured
Continuous variable = measured along a continuum at any place beyond the decimal point.
Can be measured in fractional units
Discrete variable = measured in whole units or categories that are not distributed along a continuum
Quantitative variable = varies by amount; is measured numerically and is often collected by measuring or counting
Qualitative variable = varies by class; is often represented as a label and describes nonnumeric aspects of phenomena → always discrete

Chapter 3 – Summarizing Data: Central Tendency
Measures of central tendency = statistical measures for locating a single score that is most representative or descriptive of all scores in a distribution; they are values at or near the center of a distribution
o Although we lose some meaning anytime we reduce a set of data to a single score, statistical measures of central tendency ensure that the single score meaningfully represents a set of data
o E.g.: mean, median, mode
Population size = N
Sample size = n
Mean (arithmetic mean or average) = the balance point in a distribution; works for interval and ratio levels of measurement; its value shifts in a direction that balances a set of scores; it is the sum (Σ) of a set of scores (x) divided by the total number of scores summed, in either a sample (n) (sample mean) or a population (N) (population mean)
o The mean can be misleading when a data set has an outlier, because the mean will shift toward the value of that outlier.
Outliers in a data set influence the value of the mean but have little to no effect on the median
o Population mean: $\mu = \frac{\sum x}{N}$
o Sample mean: $M = \frac{\sum x}{n}$
o Changing an existing score will change the mean → if you increase the value of an existing score, the mean will increase; if you decrease the value of an existing score, the mean will decrease
o Adding a new score or removing an existing score will change the mean, unless that value equals the mean
§ If the new score added is less than the previous mean, the mean will decrease; if the new score added is greater than the previous mean, the mean will increase
§ Deleting a score below the mean will increase the value of the mean; deleting a score above the mean will decrease the value of the mean
§ The only time that a change in a distribution of scores does not change the value of the mean is when the value that is added or removed is exactly equal to the mean
o Adding, subtracting, multiplying or dividing each score in a distribution by a constant will cause the mean to change by that constant (e.g.: adding 5 to every score adds 5 to the mean; multiplying every score by 2 doubles the mean)
o Summing the differences of scores from their mean equals 0
§ When the mean is subtracted from each score (x), then summed, the solution is always zero
§ $\sum (x - M) = 0$
o The sum of the squared differences of scores from their mean (SS) is minimal → it is smaller than the sum of squared differences from any other value (it equals 0 only when all scores are identical)
§ $\sum (x - M)^2 = \text{minimal}$
Weighted mean ($M_w$) = the combined mean of two or more groups of scores in which the number of scores in each group is disproportionate or unequal; can be used to compute the mean for multiple groups of scores when the size of each group is unequal (when some samples have more scores than others)
o $M_w = \frac{\sum (M \times n)}{\sum n}$
o The weighted mean is larger than the unweighted average of the group means when the larger group has the higher mean
Median = the middle value in a distribution of data listed in numeric order; works for ordinal, interval and ratio levels of measurement
o Median position $= \frac{n + 1}{2}$
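To make the central tendency rules above concrete, here is a minimal Python sketch using only the standard library. The function names (arithmetic_mean, median_position, sum_of_squares, weighted_mean) and all scores and group sizes are hypothetical, chosen for illustration; only statistics.median comes from Python itself. It checks that deviations from the mean sum to zero, that SS is smallest around the mean, how a constant and an outlier shift the mean, and how the weighted mean combines unequal groups.

```python
from statistics import median


def arithmetic_mean(scores):
    """M = Σx / n: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def median_position(n):
    """1-based position of the median in n ordered scores: (n + 1) / 2.

    A result ending in .5 means the median is the average of the two middle scores.
    """
    return (n + 1) / 2


def sum_of_squares(scores, center):
    """SS: sum of the squared differences between each score and `center`."""
    return sum((x - center) ** 2 for x in scores)


def weighted_mean(group_means, group_sizes):
    """Mw = Σ(M × n) / Σn, combining groups of unequal size."""
    total = sum(m * n for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)


# Hypothetical exam scores, for illustration only
scores = [2, 4, 6, 8, 10]
m = arithmetic_mean(scores)
print(m, median(scores), median_position(len(scores)))  # 6.0 6 3.0

# Deviations from the mean always sum to zero: Σ(x − M) = 0
print(sum(x - m for x in scores))  # 0.0

# SS around the mean (40.0) is smaller than SS around another value, e.g. 5 (45)
print(sum_of_squares(scores, m), sum_of_squares(scores, 5))

# Adding a constant to every score adds that constant to the mean
print(arithmetic_mean([x + 5 for x in scores]))  # 11.0

# Adding an outlier pulls the mean from 6.0 to about 21.67; the median only moves from 6 to 7.0
with_outlier = scores + [100]
print(arithmetic_mean(with_outlier), median(with_outlier))

# Large group (n = 30, M = 80) and small group (n = 10, M = 60):
# Mw = 75.0, above the unweighted average of the group means (70.0)
print(weighted_mean([80, 60], [30, 10]))
```

Why the deviations always cancel: $\sum (x - M) = \sum x - nM = \sum x - n \cdot \frac{\sum x}{n} = 0$.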