Intro to Stats & Probability: Lectures 1 & 2

Questions and Answers

In statistical terms, what distinguishes a 'stochastic' variable from a 'non-stochastic' variable?

  • Stochastic variables are always integers, while non-stochastic variables can be decimals.
  • Stochastic variables have an associated probability structure, while non-stochastic variables are deterministic. (correct)
  • Stochastic variables are deterministic, while non-stochastic variables are probabilistic.
  • Stochastic variables are used in statistics, while non-stochastic variables are used in probability.

How does the application of statistics aid decision-making processes?

  • Statistics ensures decisions are made quickly, regardless of available data.
  • Statistics eliminates the need for human judgment in decision-making.
  • Statistics guarantees decisions will always be correct.
  • Statistics provides methods for gathering and analyzing information to inform decision-makers. (correct)

What is the primary focus of theoretical or mathematical statistics?

  • Developing and proving statistical theorems, formulas, rules, and laws. (correct)
  • Collecting and summarizing data for presentation.
  • Applying statistical formulas to solve everyday business problems.
  • Communicating statistical information to a general audience.

Which scenario best illustrates the use of statistics in government?

  • A district council uses statistics to determine recreational needs for area development. (correct)

How do descriptive and inferential statistics differ in their application?

  • Descriptive statistics focus on organizing and summarizing data, while inferential statistics use sample results to make predictions about a population. (correct)

What is the key difference between a 'parameter' and a 'statistic' in statistical analysis?

  • A parameter describes a population, while a statistic describes a sample. (correct)

When is a 'systematic sample' most likely to provide a representative sample of the population?

  • When the population order is completely random. (correct)

Why do researchers use stratified sampling?

  • To ensure specific characteristics are proportionally represented in the sample. (correct)

How does cluster sampling differ from stratified sampling?

  • Cluster sampling is used when the population is spread out geographically: the population is divided into clusters that each reflect the population, and whole clusters are selected. Under stratified sampling, the population is divided into strata and units are sampled from every stratum. (correct)

In quota sampling, what distinguishes proportional quota sampling from non-proportional quota sampling?

  • Proportional quota sampling matches the sample proportions to the population, while non-proportional quota sampling does not require matching. (correct)

What is a key consideration when dividing a population into strata for quota sampling?

  • Strata must be mutually exclusive, meaning units can only qualify for one subgroup. (correct)

How does convenience sampling differ from quota sampling?

  • Convenience sampling is primarily guided by proximity or ease of access to the researcher, without knowing the characteristics of the units in advance. (correct)

What differentiates primary data from secondary data?

  • Primary data is collected under the direct supervision of the person doing the study, while secondary data is originally collected by another source. (correct)

Which of the following variables would be considered quantitative?

  • Temperature (correct)

How does the level of measurement impact statistical analysis?

  • It determines the type of statistical analysis that can be applied. (correct)

Which of the following best describes 'nominal' level data?

  • Data that are labeled with no specific order. (correct)

Which of the following is the best example for data on an ordinal scale?

  • Listing shirt sizes as small, medium and large. (correct)

What process is referred to as 'Data Reduction'?

  • The process of putting data in such a way that meaning can be made. (correct)

What is a key characteristic of classes in a frequency distribution?

  • They should be mutually exclusive (i.e., non-overlapping). (correct)

What is the purpose of Sturges' rule in constructing frequency distributions?

  • To determine the number of class intervals. (correct)

Flashcards

Stochastic variables

Variables that have an associated probability structure.

Non-stochastic variables

Variables that are deterministic without a probability attachment.

Statistics (as numbers)

Numerical facts and figures that describe something.

Statistics (as a field)

The science of collecting, organizing, and analyzing data to draw inferences.

Theoretical statistics

Theories and formulas that form the foundation of statistics.

Applied statistics

Using statistical rules to solve real-world problems.

Descriptive statistics

Methods for organizing, presenting, and summarizing data.

Inferential statistics

Methods using samples to make predictions about a population.

Variable (in statistics)

A characteristic or condition that can change or take on different values.

Population

The entire collection of persons, places, or things of interest.

Sample

A set of individuals, places, or things selected from a population.

Parameter

A descriptive value for a population.

Statistic

A descriptive value for a sample.

Sampling error

The discrepancy between a sample statistic and population parameter.

Census

Survey that includes all population members.

Sample survey

Survey collecting information from a portion of the population.

Simple Random sample

Each sample of the same size has an equal chance of being selected.

Systematic sampling (probability sampling)

Method to choose sample based on a regular interval.

Stratified sampling

Population divided into subgroups before sampling.

Cluster sampling

Population divided into clusters; a random selection of whole clusters forms the sample.

Study Notes

  • Introduction to Statistics and Probability I is taught by Prof. Ezekiel N. N. Nortey, Department of Statistics and Actuarial Science, University of Ghana, January 2025
  • Lecture 1 covers the introduction to statistics and data collection methods
  • Lecture 2 covers descriptive statistics techniques

Introduction

  • Presents the concepts of statistics and probability.
  • There are two types of variables: non-stochastic (deterministic) and stochastic (random).
  • Stochastic variables are random variables possessing an associated probability structure.
  • For example, when tossing a die, the result is unknown in advance.
  • Non-stochastic variables are deterministic and have no probability attachment.
  • Examples include interest and annuity calculations based on fixed time periods, or the number of females in STAT 111 class.
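
The contrast above can be sketched in a few lines of Python. This is only an illustration: the function names, the seed, and the numbers (a principal of 1,000 at 10% for 2 periods) are assumptions chosen for the example, not values from the lecture.

```python
import random


def roll_die(rng):
    """Stochastic: the face is unknown in advance; each face has probability 1/6."""
    return rng.randint(1, 6)


def compound_value(principal, rate, periods):
    """Non-stochastic: the same inputs always produce the same output."""
    return principal * (1 + rate) ** periods


rng = random.Random(2025)  # arbitrary seed, used only to make the run repeatable
rolls = [roll_die(rng) for _ in range(6000)]

# The share of sixes varies from run to run but settles near 1/6.
share_of_sixes = rolls.count(6) / len(rolls)

# The interest calculation has no probability attached to it.
value = compound_value(1000, 0.10, 2)

print(f"share of sixes: {share_of_sixes:.3f}")
print(f"value after 2 periods: {value:.2f}")
```

Rerunning with a different seed changes the share of sixes slightly but never changes the compound value.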

Purpose and Outcome of Learning

  • Equips students with basic statistical ideas, uses, and applications.
  • By the end of the lesson, students should be able to:
    • Define and explain what statistics is and its usefulness
    • Identify data types and their uses
    • Identify three key areas where statistics can be applied
    • Explain levels of measurement in statistics and their usefulness
    • Distinguish between samples and populations, census and sample surveys
    • Explain the difference between discrete and continuous variables
  • Evaluating stochastic variables requires the use of basic probability and statistical tools.
  • Focus is first given to statistical tools, and then to probability concepts.

Definition of Statistics

  • Decision-making is improved through the effective and meaningful use of available information.
  • Statistics' primary role is to provide decision-makers with methods for obtaining and analyzing information to help in making decisions.
  • Statistics helps answer long-range planning questions.
  • The word "statistics" has two meanings. In common usage, it refers to numerical facts and figures.
  • Numerical facts and figures include diastolic blood pressure, a student's heart rate, or a university graduate's starting salary.
  • Second, it refers to a field of study: statistics is a science concerned with:
    • The collection, organization, and summarization of data.
    • Drawing inferences about a body of data (numerical facts) when only part is observed.
  • Statistics comprises a set of methods and rules for organizing and interpreting observations.
  • Statistical procedures help ensure observations (data) are presented and interpreted accurately and informatively.

Theoretical and Applied Statistics

  • Statistics has two aspects: theoretical and applied.
  • Theoretical/mathematical statistics is concerned with the development, derivation, and proof of statistical theorems, formulas, rules, and laws.
  • Applied statistics involves applying theorems, formulas, rules, and laws to solve real-world problems.
  • Statistics helps with collecting, organizing, summarizing, analyzing, and communicating quantitative information.

Uses of Statistics

  • Statistical techniques are used extensively by managers in marketing, accounting, and quality control, as well as by consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc.
  • Statistics as a subject is used for description, comparison, projection, prediction, and decision-making.
  • Statistics and Government: Governments need to correctly collect, process, and analyze statistical data on national output, earnings, expenditure, imports and exports, employment, population growth and decline, and health to make decisions.
  • Statistics and the District Council: District councils need statistics on education, welfare, transport, infrastructure, and recreational needs to plan the area's development.
  • Statistics and Business: Firms need statistics on production, sales, payrolls, capital expenditure, and depreciation to effectively make decisions and projections.
  • Statistics and Other Professions: Statistical analysis is used in practically every profession, for example:
  • Testing the efficiency of a production technique in industries
  • Testing the effectiveness of a new drug in medicine
  • Testing the effectiveness of fertilizer on a crop in agriculture
  • Analyzing the results of a drug rehabilitation program in sociology
  • Testing production design or package design that maximizes sales in business
  • Forecasting voting patterns in politics

Definition of Terminologies

  • Applied statistics is broadly divided into descriptive and inferential statistics.
  • Descriptive statistics comprises methods for organizing, presenting, and summarizing data using tables, graphs and summary measures.
  • Descriptive statistics presents data in an informative way.
  • Inferential statistics comprises methods that use sample results to help make decisions or predictions about a population.
  • Variables: A variable is a characteristic or condition that can change or take on different values for different elements. Examples are diastolic blood pressure, heart rate, or STAT 111 test score.
  • A population is the collection of all persons, places, or things of interest in a particular study.
  • Individual items of the population (e.g. people, places, or things) are elements of the population; when the elements are human beings, an individual member is referred to as a subject.
  • A sample is a set of individuals, places, or things selected from a population to represent it in a study.
  • A descriptive value for a population is known as a parameter, whereas a descriptive value for a sample is called a statistic.
  • Sampling Error is the discrepancy between a sample statistic and its population parameter.
  • A census is a survey that includes every member of the population.
  • A sample survey is a survey that collects information from only a portion of the population.
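
To make the terms above concrete, here is a minimal sketch that computes a parameter, a statistic, and the resulting sampling error. The population of 500 test scores is randomly generated purely for illustration; it is not real class data, and the variable names are assumptions.

```python
import random
from statistics import mean

rng = random.Random(111)  # arbitrary seed for a repeatable illustration

# Hypothetical population: test scores for 500 students (made-up values).
population = [rng.randint(40, 100) for _ in range(500)]

# Parameter: a descriptive value for the whole population (what a census would give).
parameter = mean(population)

# Sample survey: observe only 50 of the 500 elements.
sample = rng.sample(population, 50)

# Statistic: the same descriptive value computed from the sample.
statistic = mean(sample)

# Sampling error: discrepancy between the statistic and the parameter.
sampling_error = statistic - parameter

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
print(f"sampling error:              {sampling_error:+.2f}")
```

Drawing a different sample changes the statistic and the sampling error, while the parameter stays fixed.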

Methods of Collecting Samples from the Population

  • Simple random sample: Each sample of the same size has an equal chance of being selected
  • Systematic sample: Randomly select a starting point and take every nth piece of data from a population list
  • Stratified sample: Divide the population into groups called strata, then take a sample from each stratum.
  • Cluster sample: Divide the population into groups called clusters, then randomly select some of the clusters; all members of the selected clusters comprise the cluster sample.
  • Simple random sampling is used to make statistical inferences about a population; it helps ensure internal validity because randomization is the best method to reduce the impact of potential confounding variables.
  • A simple random sample with a large sample size tends to have high external validity because it represents the characteristics of the larger population.
  • Implementing simple random sampling can be challenging because it has several prerequisites:
  • A complete list of every population member (a complete frame).
  • An ability to contact or access each population member if they are to be selected.
  • Enough time and resources to collect the necessary sample size.
  • Simple random sampling is best used if there is a lot of time and resources, or if it is possible to easily sample a population.
  • Systematic sampling selects a sample at regular intervals rather than by fully random selection; it can be used without a complete population list.
  • Stratified sampling is appropriate when specific characteristics need proportional representation within the sample. For example, you split your population into strata (such as gender or race) and then randomly select from each of these subgroups.
  • Cluster sampling is used when sampling the entire population is not feasible; the population is divided into clusters that each approximately represent the whole population, and a random selection of these clusters forms the sample.
  • Quota sampling is a non-probability sampling method that relies on non-random selections with a predetermined number or proportion of units.
  • When conducting Quota Sampling:
  • Population needs to be divided into mutually exclusive subgroups (called strata)
  • Sample units are recruited until a specific quota is reached.
  • Selected units share specific characteristics that are determined before the strata are formed.
  • The aim of quota sampling is to control the sample composition, where the design may:
  • Replicate the true makeup of the population of interest
  • Consist of equal numbers of different types of respondents
  • Over-sample a particular type of respondent, even if population proportions differ.
  • Types of quota sampling:
  • Proportional
  • Non-proportional.
  • In proportional quota sampling, the major characteristics of the population are represented in the sample in proportion to their share of the population.
  • Proportional quota samples are often used in surveys and opinion polls, where the number of people to be surveyed is typically decided in advance.
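
The four probability sampling methods described above (simple random, systematic, stratified, and cluster) can be sketched as follows. This is a rough illustration under stated assumptions: the function names, the numbered frame of 100 units, the two strata, and the ten "hall" clusters are all hypothetical.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every k-th unit from the list."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Take the same fraction from every stratum (proportional allocation)."""
    return {
        name: rng.sample(units, round(len(units) * fraction))
        for name, units in strata.items()
    }


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(7)            # arbitrary seed
frame = list(range(1, 101))       # hypothetical list of 100 numbered units
strata = {"A": frame[:60], "B": frame[60:]}
clusters = {f"hall_{i}": frame[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(frame, 10, rng)
sys_s = systematic_sample(frame, 10, rng)
strat = stratified_sample(strata, 0.10, rng)
clus = cluster_sample(clusters, 2, rng)
```

Note the structural difference: the stratified sample draws some units from every stratum, while the cluster sample keeps all units from only a few clusters.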

Example

  • Summer travel intentions among residents in your city are being researched, and a sample of 1,000 people will be drawn.
  • To make the sample demographically representative, divide your sample into distinct subgroups (strata):
  • Gender Identity
  • Age
  • Working Status
  • Residential Location
  • Housing Situation
  • Strata are combined in a hierarchical structure.
  • The sample is first stratified by gender identity, then by age within gender identity, and then by employment status within age groups
  • When a quota is defined by more than one variable, it is called interlocking.
  • Information from the last census is used to determine the quota for each subgroup, so that the sample is proportional to the census population.
  • Sampling is terminated upon reaching a number of respondents across combined strata in the same proportions as the population of the city.
  • Non-proportional quota sampling is less restrictive.
  • A minimum number of sampled units is specified for each subgroup, and the numbers need not match the proportions in the population.
  • An example of when to use non-proportional quota sampling is a brand that wants to serve its customers with inclusive sizes.
  • An online focus group aims for equal representation of customers who choose sizes S through L and sizes XL through 3X.
  • The responses can help determine how to create products with the same ease of access for all customers.
  • Quota sampling does not require researchers to follow strict rules or random selection processes, but there are general guidelines to keep in mind; a quota sample can be drawn in three steps:
  • Divide the population into strata
  • Determine a quota for each stratum
  • Continue recruitment until the quota for each stratum is met
  • Step 1: Divide the population into strata
    • This requires identifying relevant strata or subgroups, where units can only qualify for one subgroup.
  • Step 2: Determine a quota for each stratum
  • The proportion of each stratum in the population is estimated.
  • This estimation uses existing records such as administrative data, or previous studies; otherwise, you're free to use your judgment concerning the units to choose from each subgroup.
  • Example: A study examines how career goals differ between economics and education students.
  • The number of students to include from each major is chosen in proportion to the number of students in each program.
  • For example, if there are 2,000 students, made up of 800 (40%) education and 1,200 (60%) economics students, the sample should comprise 40% education and 60% economics students.
  • If the sample size is 100 students, the sample should include a quota of 40 education and 60 economics students.
  • Keep in mind that you can divide your quota into further subcategories (for example, the 40 education students can include a proportional number of undergraduate and graduate students). If the proportion is 50/50, you would choose 20 undergraduate and 20 graduate students.
  • Step 3: Continue recruiting until the quota for each stratum is met.
  • Continue recruiting units to take part in your research until you have reached your specific goals determined above.
  • Convenience sampling and quota sampling can be difficult to tell apart since both are non-probability sampling methods; the key differences between the two are:
  • Convenience sampling is primarily guided by proximity or ease of access; the characteristics of the units are not known beforehand, so a representative sample cannot be ensured.
  • In quota sampling, the characteristics of each unit are known in advance, so units can be divided among the subgroups or strata and the number of participants needed from each stratum can be set.
  • Quota sampling provides an assurance that diverse segments are represented in the sample.
  • Non-proportional quota sampling is similar to convenience sampling because both methods rely on judgement selections by the researcher.
  • Data: Values (observations or measurements) that can be assumed by a variable, for example scores on a statistics test.
  • Statistical Data Sources: Broken down into two types, primary and secondary.
  • Primary data: Data collected under the direct supervision of the person or organization doing the study.
  • Secondary data: Data originally collected by another source, not under the supervision of the person or organization doing the study, such as census data or data from print journals or the internet.
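
The three quota-sampling steps and the education/economics example above can be worked through in code. This is a sketch only: the function names and the alternating stream of arriving students are assumptions made for illustration, while the stratum sizes (800 and 1,200) and the sample size (100) come from the example in the notes.

```python
def proportional_quotas(stratum_sizes, sample_size):
    """Step 2: set each quota in proportion to the stratum's share of the population.

    Rounding can make quotas sum to slightly more or less than sample_size
    when the shares do not divide evenly.
    """
    total = sum(stratum_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in stratum_sizes.items()
    }


def recruit_until_quotas_met(arrivals, quotas):
    """Step 3: accept units (non-randomly, in arrival order) until every quota is full."""
    counts = {name: 0 for name in quotas}
    recruited = []
    for unit_id, stratum in arrivals:
        if stratum in counts and counts[stratum] < quotas[stratum]:
            counts[stratum] += 1
            recruited.append(unit_id)
        if counts == quotas:
            break
    return recruited, counts


# Step 1: mutually exclusive strata, with sizes taken from the example above.
quotas = proportional_quotas({"education": 800, "economics": 1200}, 100)

# Hypothetical stream of volunteers, alternating between the two majors.
arrivals = [(i, "education" if i % 2 == 0 else "economics") for i in range(200)]
recruited, counts = recruit_until_quotas_met(arrivals, quotas)

print(quotas)          # {'education': 40, 'economics': 60}
print(len(recruited))  # 100
```

Once the education quota of 40 is full, further education students are turned away while recruitment of economics students continues, which is what distinguishes quota sampling from simply taking the first 100 arrivals.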

Qualitative vs Quantitative Variables

  • A variable whose value comes about as a result of chance is a random variable.
  • Qualitative variables: placed into groups or categories according to a characteristic or attribute.
  • If subjects are classified according to gender (male or female), then the variable gender is qualitative.
  • Quantitative variables: numerical in nature, and can be ordered and ranked; for example, the variable "height" is numerical, and people can be ranked by their heights.
  • Quantitative values can be age, weight, income, and body temperature.
  • Random quantitative variables can be either discrete or continuous.
  • Discrete variables can assume a countable number of values.
  • Continuous variables can assume values corresponding to any points contained in one or more intervals.
  • In other words, a continuous random variable has infinitely many values, and its values can be associated with measurements on a continuous scale without gaps or interruptions.
  • Example 1: Identify each of the following as an attribute (qualitative) or numerical (quantitative) variable:
  • The residence hall for each student in a statistics class.
  • The amount of gasoline pumped by the next 10 customers at the local Unimart.
  • The amount of radon in the basement of each of 25 homes in a new development.
  • The color of the baseball cap worn by each of 20 students.
  • The length of time to complete a mathematics homework assignment.
  • The state in which each truck is registered when stopped and inspected at a weigh station.
  • Example 2: Identify each of the following as examples of qualitative or numerical variables:
  • The temperature in Barrow, Alaska at 12:00 pm on any given day.
  • The make of automobile driven by each faculty member.
  • Whether or not a 6-volt lantern battery is defective.
  • The weight of a lead pencil.
  • The length of time billed for a long distance telephone call.
  • The brand of cereal children eat for breakfast.
  • The type of book taken out of the library by an adult.

Levels of Measurements

  • The level of measurement describes how the attributes of a variable are quantified.
  • Understanding scales of measurement allows you to see the different types of data that you can gather.
  • It also determines the type of statistical analysis that can be applied.
  • There are four levels, called scales of measurement: nominal, ordinal, interval, and ratio.
  • Nominal level: data are simply labelled, with no specific order, e.g. religious affiliation, gender, hall of residence.
  • Ordinal level: categorical data with a set order or scale. For example, a student ranks her satisfaction with life on campus from 1 to 5; people's opinions on a subject are usually ranked on a 5-, 7-, or 10-point Likert scale: 1. Very satisfied, 2. Satisfied, 3. Indifferent, 4. Dissatisfied, 5. Very dissatisfied.
  • Another example: Child (0-12 yrs), Teenager (13-19 yrs), Youth (20-35 yrs), Middle age (36-58 yrs), Old (59+ yrs).
  • Interval level: classifies and orders measurements, and specifies that the distance between adjacent points on the scale is the same from the low end to the high end, e.g. the difference between 90 and 100 is equivalent to the difference between 110 and 120.
    • Has no true zero.
    • Values can be added and subtracted, but ratios between them are not meaningful.
    • Can represent values below zero, e.g. temperatures below zero degrees Celsius such as -10 degrees.
  • Ratio level: has a true zero, e.g. height, weight, age, money.
    • Values can be added, subtracted, and multiplied, and ratios between them can be found.
  • Example 3: Identify each of the following as a nominal, ordinal, discrete, or continuous variable:
  • The length of time until a pain reliever begins to work.
  • The number of chocolate chips in a cookie.
  • The number of colors used in a statistics textbook.
  • The brand of refrigerator in a home.
  • The overall satisfaction rating of a new car.
  • The number of files on a computer's hard disk.
  • The pH level of the water in a swimming pool.
  • The number of staples in a stapler.
  • Microsoft Excel Terms
  • When you use Microsoft Excel, you place the data you have collected in worksheets.
  • The intersections of the columns and rows of worksheets form boxes called cells.
  • If you want to refer to a group of cells that forms a contiguous rectangular area, you can use a cell range.
  • Worksheets exist inside a workbook, a collection of worksheets and other types of sheets, including chart sheets that help visualize data.
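
Returning to the interval and ratio levels of measurement described earlier in this section, a short sketch can show why differences are meaningful on an interval scale but ratios are not. The Kelvin conversion is brought in here only as an illustration of a temperature scale with a true zero; it is not part of the lecture notes, and the two temperatures are arbitrary.

```python
def celsius_to_kelvin(celsius):
    """Convert from Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


a_c, b_c = 10.0, 20.0  # two hypothetical temperatures in degrees Celsius

# Differences agree on both scales, so subtraction is meaningful for interval data.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# The Celsius ratio suggests "twice as hot", but it depends on an arbitrary zero.
ratio_c = b_c / a_c

# On a scale with a true zero the ratio is meaningful, and it is far from 2.
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```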

Guidelines for designing an effective worksheet

  • Associate column cell ranges with variables.
  • Avoid skipping rows as you enter data; column cell ranges should never contain empty cells
  • Place variables on a worksheet separate from the worksheet containing statistical results
  • Allow the user to follow the chain of calculations from the starting data.
  • Create two copies of the worksheets: one optimized for the screen, and the other for the printer.

Lecture 1 - Summary

  • Introduces Key Definitions for:
  • Population and Sample & their relation.
  • Primary and Secondary data types & their relation.
  • Categorical data & their relation to Numerical data.
  • Contrasts Descriptive statistics with Inferential statistics.
  • Reviews Data types and Measurement levels
  • Discusses Microsoft Excel terms and tips
  • Every statistical effort begins with collecting necessary data
  • Data management is a challenge to organizations
  • Raw data in its natural form carries little meaning.
  • Data Reduction: The process of putting data in such a way that meaning can be made
  • Data must be organized to determine significance of data

Data Reduction Techniques

  • Used to organize data
  • Frequency Tables
  • Cross tabulation for bivariate data (Contingency tables)
  • Stem and leaf display
  • Bar charts
  • Pie Charts
  • Scatter plots
  • Histograms
  • Box-and-whisker plots
  • Line graphs
  • Ogives and Frequency polygons
  • Organizing raw data into separate classes, with a count of the elements in each class, produces a frequency distribution.
  • Classes should be mutually exclusive (i.e., non-overlapping).
  • Frequency tables include the ordinary frequency distribution, the relative frequency distribution, and the cumulative frequency distribution.
  • Frequency tables can be ungrouped or grouped:
  • Ungrouped: if each class in the table consists of a single value, the distribution is called an ungrouped frequency distribution.
  • Grouped: if each class consists of a range of values, the table is called a grouped frequency table.
  • Relative frequency: the ratio of each class frequency to the total frequency, $RF_i = f_i / \sum_j f_j$, where $f_i$ is the frequency of class $i$. Multiplying the relative frequency by 100 gives the percentage frequency.
  • Cumulative frequency distribution: a table obtained by adding the frequencies of all classes up to and including a specific class.
  • Number of classes: for grouped data, the number of class intervals $k$ can be determined by Sturges' rule, $k = 1 + 3.33 \log_{10} n$, where $n$ is the number of observations (more precisely, $k = 1 + \log_2 n \approx 1 + 3.322 \log_{10} n$).
  • Class width: the range of the data divided by the number of classes, i.e. Class width = Range / number of classes.
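
The pieces above (Sturges' rule, class width, frequency, relative frequency, and cumulative frequency) can be combined into one small routine. This is a sketch under assumptions: the function names and the 20 test scores are made up, the constant 3.322 is used for Sturges' rule, and both the number of classes and the class width are rounded up, which is one common convention rather than the only one.

```python
import math


def sturges_classes(n):
    """Number of classes from Sturges' rule, rounded up to a whole number."""
    return math.ceil(1 + 3.322 * math.log10(n))


def grouped_frequency_table(data, k=None):
    """Build a grouped table with frequency, relative and cumulative frequency.

    Classes are half-open [lower, upper), so they never overlap; a value that
    lands exactly on the final upper limit is counted in the last class.
    """
    n = len(data)
    if k is None:
        k = sturges_classes(n)
    low = min(data)
    width = max(1, math.ceil((max(data) - low) / k))  # class width = range / k

    counts = [0] * k
    for x in data:
        index = min(int((x - low) // width), k - 1)
        counts[index] += 1

    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        cumulative += f
        rows.append({
            "lower": low + i * width,
            "upper": low + (i + 1) * width,
            "frequency": f,
            "relative": f / n,
            "cumulative": cumulative,
        })
    return rows


# Hypothetical test scores for 20 students.
scores = [52, 55, 58, 61, 63, 64, 67, 68, 70, 71,
          72, 74, 75, 77, 79, 81, 84, 86, 90, 95]
table = grouped_frequency_table(scores)

for row in table:
    print(f"{row['lower']:>3}-{row['upper']:<3} f={row['frequency']:<2} "
          f"rf={row['relative']:.2f} cf={row['cumulative']}")
```

For these 20 scores the rule gives 6 classes of width 8, and the final cumulative frequency equals the number of observations.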

Contingency Tables

  • A contingency table is a tabular mechanism with at least two rows and two columns, used in statistics to present categorical data in terms of frequency counts; it is sometimes called a two-way frequency table.
  • More precisely, an r x c contingency table shows the observed frequencies arranged into r rows and c columns.
  • The intersection of a row and a column is a cell. Example: a table of gender against favorite food.
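
A two-way table like the gender/favorite food example can be built by counting pairs. The sketch below is illustrative only: the function name and the eight observations are invented, with gender as the row variable and food as the column variable.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into nested dicts."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that were never observed.
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


# Hypothetical observations: (gender, favorite food).
observations = [
    ("F", "rice"), ("F", "rice"), ("M", "rice"), ("M", "yam"),
    ("F", "yam"), ("M", "yam"), ("M", "yam"), ("F", "rice"),
]
table = contingency_table(observations)

for gender, row in table.items():
    print(gender, row)
```

Each inner value is one cell of the 2 x 2 table, and the cell counts add up to the total number of observations.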

Stem and Leaf Plots

  • A way to represent continuous and discrete data for small sample sizes. The plot is similar to a bar chart or histogram but contains more information.
  • Use 5-12 intervals of equal size that span the observations.
  • The interval width is a value such as 0.2 times a power of 10. The stem consists of the leading digit(s) of each data point and the leaf consists of the last digit.
  • Box-and-whisker plots: a graphical description of the data in which a box is drawn from the lower quartile to the upper quartile, with whiskers extending from each end of the box to the minimum and maximum observations. The plot gives a quick picture of where the data are concentrated and how far the extreme values lie from the rest of the data.
  • Pie charts vs bar charts: a pie chart is useful when there is only one categorical variable and shows the frequency of each category. Bar charts are graphical displays used for both numerical and categorical variables, and can also show multiple variables. The horizontal axis is specified by the categorical variable.
  • Other charts: line graphs and graphs of frequency distributions for grouped data.
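
As a rough illustration of the stem-and-leaf idea described above, the sketch below splits each value into a stem (all digits but the last) and a leaf (the last digit). It assumes non-negative whole numbers, and the function name and scores are made up for the example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (tens and above) with sorted leaves (units)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical test scores.
scores = [52, 55, 58, 61, 63, 64, 67, 70, 71, 75, 79, 81, 84, 90]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Unlike a histogram, the display keeps every original value recoverable: stem 5 with leaves 2, 5, 8 stands for 52, 55, and 58.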

Construction of Frequency Table

  • For simple data: sort the data, decide on the number of intervals, and determine the interval width from the range. Then set the starting and stopping point of each interval.
  • Then list the number of observations in each interval.
  • The distribution groups the data as absolute numbers; the distributions should be on 4
  • Note the difference between cumulative and relative frequency. The basis of a stem-and-leaf display is dividing each data value into a stem and a leaf; it is used for quantitative data.
  • Other charts: frequency graphs
