Intro to Statistics - Computer Science

Questions and Answers

Which of the following best describes the primary purpose of descriptive statistics?

  • To collect, organize, summarize, and present data in a meaningful way. (correct)
  • To perform estimations and hypothesis tests on data.
  • To generalize findings from a sample to a larger population.
  • To determine relationships among different variables within a dataset.

A researcher wants to understand the average income of families in a specific city. They collect income data from a randomly selected group of 200 families. Which term best describes the numerical value obtained from all families in the city?

  • Variable
  • Parameter (correct)
  • Sample
  • Statistic

Which data collection method involves gathering information from all members of a statistical population, ensuring high accuracy, clarity, and detail?

  • Random sampling
  • Comprehensive survey (correct)
  • Stratified sampling
  • Systematic sampling

In stratified sampling, what is the primary criterion used to divide the statistical population into subgroups?

  • Homogeneity within groups (correct)

A researcher wants to study software developers in a large tech company. They randomly select 5 development teams and survey all members within those teams. Which sampling technique is being used?

  • Cluster sampling (correct)

A quality control engineer selects every 20th item from an assembly line for inspection. What type of sampling method is being employed?

  • Systematic sampling (correct)

When constructing a frequency distribution, what is the general guideline for determining the number of classes or intervals to use?

  • Use between 5 and 15 classes. (correct)

In a frequency distribution, how are class boundaries calculated to ensure there are no gaps between classes?

  • By subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit. (correct)

How is the class midpoint calculated in a frequency distribution?

  • By adding the lower and upper class limits and dividing by 2. (correct)

What does the cumulative frequency for a particular class in a frequency distribution represent?

  • The number of data values less than or equal to the upper class boundary of that class. (correct)

When creating a histogram, what determines the height of each vertical bar?

  • The frequency of the class (correct)

What is represented by the points connected by lines in a frequency polygon?

  • Class midpoints and their corresponding frequencies (correct)

In an ogive, which of the following values are plotted against the cumulative frequencies?

  • Upper class boundaries (correct)

In a symmetric distribution, what is the relationship between the mean, median, and mode?

  • Mean = Median = Mode (correct)

Which type of graph is most suitable for representing data that occurs over a specific period?

  • Line graph (correct)

Which graph is best for comparing two or more categories of two or more groups?

  • Grouped bar chart (correct)

What is the primary purpose of a stem-and-leaf diagram?

  • To present data in a way that preserves individual data values while summarizing the overall distribution. (correct)

A dataset consists of the following values: 2, 5, 8, 11, 14. Using summation notation, express the sum of these values.

  • $\Sigma x = 40$ (correct)

Consider a dataset with values x₁, x₂, ..., xₙ, where the mean is denoted as x̄. Which of the following statements describes the sum of deviations from the mean?

  • $\Sigma (x_i - \overline{x}) = 0$ (correct)

What is the main characteristic of the mean?

  • It is not usually one of the data values. (correct)

Which of the following is a property of the arithmetic mean?

  • The sum of deviations from the mean is zero. (correct)

In what scenario is the weighted mean most appropriate?

  • When different values in a dataset have different levels of importance or frequency. (correct)

Which measure of central tendency represents the middle value in an ordered dataset?

  • Median (correct)

Which measure of central tendency is least affected by extreme values (outliers) in a dataset?

  • Median (correct)

What is a key limitation of using the median as a measure of central tendency?

  • It is not suitable for further algebraic calculation. (correct)

Which measure of central tendency identifies the value that occurs most frequently in a dataset?

  • Mode (correct)

A dataset has two values that occur with equal frequency and greater frequency than any other value. How is this dataset described concerning modes?

  • Bimodal (correct)

When is the mode best used?

  • When the extreme values are not known. (correct)

In a positively skewed distribution, how do the mean, median, and mode typically relate?

  • Mode < Median < Mean (correct)

What is the main use of the geometric mean?

  • To find the average of percentages, ratios, indexes, or growth rates. (correct)

What is an example use of the geometric mean?

  • To calculate the average percentage raise per year. (correct)

What is a limitation of the geometric mean?

  • It cannot be calculated if any value of a series is zero. (correct)

How is the harmonic mean defined?

  • The number of values divided by the sum of the reciprocals of the values. (correct)

What measures are used to determine the relative positioning of data values?

  • Measures of position (correct)

In what unit is a quartile measured?

  • Original unit of data (correct)

What percentile corresponds to the median ($Q_2$)?

  • $P_{50}$ (correct)

How are quartiles and deciles related?

  • $Q_{k} = D_{\frac{10k}{4}}$, since $Q_k = P_{25k}$ and $D_k = P_{10k}$ (correct)

Flashcards

What is a Population?

The collection of all individuals or items being studied.

What is a Limited Population?

A population with a finite, countable number of individuals.

What is an Unlimited Population?

A group with an infinite number of individuals.

What is a Sample?

A part of the population from which information is collected.

What is Data?

Observations taken during a study; can be numerical or non-numerical.

What is a Parameter?

Numerical summary of all the data of a population.

What is a Statistic?

Numerical summary of sample data, used to make inferences.

What are Variables?

Characteristics that vary from one person or thing to another.

What are Qualitative Variables?

Variables that yield non-numerical data such as gender, hair color, etc.

What are Quantitative Variables?

Variables that yield numerical data such as height, weight, etc.

What is Random Sampling?

A method where every member of the population has an equal opportunity to be chosen.

What is Stratified Sampling?

A method used when the population is divided into homogeneous groups (strata).

What is Cluster Sampling?

The population is divided into groups and then subgroups; the smallest subgroup is called a cluster.

What is Systematic Sampling?

Obtain samples by numbering each subject then selecting every 'kth' subject.

What is Frequency Distribution?

Organizing raw data into a table using classes and frequencies.

What is the Range (R)?

The difference between the highest and lowest values in raw data.

What are Histograms?

Graphs using contiguous vertical bars to represent frequencies.

What are Frequency Polygons?

Graphs that display data by connecting points plotted for frequencies at class midpoints.

What is an Ogive?

Graph that represents cumulative frequencies for classes in a frequency distribution.

What is a Line Graph?

Type of chart used to represent data that occur over a specified period of time.

What are Bar Graphs?

Data is represented using vertical or horizontal bars.

What is a Pie Chart?

Circle divided into sections/wedges according to the percentage of frequencies.

What is a Stem-and-Leaf diagram?

Uses part of the data value as the stem and part as the leaf to form groups.

What is the Mean?

Sum of the values divided by the total number of values.

What is the Weighted Mean?

A mean in which each value is weighted by its importance or frequency, used when the values are not equally represented.

What is the Median?

The value at the halfway point of the data array after the data has been arranged in order.

What is the Mode?

The value that occurs most often in a data set.

What is Unimodal data?

Data set has only one value that occurs with the greatest frequency.

What is Multimodal data?

Data set has more than one value that occurs with greatest frequency.

What is a Geometric Mean?

The nth root of the product of n values.

What is Harmonic Mean?

The number of values divided by the sum of the reciprocals of the values.

What are Quartiles?

Measures that divide the distribution into four groups.

What are Deciles?

Measures that divide the distribution into ten groups.

What are Percentiles?

Measures that divide the distribution into 100 groups.

Study Notes

  • This text provides lecture notes on probability and statistics for Computer Science students at South Valley University in 2025.
  • Part I of the notes focuses on statistics.

Introduction to Statistics

  • Familiarity with probability and statistics comes from radio, newspapers, and magazines
  • Example statements include university attendance rates, average salaries, and COVID-19 infection probability
  • Statistics is defined as the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data

Branches of Statistics

  • Descriptive statistics involves the collection, organization, summarization, and presentation of data.
  • Methods of descriptive statistics include frequency distributions and measures of central tendency and dispersion.
  • Inferential statistics involves generalizing from samples to populations, estimations, hypothesis tests, determining relationships among variables, and making predictions.

Population and Sample

  • Population is the collection of all individuals or items being studied.
  • Limited population has a finite number of individuals. Example: students in a class, like the 329-Stat class.
  • Unlimited population has an infinite number of individuals that can be distinguished from each other. Example: the fish in the sea.
  • Sample is a part of the population from which information is collected.
  • Using a sample saves time and effort. Example: examining a sample of eggs or light bulbs.

Data Types

  • Data is a set of observations taken during a study.
  • Quantitative/Numerical data are data that can be expressed through numbers. Example: heights and weights
  • Qualitative/Non-numerical data cannot be expressed through numbers. Example: skin color and gender

Parameters and Statistics

  • Parameter: A numerical value summarizing all the data of an entire population.
  • Example parameter: average monthly income of families in the Arab World.
  • Statistic: A numerical value summarizing the sample data, which is used to make inferences about unknown parameters.
  • Example statistic: average monthly income of a sample of 100 families in the Arab World.

Variables

  • Variables are characteristics that vary from one person or thing to another.
  • Qualitative variables yield non-numerical data. Example: gender, hair color, eye color.
  • Quantitative variables yield numerical data. Example: weight, height, IQ measurements.

Sources of Data Collection

  • Historical data is collected from archived records, like birth and death records
  • Survey data is collected either through direct contact or through indirect methods. Example: mail, email, phone

Data Collection/Sampling Techniques

  • Comprehensive survey involves collecting data from all elements of the statistical population. Results have high accuracy, clarity, detail, and reliability
  • Sampling uses different methods, including random, stratified, cluster, and systematic sampling.

Random Sample

  • Every member of the population has an equal opportunity to be chosen for the sample.
  • To draw a sample of 15 from 85 subjects, number each subject from 01 to 85, generate 15 distinct random numbers (for example with a calculator), and select the subjects with those numbers.
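
The random-number procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the notes' own method: the function name `draw_random_sample` and the `seed` parameter are assumptions added here so that a run can be repeated.

```python
import random

def draw_random_sample(population_size, sample_size, seed=None):
    """Return sorted subject numbers chosen at random without replacement."""
    rng = random.Random(seed)                  # seed is optional, for repeatable runs
    subjects = range(1, population_size + 1)   # subjects numbered 1..population_size
    return sorted(rng.sample(subjects, sample_size))

# 15 subjects out of 85, as in the example above
sample = draw_random_sample(85, 15, seed=1)
print(sample)
```

Because `random.sample` draws without replacement, no subject number can appear twice.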

Stratified Sampling

  • Applied when the statistical population is divided into homogeneous groups.
  • Formula:
    • Sample size for a stratum = (Number of individuals in the stratum / Total population size) x Total sample size.
  • College of Science example:
    • Total of 30 students from college of science are needed.
    • There are 130 students from life sciences, 110 from chemistry, 50 from mathematics, and 100 from physics
    • Total number of students = 390
    • Life science students number = (130/390) x 30 = 10
    • Chemistry students number = (110/390) x 30 ≈ 8.46, rounded to 8
    • Mathematics students number = (50/390) x 30 ≈ 3.85, rounded to 4
    • Physics students number = (100/390) x 30 ≈ 7.69, rounded to 8
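
The College of Science allocation can be reproduced with a short sketch. The function name `proportional_allocation` is hypothetical, and rounding to the nearest whole student is an assumption that happens to give a total of exactly 30 here; for other numbers the rounded values may need a manual adjustment.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Allocate the sample to each stratum in proportion to its size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size * total_sample / population)  # nearest whole subject
        for name, size in strata_sizes.items()
    }

college = {"life sciences": 130, "chemistry": 110, "mathematics": 50, "physics": 100}
allocation = proportional_allocation(college, 30)
print(allocation)
```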

Cluster Sampling

  • The population is divided into groups that are divided into subgroups, down to a smallest subgroup called a cluster.
  • Cluster sampling involves choosing from each cluster a simple random sample to get a cluster sample.
  • Example 1.2: Cluster sampling is the best method to study the employment opportunities of King Khalid University students after graduation, because the students are already grouped into colleges and then departments.

Systematic Sampling

  • Researchers number each subject of the population, then select every kth subject.
  • Example 1.3: There are 2000 subjects and 50 are needed for a sample, so every 40th (k = 40) subject is selected (2000/50 = 40)
  • The first subject is chosen randomly from the first 40 numbered subjects.
  • All subsequent subjects are chosen in intervals of 40 subjects from the randomly chosen subject
  • For example if the 12th subject is the first selected, the sample consists of subjects 12, 52, 92 etc.
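
Example 1.3 can be sketched as follows. The function name `systematic_sample` and its `start` parameter are illustrative assumptions; when `start` is omitted, the first subject is drawn at random from the first k subjects, as described above.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Select every k-th subject, beginning at a start within the first k."""
    k = population_size // sample_size         # sampling interval, 2000 // 50 = 40
    if start is None:
        start = random.randint(1, k)           # random start among the first k subjects
    return list(range(start, population_size + 1, k))[:sample_size]

sample = systematic_sample(2000, 50, start=12)
print(sample[:3], len(sample))
```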

Chapter 2 Content

  • Organizing Data
  • Histograms, Frequency Polygons, and Ogives
  • Other Types of Graphs

Organizing data

  • Raw data is data in the original form in which it was collected.
  • A frequency distribution makes raw data easier to handle by organizing it into classes and frequencies, denoted by 'f'. To construct one:
    1. Find the lowest (L) and highest (H) values in the raw data.
    2. Calculate the Range (R), where R = H - L.
    3. Decide on the number of classes (n), usually between 5 and 15.
    4. Find the width (W) of the class using W = R/n (round up).
    5. Find the lower limit (LL) and upper limit (UL) of the first class: LL = L, UL = LL + W - 1.
    6. For the second class: LL = upper limit of the first class + 1, UL = LL + W - 1. Use the same method for the other classes, then count the frequency of each class.

Frequency Distribution Table

  • Example 2.1 data set: 27 36 72 47 48 29 18 57 33 61 44 10 76 15 67 52 35 43 71 73 56 32 81 64 85 55 19 69 50 46 68 25 36 43 54 52 27 44 98 64 61 42 36 29 42 51 38 90 67 63.
  • L = 10 and H = 98 therefore R = 88.
  • For n = 9, W = 88/9 ≈ 9.78, rounded up to 10.
  • First class: LL = 10, UL = 10 + 10 - 1 = 19.
  • Second class: LL = 19 + 1 = 20, UL = 20 + 10 - 1 = 29.
  • The frequency of each class is counted from the data set.
  • Frequency table (class limits | frequency fi):
    • 10-19 | 4
    • 20-29 | 5
    • 30-39 | 7
    • 40-49 | 9
    • 50-59 | 8
    • 60-69 | 9
    • 70-79 | 4
    • 80-89 | 2
    • 90-99 | 2
    • Sum | 50
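
The construction steps can be checked against Example 2.1 with a small sketch. The function name `frequency_distribution` is hypothetical, and the code assumes whole-number data; it follows the rule W = R/n rounded up, so for other data sets it is worth confirming that the last class still covers the highest value.

```python
import math

data = [27, 36, 72, 47, 48, 29, 18, 57, 33, 61, 44, 10, 76, 15, 67, 52, 35,
        43, 71, 73, 56, 32, 81, 64, 85, 55, 19, 69, 50, 46, 68, 25, 36, 43,
        54, 52, 27, 44, 98, 64, 61, 42, 36, 29, 42, 51, 38, 90, 67, 63]

def frequency_distribution(values, n_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    low, high = min(values), max(values)
    width = math.ceil((high - low) / n_classes)   # W = R / n, rounded up
    table = []
    lower = low
    for _ in range(n_classes):
        upper = lower + width - 1                 # UL = LL + W - 1
        count = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, count))
        lower = upper + 1                         # next LL = previous UL + 1
    return table

table = frequency_distribution(data, 9)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```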

Statistical Table Construction Facilitation

  • Frequency tables need true class boundaries so that there are no gaps between classes.
  • Class limit 10-19 is read as from 10 to 19
    • Class boundaries: subtract 0.5 from the lower class limit and add 0.5 to the upper class limit
    • For example:
      • Lower boundary limit (-0.5) = 10 - 0.5 = 9.5
      • Upper boundary limit (+0.5) = 19 + 0.5 = 19.5
    • Class midpoint (xi) is obtained by adding the lower and upper limits and dividing by 2.
      • xi = (lower limit + upper limit)/2
      • For example:
        • x₁ = midpoint(9.5 – 19.5) = (9.5 + 19.5) / 2 = 14.5
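
A minimal sketch of the boundary and midpoint rules follows. The function names are hypothetical, and the 0.5 adjustment assumes whole-number class limits, as in the example.

```python
def class_boundaries(lower_limit, upper_limit):
    """Boundaries: subtract 0.5 from the lower limit, add 0.5 to the upper."""
    return lower_limit - 0.5, upper_limit + 0.5

def class_midpoint(lower, upper):
    """Midpoint: add the two ends and divide by 2."""
    return (lower + upper) / 2

for lower, upper in [(10, 19), (20, 29), (30, 39)]:
    print(lower, upper, class_boundaries(lower, upper), class_midpoint(lower, upper))
```

The midpoint is the same whether it is computed from the class limits or from the class boundaries.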

Relative Frequency

  • Relative frequency (RF): RF = (Frequency of the Class) / (Sum of Frequencies). For example: RF(10 – 19) = 4/50 = 0.08.

Percentage frequency

  • Percentage Frequency (PF): PF = RF x 100
  • For example: PF(10 – 19) = 0.08 × 100 = 8%

Frequency Distribution Table Example

  • Columns: class limits | class boundaries | class midpoint | frequency fi | percentage frequency (%)
    • 10-19 | 9.5-19.5 | 14.5 | 4 | 8
    • 20-29 | 19.5-29.5 | 24.5 | 5 | 10
    • 30-39 | 29.5-39.5 | 34.5 | 7 | 14
    • 40-49 | 39.5-49.5 | 44.5 | 9 | 18
    • 50-59 | 49.5-59.5 | 54.5 | 8 | 16
    • 60-69 | 59.5-69.5 | 64.5 | 9 | 18
    • 70-79 | 69.5-79.5 | 74.5 | 4 | 8
    • 80-89 | 79.5-89.5 | 84.5 | 2 | 4
    • 90-99 | 89.5-99.5 | 94.5 | 2 | 4
    • Sum | | | 50 | 100 (the relative frequencies sum to 1)
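
The relative and percentage frequencies follow directly from the class frequencies of Example 2.1. This is a small sketch; the variable names are chosen here for illustration.

```python
frequencies = [4, 5, 7, 9, 8, 9, 4, 2, 2]   # class frequencies from Example 2.1
total = sum(frequencies)                     # sum of frequencies

relative = [f / total for f in frequencies]           # RF = f / sum of f
percentage = [f * 100 / total for f in frequencies]   # PF = RF x 100

for f, rf, pf in zip(frequencies, relative, percentage):
    print(f, round(rf, 2), pf)
```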

Cumulative Frequency

  • Shows how many data values are less than or equal to a specific value.
  • Each value is found by adding the frequencies of all classes up to and including the class with that upper class boundary.
  • Cumulative frequency table (class | cumulative frequency):
    • Less than 9.5 | 0
    • Less than 19.5 | 4
    • Less than 29.5 | 9
    • Less than 39.5 | 16
    • Less than 49.5 | 25
    • Less than 59.5 | 33
    • Less than 69.5 | 42
    • Less than 79.5 | 46
    • Less than 89.5 | 48
    • Less than 99.5 | 50
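
The running totals in the cumulative frequency table can be reproduced with `itertools.accumulate`. This is a sketch using the frequencies of Example 2.1; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [4, 5, 7, 9, 8, 9, 4, 2, 2]
upper_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]

cumulative = list(accumulate(frequencies))   # running total of the frequencies

print("Less than 9.5: 0")                    # nothing lies below the first boundary
for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"Less than {boundary}: {cf}")
```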

Bivariate Frequency Tables

  • Used when the statistical data summarizes more than one variable.
  • The table is a grid whose cells hold the joint frequencies of the two variables, with the frequency totals at the end of each row and column.

Bivariate table Example

  • Organizing the Chemistry and Mathematics grades of 20 students
  • Mathematics grades go on the horizontal axis and Chemistry grades on the vertical axis; counting the matching pairs gives this bivariate frequency table:
    • Chem / Math | A | B | C | D | E | Sum
    • A | 2 | 1 | 2 | 0 | 0 | 5
    • B | 1 | 1 | 3 | 1 | 0 | 6
    • C | 2 | 3 | 2 | 0 | 0 | 7
    • D | 0 | 0 | 0 | 0 | 0 | 0
    • E | 0 | 0 | 0 | 1 | 1 | 2
    • Sum | 5 | 5 | 7 | 2 | 1 | 20
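
The row and column sums of the bivariate table can be verified from its cell counts. The sketch below stores the table as a list of rows (Chemistry grades) with one entry per Mathematics grade; the variable names are illustrative.

```python
grades = ["A", "B", "C", "D", "E"]

# Rows: Chemistry grade; columns: Mathematics grade (cell counts from the table)
cells = [
    [2, 1, 2, 0, 0],
    [1, 1, 3, 1, 0],
    [2, 3, 2, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
]

row_sums = [sum(row) for row in cells]          # totals per Chemistry grade
col_sums = [sum(col) for col in zip(*cells)]    # totals per Mathematics grade
grand_total = sum(row_sums)

print(dict(zip(grades, row_sums)), dict(zip(grades, col_sums)), grand_total)
```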

Quantitative Data In A Bivariate Frequency Table.

  • Bivariate data representing the marks of 30 students in mathematics and statistics.
  • An appropriate width for the classes is 10.
    • Statistics (rows): classes 50-59, 60-69, 70-79, 80-89, 90-99, and Sum
    • Mathematics (columns): classes 50-59, 60-69, 70-79, 80-89, 90-99, and Sum
  • For example, the pair of marks 70 and 76 (from the mathematics and statistics table) is counted in the cell where the two 70-79 classes meet.
  • Tally each pair of marks into its corresponding cell, then enter the row and column sums, to create the bivariate table.
  • For the 50-59 class, the entries are 3, 1, 0, 0, 0 with a sum of 4, and so on up to the 90-99 class.

Graphic Display

  • Graphical displays describe the shape of the data distribution and where the data are centered.
  • Presenting data visually is faster and easier to understand.
  • Data can be displayed graphically using various graphs: histograms, polygons, charts, etc.

Histograms

  • Uses contiguous vertical bars, whose heights represent the frequencies of the classes.
    • Draw and label the x and y axes.
    • The x axis is the horizontal axis, and the y axis is the vertical axis.
    • Represent the frequency on the y axis and the class boundaries on the x axis. Using the frequencies as the heights, draw a vertical bar for each class.
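
As a rough text-only illustration, the frequencies of Example 2.1 can be printed as bars. This sketch draws the bars sideways, one row per class, which is an approximation of a real histogram with contiguous vertical bars; the function name `text_histogram` is hypothetical.

```python
boundaries = ["9.5-19.5", "19.5-29.5", "29.5-39.5", "39.5-49.5", "49.5-59.5",
              "59.5-69.5", "69.5-79.5", "79.5-89.5", "89.5-99.5"]
frequencies = [4, 5, 7, 9, 8, 9, 4, 2, 2]

def text_histogram(labels, counts, symbol="#"):
    """Return one text row per class; bar length equals the class frequency."""
    return [f"{label:>10} | {symbol * count} ({count})"
            for label, count in zip(labels, counts)]

rows = text_histogram(boundaries, frequencies)
print("\n".join(rows))
```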

Frequency Polygons

  • Uses lines that connect points plotted for the frequencies at the midpoints of the classes.
    • The frequencies are represented by the heights of the points.
      1. Find the midpoint of each class.
      2. Label the x axis with the class midpoints and the y axis with the frequencies.
      3. Plot each point, using the class midpoint as the x value and the frequency as the y value.
      4. Connect adjacent points with line segments.

Line Closing

  • To close the frequency polygon, draw a line back to the x axis at the beginning and end of the graph, at the same distance as the previous and next midpoints are located
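
The closing rule can be sketched by adding one zero-frequency point a class width before the first midpoint and one after the last. The function name `closed_polygon_points` is hypothetical, and equal class widths are assumed.

```python
def closed_polygon_points(midpoints, frequencies):
    """Return polygon points with zero-frequency points added at both ends."""
    width = midpoints[1] - midpoints[0]        # distance between midpoints
    start = (midpoints[0] - width, 0)          # one class width before the first
    end = (midpoints[-1] + width, 0)           # one class width after the last
    return [start] + list(zip(midpoints, frequencies)) + [end]

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [4, 5, 7, 9, 8, 9, 4, 2, 2]
points = closed_polygon_points(midpoints, frequencies)
print(points)
```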

Frequency Curves

  • Follow the same steps as for the frequency polygon, but smooth the broken lines into a curve that passes through most of the points.
  • The relative and percentage frequency curves can be drawn in the same way.

Cumulative Frequency

  • Ogive graphs represent the cumulative frequencies for the classes in a frequency distribution. To plot an ogive:
    1. Find the cumulative frequency for each class.
    2. Draw the x and y axes. Label the x axis with the class boundaries and use an appropriate scale on the y axis for the cumulative frequencies.
    3. Plot the cumulative frequency at each upper class boundary.
    4. Connect adjacent points with line segments, starting at the first class boundary.

Visual Representation And Interpretation

  • Cumulative frequency graphs are used to visually represent how many values are below a certain upper class boundary
  • Reading the graph in the other direction, a cumulative count on the y axis (for example, a number of students) can be traced to the corresponding value on the x axis.
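
Reading an ogive from the y axis to the x axis amounts to linear interpolation along its line segments. The sketch below uses the cumulative frequencies of Example 2.1; the function name `value_at_cumulative` is hypothetical, and treating the segments as straight lines is the same assumption the graph itself makes.

```python
def value_at_cumulative(boundaries, cumulative, target):
    """Find the x value where the ogive reaches the target cumulative count."""
    for i in range(1, len(boundaries)):
        low_cf, high_cf = cumulative[i - 1], cumulative[i]
        if low_cf <= target <= high_cf and high_cf > low_cf:
            fraction = (target - low_cf) / (high_cf - low_cf)
            return boundaries[i - 1] + fraction * (boundaries[i] - boundaries[i - 1])
    raise ValueError("target is outside the range of the ogive")

boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
cumulative = [0, 4, 9, 16, 25, 33, 42, 46, 48, 50]

# Value below which half of the 50 observations lie
print(value_at_cumulative(boundaries, cumulative, 25))
```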

Shapes For Frequency Curves - Symmetric and Asymmetric

  • In a symmetric distribution, the two sides of the curve mirror each other (zero skewness).
  • An asymmetric distribution is skewed to one side, left or right, meaning it has nonzero skewness.

Other Types Of Graphs

  • Line Graph
  • Bar Graph
  • Pie Chart
  • Stem-and-Leaf diagrams

Line Graphs

  • This type of chart represents data that has occurred over a period of time:
    • horizontal axis (x) represents the time (day, month, year)
    • vertical axis (y) represents data values

Bar Graphs vs Histograms

  • Bar graphs are used for qualitative (categorical) data.
  • They can be drawn using either vertical or horizontal bars.
  • Bar graphs differ from histograms in three main ways:
    • The columns (bars) are positioned over a label that represents a categorical variable.
    • The columns do not have a class width.
    • There is a gap between columns.
  • Three types of bar graphs exist:
    • Simple bar: represents a classification of one variable.
    • Grouped bar: represents and compares categories of two or more groups.
    • Stacked bar: compares each segment in the bar with the total.

Grouped Bar Graphs

  • Use the following steps:
    • Draw two or more adjacent bars for each category being studied; the length of each bar must correspond directly to the value it represents.
    • Differentiate the bars by shading or color, and add a legend.
    • Keep the bases of the bars equal and the distances between the groups equal.

Stacked Bar Charts

  • Break down and compare the parts of a whole.
  • Each bar represents a whole, and its segments represent the categories within that whole.

Pie Charts

  • A pie chart is a circle divided into sections that represent the percentage of each category in the distribution.
  • To construct a pie graph for the data, follow these steps:
    • Since there are 360° in a circle, convert each frequency f into a proportional part of the circle: Degrees = (f / n) x 360, where n is the sum of the frequencies.
    • Convert each frequency to a percentage: % = (f / n) x 100.
    • Using a protractor and a compass, draw the graph with the degrees from the first step.
    • Label each section with its name and percentage.
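
The degree and percentage conversions can be sketched as below. As an illustration only, the data reuse the Chemistry grade totals from the bivariate table example (5, 6, 7, 0, 2); the function name `pie_slices` is hypothetical.

```python
def pie_slices(frequencies):
    """Return (degrees, percentage) for each category of a pie chart."""
    total = sum(frequencies.values())          # n, the sum of the frequencies
    return {
        category: (f * 360 / total, f * 100 / total)
        for category, f in frequencies.items()
    }

chemistry_grades = {"A": 5, "B": 6, "C": 7, "D": 0, "E": 2}
slices = pie_slices(chemistry_grades)
for category, (degrees, percent) in slices.items():
    print(category, degrees, percent)
```

Each angle equals its percentage multiplied by 3.6, because 360 / 100 = 3.6.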

Calculating From Percentage When Angle Not Available

  • When only the percentage is available, the angle is calculated as Angle = % x 3.6.

Stem-and-Leaf Diagrams

  • A stem-and-leaf diagram uses part of each data value as the stem, which forms the groups (classes), and the rest as the leaf.
  • It was pioneered by the statistician John Tukey in 1960 and has important advantages.

Advantages of Stem and Leaf Plots

  • It gives insight into how the data are distributed and over what range they extend.
  • It also helps reveal outliers and gaps in the data.

Stem And Leaf Breakdown

  • Leaf: the last digit, at the far right of the value
    • Example: in the number 35, 5 is the leaf
  • Stem: the remaining leading digit(s)
    • Example: using 35 again, 3 is the stem
How To Construct The Diagram

  • First, arrange the numbers in order.
  • Second, split each number into a stem and a leaf.

Stem And Leaf Construction Steps

  • Arrange
  • Split
  • Display each stem (the leading digit) once, followed by its leaves
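
The arrange, split, and display steps can be sketched as follows. The function name `stem_and_leaf` is hypothetical, the code assumes non-negative whole numbers with the last digit as the leaf, and the sample values are the first twelve values of Example 2.1.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by stem (leading digits); leaves are last digits."""
    plot = defaultdict(list)
    for value in sorted(values):            # step 1: arrange
        stem, leaf = divmod(value, 10)      # step 2: split into stem and leaf
        plot[stem].append(leaf)
    return dict(plot)

data = [27, 36, 72, 47, 48, 29, 18, 57, 33, 61, 44, 10]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():           # step 3: display
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```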

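Measures Of Central Tendency And Position

  • This part covers measures of central tendency and begins with summation notation.
  • Summation notation:
    • Let $x_1, x_2, \ldots, x_n$ be n data values (observations).
      • $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
      • Let $y_1, y_2, \ldots, y_n$ be a second set of values and c a constant. Some properties:
      • $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
      • $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$
      • $\sum_{i=1}^{n} c = nc$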
Average (Arithmetic)

  • The arithmetic mean is one of the most fundamental and commonly used statistical measures.
  • Definition 3: The mean (average) is the sum of the data values divided by the total number of values. The symbol $\overline{x}$ denotes the sample mean, and the Greek letter $\mu$ denotes the population mean.
  • To find the mean, first determine whether the data come from a sample or a population.
  • Sample mean, in basic terms:
    • $\overline{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

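Population Mean

  • The second formula applies to a population.
  • The population mean is the sum of all population values divided by the population size N:
    • $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$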
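
A minimal sketch of the mean calculation follows. The same arithmetic serves for a sample (dividing by n) and for a population (dividing by N); the data are the quiz values 2, 5, 8, 11, 14, and the result is cross-checked against `statistics.mean` from the Python standard library.

```python
import statistics

def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

data = [2, 5, 8, 11, 14]
average = mean(data)

# The deviations from the mean always sum to zero
deviations = [v - average for v in data]
print(average, sum(deviations), statistics.mean(data))
```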

Note Regarding The Average

  • Calculate the mean first:
    • mean = total of the values provided / number of values; in the example this is 3.0
