BA311 Prelim Reviewer PDF
Summary
This document is a reviewer for a business analytics course, specifically covering descriptive analytics and exploratory data analysis. It details concepts like data types, variables, frequency distributions, measures of central tendency, and variability.
MODULE 1

Descriptive Analytics or Exploratory Data Analysis - a type of Business Analytics application where data is described and summarized using basic statistical tools and graphs to produce reports and dashboards for decision making. It answers two requests:
Tell me what has happened and why.
Tell me what is happening right now.

BAP is a data-driven process. ODMP, on the other hand, is descriptive in nature and may be considered more subjective in its analysis. BAP emphasizes objectivity and fact-based analysis.

Data - Numerical (quantitative) figures as well as qualitative facts. Data are facts and figures collected, tabulated, summarized, and analyzed for presentation and interpretation.
Variable - A characteristic of an element.
Element - A unit on which data are collected.
Measurements of a variable provide data.
Observation - The set of measurements obtained for a particular element.
Decision Variables - Variables whose values are under the direct control of the decision maker.
Random Variable / Uncertain Variable - A quantity whose value is not known with certainty.

Population Data - A population includes all of the elements from a set of data.
Sample Data - A sample consists of one or more observations drawn from the population.
Quantitative Data - Can be characterized as measurable data.
Categorical Data - Can be characterized as countable data.
Nominal Scale Variable - Numbers or codes given to objects or events for naming or classifying only.
Ordinal Scale Variable - Classifies data into categories that can also be ordered.
Cross-Sectional Data - Observations that come from different individuals or groups at a single point in time.
Time Series Data - A set of observations collected at usually discrete and equally spaced time intervals.
Experimental Data - A variable of interest is identified, and one or more other variables are then controlled so that their influence on it can be observed.
Observational Data - A variable is measured without trying to change or affect its values.

Factors to Consider in Data Gathering
The time and cost of collecting data should be taken into account.

The COUNT function counts the number of cells that contain numbers, dates, and times.
The COUNTA function counts the number of cells that are not empty in a range.
The COUNTIF function counts the number of cells that meet a criterion.
The IF function allows you to make logical comparisons between a value and what you expect.

MODULE 2

Central tendency is defined as “the statistical measure that identifies a single value as representative of an entire distribution.”

Measures of Central Tendency
Mean (X̄)
Median (Md)
Mode (Mo)

Arithmetic Mean - The most common measure of location. Computed as the average value of the data set.
Median - Another measure of central location. It is the value in the middle when the data are arranged in ascending order (smallest to largest value).
Mode - The value that occurs most frequently in a data set. It is the data value with the greatest frequency.

Measures of Variability
Range
Interquartile Range
Mean Absolute Deviation
Variance
Standard Deviation
Coefficient of Variation

Range - Difference between the highest value and the lowest value.
Interquartile Range - Computed on the middle 50% of the observations, after eliminating the highest and lowest 25% of observations in a data set arranged in ascending order. Unlike the range, it is not sensitive to extreme values.
Mean Absolute Deviation - The average of the absolute differences between each actual value and the measure of central location/tendency.
Variance - Measure of variability that utilizes all the data.
The variance is based on the deviation from the mean (the difference between the value of each observation and the mean).
Standard Deviation - While the mean indicates a representative value for a data set, the standard deviation shows the dispersion or variability across data points. It is considered the most efficient measure of dispersion.
Coefficient of Variation - Measures dispersion in relation to the mean.

Measures of Dispersion
Variance is based on the SQUARED distance from the mean. It is never negative, regardless of the direction of the deviation, since squaring a positive or negative real number always gives a positive number (or zero). So when a value is far from the mean, variance weights that point more heavily than MAD would.

MODULE 3

Data Distribution - A function or a listing which shows all the possible values (or intervals) of the data.
Frequency Distribution - A summary of data that shows the number (frequency) of observations in each of several non-overlapping classes, also referred to as bins or cells. Frequency tells you how often something happened.
Frequency Distribution for Quantitative Data - Frequency distributions can also be created for quantitative data, provided that no bins overlap.

Defining classes for a frequency distribution
Step 1: Determine the number of non-overlapping classes/cells/bins.
Step 2: Determine the width of each class/cell/bin.
Step 3: Determine the class/cell/bin limits.

Measure of Shape
Measures of shape tell us whether data is normally distributed or not.
Symmetrical Distribution - Most statistical analysis is based on the assumption of a normal or symmetrical distribution. The binomial, Poisson, and other types of distribution are rarely used for statistical analysis.
Asymmetrical Distribution - The distribution is not normal.
Positively skewed distribution - Frequencies in the distribution are spread out over a greater range of high-end values on the right side of the distribution from the center point.
Negatively skewed distribution - Skewness is negative. Frequencies in the distribution are spread out over a greater range of low-end values on the left side of the distribution from the center point.

Measures of Skewness
Skewness refers to the lack of symmetry in the shape of the distribution.
The skewness of any distribution is defined in relation to the normal distribution: it is the difference between the manner in which observations are distributed in a particular distribution and in a normal distribution.
Positively Skewed - The tail extends farther to the right than to the left (skewed to the right).
Negatively Skewed - The tail extends farther to the left than to the right (skewed to the left).
The coefficient of skewness compares the sample distribution with a normal distribution. The larger its absolute value, the more the distribution differs from a normal distribution. A value of zero means no skewness at all. A large negative value means the distribution is negatively skewed. A large positive value means the distribution is positively skewed.

Concept of Kurtosis
Kurtosis refers to the degree of flatness or peakedness of a frequency distribution. It is always measured in relation to the peakedness of the normal curve, and tells us the extent to which a distribution is more peaked or flatter than the normal curve.
Mesokurtic - The frequency distribution coincides with a normal curve. Resulting value is close to zero.
Leptokurtic - The distribution is more peaked than the normal curve. Items are more closely clustered around the mean. Resulting value is positive.
Platykurtic - The distribution is flatter than the normal curve. The observations are more dispersed from the mean than in the normal curve. Resulting value is negative.
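The skewness and kurtosis coefficients described above can be illustrated with a short Python sketch. It assumes the simple population (moment-based) formulas, in which skewness is the average cubed z-score and kurtosis is the average fourth-power z-score minus 3, so that a normal curve scores close to zero. The function names and the three small data sets are made up for illustration; spreadsheet functions may apply a sample-size correction and return slightly different values.

```python
from statistics import fmean, pstdev


def standardized_moment(values, power):
    """Average of the z-scores raised to `power` (population formula)."""
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation
    return fmean([((x - mean) / sd) ** power for x in values])


def skewness(values):
    """Zero if symmetric, positive if right-tailed, negative if left-tailed."""
    return standardized_moment(values, 3)


def excess_kurtosis(values):
    """Kurtosis relative to the normal curve (normal curve gives about 0)."""
    return standardized_moment(values, 4) - 3


# Hypothetical data sets, chosen only to show the sign of each coefficient.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-x for x in right_tailed]

for name, data in [
    ("symmetric", symmetric),
    ("right_tailed", right_tailed),
    ("left_tailed", left_tailed),
]:
    print(
        f"{name}: skewness={skewness(data):.3f}, "
        f"excess kurtosis={excess_kurtosis(data):.3f}"
    )
```

Under these formulas the symmetric list has zero skewness and a negative excess kurtosis (platykurtic), while the right-tailed and left-tailed lists give positive and negative skewness respectively.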
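Cumulative Distributions - Provide another tabular summary of quantitative data, using the same number of classes, class widths, and class limits developed for the frequency distribution. Instead of the count in each class, a cumulative distribution shows the number of observations less than or equal to the upper limit of each class.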
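As a minimal sketch of how a frequency distribution and its cumulative counterpart are built from the same classes, the Python example below assumes whole-number data and that the analyst has already chosen the number of classes, the class width, and the first lower limit. The function name, its parameters, and the data values are hypothetical.

```python
from itertools import accumulate


def frequency_distribution(values, start, width, num_classes):
    """Count observations in non-overlapping classes of equal width.

    Assumes integer data, so each class runs from its lower limit
    to lower limit + width - 1, and no two classes share a value.
    """
    classes = [
        (start + i * width, start + (i + 1) * width - 1)
        for i in range(num_classes)
    ]
    freqs = [sum(lo <= v <= hi for v in values) for lo, hi in classes]
    return classes, freqs


# Hypothetical observations (for example, days to complete a task).
data = [12, 15, 17, 21, 22, 22, 25, 28, 31, 34, 11, 19]

# Analyst's choices: 5 classes of width 5, starting at 10.
classes, freqs = frequency_distribution(data, start=10, width=5, num_classes=5)

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(freqs))

print("Class    Freq  Cumulative")
for (lo, hi), f, c in zip(classes, freqs, cumulative):
    print(f"{lo}-{hi}    {f:>4}  {c:>10}")
```

The last cumulative value equals the total number of observations, which is a quick check that every observation fell into exactly one class.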