BUS 312 Midterm Study Guide PDF

Summary

This document is a study guide for a business analytics course. It covers topics such as data and information, different types of data analysis, business functions, and how to make decisions based on data analysis.

Full Transcript

BUS 312 Midterm Study Guide Notes

Processes
○ Businesses have processes
○ Understanding processes helps organizations make better decisions
○ Continuous improvement refers to understanding those processes and continuously looking for ways to improve them
○ Do you try to improve your processes? Yes, to stay competitive

Data/context -> information -> knowledge -> decision

Data and Information
○ Data- raw facts that have little meaning on their own
○ Context- the setting, event, statement, or situation
○ Information- data organized in a way that is useful to the analyst or user; it combines data with context
○ Knowledge- understanding of or familiarity with the information gained
○ Decisions- conclusions reached after the knowledge is considered

Data and Information (Examples)
○ Data- data dump of Instagram posts
○ Context- Instagram posts regarding Tide (detergent) Pods
○ Information- current level of consumer sentiment regarding Tide Pods
○ Knowledge- knowledge of current and planned marketing campaigns and consumer response to Tide Pods
○ Decisions- decisions regarding future marketing campaigns for Tide Pods

Business analytics- the use of data to create knowledge, to help draw conclusions and address business questions
○ Process of transforming data into insight for decision making
○ Different business functions have different business analytics needs
    ○ Marketing analytics- measures and attempts to improve marketing performance
        ○ The most important component of marketing analytics is providing insights into consumer preferences and trends
    ○ Accounting analytics- uses business analytics to help measure accounting performance and address accounting questions in the audit, financial accounting, managerial accounting, and tax areas
    ○ Financial analytics- uses business analytics to help a company measure and evaluate its financial performance, from predicting receivables collection from its customers to helping management evaluate future investments based on expected investments in equipment, employee training, or stocks and bonds
    ○ Operations analytics- measures and improves the efficiency and effectiveness of the company's operations

About 2.5 quintillion bytes of data are created each day, and the rate of data growth continues to accelerate

Companies collect a lot of data. Management's challenge is to see which processes create the most value and to minimize the costs of those processes. To help make decisions, companies hire business analysts, who help the organization reach a decision

Who gets involved with data?
○ Decision maker- needs knowledge and information to make decisions
○ Data scientist- a specialist who knows how to work with, manipulate, and statistically test data
○ Business/data analyst- the interpreter or liaison: the one who knows the business, knows what data is needed, and knows how to communicate with both the decision maker and the data scientist

Analytics mindset
○ Ask the right questions
○ Extract, transform, and load relevant data
○ Apply appropriate data analytic techniques
○ Interpret and share the results with stakeholders

SOAR
○ Specify the question
    ○ Descriptive analytics: What happened?
    ○ Diagnostic analytics: Why did it happen?
    ○ Predictive analytics: What is likely to happen in the future?
    ○ Prescriptive analytics: What actions should we take, based on what we expect will happen?
    ○ Adaptive analytics: How does the system adapt to changes?
○ Obtain the data
    ○ What data is available?
    ○ Which data needs to be collected?
    ○ Will the data adequately address the question?
    ○ Is the data relevant to the question being asked and/or reliable enough to address the question?
    ○ Is it clean of errors or inconsistencies?
    ○ Does it have lots of missing data?
    ○ Is the data biased in some way?
○ Analyze the data
    ○ Descriptive analytics: What is the average age of our customers?
    ○ Diagnostic analytics: Why did shipping time increase?
    ○ Predictive analytics: What is next year's demand?
    ○ Prescriptive analytics: Should we make or buy the product?
    ○ Adaptive analytics: How should the marketing strategy change?
    ○ Hindsight, insight, foresight, rightsight
○ Report the results
    ○ What is the best way to communicate our data analysis findings?
    ○ Static visualizations: reports, graphs, tables
    ○ Dynamic visualizations: dashboards
    ○ Slide 15 graph

Three basic components of a relational database
○ Relational databases- an efficient means of storing data in one place, in one table instead of multiple places. They have the following components
    ○ Tables- data organized into sets of columns (fields) and rows (records)
    ○ Fields- these are the columns that contain descriptive information about the observations in the table (including primary and foreign keys)
        ○ Primary key- unique identifier in each table
        ○ Foreign key- exists to create relationships or links between two tables
    ○ Records- these are the rows in a table; each row, or record, corresponds to a unique instance of what is being described in the table

Relational databases reduce information redundancy
○ Information redundancy- the duplication of data, or storing the same information in multiple places
○ Information integrity- measures the quality of information
○ Integrity constraints- rules that help ensure the quality of information

External Data Sources
○ Social media data
○ Census data
○ Small Business Administration data
○ Publicly available data
    ○ Financial statements of all publicly traded companies
    ○ Stock price data
    ○ Summarized financial data

Internal/external databases -> data warehouse -> marketing data mart, inventory data mart, sales data mart

Data marts
○ Data aggregation- collection of data from various sources for the purpose of data processing
○ Extraction, transformation, and loading (ETL)- a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse
○ Data mart- contains a subset of data warehouse information

The 4 V's of Big Data
○ Volume: social media
○ Veracity: untrusted, uncleansed
○ Variety: unstructured, semi-structured, structured
○ Velocity: speed of generation, rate of analysis

Variety of data
○ Structured data- highly organized data that fit nicely in a table or database
    ○ Financial statements
    ○ Database of customer orders and preferences
○ Unstructured data- data without organization or structure
    ○ Blogs, tweets, pictures
○ Semi-structured data- elements of both structured and unstructured data
    ○ Sensor data like weather updates, traffic, video footage

Structured data types
○ Categorical- tends to be represented by words, such as categorizing a group of people by gender (male, female, nonbinary) or categorizing transaction types (sales versus returns)
    ○ Nominal- cannot be ranked
        ○ Gender
        ○ Transaction type
        ○ Location of sale
    ○ Ordinal- allows/implies ranking and sorting
        ○ Gold, silver, and bronze
        ○ Survey answers: agree, indifferent, disagree
○ Numerical- meaningful numbers, such as transaction amount, net income, age, or the score on an exam
    ○ Discrete data- whole numbers; finite set of values between any two observations
    ○ Continuous data- any numerical value, not just whole numbers; infinite set of values between any two observations
    ○ Interval- an equal interval between each observation, so that not only does summing the data make sense, so do multiplication and other more complex numerical calculations
        ○ SAT scores
    ○ Ratio data- numerical data with an equal and definitive ratio between each data point; the absolute "zero" in ratio data is the point of origin
        ○ Height, weight

Slide 28: Tide
○ Numerical data
    ○ Price
    ○ Average review rating
    ○ In stock- quantity on hand
○ Categorical data
    ○ Product name
    ○ Flag for Prime
    ○ In stock- a flag for yes or no

Additional ways to classify data
○ String (text, short text, or alphanumeric) / date / geographic

Preparing data for analysis (4 steps of inputting data)
○ Ensure data quality (check)
    ○ Make sure that the data types for each attribute are appropriate
    ○ Check all numerical data types
    ○ Check all text data types
○ Validate the data (compare)
    ○ Compare the number of records that were extracted to the number of records in the source data
    ○ Compare descriptive statistics for numerical fields and investigate the outliers
    ○ Validate date/time fields
    ○ Compare string limits for text fields
○ Cleanse the data (remove/correct/format/determine)
    ○ Three choices regarding missing values
        ○ Leave the missing values as is
        ○ Remove the records that have missing values
        ○ Impute values to replace the missing values
○ Perform preliminary exploratory analysis (determine/ensure)

Enterprise system (also called an enterprise resource planning (ERP) system)
○ A company's information systems, interconnected with each other
○ Centralized database

Data ethics- refers to the moral responsibility associated with gathering, using, and protecting personally identifiable information

Business advantages of a relational database
○ Increase
    ○ Flexibility
    ○ Scalability and performance
    ○ Information integrity
    ○ Information security
○ Decrease
    ○ Information redundancy

The cost of accurate and complete information
○ 1. Complete but with known errors
○ 2. Perfect information, pricey
○ 3. Not very useful, may be a prototype only
○ 4. Very incomplete but accurate

Is the company handling data ethically?
○ Does the company send a privacy notice to individuals when their personal data are collected?
○ Do the company's third-party data providers follow ethical practices when gathering and sharing sensitive data?
○ If credit card info is taken, what assurance do customers have that their credit card number will be protected?
○ Does the company keep the data secure and private, and does it have safeguards in place to protect the data?
○ Has the company established effective practices to mitigate the risks of data misuse?
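
The three choices for missing values listed under "Cleanse the data" can be illustrated with a short Python sketch. This is only a minimal example, not part of the course material: the `orders` records, the field names, and the helper functions are hypothetical, and imputing with the mean is just one possible imputation choice.

```python
from statistics import mean

# Hypothetical order records (assumption for illustration); None marks a missing amount.
orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 40.0},
    {"order_id": 4, "amount": 30.0},
]


def leave_as_is(records):
    """Choice 1: keep every record, missing values included."""
    return [dict(r) for r in records]


def remove_missing(records, field):
    """Choice 2: drop the records whose value for `field` is missing."""
    return [dict(r) for r in records if r[field] is not None]


def impute_mean(records, field):
    """Choice 3: replace missing values with the mean of the observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    fill_value = mean(observed)
    result = []
    for r in records:
        copy = dict(r)
        if copy[field] is None:
            copy[field] = fill_value
        result.append(copy)
    return result


kept = leave_as_is(orders)
dropped = remove_missing(orders, "amount")
imputed = impute_mean(orders, "amount")

print(len(kept), len(dropped), len(imputed))
print(imputed[1])
```

Each helper returns copies, so the source records stay untouched. That makes it easy to compare the record count before and after cleansing, which ties back to the "Validate the data" step.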
PivotTables
○ A tool that allows you to summarize large amounts of data
○ Powerful
○ Beautiful
○ Fast
○ Accurate
○ Flexible
○ Rows- categories that go down the left side
○ Columns- categories that go across the top of a pivot table
○ Filters- lets you filter the whole pivot table by the categories
○ Values- contains the data you want to summarize

Tools used
○ Collect data
    ○ SurveyMonkey
    ○ Qualtrics
○ Analyze data
    ○ Excel PivotTables
    ○ Excel functions
    ○ Excel Analysis ToolPak
○ Reporting
    ○ Excel
    ○ Tableau
    ○ Power BI

Data > Analysis > Data Analysis > Descriptive Statistics
○ Simplifies large data sets and derives meaningful patterns and insights

Population vs Sample
○ Population: a group with something in common
    ○ Parameter: characteristic of a population
    ○ Ex: survey all restaurants
○ Sample: a subset of a population
    ○ Statistic: characteristic of a sample
○ Descriptive- measures that describe a population/sample
○ Inferential- measures calculated only using a sample of the desired population
    ○ Hypothesis- proposed explanation made based on limited evidence as a starting point for further investigation
    ○ Used to make inferences (conclusions about the characteristics of a population)
    ○ Ex: survey of selected restaurants

Statistical Techniques for Descriptive Analytics
○ Counts- determine frequency of occurrence
○ Graphs: bar charts, histograms- provide summary visualization
○ Minimums, maximums, medians, standard deviations- provide summary measures of dispersion
○ Pivot tables- provide a flexible way to summarize large amounts of data
○ Ratios- compare two numbers
○ Totals, sums, averages, subtotals- provide summary measures of performance

Sampling Methods
○ Simple random sampling
    ○ Every observation in the population has an equal chance of being selected into the sample
    ○ Effective if you hope to gather a representative sample of the entire population and you're not concerned about selecting a particular subset, and/or when the entire population is relatively homogeneous (similar)
○ Stratified random sampling: includes all groups (strata)
    ○ Used when the population has clear groups that you want to ensure have adequate representation in your sample
    ○ Divide the population into groups (called strata)
    ○ Calculate the proportion of the population that each group makes up
    ○ Perform a random sample of each group to ensure the appropriate number of each group (stratum) is represented in the overall sample
○ Cluster sampling: select a few groups (clusters)
    ○ Divide the population into groups (clusters) and calculate the proportion of the population that each group makes up
    ○ Used when only a select few clusters are pertinent to the study, yet you still want the distribution across the groups (clusters) to be proportionate
○ Convenience/non-probability sampling
    ○ Convenience sampling is a method of collecting data from convenient, easy-to-access data points
    ○ Not recommended because it may not create a sample that is representative of the population, but it is used when time is limited and budgets are low
    ○ Takes two forms
        ○ If the data still needs to be collected, you distribute the survey digitally or in paper format and collect the data once responses are received
    ○ A common method for data reduction is filtering

Measures of central tendency
○ Mean: average = sum/n
○ Median: midpoint of the data distribution
○ Mode: most common observation in a data set
○ Kurtosis: distribution shape (data concentrated in the center or in the tails)
○ Symmetry: mean = median = mode
○ Skewness

How dispersed the data set is
○ Range: maximum - minimum
○ Interquartile range: the data are split into 4 quartiles; the interquartile range is the spread of the middle 50% (third quartile - first quartile)
○ Variance: average of squared deviations from the mean
○ Standard deviation (variability of data from the mean): square root of the variance, same unit as the data values

Normal distribution
○ Empirical rule- in a normal distribution, nearly all of the data (about 99.7%) fall within three standard deviations of the mean; about 68% fall within one standard deviation and about 95% within two

Slide 130
○ People retain 95 percent of visual messages compared to 10 percent of text messages
○ Aka infographics (information graphics)

Data visualization
○ Charts and reports are not the data; they are representations of the data
○ Chart is a general (broader) term. Graphs, tables, and diagrams are types of charts
○ A good visualization of our findings follows ASK
    ○ It is Accurate
    ○ Tells a clear Story
    ○ Adds Knowledge
○ Tools
    ○ Excel
        ○ Add a chart to numerical analysis
    ○ Power BI & Tableau
        ○ Large datasets
        ○ Blend/merge datasets
        ○ Charts
        ○ Dashboards
        ○ More intuitive in visualization

Questions to ask when deciding on a chart
○ What business question are you trying to answer?
○ Who is the target audience for the visualization?
○ Which type of data is being visualized, categorical or numerical?
○ What type of analysis have you performed, exploratory or confirmatory?

Exploratory Visualization
○ Initial descriptive and diagnostic analytics
○ Understand historic data to generate questions and hypotheses
○ Charts
    ○ Bar charts
    ○ Line charts: time series data
        ○ Sort chronologically
        ○ Show the trend, not precise values
    ○ Pie charts

Charts for Categorical Data
○ Bar charts
    ○ Proportion of each category
    ○ Easier to interpret than a pie chart
○ Pie chart
    ○ Proportion of each category
    ○ 5 or fewer categories
○ Tree map
    ○ Strengths: patterns, outliers, many categories
    ○ Weaknesses: not exact proportion or number
○ Symbol map
    ○ Strengths: proportions across geographical areas
    ○ Weaknesses: not exact proportion or number
○ Heat map
    ○ Strengths: visualize data across multiple categories, patterns, correlation
    ○ Weaknesses: not exact proportion or number

Charts for numerical data
○ Box and whisker plots and histograms
    ○ Distribution and outliers
○ Line charts
    ○ Trends, data changes
○ Scatterplots
    ○ 2 variables, each on its own axis; correlation
○ Filled geographic maps
    ○ How values differ across geographic regions

Histograms vs bar charts
○ Histogram
    ○ Represents bins/intervals
    ○ Uses numerical data
    ○ No gaps
    ○ Y-axis is the frequency/count of observations
○ Bar charts
    ○ Represent categories
    ○ Gaps
    ○ Y-axis can be a variety of statistics

Summary of chart types
○ Conceptual (qualitative)
    ○ Comparison
        ○ Bar chart
        ○ Pie chart
        ○ Stacked bar chart
        ○ Tree map
        ○ Heat map
    ○ Geographic data
        ○ Symbol map
    ○ Text data
        ○ Word cloud
○ Data driven (quantitative)
    ○ Outlier detection
        ○ Box and whisker plot
    ○ Data changes or trends over time
        ○ Line chart
    ○ Possible relationship between two variables, line of best fit
        ○ Scatterplot
    ○ Geographic data
        ○ Symbol map
        ○ Filled (geographic) map

Dashboards
○ Tells a business's life story
○ Combines multiple visualizations, tables, and key performance measures
○ Static versus interactive- interactive provides user-specified filters to drill down, up, and through data

How charts lie
○ Ways a chart might lie, confuse, manipulate, and mislead
    ○ Poor design
    ○ Use of an incorrect/inappropriate amount of data
    ○ Suggestion of misleading patterns
    ○ Support for pre-existing desired outcomes, opinions, or assumptions
    ○ Unclear communication of uncertainty

Best practices for Effective Charts
○ Select the best chart to communicate results
○ Use a legend for multiple colors/line types
○ Use bright colors to differentiate/highlight
○ Answer one question per chart
○ Use a natural, consistent interval scale
○ Use colors according to generally accepted interpretations
○ Include title, labels, tick marks
○ Include all relevant data to inform but not overwhelm

Tableau
○ Is a business intelligence and data visualization tool
○ With an intuitive interface we can get insights from data, find hidden trends, and make business decisions
○ Like pivot tables, it has drag-and-drop features
○ Used for large data sets
○ The desktop version provides links to external data
○ She's mainly going to ask about its layout and usage
○ Like the pivot table in Excel, you can drag and drop variables onto the rows, columns, or filter shelves
○ Numerical data are usually called Measures
○ Categorical data are usually called Dimensions (labels)
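
As a worked illustration of the measures of central tendency and dispersion covered in this guide, the sketch below computes them with Python's standard `statistics` module. The `scores` list is made-up data, not from the course. The population formulas (`pvariance`, `pstdev`) are used to match the "average of squared deviations" definition given in these notes; tools such as Excel's Descriptive Statistics report the sample versions, so their numbers may differ slightly.

```python
import statistics

# Hypothetical exam scores (assumption for illustration only).
scores = [70, 75, 80, 80, 85, 90, 95]

# Measures of central tendency
mean = statistics.mean(scores)      # sum / n
median = statistics.median(scores)  # midpoint of the sorted data
mode = statistics.mode(scores)      # most common observation

# Measures of dispersion
data_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)  # average of squared deviations from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance

# Quartile cut points and the interquartile range (Q3 - Q1)
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Empirical rule check: observations within three standard deviations of the mean.
# A 7-point sample cannot show a distribution is normal; this only shows the arithmetic.
within_3sd = [x for x in scores if abs(x - mean) <= 3 * std_dev]

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} variance={variance:.2f} std_dev={std_dev:.2f} iqr={iqr}")
print(f"{len(within_3sd)} of {len(scores)} observations fall within 3 standard deviations")
```

The median and mode coincide here, but the mean is slightly higher because the larger scores pull it upward. That gap is the kind of asymmetry that skewness describes.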
