Questions and Answers
What is a primary characteristic of structured data?
Which of the following statements best defines data integrity?
What does data granularity refer to in the context of data readiness for analytics?
Which of the following accurately describes unstructured data?
What is one of the critical aspects related to data security according to the nature of data?
Which of the following best describes the primary difference between nominal and ordinal data?
What is a defining feature of ratio data compared to interval data?
Which of the following processes is NOT typically included in data preprocessing?
In the context of categorical data, what is the primary characteristic of ordinal variables?
What problem does data reduction mainly address during data preprocessing?
Study Notes
Business Intelligence, Analytics, and Data Science: A Managerial Perspective - Chapter 2 Summary
- Data: A collection of facts, usually obtained from experiences, observations, or experiments. Data can be numbers, words, images, etc. It's the fundamental building block for information and knowledge. Data quality and integrity are critical for analysis. Data integrity includes accuracy, completeness, consistency, and validity of an organization's data.
Nature of Data
- Data types include structured, unstructured, and semi-structured.
- Structured data: Standardized format, well-defined structure, follows a consistent order, and easily accessed. Examples include names, dates, addresses, credit card numbers, stock information, geolocation. Usually easy for machines to read and process.
- Unstructured data: Any combination of textual, imagery, voice, and web content—not organized in a predefined format.
- Semi-structured data: Data that doesn't conform to a rigid schema but has certain organizational elements (like tags or labels). Examples include XML, JSON, HTML, log files.
- Data is categorized by properties:
- Categorical data: Represents different types or groups.
- Nominal: Labels without quantitative value or inherent order (e.g., male/female, hair color, nationality).
- Ordinal: Variables have an order, but the differences between the values are not meaningful. Examples include Likert scales.
- Numerical data: Represents values that can be measured and ordered.
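- Interval: Values are ordered and the differences between them are meaningful, but there is no true zero point (e.g., temperature in degrees Celsius, calendar years). Grouping people as teenager, youth, and so on is ordinal rather than interval.
- Ratio: Values are ordered, differences between them are meaningful, and there is a meaningful zero point, so ratios can be compared (e.g., income, height, weight).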
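The four measurement scales differ mainly in which operations make sense on them. The following is a minimal sketch of that idea, assuming small made-up samples; the variable names and values are hypothetical and not taken from the chapter.

```python
from statistics import mode

# Hypothetical sample values, invented purely for illustration.
hair_color = ["brown", "black", "brown", "blond"]            # nominal
satisfaction = ["low", "high", "medium", "high", "medium"]   # ordinal
temperature_c = [10.0, 20.0, 30.0]                           # interval
income = [20000.0, 40000.0, 60000.0]                         # ratio

# Nominal: categories have no order, so counting and the mode are meaningful.
most_common_color = mode(hair_color)

# Ordinal: categories can be ranked, so a middle (median) category is
# meaningful, but the distance between ranks is not.
rank = {"low": 0, "medium": 1, "high": 2}
ordered = sorted(satisfaction, key=rank.get)
median_satisfaction = ordered[len(ordered) // 2]

# Interval: differences are meaningful, ratios are not
# (0 degrees Celsius is an arbitrary zero).
temperature_gap = temperature_c[1] - temperature_c[0]

# Ratio: a true zero point makes ratios meaningful.
income_ratio = income[1] / income[0]

print(most_common_color, median_satisfaction, temperature_gap, income_ratio)
```

Saying that 20 degrees Celsius is "twice as warm" as 10 degrees would be misleading, whereas an income of 40,000 really is twice an income of 20,000.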
Metrics for Analytics-Ready Data
- Data source reliability: Is the source dependable?
- Data content accuracy: Are the data values correct and well matched to the task/job?
- Data accessibility: Is the data easy to access when needed?
- Data security and data privacy: Who has access to the data and under what conditions?
- Data richness: Is the data complete or near complete?
- Data consistency: Is the data collected and combined accurately?
- Data currency/timeliness: Is the data up-to-date, recent, and new?
- Data validity: Do the actual values match the values expected for each variable (e.g., valid ranges or codes)?
- Data relevancy: Are the variables in the data relevant to the study?
- Data granularity: Are the variables and data defined at the lowest level of detail possible?
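Some of these metrics can be turned into simple numeric checks. The sketch below is one possible way to score completeness (as a proxy for richness), validity, and currency. The records, field names, valid age range, and the 365-day threshold are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical records, invented for illustration; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "updated": date(2024, 5, 1)},
    {"customer_id": 2, "age": None, "updated": date(2021, 1, 15)},
    {"customer_id": 3, "age": 210, "updated": date(2024, 4, 20)},
    {"customer_id": 4, "age": 51, "updated": date(2024, 3, 9)},
]

def completeness(rows, field):
    """Share of rows where `field` is present (a proxy for data richness)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field, low, high):
    """Share of non-missing values inside the expected range [low, high]."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(low <= v <= high for v in values) / len(values)

def currency(rows, field, as_of, max_age_days):
    """Share of rows updated within `max_age_days` of the `as_of` date."""
    return sum((as_of - r[field]).days <= max_age_days for r in rows) / len(rows)

as_of = date(2024, 6, 1)  # fixed reference date so the result is reproducible
age_completeness = completeness(records, "age")              # 3 of 4 ages present
age_validity = validity(records, "age", 0, 120)              # 2 of 3 ages plausible
update_currency = currency(records, "updated", as_of, 365)   # 3 of 4 rows recent

print(age_completeness, round(age_validity, 2), update_currency)
```

Metrics such as source reliability, relevancy, and security are harder to reduce to a single number and usually need human judgment.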
Data Preprocessing
- Real-world data is often not ready for analysis because it is dirty, misaligned, overly complex, or inaccurate.
- Data preprocessing techniques are needed to ready the data for analysis, including data consolidation, cleaning, transformation, and reduction.
- Data reduction: Focuses on reducing the dimensionality (number of features) or the volume of data.
- Variable selection: Selecting relevant variables from a large set.
- Dimensional reduction: Reducing the number of variables.
- Sampling: Selecting a representative subset of data.
- Balancing/stratification: Reducing imbalances in data sets. (e.g., stratified sampling)
- Discretization: Converting continuous data into discrete intervals or categories (e.g., age groups, income brackets).
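- Data normalization: In preprocessing, this usually means rescaling numeric variables to a common range (e.g., 0 to 1) so that variables with large magnitudes do not dominate the analysis. This differs from database normalization, which reorganizes tables to remove redundant information.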
- Creating attributes: Defining new attributes that capture additional information (e.g., priority levels).
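Three of the preprocessing steps above can be sketched in a few lines: discretization, rescaling to a common range, and stratified sampling. This is a minimal illustration, not a reference implementation. The helper names, cut points, labels, and sample data are hypothetical, and min-max scaling is assumed here as one common form of normalization.

```python
import random
from bisect import bisect_right
from collections import defaultdict

def discretize(value, cut_points, labels):
    """Map a continuous value to the label of the interval it falls in.

    `cut_points` must be sorted; there is one more label than cut points.
    """
    return labels[bisect_right(cut_points, value)]

def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # all values equal: nothing to spread out
    return [(v - low) / (high - low) for v in values]

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction from every group defined by `key`."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data, invented for illustration.
ages = [15, 27, 42, 68]
labels = ["minor", "young adult", "middle-aged", "senior"]
age_groups = [discretize(a, [18, 40, 65], labels) for a in ages]

scaled_income = min_max_scale([20000, 50000, 80000])

customers = [{"id": i, "churned": "yes" if i < 2 else "no"} for i in range(10)]
half = stratified_sample(customers, "churned", 0.5)
churned_in_sample = sum(c["churned"] == "yes" for c in half)

print(age_groups, scaled_income, len(half), churned_in_sample)
```

Because the sample is drawn per group, the 20% share of churned customers in the full set is preserved in the sample, which a simple random draw of five rows would not guarantee.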
Statistical Modeling for Business Analytics
- Descriptive Analytics: Describes the data as it is (e.g., using mean, median, mode, histograms, skew, kurtosis).
- Predictive Analytics: Focuses on understanding trends and patterns to predict future outcomes.
- Prescriptive Analytics: Recommends actions based on predictions.
- Statistics: A collection of mathematical techniques to characterize and interpret data.
- Descriptive statistics: Describing the data (mean, median, mode, measures of dispersion, histograms, skew, kurtosis). Common measures include:
- Arithmetic Mean
- Median
- Mode
- Range
- Variance
- Standard Deviation
- Mean Absolute Deviation (MAD)
- Inferential statistics: Drawing inferences about the population based on sample data (e.g., regression modeling).
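The descriptive measures listed above can be computed with Python's standard `statistics` module. The sketch below uses a small made-up data set and treats it as a full population (hence `pvariance` and `pstdev`); for a sample, `variance` and `stdev` would be the usual choice.

```python
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical observations, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

avg = mean(data)                      # arithmetic mean
med = median(data)                    # middle value of the sorted data
most_frequent = mode(data)            # most common value
value_range = max(data) - min(data)   # spread between the extremes
variance = pvariance(data)            # population variance
std_dev = pstdev(data)                # population standard deviation
# Mean absolute deviation: average distance of each value from the mean.
mad = sum(abs(x - avg) for x in data) / len(data)

print(avg, med, most_frequent, value_range, variance, std_dev, mad)
```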
Regression Modeling for Inferential Statistics
- Regression: A technique in inferential statistics used to identify relationships between explanatory (input) and response (output) variables. It can be used for hypothesis testing and forecasting.
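As a rough illustration of regression used for forecasting, the sketch below fits a simple linear regression (one explanatory variable) by ordinary least squares. The function name `fit_line` and the advertising and sales figures are hypothetical; the data was chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: explanatory variable (ad spend) and response (sales).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 6.0 + intercept  # predicted sales at an ad spend of 6

print(slope, intercept, forecast)
```

Real data rarely fits a line exactly, so in practice the fit would be accompanied by a measure of how well the line explains the response and by tests of whether the slope differs from zero.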
Business Reporting
- Report: A communication artifact prepared to convey specific information in support of decision-making. Reports fulfill multiple functions: ensuring proper departmental functioning, providing information, providing analysis results, persuading others to act, and creating organizational memory.
Types of Business Reports
- Metric management reports: Use KPIs and metrics to manage business performance.
- Dashboard-type reports: Graphical displays of performance indicators.
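- Balanced scorecard-type reports: Part of a management system that aims to improve business processes by tracking internal and external outcomes, typically across financial, customer, internal business process, and learning and growth perspectives.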
Data Visualization
- The use of visual representations to explore, make sense of, and communicate data. It involves aggregation, summarization, and contextualization of data, and is related to information graphics, scientific visualization, and statistical graphics.
- Uses charts, graphs, illustrations, and similar visual forms.
- Different types of graphs and charts include histograms, pie charts, scatter plots, line graphs.
Performance Dashboards
- Common in BPM software and BI platforms
- Provide visual displays of important information on a single screen, so it can be understood quickly and drilled into for further analysis.
- Effective dashboards present data and exceptions clearly so that they can be assimilated quickly.
- Design elements include prioritized alerts, clear use of visuals, and information layers (monitoring, analysis, and management).
Description
Dive into Chapter 2 of 'Business Intelligence, Analytics, and Data Science: A Managerial Perspective,' where we explore the fundamental concepts of data. Learn about the different types of data, including structured, unstructured, and semi-structured data, and why data quality and integrity are essential for effective analysis.