Questions and Answers
In a database, what is the primary purpose of a foreign key?
- To establish relationships or links between two tables. (correct)
- To automatically generate unique identifiers for records.
- To enforce data type constraints across multiple tables.
- To uniquely identify each record within its own table.
Which of the following is an example of publicly available external data that a company might use for analysis?
- Customer purchase histories with personally identifiable information removed.
- Internal sales records from the past decade.
- Employee performance reviews from the human resources department.
- Financial statements of all publicly traded companies. (correct)
A small business is looking to understand demographic trends in their local market. Which external data source would be most relevant?
- Stock price data.
- Census data. (correct)
- Summarized financial data.
- Social media data.
In a relational database containing customer and transaction information, what is the most likely purpose of the CustomerID field in the Customer table?
If a table contains a 'Transaction_ID' field, what role does this field most likely play in the table's structure?
Which of the following datasets is best categorized as ordinal data?
Why is it more meaningful to calculate the ratio between two data points in a ratio scale compared to an interval scale?
A dataset includes transaction dates of purchases made at a store. What type of data is this, and how can it be summarized?
Which of the examples below could be classified as interval data?
In what manner can ordinal data be effectively summarized and presented?
Which activity is crucial in preparing data for analysis, ensuring its quality and suitability for producing reliable insights?
A retail company wants to understand customer purchasing patterns. Which type of internal data would be MOST relevant?
A marketing team aims to improve the effectiveness of their campaigns. How could they ethically use personally identifiable information (PII)?
A researcher is studying climate change and needs historical weather data. What type of data source would be MOST suitable?
When dealing with structured data, which format facilitates efficient analysis and integration with various software tools?
A company wants to integrate data from different departments. Which challenge is MOST likely to arise regarding data formats?
During which stage of the SOAR analytics model is a specific business problem or area of interest clearly defined?
What ethical consideration is paramount when collecting and using personally identifiable information (PII) for business analytics?
Which data format is best suited for storing large datasets that exceed the row and column limits of spreadsheet software?
An analyst needs to perform custom calculations and transformations on a dataset. Which type of data would be most suitable?
What is the primary characteristic of aggregated data?
A social media analyst wants to collect a large number of posts to identify trending topics. Which data format is the most likely source?
Which of the following is a key advantage of using .csv files for data storage?
In what scenario would using aggregated data be more appropriate than raw data?
Which file format is generally preferred when transferring data between different software applications?
A researcher wants to study individual survey responses to identify nuanced patterns. Which format is optimal?
What is the primary restriction regarding the use of data preparation materials provided by McGraw Hill LLC?
An instructor wants to share data preparation tools sourced from McGraw Hill LLC with a colleague at another institution. What guideline applies?
Which activity concerning McGraw Hill LLC data preparation tools is permissible without violating copyright restrictions?
A university department chair wants to include McGraw Hill LLC’s data preparation tools in a resource pack distributed to all incoming students. What step must they take to comply with copyright regulations?
An instructor finds data analysis tools prepared by McGraw Hill LLC. Under what conditions can they share these tools on a restricted-access course website for their students?
Which type of data is best described as highly organized and easily fitting into a traditional database?
A company is analyzing customer feedback from social media posts. What type of data are they primarily working with?
Which of the following data types combines elements of both structured and unstructured formats?
A retail company wants to analyze both its sales transactions (quantifiable) and customer reviews (subjective). What approach to data analysis would best leverage both?
Why is recognizing the variety of data important for data-driven decision-making?
Considering the 'Four V's' of Big Data to assess Tesla's financial statements, which V relates to the frequency with which the statements are released?
A company aims to improve its marketing strategy by analyzing customer data. They have transactional data (structured) and customer service call transcripts (unstructured). How should they approach this?
Which of the following exemplifies how unstructured data enhances the insights gained from structured data in a business context?
Flashcards
Internal Data Sources
Data originating from within an organization.
External Data Sources
Data obtained from sources outside the organization.
Structured Data
Data organized in a predefined format, often in tables with rows and columns.
Data Preparation
Common Data Formats
Data Ethics
Personally Identifiable Information (PII)
SOAR Analytics Model
Primary Key
Foreign Key
Social Media Data
Census Data
Financial Statements
Internal Data
External Data
Data Collection
Text Data
Tabular Data
.CSV Files
How .CSV Stores Data
Aggregated Data
Raw Data
Why prefer raw data?
Analyst prefers raw data when?
Semi-structured Data
Database of Customer Orders
Blogs
Tweets
Pictures
Categorical Data
Ordinal Data
Ratio Data
Summarizing Ordinal Data
Study Notes
- Chapter 2 focuses on how to obtain data for business analytics, including various internal and external sources, data formats, types of structured data, and ethical considerations.
- The SOAR Analytics Model includes Specifying the Question, Obtaining the Data, Analyzing the Data, and Reporting the Results.
Internal and External Data Sources
- Internal data sources come from within the organization, while external data sources come from outside the organization.
- Enterprise Systems (ERP) are interconnected information systems with a centralized database, enabling data sharing across departments; examples include SAP, Oracle, and Workday.
- Relational databases store data efficiently in tables.
- Each table is organized into fields and records: fields are columns containing descriptive information, and records are rows representing unique instances.
- Common fields, Primary Keys, and Foreign Keys connect individual tables in a relational database.
- A primary key uniquely identifies each record in a table (e.g., TransactionID in a Transaction table).
- A foreign key is a field in one table that refers to the primary key of another table, creating a relationship or link between the two tables.
- External data sources include social media data, census data, Small Business Administration data, publicly available financial statements, and stock price data.
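The primary key and foreign key roles described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed schema: the table names, field names, and values are hypothetical, and the transaction table is called SalesTransaction only because TRANSACTION is a reserved word in SQLite.

```python
import sqlite3

# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,   -- primary key: unique per customer record
        Name       TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE SalesTransaction (
        TransactionID INTEGER PRIMARY KEY,   -- primary key of this table
        CustomerID    INTEGER NOT NULL
                      REFERENCES Customer(CustomerID),   -- foreign key to Customer
        Amount        REAL NOT NULL
    )
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Avery"), (2, "Blake")])
conn.executemany(
    "INSERT INTO SalesTransaction VALUES (?, ?, ?)",
    [(101, 1, 25.0), (102, 1, 40.0), (103, 2, 15.5)],
)

# The foreign key lets each transaction be joined to its customer.
rows = conn.execute("""
    SELECT c.Name, t.TransactionID, t.Amount
    FROM SalesTransaction AS t
    JOIN Customer AS c ON c.CustomerID = t.CustomerID
    ORDER BY t.TransactionID
""").fetchall()

# A transaction that points at a nonexistent customer is rejected.
try:
    conn.execute("INSERT INTO SalesTransaction VALUES (104, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here CustomerID is the primary key in Customer and a foreign key in SalesTransaction, which is what allows the join.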
The Four V's of Big Data
- Big data is defined by Volume, Velocity, Variety, and Veracity.
- Volume refers to the amount of data.
- Velocity refers to the speed of generation and rate of analysis.
- Variety refers to the different types of data, and Veracity refers to the trustworthiness of the data.
Variety of Data
- Structured data is highly organized and fits neatly in a table or database.
- Unstructured data lacks a predefined organization.
- Semi-structured data contains elements of both structured and unstructured data.
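As a rough illustration of semi-structured data, a JSON record can mix named, typed fields with free text. The record below is hypothetical, and treating the numeric fields as the "structured" part is a simplification made for this sketch.

```python
import json

# Hypothetical record: named numeric fields (structured) plus free text (unstructured).
raw = '{"order_id": 1001, "rating": 4, "review": "Arrived late, but the quality is great."}'
record = json.loads(raw)

# Fields with numeric values would fit neatly into table columns.
structured_part = {k: v for k, v in record.items() if isinstance(v, (int, float))}

# The review text has no predefined organization and needs text analysis.
unstructured_part = record["review"]
```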
Obtaining the Data
- Data can be text (tweets, hashtags, documents) or tabular (structured into rows and columns).
- Data is generally delivered as comma-separated values files with a ".csv" extension.
- .csv files store data as text but can be converted to a tabular format.
- .csv files are not subject to Excel's worksheet limits (1,048,576 rows by 16,384 columns), so they can hold Big Data that exceeds Excel's size limit.
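A short sketch of how .csv text becomes a table, using Python's standard csv module. The file contents are hypothetical and held in a string so the example is self-contained; a real file would be opened with open(path, newline="").

```python
import csv
import io

# Hypothetical file contents: a header line followed by one line per record.
csv_text = "TransactionID,CustomerID,Amount\n101,1,25.00\n102,1,40.00\n103,2,15.50\n"

reader = csv.DictReader(io.StringIO(csv_text))
table = [row for row in reader]   # each row is a dict keyed by the header fields

# Every value is read as text, so numeric fields must be converted explicitly.
amounts = [float(row["Amount"]) for row in table]
total = sum(amounts)
```

Because the values arrive as text, converting data types is usually one of the first preparation steps.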
Level of Aggregation
- You can receive aggregated or raw data.
- Aggregated data has already been processed and summarized (for example, totals per customer instead of individual transactions).
- Raw data provides the analyst with flexibility to process data as needed.
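The difference between the two levels can be sketched as follows. The customer IDs and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data: one row per transaction (customer, amount).
raw_transactions = [("C1", 25.0), ("C1", 40.0), ("C2", 15.5), ("C2", 4.5)]

# Aggregated view: one total per customer. The individual rows are no longer visible.
totals = defaultdict(float)
for customer, amount in raw_transactions:
    totals[customer] += amount

# With the raw rows, the analyst can still compute something else,
# such as the largest single sale.
largest_sale = max(amount for _, amount in raw_transactions)
```

Someone who receives only the totals cannot recover the largest single sale, which is the flexibility that raw data preserves.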
Structured Data Types
- Structured data can be categorical or numerical.
- Categorical data is represented by words and is used to categorize items, such as gender or transaction type.
- Nominal data is categorical data that cannot be ranked.
- Ordinal data allows for ranking and sorting.
- Numerical data consists of numbers that carry quantitative meaning, so arithmetic on them makes sense.
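One common way to work with ordinal data is to sort it by rank and summarize it with frequency counts, the median, and the mode. The sketch below assumes a hypothetical four-point survey scale; the labels and responses are made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses on an ordered scale.
scale = ["Poor", "Fair", "Good", "Excellent"]   # lowest to highest
responses = ["Good", "Poor", "Excellent", "Good", "Fair", "Good", "Excellent"]

# Sort by position on the scale, not alphabetically.
rank = {label: i for i, label in enumerate(scale)}
ordered = sorted(responses, key=rank.__getitem__)

counts = Counter(responses)                    # frequency of each category
median_response = ordered[len(ordered) // 2]   # middle value (odd number of responses)
mode_response = counts.most_common(1)[0][0]    # most frequent category
```

A mean is deliberately not computed here, because the distances between ordinal categories are not defined.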
Numerical Data
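- In interval data, the intervals between values are equal, but there is no absolute zero, so differences are meaningful while ratios are not (e.g., temperature in degrees Celsius).
- Ratio data has equal intervals and an absolute zero, so the ratio between two data points is meaningful (e.g., a sales amount of 200 is twice a sales amount of 100).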
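A small numeric sketch of why ratios are meaningful only on a ratio scale. The temperatures and sales figures are hypothetical; the Celsius to Kelvin offset of 273.15 is the standard conversion.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful.
c_low, c_high = 10.0, 20.0
celsius_ratio = c_high / c_low    # 2.0, but 20 degrees C is not "twice as hot"

# The same temperatures in Kelvin, which has an absolute zero.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low     # about 1.035

# Ratio scale: sales amounts have a true zero, so the ratio is meaningful.
sales_a, sales_b = 100.0, 200.0
sales_ratio = sales_b / sales_a   # 2.0: twice the sales
```

The Celsius ratio changes when the zero point moves, while the sales ratio does not depend on any arbitrary zero.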
Preparing Data for Analysis
- The data must be validated for completeness and integrity.
- The data must be cleansed, for example with trim and clean functions (such as Excel's TRIM and CLEAN).
- Preliminary exploratory analysis is required.
Preparing Data and Ensuring Quality
- Data quality should be ensured by validating data types, completeness, and consistency.
- Data completeness is the degree to which all required data is present.
- Data integrity refers to the accuracy and consistency of data over its lifecycle.
- Data cleansing involves removing headings, subtotals, leading zeroes, and formatting inconsistencies.
- Missing values can be left as they are, removed, or replaced with imputed values.
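The cleansing steps above might look like this in Python. It is a sketch under stated assumptions: the rows are hypothetical, the subtotal row is identified by its label, and mean imputation is only one of the options listed for missing values.

```python
from statistics import mean

# Hypothetical rows as they might arrive from an export: stray whitespace,
# a subtotal row, and a missing amount.
rows = [
    {"customer": "  Avery ", "amount": "25.00"},
    {"customer": "Blake", "amount": ""},
    {"customer": "Subtotal", "amount": "25.00"},
    {"customer": "Casey\t", "amount": " 15.00"},
]

cleaned = []
for row in rows:
    name = row["customer"].strip()            # similar in spirit to a trim function
    if name.lower() == "subtotal":            # drop subtotal rows
        continue
    text = row["amount"].strip()
    amount = float(text) if text else None    # None marks a missing value
    cleaned.append({"customer": name, "amount": amount})

# One option for missing values: impute the mean of the observed amounts.
observed = [r["amount"] for r in cleaned if r["amount"] is not None]
fill = mean(observed)
imputed = [
    {**r, "amount": fill if r["amount"] is None else r["amount"]} for r in cleaned
]
```

Leaving the None in place or dropping the row are the other two options; which one is appropriate depends on the analysis.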
Tools for Data Prep
- Common software tools for data preparation and analysis include Microsoft Excel, Tableau, Microsoft Power BI, Microsoft Power Query, Tableau Prep, Alteryx, Open-source Tools (with Tableau and Excel), Gretl, R, and Python.
Ethical Data Handling
- Data ethics involves the moral responsibility for gathering, using, and protecting personally identifiable information.
- Companies gathering data must send privacy notices and offer opt-out options to individuals.
- Safeguards are a necessity to protect sensitive data, and effective practices mitigate misuse risks.