Chapter 2

Questions and Answers

In a database, what is the primary purpose of a foreign key?

  • To establish relationships or links between two tables. (correct)
  • To automatically generate unique identifiers for records.
  • To enforce data type constraints across multiple tables.
  • To uniquely identify each record within its own table.

Which of the following is an example of publicly available external data that a company might use for analysis?

  • Customer purchase histories with personally identifiable information removed.
  • Internal sales records from the past decade.
  • Employee performance reviews from the human resources department.
  • Financial statements of all publicly traded companies. (correct)

A small business is looking to understand demographic trends in their local market. Which external data source would be most relevant?

  • Stock price data.
  • Census data. (correct)
  • Summarized financial data.
  • Social media data.

In a relational database containing customer and transaction information, what is the most likely purpose of the CustomerID field in the Customer table?

  • To provide a unique identifier for each customer, acting as the primary key. (correct)

If a table contains a 'Transaction_ID' field, what role does this field most likely play in the table's structure?

  • It uniquely identifies each transaction record, serving as the primary key. (correct)

Which of the following datasets is best categorized as ordinal data?

  • Customer satisfaction ratings on a scale of 'Very Unsatisfied,' 'Unsatisfied,' 'Neutral,' 'Satisfied,' and 'Very Satisfied.' (correct)

Why is it more meaningful to calculate the ratio between two data points in a ratio scale compared to an interval scale?

  • Because the zero point in an interval scale is arbitrary, whereas in a ratio scale, it represents a true absence of the quantity being measured. (correct)

A dataset includes transaction dates of purchases made at a store. What type of data is this, and how can it be summarized?

  • Categorical data, which can be summarized by counting the number of transactions per day. (correct)

Which of the examples below could be classified as interval data?

  • Celsius temperature readings. (correct)

In what manner can ordinal data be effectively summarized and presented?

  • Determining proportions, ranking, and using count of groupings. (correct)

Which activity is crucial in preparing data for analysis, ensuring its quality and suitability for producing reliable insights?

  • Data validation. (correct)

A retail company wants to understand customer purchasing patterns. Which type of internal data would be MOST relevant?

  • Sales transaction records. (correct)

A marketing team aims to improve the effectiveness of their campaigns. How could they ethically use personally identifiable information (PII)?

  • Anonymize PII to analyze demographic trends. (correct)

A researcher is studying climate change and needs historical weather data. What type of data source would be MOST suitable?

  • Government databases. (correct)

When dealing with structured data, which format facilitates efficient analysis and integration with various software tools?

  • Spreadsheet files (.csv, .xlsx). (correct)

A company wants to integrate data from different departments. Which challenge is MOST likely to arise regarding data formats?

  • Different departments use incompatible formats. (correct)

During which stage of the SOAR analytics model is a specific business problem or area of interest clearly defined?

  • Specify the Question. (correct)

What ethical consideration is paramount when collecting and using personally identifiable information (PII) for business analytics?

  • Ensuring transparency and obtaining informed consent. (correct)

Which data format is best suited for storing large datasets that exceed the row and column limits of spreadsheet software?

  • .csv (correct)

An analyst needs to perform custom calculations and transformations on a dataset. Which type of data would be most suitable?

  • Raw data (correct)

What is the primary characteristic of aggregated data?

  • It is pre-processed and transformed. (correct)

A social media analyst wants to collect a large number of posts to identify trending topics. Which data format is the most likely source?

  • Text data (correct)

Which of the following is a key advantage of using .csv files for data storage?

  • Ability to handle larger datasets compared to some spreadsheet applications. (correct)

In what scenario would using aggregated data be more appropriate than raw data?

  • When generating a summary report for stakeholders. (correct)

Which file format is generally preferred when transferring data between different software applications?

  • .csv (correct)

A researcher wants to study individual survey responses to identify nuanced patterns. Which format is optimal?

  • Raw survey data with individual responses (correct)

What is the primary restriction regarding the use of data preparation materials provided by McGraw Hill LLC?

  • Reproduction or distribution requires the prior written consent of McGraw Hill LLC. (correct)

An instructor wants to share data preparation tools sourced from McGraw Hill LLC with a colleague at another institution. What guideline applies?

  • Sharing requires prior written consent from McGraw Hill LLC. (correct)

Which activity concerning McGraw Hill LLC data preparation tools is permissible without violating copyright restrictions?

  • Using the tools for personal study and preparation. (correct)

A university department chair wants to include McGraw Hill LLC’s data preparation tools in a resource pack distributed to all incoming students. What step must they take to comply with copyright regulations?

  • Obtain written consent from McGraw Hill LLC before distributing the resource pack. (correct)

An instructor finds data analysis tools prepared by McGraw Hill LLC. Under what conditions can they share these tools on a restricted-access course website for their students?

  • Only after obtaining prior written consent from McGraw Hill LLC. (correct)

Which type of data is best described as highly organized and easily fitting into a traditional database?

  • Structured Data (correct)

A company is analyzing customer feedback from social media posts. What type of data are they primarily working with?

  • Unstructured Data (correct)

Which of the following data types combines elements of both structured and unstructured formats?

  • Semi-structured Data (correct)

A retail company wants to analyze both its sales transactions (quantifiable) and customer reviews (subjective). What approach to data analysis would best leverage both?

  • Combining structured sales data with insights extracted from unstructured customer reviews. (correct)

Why is recognizing the variety of data important for data-driven decision-making?

  • It enables the selection of appropriate analysis methods and tools for each data type. (correct)

Considering the 'Four V's' of Big Data to assess Tesla's financial statements, which V relates to the frequency with which the statements are released?

  • Velocity (correct)

A company aims to improve its marketing strategy by analyzing customer data. They have transactional data (structured) and customer service call transcripts (unstructured). How should they approach this?

  • Use separate tools to analyze each data type and then integrate the findings. (correct)

Which of the following exemplifies how unstructured data enhances the insights gained from structured data in a business context?

  • Analyzing social media comments (unstructured) to understand customer sentiment towards the product. (correct)

Flashcards

Internal Data Sources

Data originating from within an organization.

External Data Sources

Data obtained from sources outside the organization.

Structured Data

Data organized in a predefined format, often in tables with rows and columns.

Data Preparation

The process of validating and cleansing data so that it is ready for analysis.

Common Data Formats

The file formats in which data is most often delivered for processing, such as .csv and .xlsx.

Data Ethics

Principles guiding the ethical collection and use of data, especially personal information.

Personally Identifiable Information (PII)

Information that can identify an individual.

SOAR Analytics Model

A four-step framework for approaching an analytics problem: Specify the question, Obtain the data, Analyze the data, and Report the results.

Primary Key

Unique identifier for each record in a table.

Foreign Key

A key used to link two tables together.

Social Media Data

Data from platforms like Facebook, Twitter, and Instagram.

Census Data

Demographic and economic information collected by the government.

Financial Statements

Reports providing details on a company's assets, liabilities, and equity.

Internal Data

Data found within the company.

External Data

Data from sources outside the company.

Data Collection

The gathering of data for a specific purpose.

Text Data

Unstructured information like tweets or documents.

Tabular Data

Data organized into rows and columns.

.CSV Files

Files where data is separated by commas.

How .CSV Stores Data

Files that store data as text with comma separation.

Aggregated Data

Data already summarized (e.g., averages, sums).

Raw Data

Data in its original, unprocessed form.

Why prefer raw data?

It gives the analyst the flexibility to process the data as they see fit.

Analyst prefers raw data when?

When the analyst wants full control over data processing.

Semi-structured Data

Data that contains elements of both structured and unstructured data.

Database of Customer Orders

A collection of customer purchases, preferences, and details, stored in an organized manner.

Blogs

Informal written records of personal thoughts, opinions, and experiences.

Tweets

Short-form social media posts expressing opinions or sharing experiences.

Pictures

Visual representations captured by cameras or other imaging devices.

Categorical Data

Data that can be sorted into categories.

Ordinal Data

Categorical data implying rank and order.

Ratio Data

Numerical data with equal intervals and an absolute zero point, so that ratios between values are meaningful.

Summarizing Ordinal Data

Summarizing ordinal data involves counting, grouping, proportions, and ranking to understand the distribution and order of categories.

Study Notes

  • Chapter 2 focuses on how to obtain data for business analytics, including various internal and external sources, data formats, types of structured data, and ethical considerations.
  • The SOAR Analytics Model includes Specifying the Question, Obtaining the Data, Analyzing the Data, and Reporting the Results.

Internal and External Data Sources

  • Internal data sources come from within the organization, while external data sources come from outside the organization.
  • Enterprise systems, also called enterprise resource planning (ERP) systems, are interconnected information systems with a centralized database, enabling data sharing across departments; examples include SAP, Oracle, and Workday.
  • Relational databases store data efficiently in tables with columns (fields) and rows (records).
  • Tables are organized into fields and records, where fields are columns containing descriptive information, and records are rows representing unique instances.
  • Common fields, Primary Keys, and Foreign Keys connect individual tables in a relational database.
  • A Primary Key uniquely identifies each record in a table (e.g., TransactionID in a Transaction table).
  • Foreign Key exists to create relationships or links between two tables.
  • External data sources include social media data, census data, Small Business Administration data, publicly available financial statements, and stock price data.
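
The way primary and foreign keys connect tables can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a schema from the chapter: the table names, the Name and Amount columns, and the sample rows are all assumptions. The transaction table is called SalesTransaction because TRANSACTION is a reserved word in SQLite.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,   -- primary key: one unique value per record
        Name       TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE SalesTransaction (
        TransactionID INTEGER PRIMARY KEY,
        CustomerID    INTEGER NOT NULL REFERENCES Customer(CustomerID),  -- foreign key
        Amount        REAL NOT NULL
    )
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO SalesTransaction VALUES (?, ?, ?)",
    [(101, 1, 25.0), (102, 1, 40.0), (103, 2, 15.5)],
)

# The foreign key links each transaction back to exactly one customer record.
rows = conn.execute("""
    SELECT c.Name, t.TransactionID, t.Amount
    FROM SalesTransaction AS t
    JOIN Customer AS c ON c.CustomerID = t.CustomerID
    ORDER BY t.TransactionID
""").fetchall()

# A transaction that points at a nonexistent customer is rejected.
try:
    conn.execute("INSERT INTO SalesTransaction VALUES (104, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here the join recovers the customer name for every transaction, and the failed insert shows the database refusing a foreign key value that has no matching primary key.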

The Four V's of Big Data

  • Big data is defined by Volume, Velocity, Variety, and Veracity.
  • Volume refers to the amount of data.
  • Velocity refers to the speed of generation and rate of analysis.
  • Variety refers to the different types of data, and Veracity refers to the trustworthiness of the data.

Variety of Data

  • Structured data is highly organized and fits neatly in a table or database.
  • Unstructured data lacks a predefined organization.
  • Semi-structured data contains elements of both structured and unstructured data.

Obtaining the Data

  • Data can be text (tweets, hashtags, documents) or tabular (structured in rows and columns).
  • Tabular data is structured into rows and columns.
  • Data is generally delivered as comma-separated values files with a ".csv" extension.
  • .csv files store data as text but can be converted to a tabular format.
  • .csv files don't have the same row and column limits as Excel, which allows them to hold Big Data that exceeds Excel's size limit.
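
As a rough sketch of how comma-separated text becomes tabular data, the example below reads a small .csv payload with Python's standard csv module. The field names and values are made up for illustration; a real file would typically be opened with open(path, newline="") instead of an in-memory string.

```python
import csv
import io

# Hypothetical file contents, held in memory so the example is self-contained.
csv_text = "TransactionID,CustomerID,Amount\n101,1,25.00\n102,1,40.00\n103,2,15.50\n"

reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)        # each row becomes a dict keyed by field name
fields = reader.fieldnames    # the header row supplies the column (field) names

# Every value arrives as text, so numeric fields must be converted explicitly.
amounts = [float(r["Amount"]) for r in records]
total = sum(amounts)
```

Note that the Amount values are strings until converted, which is one reason data types need to be validated after import.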

Level of Aggregation

  • You can receive aggregated or raw data.
  • Aggregated data is already processed and transformed.
  • Raw data provides the analyst with flexibility to process data as needed.
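
The difference between the two levels can be sketched by aggregating a few raw records in Python. The region and amount fields and their values are assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical raw records: one row per transaction.
raw = [
    {"region": "East", "amount": 25.0},
    {"region": "East", "amount": 40.0},
    {"region": "West", "amount": 15.5},
]

totals = defaultdict(float)
counts = defaultdict(int)
for row in raw:
    totals[row["region"]] += row["amount"]
    counts[row["region"]] += 1

# Aggregated view: one row per region with its sum and average.
aggregated = {
    region: {"total": totals[region], "average": totals[region] / counts[region]}
    for region in totals
}
```

Going from raw to aggregated is straightforward, but the reverse is not possible: the individual transactions cannot be recovered from the regional totals, which is why raw data gives the analyst more flexibility.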

Structured Data Types

  • Structured data can be categorical or numerical.
  • Categorical data is represented by words and is used to categorize items, such as gender or transaction type.
  • Nominal data is categorical data that cannot be ranked.
  • Ordinal data allows for ranking and sorting.
  • Numerical data uses meaningful numbers.
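
A short sketch of summarizing ordinal data by counts, proportions, and ranking follows. The satisfaction scale comes from the quiz question earlier in this document; the survey responses themselves are hypothetical.

```python
from collections import Counter

# Ordinal scale: categories with a meaningful order, lowest to highest.
scale = ["Very Unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very Satisfied"]
# Hypothetical survey responses.
responses = ["Satisfied", "Neutral", "Very Satisfied", "Satisfied", "Unsatisfied"]

counts = Counter(responses)
proportions = {level: counts[level] / len(responses) for level in scale}

# Ranking is meaningful for ordinal data: sort by position on the scale.
rank = {level: i for i, level in enumerate(scale)}
ordered = sorted(responses, key=rank.__getitem__)
median_response = ordered[len(ordered) // 2]   # middle value of an odd-length list
```

With nominal data such as transaction type, only the counts and proportions would be meaningful; the sorting and median steps rely on the categories having an order.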

Numerical Data

  • Interval data has equal intervals between values but an arbitrary zero point (e.g., Celsius temperature).
  • Ratio data has an equal and definitive ratio between each data point, with an absolute zero.
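
One way to see why ratios are meaningful only on a ratio scale is to compare Celsius (interval) with Kelvin (ratio). The two temperatures below are arbitrary illustrative values.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful.
c_low, c_high = 10.0, 20.0
naive_ratio = c_high / c_low          # 2.0, yet 20 degrees C is not "twice as hot"

# Kelvin is a ratio scale with an absolute zero, so its ratio is meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
true_ratio = k_high / k_low           # roughly 1.035

# Differences, by contrast, are meaningful on both scales.
same_difference = abs((c_high - c_low) - (k_high - k_low)) < 1e-9
```

The 10-degree difference is identical on both scales, but the ratio changes with the choice of zero point, so only the scale with a true zero supports statements like "twice as much".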

Preparing Data for Analysis

  • The data must be validated for completeness and integrity.
  • The data must be cleansed, for example with Excel's TRIM and CLEAN functions.
  • A preliminary exploratory analysis is required.

Preparing Data and Ensuring Quality

  • Data quality should be ensured by validating data types, completeness, and consistency.
  • Data completeness is the degree to which all required data is present.
  • Data integrity refers to the accuracy and consistency of data over its lifecycle.
  • Data cleansing involves removing headings, subtotals, leading zeroes, and formatting inconsistencies.
  • Missing values can be left as they are, removed, or replaced with imputed values.
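
The cleansing and missing-value steps above might look like the following in plain Python. This is a sketch under stated assumptions: the field names, the "Subtotal" marker, and the sample rows are hypothetical, and mean imputation is just one of the three options listed for missing values.

```python
from statistics import mean

# Hypothetical raw export with a subtotal row, stray whitespace, and a missing value.
raw_rows = [
    {"customer": "  Ana ", "amount": "25.00"},
    {"customer": "Ben", "amount": ""},            # missing amount
    {"customer": "Subtotal", "amount": "25.00"},
    {"customer": "Cy  ", "amount": " 40.00"},
]

# 1. Drop subtotal rows and trim whitespace (similar to a spreadsheet TRIM).
cleaned = [
    {"customer": r["customer"].strip(), "amount": r["amount"].strip()}
    for r in raw_rows
    if r["customer"].strip().lower() != "subtotal"
]

# 2. Check completeness: which records are missing a required value?
missing = [r["customer"] for r in cleaned if r["amount"] == ""]

# 3. Impute the mean of the known amounts for the missing ones.
known = [float(r["amount"]) for r in cleaned if r["amount"] != ""]
imputed = [float(r["amount"]) if r["amount"] else mean(known) for r in cleaned]
```

Whether to leave, remove, or impute a missing value depends on the analysis; imputation keeps the record but introduces an estimated number that should be disclosed.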

Tools for Data Prep

  • Common software tools for data preparation and analysis include Microsoft Excel, Microsoft Power Query, Microsoft Power BI, Tableau, Tableau Prep, Alteryx, and open-source tools such as Gretl, R, and Python.

Ethical Data Handling

  • Data ethics involves the moral responsibility for gathering, using, and protecting personally identifiable information.
  • Companies gathering data must send privacy notices and offer opt-out options to individuals.
  • Safeguards are a necessity to protect sensitive data, and effective practices mitigate misuse risks.
