MIS: Data Management in Operational Systems

AccommodativeTigerSEye

23 Questions

What percentage of critical data in Fortune 1000 companies is likely to be flawed?

Over 25%

Data cleansing is a one-time process.

False

What is the main objective of data cleansing?

To weed out and fix or discard inconsistent, incorrect, or incomplete data

Data cleansing tools and procedures are used to analyze, standardize, correct, match, and ______________ data.

consolidate

When does data cleansing occur during the ETL process?

During the ETL process (typically as part of the Transform step), and again once the data is in the data warehouse

Data quality is essential for effective decision-making.

True

Match the following data cleansing terms with their descriptions:

  • ETL Process = Extract, Transform, Load process
  • Data Cleansing = Process of weeding out and fixing or discarding inconsistent, incorrect, or incomplete data
  • Data Sources = Data warehouses often contain data from several databases, including external sources
  • Outcome = Ideally, scrubbed data is accurate and consistent

What is the outcome of the data cleansing process?

Ideally, scrubbed data is accurate and consistent

What is the primary usage of a data lake?

Analytics applications

Data lakes can only store relational data.

False

What is the estimated cost of low-quality data to U.S. businesses annually?

$600 billion

Data lakes are often associated with __________________ storage.

Hadoop

What is a consequence of low-quality data?

Affects decision-making, especially in advertising strategies

Match the following terms with their descriptions:

  • Dirty Data = Erroneous or flawed data
  • Data Quality = The degree to which data is accurate and complete; low-quality data has a measurable cost to businesses
  • Data Lake = Stores non-relational data for analytics
  • Hadoop = Platform for processing and storing non-relational data

Complete removal of dirty data is always possible.

False

What is the primary purpose of a data lake in terms of data querying?

To provide a smaller dataset for analysis to help answer a business question

What is the primary goal of data quality audits?

To determine the accuracy and completeness of data

Achieving perfect data is possible with unlimited resources.

False

What is the purpose of regular data cleansing processes?

To analyze, standardize, correct, match, and consolidate data

Companies may trade _______________ for completeness in terms of data quality.

accuracy

Match the following data quality characteristics with their definitions:

  • Accuracy = Data is correct
  • Completeness = Data has no blanks
  • Data Quality = The degree to which data is accurate and complete

Low-quality data has no impact on decision-making processes.

False

What is the purpose of specialized software tools in data quality management?

To analyze, standardize, correct, match, and consolidate data

Study Notes

Contact Data in Operational Systems

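  • Standardizing a customer's name in operational systems is crucial: if one system stores "John Smith" and another stores "Smith, John", the two records cannot be reliably matched or consolidated.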
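A minimal Python sketch of name standardization, assuming names arrive as free-text strings and that a "Last, First" entry should be reordered to "First Last". The function `standardize_name` is hypothetical and not taken from any particular data quality tool.

```python
import re

def standardize_name(raw: str) -> str:
    """Hypothetical normalizer: trim, collapse spaces, reorder 'Last, First', title-case."""
    name = re.sub(r"\s+", " ", raw).strip()       # collapse runs of whitespace
    if "," in name:                               # assume "Last, First" ordering
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name.title()                           # consistent capitalization

records = ["  john   SMITH ", "Smith, John", "JOHN smith"]
print({standardize_name(r) for r in records})  # {'John Smith'}
```

Title-casing is a simplification: it turns "McDonald" into "Mcdonald", so a production rule set would need exceptions for such names.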

Data Cleaning

  • Data cleaning involves weeding out and fixing or discarding inconsistent, incorrect, or incomplete data.
  • Specialized software tools are used for analyzing, standardizing, correcting, matching, and consolidating data.
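The five activities listed above can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not how any specific cleansing product works: the sample rows, the `STATE_CODES` lookup table, and all function names are hypothetical, and matching is done on a normalized email address only.

```python
from collections import defaultdict

# Hypothetical customer rows pulled from two source systems.
rows = [
    {"id": "A1", "email": " Ann@Example.com ", "state": "colorado", "phone": ""},
    {"id": "B7", "email": "ann@example.com", "state": "CO", "phone": "303-555-0100"},
    {"id": "A2", "email": "bob@example.com", "state": "Texas", "phone": None},
]
STATE_CODES = {"colorado": "CO", "texas": "TX"}  # assumed lookup table

def analyze(rows):
    """Analyze: count blank values per field to profile the data."""
    blanks = defaultdict(int)
    for r in rows:
        for field, value in r.items():
            if value in ("", None):
                blanks[field] += 1
    return dict(blanks)

def standardize(r):
    """Standardize: one canonical format for email addresses."""
    return {**r, "email": r["email"].strip().lower()}

def correct(r):
    """Correct: map state names to two-letter codes via the lookup table."""
    state = r["state"].strip()
    return {**r, "state": STATE_CODES.get(state.lower(), state.upper())}

def match_and_consolidate(rows):
    """Match rows sharing an email, then merge them, filling blanks from later rows."""
    merged = {}
    for r in rows:
        key = r["email"]
        if key not in merged:
            merged[key] = dict(r)
            continue
        for field, value in r.items():
            if merged[key].get(field) in ("", None) and value not in ("", None):
                merged[key][field] = value
    return list(merged.values())

print(analyze(rows))  # {'phone': 2}
clean = match_and_consolidate([correct(standardize(r)) for r in rows])
print(len(clean))     # 2
```

The two "Ann" rows collapse into one record that keeps the phone number found only in the second system.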

The Challenge of Perfect Data

  • Achieving perfect data is almost impossible due to the trade-offs in data quality.
  • Companies may prioritize accuracy over completeness, or vice versa.
  • Examples: a birth date of 2/31/25 is complete but inaccurate (February has no 31st day), while an address of "Denver, Colorado" with no zip code is accurate but incomplete.
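The two examples above can be checked mechanically. In this minimal Python sketch, "complete" means no required field is blank and "accurate" (for a date) means the date can exist on the calendar; reading "25" as 2025 is an assumption, and the helper names are hypothetical.

```python
from datetime import date

def is_complete(record, required_fields):
    """Complete: every required field has a value (no blanks)."""
    return all(record.get(f) not in (None, "") for f in required_fields)

def date_is_accurate(year, month, day):
    """Accurate (for a date): the value can actually exist on the calendar."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

# 2/31/25, read here as February 31, 2025 (the century is an assumption).
birth = {"year": 2025, "month": 2, "day": 31}
# "Denver, Colorado" with the zip code left blank.
address = {"city": "Denver", "state": "Colorado", "zip": ""}

print(is_complete(birth, ["year", "month", "day"]))                   # True
print(date_is_accurate(birth["year"], birth["month"], birth["day"]))  # False
print(is_complete(address, ["city", "state", "zip"]))                 # False
```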

Data Quality Audits

  • Companies perform data quality audits to determine the accuracy and completeness of data.
  • Most organizations set acceptable thresholds to balance quality and cost.
  • Example: achieving 85% accuracy and 65% completeness for making good decisions at a reasonable cost.
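A data quality audit can be sketched as two ratios compared against the example thresholds of 85% accuracy and 65% completeness. The sample records and the plausible-age rule are hypothetical; a blank age is treated as incomplete rather than inaccurate.

```python
ACCURACY_TARGET = 0.85      # example threshold from the notes
COMPLETENESS_TARGET = 0.65  # example threshold from the notes

def audit(records, required, is_accurate):
    """Return (accuracy, completeness) as fractions of the record count."""
    n = len(records)
    complete = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    accurate = sum(1 for r in records if is_accurate(r))
    return accurate / n, complete / n

def age_is_plausible(r):
    # Hypothetical validation rule: a blank age is incomplete, not inaccurate.
    return r["age"] is None or 0 <= r["age"] <= 120

rows = [("Ann", "80202", 34), ("Bob", "", 41), ("Cy", "73301", 250),
        ("Di", "10001", None), ("Ed", "94105", 52), ("Flo", "60601", 29),
        ("Gus", "", 63), ("Hal", "02108", 47)]
records = [{"name": n, "zip": z, "age": a} for n, z, a in rows]

accuracy, completeness = audit(records, ["zip", "age"], age_is_plausible)
print(f"accuracy {accuracy:.1%}: {'pass' if accuracy >= ACCURACY_TARGET else 'fail'}")
print(f"completeness {completeness:.1%}: {'pass' if completeness >= COMPLETENESS_TARGET else 'fail'}")
# accuracy 87.5%: pass
# completeness 62.5%: fail
```

Here the data is accurate enough but not complete enough, so the organization would decide whether filling the missing zip codes is worth the cost.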

Impact on Decision Making

  • Low-quality data can significantly affect decision-making processes.
  • Businesses must formulate strategies to maintain clean and high-quality data.

Maintaining Data Quality

  • Regular audits and cleansing processes are essential.
  • Specialized software tools help analyze, standardize, correct, match, and consolidate data.
  • Ensuring data quality across multiple databases and systems, both internal and external, is crucial.

Dirty Data Problems

  • Over 25% of critical data in Fortune 1000 companies will continue to be flawed (Gartner Inc.).
  • Data may be inaccurate, incomplete, or duplicated.

The Problem of Dirty Data

  • Cleaning dirty data is essential for maintaining quality data in data warehouses or data marts.
  • Clean data increases the effectiveness of decision-making.

Data Cleansing Process

  • Data cleansing occurs first during the ETL (Extract, Transform, Load) process.
  • It occurs again once the data is in the data warehouse.
  • Ideally, scrubbed data is accurate and consistent.
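The two cleansing passes described above can be sketched with Python's built-in `sqlite3` module standing in for a data warehouse. The source rows, table name, and function names are hypothetical: the first pass (in Transform) standardizes values and discards incomplete rows, and the second pass runs inside the warehouse to remove duplicates that survived loading.

```python
import sqlite3

def extract():
    # Hypothetical source rows; in practice these come from several operational databases.
    return [("  Ann Lee ", "ann@example.com"), ("ann lee", "ANN@EXAMPLE.COM"), ("Bob Ray", None)]

def transform(rows):
    """First cleansing pass: standardize values and discard rows missing an email."""
    cleaned = []
    for name, email in rows:
        if not email:
            continue  # discard incomplete row
        cleaned.append((" ".join(name.split()).title(), email.strip().lower()))
    return cleaned

def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS customer (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)

def cleanse_in_warehouse(conn):
    """Second pass inside the warehouse: keep one row per email address."""
    conn.execute(
        "DELETE FROM customer WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM customer GROUP BY email)"
    )

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 2
cleanse_in_warehouse(conn)
print(conn.execute("SELECT name, email FROM customer").fetchall())  # [('Ann Lee', 'ann@example.com')]
```

Deduplication could also be done during Transform; it is placed in the warehouse here only to illustrate the second pass.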

Functionality of Data Lakes

  • Data lakes can be queried for all relevant data when a business question arises.
  • They provide a smaller dataset for analysis to help answer the question.
  • Data lakes are often associated with Hadoop storage.
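The query-to-subset idea above can be illustrated with an in-memory list of raw JSON lines standing in for a data lake; a real Hadoop data lake would be queried with cluster tools instead. The records and the `query_lake` helper are hypothetical. Each record keeps its native shape and is parsed only when a question is asked.

```python
import json

# Hypothetical raw lake contents: mixed, non-relational records in native form.
raw_lake = [
    '{"type": "clickstream", "user": "u1", "page": "/ads/spring-sale"}',
    '{"type": "sensor", "device": "t-17", "temp_c": 21.4}',
    '{"type": "clickstream", "user": "u2", "page": "/home"}',
    '{"type": "social", "post": "Loving the spring sale!"}',
    '{"type": "clickstream", "user": "u3", "page": "/ads/spring-sale"}',
]

def query_lake(lines, predicate):
    """Parse each raw record at query time and keep only the relevant ones."""
    return [rec for rec in map(json.loads, lines) if predicate(rec)]

# Business question: which users reached the spring-sale ad landing page?
subset = query_lake(
    raw_lake,
    lambda r: r.get("type") == "clickstream" and r.get("page", "").startswith("/ads/spring-sale"),
)
print([r["user"] for r in subset])  # ['u1', 'u3']
```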

Hadoop Data Lakes

  • Hadoop data lakes comprise one or more Hadoop clusters.
  • They process and store non-relational data (e.g., log files, clickstream records, sensor data, images, social media posts).
  • Hadoop data lakes support analytics applications, not transaction processing.

Data Lakes vs. Data Warehouses

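  • Data lakes differ from data warehouses in functionality and storage: a data warehouse holds cleansed, structured data organized for reporting, whereas a data lake keeps raw data, including non-relational data, in its native form until it is queried.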

Importance of Data Quality

  • Low-quality data costs U.S. businesses $600 billion annually (The Data Warehousing Institute).
  • It affects decision-making, especially in advertising strategies.
  • Dirty data is erroneous or flawed data that cannot be completely removed.

This quiz covers the importance of standardizing customer names and data cleaning activities in operational systems, highlighting the challenges of achieving perfect data.
