MIS: Data Management in Operational Systems
23 Questions

Questions and Answers

What percentage of critical data in Fortune 1000 companies is likely to be flawed?

  • Over 75%
  • Over 25% (correct)
  • Over 10%
  • Over 50%

    Data cleansing is a one-time process.

    False

    What is the main objective of data cleansing?

    To weed out and fix or discard inconsistent, incorrect, or incomplete data

    Data cleansing tools and procedures are used to analyze, standardize, correct, match, and ______________ data.

    consolidate

    When does data cleansing occur during the ETL process?

    During the Extract phase

    Data quality is essential for effective decision-making.

    True

    Match the following data cleansing process steps with their descriptions:

    • ETL Process = Extract, Transform, Load process
    • Data Cleansing = Process of weeding out and fixing or discarding inconsistent, incorrect, or incomplete data
    • Data Sources = Data warehouses often contain data from several databases, including external sources
    • Outcome = Ideally, scrubbed data is accurate and consistent

    What is the outcome of the data cleansing process?

    Ideally, scrubbed data is accurate and consistent

    What is the primary usage of a data lake?

    Analytics applications

    Data lakes can only store relational data.

    False

    What is the estimated cost of low-quality data to U.S. businesses annually?

    $600 billion

    Data lakes are often associated with __________________ storage.

    Hadoop

    What is a consequence of low-quality data?

    It affects decision-making, especially in advertising strategies

    Match the following terms with their descriptions:

    • Dirty Data = Erroneous or flawed data
    • Data Quality = Impact of low-quality data on businesses
    • Data Lake = Stores non-relational data for analytics
    • Hadoop = Platform for processing and storing non-relational data

    Complete removal of dirty data is always possible.

    False

    What is the primary purpose of a data lake in terms of data querying?

    To provide a smaller dataset for analysis to help answer a business question

    What is the primary goal of data quality audits?

    To determine the accuracy and completeness of data

    Achieving perfect data is possible with unlimited resources.

    False

    What is the purpose of regular data cleansing processes?

    To analyze, standardize, correct, match, and consolidate data

    Companies may trade _______________ for completeness in terms of data quality.

    accuracy

    Match the following data quality characteristics with their definitions:

    • Accuracy = Data is correct
    • Completeness = Data has no blanks
    • Data Quality = The degree to which data is accurate and complete

    Low-quality data has no impact on decision-making processes.

    False

    What is the purpose of specialized software tools in data quality management?

    To analyze, standardize, correct, match, and consolidate data

    Study Notes

    Contact Data in Operational Systems

    • Standardizing a customer's name in operational systems is crucial.

    Data Cleaning

    • Data cleaning involves weeding out and fixing or discarding inconsistent, incorrect, or incomplete data.
    • Specialized software tools are used for analyzing, standardizing, correcting, matching, and consolidating data.
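
The standardize, match, and consolidate steps can be sketched in a few lines of Python. This is only an illustration, not how any particular cleansing tool works: the record fields (`name`, `email`), the helper names, and the rule of matching records on a normalized name are all assumptions made for the example.

```python
def standardize_name(raw: str) -> str:
    """Collapse extra whitespace and apply title case (a hypothetical rule)."""
    return " ".join(raw.split()).title()


def consolidate(records: list[dict]) -> list[dict]:
    """Merge records whose standardized names match.

    For every other field, the first non-empty value seen is kept.
    """
    merged: dict[str, dict] = {}
    for record in records:
        name = standardize_name(record["name"])
        target = merged.setdefault(name, {"name": name})
        for field, value in record.items():
            if field != "name" and value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Hypothetical customer rows as they might appear in two operational systems.
customers = [
    {"name": "  jane   DOE ", "email": ""},
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "JOHN SMITH", "email": "john@example.com"},
]
cleaned = consolidate(customers)
```

Here the two spellings of the same customer collapse into one record that keeps the only email address on file. Real matching usually needs more than a name, since two different people can share one.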

    The Challenge of Perfect Data

    • Achieving perfect data is almost impossible due to the trade-offs in data quality.
    • Companies may prioritize accuracy over completeness, or vice versa.
    • Examples: a birth date of 2/31/25 is complete but inaccurate because February never has 31 days, while the address "Denver, Colorado" with no zip code is accurate but incomplete.
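
The two examples can be checked mechanically. The sketch below is a minimal illustration: the field names and helper functions are hypothetical, and reading the two-digit year "25" as 2025 is an assumption (the result is the same for any year, since no February has 31 days).

```python
from datetime import date


def is_complete(record: dict) -> bool:
    """A record is complete when no field is blank."""
    return all(value not in (None, "") for value in record.values())


def is_real_date(record: dict) -> bool:
    """A date is accurate only if it exists on the calendar."""
    try:
        date(record["year"], record["month"], record["day"])
        return True
    except ValueError:
        return False


# Year 2025 is assumed; the source gives only "25".
birth_date = {"month": 2, "day": 31, "year": 2025}
address = {"city": "Denver", "state": "Colorado", "zip": None}

birth_date_complete = is_complete(birth_date)   # every field is filled in
birth_date_accurate = is_real_date(birth_date)  # February 31 does not exist
address_complete = is_complete(address)         # the zip code is missing
```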

    Data Quality Audits

    • Companies perform data quality audits to determine the accuracy and completeness of data.
    • Most organizations set acceptable thresholds to balance quality and cost.
    • Example: a company might decide that 85% accuracy and 65% completeness are enough to make good decisions at a reasonable cost.
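
An audit of this kind amounts to measuring two rates and comparing them with the chosen thresholds. The sketch below uses the 85% and 65% figures from the example. The sample records, the `audit` function, and the accuracy rule (a state code must appear in a known list) are hypothetical.

```python
ACCURACY_THRESHOLD = 0.85      # from the example above
COMPLETENESS_THRESHOLD = 0.65  # from the example above


def audit(records, required_fields, is_accurate):
    """Return the share of records that are complete and that are accurate."""
    if not records:
        raise ValueError("cannot audit an empty data set")
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    accurate = sum(1 for r in records if is_accurate(r))
    return {
        "completeness": complete / len(records),
        "accuracy": accurate / len(records),
    }


KNOWN_STATES = {"CO"}  # hypothetical reference list used as the accuracy rule

records = [
    {"city": "Denver", "state": "CO", "zip": "80202"},
    {"city": "Boulder", "state": "CO", "zip": ""},       # incomplete: no zip
    {"city": "Aspen", "state": "CO", "zip": "81611"},
    {"city": "Denver", "state": "XX", "zip": "80203"},   # inaccurate state
]

rates = audit(records, ["city", "state", "zip"], lambda r: r["state"] in KNOWN_STATES)
meets_accuracy = rates["accuracy"] >= ACCURACY_THRESHOLD
meets_completeness = rates["completeness"] >= COMPLETENESS_THRESHOLD
```

With this sample, three of four records are complete and three of four are accurate, so the data set clears the completeness threshold but falls short on accuracy.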

    Impact on Decision Making

    • Low-quality data can significantly affect decision-making processes.
    • Businesses must formulate strategies to maintain clean and high-quality data.

    Maintaining Data Quality

    • Regular audits and cleansing processes are essential.
    • Specialized software tools help analyze, standardize, correct, match, and consolidate data.
    • Ensuring data quality across multiple databases and systems, both internal and external, is crucial.

    Dirty Data Problems

    • Over 25% of critical data in Fortune 1000 companies will continue to be flawed (Gartner Inc.).
    • Data may be inaccurate, incomplete, or duplicated.

    The Problem of Dirty Data

    • Cleansing dirty data is essential for maintaining quality data in data warehouses or data marts.
    • Clean data increases the effectiveness of decision-making.

    Data Cleansing Process

    • Data cleansing occurs first during the ETL (Extract, Transform, Load) process.
    • It occurs again once the data is in the data warehouse.
    • Ideally, scrubbed data is accurate and consistent.
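
The two cleansing passes can be sketched as a toy ETL pipeline. This is an illustration under stated assumptions, not a description of any real ETL product: the function names, the row fields (`id`, `city`), and the cleansing rules are hypothetical, and running the first pass between extraction and loading is a choice made for this sketch.

```python
def extract(sources):
    """Pull rows from every source into one list."""
    return [row for source in sources for row in source]


def cleanse(rows):
    """Drop rows without an id, skip repeated ids, standardize city names."""
    seen = set()
    cleaned = []
    for row in rows:
        if row.get("id") is None or row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({**row, "city": row["city"].strip().title()})
    return cleaned


def load(rows, warehouse):
    """Append cleansed rows to the warehouse (a plain list in this sketch)."""
    warehouse.extend(rows)


# Two hypothetical source databases with overlapping, inconsistent rows.
sources = [
    [{"id": 1, "city": " denver "}, {"id": None, "city": "Boulder"}],
    [{"id": 1, "city": "Denver"}, {"id": 2, "city": "aspen"}],
]

warehouse = []
load(cleanse(extract(sources)), warehouse)  # first pass, during ETL
warehouse[:] = cleanse(warehouse)           # second pass, inside the warehouse
```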

    Functionality of Data Lakes

    • When a business question arises, the data lake can be queried for all data relevant to it.
    • The query returns a smaller dataset that can be analyzed to help answer the question.
    • Data lakes are often associated with Hadoop storage.
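
The idea of querying a lake for a smaller, relevant dataset can be illustrated with plain Python. This sketch does not use any Hadoop API: the in-memory list stands in for the lake, and the record shapes, the `query` helper, and the customer identifier are hypothetical.

```python
# A stand-in "lake" holding mixed, non-relational records.
lake = [
    {"type": "clickstream", "customer": "c1", "page": "/pricing"},
    {"type": "sensor", "device": "d7", "temp_c": 21.5},
    {"type": "social", "customer": "c1", "text": "Love the new plan"},
    {"type": "clickstream", "customer": "c2", "page": "/home"},
]


def query(records, predicate):
    """Return only the records relevant to the question being asked."""
    return [record for record in records if predicate(record)]


# Business question: what do we know about customer c1?
subset = query(lake, lambda record: record.get("customer") == "c1")
```

The resulting subset contains just the two records that mention the customer, and that smaller dataset is what gets analyzed.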

    Hadoop Data Lakes

    • Hadoop data lakes comprise one or more Hadoop clusters.
    • They process and store non-relational data (e.g., log files, clickstream records, sensor data, images, social media posts).
    • Hadoop data lakes support analytics applications, not transaction processing.

    Data Lakes vs. Data Warehouses

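    • Data lakes differ from data warehouses in both storage and function: a data lake holds non-relational data for analytics, while a data warehouse holds cleansed data drawn from several databases, including external sources.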

    Importance of Data Quality

    • Low-quality data costs U.S. businesses $600 billion annually (The Data Warehousing Institute).
    • It affects decision-making, especially in advertising strategies.
    • Dirty data is erroneous or flawed data that cannot be completely removed.


    Description

    This quiz covers the importance of standardizing customer names and data cleaning activities in operational systems, highlighting the challenges of achieving perfect data.
