Big Data Overview

Questions and Answers

What is the estimated global data volume by 2025?

  • 175 zettabytes (correct)
  • 250 zettabytes
  • 50 zettabytes
  • 100 zettabytes

Which of the following best describes 'Velocity' in the context of Big Data?

  • The reliability of data sources
  • The total amount of structured data
  • The speed at which data is generated and processed (correct)
  • The different formats of data

What percentage of global data is estimated to be unstructured?

  • 30%
  • Less than 10%
  • 80% (correct)
  • 50%

Which of the following statements about Data Lakes is accurate?

  • They store raw data in a centralized repository (correct)

What is the purpose of the Hadoop Distributed File System (HDFS)?

  • To handle large volumes of data across multiple servers (correct)

What is the primary function of a Database Management System (DBMS)?

  • To act as an interface between users and the database (correct)

Which level of database design is considered the lowest level of abstraction?

  • Physical Level (correct)

Which feature of a DBMS helps maintain data accuracy and consistency?

  • Constraints and validation rules (correct)

What is one of the significant advantages of using a DBMS over a list stored in a spreadsheet?

  • Enhanced querying capabilities (correct)

Which of the following is NOT a function of a database management tool?

  • Data destruction (correct)

What role does indexing play in the performance optimization of a database?

  • It enables faster data retrieval (correct)

In the context of database design, what is included in the Logical Level?

  • Detailed tables, columns, and relationships (correct)

What advantage does the INDEX and MATCH combination have over VLOOKUP?

  • It provides more flexibility in data retrieval (correct)

Which of the following is NOT a reason to use NoSQL databases?

  • Requirement for strict schema enforcement (correct)

In what scenario would a document store database like MongoDB be particularly advantageous?

  • When handling varied data structures in content management (correct)

What is a key feature of NoSQL databases?

  • They can handle structured, semi-structured, and unstructured data (correct)

Which of the following statements about key-value stores is incorrect?

  • They store data in a tabular format (correct)

Why are NoSQL databases preferred for Big Data applications?

  • They allow for dynamic schema design (correct)

What is a use case for Redis in an online shopping platform?

  • Managing user sessions (correct)

Which of the following correctly describes the flexibility of document stores like MongoDB?

  • Documents can contain different fields and structures (correct)

What is a significant limitation of using INDEX and MATCH functions?

  • They require frequent manual updates when the data structure changes (correct)

What is a limitation of using relationships in Excel?

  • Manual attention to data consistency is necessary (correct)

Which of the following best defines 2NF in database normalization?

  • Separating entities in different tables while using primary keys and foreign keys (correct)

Which tool can help highlight potential errors in Excel data?

  • Conditional Formatting (correct)

In the context of Excel, which statement is true regarding unique IDs?

  • They are necessary to avoid duplicates in a single table (correct)

What does 1NF refer to in database normalization?

  • Separating atomic values into distinct fields (correct)

Which Excel function can simulate basic JOIN operations found in SQL?

  • VLOOKUP (correct)

Which of the following is a feature that does NOT help maintain data quality in Excel?

  • Creating duplicate records (correct)

What is a characteristic of a many-to-many relationship in Excel?

  • It usually involves bridge tables as a workaround (correct)

What do validation rules in Excel accomplish?

  • They restrict data input to specified criteria (correct)

    Study Notes

    Big Data

    • Data is crucial for decision-making across all business areas.
    • Global data volume is projected to reach 175 zettabytes (ZB) by 2025 (1 ZB = 10^21 bytes, i.e. 1 trillion gigabytes).
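    • By an often-cited estimate, 90% of the world's data was generated within the preceding two years.
    • By a widely cited estimate, about 2.5 quintillion bytes (roughly 2.5 billion gigabytes) of data are generated daily.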
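
As a quick check on the units behind the 175 ZB projection, the conversion can be worked out directly. This is a minimal sketch assuming decimal (SI) units; the constant and function names are illustrative and not part of the lesson.

```python
# Decimal (SI) storage units, expressed in bytes.
BYTES_PER_GB = 10**9   # gigabyte
BYTES_PER_ZB = 10**21  # zettabyte

# Number of gigabytes in one zettabyte: 10^21 / 10^9 = 10^12 (one trillion).
gb_per_zb = BYTES_PER_ZB // BYTES_PER_GB


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes (SI units)."""
    return zb * gb_per_zb


print(f"1 ZB   = {gb_per_zb:,} GB")
print(f"175 ZB = {zettabytes_to_gigabytes(175):,} GB")
```

With binary units (zebibytes and gibibytes) the ratio would be 2^40 instead of 10^12, so the choice of unit system matters when comparing figures from different sources.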

    The 5 Vs of Big Data

    • Velocity: The speed at which data is generated and processed (batch, near real-time, real-time, and streaming).
    • Variety: Structured, unstructured, and semi-structured data.
    • Volume: Large amounts of data (terabytes, records, transactions, etc.).
    • Veracity: Trustworthiness, authenticity, origin, reputation, and accountability of data.
    • Value: The useful insight extracted from data, for example through statistical analysis and correlations.

    Data Sources

    • Facebook, Twitter, Instagram
    • IoT devices (an often-cited projection puts the number of connected devices at about 75 billion by 2025).
    • Less than 20% of global data resides in relational databases.
    • 80% of data is unstructured (text, images, videos).
    • Big data is stored in big data architectures, cloud, and NoSQL databases.

    Big data storage

    • Different technologies are needed to store, process, and analyze data volumes that traditional databases cannot handle.
    • Hadoop Distributed File System (HDFS) handles large data volumes across multiple servers by splitting files into blocks (128 MB by default in Hadoop 2 and later) and replicating each block on several nodes (servers) for redundancy.
    • Data Lakes: centralized repositories that hold a variety of data (structured, semi-structured, and unstructured) in raw form, stored as it is generated, for long-term analysis.
    • NoSQL databases are used for storing unstructured and semi-structured data.
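
The HDFS idea of splitting a file into blocks and spreading replicas across nodes can be sketched in a few lines. This is a toy simulation, not HDFS code: the 128 MB block size and replication factor of 3 are the HDFS defaults, but `place_blocks`, the node names, and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware).

```python
import math

BLOCK_SIZE_MB = 128      # HDFS default block size (Hadoop 2 and later)
REPLICATION_FACTOR = 3   # HDFS default replication factor


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION_FACTOR):
    """Split a file into blocks and assign each block to distinct nodes.

    Hypothetical helper: uses simple round-robin placement.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(n_blocks):
        # Pick `replication` consecutive nodes, wrapping around the node list.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


layout = place_blocks(500, ["node-a", "node-b", "node-c", "node-d"])
for block_id, replicas in layout.items():
    print(f"block {block_id}: {replicas}")
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block, which is the redundancy property the notes describe.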

    Economic and Financial Data Sources

    • Diverse sources for economic and financial data analysis.
    • Descriptive analysis → summarize and describe a dataset (employment rate in Spain by age group).
    • Trend analysis → analyze how data changes over time (changes in the employment rate in different regions of Spain over the last year).
    • Comparative analysis → analyze differences between regions, groups, or variables (changes in unemployment rates by month in different regions).
    • INE (Spanish National Statistics Institute) provides a wide range of data on the country's economy, demographics, and social aspects.
    • Ministry of Economy, Trade, and Enterprise offers various financial data and statistics.
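
The three kinds of analysis listed above can be told apart with a small example. The sketch below uses only the Python standard library; the region names and rates are invented placeholders for illustration and are not INE figures.

```python
from statistics import mean

# Placeholder employment rates (%) for two hypothetical regions over four
# quarters. Invented for illustration only; these are NOT real statistics.
rates = {
    "Region A": [60.0, 61.0, 62.0, 63.0],
    "Region B": [55.0, 55.5, 54.5, 56.0],
}

# Descriptive analysis: summarize each series on its own.
summary = {
    region: {"mean": mean(values), "min": min(values), "max": max(values)}
    for region, values in rates.items()
}

# Trend analysis: how each series changed from the first to the last period.
trend = {region: values[-1] - values[0] for region, values in rates.items()}

# Comparative analysis: difference between regions in the latest period.
gap_latest = rates["Region A"][-1] - rates["Region B"][-1]

print(summary)
print(trend)
print(gap_latest)
```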

    Relational Databases

    • Store data items that are related to one another, structured in tables with rows and columns.
    • Each table represents an entity, and each row is a record of that entity.
    • Relationships are stored using keys (primary and foreign) to link related data in different tables.
    • Ensure data integrity and consistency.
    • Efficient data retrieval.
    • Reduce data redundancy.
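
Primary and foreign keys, and the integrity they enforce, can be seen in a small in-memory database. This sketch uses Python's built-in `sqlite3` module; the `customers` and `orders` tables and their contents are hypothetical examples, not part of the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL CHECK (amount >= 0)
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ana'), (2, 'Luis')")
conn.execute(
    "INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)"
)

# The foreign key links each order to its customer, so a JOIN can combine them.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```

Customer names are stored once in `customers` and referenced by key from `orders`, which is how the relational model reduces redundancy.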

    Introduction to SQL

    • Structured Query Language (SQL) is used for managing and manipulating relational databases.
    • Provides powerful commands for creating, reading, updating, and deleting data.
    • Widely used for data analysis, report generation, and query operations.
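
The four basic operations (create, read, update, delete) map onto the SQL statements INSERT, SELECT, UPDATE, and DELETE. The sketch below runs them against an in-memory SQLite database through Python's `sqlite3` module; the `products` table and its rows are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
)

# Create: insert rows (placeholders keep values separate from the SQL text).
db.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 4.0), ("stapler", 9.0)],
)

# Read: query rows that match a condition.
cheap = db.execute(
    "SELECT name FROM products WHERE price < ? ORDER BY name", (5.0,)
).fetchall()

# Update: change a value in existing rows.
db.execute("UPDATE products SET price = ? WHERE name = ?", (2.0, "pen"))

# Delete: remove rows.
db.execute("DELETE FROM products WHERE name = ?", ("stapler",))

remaining = db.execute(
    "SELECT name, price FROM products ORDER BY id"
).fetchall()
print(cheap)
print(remaining)
```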

    Microsoft Excel

    • A tool that can function as a flat-file database.
    • Stores data in a single table or sheet.
    • Useful for small data analysis, rapid prototyping, simple queries and explorations of datasets.
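    • The Data Model feature supports relationships between tables.
    • Data integrity is an important consideration: consistency has to be maintained manually, with help from validation rules and unique IDs.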

    NoSQL Databases

    • Non-relational databases.
    • Flexible schemas (schema-less).
    • Used to handle unstructured or semi-structured data.
    • Useful when dealing with large volumes of data or highly flexible schemas.
    • Examples: MongoDB, Redis, Neo4j
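
Two of the NoSQL models mentioned here, document stores and key-value stores, can be illustrated without installing a database. The sketch below uses plain Python structures as stand-ins; it does not use the MongoDB or Redis client libraries, and `find`, `set_session`, and `get_session` are hypothetical helpers written for this example.

```python
import json

# Document-store stand-in: documents in one collection may have different fields.
articles = [
    {"_id": 1, "type": "post", "title": "Hello", "tags": ["intro"]},
    {"_id": 2, "type": "video", "title": "Demo", "duration_s": 95},
]


def find(collection, **criteria):
    """Return documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


videos = find(articles, type="video")

# Key-value stand-in: a session id maps to an opaque serialized value.
sessions = {}


def set_session(session_id, data):
    """Store session data under its key as a JSON string."""
    sessions[session_id] = json.dumps(data)


def get_session(session_id):
    """Look up a session by key; return None if it does not exist."""
    raw = sessions.get(session_id)
    return json.loads(raw) if raw is not None else None


set_session("abc", {"user": "ana", "cart": [101, 102]})
print(videos)
print(get_session("abc"))
```

Neither structure declares a schema up front: the second document simply carries a `duration_s` field the first lacks, and the key-value store never inspects what a session contains.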

    Power BI as a Database Tool

    • Focuses on data analysis and visualization, not data storage.
    • Connects data from various sources including SQL or Excel.
    • Used to build custom reports and dashboards for data analysis.
    • Tool for interactive visualization, reporting, and business intelligence with ease of use.

    Description

    Explore the crucial aspects of big data, including its significance in decision-making and the 5 Vs that define its characteristics. Discover the sources and types of data generated in today's digital landscape and the implications for businesses as they navigate vast amounts of information.