Big Data Analysis and Statistics Quiz

Questions and Answers

What is the estimated global volume of data by 2025?

  • 175 zettabytes (correct)
  • 150 zettabytes
  • 100 zettabytes
  • 200 zettabytes

Which characteristic of big data refers to the types of data, such as structured and unstructured?

  • Volume
  • Velocity
  • Veracity
  • Variety (correct)

In a Hadoop Distributed File System (HDFS), how is data typically divided for storage?

  • Into small blocks of 128 MB or 256 MB (correct)
  • Into files of 64 MB
  • Into packets of 512 MB
  • Into large chunks of 1 GB

What percentage of global data is estimated to be stored in relational databases?

  • Less than 20% (correct)

What is the primary purpose of a data lake?

  • To store flat files of all types of raw data (correct)

What type of analysis is used to summarize and describe a dataset, such as unemployment by age range?

  • Descriptive analysis (correct)

Which analysis method would be most appropriate to examine changes in unemployment over the last year, selected by month?

  • Trend analysis (correct)

Which of the following is NOT a type of analysis facilitated by statistical data sources?

  • Normative analysis (correct)

What kind of statistical data does the INE provide regarding the labor market?

  • Active population survey data (correct)

Which aspect of society does the INE provide statistics on?

  • Living conditions survey (correct)

Which ministry provides a broad range of financial data and statistics, according to the content?

  • Ministry of Economy, Trade and Enterprise (correct)

Which type of analysis would you conduct to compare unemployment rates between different regions?

  • Comparative analysis (correct)

Which of the following statements best describes the purpose of a database?

  • To enable efficient access, management, and updates of interrelated data. (correct)

What significant development occurred in the 1970s related to database design?

  • The introduction of the Entity-Relationship model as a standard. (correct)

Which of the following best describes NoSQL databases?

  • Databases designed to handle unstructured data such as images and text. (correct)

What is the purpose of the INNER JOIN in SQL?

  • To return only rows with matching values from both tables. (correct)

What was the impact of SQL, originally developed at IBM in the 1970s, being standardized in the 1980s?

  • It established SQL as the standard language for database management. (correct)

Which of the following best characterizes the evolution of database management technologies from the 1990s onwards?

  • The ability to manage large volumes of data alongside cloud solutions. (correct)

Which SQL command is used to delete an entire table?

  • DROP TABLE table_name; (correct)

What will the SQL query 'SELECT MAX(column) FROM table_name;' return?

  • The highest value in the specified column. (correct)

Which of the following is a limitation of traditional data management methods mentioned?

  • Difficulty in searching and retrieving information quickly. (correct)

What is the result of executing a LEFT JOIN between table1 and table2?

  • All records from table1 and matched rows from table2, with NULLs for non-matches. (correct)

What concept allows for the analysis of data from multiple databases?

  • Data Lake. (correct)

In an INSERT query, what must be true about the values provided?

  • They must correspond to the columns in the specified order. (correct)

Which industry is mentioned as heavily relying on the advancement of database technologies?

  • Banking and Financial Services. (correct)

Which database solution is known for its compatibility with large volumes of data and often has an open-source version?

  • Open source databases like MySQL. (correct)

What type of SQL query would you use to add a new column to an existing table?

  • ALTER TABLE table_name ADD column_name datatype; (correct)

Which statement accurately describes the purpose of SQL operators?

  • They manipulate and retrieve data through various operations. (correct)

In the DELETE query syntax, what role does the WHERE clause play?

  • It identifies the exact records to delete from the table. (correct)

What does the SELECT * FROM table_name; command accomplish?

  • It fetches all available rows and columns from the specified table. (correct)

    Study Notes

    Big Data

    • Data is crucial for decision-making in all areas of business
    • By 2025, the global data volume is estimated to be 175 zettabytes (ZB)
    • 1 ZB = 1 trillion gigabytes (10^21 bytes)
    • In 2010, the global data volume was just 2 ZB
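    • Roughly 2.5 quintillion bytes (about 2.5 billion gigabytes) of data are generated daily, according to a commonly cited estimate
    • About 90% of the world's data was generated in the last 2 years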
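
The unit arithmetic behind these figures can be checked in a few lines. This is a minimal sketch that assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the variable names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary units differ slightly).
GB = 10**9
ZB = 10**21

# Number of gigabytes in one zettabyte.
gb_per_zb = ZB // GB

# Growth implied by the notes: about 2 ZB in 2010 versus an estimated 175 ZB in 2025.
growth_factor = 175 / 2

print(f"1 ZB = {gb_per_zb:,} GB")
print(f"Estimated growth 2010 to 2025: {growth_factor}x")
```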

    5 Vs of Big Data

    • Velocity: batch, near real-time, real-time, streams
    • Variety: structured, unstructured, semi-structured, or a mix of all three
    • Volume: terabytes, records, transactions, tables, files
    • Veracity: trustworthiness, authenticity, origin, reputation, accountability
    • Value: statistical, events, correlations, hypothetical

    Data Sources

    • Facebook
    • Twitter (500,000 tweets per minute)
    • Instagram (347,222 posts per minute)
    • Internet of Things (IoT) – 75 million connected devices generating data

    Storage of Generated Data

    • Less than 20% of global data is stored in relational databases
    • 80% of global data is unstructured (text, images, video)
    • Big Data Architectures, cloud and NoSQL Databases are used for storage

    Data Storage in HDFS

    • Designed to handle large data volumes across multiple servers
    • Data divided into small blocks (typically 128 MB or 256 MB), distributed across nodes
    • Data is redundantly stored for protection against node failure
    • Ideal for unstructured or semi-structured data
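
The block-splitting idea can be illustrated with simple arithmetic. The sketch below is not HDFS code: `plan_blocks` is a hypothetical helper, and the replication factor of 3 is an assumed setting chosen for illustration.

```python
import math

MB = 1024 * 1024

def plan_blocks(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number of blocks, total bytes stored across all replicas)."""
    # The last block may be only partly filled, so round up.
    num_blocks = math.ceil(file_size_bytes / block_size_bytes)
    # Each block is copied `replication` times on different nodes.
    stored_bytes = file_size_bytes * replication
    return num_blocks, stored_bytes

blocks, stored = plan_blocks(1000 * MB)
print(f"{blocks} blocks, {stored // MB} MB stored including replicas")
```

Redundancy is what protects against node failure: losing one node still leaves other copies of each block on the remaining nodes.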

    Data Lake

    • Centralized repository for all types of data
    • Raw data stored as it's generated, without transformation
    • Ideal for long-term analysis when the analysis type isn't yet known

    NoSQL Databases

    • Store unstructured data (images, text, audio)
    • Emerged alongside data warehousing and data lakes (repositories that bring together data from many databases for analysis) and data mining (data analysis)
    • Suitable for storing and managing non-tabular data

    Big Data/Cloud Databases

    • Open-source databases (MySQL, PostgreSQL, Neo4j, MongoDB)
    • Database products for very large volumes of data
    • Data Lakes and Cloud based databases

    Economic and Financial Data Sources

    • Descriptive analysis: summarizing and describing a dataset (e.g., unemployment figures)
    • Trend analysis: analyzing how data changes over time (e.g., unemployment trends)
    • Comparative analysis: comparing data between groups or regions (e.g., unemployment comparisons between regions)
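
The three analysis types can be sketched on a toy dataset. The unemployment rates and region names below are invented for illustration and are not real INE figures.

```python
from statistics import mean

# Hypothetical monthly unemployment rates (%) for two regions; not real data.
rates = {
    "Region A": [12.1, 11.8, 11.5, 11.9, 11.2, 10.8],
    "Region B": [8.4, 8.6, 8.1, 7.9, 7.7, 7.5],
}

# Descriptive analysis: summarize each dataset.
summary = {
    region: {"mean": round(mean(values), 2), "min": min(values), "max": max(values)}
    for region, values in rates.items()
}

# Trend analysis: change between the first and the last month.
trend = {region: round(values[-1] - values[0], 2) for region, values in rates.items()}

# Comparative analysis: gap between the regions' average rates.
gap = round(mean(rates["Region A"]) - mean(rates["Region B"]), 2)

print(summary)
print(trend)
print(gap)
```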

    Economic and Financial Data Institutions

    • INE (Instituto Nacional de Estadística, Spain's National Statistics Institute): statistical data on the country's economic, demographic, and social aspects
    • Ministry of Economy, Trade, and Enterprise: financial data and statistics (e.g., macroeconomic data, public finances, labor market, financial system, foreign trade)

    Other Data Resources

    • European Statistical Office (Eurostat) – high-quality statistics on Europe
    • Spanish Government
    • World Bank – free and open access to global development data
    • International Monetary Fund – access to macroeconomic and financial data
    • Madrid Stock Exchange (Bolsa de Madrid)
    • Bank of Spain (Banco de España) – interest rate statistics

    Introduction to Databases

    • Databases are essential for efficient data management in various industries.
    • Before databases, information was stored on paper, on magnetic tape, in books, and in electronic files.
    • These methods made it hard to search and retrieve data quickly, handle large volumes, and keep data secure.

    Evolution of Databases

    • Entity-Relationship (ER) diagrams were introduced for database design.
    • Oracle released one of the first commercial relational database management systems (RDBMS).
    • SQL was created and later adopted as the standard query language.
    • RDBMS products were introduced and grew in popularity.
    • More robust databases (such as Microsoft SQL Server) were created.
    • NoSQL databases and data warehousing systems were developed.

    SQL and its Importance

    • SQL (Structured Query Language) is used to manage and manipulate relational databases.
    • Provides commands for creating, reading, updating, and deleting data
    • Essential tool for data analysis.
    • Handles large volumes of data efficiently and is relatively easy to learn

    Data Types

    • INT - whole numbers
    • FLOAT - floating-point numbers
    • DOUBLE - double-precision floating-point numbers
    • DECIMAL - fixed-point numbers with precision and scale
    • VARCHAR, CHAR - variable-length and fixed-length text
    • TEXT - large amounts of text
    • DATE, TIME, DATETIME, TIMESTAMP - date and time values
    • BLOB - binary large object

    Main SQL Structures

    • Creating Tables: CREATE TABLE
    • Selecting Records: SELECT
    • Inserting Data: INSERT INTO
    • Updating Data: UPDATE
    • Deleting Data: DELETE FROM
    • Joining Tables: JOIN (INNER, LEFT OUTER, RIGHT OUTER)
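
The statements listed above can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch: the `departments` and `employees` tables and their rows are invented for illustration, and SQLite is used only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE: define two related tables.
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL, dept_id INTEGER)"
)

# INSERT INTO: values must follow the order of the listed columns.
cur.executemany("INSERT INTO departments (id, name) VALUES (?, ?)",
                [(1, "Finance"), (2, "IT")])
cur.executemany("INSERT INTO employees (id, name, salary, dept_id) VALUES (?, ?, ?, ?)",
                [(1, "Ana", 3000.0, 1), (2, "Luis", 2500.0, 2), (3, "Marta", 2800.0, None)])

# UPDATE and DELETE FROM: the WHERE clause selects exactly which rows are affected.
cur.execute("UPDATE employees SET salary = 2700.0 WHERE name = 'Luis'")
cur.execute("DELETE FROM employees WHERE id = 1")

# INNER JOIN: only rows with a match in both tables.
inner_rows = cur.execute(
    "SELECT e.name, d.name FROM employees e "
    "INNER JOIN departments d ON e.dept_id = d.id ORDER BY e.id"
).fetchall()

# LEFT JOIN: every employee, with NULL (None in Python) where no department matches.
left_rows = cur.execute(
    "SELECT e.name, d.name FROM employees e "
    "LEFT JOIN departments d ON e.dept_id = d.id ORDER BY e.id"
).fetchall()

# SELECT with an aggregate: highest value in the column.
max_salary = cur.execute("SELECT MAX(salary) FROM employees").fetchone()[0]

print(inner_rows, left_rows, max_salary)
```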

    Using Subqueries

    • Subqueries are queries within other SQL queries.
    • Used to break down complex queries into simpler parts
    • Can be used in SELECT, FROM, WHERE, HAVING clauses
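
As an illustration, the sketch below runs one subquery in a WHERE clause and one in a FROM clause using Python's sqlite3 module. The `employees` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, region TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Madrid", 3000.0), ("Luis", "Madrid", 2000.0),
     ("Marta", "Sevilla", 2600.0), ("Pablo", "Sevilla", 2200.0)],
)

# Subquery in WHERE: employees earning more than the overall average salary.
above_avg = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) "
    "ORDER BY name"
).fetchall()

# Subquery in FROM: compute per-region averages first, then take the highest one.
top_regional_avg = conn.execute(
    "SELECT MAX(avg_salary) FROM "
    "(SELECT region, AVG(salary) AS avg_salary FROM employees GROUP BY region) AS regional"
).fetchone()[0]

print(above_avg, top_regional_avg)
```

Each inner query can be run and checked on its own, which is what makes this a useful way to break a complex query into simpler parts.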

    Using Microsoft Excel for Data Analysis

    • Excel can function as a flat-file database.
    • Useful for smaller applications and quick analysis
    • Creating relationships between tables for data analysis and complex queries is possible.
    • Excel does not enforce data integrity constraints

    NoSQL Databases

    • Non-relational (NoSQL) databases are used for storing and retrieving non-tabular data (e.g., documents, key-value pairs)
    • Horizontally scalable to handle large volumes of data
    • Flexible schema (do not require predefined table structures)
    • Commonly used in scenarios requiring scalability, flexibility or fast data access
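
The flexible-schema idea can be mimicked with plain Python dictionaries. This is only a conceptual sketch, not the API of any real NoSQL product: `collection`, `put`, and `find` are hypothetical names.

```python
import json

# A plain dict standing in for a document collection, keyed by document id.
collection = {}

def put(doc_id, document):
    """Store a JSON-serializable document; no predefined columns are required."""
    # The JSON round trip makes a deep copy and rejects values JSON cannot represent.
    collection[doc_id] = json.loads(json.dumps(document))

def find(field, value):
    """Return ids of documents whose top-level `field` equals `value`."""
    return [doc_id for doc_id, doc in collection.items() if doc.get(field) == value]

# Documents in the same collection may have different fields and nesting.
put("u1", {"name": "Ana", "city": "Madrid"})
put("u2", {"name": "Luis", "city": "Madrid", "tags": ["admin"], "profile": {"age": 41}})
put("p1", {"title": "Quarterly report", "pages": 12})

madrid_docs = find("city", "Madrid")
print(madrid_docs)
```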

    Graph Databases

    • They represent data as nodes and connections (edges).
    • Designed for managing relationships effectively.
    • Suitable for social networks, recommendation engines, and fraud detection.
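
A small in-memory example shows why the node-and-edge model suits relationship queries. It uses a plain adjacency dictionary rather than a real graph database, and the people and friendships are invented for illustration.

```python
# Undirected friendship edges between hypothetical users.
edges = [("ana", "luis"), ("luis", "marta"), ("marta", "pablo"), ("ana", "sara")]

# Build an adjacency map: node -> set of directly connected nodes.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def friends_of_friends(person):
    """Suggest people two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= graph[friend]
    return sorted(candidates - direct - {person})

print(friends_of_friends("ana"))
```

A recommendation like this is a two-hop traversal; in a relational database the same question typically needs a self-join on a friendships table.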

    Power BI

    • Power BI is a business analytics service for creating reports and dashboards.
    • It's suitable for exploring and visualizing database data, rather than for core storage

    Relational Databases

    • An organized database model, with tables, columns and rows.
    • Uses relationships between tables, represented as primary and foreign keys, for data linking
    • Essential for ensuring data integrity and efficiently retrieving related data
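
The role of primary and foreign keys can be demonstrated with Python's sqlite3 module. This is a sketch under stated assumptions: the `regions` and `unemployment` tables and their values are hypothetical, and SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Parent table: each region is identified by its primary key.
conn.execute("CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Child table: region_id is a foreign key that must point at an existing region.
conn.execute(
    """CREATE TABLE unemployment (
           id INTEGER PRIMARY KEY,
           region_id INTEGER NOT NULL REFERENCES regions(id),
           rate REAL NOT NULL
       )"""
)

conn.execute("INSERT INTO regions VALUES (1, 'Madrid')")
conn.execute("INSERT INTO unemployment VALUES (1, 1, 9.8)")  # valid: region 1 exists

# Inserting a row that references a missing region violates data integrity.
try:
    conn.execute("INSERT INTO unemployment VALUES (2, 99, 12.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM unemployment").fetchone()[0]
print(rejected, row_count)
```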

    Database Design

    • Defining the structure, storage, and retrieval mechanisms of data in a database system.
    • Blueprint for storing, accessing, and managing data in the database.
    • Includes steps like schema definition, normalization and physical implementation.

    Description

    Test your knowledge on big data concepts, including storage methods, types of analysis, and data statistics. This quiz covers topics like Hadoop, data lakes, and global data trends to challenge your understanding of current data practices.
