Questions and Answers
What is the estimated global data volume by 2025?
- 175 zettabytes (correct)
- 250 zettabytes
- 50 zettabytes
- 100 zettabytes
Which of the following best describes 'Velocity' in the context of Big Data?
- The reliability of data sources
- The total amount of structured data
- The speed at which data is generated and processed (correct)
- The different formats of data
What percentage of global data is estimated to be unstructured?
- 30%
- Less than 10%
- 80% (correct)
- 50%
Which of the following statements about Data Lakes is accurate?
What is the purpose of the Hadoop Distributed File System (HDFS)?
What is the primary function of a Database Management System (DBMS)?
Which level of database design is considered the lowest level of abstraction?
Which feature of a DBMS helps maintain data accuracy and consistency?
What is one of the significant advantages of using a DBMS over a list stored in a spreadsheet?
Which of the following is NOT a function of a database management tool?
What role does indexing play in the performance optimization of a database?
In the context of database design, what is included in the Logical Level?
What advantage does the INDEX and MATCH combination have over VLOOKUP?
Which of the following is NOT a reason to use NoSQL databases?
In what scenario would a document store database like MongoDB be particularly advantageous?
What is a key feature of NoSQL databases?
Which of the following statements about key-value stores is incorrect?
Why are NoSQL databases preferred for Big Data applications?
What is a use case for Redis in an online shopping platform?
Which of the following correctly describes the flexibility of document stores like MongoDB?
What is a significant limitation of using INDEX and MATCH functions?
What is a limitation of using relationships in Excel?
Which of the following best defines 2NF in database normalization?
Which tool can help highlight potential errors in Excel data?
In the context of Excel, which statement is true regarding unique IDs?
What does 1NF refer to in database normalization?
Which Excel function can simulate basic JOIN operations found in SQL?
Which of the following is a feature that does NOT help maintain data quality in Excel?
What is a characteristic of a many-to-many relationship in Excel?
What do validation rules in Excel accomplish?
Study Notes
Big Data
- Data is crucial for decision-making across all business areas.
- Global data volume is projected to reach 175 zettabytes (ZB) by 2025 (1 ZB = 1 trillion gigabytes).
- By an often-cited estimate, 90% of the world's data was generated in the last two years.
- A widely cited estimate puts daily data generation at 2.5 quintillion bytes (about 2.5 billion gigabytes).
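The zettabyte-to-gigabyte conversion can be checked with a few lines of arithmetic. This is only a sketch that assumes decimal (SI) prefixes, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the variable names are illustrative.

```python
# Decimal (SI) prefixes, assumed here: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

# How many gigabytes fit in one zettabyte.
gb_per_zb = BYTES_PER_ZB // BYTES_PER_GB

# The 175 ZB projection expressed in gigabytes.
projected_gb = 175 * gb_per_zb

print(f"1 ZB   = {gb_per_zb:,} GB")
print(f"175 ZB = {projected_gb:,} GB")
```

Under these prefixes one zettabyte works out to 10^12 gigabytes, which is one trillion gigabytes or one billion terabytes.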
The 5 Vs of Big Data
- Velocity: The speed at which data is generated and processed (batch, near real-time, real-time, and streaming).
- Variety: The different formats of data (structured, semi-structured, and unstructured).
- Volume: The amount of data (terabytes, records, transactions, etc.).
- Veracity: The trustworthiness, authenticity, origin, reputation, and accountability of data.
- Value: The insight gained from data through statistical analysis and correlations.
Data Sources
- Social media platforms (Facebook, Twitter, Instagram).
- IoT devices (75 million connected devices generate data).
- Less than 20% of global data resides in relational databases.
- 80% of data is unstructured (text, images, videos).
- Big data is stored in big data architectures, cloud, and NoSQL databases.
Big Data Storage
- Different technologies are needed to store, process, and analyze large volumes of data that traditional databases cannot handle.
- The Hadoop Distributed File System (HDFS) handles large data volumes across multiple servers by splitting files into blocks and distributing replicated copies of those blocks across different nodes (servers) for redundancy.
- Data Lakes: centralized storage for a variety of data (structured, semi-structured, and unstructured). Data is kept raw, stored as it is generated, and retained for long-term analysis.
- NoSQL databases are used for storing unstructured and semi-structured data.
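The block-and-replica idea behind HDFS can be illustrated with a toy simulation. This sketch is not the Hadoop API: the function names, the tiny 256-byte block size, and the round-robin placement are assumptions made for illustration (real HDFS uses much larger blocks, 128 MB by default, and its own placement policy).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical file contents and cluster, for illustration only.
data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes, replication=3)

# Simulate losing one node: every block should still have a surviving replica.
failed = "node-b"
still_readable = all(
    any(node != failed for node in replicas) for replicas in placement.values()
)

for block_id, replicas in placement.items():
    print(block_id, len(blocks[block_id]), replicas)
print("readable after losing", failed, "->", still_readable)
```

Because each block lives on several nodes, the loss of a single server does not make any part of the file unreadable, which is the redundancy the notes refer to.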
Economic and Financial Data Sources
- Economic and financial data analysis draws on diverse sources.
- Descriptive analysis → summarizes and describes a dataset (e.g., the employment rate in Spain by age group).
- Trend analysis → examines how data changes over time (e.g., changes in the employment rate in different regions of Spain over the last year).
- Comparative analysis → examines differences between regions, groups, or variables (e.g., monthly changes in unemployment rates across regions).
- INE (Spanish National Statistics Institute) provides a wide range of data on the country's economy, demographics, and social aspects.
- Ministry of Economy, Trade, and Enterprise offers various financial data and statistics.
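The three kinds of analysis listed above can be sketched on a small table. The figures below are invented for illustration and are not INE data; the region names and variable names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical monthly unemployment rates (%), invented for illustration only.
rates = {
    "Region A": [12.0, 11.5, 11.0, 10.5],
    "Region B": [8.0, 8.2, 8.1, 7.9],
}

# Descriptive analysis: summarize each region's series.
summary = {
    region: {"mean": mean(values), "min": min(values), "max": max(values)}
    for region, values in rates.items()
}

# Trend analysis: change between the first and the last month.
trend = {region: values[-1] - values[0] for region, values in rates.items()}

# Comparative analysis: gap between the two regions in the latest month.
gap = rates["Region A"][-1] - rates["Region B"][-1]

for region in rates:
    print(region, summary[region], f"trend={trend[region]:+.1f} pts")
print(f"latest gap (A - B) = {gap:.1f} pts")
```

The same three steps apply unchanged once the hypothetical dictionary is replaced with real series downloaded from a statistics office.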
Relational Databases
- Store data items that are related to one another, structured in tables with rows and columns.
- Each table defines an entity, and each row is a record.
- Relationships are stored using keys (primary and foreign) to link related data in different tables.
- Ensure data integrity and consistency.
- Efficient data retrieval.
- Reduce data redundancy.
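A minimal sketch of primary and foreign keys, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their columns are hypothetical examples, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Each table describes one entity; customer_id is the primary key of customers.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# orders.customer_id is a foreign key that links each order to a customer.
conn.execute(
    """CREATE TABLE orders (
           order_id    INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
           amount      REAL NOT NULL
       )"""
)

conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")
conn.execute("INSERT INTO orders VALUES (101, 1, 40.0)")

# The customer's name is stored once and reached through the key (less redundancy).
rows = conn.execute(
    """SELECT c.name, o.order_id, o.amount
       FROM customers AS c
       JOIN orders AS o ON o.customer_id = c.customer_id
       ORDER BY o.order_id"""
).fetchall()
print(rows)

# Integrity: an order pointing at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (102, 999, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```

The rejected insert shows how key constraints protect integrity and consistency without any checks in application code.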
Introduction to SQL
- Structured Query Language (SQL) is used for managing and manipulating relational databases.
- Provides powerful commands for creating, reading, updating, and deleting data.
- Widely used for data analysis, report generation, and query operations.
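The four basic operations (create, read, update, delete) can be sketched with standard SQL statements run through Python's built-in `sqlite3` module. The `products` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Create: insert new rows.
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 3.0), ("bag", 20.0)],
)

# Read: query rows that match a condition.
cheap = conn.execute(
    "SELECT name FROM products WHERE price < ? ORDER BY name", (5,)
).fetchall()

# Update: change values in existing rows.
conn.execute("UPDATE products SET price = ? WHERE name = ?", (2.0, "pen"))

# Delete: remove rows.
conn.execute("DELETE FROM products WHERE name = ?", ("bag",))

remaining = conn.execute("SELECT name, price FROM products ORDER BY id").fetchall()
print(cheap)
print(remaining)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.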
Microsoft Excel
- A tool that can function as a flat-file database.
- Stores data in a single table or sheet.
- Useful for small data analysis, rapid prototyping, simple queries and explorations of datasets.
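- The Excel Data Model can define relationships between tables.
- Data integrity is an important consideration, since it has to be maintained deliberately (for example, with unique IDs and validation rules).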
NoSQL Databases
- Non-relational databases.
- Flexible schemas (schema-less).
- Used to handle unstructured or semi-structured data.
- Useful when dealing with large volumes of data or highly flexible schemas.
- Examples: MongoDB, Redis, Neo4j
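The difference between a key-value store and a document store can be sketched with plain Python dictionaries. This does not use the Redis or MongoDB client libraries; the keys, documents, and the `find` helper are hypothetical stand-ins that only mimic the data models.

```python
# Key-value model (Redis-style): a value is stored and fetched by its key.
session_cache: dict[str, str] = {}
session_cache["cart:user42"] = "sku-1,sku-7"  # e.g. a shopping cart kept per user
cart = session_cache.get("cart:user42")
missing = session_cache.get("cart:user99")  # unknown key -> None

# Document model (MongoDB-style): documents in one collection need not share fields.
products = [
    {"_id": 1, "name": "laptop", "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "t-shirt", "sizes": ["S", "M", "L"]},
]


def find(collection: list[dict], **criteria) -> list[dict]:
    """Return documents whose top-level fields equal all given criteria (toy query)."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


shirts = find(products, name="t-shirt")
print(cart, missing)
print(shirts)
```

The two product documents carry different fields without any schema change, which is what "flexible schema" means in practice.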
Power BI as a Database Tool
- Focuses on data analysis and visualization, not data storage.
- Connects to data from various sources, including SQL databases and Excel.
- Tool to create custom applications for data analysis and reporting.
- Tool for interactive visualization, reporting, and business intelligence with ease of use.
Description
Explore the crucial aspects of big data, including its significance in decision-making and the 5 Vs that define its characteristics. Discover the sources and types of data generated in today's digital landscape and the implications for businesses as they navigate vast amounts of information.