Questions and Answers
What is the estimated global volume of data by 2025?
Which characteristic of big data refers to the types of data, such as structured and unstructured?
In a Hadoop Distributed File System (HDFS), how is data typically divided for storage?
What percentage of global data is estimated to be stored in relational databases?
What is the primary purpose of a data lake?
What type of analysis is used to summarize and describe a dataset, such as unemployment by age range?
Which analysis method would be most appropriate to examine changes in unemployment over the last year, selected by month?
Which of the following is NOT a type of analysis facilitated by statistical data sources?
What kind of statistical data does the INE provide regarding the labor market?
Which aspect of society does the INE provide statistics on?
Which ministry provides a broad range of financial data and statistics, according to the content?
Which type of analysis would you conduct to compare unemployment rates between different regions?
Which of the following statements best describes the purpose of a Database?
What significant development occurred in the 1970s related to database design?
Which of the following best describes NoSQL databases?
What is the purpose of the INNER JOIN in SQL?
What was the impact of IBM's development of SQL in the 1980s?
Which of the following best characterizes the evolution of database management technologies from the 1990s onwards?
Which SQL command is used to delete an entire table?
What will the SQL query 'SELECT MAX(column) FROM table_name;' return?
Which of the following is a limitation of traditional data management methods mentioned?
What is the result of executing a LEFT JOIN between table1 and table2?
What concept allows for the analysis of data from multiple databases?
In an INSERT query, what must be true about the values provided?
Which industry is mentioned as heavily relying on the advancement of database technologies?
Which database solution is known for its compatibility with large volumes of data and often has an open-source version?
What type of SQL query would you use to add a new column to an existing table?
Which statement accurately describes the purpose of SQL operators?
In the DELETE query syntax, what role does the WHERE clause play?
What does the SELECT * FROM table_name; command accomplish?
Study Notes
Big Data
- Data is crucial for decision-making in all areas of business
- By 2025, the global data volume is estimated to be 175 zettabytes (ZB)
- 1 ZB = 1 trillion gigabytes (10^21 bytes)
- In 2010, the global data volume was just 2 ZB
- 2.5 million gigabytes of data are generated daily
- 90% of the world's data was generated in the last 2 years
5 Vs of Big Data
- Velocity: batch, near time, real-time, streams
- Variety: structured, unstructured, semi-structured, or a mix of all three
- Volume: terabytes, records, transactions, tables, files
- Veracity: trustworthiness, authenticity, origin, reputation, accountability
- Value: statistical, events, correlations, hypothetical
Data Sources
- Twitter (500,000 tweets per minute)
- Instagram (347,222 posts per minute)
- Internet of Things (IoT) – 75 million connected devices generating data
Storage of Generated Data
- Less than 20% of global data is stored in relational databases
- 80% of global data is unstructured (text, images, video)
- Big data architectures, cloud platforms, and NoSQL databases are used for storage
Data Storage in HDFS
- Designed to handle large data volumes across multiple servers
- Data is divided into blocks (typically 128 MB or 256 MB) that are distributed across nodes
- Data is redundantly stored for protection against node failure
- Ideal for unstructured or semi-structured data
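The block-and-replica idea described above can be sketched in a few lines of Python. This is only an illustration, not how HDFS itself decides placement: the round-robin strategy, the node names, and the function names are assumptions made for the example, and the replication factor of 3 is used because it is a common HDFS default.

```python
import math

BLOCK_SIZE_MB = 128   # typical block size mentioned in the notes
REPLICATION = 3       # assumed replication factor for this sketch


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Each replica of a block is placed on a different node.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """A file stays readable if every block keeps a replica on a live node."""
    return all(any(node != failed_node for node in replicas)
               for replicas in placement.values())


nodes = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical node names
layout = place_blocks(file_size_mb=1000, nodes=nodes)
still_readable = readable_after_failure(layout, "node-b")
```

A 1000 MB file needs eight 128 MB blocks (the last one only partly filled), and because every block lives on three different nodes, losing any single node leaves the file readable.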
Data Lake
- Centralized repository for all types of data
- Raw data stored as it's generated, without transformation
- Ideal for long-term analysis when the analysis type isn't yet known
NoSQL Databases
- Store unstructured data (images, texts, audios)
- Data warehousing (bringing many databases together in one place so that data from different databases can be analyzed jointly) and data mining (data analysis) emerge
- Suitable for storing and managing non-tabular data
Big Data/Cloud Databases
- Open-source databases (MySQL, PostgreSQL, Neo4j, MongoDB)
- Database products for very large volumes of data
- Data Lakes and Cloud based databases
Economic and Financial Data Sources
- Descriptive analysis: summarizing and describing a dataset (e.g., unemployment figures)
- Trend analysis: analyzing how data changes over time (e.g., unemployment trends)
- Comparative analysis: comparing data between groups or regions (e.g., unemployment comparisons between regions)
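The three analysis types above can be illustrated with a short Python sketch. The unemployment figures and region names below are invented for the example and do not come from the INE or any other source.

```python
from statistics import mean

# Hypothetical monthly unemployment rates (%) for two made-up regions.
rates = {
    "Region A": [12.0, 11.8, 11.5, 11.6, 11.2, 11.0],
    "Region B": [9.0, 9.1, 9.3, 9.2, 9.4, 9.6],
}

# Descriptive analysis: summarize each dataset.
summary = {
    region: {"mean": round(mean(values), 2),
             "min": min(values),
             "max": max(values)}
    for region, values in rates.items()
}

# Trend analysis: change between the first and the last month.
trend = {region: round(values[-1] - values[0], 2)
         for region, values in rates.items()}

# Comparative analysis: gap between the regional averages.
gap = round(summary["Region A"]["mean"] - summary["Region B"]["mean"], 2)
```

Here `summary` answers "what does the data look like?", `trend` answers "how did it change over time?", and `gap` answers "how do the regions differ?".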
Economic and Financial Data Institutions
- INE (Spain's National Statistics Institute): statistical data on economic, demographic, and social aspects of the country
- Ministry of Economy, Trade, and Enterprise: financial data and statistics (e.g., macroeconomic data, public finances, labor market, financial system, foreign trade)
Other Data Resources
- European Statistical Office (Eurostat) – high-quality statistics on Europe
- Spanish Government
- World Bank – free and open access to global development data
- International Monetary Fund – access to macroeconomic and financial data
- Madrid Stock Exchange
- Bank of Spain – interest rate statistics
Introduction to Databases
- Databases are essential for efficient data management in various industries.
- Before databases, information was stored on paper, magnetic tapes, books, and electronic files.
- These methods faced limitations in searching, retrieving data, handling large volumes and security.
Evolution of Databases
- Entity Relationship (ER) diagrams for database design.
- Oracle introduced one of the first commercial relational database management systems (RDBMS).
- Creation of SQL.
- RDBMS introduction and growth.
- Creation of more robust databases (like Microsoft SQL Server).
- Development of NoSQL databases and data warehousing systems.
SQL and its Importance
- SQL (Structured Query Language) is used to manage and manipulate relational databases.
- Provides commands for creating, reading, updating, and deleting data
- Essential tool for data analysis.
- Handles large volumes of data efficiently and is relatively easy to learn
Data Types
- INT - whole numbers
- FLOAT - floating-point numbers
- DOUBLE - double-precision floating-point numbers
- DECIMAL - fixed-point numbers with precision and scale
- VARCHAR, CHAR - variable-length and fixed-length text, respectively
- TEXT - large amounts of text
- DATE, TIME, DATETIME, TIMESTAMP - date and time values
- BLOB - binary large object
Main SQL Structures
- Creating Tables: CREATE TABLE
- Selecting Records: SELECT
- Inserting Data: INSERT INTO
- Updating Data: UPDATE
- Deleting Data: DELETE FROM
- Joining Tables: JOIN (INNER, LEFT OUTER, RIGHT OUTER)
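The statements listed above can be tried out with Python's built-in `sqlite3` module, as in the sketch below. The table names, columns, and rows are hypothetical, and SQLite accepts type names such as `VARCHAR` and `DECIMAL` but applies its own looser type affinity, so other database systems may behave slightly differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE: two hypothetical tables for illustration.
cur.execute("CREATE TABLE regions (id INTEGER PRIMARY KEY, name VARCHAR(50))")
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name VARCHAR(50), "
            "salary DECIMAL(10, 2), region_id INT)")

# INSERT INTO: one value per listed column, in the same order.
cur.executemany("INSERT INTO regions (id, name) VALUES (?, ?)",
                [(1, "Madrid"), (2, "Valencia")])
cur.executemany(
    "INSERT INTO employees (id, name, salary, region_id) VALUES (?, ?, ?, ?)",
    [(1, "Ana", 2000, 1), (2, "Luis", 1800, 1), (3, "Eva", 2200, None)])

# UPDATE and DELETE FROM: the WHERE clause limits which rows are affected.
cur.execute("UPDATE employees SET salary = 1900 WHERE name = 'Luis'")
cur.execute("DELETE FROM regions WHERE name = 'Valencia'")

# SELECT with an aggregate function.
max_salary = cur.execute("SELECT MAX(salary) FROM employees").fetchone()[0]

# INNER JOIN keeps only rows that have a match in both tables.
inner = cur.execute(
    "SELECT e.name, r.name FROM employees e "
    "INNER JOIN regions r ON e.region_id = r.id ORDER BY e.id").fetchall()

# LEFT JOIN keeps every row of the left table; unmatched columns are NULL.
left = cur.execute(
    "SELECT e.name, r.name FROM employees e "
    "LEFT JOIN regions r ON e.region_id = r.id ORDER BY e.id").fetchall()
```

Eva has no region, so she is missing from the `inner` result but appears in `left` with `None` (SQL `NULL`) in the region column.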
Using Subqueries
- Subqueries are queries within other SQL queries.
- Used to break down complex queries into simpler parts
- Can be used in SELECT, FROM, WHERE, HAVING clauses
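As a minimal sketch of a subquery in a WHERE clause, the example below again uses Python's `sqlite3` module with a hypothetical `employees` table. The inner query computes the average salary and the outer query compares each row against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT)")  # hypothetical table
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ana", 2000), ("Luis", 1800), ("Eva", 2200)])

# The inner SELECT runs first and yields a single value (the average salary);
# the outer SELECT keeps only the employees who earn more than that value.
above_avg = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) "
    "ORDER BY name").fetchall()
```

Without the subquery, the same result would need two separate steps: one query to get the average and a second one that uses it.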
Using Microsoft Excel for Data Analysis
- Excel can function as a flat-file database.
- Useful for smaller applications and quick analysis
- Creating relationships between tables for data analysis and complex queries is possible.
- Excel does not enforce data integrity constraints
NoSQL Databases
- Non-relational databases (NoSQL) are used for storing and retrieving non-tabular data (e.g., documents, key-value pairs)
- Horizontally scalable to handle large volumes of data
- Flexible schema (do not require predefined table structures)
- Commonly used in scenarios requiring scalability, flexibility or fast data access
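The flexible-schema idea can be illustrated with a toy in-memory document store. The `DocumentStore` class and its methods are invented for this sketch and are not the API of MongoDB or any other real NoSQL product.

```python
class DocumentStore:
    """Toy key/document store used only to illustrate a flexible schema."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # No predefined columns: each document can carry its own fields.
        self._docs[key] = dict(document)

    def get(self, key):
        return self._docs.get(key)

    def find(self, field, value):
        """Return the keys of documents where `field` exists and equals `value`."""
        return sorted(key for key, doc in self._docs.items()
                      if field in doc and doc[field] == value)


store = DocumentStore()
store.put("u1", {"name": "Ana", "city": "Madrid"})
store.put("u2", {"name": "Luis", "city": "Madrid", "tags": ["admin"]})  # extra field
store.put("u3", {"name": "Eva"})                                        # missing field
madrid_users = store.find("city", "Madrid")
```

In a relational table, adding `tags` for one user would mean altering the table for everyone. Here each document simply stores the fields it has.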
Graph Databases
- They represent data as nodes and connections (edges).
- Designed for managing relationships effectively.
- Suitable for social networks, recommendations, and fraud detection.
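A recommendation over nodes and edges can be sketched with plain Python dictionaries. The people and friendships below are made up, and a real graph database such as Neo4j would express this with its own query language rather than with this code.

```python
from collections import Counter

# Hypothetical social graph: nodes are people, edges are friendships.
edges = [("Ana", "Luis"), ("Ana", "Eva"), ("Luis", "Marta"),
         ("Eva", "Marta"), ("Marta", "Pablo")]

graph = {}
for a, b in edges:
    # Friendships are undirected, so store the edge in both directions.
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def recommend_friends(graph, person):
    """Rank people who are not yet friends by their number of mutual friends."""
    counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            if candidate != person and candidate not in graph[person]:
                counts[candidate] += 1
    # Most mutual friends first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


suggestions = recommend_friends(graph, "Ana")
```

Marta is suggested to Ana because they share two friends (Luis and Eva), which is the kind of relationship traversal that graph databases are designed to make efficient.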
PowerBI
- PowerBI is a business analytics service for creating reports and dashboards.
- It's suitable for exploring and visualizing database data, rather than for core storage
Relational Databases
- A database model that organizes data into tables made up of columns and rows.
- Uses relationships between tables, represented as primary and foreign keys, for data linking
- Essential for ensuring data integrity and efficiently retrieving related data
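How primary and foreign keys protect data integrity can be shown with a small `sqlite3` sketch. The tables and rows are hypothetical, and note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued; other database systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this to enforce foreign keys

conn.execute("CREATE TABLE departments ("
             "id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE employees ("
             "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
             "department_id INTEGER NOT NULL REFERENCES departments(id))")

conn.execute("INSERT INTO departments VALUES (1, 'Finance')")
conn.execute("INSERT INTO employees VALUES (1, 'Ana', 1)")        # valid reference

try:
    # Department 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO employees VALUES (2, 'Luis', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The keys also let related rows be retrieved together.
linked = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "JOIN departments d ON e.department_id = d.id").fetchall()
```

The rejected insert is the integrity guarantee in action: an employee cannot point to a department that is not in the `departments` table.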
Database Design
- Defining the structure, storage, and retrieval mechanisms of data in a database system.
- Blueprint for storing, accessing, and managing data in the database.
- Includes steps like schema definition, normalization, and physical implementation.
Description
Test your knowledge on big data concepts, including storage methods, types of analysis, and data statistics. This quiz covers topics like Hadoop, data lakes, and global data trends to challenge your understanding of current data practices.