Questions and Answers
Which of the following best describes big data?
- Small datasets easily processed by traditional techniques
- A type of database management system
- Large datasets that cannot be adequately processed using traditional techniques (correct)
- A statistical tool used for data analysis
Big data only includes structured data, like that found in traditional databases.
False (B)
What are the three 'V's initially used to define big data, as identified by Doug Laney?
Volume, velocity, and variety
The 'V' representing the speed at which data is generated and processed is known as ______.
Match the following big data 'V' characteristics with their descriptions:
Which of the following is an example of a 'Volume' aspect of big data?
Hadoop is a technology that has increased the burden of data storage.
What does the term 'Variety' refer to in the context of big data?
The characteristic of big data that relates to the consistency and accuracy of the data is known as ______.
What is the significance of 'Velocity' in the context of big data?
The main purpose of big data analytics is to complicate decision-making processes for organizations.
Name one technology that is often associated with addressing the 'Volume' aspect of big data.
The 5 V's of Big Data are Volume, Velocity, Variety, Veracity and ______.
Which of the following is an example of 'variety' in big data?
According to the information, most companies in the U.S. store less than 1 Terabyte of data.
What is the primary purpose of analyzing 'big data' for an organization?
RFID tags, sensors, and smart metering contribute significantly to the ______ aspect of big data.
Which of the following is NOT typically considered a category of 'Big Data'?
Recalculating risk portfolios can take days, even with big data technologies.
Name one industry sector that utilizes big data technology.
Using analytics to identify how consumers feel about products is called ______ analysis.
Which of the following describes 'Black Box Data'?
Big data only helps in generating coupons and is not used to detect fraudulent behavior.
Name two barriers that are imposed on big data.
The power grid data holds information consumed by a node in terms of ______.
Which of the following is a use of business analytics/business intelligence?
In order to capitalize on big data, one should require infrastructure that only manages structured data.
Match the following data unit to the corresponding number of bytes:
What is the difference between Operational Big Data and Analytical Big Data?
The New York Stock Exchange captures 1 ______ of trade information during each trading session
Flashcards
What is Big Data?
Large datasets that are difficult to process using traditional methods, involving various tools, techniques, and frameworks.
Volume in Big Data
The amount of data collected from various sources like business transactions and social media. Technologies like Hadoop help manage this.
Velocity in Big Data
Describes the speed at which data is generated and processed. Sources such as RFID tags, sensors, and smart metering produce data that often has to be handled in near real time.
Variety in Big Data
Veracity in Big Data
Value in Big Data
Black Box Data
Social Media Data
Power Grid Data
Transport Data
Importance of Big Data
Big Data Technology Requirements
Operational Big Data
Analytical Big Data
MapReduce
Big Data Technology Users
Barriers of Big Data
Sources of Big Data
Business Intelligence
Sentiment Analysis
Study Notes
What is Big Data?
- Big Data is a collection of large datasets that cannot be adequately processed using traditional processing techniques.
- Big Data is more than just data; it has evolved into a complete subject that utilizes various tools, techniques, and frameworks.
- The term "Big Data" refers to the large volume of structured and unstructured data that businesses handle every day.
- Analyzing Big Data in depth can lead to better decisions and stronger strategic moves for organizations.
The Evolution of Big Data
- The idea of Big Data emerged in the early 2000s.
- Doug Laney defined big data using three categories, the three V's: volume, velocity, and variety.
- Organizations gather data from various sources like business transactions, social media, and machine-to-machine data.
- Technologies like Hadoop have alleviated storage concerns.
- Data streams now arrive faster than ever and must be handled in a timely manner.
- RFID tags, sensors, and smart metering necessitate real-time data processing.
- Big Data comes in many forms, including structured, numeric data in databases and unstructured text documents, emails, videos, audio, stock tickers, and financial transactions.
The 5 V's of Big Data
- Volume: The sheer amount of data generated every second.
- Velocity: The speed at which data is generated and changes.
- Veracity: The trustworthiness of data.
- Variety: The different forms of data.
- Value: Turning big data into something useful.
- An estimated 40 zettabytes of data were projected to be created by 2020.
- That is a 300-fold increase over 2005.
- An estimated 2.5 quintillion bytes of data are collected every day.
- Modern cars have around 100 sensors.
- By 2016, there were 18.9 billion network connections.
- As of 2011, global healthcare data amounted to about 150 exabytes.
Volume: Scale of Data
| Unit | Value | Size |
| --- | --- | --- |
| bit | 0 or 1 | 1/8 of a byte |
| byte | 8 bits | 1 byte |
| kilobyte | 1000^1 bytes | 1,000 bytes |
| megabyte | 1000^2 bytes | 1,000,000 bytes |
| gigabyte | 1000^3 bytes | 1,000,000,000 bytes |
| terabyte | 1000^4 bytes | 1,000,000,000,000 bytes |
| petabyte | 1000^5 bytes | 1,000,000,000,000,000 bytes |
| exabyte | 1000^6 bytes | 1,000,000,000,000,000,000 bytes |
| zettabyte | 1000^7 bytes | 1,000,000,000,000,000,000,000 bytes |
| yottabyte | 1000^8 bytes | 1,000,000,000,000,000,000,000,000 bytes |
- 90% of today's data was created in the last 2 years.
- There are 2.5 quintillion bytes of data created every day.
- One day of that data would fill 10 million Blu-ray discs.
- 40 zettabytes (40 trillion gigabytes) of data were projected to be generated by 2020, which is 300 times more than in 2005.
- This equals 5,200 gigabytes for every person on Earth.
- Most companies in the US store over 100 terabytes (100,000 gigabytes) of data.
- The New York Stock Exchange captures 1 TB of trade information during each trading session.
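The unit arithmetic in this section can be checked with a short script. This is a minimal sketch that assumes the decimal (powers of 1000) units from the table above; the helper names `to_bytes` and `convert` are illustrative and do not come from any library.

```python
# Decimal (SI) storage units from the table above, smallest to largest.
# Each step up the list multiplies the size by 1000.
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
    "petabyte", "exabyte", "zettabyte", "yottabyte",
]


def to_bytes(value, unit):
    """Return how many bytes `value` units of `unit` represent."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert `value` from one unit to another (illustrative helper)."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


if __name__ == "__main__":
    # 40 zettabytes expressed in gigabytes (40 trillion).
    print(convert(40, "zettabyte", "gigabyte"))
    # 100 terabytes expressed in gigabytes (100,000).
    print(convert(100, "terabyte", "gigabyte"))
    # 2.5 quintillion bytes expressed in exabytes (2.5).
    print(convert(2_500_000_000_000_000_000, "byte", "exabyte"))
```

The three calls reproduce figures quoted in the notes: 40 zettabytes is 40 trillion gigabytes, 100 terabytes is 100,000 gigabytes, and 2.5 quintillion bytes is 2.5 exabytes.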
Variety: Different Forms of Data
- As of 2014:
  - There were 420 million wearable wireless health monitors.
  - Over 4 billion hours of video were watched on YouTube per month.
  - Over 30 billion pieces of content were shared on Facebook every month.
  - Over 400 million tweets were sent per day.
Veracity: Trustworthiness of Data
- Includes origin, authenticity, trustworthiness, completeness, and integrity.
- 1 in 3 business leaders do not trust the information they use to make decisions.
- Poor data quality costs the US about $3.1 trillion a year.
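Of the veracity aspects listed above, completeness is the easiest to measure directly. The following is a minimal sketch, assuming records are plain dictionaries and that a field counts as missing when it is absent, `None`, or an empty string. The function name `completeness` and the sample fields are hypothetical.

```python
def completeness(records, required_fields):
    """Return the fraction of records that have every required field filled in.

    A field is treated as missing if it is absent, None, or an empty string.
    This definition is an assumption made for illustration.
    """
    if not records:
        return 0.0
    complete = sum(
        all(record.get(field) not in (None, "") for field in required_fields)
        for record in records
    )
    return complete / len(records)


if __name__ == "__main__":
    # Hypothetical customer records: only the first one is complete.
    customers = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 3},
    ]
    print(completeness(customers, ["id", "email"]))
```

A score well below 1.0 is one signal that decisions based on the dataset deserve extra scrutiny.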
Categories of Big Data
- Black Box Data: Data from aircraft flight recorders, including communications between crew members and technical staff.
- Social Media Data: Information and views from social networking sites like Facebook and Twitter.
- Stock Exchange Data: Details about business transactions and decisions to buy or sell shares.
- Power Grid Data: Information on power consumption within a network.
- Transport Data: Data from transport sectors, including vehicle model, capacity, distance, and availability.
- Search Engine Data: Large amounts of data retrieved by search engines from various sources.
Importance of Big Data
- Finding the root cause of failures, issues and defects in real time operations.
- Generating coupons at the point of sale based on the customer's buying habits.
- Recalculating entire risk portfolios in just minutes.
- Detecting fraudulent behavior before it affects your organization.
- Big Data technology is used in banking, government, education, health care, manufacturing, and retail.
How Businesses Utilize Big Data
- To understand what customers want and which products are moving quickly
- To meet end-user expectations for customer service
- To speed up marketing timelines, reduce costs, and build efficient economies of scale
Big Data Technologies
- Accurate analysis increases efficiency, reduces costs, and lowers business operation risk.
- Capitalizing on Big Data requires infrastructure that can manage and process large volumes of structured and unstructured data in real time while ensuring privacy and security.
- Technologies from vendors like Amazon, IBM, and Microsoft can be used to approach Big Data.
Operational Big Data vs. Analytical Big Data
- Operational Big Data:
  - Includes systems like MongoDB.
  - Provides operational capabilities for interactive, real-time workloads.
  - Data is generally captured and stored.
  - Capitalizes on new cloud computing architectures.
  - Allows massive computations to be run efficiently and at reasonable cost.
- Analytical Big Data:
  - Includes systems like Massively Parallel Processing (MPP) databases and MapReduce.
  - Offers analytical capabilities for complex analysis.
  - MapReduce provides a new method for analyzing data.
  - MapReduce-based systems can be scaled up from single servers to many machines.
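The MapReduce idea mentioned above can be illustrated with the classic word-count example. This is a single-process sketch in plain Python, not Hadoop or any real MapReduce framework. The function names are illustrative, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["Big data", "big value"]))
```

Because each document is mapped independently and each key is reduced independently, the same structure can be spread over many machines, which is what makes the approach scalable.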
Barriers to Big Data
- Data capture
- Storage capacity
- Searching
- Sharing
- Transfer
- Analysis
- Presentation
Where does Big Data come from?
- Contracts
- Credit
- Weather
- Population
- Economic
- Enterprise "Dark" Data
- Public
- Social Media
- Partner, Employee, Customer, Supplier
- Commercial
- Sensor
- Industry
- Sentiment
- Monitoring
- Transactions
- Network
Business Analytics/Business Intelligence
- Business Analytics/Business Intelligence (BI) is a broad category of applications, technologies, and processes.
- Its purpose is gathering, storing, accessing, and analyzing data.
- Helps business users to make better decisions.
Things are getting more complex
- Many companies are performing new kinds of analytics (sentiment analysis, etc.)
- This allows them to better and more quickly understand and respond to what customers are saying about them and their products
- The cloud and appliances are being used as data stores.
- Advanced analytics are growing in popularity and importance.
- Sentiment analysis (opinion mining) uses natural language processing, text analysis, and computational linguistics to identify and extract subjective information.
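As a rough illustration of sentiment analysis, here is a toy lexicon-based scorer. It is a deliberately simplified sketch: the word lists are hypothetical, and real systems rely on far richer natural language processing than counting words.

```python
import re

# Tiny hypothetical lexicons, chosen only for illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment_score(text):
    """Return (positive word count) minus (negative word count) for `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def classify(text):
    """Label text as positive, negative, or neutral from its score."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("I love this phone, great battery"))
    print(classify("Terrible service and poor quality"))
    print(classify("It arrived on Tuesday"))
```

Word counting like this misses negation, sarcasm, and context, which is why the approach described above draws on text analysis and computational linguistics.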