Big Data Value Chain

EffortlessVerdelite3575
What is the main purpose of Hadoop?

To make interaction with big data easier

What is the importance of High Availability in clustered computing?

It guarantees access to data and processing despite failures

What is the benefit of Easy Scalability in clustered computing?

It makes it easy to scale or expand horizontally by adding additional machines

What is the first step in the Big Data Life Cycle with Hadoop?

Ingesting data into the system

What is the primary component of a computer system that clustered computing aims to provide high availability for?

All of the above

What is the source of inspiration for Hadoop?

A technical document published by Google

What is the main advantage of using Hadoop for big data processing?

It provides a simple programming model for distributed processing

What is the last step in the Big Data Life Cycle with Hadoop?

Visualizing the results

What is the primary benefit of using clustered computing for big data processing?

It provides high availability and easy scalability

What is the primary component of a computer system that Hadoop is designed to work with?

All of the above

Study Notes

Big Data Value Chain (DVC)

  • The Big Data Value Chain describes the information flow within a big data system that aims to generate value and useful insights from data.
  • The Big Data Value Chain identifies the following key high-level activities:
    • Data Acquisition
    • Data Analysis
    • Data Curation
    • Data Storage
    • Data Usage

Data Acquisition

  • Data Acquisition is the process of gathering, filtering, and cleaning data before it is put in a data warehouse or any other storage solution on which data analysis can be carried out.
  • The infrastructure required to support the acquisition of big data must provide:
    • Low latency
    • High transaction volumes
    • Flexible and dynamic data structures
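
The gathering, filtering, and cleaning steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record fields (`name`, `reading`), the sample values, and the `acquire` function are hypothetical and do not come from any particular acquisition tool.

```python
def acquire(raw_records):
    """Gather raw records, filter out unusable ones, and clean the rest."""
    cleaned = []
    for record in raw_records:
        name = (record.get("name") or "").strip()
        reading = record.get("reading")
        # Filter: drop records that have no name or no reading at all.
        if not name or reading is None:
            continue
        # Clean: convert the reading to a number; drop it if that fails.
        try:
            value = float(reading)
        except (TypeError, ValueError):
            continue
        # Clean: normalize the name to lower case.
        cleaned.append({"name": name.lower(), "reading": value})
    return cleaned


# Hypothetical raw input with typical quality problems.
raw = [
    {"name": "  Sensor-A ", "reading": "21.5"},
    {"name": "", "reading": "19.0"},          # missing name
    {"name": "Sensor-B", "reading": None},    # missing reading
    {"name": "Sensor-C", "reading": "n/a"},   # unparseable reading
    {"name": "SENSOR-D", "reading": 18},
]
acquired = acquire(raw)
print(acquired)
```

Only the records that survive filtering and cleaning would be passed on to the data warehouse or other storage solution.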

Data Analysis

  • Data Analysis is concerned with making the acquired raw data amenable to use in decision-making and in domain-specific applications.
  • Data scientists need to:
    • Be curious and result-oriented
    • Have good communication skills that allow them to explain highly technical results to their non-technical counterparts
    • Have a strong quantitative background in statistics and linear algebra, as well as programming knowledge with a focus on data warehousing, mining, and modeling, in order to build and analyze algorithms

Data Vs Information

  • Data can be described as unprocessed facts and figures.
  • Data can be defined as a collection of facts, concepts, or instructions expressed in a formalized manner.
  • Data must be interpreted or processed by a human or an electronic machine before it carries real meaning.
  • Data can be presented in the form of:
    • Alphabets (A-Z, a-z)
    • Digits (0-9)
    • Special characters (+, -, /, *, =, etc.)

Information

  • Information is the processed data on which decisions and actions are based.
  • It is data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in the recipient's current or prospective actions or decisions.
  • Information is interpreted data; created from organized, structured, and processed data in a particular context.
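
As a small illustration of the difference, the sketch below turns raw figures (data) into a summary a decision-maker could act on (information). The sales numbers and variable names are made up for the example.

```python
from statistics import mean

# Data: unprocessed figures with no context attached.
daily_sales = [120, 95, 143, 110, 87]

# Processing: organize and interpret the figures in a particular context.
average = mean(daily_sales)
best_day = daily_sales.index(max(daily_sales)) + 1  # days are numbered from 1

# Information: a meaningful statement on which a decision can be based.
summary = f"Average daily sales: {average:.1f}; best day: day {best_day}"
print(summary)
```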

Data Types

  • From a data analytics point of view, there are three common data types or structures:
    • Structured data
    • Semi-structured data
    • Unstructured data

Structured Data

  • Structured data is data that adheres to a predefined data model and is therefore straightforward to analyze.
  • Structured data conforms to a tabular format with a relationship between the different rows and columns.
  • Common examples of structured data are Excel files or SQL databases.

Semi-Structured Data

  • Semi-structured data is a form of structured data that does not obey the tabular structure of data models associated with relational databases or other forms of data tables.
  • Semi-structured data contains tags or other markers to separate semantic elements within the data.
  • Examples of semi-structured data include XML, JSON, etc.
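
The contrast between structured and semi-structured data can be seen in a short Python sketch using only the standard library. The sample records are invented for illustration: the CSV text has the same columns in every row, while the JSON records use keys (tags) to mark their elements and do not all share the same fields.

```python
import csv
import io
import json

# Structured: every row has the same columns, as in a table.
csv_text = "id,name,city\n1,Alice,Addis Ababa\n2,Bob,Adama\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys mark the semantic elements, but records may
# nest values or omit fields, so they do not fit one fixed table.
json_text = """
[
  {"id": 1, "name": "Alice", "phones": ["0911", "0922"]},
  {"id": 2, "name": "Bob", "address": {"city": "Adama"}}
]
"""
docs = json.loads(json_text)

# Tabular data can be read column by column without any checks.
cities = [row["city"] for row in rows]

# Semi-structured data needs a fallback for fields that may be absent.
doc_cities = [doc.get("address", {}).get("city") for doc in docs]

print(cities)
print(doc_cities)
```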

Unstructured Data

  • Unstructured data does not have a predefined data model and is not organized in a predefined manner.
  • Unstructured data is typically text-heavy but may also contain dates, numbers, and facts.
  • Unstructured data is more difficult for traditional programs to process than data stored in structured databases.
  • Common examples of unstructured data include audio files, video files, PDFs, Word files, or NoSQL databases.

Data Curation

  • Data curation is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage.
  • Data curation is performed by expert curators (data curators, scientific curators, or data annotators) who are responsible for improving the quality of data and ensuring that it is trustworthy, discoverable, accessible, and reusable.

Data Storage

  • Data Storage is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
  • A data lake is often considered the best solution for storing big data because it can support various data types. Data lakes are typically based on Hadoop clusters, cloud object storage services, NoSQL databases, or other big data platforms.

Data Usage

  • Data usage in business decision-making can enhance competitiveness through the reduction of costs, increased added value, or any other parameter that can be measured against existing performance criteria.
  • Data usage covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity.

Big Data Life Cycle with Hadoop

  • The activities in the big data processing life cycle are:
    • Ingesting data into the system
    • Persisting the data in storage
    • Computing and analyzing data
    • Visualizing the results
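
The four stages above can be mimicked on a single machine with plain Python. This is a toy sketch, not Hadoop code: the function names (`ingest`, `store`, `analyze`, `visualize`), the log lines, and the JSON file used as "storage" are all assumptions made for the example. In a real Hadoop deployment each stage would be handled by distributed components.

```python
import json
import os
import tempfile
from collections import Counter


def ingest():
    """Stage 1: bring raw data into the system (hard-coded log lines here)."""
    return [
        "error disk full",
        "info job started",
        "error network down",
        "info job finished",
    ]


def store(lines, path):
    """Stage 2: write the ingested data to storage (a local JSON file here)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(lines, f)


def analyze(path):
    """Stage 3: compute over the stored data (count lines per log level)."""
    with open(path, encoding="utf-8") as f:
        lines = json.load(f)
    return Counter(line.split()[0] for line in lines)


def visualize(counts):
    """Stage 4: present the results (a text bar chart here)."""
    return "\n".join(
        f"{level:<6}{'#' * n} ({n})" for level, n in sorted(counts.items())
    )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.json")
    store(ingest(), path)
    counts = analyze(path)

chart = visualize(counts)
print(chart)
```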

Learn about the Big Data Value Chain, a process that generates insights from data, involving data acquisition, analysis, curation, storage, and usage.
