Big Data Value Chain
Questions and Answers

What is the main purpose of Hadoop?

  • To make interaction with small data easier
  • To replace traditional computing systems
  • To make interaction with big data easier (correct)
  • To visualize big data results

What is the importance of High Availability in clustered computing?

  • It increases the risk of hardware or software failures
  • It reduces the need for real-time analytics
  • It guarantees access to data and processing despite failures (correct)
  • It is only necessary for small-scale systems

What is the benefit of Easy Scalability in clustered computing?

  • It makes it easy to scale or expand horizontally by adding additional machines (correct)
  • It only applies to small-scale systems
  • It increases the physical resources on a machine
  • It reduces the need for additional machines

What is the first step in the Big Data Life Cycle with Hadoop?

Ingesting data into the system

What is the primary component of a computer system that clustered computing aims to provide high availability for?

All of the above

What is the source of inspiration for Hadoop?

A technical document published by Google

What is the main advantage of using Hadoop for big data processing?

It provides a simple programming model for distributed processing

What is the last step in the Big Data Life Cycle with Hadoop?

Visualizing the results

What is the primary benefit of using clustered computing for big data processing?

It provides high availability and easy scalability

What is the primary component of a computer system that Hadoop is designed to work with?

All of the above

    Study Notes

    Big Data Value Chain (BDVC)

    • The Big Data Value Chain describes the information flow within a big data system, with the aim of generating value and useful insights from data.
    • The Big Data Value Chain identifies the following key high-level activities:
      • Data Acquisition
      • Data Analysis
      • Data Curation
      • Data Storage
      • Data Usage

    Data Acquisition

    • Data Acquisition is the process of gathering, filtering, and cleaning data before it is put in a data warehouse or any other storage solution on which data analysis can be carried out.
    • The infrastructure required to support the acquisition of big data must provide:
      • Low latency
      • High transaction volumes
      • Flexible and dynamic data structures
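
A rough illustration of the acquisition step described above: gather raw records, filter out malformed ones, and clean the rest before they would be handed to a storage solution. The record layout and field names (`sensor`, `reading`) are illustrative assumptions, not part of the lesson.

```python
# Illustrative acquisition step: gather -> filter -> clean.
raw_records = [
    {"sensor": "s1", "reading": "21.5"},
    {"sensor": "s2", "reading": ""},       # malformed: empty reading
    {"sensor": "s3", "reading": "19.0 "},  # needs cleaning: stray whitespace
]

def acquire(records):
    """Drop records with empty readings, then clean and type the rest."""
    cleaned = []
    for rec in records:
        value = rec["reading"].strip()
        if not value:  # filtering: discard malformed records
            continue
        # cleaning: normalize the reading to a numeric type before storage
        cleaned.append({"sensor": rec["sensor"], "reading": float(value)})
    return cleaned

clean = acquire(raw_records)
```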

    Data Analysis

    • Data Analysis is concerned with making the acquired raw data amenable to use in decision-making as well as in domain-specific applications.
    • Data scientists need to:
      • Be curious and results-oriented
      • Have strong communication skills, so they can explain highly technical results to their non-technical counterparts
      • Have a strong quantitative background in statistics and linear algebra, as well as programming knowledge with a focus on data warehousing, mining, and modeling, in order to build and analyze algorithms

    Data vs. Information

    • Data can be described as unprocessed facts and figures.
    • Data can be defined as a collection of facts, concepts, or instructions in a formalized manner.
    • Data must be interpreted or processed, by a human or a machine, before it has true meaning.
    • Data can be presented in the form of:
      • Alphabets (A-Z, a-z)
      • Digits (0-9)
      • Special characters (+, -, /, *, =, etc.)

    Information

    • Information is the processed data on which decisions and actions are based.
    • It is data that has been processed into a form that is meaningful to the recipient and that has real or perceived value for the recipient's current or prospective actions or decisions.
    • Information is interpreted data; created from organized, structured, and processed data in a particular context.
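
The data-to-information distinction can be made concrete with a tiny example: raw figures become information once they are processed into a form that supports a decision. The sales numbers below are invented for illustration.

```python
# Data: unprocessed facts and figures.
daily_sales = [120, 95, 130, 110, 145]

# Processing step: aggregate the raw figures.
average = sum(daily_sales) / len(daily_sales)

# Information: the processed result, in a form meaningful to a recipient.
information = f"Average daily sales: {average:.1f} units"
```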

    Data Types

    • From a data analytics point of view, there are three common data types or structures:
      • Structured data
      • Semi-structured data
      • Unstructured data

    Structured Data

    • Structured data is data that adheres to a pre-defined Data Model and is therefore straightforward to analyze.
    • Structured data conforms to a tabular format with a relationship between the different rows and columns.
    • Common examples of structured data are Excel files or SQL databases.
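
As a small sketch of working with structured data, the snippet below stores tabular rows in an in-memory SQLite database (a stand-in for any SQL database) and queries them with SQL; the table name, columns, and values are made up for illustration.

```python
import sqlite3

# Structured data: rows conform to a pre-defined tabular model (the schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Cairo"), ("Bob", "Giza")],
)

# Because the model is known in advance, analysis is straightforward SQL.
rows = conn.execute("SELECT name FROM customers WHERE city = 'Cairo'").fetchall()
```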

    Semi-Structured Data

    • Semi-structured data is a form of structured data that does not obey the tabular structure of data models associated with relational databases or other forms of data tables.
    • Semi-structured data contains tags or other markers to separate semantic elements within the data.
    • Examples of semi-structured data include XML, JSON, etc.
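
A short sketch of semi-structured data: the JSON document below has no fixed tabular schema, but its keys act as the tags/markers that separate semantic elements. The payload is invented for illustration.

```python
import json

# Semi-structured: nested, tag-delimited data rather than rows and columns.
payload = '{"user": {"name": "Alice", "tags": ["admin", "dev"]}, "active": true}'
doc = json.loads(payload)

# The markers (keys) let us navigate to semantic elements directly.
name = doc["user"]["name"]
```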

    Unstructured Data

    • Unstructured data does not have a predefined data model and is not organized in a pre-defined manner.
    • Unstructured information is typically text-heavy but may contain data such as dates, numbers, and facts as well.
    • Unstructured data is difficult to process using traditional programs compared with data stored in structured databases.
    • Common examples of unstructured data include audio files, video files, PDFs, Word documents, and NoSQL databases.
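
To illustrate how unstructured text can still yield dates and numbers, the sketch below pulls them out of a free-form note with regular expressions; the note and the patterns are illustrative only.

```python
import re

# Unstructured: free-form text with no predefined model,
# but it embeds extractable facts (a date, a number).
note = "Meeting on 2024-05-12: revenue grew 15 percent vs last quarter."

dates = re.findall(r"\d{4}-\d{2}-\d{2}", note)   # ISO-style dates
numbers = re.findall(r"\b\d+\b", note)           # bare digit runs
```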

    Data Curation

    • Data curation is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage.
    • Data curation is performed by expert curators (data curators, scientific curators, or data annotators) who are responsible for improving the accessibility, quality, trustworthiness, discoverability, and reusability of data.

    Data Storage

    • Data Storage is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
    • A data lake is often the best solution for storing big data because it can support various data types; data lakes are typically based on Hadoop clusters, cloud object storage services, NoSQL databases, or other big data platforms.

    Data Usage

    • Data usage in business decision-making can enhance competitiveness through the reduction of costs, increased added value, or any other parameter that can be measured against existing performance criteria.
    • Data usage covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity.

    Big Data Life Cycle with Hadoop

    • The activities, or life cycle, involved in big data processing are:
      • Ingesting data into the system
      • Processing data in the storage
      • Computing and analyzing data
      • Visualizing the results
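
The four life-cycle stages above can be sketched as a toy pipeline. This is not an actual Hadoop job: the function names and data are invented, and the analyze step does a MapReduce-style count purely for illustration.

```python
def ingest():
    # Stage 1: ingesting data into the system (here, a hard-coded sample).
    return ["error", "ok", "ok", "error", "ok"]

def store(records):
    # Stage 2: processing data in the storage (here, just persist in memory).
    return list(records)

def analyze(records):
    # Stage 3: computing and analyzing data, MapReduce-style counting.
    counts = {}
    for rec in records:
        counts[rec] = counts.get(rec, 0) + 1
    return counts

def visualize(counts):
    # Stage 4: visualizing the results as a simple text bar chart.
    return [f"{key}: {'#' * n}" for key, n in sorted(counts.items())]

chart = visualize(analyze(store(ingest())))
```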

    Description

    Learn about the Big Data Value Chain, a process that generates insights from data, involving data acquisition, analysis, curation, storage, and usage.
