Data Science Chapter 2
10 Questions

Questions and Answers

What is the main goal of data analysis?

  • To ensure data quality requirements
  • To store data in a scalable way
  • To highlight relevant data and extract useful hidden information (correct)
  • To integrate data analysis within business activities

What is data curation?

  • The process of data mining and business intelligence
  • The persistence and management of data in a scalable way
  • The data-driven business activities that need access to data
  • The active management of data over its life cycle (correct)

Who is responsible for improving the accessibility and quality of data so that it is trustworthy, discoverable, accessible, and reusable?

  • Data miners
  • Business intelligence experts
  • Data curators (correct)
  • Data analysts

What is the main solution to data storage?

  • Relational Database Management Systems (RDBMS) (correct)

What is the best solution to store Big Data?

  • Data lakes (correct)

What is data usage in business decision-making?

  • The data-driven business activities that need access to data (correct)

What is the benefit of using data analysis in business decision-making?

  • It enhances competitiveness through the reduction of costs (correct)

What is the primary goal of data analysis and data mining?

  • To extract useful hidden information from data (correct)

What is related to data analysis?

  • Data mining, business intelligence, and machine learning (correct)

What is the primary goal of data curation?

  • To ensure data quality requirements for its effective usage (correct)

    Study Notes

    Overview of Data Science

    • Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.
    • Data science offers a range of roles and requires a range of skills to analyze data.
    • Data science focuses on extracting knowledge from data sets.

    Data Science Evolving

    • Data science continues to evolve as one of the most promising and in-demand career paths for skilled professionals.
    • Data science combines domain expertise with programming skills, mathematics, statistics, machine learning, information science, and data mining to extract meaningful insights from data.

    Data Scientists

    • Data scientists are data experts who master the full spectrum of the data science life cycle to uncover useful intelligence from data for an organization.
    • Data scientists need to be curious, result-oriented, and have good communication skills to explain highly technical results to non-technical counterparts.
    • They require a strong quantitative background in statistics and linear algebra, as well as programming knowledge with a focus on data warehousing, mining, and modeling to build and analyze algorithms.

    Data vs. Information

    • Data can be described as unprocessed facts and figures.
    • Data can be defined as a collection of facts, concepts, or instructions in a formalized manner.
    • Data must be interpreted or processed, by humans or electronic machines, before it carries real meaning.

    • Information is the processed data on which decisions and actions are based.
    • Information is data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in the current or prospective action or decision of the recipient.
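As a rough illustration of the distinction, the sketch below turns raw readings (data) into a summary a recipient could act on (information). The readings, the 25.0 threshold, and the function name `summarize` are invented for this example and do not come from the chapter.

```python
from statistics import mean

# Raw data: unprocessed figures with no context
# (hypothetical hourly temperatures in degrees Celsius).
readings = [21.5, 22.0, 23.5, 29.0, 30.0]


def summarize(values, threshold):
    """Process raw values into information that supports a decision."""
    average = mean(values)
    return {
        "average": average,
        "maximum": max(values),
        # The decision-relevant part: does the average exceed the assumed limit?
        "cooling_needed": average > threshold,
    }


info = summarize(readings, threshold=25.0)
print(info)
```

The list of readings on its own says little; the summary is meaningful to a recipient because it answers a question tied to an action.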

    Data Processing Cycle

    • Data processing is the re-structuring or re-ordering of data by people or machines to increase their usefulness and add value for a particular purpose.
    • The data processing cycle consists of three basic steps: Input, Processing, and Output.

    • Input step: preparing input data in a convenient form for processing.
    • Processing step: converting input data to produce data in a more useful form.
    • Output step: the result of processing is collected.
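The three steps can be sketched as three small functions chained together. This is only a minimal illustration: the record format (`name, score`), the sample names, and the function names are assumptions made for the example.

```python
def input_step(raw_lines):
    """Input: prepare raw text in a convenient form (name, integer score)."""
    records = []
    for line in raw_lines:
        name, score = line.split(",")
        records.append((name.strip(), int(score)))
    return records


def processing_step(records):
    """Processing: convert the prepared records into a more useful form."""
    # Here "more useful" is assumed to mean ranked from highest to lowest score.
    return sorted(records, key=lambda record: record[1], reverse=True)


def output_step(ranked):
    """Output: collect the result of processing as readable lines."""
    return [
        f"{position}. {name} ({score})"
        for position, (name, score) in enumerate(ranked, start=1)
    ]


raw = ["Abebe, 72", "Sara, 91", "Lidya, 85"]
report = output_step(processing_step(input_step(raw)))
print("\n".join(report))
```

Keeping each step in its own function mirrors the cycle: the output of one stage is the input of the next.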

    Data Types and Representation

    • Data types can be described from different perspectives, such as computer programming and data analytics.
    • From a computer programming perspective, data types include integers, booleans, characters, floating-point numbers, and alphanumeric strings.
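A short sketch of how these types look in one language, here Python. The sample values are arbitrary, and Python has no separate character type, so a one-character string stands in for it.

```python
# One sample value per data type named above (values chosen arbitrarily).
values = {
    "integer": 42,
    "boolean": True,
    "character": "A",  # Python has no char type; a one-character str stands in
    "floating-point": 3.14,
    "alphanumeric string": "Room 101",
}

# Map each label to the name of the Python type that holds the value.
type_names = {label: type(value).__name__ for label, value in values.items()}

for label, name in type_names.items():
    print(f"{label:>20}: {name}")
```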

    Data Types from Data Analytics Perspective

    • From a data analytics perspective, there are three common data types: structured, semi-structured, and unstructured data.

    Structured Data

    • Structured data adheres to a pre-defined data model and is straightforward to analyze.
    • Structured data conforms to a tabular format with a relationship between the different rows and columns.
    • Examples of structured data include Excel files or SQL databases.
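To show why a fixed tabular model makes analysis straightforward, here is a minimal sketch using Python's standard `csv` module. The table contents and column names are hypothetical.

```python
import csv
import io

# Hypothetical table: every row follows the same pre-defined columns.
table_text = """id,name,department,salary
1,Hana,Finance,5200
2,Dawit,IT,6100
3,Meron,IT,5800
"""

rows = list(csv.DictReader(io.StringIO(table_text)))

# Because the schema is fixed, a query is a simple filter over known columns.
it_salaries = [int(row["salary"]) for row in rows if row["department"] == "IT"]
it_average = sum(it_salaries) / len(it_salaries)
print(it_average)
```

Every row can be addressed by the same column names, which is what lets the filter and the average be written in two lines.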

    Unstructured Data

    • Unstructured data does not have a predefined data model and is not organized in a pre-defined manner.
    • Unstructured data is typically text-heavy but may contain data such as dates, numbers, and facts as well.
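    • Examples of unstructured data include audio files, video files, PDFs, and Word documents; NoSQL databases are often listed here as well, although many of them store semi-structured data such as JSON documents.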
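Because unstructured text has no data model, any dates or numbers inside it have to be searched for rather than looked up by column. The sketch below does this with regular expressions; the note text and the assumed `YYYY-MM-DD` date format are invented for the example.

```python
import re

# Hypothetical free-text note: facts are buried in prose, not in named fields.
note = (
    "Met the supplier on 2024-03-15 and agreed to ship 250 units. "
    "Follow-up call is planned for 2024-04-02."
)

# ISO-style dates (YYYY-MM-DD); an assumed format for this example only.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", note)

# Standalone integers that are not part of a date (no digit or hyphen on either side).
quantities = re.findall(r"(?<![\d-])\d+(?![\d-])", note)

print(dates, quantities)
```

The patterns work only for text that happens to follow these conventions, which is the practical difficulty of analyzing unstructured data.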

    Semi-Structured Data

    • Semi-structured data is a form of structured data that does not obey the tabular structure of data models associated with relational databases or other forms of data tables.
    • Semi-structured data contains tags or other markers to separate semantic elements within the data.
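JSON is a common example of semi-structured data: keys mark the semantic elements, but records need not share the same fields or fit a flat table. The following sketch uses Python's standard `json` module on made-up records.

```python
import json

# Hypothetical records: keys act as the markers that separate semantic elements.
document = """
[
  {"name": "Hana", "email": "hana@example.com"},
  {"name": "Dawit", "phones": ["0911", "0922"], "address": {"city": "Adama"}}
]
"""
people = json.loads(document)

# Records do not share one set of fields, so missing keys are handled explicitly.
cities = [person.get("address", {}).get("city", "unknown") for person in people]

# The union of all keys shows there is no single fixed schema.
fields = sorted({key for person in people for key in person})

print(cities, fields)
```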

    Data Curation

    • Data curation is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage.
    • Data curation involves content creation, selection, classification, transformation, validation, and preservation of data.
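A few of these curation activities (transformation, validation, and selection) can be sketched in code. The records, the quality rules, and the function names below are assumptions for illustration; real curation rules depend on the data and its intended use.

```python
# Hypothetical raw records with typical quality problems.
raw_records = [
    {"id": 1, "email": " Hana@Example.com ", "age": "34"},
    {"id": 2, "email": "not-an-email", "age": "29"},
    {"id": 3, "email": "dawit@example.com", "age": "-5"},
    {"id": 1, "email": " Hana@Example.com ", "age": "34"},  # duplicate of id 1
]


def transform(record):
    """Transformation: normalise formats (trim, lower-case, cast to int)."""
    return {
        "id": record["id"],
        "email": record["email"].strip().lower(),
        "age": int(record["age"]),
    }


def is_valid(record):
    """Validation: deliberately simplistic, assumed quality rules."""
    return "@" in record["email"] and 0 <= record["age"] <= 120


curated, seen_ids = [], set()
for record in map(transform, raw_records):
    # Selection: keep only valid records that have not been kept already.
    if is_valid(record) and record["id"] not in seen_ids:
        seen_ids.add(record["id"])
        curated.append(record)

print(curated)
```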

    Data Storage

    • Data storage is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
    • Relational Database Management Systems (RDBMS) have been the main solution to data storage.
    • For big data, a data lake is generally regarded as the best storage solution: it can hold various data types and is typically built on Hadoop clusters, cloud object storage services, NoSQL databases, or other big data platforms.
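As a small-scale illustration of RDBMS-style storage, the sketch below uses SQLite through Python's standard `sqlite3` module with an in-memory database. The table, column, and index names are hypothetical, and SQLite stands in here for any relational system.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Hana", "Addis Ababa"), ("Dawit", "Adama"), ("Meron", "Adama")],
)
conn.commit()

# An index is one way an RDBMS keeps access fast as the table grows.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")

adama = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Adama",)
    )
]
conn.close()
print(adama)
```

The fixed schema is what makes this approach a natural fit for structured data, and also why more varied data is usually kept elsewhere, for example in a data lake.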

    Data Usage

    • Data usage covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity.
    • Data usage in business decision-making can enhance competitiveness through the reduction of costs, increased added value, or any other parameter that can be measured against existing performance criteria.

    Basic Concepts of Big Data

    • Big data refers to the large and complex data sets that traditional data processing tools and techniques cannot handle.
    • Big data is characterized by its volume, velocity, and variety.

    Quiz Team

    Description

    This chapter covers the fundamentals of data science, including data types, data value chain, and big data concepts. Learn about the role of data scientists and the data processing life cycle.
