Data, Information, Knowledge and Wisdom

Questions and Answers

Where can data be stored practically?

  • Exclusively in a File System
  • Nowhere, data is ephemeral
  • Either in a File System, a Database Management System (DBMS), or both (correct)
  • Exclusively in a Database Management System (DBMS)

Which of the following represents the correct progression from raw data to wisdom?

  • Knowledge -> Wisdom -> Data -> Information
  • Data -> Information -> Knowledge -> Wisdom (correct)
  • Information -> Data -> Wisdom -> Knowledge
  • Wisdom -> Knowledge -> Information -> Data

What is the key characteristic that differentiates 'information' from 'data'?

  • Information is always numerical, while data is textual.
  • Information represents processed data, giving it context and meaning. (correct)
  • Data is used for decision making purposes, information is not.
  • There is no difference; the terms are interchangeable.

How does 'knowledge' build upon 'information' in the data progression?

  • Knowledge involves applying information to make decisions or solve problems. (correct)

Which of the following best describes 'wisdom' in the context of data progression?

  • The practical application of knowledge, often involving judgment and ethical considerations. (correct)

A company observes an increase in sales after implementing a targeted marketing campaign. Considering the data progression, which of the following represents the 'knowledge' stage in this scenario?

  • The realization that targeted marketing campaigns are effective in increasing sales, derived from analyzing the sales figures and campaign data. (correct)

Imagine a self-driving car uses sensor data to identify a pedestrian crossing the street (Information). It then uses known traffic laws to predict the pedestrian's path (Knowledge). Which action would represent 'Wisdom' in this scenario?

  • The car doesn't just stop, but also analyzes the surrounding environment to prevent potential secondary accidents, prioritizing the safety of all involved based on ethical considerations. (correct)

Which of the following best describes analytics?

  • The discovery and communication of meaningful patterns in data. (correct)

According to the information provided, what three disciplines are simultaneously applied in analytics?

  • Statistics, computer programming, and operations research. (correct)

Which level of analytics answers the question "Why did that happen?"

  • Diagnostic Analytics (correct)

A company wants to determine the optimal pricing strategy for a new product line. Which type of analytics would be most suitable for this?

  • Prescriptive Analytics (correct)

A consultancy firm wants to forecast a prospective client's future sales using macroeconomic indicators, historical sales data, and competitor activities. However, the model they developed is extremely sensitive to minor changes in competitor strategies and produces drastically different sales projections from only marginal alterations. What primary course of action should the firm consider to make its predictive model more robust and dependable?

  • Implement regularization techniques, conduct sensitivity analyses on key variables, and potentially integrate ensemble methods to consolidate predictions across multiple models. (correct)

In the context of data objects, what is a data instance?

  • A raw fact or raw data record. (correct)

Which of the following best describes a data schema?

  • The skeleton structure and properties of data. (correct)

Which of the following is an example of a data instance, according to the provided information?

  • Mr. Somboon Sae-tae (correct)

What elements are considered key factors for data science?

  • Data and the ability to extract useful knowledge from it. (correct)

In a database implementation perspective, what is the relationship between 'Person', 'Name', 'Born', and 'Twitter'?

  • 'Person' is a data schema, and the rest are attributes. (correct)

How does the 'database design perspective' differentiate between data and data schema?

  • Data is the raw fact, while the data schema is the skeleton structure. (correct)

Considering a database containing information about books, which of the following would be an example of an attribute and its possible values?

  • Category: Classic Literature (correct)

Given the increasing importance of both 'Data' and 'the capability to extract useful knowledge', which business strategy would be MOST aligned with leveraging these assets for competitive advantage?

  • Developing comprehensive analytics infrastructure paired with training initiatives to cultivate a data-literate workforce. (correct)

Alice and Bob are data instances in a database. They each have attributes for 'Name', 'Born', and 'Twitter'. If the database were to transition from a relational model to a NoSQL document store, how would the representation of Alice and Bob MOST likely change?

  • Alice and Bob, along with all their attributes, would be encapsulated into individual JSON or XML documents, enabling flexible and schema-less representation. (correct)

Which of the following best describes the primary goal of data science?

  • Extracting knowledge and insights from data. (correct)

What is the role of 'algorithms' in data science, as described in the provided content?

  • To extract knowledge and insights from noisy data. (correct)

According to the material, what distinguishes data science from other fields dealing with data?

  • Its application across a broad range of domains. (correct)

In the context of data science, why is data sometimes 'anonymized'?

  • To protect privacy and confidentiality. (correct)

What does the material suggest is the relationship between 'data' and 'knowledge'?

  • Knowledge is extracted from data through analysis. (correct)

Which of the following is NOT explicitly identified as a step in data science?

  • Quantum entanglement (correct)

What crucial role does data cleaning play in the data science process according to the text?

  • It prepares data by handling inconsistencies and errors. (correct)

According to the material, what is meant by 'noisy' data in the context of data science?

  • Data containing errors, outliers, or irrelevant information. (correct)

Consider a scenario where a data scientist is tasked with predicting customer churn. Which of the following actions would MOST directly align with the principles of data science as outlined in the material?

  • Collecting and cleaning customer data, then applying algorithms to identify patterns indicative of churn. (correct)

A team is using diverse datasets: structured sales records, unstructured social media posts, and sensor data from IoT devices, to predict market trends. To effectively integrate these datasets, which of the following transformations would be MOST critical, considering the principles described?

  • Employing techniques to handle the variety of data types, potentially transforming them into compatible formats while preserving essential features and relationships. (correct)

What is the core principle behind Data-Driven Decision Making (DDD)?

  • Basing decisions on the analysis of available data. (correct)

Why is data engineering and processing considered critical to data science?

  • Because it provides the necessary infrastructure and access to data. (correct)

According to the provided information, what is the defining characteristic of 'Big Data'?

  • Its datasets are too large for traditional data processing systems. (correct)

Which of the following is NOT listed as a key technology in Big Data processing?

  • Microsoft Excel (correct)

What benefit does extracting value from Big Data provide that wasn't previously possible?

  • It allows for analysis of data that was previously considered. (correct)

In the context of Data-Driven Decision Making, which scenario exemplifies its application?

  • A marketing team choosing ad campaigns based on consumer reaction analysis. (correct)

Consider a scenario where a company wants to analyze social media data to understand customer sentiment. Which Big Data technology would be most directly involved in storing and managing the unstructured data?

  • MongoDB (correct)

A data scientist needs to perform complex statistical analysis on a massive dataset. Which of the following technologies would be most suitable for parallel processing of this data?

  • MapReduce (correct)

Imagine a scenario where a financial institution needs to store and manage extremely large volumes of transaction data with high availability and fault tolerance. Which Big Data technology would be the LEAST suitable choice for this purpose?

  • Traditional Relational Database (correct)

A retailer wants to implement a real-time recommendation system based on customer purchase history and browsing behavior. Given the need for low-latency data retrieval and analysis, which combination of Big Data technologies would be the MOST appropriate?

  • Cassandra and Spark (correct)

Flashcards

Data Storage Options

Locations for storing data.

Data

Raw, unorganized facts.

Information

Processed data that provides context.

Knowledge

Applied information.

Wisdom

The ultimate level, involving insights, judgment, and ethical considerations.

Data to Wisdom

The progression Data -> Information -> Knowledge -> Wisdom.

Raw Fact

The initial, unprocessed form of facts and figures without context or organization.

Analytics

The process of discovering meaningful patterns in data and communicating them.

Analyzing Big Data

Extracting actionable insights from large datasets by forming hypotheses and discovering correlations.

Descriptive Analytics

What happened in the past?

Diagnostic Analytics

Why did something happen?

Predictive Analytics

What is likely to happen in the future?

Data Instance

A raw fact or piece of information, either structured or unstructured (e.g., Mr. Somboon Sae-tae, 6288000, Male).

Data Schema

The skeleton structure of the data. It defines the properties or characteristics of data (e.g., Student_Name, Student_ID, Gender).

Database

A structured collection of data objects and their relationships.

Database

A collection of related, structured data organized for efficient access and management.

Data Object

Combination of raw data (facts) and data schema (structure).

Attribute

A characteristic or property of a data object, defining what type of data is stored.

Data Science Key Factor

Ability to extract valuable knowledge from data.

Data Architecture

The layout or plan that defines how data is organized and the relationships among data elements in a database.

Data Science

An interdisciplinary field using scientific methods to extract knowledge from data.

Application Domains

Using data science across a broad range of fields.

Data Mining

Techniques that find previously unknown, valid, novel, useful, and understandable patterns in large data sets.

Data Collection

Gathering large quantities of diverse data relevant to solving real-life problems.

Anonymizing Data

Transforming data to protect sensitive information.

Data Science Goal

Understanding phenomena via the automated analysis of data.

Data-Driven Decision Making (DDD)

Basing decisions on data analysis rather than intuition.

Data engineering and processing

Critical for data science, facilitating access and sophisticated processing.

Big Data

Datasets too large for traditional processing, requiring new technologies.

Hadoop

A key Big Data technology, enabling distributed processing of large datasets.

HDFS

Hadoop Distributed File System: a distributed file system designed to store large amounts of data.

NoSQL

Non-relational ("not only SQL") database: provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases.

MapReduce

Programming model and software framework for writing applications that process vast amounts of data in parallel on large clusters of commodity hardware.

MongoDB

Cross-platform, document-oriented database program, classified as a NoSQL database.

Cassandra

Free and open-source, distributed, wide-column store, NoSQL database management system.

PIG

High-level platform for creating programs that run on Hadoop; scripts are written in Pig Latin and compiled into MapReduce jobs.

Study Notes

  • This module will discuss AI, ML, and Data Science
  • It will cover data and database concepts
  • It will also cover Data Analytics, Data-Driven Decision Making and Data Science Life Cycle
  • It will cover the principles of Data Science

Data vs. Information

  • Data consists of raw, unprocessed facts
  • It lacks inherent meaning
  • It forms the building blocks of information
  • Data management involves generation, storage, and retrieval
  • Information is produced by processing data to reveal its meaning
  • Context is required to reveal the meaning of information
  • Knowledge creation is enabled through information
  • Information should be accurate, relevant, and timely for effective decision-making
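
As a minimal sketch of the step from data to information, the Python snippet below takes bare numbers and attaches context (a label, a unit, and summary values). The readings, the label, and the function name to_information are illustrative assumptions, not part of the lesson.

```python
from statistics import mean

# Raw data: bare numbers with no context (hypothetical values).
raw_readings = [21.0, 22.0, 23.0, 22.0]

def to_information(readings, label, unit):
    """Process raw values into a summary that carries context and meaning."""
    return {
        "label": label,            # what the numbers describe
        "unit": unit,              # how to interpret them
        "count": len(readings),
        "average": mean(readings),
        "maximum": max(readings),
    }

info = to_information(raw_readings, "Office temperature", "Celsius")
print(info)
```

The list on its own says nothing; the returned dictionary is something a person could act on, which is the distinction the notes draw.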

Data Repositories

  • Data is kept in repositories
  • Repositories hold data in machine-processable formats
  • Data is searched and retrieved with a database language such as SQL
  • Results can be presented in a human-understandable format
  • Data is stored in either a File System or a Database Management System (DBMS), or both

Progression of Data

  • Raw facts are the most basic
  • Processed data comes next
  • Actionable information is after that
  • Applied knowledge follows
  • Wisdom is the pinnacle

Understanding Data

  • Data is a collection of facts in raw form
  • Data is the base building block
  • Information is easier to measure and visualize
  • Information is a derived second building block
  • Knowledge is the third building block: it links data and applies information to achieve a goal
  • Wisdom is the top of the DIKW hierarchy and guides how knowledge is applied, using judgment

Data Object (Database Design Perspective)

  • A Data Object consists of Data Instance and Data Schema
  • A Data Instance is a raw fact or raw data record
  • Examples of Data Instances: Mr. Somboon Sae-tae, 6288000, Male
  • A Data Schema is the skeleton structure of data: its characteristics and properties
  • A Data Schema is defined by the Attributes of the data
  • The Data Schema for the examples above is Student_Name, Student_ID, Gender
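
One way to picture a data object is to pair the schema with an instance. The Python sketch below uses the attribute names and values from these notes; the helper make_data_object is a hypothetical name introduced only for illustration.

```python
# Data schema: the attribute names (taken from the notes).
schema = ("Student_Name", "Student_ID", "Gender")

# Data instance: the raw facts (also taken from the notes).
instance = ("Mr. Somboon Sae-tae", 6288000, "Male")

def make_data_object(schema, instance):
    """Pair each attribute in the schema with its raw value."""
    if len(schema) != len(instance):
        raise ValueError("instance does not match schema")
    return dict(zip(schema, instance))

student = make_data_object(schema, instance)
print(student["Student_Name"])
```

The schema alone carries no facts and the instance alone carries no structure; the data object is the combination of the two.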

Introducing Databases

  • A database is a shared, integrated computer structure
  • It stores a collection of raw facts of interest to the end user
  • Metadata provides descriptions of data characteristics and relationships
  • The DBMS manages the database structure
  • The DBMS stores the actual data in the database
  • The DBMS secures and controls access to the database

Connecting Data and Extracting Knowledge

  • Data, and the ability to extract useful knowledge from it, are the key factors for data science

Defining Data Science

  • Data science is an interdisciplinary field
  • It uses scientific methods, processes, algorithms, and systems
  • It extracts knowledge and insights from noisy, structured, and unstructured data
  • It applies that knowledge and those actionable insights across a broad range of application domains
  • Data science is concerned with collecting, cleaning, and anonymizing large quantities of data
  • The data is relevant to solving real-life problems and is analyzed to initiate meaningful actions
  • Data science studies principles, processes, and techniques for understanding phenomena
  • Data science includes automated analysis of data

Data-Driven Decision Making

  • Data-Driven Decision Making (DDD) refers to decisions based on the analysis of data
  • DDD prioritizes data over intuition

Relationship of Data Engineering and Data Science

  • Data engineering and processing are critical to support data science
  • Data science benefits from sophisticated data engineering

"Big Data"

  • Big data means datasets too large for traditional data processing systems
  • Big data requires new processing technologies
  • Big Data technologies include Hadoop, HDFS, NoSQL, MapReduce, MongoDB, Cassandra, PIG, HIVE, and HBASE
  • These technologies work together toward goals such as extracting value from data
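
To give a feel for the MapReduce programming model named above, here is a single-machine word count written in plain Python. It only imitates the map, shuffle, and reduce phases; it does not use Hadoop or any cluster API, and the function names and sample documents are assumptions for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data needs new tools", "big clusters process big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])
```

In a real cluster each document (or block of a file) would be mapped on a different machine, which is why the map step must not depend on any other document.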

Defining Data Analytics

  • Analytics is the discovery and communication of meaningful patterns in data

Analyzing Data

  • Analytics relies on Statistics, Programming and Operations Research

Data Mining

  • Data analytics is concerned with extraction of actionable knowledge and insights from big data
  • It often involves formulating hypotheses based on conjectures gathered from experience
  • Data mining also involves discovering correlations among variables

Level of Analytics

  • There are four main levels
  • Descriptive Analytics answers "What happened?"
  • Diagnostic Analytics answers "Why did that happen?"
  • Predictive Analytics answers "What will happen?"
  • Prescriptive Analytics answers "What is the best course of action?"
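
The difference between the descriptive and predictive levels can be sketched with a small, made-up sales series. The figures and the helper fit_line are assumptions; the forecast uses an ordinary least-squares straight line, which is just one simple choice of predictive model.

```python
from statistics import mean

months = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]   # hypothetical monthly sales

# Descriptive: what happened?
total_sales = sum(sales)
average_sales = mean(sales)

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Predictive: what will happen next month, if the trend continues?
intercept, slope = fit_line(months, sales)
forecast_month_5 = intercept + slope * 5
print(total_sales, average_sales, forecast_month_5)
```

Diagnostic and prescriptive questions need more than this series alone, for example campaign data to explain the rise or cost data to choose an action.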

Business Questions

  • "Who are the most profitable customers?" is a simple descriptive statistics question
  • "Is there a difference in value to the company of these customers?" is a testing question
  • "What are the common characteristics of these customers?" is a Segmentation/Classification question
  • "Will this new customer become a profitable customer? If so, how profitable?" is a prediction question

Business Questions: Techniques

  • Most business questions are causal: what would happen if?
  • Other easier questions are: "what happened in the past?"

Supervised and Unsupervised Learning

  • Supervised Learning is used for Classification and Regression
  • Unsupervised Learning is used for Clustering and Dimension Reduction
  • Unsupervised Learning is often used inside a larger Supervised Learning problem, for example auto-encoders for image recognition

Supervised Learning Algorithms

  • KNN (k Nearest Neighbors)
  • Naive Bayes
  • Logistic Regression
  • Support Vector Machines
  • Random Forests

Unsupervised Learning Algorithms

  • Clustering
  • Factor analysis
  • Latent Dirichlet Allocation
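
Clustering, the first item above, can be sketched with a one-dimensional k-means loop. The values and starting centroids are made up, and kmeans_1d is a hypothetical helper; k-means is only one of several clustering methods.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its closest centroid.
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# No labels are given: the groups are discovered from the data alone.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```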

Data Science Life Cycle

  • Includes analysis
  • Includes experiments

Stages of cycle

  • Data Collection
  • Data Management
  • Data Analytics
  • Presentation/Visualization
  • Archiving/Preservation

Main Challenges in Data Science

  • Privacy
  • Security
  • Data Governance
  • Data & Information Sharing
  • Cost/Operational Expenditures
  • Data Ownership
  • Volume, velocity, and variety of data

Description

Explore the data processing pipeline. Learn about data storage, the progression from raw data to wisdom, and differentiate between information and data. Understand how knowledge builds upon information and the role of analytics.
