Questions and Answers
What is the main goal of data analysis?
What is data curation?
Who is responsible for improving the accessibility, quality, trustworthiness, discoverability, and reusability of data?
What is the main solution to data storage?
What is the best solution to store Big Data?
What is data usage in business decision-making?
What is the benefit of using data analysis in business decision-making?
What is the primary goal of data analysis and data mining?
What is related to data analysis?
What is the primary goal of data curation?
Study Notes
Overview of Data Science
- Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.
- Data science offers a range of roles and requires a range of skills to analyze data.
- Data science focuses on extracting knowledge from data sets.
Data Science Evolving
- Data science continues to evolve as one of the most promising and in-demand career paths for skilled professionals.
- Data science combines domain expertise with programming skills, mathematics, statistics, machine learning, information science, and data mining to extract meaningful insights from data.
Data Scientists
- Data scientists are data experts who master the full spectrum of the data science life cycle to uncover useful intelligence from data for an organization.
- Data scientists need to be curious, result-oriented, and have good communication skills to explain highly technical results to non-technical counterparts.
- They require a strong quantitative background in statistics and linear algebra, as well as programming knowledge focused on data warehousing, mining, and modeling to build and analyze algorithms.
Data vs. Information
- Data can be described as unprocessed facts and figures.
- Data can be defined as a representation of facts, concepts, or instructions in a formalized manner.
- Data must be interpreted or processed by humans or electronic machines before it carries meaning.
- Information is the processed data on which decisions and actions are based.
- Information is data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in the current or prospective action or decision of the recipient.
Data Processing Cycle
- Data processing is the restructuring or reordering of data by people or machines to increase its usefulness and add value for a particular purpose.
- The data processing cycle consists of three basic steps: Input, Processing, and Output.
- Input step: preparing input data in a convenient form for processing.
- Processing step: converting input data to produce data in a more useful form.
- Output step: the result of processing is collected.
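The three steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names (`prepare_input`, `process`, `collect_output`) and the sample exam scores are hypothetical and do not come from the notes.

```python
def prepare_input(raw_lines):
    """Input step: turn raw text lines into numbers ready for processing."""
    return [float(line.strip()) for line in raw_lines if line.strip()]


def process(values):
    """Processing step: convert the input into a more useful form (a summary)."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": sum(values) / len(values),
    }


def collect_output(summary):
    """Output step: collect the result of processing as a readable report line."""
    return f"{summary['count']} scores, mean {summary['mean']:.1f}"


raw = ["70\n", "85\n", "\n", "91\n"]  # hypothetical raw exam scores, one per line
report = collect_output(process(prepare_input(raw)))
print(report)  # 3 scores, mean 82.0
```

Keeping each step in its own function mirrors the cycle: the output of one step is the input of the next.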
Data Types and Representation
- Data types can be described from different perspectives, such as computer programming and data analytics.
- From a computer programming perspective, data types include integers, booleans, characters, floating-point numbers, and alphanumeric strings.
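As a small illustration, the programming data types listed above might look like this in Python. The variable names and values are hypothetical. Note that Python has no separate character type, so a one-character string stands in for it here.

```python
age = 42                   # integer
is_enrolled = True         # boolean
grade = "A"                # character (a one-character string in Python)
height = 1.75              # floating-point number
student_id = "CS2024x17"   # alphanumeric string

# Print the type name of each value next to the value itself.
for value in (age, is_enrolled, grade, height, student_id):
    print(type(value).__name__, repr(value))
```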
Data Types from Data Analytics Perspective
- From a data analytics perspective, there are three common data types: structured, semi-structured, and unstructured data.
Structured Data
- Structured data adheres to a pre-defined data model and is straightforward to analyze.
- Structured data conforms to a tabular format with a relationship between the different rows and columns.
- Examples of structured data include Excel files and SQL databases.
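To show why structured data is straightforward to analyze, here is a minimal sketch using Python's standard `csv` module. The table contents (names, departments, salaries) are invented for illustration.

```python
import csv
import io

# Hypothetical table: every row has the same columns, as a pre-defined
# data model requires.
table_text = """id,name,department,salary
1,Abebe,Sales,5200
2,Sara,Engineering,7800
3,Lensa,Engineering,8100
"""

rows = list(csv.DictReader(io.StringIO(table_text)))

# Because the structure is known in advance, analysis is a direct column lookup.
engineering = [r for r in rows if r["department"] == "Engineering"]
average_salary = sum(int(r["salary"]) for r in engineering) / len(engineering)
print(average_salary)  # 7950.0
```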
Unstructured Data
- Unstructured data does not have a predefined data model and is not organized in a pre-defined manner.
- Unstructured data is typically text-heavy but may contain data such as dates, numbers, and facts as well.
- Examples of unstructured data include audio files, video files, PDF files, Word files, and NoSQL databases.
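Because unstructured text has no predefined model, any dates or numbers inside it have to be located by pattern rather than by column name. The following is a rough sketch using Python's `re` module. The sample note and the assumption that dates are written as YYYY-MM-DD are both hypothetical.

```python
import re

# Hypothetical free-text note with facts buried in prose.
note = (
    "Met the supplier on 2023-04-12 and agreed on 150 units. "
    "Delivery is expected by 2023-05-01, pending approval."
)

DATE_PATTERN = r"\b\d{4}-\d{2}-\d{2}\b"  # assumes ISO-style dates

dates = re.findall(DATE_PATTERN, note)

# Remove the dates first so their digits are not counted as quantities.
without_dates = re.sub(DATE_PATTERN, " ", note)
quantities = [int(n) for n in re.findall(r"\b\d+\b", without_dates)]

print(dates)       # ['2023-04-12', '2023-05-01']
print(quantities)  # [150]
```

A different date format or a number written out in words would be missed, which is exactly the difficulty that unstructured data poses for analysis.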
Semi-Structured Data
- Semi-structured data is a form of structured data that does not conform to the tabular structure of data models associated with relational databases or other forms of data tables.
- Semi-structured data contains tags or other markers to separate semantic elements within the data.
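JSON is one common format of this kind: its keys act as the markers that separate semantic elements, yet records need not share the same fields. The sketch below uses Python's standard `json` module. The records themselves are invented for illustration.

```python
import json

# Hypothetical records: both are tagged with keys, but they do not share
# one fixed set of columns the way rows in a table would.
document = """
[
  {"name": "Abebe", "email": "abebe@example.com"},
  {"name": "Sara", "phones": ["0911-000000", "0922-000000"],
   "address": {"city": "Addis Ababa"}}
]
"""

records = json.loads(document)

# Fields may be missing or nested, so lookups need defaults.
cities = [r.get("address", {}).get("city", "unknown") for r in records]

for record, city in zip(records, cities):
    print(record["name"], sorted(record.keys()), city)
```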
Data Curation
- Data curation is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage.
- Data curation involves content creation, selection, classification, transformation, validation, and preservation of data.
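Some of the curation activities listed above (transformation, validation, and selection) can be sketched as follows. This is a simplified illustration: the records, the function names, and the quality rules (non-empty name, age between 0 and 120) are assumptions and not part of the notes.

```python
raw_records = [
    {"name": " abebe ", "age": "34"},
    {"name": "Sara", "age": "-5"},   # invalid: negative age
    {"name": "", "age": "29"},       # invalid: missing name
    {"name": "LENSA", "age": "41"},
]


def transform(record):
    """Transformation: normalise whitespace, capitalisation, and types."""
    return {"name": record["name"].strip().title(), "age": int(record["age"])}


def is_valid(record):
    """Validation: hypothetical quality rules for this example."""
    return bool(record["name"]) and 0 <= record["age"] <= 120


# Selection: keep only the records that pass validation.
curated = [r for r in map(transform, raw_records) if is_valid(r)]
rejected_count = len(raw_records) - len(curated)

print(curated)         # [{'name': 'Abebe', 'age': 34}, {'name': 'Lensa', 'age': 41}]
print(rejected_count)  # 2
```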
Data Storage
- Data storage is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
- Relational Database Management Systems (RDBMS) have been the main solution to data storage.
- A data lake is regarded as the best solution for storing big data because it can support various data types; it is typically based on Hadoop clusters, cloud object storage services, NoSQL databases, or other big data platforms.
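As a small illustration of the RDBMS approach mentioned above, the sketch below uses SQLite through Python's standard `sqlite3` module. The table name, columns, and readings are hypothetical, and a data lake is not shown because it depends on a specific platform.

```python
import sqlite3

# In-memory database for illustration; it is discarded when closed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_reading ("
    "id INTEGER PRIMARY KEY, sensor TEXT NOT NULL, value REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO sensor_reading (sensor, value) VALUES (?, ?)",
    [("t1", 21.5), ("t1", 22.5), ("t2", 19.0)],
)
conn.commit()

# The fixed schema lets the database answer aggregate queries directly.
averages = dict(
    conn.execute(
        "SELECT sensor, AVG(value) FROM sensor_reading "
        "GROUP BY sensor ORDER BY sensor"
    ).fetchall()
)
conn.close()

print(averages)  # {'t1': 22.0, 't2': 19.0}
```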
Data Usage
- Data usage covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity.
- Data usage in business decision-making can enhance competitiveness through the reduction of costs, increased added value, or any other parameter that can be measured against existing performance criteria.
Basic Concepts of Big Data
- Big data refers to data sets so large and complex that traditional data processing tools and techniques cannot handle them.
- Big data is characterized by its volume, velocity, and variety.
Description
This chapter covers the fundamentals of data science, including data types, data value chain, and big data concepts. Learn about the role of data scientists and the data processing life cycle.