Data Categorization in Data Science

Questions and Answers

What type of data is organized in relational tables?

  • Record data (correct)
  • Graphs and networks
  • Document data
  • Transaction data

What does NOIR stand for in data classification?

Nominal, Ordinal, Interval, Ratio

Nominal data can have a logical order.

False (B)

What is an attribute in data science?

A measurable or observable property of an entity

Which of the following is an example of nominal data?

Blood groups (C)

Name the two general types of data.

Quantitative and Qualitative

A ______ variable has exactly two mutually exclusive categories.

binary

Which scale of measurement involves numerical values that can be ordered?

Ordinal (B)

Match the following types of data with their descriptions:

  • Nominal = Categorical data without order
  • Ordinal = Categorical data with order
  • Interval = Numeric data without true zero
  • Ratio = Numeric data with true zero

What does the CRISP-DM methodology stand for?

Cross Industry Standard Process for Data Mining (A)

What are the two types of data collection methods?

Primary and Secondary Data Collection Methods

Primary data collection is typically less time-consuming than secondary data collection.

False (B)

What is the difference between qualitative and quantitative data collection methods?

Qualitative methods do not use mathematical calculations and aim to understand the reasons behind something, while quantitative methods express data in numbers.

Which of the following is an example of secondary data?

Data collected from sensors (D)

Data collection is either __________ or __________.

qualitative, quantitative

What is the first step to take before collecting data?

Define the aim of the problem

Study Notes

Data Sets

  • Different forms of datasets
  • Data within Data Science

Data Categorization

  • NOIR Typology: Categorizes data by its measurement properties. NOIR stands for Nominal, Ordinal, Interval, Ratio.
  • Nominal Scale: Categories without any sort of order or rank. Examples:
    • Gender: (M, F)
    • Blood Groups: (A, B, AB, O)
    • Country Code: (048, 040)
  • Binary Scale: A type of Nominal scale with exactly two mutually exclusive categories.
    • Examples:
      • Switch: (On, Off)
      • Gender: (Male, Female)

Ordinal Scale

  • Categories with a natural order or rank. Examples:
    • Education Level: (High School, Bachelor’s, Master's, PhD)
    • Satisfaction Rating: (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied)
    • Movie Ratings: (1 Star, 2 Stars, 3 Stars, etc.)
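
The difference between nominal and ordinal data shows up in which operations make sense. The following is a minimal sketch using only the Python standard library; the sample values and the `Satisfaction` class are hypothetical illustrations, not part of the lesson. Nominal values support counting and equality checks, while ordinal values additionally support comparison and sorting.

```python
from collections import Counter
from enum import IntEnum

# Nominal: labels only. Counting and equality are meaningful; ordering is not.
# (Hypothetical sample of blood groups.)
blood_groups = ["A", "O", "B", "O", "AB", "O"]
mode_group = Counter(blood_groups).most_common(1)[0][0]  # most frequent label


# Ordinal: labels with a rank. Comparisons are meaningful, but the numeric
# gaps between ranks carry no meaning. (Hypothetical class for illustration.)
class Satisfaction(IntEnum):
    VERY_DISSATISFIED = 1
    DISSATISFIED = 2
    NEUTRAL = 3
    SATISFIED = 4
    VERY_SATISFIED = 5


responses = [
    Satisfaction.NEUTRAL,
    Satisfaction.VERY_SATISFIED,
    Satisfaction.DISSATISFIED,
]
ranked = sorted(responses)                    # sorting relies on the rank
median_response = ranked[len(ranked) // 2]    # middle element of an odd-length list

print(mode_group, median_response.name)
```

The mode is a sensible summary for nominal data, while a median requires at least an ordinal scale because it depends on ranking.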

Interval and Ratio Scales

  • Interval Scale: Numeric data with consistent intervals between values but no true zero. Examples:
    • Temperature (Celsius or Fahrenheit):
      • 10 degrees Celsius - 0 degrees Celsius = 20 degrees Celsius - 10 degrees Celsius (same interval)
    • Time (measured in hours):
      • 10 am - 9 am = 11 am - 10 am (same interval)
  • Ratio Scale: Numeric data with consistent intervals and a true zero point, so ratios between values are meaningful. Examples:
    • Height: (0 cm to 200 cm)
    • Weight: (0 kg to 100 kg)
    • Age: (0 years to 100 years)
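
A short worked example can make the interval/ratio distinction concrete. The sketch below assumes hypothetical temperature and weight values and a helper named `celsius_to_kelvin` introduced only for illustration. It shows that differences on the Celsius scale are meaningful, but ratios are not, because 0 degrees Celsius is not a true zero.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


# Interval scale: equal differences mean equal intervals.
diff_a = 10.0 - 0.0    # 10 degrees
diff_b = 20.0 - 10.0   # 10 degrees
same_interval = diff_a == diff_b

# A ratio taken directly on Celsius values is misleading:
# 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
naive_ratio = 20.0 / 10.0

# On a scale with a true zero (Kelvin), the ratio is physically meaningful.
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Ratio scale: weight has a true zero, so 100 kg really is twice 50 kg.
weight_ratio = 100.0 / 50.0

print(same_interval, naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```

The Kelvin ratio comes out only slightly above 1, which is why "twice as warm" claims based on Celsius or Fahrenheit readings do not hold.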

Types of Datasets

  • Record Data:
    • Relational Records: Databases with highly structured data stored in relational tables
    • Data Matrix: Numerical or categorical data displayed in rows and columns (e.g., Spreadsheets)
    • Transaction Data: Transactions from a point of sale system, including timestamps and items
    • Document Data: Text documents represented as Term-Frequency Vectors (a matrix showing word frequency in a document)
  • Graphs and Networks: Data represented as nodes (points) and edges (connections) used in network analysis.
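
Two of these dataset forms are easy to sketch in code. The example below is an illustration under stated assumptions, not a reference implementation: the function `term_frequency_vectors`, the whitespace tokenization, and the sample documents and edges are all hypothetical. It builds term-frequency vectors for document data and an adjacency structure for an undirected graph.

```python
from collections import Counter


def term_frequency_vectors(docs):
    """Return (vocabulary, vectors): one word-count vector per document.

    Assumes simple lowercase whitespace tokenization (an illustrative choice).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Document data: each row of `vectors` is one document, each column one term.
docs = ["data science uses data", "science needs questions"]
vocabulary, vectors = term_frequency_vectors(docs)

# Graph data: nodes and edges stored as an adjacency mapping (undirected).
edges = [("A", "B"), ("B", "C")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(vocabulary)
print(vectors)
print(sorted(graph["B"]))
```

Stacking the term-frequency vectors row by row yields a data matrix, which is why document data is often treated as a special case of record data.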

Data in Data Science

  • Entities: Objects or things.
  • Attributes: Measurable properties of an entity.
  • Data: Measurement of an attribute. Computers can work with different types of data, such as audio, video, text, etc.

CRISP-DM Methodology

  • Stands for Cross Industry Standard Process for Data Mining
  • A cyclical process model describing the approaches that data mining experts commonly use

Business Understanding

  • Understanding the project objectives and requirements from a business perspective
  • Converting this knowledge into a data mining problem definition

Data Understanding

  • Initial data collection
  • Activities to get familiar with the data
  • Identifying data quality problems
  • Discovering insights into the data

Data Preparation

  • Constructing the final dataset
  • Transforming and cleaning data for modeling tools
  • Tasks include table, record, and attribute selection

Modeling

  • Selecting and applying various modeling techniques
  • Calibrating model parameters to optimal values
  • Often requires returning to the data preparation phase

Evaluation

  • Thoroughly evaluating models before deployment
  • Reviewing the steps executed to construct the model
  • Ensuring it achieves the business objectives

Deployment

  • Creating the model is generally not the end of the project
  • Organizing and presenting knowledge in a way that is useful to the customer

Data Collection

  • Systematic process of gathering observations or measurements
  • Gaining first-hand knowledge and insights

Why Data Collection?

  • Analyzing a problem and learning about its outcome and future trends
  • Arriving at a solution for a question about the future

Data Collection Methods

  • Primary: Original data collected directly from the source
  • Secondary: Data already collected by someone else and readily available

Primary Data Collection Methods

  • Qualitative: Analyzing the quality and understanding reasons behind something
  • Quantitative: Expressing data in figures or numbers using traditional or online methods

Secondary Data Collection Methods

  • Data from sensors, magazines, and documents

What to Decide Before Data Collection

  • Aim of the problem
  • Type of data to be collected
  • Methods and procedures to collect, store, and process data

Step 1: Define the Aim of the Problem

  • Identifying the exact problem you want to solve
  • Clearly defining the objectives for data collection

Description

This quiz explores the various forms of data, focusing on the NOIR typology used to categorize data by its measurement properties. It covers the nominal, ordinal, interval, and ratio scales, with examples for each. Test your understanding of how data is structured in the context of data science!
