AI Course: Intro and Data Quality


Questions and Answers

In the context of data analysis, what does 'ETL' stand for in the staging area of a data warehouse architecture?

  • Extract, Transfer, Load
  • Evaluate, Translate, Load
  • Execute, Transform, Load
  • Extract, Transform, Load (correct)

What is the primary focus of Statistics, Data Mining, and Data Science concerning the 'inductive phase of learning'?

  • Creating new data collection methods.
  • Moving from idea/theory to observation.
  • Moving from observation to idea/theory/hypothesis. (correct)
  • Analyzing data to validate existing theories.

What is the general equation that represents the composition of data?

  • Data = Signal - Noise
  • Data = Information + Error
  • Data = Signal + Interference
  • Data = Fit + Noise (correct)

Which "V" of Big Data refers to the trustworthiness and accuracy of the data?

  • Veracity (correct)

What is the primary goal when dealing with data?

  • To reveal information, models, patterns, associations, trends, and clusters hidden in the data. (correct)

What is 'data warehousing or data lakes'?

  • Assembling historical data in a consistent manner from transactional processes. (correct)

In the context of machine learning, what process relies primarily on inductive learning?

  • Using computers to make machines program themselves. (correct)

What is the Machine Learning equivalent of 'Individuals' in Statistics?

  • Instances (correct)

Within the context of Data Science as an interdisciplinary field, which components are combined?

  • Computer Science/IT, Math and Statistics, and Domain/Business Knowledge (correct)

Which of the following is NOT a typical step in the data science process?

  • Deployment to production (correct)

Under what condition does multivariate data arise?

  • When researchers record the values of several variables/attributes on a set of units in which they are interested. (correct)

What is a key characteristic of the rows in a data file?

  • They represent individuals or instances. (correct)

What is a key restriction regarding columns of a data file?

  • The same variables are measured in all individuals, in the same order. (correct)

What type of variable is 'citizenship' ('Mexican', 'German', etc.)?

  • Nominal variable (correct)

What type of variable is 'size in clothes' ('XXL', 'XL', 'L', 'M', 'S')?

  • Ordinal variable (correct)

What is a primary characteristic of 'Count' data?

  • It is the result of a count. (correct)

Which file is used to understand the meaning of the variables in the data?

  • The metadata file (correct)

What is the purpose of the 'Feature Selection' step in data preprocessing?

  • To filter out the uninteresting variables. (correct)

What does 'complete case analysis' imply in handling missing values?

  • Omitting any case with a missing value on any of the variables. (correct)

What should be done about the presence of outliers?

  • Detect them and decide how to treat them. (correct)

What is a goal of multivariate outlier detection?

  • To detect observations not detected by univariate methods. (correct)

What does 'LOF' refer to?

  • Local Outlier Factor is an algorithm to find density-based local outliers. (correct)

Which is a technique for detecting outliers by means of Random Forest?

  • Isolation Tree algorithm. (correct)

Why do outliers require fewer divisions than normal data with the Isolation Tree method?

  • Because anomalies are easier to isolate, they require fewer divisions, which makes them easy to detect. (correct)

Once a PCA model is obtained (eigenvectors, projections, and means have been calculated), what can be done?

  • The initial observations can be reconstructed and used. (correct)

How is the mean squared reconstruction error calculated?

  • It is the average of the squared differences between the original variables and the reconstructed variables. (correct)

What is Attribute wise Learning for Scoring Outliers?

  • An unsupervised outlier detection algorithm. (correct)

Once you have detected outliers, what do you have to do?

  • Evaluate the possible actions to see which ones benefit the data and which ones don't. (correct)

Which of the following is true about preprocessing?

  • Automated EDA tools can be used to understand the data faster. (correct)

Why is preprocessing important?

  • The data needs cleaning to achieve better data quality. (correct)

What is one of the first steps inside the data mining chain?

  • The first summary of the data: measures of central tendency and dispersion. (correct)

What is the role of variables in the data?

  • They must have meaning, as described in the metadata. (correct)

What are the steps of data processing?

  • Data collection and data preparation (wrangling during interactive data analysis), which leverage visualization for exploratory data analysis (EDA), followed by data preprocessing and feature engineering. (correct)

In the context of handling outliers, what does 'Declare outliers as missing values' mean?

  • Treat outliers as null/undefined values. (correct)

What is the use of 'Internal encoding'?

  • To order ordinal variables. (correct)

In which situations do you have to treat biased data?

  • Unbias and balance the data (detection & mitigation) in a data preprocessing step. (correct)

What is the first step for data preprocessing?

  • Inventory data sources. (correct)

In data preprocessing, what does the Scale (Normalize, Standardize) step involve?

  • Transforming variables to the same scale by scaling or normalization. (correct)

What is the main characteristic of outlier detection with Random Forest (anomaly score)?

  • Outliers require fewer divisions to isolate than normal data. (correct)

Flashcards

What is Learning?

An iterative process that happens between real-world facts and the hypothesized theories.

What is deduction?

Movement from an idea, theory or hypothesis to observation.

What is Induction?

Movement from observation to idea, theory, or hypothesis.

Statistics/Data mining/Data Science

The inductive phase of learning.


What is Big Data?

The exponential increase of data generation and storage.


What is Velocity in Big Data?

Speed at which the data is generated and analysed.


What is Volume in Big Data?

Amount of data generated each second (social media, credit cards).


What is Value in Big Data?

The worth of the extracted data, needing correct data amount.


What is Variety in Big Data?

Describes the different types of data generated.


What is Veracity of Data?

How trustworthy the data is. If the data is inaccurate or poor quality, it is of little use.


What is Visualisation?

How challenging data can be to use. Limitations such as poor scalability or functionality can impact visualisation.

What is data warehousing or data lakes?

The process of assembling historical data in a consistent manner from transactional processes.

What is Statistics?

Making inferences with confidence measures.


What is Computer Science?

Developing machines/algorithms that solve problems.


What is Machine Learning?

Based on statistics, machines (algorithms) that program themselves to solve tasks.


What are multivariate data?

Arise when researchers/users record the values of several variables/attributes on a set of units in which they are interested.


What are Tables?

Tables of individuals by variables (continuous or categorical).

What are rows in a data file?

Each row represents individual data points.


What are columns in a data file?

Columns represent the characteristics.


What are nominal variables?

Represented by symbols, serving only as labels or names.


What are Ordinal Variables?

Impose order on values, but no distance between values is defined.

What is Count data?

Very often a variable is the result of a count.

What is Preprocessing?

First summary of data: measures of central tendency and dispersion.


What is Data Preprocessing?

Clean (Replace, Impute, Remove Outliers, Duplicates).


What is Correlation Between Features?

Drops features that have a high correlation with others.


What are Statistical Tests?

Checks the relationship of each feature individually with the output variable.


Recursive Feature Elimination

An algorithm trains a model with the dataset and calculates the performance of the model.


What is Variance Threshold?

Detects features with high variability and selects those that go over the threshold.


What is Feature selection?

Filtering out the uninteresting variables.

What is Feature extraction?

Deriving new variables.

What do you do with errors (typos, etc.)?

Identify and correct them.

What do you do with missing values?

They may bias the results. They arise for several reasons: non-response in sample surveys, dropouts in longitudinal data, or refusal to answer particular questions in a questionnaire.

What are outliers?

Douglas Hawkins defines an outlier as an observation which deviates so much from the other observations as to arouse suspicion that it was generated by a different mechanism.

What is the boxplot (Tukey, 1977)?

A graphical display for exploratory data analysis, in which the outliers appear tagged.

What is Mahalanobis distance?

The distance between a point i and a distribution G.

Study Notes

Introduction to Artificial Intelligence Degree Course

  • Introduction and Data Quality, lectured by Prof. Dante Conti and Prof. Sergi Ramirez.

Course Resources

  • The Guia Docente & Web Site / Atenea is available at:
  • https://www.fib.upc.edu/es/estudios/grados/grado-en-inteligencia-artificial/plan-de-estudios/asignaturas/PMAAD-GIA
  • https://ramia-lab.github.io/AdvancedModelling/

Project Details

  • Project groups consist of 5-6 people, with a total of 4 groups per laboratory.
  • Practical Work:
    • Choose a "real-world" problem or case study.
    • Implement algorithms and methods.
    • Write a technical/managerial report.
    • Oral defense.
  • R is the primary language, but Python is also acceptable.

Importance of Data in Learning

  • Learning involves an iterative process between real-world facts and hypothesized theories.
  • Deduction moves from idea/theory/hypothesis to observation.
  • Induction moves from observation to idea/theory/hypothesis.
  • Statistics, Data mining, and Data Science focus on the inductive phase of learning.
  • Data can be represented as Data = Fit + Noise.
  • Exponential increase in data generation and storage,
    • including bank, telecom, scientific, web, text, e-commerce, and social network data.
  • Increase in data formats:
    • relational tables, non-structured tables, log files, textual data, and image data.
  • Real-time, streaming data sources.

Big Data Overview

  • Big Data = Transactions + Interactions + Observations
  • Encompasses various inputs like sensors, mobile web, user click streams, and weblogs.
  • Includes outputs like user-generated content, sentiment analysis, social interactions, spatial coordinates, and external demographics.
  • Characterized by increasing data variety and complexity.

The 8 Vs of Big Data

  • Velocity: The speed at which data is generated, collected, and analyzed.
  • Volume: The amount of data generated each second.
  • Value: The worth of the extracted data.
  • Variety: Describes the different types of data generated, often referring to unstructured data.
  • Veracity: How trustworthy the data is.
  • Validity: Accuracy of the data for its intended use.
  • Volatility: The age of the data; fresh data can quickly make stored data irrelevant.
  • Visualisation: Challenges in using the data, impacted by factors like scalability and functionality.

Data as a Valuable Resource

  • Stored data contains information about the generating phenomenon (statistical regularity).
  • The goal is to reveal information like models, patterns, associations, trends, and clusters hidden in the data.
  • Data is a treasure for organizations if the data quality is reliable.
  • All digital interactions can be valuable data sources that can be enhanced through collected data analysis.
  • Interesting insights are revealed by selecting and reporting data.
  • SQL queries alone are not sufficient.
  • Assembling historical data in a consistent manner is called data warehousing or data lakes; these are the memory of the company.
  • There is a need to learn from the data.

Data Warehouse Architecture

  • Data flows from sources through staging areas to a warehouse, then to data marts, and finally to users who perform analysis, reporting, and mining.
  • Key components include operational systems, ERP, CRM, flat files, metadata, summary data, and raw data.
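The staging-area ETL flow (Extract, Transform, Load) can be sketched as a minimal pipeline. This is an illustrative sketch only: the record fields and helper names are made up, not part of any warehouse product.

```python
# Minimal ETL sketch: extract rows from a source system, transform them,
# and load them into a target store. All names and fields are illustrative.

def extract(source):
    """Extract raw records from a source system (here, a list of dicts)."""
    return list(source)

def transform(records):
    """Transform: clean field values and derive a total per transaction."""
    out = []
    for r in records:
        out.append({
            "customer": r["cust"].strip().title(),
            "total": r["qty"] * r["unit_price"],
        })
    return out

def load(records, warehouse):
    """Load transformed records into the warehouse (here, a plain list)."""
    warehouse.extend(records)
    return warehouse

raw = [{"cust": " alice ", "qty": 2, "unit_price": 10.0},
       {"cust": "bob",     "qty": 1, "unit_price": 5.5}]
warehouse = load(transform(extract(raw)), [])
print(warehouse[0]["customer"], warehouse[0]["total"])  # Alice 20.0
```

In a real warehouse the same three stages run against database connections and bulk loaders rather than Python lists, but the separation of concerns is the same.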

Interdisciplinarity in Machine Learning

  • Computer Science is for developing machines or algorithms that solve problems.
  • Statistics is designed to make inferences with confidence measures.
  • Machine learning is based on statistics to create machines/algorithms that self-program to solve tasks.

Rosetta Stone Comparison of Statistics and Machine Learning

  • Statistics terms map to machine learning equivalents:
    • Variables are attributes/features.
    • Individuals are instances.
    • Explanatory variables are inputs.
    • Response variables are outputs/targets/concepts.
    • Models are networks/trees.
    • Coefficients are weights.
    • Fit criteria are cost functions.
    • Estimation is learning/training.
    • Classification is clustering/unsupervised classification.
    • Discrimination is supervised classification.

Data Science Venn Diagram

  • Data Science combines Computer Science/IT, Math and Statistics, and Domain/Business Knowledge.

Data Science Process Steps

  • Data Collection
  • Data Preparation
  • Model Fitting
  • Model Evaluation
  • Hyperparameter Tuning

Multivariate Data Definition

  • Multivariate data arise when researchers/users record values of several variables/attributes on a set of units.
  • This leads to a vector-valued or multidimensional observation for each unit.

Data File Characteristics

  • Data is multivariate.
  • Tables can contain individual variables or counts.
  • Useful data types:
    • Transaction data
    • Graphs
    • Similarity matrices, link data
    • Textual data: documents, HTML/XML
    • Stream data: sensors, podcasting
    • Image data: medical, Instagram

The Data Matrix

  • A multivariate data matrix X ∈ R^(n×p) has entries x_ij, the value of the jth variable for the ith unit.
  • Theoretical entities describing univariate distributions are denoted by random variables X_1, ..., X_p.
  • The rows of X can be written as vectors x_1, x_2, ..., x_n, and the columns as x_(1), x_(2), ..., x_(p).

Sample Mean Vector and Covariance Matrix

  • The sample mean of the jth variable is x̄_j = (1/n) Σ_i x_ij.
  • The sample mean vector is x̄ = (x̄_1, ..., x̄_p)^T.
  • The sample covariance of the jth and kth variables is s_jk = (1/(n-1)) Σ_i (x_ij - x̄_j)(x_ik - x̄_k).
  • The p × p matrix S = (s_jk) is the sample covariance matrix.
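The sample mean vector and covariance matrix formulas can be checked numerically. A small NumPy sketch with illustrative values:

```python
import numpy as np

# n x p data matrix: n = 4 units, p = 2 variables (values are illustrative)
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [4.0, 8.0]])
n = X.shape[0]

xbar = X.mean(axis=0)                      # sample mean vector (x̄_1, ..., x̄_p)
S = (X - xbar).T @ (X - xbar) / (n - 1)    # sample covariance matrix S = (s_jk)

# np.cov with rowvar=False (columns = variables) uses the same 1/(n-1) convention
assert np.allclose(S, np.cov(X, rowvar=False))
print(xbar)  # [2.5 5. ]
```

Since the second column is exactly twice the first, every entry of S is a multiple of the first column's variance.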

Rows and Columns Characteristics

  • Rows of a data file represent individuals or instances, numbering from tens to millions.
  • Also referred to as samples, examples, or records.
  • Represent repeated units forming a population under study and can be classified, associated, or clustered and characterized by a predetermined set of attributes.
  • Use all available data.

Columns Characteristics:

  • Each row is described by a set of features, variables, or attributes.
  • Variables can take several values (according to a distribution).
  • Attribute types:
    • Binary, nominal, ordinal, interval, ratio, textual, ...
  • The same variables are measured in all individuals, in the same order.
  • A dictionary of the variables may appear in the first rows.

Variable Types Hierarchy

  • Data types include discrete and continuous, quantitative and qualitative:
    • Discrete data include categorical (nominal/ordinal) variables.
    • Continuous data include quantitative (metric) variables.

Nominal Variables

  • Nominal variables have distinct categories represented by symbols, and these serve as labels or names.
  • No relation is implied among nominal values, with no ordering or distance measure.
  • Percentages and tables can be calculated, and bar plots are used for graphical representation.
    • Ex: citizenship (Mexican, German, French) or marital status (single, married, divorced, widowed).
  • A special case is the binary/dichotomous variable.
  • DS algorithms cannot operate on nominal data directly, since the input should be numeric, so binarisation (one-hot encoding) needs to be performed.
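One-hot encoding of a nominal variable can be done, for example, with pandas; the column values below reuse the citizenship example:

```python
import pandas as pd

# Nominal variable: labels only, with no order or distance between them
df = pd.DataFrame({"citizenship": ["Mexican", "German", "French", "German"]})

# Binarisation (one-hot encoding): one 0/1 indicator column per category
onehot = pd.get_dummies(df["citizenship"], prefix="citizenship")
print(onehot.columns.tolist())
# ['citizenship_French', 'citizenship_German', 'citizenship_Mexican']
```

Each original label becomes its own numeric indicator column, which is what most DS algorithms expect as input.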

Ordinal Variables

  • Impose order on values, no distance between values defined.
    • Ex: size in clothes (XXL > XL > L > M > S).
    • Social status is ordinal (upper class > middle/high > middle > middle/lower > lower class).
  • Arithmetic calculations are not possible.
  • Tables, percentages, and bar plots can be used, emphasizing the ordering of the values.
  • Internal encoding of ordinal variables preserves the order (e.g. lower class → 1).
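An internal encoding that preserves the order can be as simple as a mapping; the numeric codes below are one illustrative choice, not a prescribed scheme:

```python
# Ordinal variable: the order matters, the distances between codes do not
order = {"S": 1, "M": 2, "L": 3, "XL": 4, "XXL": 5}

sizes = ["M", "XXL", "S", "L"]
encoded = [order[s] for s in sizes]
print(encoded)  # [2, 5, 1, 3]

# The encoding preserves the ordering S < M < L < XL < XXL
assert order["S"] < order["M"] < order["L"] < order["XL"] < order["XXL"]
```

Unlike one-hot encoding for nominal data, this keeps a single column, but arithmetic on the codes (e.g. averaging) remains meaningless.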

Count or Discrete Data

  • Very often a variable is the result of a count.
    • Ex: number of words in a sentence, number of students, number of bugs.
    • Modeled by the Poisson distribution.
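The Poisson model for counts can be checked with the standard library alone; the rate λ = 3 is an illustrative value:

```python
import math

# Poisson pmf: P(X = k) = exp(-lam) * lam**k / k!
def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0
ks = range(30)                               # tail beyond k = 30 is negligible here
probs = [poisson_pmf(k, lam) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
print(round(mean, 6))  # ≈ 3.0: the Poisson mean equals its rate lam
```

A defining property of the Poisson distribution is that its mean and variance both equal λ, which is a quick diagnostic for whether count data are over- or under-dispersed.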

Understanding the Data

  • Variables must have meaning, described in the metadata file.
  • Role of variables:
    • Response variables (targets to model and predict), usually denoted Y.
    • Explanatory variables (inputs, predictors) used to predict the response, denoted X.
  • Data origin: primary data we collect (surveys/sampling) or secondary sources (public data).

Two Types of Data Files

  • The framework depends on whether or not there is a response variable (ex: transactions, ecological, survey data).
  • Data to explore, describe, and find associations.
  • Data to build a model and predict the response.

Data Mining Chain (Batch Mode)

    1. Preprocessing: summary, cleaning, analysis.
    2. Summary: univariate, bivariate.
    3. Multivariate exploration: visualization, clustering, profiling.
    4. Modeling: optimal model, estimation.
    5. Deployment: communication, usage for a certain context.

Advanced Preprocessing

  • Inventory Data Sources
  • Fix Quality Issues
  • Identify Important Features
  • Apply Feature Engineering Libraries
  • Validate Results
  • Repeat or Complete

Additional Steps For Data Preprocessing

  • Data profiling
  • Data cleansing
  • Data extraction
  • Data transformation
  • Data enrichment
  • Data validation

Process

  • Correlation between Features
  • Statistical Tests
  • Recursive Feature Elimination
  • Variance Threshold
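The variance-threshold idea can be sketched directly in NumPy (scikit-learn's `VarianceThreshold` implements the same logic); the data matrix and the 0.1 threshold are illustrative:

```python
import numpy as np

# Variance-threshold feature selection: keep only features whose variance
# exceeds a chosen threshold. A constant column carries no information.
X = np.array([[1.0, 0.0, 10.0],
              [1.0, 1.0, 20.0],
              [1.0, 0.0, 30.0],
              [1.0, 1.0, 40.0]])

threshold = 0.1
variances = X.var(axis=0)   # population variance of each column
keep = variances > threshold
print(variances)            # [  0.     0.25 125.  ]
print(X[:, keep])           # the constant first column is dropped
```

Note that this criterion looks at each feature in isolation; the other techniques in the list above (statistical tests, recursive feature elimination) also account for the relationship with the output variable.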

Data Processing Steps

  • Data collection (with labels).
  • Data preprocessing: cleaning and removing duplicates; split into test and validation sets.
  • Scale, balance, and augment the data.

Preprocessing:

  • Feature selection: filtering variables.
  • Feature extraction: deriving new variables.
  • Transformations: recoding numerical variables as categorical, quantifying a nominal variable, normalizing.

Data Cleaning

  • Errors and typos, missing data, outliers.
  • Missing data arise from non-response in sample surveys, dropouts in longitudinal data, or refusal to answer particular questions.
  • Options for missing data: complete case analysis, estimating the missing quantities, or imputation.
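The two main strategies for missing values (complete case analysis vs. imputation) can be contrasted with pandas; the toy columns are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age":    [25.0, np.nan, 40.0, 33.0],
                   "income": [30.0, 45.0, np.nan, 52.0]})

# Complete case analysis: omit any case with a missing value on any variable
complete = df.dropna()
print(len(complete))  # 2 of the 4 cases survive

# Simple imputation: replace each missing value with its column mean
imputed = df.fillna(df.mean())
print(imputed["age"].tolist())
```

Complete case analysis is simple but discards information (and may bias results if missingness is not random); mean imputation keeps all cases at the cost of shrinking the variance.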

Outlier Detections

  • An outlier is an observation that deviates so much from the other observations that it suggests a different generating mechanism.
  • Outliers are always judged relative to the data and its statistics.
  • Events not linked to the current model.
  • Data that fail to fit current models.

Univariate Detections

  • The boxplot is a graphical display for exploration in which outliers appear tagged.
    • Mild and extreme outliers: an observation x is declared an extreme outlier if it lies outside [Q1 − 3·IQR, Q3 + 3·IQR]; mild outliers use the 1.5·IQR fences.
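Tukey's fences can be computed directly; a sketch with made-up data containing one obvious outlier:

```python
import numpy as np

# Tukey's fences: mild outliers lie beyond 1.5*IQR, extreme beyond 3*IQR
x = np.array([2.0, 3.0, 3.0, 4.0, 5.0, 5.0, 6.0, 30.0])

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
mild = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
extreme = (x < q1 - 3 * iqr) | (x > q3 + 3 * iqr)
print(x[mild], x[extreme])  # the value 30 is tagged by both fences
```

These are exactly the observations a boxplot would tag beyond its whiskers.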

Multivariate Detections

  • Some outliers are not detectable by univariate methods.
  • Detection is then based on computing the Mahalanobis distance.
  • The distance is measured between a point i and the distribution G that generated the data.

Mahalanobis Distance

  • For normally distributed data, the distribution of the distances is known.
  • This allows establishing a cutoff for declaring outliers.
  • Short distances occur more often.
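The Mahalanobis distance of each observation to G = (mean, covariance) can be computed as below; the four points are illustrative, with the last one far from the bulk:

```python
import numpy as np

# Mahalanobis distance of each point to the distribution G = (mu, cov)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0],
              [10.0, 10.0]])   # last point is far from the other three

mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
cov_inv = np.linalg.inv(cov)

# Quadratic form (x - mu)^T cov_inv (x - mu) per row, then square root
d = np.sqrt(np.einsum('ij,jk,ik->i', X - mu, cov_inv, X - mu))
print(d.argmax())  # 3: the distant point has the largest distance
```

Note that with so few points the outlier inflates the mean and covariance it is measured against, which motivates the robust (iteratively re-estimated) variant described below.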

Multivariate Outliers

  • Problem: the estimates of G and V are themselves corrupted by the outliers.
  • Initialize G and V.
    • Compute the Mahalanobis distances, rank the observations, keep those with the lowest distances, and update V and G until convergence.

Data Outliers (R)

  • A metrics function based on quantiles.
  • Values:
    • Md: the classical Mahalanobis distances.
    • Rd: the robust Mahalanobis distances.
    • A cutoff value separates the outliers.

Non-parametric

  • Local outliers:
    • The algorithm identifies outliers with metrics based on local density.

Summary Table

  • LOF estimates the degree of outlierness; values greater than 1 suggest an outlier.
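A minimal scikit-learn sketch of LOF, with toy coordinates and the contamination parameter left at its default:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# LOF: density-based local outlier scores; values well above 1 flag outliers
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [0.1, 0.1], [5.0, 5.0]])       # last point is isolated

lof = LocalOutlierFactor(n_neighbors=2)
labels = lof.fit_predict(X)                  # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_       # the LOF values themselves
print(labels)
```

The four clustered points have LOF ≈ 1 (their local density matches their neighbors'), while the isolated point's LOF is far above 1 and it is labeled -1.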

Outlier: Random Forest

  • The algorithm is the Isolation Tree.
    • Anomalies require fewer divisions to isolate.
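The idea that anomalies are isolated with fewer random splits is what scikit-learn's `IsolationForest` exploits; the cluster and the single anomaly below are simulated for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Isolation Forest: anomalies need fewer random splits to isolate,
# so they receive higher anomaly scores (predict() marks them with -1)
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),  # dense normal cluster
               [[8.0, 8.0]]])                       # one clear anomaly

iso = IsolationForest(random_state=42).fit(X)
labels = iso.predict(X)
print(labels[-1])  # -1: the isolated point is flagged as an anomaly
```

Because each tree splits on random features at random thresholds, a point far from the bulk ends up alone in a leaf after very few splits, and its short average path length translates into a high anomaly score.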

Approaches

  • Encoding / reconstruction (e.g. via PCA).
    • The reconstruction error must be taken into consideration in the analysis.
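The PCA reconstruction and its mean squared reconstruction error can be sketched with NumPy; the data values and the `reconstruct` helper name are illustrative:

```python
import numpy as np

# PCA model: eigenvectors + mean. Observations can be projected onto the
# top-k eigenvectors, reconstructed, and scored by reconstruction error.
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2],
              [4.0, 3.9], [10.0, 0.0]])

mu = X.mean(axis=0)
Xc = X - mu
eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order

def reconstruct(k):
    """Project onto the top-k eigenvectors and map back to the original space."""
    V = eigvec[:, -k:]
    return (Xc @ V) @ V.T + mu

# Mean squared reconstruction error per observation, keeping k = 1 component
mse = ((X - reconstruct(1)) ** 2).mean(axis=1)
print(mse.round(3))

# Keeping all p components reconstructs the data exactly
assert np.allclose(reconstruct(2), X)
```

Observations that fit the retained components poorly get a large reconstruction error, which is how this encoding/reconstruction approach scores potential outliers.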

Steps

  • Cleaning the data.
  • Detecting rare events: fraud, network analysis.
  • Possible treatments: eliminate them, weight the individuals to diminish their influence on the statistics, or re-estimate the individuals' values.

Automate

  • Goal: automate analysis to understand the data faster.
  • Example: EDA packages for R.

Additional Libraries

  • R: dataMaid, DataExplorer, SmartEDA
  • Python: YData Profiling, D-Tale, Sweetviz, AutoViz
