Master Data Preprocessing

Created by
@CozyOctopus

Questions and Answers

What is the purpose of exploratory data analysis (EDA)?

To build up a general and detailed picture of the data

What are the types of visualizations used in EDA?

Univariate, bivariate, and multivariate

What are some examples of univariate analysis?

Descriptive statistics, one-sample tests, tests for autocorrelation
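
As a rough illustration of univariate analysis, the sketch below computes a few descriptive statistics and a lag-1 autocorrelation coefficient for a single numeric variable, using only the Python standard library. The function names and the sample series are made up for this example.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def lag1_autocorrelation(values):
    """Correlation of the series with itself shifted by one step."""
    mean = statistics.mean(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values[:-1], values[1:]))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


series = [2.0, 4.0, 6.0, 8.0, 10.0]  # illustrative data
summary = describe(series)
r1 = lag1_autocorrelation(series)
print(summary)
print(r1)
```

A coefficient far from zero suggests that neighbouring observations are related, which is worth knowing before choosing an imputation or modelling strategy.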

What is data imputation?

The process of filling in missing values in real data

Which type of variables can logistic regression be used to predict?

Nominal binary variables

What is the primary use of logistic regression from an econometric perspective?

Inference

Why is interpreting logistic regression results more difficult than interpreting linear regression results?

Because logistic regression coefficients cannot be interpreted directly: they describe changes in the log-odds of the outcome, not in the outcome itself

What is the focus of logistic regression in this course?

Prediction

What is recommended for learning the principles of logistic regression from an econometric perspective?

Chapter 5.2 of this course

What is the cost function used for logistic regression?

Cross-entropy (log-loss)
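
To make the cost function concrete, here is a minimal sketch of binary cross-entropy in plain Python. The function name and the `eps` clipping constant are choices made for this example, not something taken from the course.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


uninformed = log_loss([1, 0], [0.5, 0.5])          # about 0.693, i.e. ln 2
confident_right = log_loss([1, 0], [0.99, 0.01])
confident_wrong = log_loss([1, 0], [0.01, 0.99])
print(uninformed, confident_right, confident_wrong)
```

The loss grows sharply when the model is confident and wrong, which is the behaviour that makes it a useful training objective for probability estimates.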

What is the purpose of the link function in GLM?

To relate the linear model to the response variable
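
For logistic regression specifically, the link is usually the logit. One way to write it down, assuming the symbols μ for the expected response, η for the linear predictor, and β for the coefficients (notation chosen here, not taken from the course):

```latex
g(\mu) = \log\frac{\mu}{1-\mu} = \eta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
\mu = g^{-1}(\eta) = \frac{1}{1 + e^{-\eta}}
```

The inverse link on the right is the logistic (sigmoid) function, which turns the unbounded linear predictor into a value between 0 and 1.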

What is the advantage of using logistic regression over linear regression?

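Logistic regression can handle the non-linear relationship between the predictors and a binary outcome: its output is a probability bounded between 0 and 1, which linear regression does not guarantee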

What is the difference between binary logistic regression and multinomial logistic regression?

Binary logistic regression can only handle two classes, while multinomial logistic regression can handle more than two classes
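
A common way to extend the binary case to several classes is the softmax function, sketched below in plain Python. The scores are invented for the example; with two classes and one score fixed at zero, softmax reduces to the sigmoid used in binary logistic regression.

```python
import math


def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


three_class = softmax([2.0, 1.0, -1.0])   # hypothetical scores for 3 classes
two_class = softmax([2.0, 0.0])           # binary special case
sigmoid_of_2 = 1.0 / (1.0 + math.exp(-2.0))
print(three_class)
print(two_class[0], sigmoid_of_2)
```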

What is the purpose of the odds ratio in logistic regression?

To compare the odds of an event occurring in two different groups
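
A small worked example may help here. The probabilities below are invented; the last lines rely on the standard result that, in a fitted logistic regression, the exponential of a coefficient is the odds ratio for a one-unit increase in that predictor.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def odds_ratio(p_a, p_b):
    """How many times larger the odds are in group A than in group B."""
    return odds(p_a) / odds(p_b)


ratio = odds_ratio(0.8, 0.5)  # odds of 4 to 1 versus 1 to 1

# A hypothetical coefficient of log(4) corresponds to an odds ratio of about 4.
beta = math.log(4.0)
ratio_from_beta = math.exp(beta)
print(ratio, ratio_from_beta)
```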

What is the purpose of feature engineering during the ETL process?

To transform data sets into a form consumable by models

What is the purpose of feature engineering after the ETL process?

To improve the predictive power of the algorithm

What is the purpose of scaling to a range in numeric variable transformations?

To rescale the feature linearly into a fixed range such as [0, 1]; this works best when the feature is roughly uniformly distributed across that range
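
Scaling to a range is often done with min-max scaling. The following is a minimal sketch in plain Python; the function name and the default target range are assumptions for the example.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly map values so the smallest becomes lo and the largest hi."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # A constant feature carries no information; map it to lo.
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / span for v in values]


scaled = min_max_scale([10.0, 20.0, 30.0, 50.0])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because the minimum and maximum drive the result, a single extreme value can squeeze all other observations into a narrow band, which is one reason clipping is often applied first.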

What is the purpose of clipping (winsorization) in numeric variable transformations?

To cap extreme values: a feature value greater than a chosen maximum is set to that maximum (and, symmetrically, a value below a chosen minimum is set to that minimum)
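
Clipping can be sketched in a couple of lines. The bounds here are fixed by hand for illustration; in practice they are often derived from percentiles of the training data.

```python
def clip(values, lower, upper):
    """Replace values below lower with lower and values above upper with upper."""
    return [min(max(v, lower), upper) for v in values]


raw = [12, 15, 14, 13, 250, -40]   # illustrative data with two outliers
clipped = clip(raw, lower=0, upper=100)
print(clipped)  # [12, 15, 14, 13, 100, 0]
```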

What is the main challenge in feature engineering after the ETL process?

Capturing non-linear relationships

What is the purpose of multinomial logistic regression?

To classify more than two classes

What is logistic regression?

A representative model chosen from the class of Generalized Linear Models (GLMs)

What is the sigmoid function?

A mathematical function having a characteristic 'S'-shaped curve, or sigmoid curve

What are the useful properties of the logistic function?

It maps any real-valued input to the interval (0, 1), so the output can be read as a probability, and it is differentiable
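
Both properties can be checked with a few lines of plain Python. This is a sketch with function names chosen for the example; the two-branch form of `sigmoid` is a common trick to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def sigmoid_derivative(z):
    """Closed-form derivative: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def logit(p):
    """Inverse of the sigmoid: the log-odds of probability p."""
    return math.log(p / (1.0 - p))


print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
print(logit(sigmoid(2.0)))      # close to 2.0
```

The simple closed-form derivative is what makes gradient-based fitting of logistic regression convenient.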

What course is used in the text to present the concepts, mathematical foundations, and interpretation of logistic regression?

Machine Learning University (MLU)-Explain course created by Amazon

What is the advantage of machine learning over classical econometrics in terms of feature engineering?

Machine learning algorithms are able to select relevant variables themselves

What are some examples of super powerful encoders mentioned in the text?

Hashing, BaseN, CatBoost
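
To give a feel for the first of these, the sketch below shows the basic idea behind hashing encoding: each category is mapped to one of a fixed number of buckets by a stable hash. This is a hand-written illustration, not the implementation of any particular encoder library; `hash_encode` and the bucket count are assumptions.

```python
import hashlib


def hash_encode(category, n_buckets=8):
    """Encode a category as a one-hot vector over n_buckets hash buckets."""
    # md5 is used only as a stable hash; Python's built-in hash() is salted
    # per process for strings, so it would not be reproducible across runs.
    digest = hashlib.md5(category.encode("utf-8")).hexdigest()
    index = int(digest, 16) % n_buckets
    vector = [0] * n_buckets
    vector[index] = 1
    return vector


encoded = {city: hash_encode(city) for city in ["Warsaw", "Berlin", "Paris"]}
for city, vector in encoded.items():
    print(city, vector)
```

The output width stays fixed no matter how many distinct categories appear, at the cost of occasional collisions where two categories share a bucket.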

What is a cautionary note given by the author regarding feature engineering in financial problems?

We should not overdo our creativity

What types of interactions can we look for between variables during feature engineering?

Numeric & numeric, categorical & categorical, or numeric & categorical
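
One possible interaction feature of each kind is sketched below on a tiny invented table. The column names (`income`, `debt`, `region`, `segment`) and the derived features are hypothetical.

```python
from collections import defaultdict

rows = [
    {"income": 4000.0, "debt": 1000.0, "region": "north", "segment": "retail"},
    {"income": 6000.0, "debt": 3000.0, "region": "north", "segment": "business"},
    {"income": 3000.0, "debt": 600.0, "region": "south", "segment": "retail"},
]

# The numeric & categorical feature needs a group statistic first:
# the mean income within each region.
incomes_by_region = defaultdict(list)
for row in rows:
    incomes_by_region[row["region"]].append(row["income"])
region_mean = {r: sum(v) / len(v) for r, v in incomes_by_region.items()}

for row in rows:
    # Numeric & numeric: ratio of two numeric columns.
    row["debt_to_income"] = row["debt"] / row["income"]
    # Categorical & categorical: combined category.
    row["region_segment"] = row["region"] + "_" + row["segment"]
    # Numeric & categorical: income relative to the regional mean.
    row["income_vs_region"] = row["income"] / region_mean[row["region"]]

for row in rows:
    print(row)
```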

What are some techniques for dealing with missing values in a dataset?

Fill in the missing values using univariate or multivariate techniques
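
A univariate technique uses only the variable's own observed values. A minimal sketch, assuming missing entries are represented as `None` and the median is the chosen fill value:

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


ages = [25, None, 31, 40, None, 28]   # illustrative data
filled = impute_median(ages)
print(filled)  # [25, 29.5, 31, 40, 29.5, 28]
```

The median is less sensitive to outliers than the mean, which is why it is a frequent default for skewed variables.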

When should variables/columns with missing values be removed from a dataset?

When the variable has more than 10% missing values

What is feature engineering?

A process for generating new variables

What is one way to fill in missing values for time series variables?

Use the last or next observed value
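
These two options are usually called forward fill and backward fill. A plain-Python sketch, assuming the series is ordered in time and missing entries are `None`:

```python
def forward_fill(values):
    """Carry the last observed value forward into the gaps."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def backward_fill(values):
    """Pull the next observed value back into the gaps."""
    return forward_fill(values[::-1])[::-1]


prices = [None, 10.0, None, None, 12.5, None]   # illustrative series
ffilled = forward_fill(prices)
bfilled = backward_fill(prices)
print(ffilled)  # [None, 10.0, 10.0, 10.0, 12.5, 12.5]
print(bfilled)  # [10.0, 10.0, 12.5, 12.5, 12.5, None]
```

Note that backward fill uses information from the future, so forward fill is generally the safer choice when the data feed a forecasting model.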

What is one multivariate technique for dealing with missing values in a dataset?

Use a supervised machine learning algorithm like KNN
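
The idea can be sketched without any library: for a row with a missing value, find the k most similar complete rows and average their values. This is a hand-rolled illustration, not a library's implementation, and it assumes that the other columns are complete and on comparable scales.

```python
import math


def knn_impute(rows, target_col, k=2):
    """Fill None in target_col with the mean over the k nearest complete rows."""
    complete = [r for r in rows if r[target_col] is not None]
    result = []
    for row in rows:
        if row[target_col] is not None:
            result.append(list(row))
            continue

        def distance(other):
            # Euclidean distance over every column except the target.
            return math.sqrt(sum(
                (a - b) ** 2
                for i, (a, b) in enumerate(zip(row, other))
                if i != target_col
            ))

        neighbours = sorted(complete, key=distance)[:k]
        estimate = sum(n[target_col] for n in neighbours) / len(neighbours)
        new_row = list(row)
        new_row[target_col] = estimate
        result.append(new_row)
    return result


data = [
    [1.0, 1.0, 10.0],
    [2.0, 2.0, 20.0],
    [9.0, 9.0, 90.0],
    [1.5, 1.5, None],   # the value to impute
]
imputed = knn_impute(data, target_col=2, k=2)
print(imputed[3])  # [1.5, 1.5, 15.0]
```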

What is the purpose of a problem statement worksheet in machine learning projects?

To formalize the definition of the business task

What are the elements of a data preparation process in machine learning projects?

Data selection, data transformation, and data combination

What is the role of consulting firms in machine learning projects?

To formulate a problem statement worksheet

What should later be applied to the test set in a machine learning project?

Parameters learned on the training set for normalization
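
The point is that the test set must not influence the parameters. A minimal sketch with standardization, where the function names and the data are assumptions for the example:

```python
import statistics


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training set only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)  # population standard deviation
    return mean, std


def transform(values, mean, std):
    """Apply previously learned parameters to any data set."""
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


train = [10.0, 20.0, 30.0]
test = [40.0]

mean, std = fit_standardizer(train)        # parameters come from train
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)   # reused on test, never refitted
print(train_scaled)
print(test_scaled)
```

Refitting on the test set would leak information about it into the preprocessing step and make the evaluation look better than it should.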
