Data Analysis Quiz PDF

Summary

This data analysis quiz consists of multiple-choice, true/false, and identification questions. It covers exploratory data analysis (EDA), data visualization in Python with libraries such as Pandas and Matplotlib, data transformations (normalization and standardization), feature engineering techniques such as one-hot encoding and binning, and data manipulation.
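The transformations named in the summary can be illustrated with a minimal pandas sketch. The toy DataFrame, its column names, and the bin edges and labels below are hypothetical, invented only for illustration; they do not come from the quiz.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "city": ["Cebu", "Manila", "Cebu", "Davao"],
})

# Normalization (min-max scaling): rescale values to the range [0, 1].
age_min, age_max = df["age"].min(), df["age"].max()
df["age_normalized"] = (df["age"] - age_min) / (age_max - age_min)

# Standardization (z-score): shift to mean 0 and scale to standard deviation 1.
# ddof=0 selects the population standard deviation.
df["age_standardized"] = (df["age"] - df["age"].mean()) / df["age"].std(ddof=0)

# Binning: group continuous values into discrete intervals.
# The bin edges and labels are arbitrary choices for this sketch.
df["age_group"] = pd.cut(
    df["age"], bins=[0, 25, 45, 100], labels=["young", "middle", "senior"]
)

# One-hot encoding: one binary column per category of a categorical variable.
city_dummies = pd.get_dummies(df["city"], prefix="city")

print(df)
print(city_dummies)
```

Min-max scaling and z-scoring are written out by hand so the arithmetic stays visible; in practice a library scaler may be preferred.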

Full Transcript

**What is the main goal of Exploratory Data Analysis (EDA)?**\
A. To store data in a structured format\
B. To visualize and understand data patterns\
C. To clean data by removing duplicates\
D. To prepare data for storage

**Which of the following libraries is commonly used for data visualization in Python?**\
A. NumPy\
B. Pandas\
C. Matplotlib\
D. SciPy

**Which data transformation technique adjusts data to have a mean of 0 and a standard deviation of 1?**\
A. Normalization\
B. Encoding\
C. Scaling\
D. Standardization

**In feature engineering, one-hot encoding is used to handle which type of data?**\
A. Continuous\
B. Ordinal\
C. Categorical\
D. Numerical

**Which of the following visualizations is best for showing the relationship between two numerical variables?**\
A. Histogram\
B. Scatter Plot\
C. Bar Chart\
D. Pie Chart

**What does a correlation matrix display?**\
A. Missing values in the dataset\
B. Relationships between variables\
C. Summary statistics for each variable\
D. Categorical data distribution

**The process of grouping continuous values into discrete intervals is known as:**\
A. Scaling\
B. Encoding\
C. Normalization\
D. Binning

**Which function is used in pandas to check for missing values?**\
A. isnull()\
B. notnull()\
C. dropna()\
D. fillna()

**The purpose of feature engineering is to:**\
A. Remove redundant data\
B. Create new features that add value\
C. Split data into training and testing sets\
D. Perform exploratory analysis

**Which of the following Python libraries is mainly used for data manipulation?**\
A. Seaborn\
B. SciPy\
C. Pandas\
D. Matplotlib

**Which step of the data science workflow includes splitting the dataset into training and testing sets?**\
A. Data Collection\
B. Data Cleaning\
C. Model Evaluation\
D. Model Building

**Which of the following describes feature interaction?**\
A. Removing unnecessary features\
B. Creating new features by combining existing ones\
C. Encoding categorical variables\
D. Scaling features

**What type of chart is used to visualize the distribution of a categorical variable?**\
A. Histogram\
B. Line Chart\
C. Pie Chart\
D. Scatter Plot

**Which method is commonly used to encode categorical data in a binary format?**\
A. Normalization\
B. One-hot encoding\
C. Binning\
D. Scaling

**A heatmap is often used to:**\
A. Show individual data points\
B. Display the distribution of one variable\
C. Display correlation between multiple variables\
D. List missing data

**What does EDA typically include?**\
A. Predicting future data\
B. Creating new features\
C. Summarizing main characteristics of data\
D. Formatting data for storage

**When performing feature engineering, which of the following transforms categorical data to a numerical format?**\
A. Normalization\
B. Standardization\
C. Encoding\
D. Binning

**Which function in pandas is used to get a summary of numerical features?**\
A. describe()\
B. summary()\
C. info()\
D. count()

**In Python, which library is commonly used alongside Seaborn for visualizations?**\
A. Pandas\
B. Matplotlib\
C. NumPy\
D. SciPy

**The main purpose of feature selection is to:**\
A. Add more features to improve model complexity\
B. Reduce model overfitting and simplify computation\
C. Replace missing data\
D. Normalize the features

**Test II: True or False (20 Items)**\
**Direction:** Write "True" if the statement is correct; otherwise, write "False." **NO ERASURE.**

1. Data science only involves the use of statistics.
2. Normalization adjusts the scale of data to a range between 0 and 1.
3. One-hot encoding is a technique used for encoding numerical data.
4. A scatter plot is useful for understanding the relationship between two continuous variables.
5. Feature engineering is about removing unnecessary features from the data.
6. EDA is usually done after model evaluation.
7. Binning is the process of grouping continuous values into discrete intervals.
8. Label encoding converts categorical values into unique numerical values.
9. A pie chart is appropriate for visualizing categorical data.
10. Correlation analysis can help identify relationships between numerical features.
11. Standardization changes the data distribution to have a mean of 1 and standard deviation of 0.
12. Missing values can be handled by filling, dropping, or flagging them.
13. The primary goal of feature engineering is to improve the model's interpretability.
14. Bar plots are often used to visualize relationships among multiple numerical features.
15. Label encoding is used to convert categorical variables with a high cardinality into binary format.
16. EDA includes visualizing the data to uncover patterns and trends.
17. Removing outliers is a common technique in feature engineering.
18. Heatmaps are useful for visualizing missing data in a dataset.
19. The process of feature extraction involves transforming data into a smaller set of new features.
20. Encoding is unnecessary if the dataset contains only numerical data.

**Test III: Identification (10 Items)**\
**Direction:** Write your answer in the blank. **NO ERASURE.**

1. The process of adjusting data to have a mean of 0 and a standard deviation of 1.
2. This type of plot is often used to display the distribution of a numerical variable.
3. The process of transforming categorical data into a binary format.
4. A Python library commonly used for data manipulation and analysis.
5. A graphical representation of the correlation between variables.
6. The process of dividing continuous data into intervals or groups.
7. This term refers to the process of summarizing data to understand its main characteristics.
8. A Python library that is frequently used for creating static, interactive, and animated visualizations.
9. The process of selecting relevant features to reduce complexity and prevent overfitting.
10. A data transformation technique that scales values to a range between 0 and 1.
