PT1 Past Paper PDF
Summary
This document contains two past papers (PT1 and PT2) on data science topics, covering SQL, Pandas DataFrame operations, R libraries, statistics, and Python programming concepts. Each multiple-choice question lists its options, with the correct answer marked.
Full Transcript
PT1
===

1. Which of the following SQL statements is used to retrieve data from a database?
   - Correct: SELECT
   - UPDATE
   - DELETE
   - INSERT

2. Which method is used to remove duplicates from a Pandas DataFrame?
   - df.remove_duplicates()
   - df.delete_duplicates()
   - df.clear_duplicates()
   - Correct: df.drop_duplicates()

3. Which of the following is NOT a popular R library for data science?
   - dplyr
   - caret
   - Correct: TensorFlow
   - ggplot

4. What is the main purpose of regression analysis?
   - To categorize data
   - To visualize data
   - Correct: To measure the strength of the relationship between variables
   - To collect data samples

5. What is the role of Development Environments, also known as IDEs, in data science?
   - To facilitate teamwork on code assets
   - Correct: To help data scientists implement, test, and deploy their work
   - To manage data storage
   - To monitor model performance

6. What process is referred to as ETL in data science?
   - Correct: Extract, Transform, and Load
   - Extract, Test, and Load
   - Examine, Transfer, and Load
   - Extract, Transfer, and Link

7. What is a key characteristic of Fully Integrated Visual Tools in data science?
   - They only provide support for code asset management.
   - They are exclusively for model monitoring and assessment.
   - Correct: They support all data science tasks, either partially or completely.
   - They only focus on model building.

8. What is the purpose of Model Deployment in data science?
   - Correct: To make a machine learning model accessible to third-party applications
   - To improve data visualization
   - To clean and preprocess data
   - To manage code assets for a project

9. What does the mode represent in a dataset?
   - The middle value when the data is ordered
   - The range between the highest and lowest values
   - Correct: The value that occurs most frequently
   - The sum of all values divided by the total number of values

10. What is the primary purpose of the ggplot2 library?
    - Data manipulation
    - Machine learning
    - Correct: Data visualization
    - String operations

11. Which of the following best defines REST APIs?
    - They are used for real-time data analytics.
    - They focus solely on data visualization.
    - Correct: They enable interaction with web services via the internet.
    - They are exclusively for managing datasets.

12. What type of visualization is most appropriate for showing the relationship between two continuous variables?
    - Stacked column chart
    - Line chart
    - Correct: Scatterplot
    - Pie chart

13. Which command lets you see the state of your working directory?
    - Correct: git status
    - git log
    - git branch
    - git reset

14. Why are samples often used instead of the entire population?
    - To eliminate the need for hypothesis testing
    - Correct: To reduce the cost of data collection
    - To increase data collection costs
    - To avoid the need for statistical analysis

15. Which of the following is an example of an explanatory variable in a regression model?
    - Correct: Beauty score
    - Teaching evaluation score
    - Error term
    - Constant term

16. What is a primary feature of Execution Environments in data science?
    - They are used exclusively for data visualization.
    - Correct: They facilitate model training and deployment.
    - They are used for data storage and retrieval.
    - They manage access rights and replication.

17. What happens to the t-distribution as the degrees of freedom increase?
    - It becomes skewed to the right.
    - It becomes skewed to the left.
    - It diverges from the standard normal distribution.
    - Correct: It approaches the standard normal distribution.

18. What is JupyterLab?
    - A cloud-based data warehouse
    - A standalone desktop application
    - Correct: An interactive environment for Jupyter Notebook
    - A code version control system

19. What does the Z-value represent in a standard normal distribution?
    - The range of the data set
    - The sum of the values in the data set
    - The difference between the mean and the mode
    - Correct: The number of standard deviations a value is from the mean

20. What file format is used to save Jupyter Notebook files?
    - pdf
    - CSV
    - html
    - Correct: ipynb

21. Which of the following is NOT a type of machine learning?
    - Unsupervised learning
    - Reinforcement learning
    - Correct: Visual learning
    - Supervised learning

22. What are the three main measures of central tendency?
    - Mean, variance, standard deviation
    - Correct: Mean, median, mode
    - Mode, range, variance
    - Range, variance, standard deviation

23. When rolling two standard six-sided dice, how many possible outcomes are there?
    - Correct: 36
    - 24
    - 6
    - 12

24. What is the range of values for probability?
    - 0 to 10
    - 0 to 100
    - Correct: 0 to 1
    - -1 to 1

25. Which tool type is responsible for data visualization during both initial exploration and final deliverables?
    - Data Integration Tools
    - Model Deployment Tools
    - Data Management Tools
    - Correct: Data Visualization Tools

26. Which data type is characterized by a natural zero point?
    - Ordinal data
    - Interval data
    - Correct: Ratio data
    - Categorical data

27. Which programming languages does Jupyter Notebook primarily support?
    - Correct: Julia, Python, R
    - Java, C++, Ruby
    - Scala, Kotlin, Swift
    - PHP, JavaScript, Go

28. Which of the following is a characteristic of R?
    - Correct: R integrates well with languages like C++ and Python.
    - R has fewer packages for statistical analysis than Python.
    - R is most often used for web development.
    - R is primarily used by data engineers.

29. In the context of normally distributed data, what does IQR represent?
    - Interval Query Range
    - Correct: Interquartile Range
    - Intermediate Quantitative Result
    - Inverse Quantile Regression

30. What is the key characteristic of the median?
    - It is the sum of all values divided by the number of values.
    - It is affected by extreme values.
    - Correct: It divides the data into two equal halves.
    - It represents the most frequent value in the data.

PT2
===

1. Which of the following is an example of an open data source?
   - Private company database
   - Personal information databases
   - Confidential corporate reports
   - Correct: Kaggle datasets

2. What is the purpose of using a t-test in regression analysis?
   - To create a visual representation of data
   - To calculate the correlation between two variables
   - To measure variance within multiple groups
   - Correct: To determine if there is a statistically significant difference between two group means

3. What is one of the biggest challenges in data science today?
   - Excessive regulation making data science illegal
   - No available tools for data analysis
   - Correct: Overabundance of data and the ability to process it
   - Lack of data

4. Which of the following is NOT a characteristic of NumPy arrays compared to Python lists?
   - Correct: They can contain elements of different data types.
   - They are stored in contiguous memory locations.
   - They are fixed in size.
   - They provide efficient mathematical operations.

5. In Python, what does the `//` operator do?
   - Raises a number to a power
   - Calculates the remainder
   - Correct: Performs floor division
   - Performs regular division

6. What is the purpose of the `__init__` method in a Python class?
   - To delete an object
   - To initialize the object as a string
   - To define a class variable
   - Correct: To initialize an object's attributes

7. What does the `groupby()` function in Pandas accomplish?
   - It pivots the DataFrame for better visualization.
   - It sorts the DataFrame based on a column.
   - It merges multiple DataFrames.
   - Correct: It groups the DataFrame rows based on a column's values.

8. What does the term "cross-validation" refer to in data science?
   - A process of cleaning data by removing duplicates
   - Correct: A technique for assessing the generalization of results
   - A method for encrypting data
   - A way to visualize data using cross-tabulation

9. What is the output of the following Python code? `print(type((1, 2, 3)))`
   - List
   - Dictionary
   - Set
   - Correct: Tuple

10. Which method in Pandas is used to handle missing data by replacing them with a specified value?
    - `dropna()`
    - `notnull()`
    - `isnull()`
    - Correct: `fillna()`

11. Which library would you use for machine learning algorithms in Python?
    - TensorFlow
    - Correct: Scikit-learn
    - BeautifulSoup
    - Seaborn

12. What is a "join" in SQL?
    - A command used to delete data
    - Correct: A method to combine two or more tables
    - A command used to filter data
    - A method to sort data within a table

13. Which command lets you see the state of your working directory in Git?
    - Correct: `git status`
    - `git log`
    - `git branch`
    - `git reset`

14. Why are samples often used instead of the entire population?
    - To eliminate the need for hypothesis testing
    - Correct: To reduce the cost of data collection
    - To increase data collection costs
    - To avoid the need for statistical analysis

15. Which of the following is an example of an explanatory variable in a regression model?
    - Correct: Beauty score
    - Teaching evaluation score
    - Error term
    - Constant term

16. What is a primary feature of execution environments in data science?
    - They are used exclusively for data visualization.
    - Correct: They facilitate model training and deployment.
    - They are used for data storage and retrieval.
    - They manage access rights and replication.

17. What happens to the t-distribution as the degrees of freedom increase?
    - It becomes skewed to the right.
    - It becomes skewed to the left.
    - It diverges from the standard normal distribution.
    - Correct: It approaches the standard normal distribution.

18. What is JupyterLab?
    - A cloud-based data warehouse
    - A standalone desktop application
    - Correct: An interactive environment for Jupyter Notebook
    - A code version control system

19. What does the Z-value represent in a standard normal distribution?
    - The range of the data set
    - The sum of the values in the data set
    - The difference between the mean and the mode
    - Correct: The number of standard deviations a value is from the mean

20. What file format is used to save Jupyter Notebook files?
    - pdf
    - CSV
    - html
    - Correct: ipynb

21. Which of the following is NOT a type of machine learning?
    - Unsupervised learning
    - Reinforcement learning
    - Correct: Visual learning
    - Supervised learning

22. What are the three main measures of central tendency?
    - Mean, variance, standard deviation
    - Correct: Mean, median, mode
    - Mode, range, variance
    - Range, variance, standard deviation

23. When rolling two standard six-sided dice, how many possible outcomes are there?
    - Correct: 36
    - 24
    - 6
    - 12

24. What is the range of values for probability?
    - 0 to 10
    - 0 to 100
    - Correct: 0 to 1
    - -1 to 1

25. Which of the following best explains why domain expertise is important in data science?
    - Correct: It helps in understanding the context and making informed decisions based on data.
    - It ensures compliance with international data laws.
    - It reduces the amount of data that needs to be processed.
    - It eliminates the need for data analysis.

26. Which of the following is NOT a basic data type in Python?
    - Correct: Function
    - String
    - List
    - Integer

27. Which programming language is most commonly associated with data science due to its extensive libraries and ease of use?
    - HTML
    - JavaScript
    - C++
    - Correct: Python

28. Why is understanding the business problem crucial in data science?
    - It allows data scientists to select the cheapest solution.
    - It ensures the data scientists can work independently.
    - Correct: It helps in defining objectives and choosing the right approach to provide actionable insights.
    - It reduces the time needed for data preprocessing.

29. Which of the following best describes the role of data science in business?
    - Minimizing the amount of data collected by the organization.
    - Replacing all existing IT infrastructure.
    - Automating all business processes without human oversight.
    - Correct: Providing insights that can drive strategic decision-making and improve operational efficiency.

30. Which of the following best describes the term "Big Data"?
    - Data that only includes text and numbers.
    - Data stored on large physical servers.
    - Data that is outdated and no longer useful.
    - Correct: Data that requires advanced tools and methods to process.
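
The Python answers in PT2 (questions 5, 6, and 9) can be checked directly in an interpreter. The sketch below is illustrative only: the `Point` class and the sample values are assumptions made for this example and do not come from the paper.

```python
# Illustrative check of PT2 questions 5, 6, and 9.
# The Point class and all sample values are hypothetical.

class Point:
    def __init__(self, x, y):
        # __init__ runs when the object is created and sets its attributes
        self.x = x
        self.y = y


floor_result = 7 // 2          # floor division drops the fractional part
negative_floor = -7 // 2       # the result is rounded toward negative infinity
tuple_type = type((1, 2, 3))   # comma-separated values in parentheses form a tuple
p = Point(3, 4)

print(floor_result)    # 3
print(negative_floor)  # -4
print(tuple_type)      # <class 'tuple'>
print(p.x, p.y)        # 3 4
```

Note that `print(type((1, 2, 3)))` literally prints `<class 'tuple'>`, which corresponds to the "Tuple" option.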
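
Several statistics questions (mode, median, central tendency, Z-value, dice outcomes, probability range) can be verified with Python's standard library. This is a minimal sketch on a made-up sample; the `scores` list is an assumption for illustration, and the population standard deviation is used for the Z-value only to keep the example short.

```python
import itertools
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical sample, not from the paper

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # middle value; splits the data into two halves
mode = statistics.mode(scores)      # most frequently occurring value

# Z-value: how many standard deviations a value lies from the mean
sd = statistics.pstdev(scores)
z_of_max = (max(scores) - mean) / sd

# Every ordered outcome of rolling two six-sided dice
outcomes = list(itertools.product(range(1, 7), repeat=2))
# Probability that the two dice sum to 7; always between 0 and 1
p_seven = sum(1 for a, b in outcomes if a + b == 7) / len(outcomes)

print(mean, median, mode)  # 5 4.0 3
print(len(outcomes))       # 36
```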
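
The SQL questions (PT1 question 1 on `SELECT` and PT2 question 12 on joins) can be tried out with Python's built-in `sqlite3` module. The table names, column names, and rows below are hypothetical and exist only to make the sketch runnable.

```python
import sqlite3

# In-memory database with two small, made-up tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE grades (student_id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "An"), (2, "Binh")])
conn.executemany("INSERT INTO grades VALUES (?, ?)", [(1, 90), (2, 75)])

# SELECT retrieves data without modifying it
names = conn.execute("SELECT name FROM students ORDER BY id").fetchall()

# JOIN combines rows from two tables on a matching key
joined = conn.execute(
    "SELECT s.name, g.score "
    "FROM students AS s JOIN grades AS g ON g.student_id = s.id "
    "ORDER BY s.id"
).fetchall()
conn.close()

print(names)   # [('An',), ('Binh',)]
print(joined)  # [('An', 90), ('Binh', 75)]
```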