Data Science Midterm Exam
48 Questions

Questions and Answers

What does ETL stand for in the context of data processing?

Extract, Transform, Load
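
As a rough illustration of the three ETL stages, here is a minimal sketch in base R. The inline CSV text, the column names, the 1.2 multiplier, and the temporary file standing in for a destination store are all hypothetical; a real pipeline would read from and write to actual systems.

```r
# Extract: read raw data (inline CSV text stands in for a real source)
raw <- read.csv(text = "id,price\n1,10\n2,NA\n3,30")

# Transform: drop rows with a missing price and add a derived column
clean <- raw[!is.na(raw$price), ]
clean$price_with_tax <- clean$price * 1.2  # hypothetical tax multiplier

# Load: write the result to a destination (a temp file stands in for a warehouse)
dest <- tempfile(fileext = ".csv")
write.csv(clean, dest, row.names = FALSE)

# Read it back to confirm what was loaded
loaded <- read.csv(dest)
print(loaded)
```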

How does structured data differ from unstructured data?

Structured data is organized and easily analyzed, while unstructured data lacks a predefined format.

What is meant by the term 'veracity' in data quality?

Veracity refers to the accuracy and truthfulness of the data.

What does the term 'velocity' refer to in the context of big data?

Velocity refers to the speed at which new data is generated.

What process do companies use to convert raw data into useful information?

Data Mining

What is a dataset in relation to data management?

A dataset is a collection of related data organized into rows, or records.

In data architecture, where is data shared with consumers?

Applications

What phase typically precedes the Modeling phase in the data lifecycle?

Business Understanding

What is the most popular form of traditional data-integration and storage software?

Relational database management system (RDBMS)

What type of system is described as a user-friendly decision support system for data aggregation, integration, and reporting?

BI Solution

To set up R, what is the first step required?

Downloading and installing it

In R, which space displays the graphs created during exploratory data analysis?

Graphical Output

What area in R provides the space to write and run code?

R Script

Which area in R displays external elements such as datasets and functions?

R Environment

What is the significance of constructing a data matrix in data science?

It is essential for organizing and analyzing data effectively.

Describe the Data Science Life Cycle.

It is an iterative process involving research and discovery that guides predictive modeling tasks.

What do you press to run the selected lines of code in the R Script?

Ctrl + Enter

What type of analysis would you perform to answer questions about 'how much' or 'how many'?

Regression analysis is used for such queries.
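
To make the "how much" idea concrete, here is a small sketch using base R's lm(). The variables ads and sales are hypothetical, and the data is deliberately constructed to follow an exact straight line so the fitted coefficients are easy to check; real data would be noisy.

```r
# Hypothetical data: sales follow sales = 2 * ads + 1 exactly
ads <- c(1, 2, 3, 4, 5)
sales <- 2 * ads + 1

# Fit a simple linear regression of sales on ads
fit <- lm(sales ~ ads)

# Coefficients: intercept close to 1, slope close to 2
print(coef(fit))

# "How much" question: predicted sales when ads = 6 (close to 13)
pred <- predict(fit, newdata = data.frame(ads = 6))
print(pred)
```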

What is the primary function of the R Console in the R environment?

To execute R commands interactively

What data science process would you use to determine group classifications?

Clustering is the process used for identifying groups.

What is a charter document in data science projects?

It is a living document that gets updated with new data or requirements.

What does a data dictionary provide in a data science project?

It describes the supplied data, schema, and entity-relation diagrams.

What is the purpose of the data acquisition process?

It involves obtaining data through an ETL pipeline.

How is feature engineering related to applied machine learning?

Feature engineering can be considered as applied machine learning.

What are the three basic ways to handle data in R?

Split, Apply, and Combine.
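
The split, apply, combine pattern can be sketched in base R as follows. The sales data frame and its column names are hypothetical, chosen only to show each of the three steps on its own line.

```r
# Hypothetical data: sale amounts by region
sales <- data.frame(
  region = c("north", "south", "north", "south"),
  amount = c(10, 20, 30, 40)
)

# Split: break the amounts into one group per region
groups <- split(sales$amount, sales$region)

# Apply: compute the mean of each group
means <- lapply(groups, mean)

# Combine: collapse the per-group results into one named vector
result <- unlist(means)
print(result)  # north = 20, south = 30
```

Helpers such as tapply() or aggregate() wrap the same three steps in a single call.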

Name two libraries in R Studio that aid in data visualization.

ggplot2 and dplyr. (ggplot2 draws the plots; dplyr is a data-manipulation package commonly used to prepare data for plotting.)

What is a key difference between the normal R Environment and R Studio?

R Studio is an integrated development environment that offers advanced features, while the R Environment is basic and lacks such tools.

Describe the purpose of the R Console in R Studio.

The R Console displays output from code and allows for direct command input.

What is the R Script component used for in R Studio?

It is used to write and save code, allowing the execution of selected lines.

How does the R Environment component help users in R Studio?

It shows all created or loaded data, variables, and functions.

What type of output does the Graphical Output component in R Studio display?

It displays graphs and plots generated from the code executed.

Why is R widely used in scientific research?

R is extensively used for statistical analysis and scientific computing.

What is the primary activity involved in data mining?

It involves going through large amounts of data to find relevant or pertinent information.

Define big data in terms of its characteristics.

Big data refers to large volumes of raw data that are collected, stored, and analyzed to enhance organizational efficiency.

What is the purpose of the R Console in R programming?

The R Console allows users to run R code and see immediate output.

What makes unstructured data more challenging to analyze?

Unstructured data is difficult to analyze due to its variety of formats and lack of a predefined data model.

What does veracity in data science refer to?

Veracity refers to the accuracy and trustworthiness of the data.

Which function in R would you use to determine the low-level data type of an object?

You would use the typeof() function.

What is the main purpose of data science in a business context?

The main purpose of data science is to improve decision-making within an organization.

How does the length() function differ when applied to one-dimensional and two-dimensional objects in R?

The length() function returns the number of elements in a one-dimensional object such as a vector. For a matrix it returns the total number of elements; for a data frame it returns the number of columns.
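
A short sketch of how length() behaves on objects of different shape. The objects v, m, and df are hypothetical examples; dim() and nrow() are shown alongside because they are the usual way to get the shape of a two-dimensional object.

```r
v <- c(10, 20, 30)                    # vector with 3 elements
m <- matrix(1:6, nrow = 2, ncol = 3)  # 2 x 3 matrix
df <- data.frame(a = 1:4, b = 5:8)    # data frame: 4 rows, 2 columns

length(v)   # 3: number of elements
length(m)   # 6: total number of cells (2 * 3)
length(df)  # 2: number of columns, because a data frame is a list of columns

dim(m)      # 2 3: rows and columns
nrow(df)    # 4: number of rows
```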

What is the function of class() in R programming?

class() identifies the high-level type of an object in R.
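
The difference between the low-level type reported by typeof() and the high-level type reported by class() is easiest to see side by side. The objects below are hypothetical examples.

```r
x <- 1.5
m <- matrix(1:6, nrow = 2)
df <- data.frame(a = 1:2, b = c("u", "v"))

typeof(x)   # "double": how the value is stored internally
class(x)    # "numeric": how R treats the object

typeof(m)   # "integer": a matrix is stored as an integer vector here
class(m)    # includes "matrix"

typeof(df)  # "list": a data frame is stored as a list of columns
class(df)   # "data.frame"
```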

Explain customer segmentation in data science terminology.

Customer segmentation, an application of clustering, involves grouping customers based on similarities in their behavior or characteristics.
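
One possible way to segment customers is k-means clustering, available in base R as kmeans(). This is only a sketch: the customers data frame, its columns, and the choice of two clusters are assumptions made for illustration, and the two groups are deliberately far apart.

```r
set.seed(42)  # k-means uses random starting centers

# Hypothetical customers: three occasional shoppers and three frequent ones
customers <- data.frame(
  visits = c(1, 2, 1, 20, 22, 21),
  spend  = c(10, 12, 11, 200, 210, 205)
)

# Group the customers into 2 clusters; nstart tries several random starts
fit <- kmeans(customers, centers = 2, nstart = 10)

# Cluster label per customer: the first three share one label,
# the last three share the other (the label numbers themselves are arbitrary)
print(fit$cluster)
```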

In R, what does transforming data refer to?

Transforming data involves changing its structure or values using functions like apply().
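
As a small sketch of transforming data with apply(), the example below works on a hypothetical matrix m. The second argument selects the margin: 1 for rows, 2 for columns.

```r
# matrix() fills column by column: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # 9 12
col_max  <- apply(m, 2, max)  # 2 4 6

# Transform values: rescale each column so that it sums to 1
scaled <- apply(m, 2, function(col) col / sum(col))
print(scaled)
```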

What is the process called that identifies products frequently bought together?

This process is known as association-rule mining.
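
The core idea can be sketched in base R by counting how often pairs of items appear in the same basket. This is a simplified illustration, not a full association-rule algorithm such as Apriori; the baskets and item names are hypothetical.

```r
# Hypothetical shopping baskets
baskets <- list(
  c("bread", "butter"),
  c("bread", "butter", "milk"),
  c("milk", "eggs"),
  c("bread", "milk")
)

items <- sort(unique(unlist(baskets)))

# One row per basket, one column per item (1 = item is in the basket)
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

# Item-by-item co-occurrence counts; the diagonal holds each item's own count
co_occurrence <- crossprod(incidence)

# Support of {bread, butter}: share of baskets containing both (2 of 4)
support <- co_occurrence["bread", "butter"] / length(baskets)

# Confidence of bread -> butter: of the 3 baskets with bread, 2 also have butter
confidence <- co_occurrence["bread", "butter"] / co_occurrence["bread", "bread"]

print(c(support = support, confidence = confidence))
```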

What operations does aggregating and merging refer to in the context of handling data in R?

Aggregating summarizes data values, while merging combines datasets.
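
Both operations are available in base R as aggregate() and merge(). The sketch below uses two hypothetical data frames, orders and customers, linked by a customer_id column.

```r
# Hypothetical data: individual orders and a customer lookup table
orders <- data.frame(customer_id = c(1, 1, 2), amount = c(5, 7, 3))
customers <- data.frame(customer_id = c(1, 2), name = c("Ana", "Ben"))

# Aggregate: total order amount per customer
totals <- aggregate(amount ~ customer_id, data = orders, FUN = sum)

# Merge: join the totals to the customer names on the shared key
report <- merge(customers, totals, by = "customer_id")
print(report)
```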

What does it mean for insights to be actionable in data science?

Actionable insights are those that we have the capacity to use effectively for decision-making.

What is the significance of the R Environment in R programming?

The R Environment keeps track of all variables and data sets you have created during your session.

How is the attributes() function used in R?

attributes() retrieves metadata about an object, such as names and dimensions.
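
A brief sketch of what attributes() returns for a few hypothetical objects: a named vector, a matrix, and a plain vector with no metadata.

```r
v <- c(a = 1, b = 2)        # named vector
m <- matrix(1:6, nrow = 2)  # 2 x 3 matrix
p <- 1:3                    # plain vector

attributes(v)$names  # "a" "b"
attributes(m)$dim    # 2 3
attributes(p)        # NULL: a plain vector carries no attributes
```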

Study Notes

Data Science Midterm Exam

  • Data Mining: The process of analyzing large datasets to uncover relevant information.
  • Big Data: A massive amount of raw data collected, stored, and analyzed to improve efficiency and decision making.
  • Unstructured Data: Data difficult to analyze due to various formats and not easily processed by traditional methods.
  • Veracity: The accuracy of data.
  • Data Science: The use of data to improve decision making, particularly within a business or organization.
  • Clustering: A method for grouping similar items, used for example to segment customers.
  • Association Rule Mining: Discovering patterns in data that identify products frequently bought together.
  • Actionable: Insights gained from analysis that can be used in practice.
  • Attributes: Characteristics of entities, including name, address, etc.
  • Dataset: A collection of data values for a given variable.
  • Analytics Record: A data matrix whose construction is a prerequisite for doing data science.
  • Data Science Life Cycle: An iterative process of research and discovery that supports building predictive models.
  • Regression: Analysis used to answer "how much" or "how many" questions.

Exam Questions and Answers (Multiple Choice)

  • Data Transformation: The process of aggregating and transforming raw data.
  • Feature Engineering: Creating distinctive features from raw data for analysis.
  • Outlier Detection: Identifying data points far from the rest of the data.
  • Binning: Grouping data points into bins.
  • Data Acquisition Phase: Collecting and storing data, typically through an ETL (Extract, Transform, Load) pipeline.
  • Business Understanding: The step in data science that is the initial exploration and understanding of the problem.
  • Modeling: Creating predictive models, which is one of the steps in the Data Science Life Cycle.
  • Deployment: Putting the predictive models into action, which is one of the steps in the Data Science Life Cycle.
  • Structured Data: Data that is easily analyzed and organized into a database.
  • Huge Data / Big Data: Large volumes of raw data; unlike structured data, not inherently easy to analyze or organize.
  • Variety: Refers to the different types of data involved.
  • Volume: Amount of data.
  • Velocity: Data generation rate.
  • Veracity: Accuracy of the data.
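
Binning, listed above, is a common feature-engineering step and can be sketched with base R's cut(). The ages, the break points, and the labels are hypothetical choices made only for illustration.

```r
# Hypothetical raw feature
ages <- c(5, 17, 18, 34, 65, 80)

# Bin into labelled groups; right = FALSE makes intervals closed on the left:
# [0, 18), [18, 65), [65, Inf)
age_group <- cut(
  ages,
  breaks = c(0, 18, 65, Inf),
  labels = c("child", "adult", "senior"),
  right = FALSE
)

print(as.character(age_group))  # child child adult adult senior senior
print(table(age_group))         # 2 values in each bin
```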

Other Exam Topics (Short Answer)

  • Data Dictionary: A description of data, schemas, and entities and relationships in diagrams.
  • Data Acquisition Process: Generally achieved through an ETL pipeline.
  • Data Pipeline: Diagram or description of data and related processes, showing solution architecture.
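  • Feature Engineering: Applied machine learning that modifies data inputs.
  • R installation (interpreter): Needed to run R code; R is an interpreted language, so code runs through the installed R interpreter rather than a separate compiler.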
  • Graphical Output: Where graphs are displayed related to exploratory data analysis.
  • R Environment: Data fields, variables, and vectors are shown in this interactive element.
  • R Script: Used to save and execute code commands.
  • R Console: Where the output of running code can be seen.
  • class(): R function that identifies the high-level type (class) of an object.
  • length(): R function that returns the number of elements in an object.
  • typeof(): R function that returns an object's low-level (internal) type.
  • attributes(): R function that returns an object's metadata, such as names and dimensions.
  • apply(): R function that applies a function over the rows or columns of a matrix to transform or recalculate data.
  • Split: Dividing data into smaller sets.
  • Combine: Combining data from different sets.
  • Handling Data in R: Splitting, applying, or combining data sets to manage data.
  • R and R Studio difference: R is a programming language, R Studio is an IDE for R.
  • R Interface Components: R Console, R Script, R Environment, and Graphical Output areas.
  • Dplyr Package: Important for Data Handling and manipulation.
  • Event: A set of one or more possible outcomes of an experiment in probability.
  • Sample Size (n): Number of data observations in a sample.
  • Mode: Most frequent value in a data set.
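  • Interquartile Range (IQR): The spread of the middle 50% of sorted data, calculated as the third quartile minus the first quartile (Q3 - Q1).
  • Statistical Inference: Drawing conclusions about a population (the unobserved data) from measures computed on a sample.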
  • Standard Deviation: How far data is spread from the average.
  • Outliers: Data points far from the rest of the data.
  • Range: Difference between the largest and smallest data point in a sample.
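
The descriptive statistics listed above can be computed in base R as sketched below. The sample x is hypothetical. Two caveats: base R has no built-in function for the statistical mode, so it is derived from a frequency table here, and range() returns the minimum and maximum rather than their difference. The outlier check uses the common 1.5 * IQR rule, which is one convention among several.

```r
# Hypothetical sample
x <- c(2, 4, 4, 5, 7, 9, 30)

n <- length(x)                                              # sample size: 7

counts <- table(x)
mode_value <- as.numeric(names(counts)[which.max(counts)])  # most frequent value: 4

range_width <- max(x) - min(x)                              # 30 - 2 = 28

q <- quantile(x, c(0.25, 0.75))                             # Q1 = 4, Q3 = 8
iqr <- IQR(x)                                               # Q3 - Q1 = 4

s <- sd(x)                                                  # sample standard deviation

# Flag values more than 1.5 * IQR outside the quartiles
outliers <- x[x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr]
print(outliers)                                             # 30
```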

Description

Test your knowledge on key concepts in data science with this midterm exam. Questions cover essential topics such as data mining, big data, unstructured data, and clustering. Perfect for students looking to assess their understanding of the material.
