Data Subsetting and dplyr in R

Questions and Answers

What is the purpose of subsetting in data manipulation?

  • To organize data by date
  • To change the order of columns in a data frame
  • To create a new data set with selected rows and columns (correct)
  • To remove all missing values from data

How can you exclude specific observations when subsetting a data frame in R?

  • By using the `remove()` function
  • By listing all observations you want to keep
  • By specifying `FALSE` for those observations
  • By using negative indexing, e.g. `-c(1, 30, 50)` (correct)

Which of the following commands creates a subset of the iris data containing only the first three variables of the 1st, 30th, and 50th observations?

  • sub1 = iris[c(1, 30, 50), c(1, 2, 3)] (also correct: c(1, 2, 3) is equivalent to 1:3)
  • sub1 = iris[c(1, 30, 50), 1:3] (correct)
  • sub1 = iris[-c(1, 30, 50), c(1, 2, 3)]
  • sub1 = iris[1:3, c(1, 30, 50)]

Which method is used to extract a specific variable from a data frame in R?

Using $ followed by the variable name (D)

In the second example, what is the result of the command sub2 = iris[-c(1, 30, 50),]?

A data frame without the 1st, 30th, and 50th observations (D)

What will the variable sub1 contain after executing the command sub1 = iris[c(1, 30, 50), 1:3]?

Only the first three variables of the 1st, 30th, and 50th observations (A)

What happens if you use the syntax iris[1:10, 1:2]?

You get the first 10 observations and the first 2 variables (C)

What is the correct way to create a subset that includes observations from rows 2 to 6 of the iris dataset?

sub3 = iris[2:6, ] (B)

What operator is used in R to combine logical conditions with 'and'?

& (C)

Which subset condition identifies Setosa species with a Sepal.Length greater than 5?

cond4 = (iris$Species == 'setosa') & (iris$Sepal.Length > 5) (D)

How would you create a subset of the irises that are not Setosas or have Sepal.Width less than or equal to 4?

cond5 = (iris$Species != 'setosa') | (iris$Sepal.Width <= 4) (D)

What is the purpose of the select() function in R's dplyr package?

To choose variables (columns) by location or name. (B)

Which of the following correctly uses select() to subset the first, second, and fifth columns?

sub1 = iris %>% select(1, 2, 5) (D)

In the example provided, what does the pipe operator (%>%) do in R?

It passes the result of the expression on its left as the first argument to the function on its right. (D)

What data structure is returned when using select() on the iris dataset?

Data frame (A)

Which of the following conditions will result in a subset that excludes Setosa species?

cond5 = (iris$Species != 'setosa') (C)

What function is used to select specific columns by their names in a data frame?

select() (D)

Given the command sub2 = iris %>% select(Sepal.Length, Sepal.Width, Species), which columns are retained in the new data frame?

Sepal.Length, Sepal.Width, Species (C)

Which command would exclude the first two and fifth variables from the iris dataset?

sub3 = iris %>% select(-1, -2, -5) (D)

What does the filter() function accomplish in data manipulation?

Subsets rows based on conditions (C)

Which of the following commands correctly creates a subset of the iris data frame where Species is 'setosa' and Sepal.Length is greater than 5?

sub5 = iris %>% filter(Species == 'setosa' & Sepal.Length > 5) (A)

If the following command is executed: sub4 = iris %>% select(-Sepal.Length, -Sepal.Width, -Species), which columns are included in sub4?

Petal.Length and Petal.Width (B)

When using the dplyr package in R, which operator is commonly used to chain commands together?

%>% (B)

In a data manipulation context, which statement best describes the purpose of excluding variables?

To simplify the data frame by removing non-essential columns (D)

What does the function data.frame() do in the context of creating subsets in R?

It creates a new data frame from specified columns. (B)

How do you create a condition for selecting rows where the Sepal.Length is greater than 5?

cond2 = (iris$Sepal.Length > 5) (A)

What operator is used in R to represent a logical 'and' when combining conditions?

& (D)

Which of the following will create a subset containing only the irises that are 'setosa'?

sub4 = iris[iris$Species == 'setosa',] (C)

What will happen if you use cond3 = (iris$Species == 'setosa') to create a subset?

It will create a subset containing only the setosa irises. (B)

What is the proper way to display the first few rows of a new data frame in R?

head(sub6) (A)

Which of the following statements about the logical condition syntax in R is true?

Logical conditions can be combined with both 'and' and 'or' operators. (D)

What result will sub6 = iris[cond3,] yield if cond3 is defined as (iris$Species != 'setosa')?

It will yield the rows whose species is not 'setosa'. (C)

Which of the following correctly creates a condition that selects rows where Sepal.Length is less than or equal to 4.5?

cond = (iris$Sepal.Length <= 4.5) (D)

Which code correctly creates a subset of irises with Species not equal to 'setosa' or Sepal.Width less than or equal to 4?

sub6 = iris %>% filter(Species != 'setosa' | Sepal.Width <= 4) (A)

What will happen if the select function is called before the filter function in this code: iris %>% select(-Species) %>% filter(Species == 'setosa')?

It will produce an error indicating 'Species' not found. (B)

When creating a subset of the iris dataset that includes only the setosa irises while excluding the Species column, which command is correct?

sub7 = iris %>% filter(Species == 'setosa') %>% select(-Species) (A)

How do you create a subset of the iris dataset that contains only the last two variables?

iris %>% select(ncol(iris) - 1, ncol(iris)) (A)

Which command correctly filters the iris dataset to only include records with Petal.Length greater than 6?

iris %>% filter(Petal.Length > 6) (D)

Which of the following statements about the use of the filter and select functions is true?

select() can be used after filter() to choose which variables to keep. (B)

What is the output of this command: iris %>% filter(Petal.Length > 6) %>% select(Species)?

It will display only the Species of irises with Petal.Length greater than 6. (C)

When creating a subset of the iris dataset that contains only those records with Sepal width greater than 4 while including only the two sepal variables, which command is correct?

iris %>% filter(Sepal.Width > 4) %>% select(Sepal.Length, Sepal.Width) (C)

Study Notes

Data Subsetting

  • data.frame objects can be subsetted similarly to matrices
  • Subset rows by index: iris[c(1, 30, 50),]
  • Subset columns by index: iris[, 1:3]
  • Subset specific rows and columns: iris[c(1, 30, 50), 1:3]
  • Use '-' to exclude specific elements: iris[-c(1, 30, 50),]
  • Extract a variable using '$' followed by the variable name: iris$Sepal.Length
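
The indexing forms above can be tried directly on the built-in iris data. This is a minimal sketch; the object names (rows, sub_rows, and so on) are made up for illustration and are not from the lesson.

```r
# iris ships with base R (datasets package): 150 rows, 5 columns.
rows <- c(1, 30, 50)             # hypothetical helper vector of row indices

sub_rows  <- iris[rows, ]        # three observations, all variables
sub_cols  <- iris[, 1:3]         # all observations, first three variables
sub_both  <- iris[rows, 1:3]     # three observations, first three variables
sub_minus <- iris[-rows, ]       # every observation except those three
sepal_len <- iris$Sepal.Length   # a plain numeric vector, not a data frame

dim(sub_both)      # 3 rows, 3 columns
dim(sub_minus)     # 147 rows, 5 columns
length(sepal_len)  # 150
```

Leaving the row or column slot empty (as in iris[rows, ]) keeps everything along that dimension.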

Subsetting by Conditions

  • Create logical conditions using comparison operators:
    • == (equal to)
    • != (not equal to)
    • > (greater than)
    • < (less than)
    • >= (greater than or equal to)
    • <= (less than or equal to)
  • Combine logical conditions using:
    • & (and)
    • | (or)
  • Subset rows based on logical conditions: iris[iris$Species == 'setosa',]
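
As a rough illustration of the points above, a condition is just a logical vector with one TRUE/FALSE entry per row, and placing it in the row slot keeps the TRUE rows. The variable names below are assumptions chosen for this sketch.

```r
# Each condition is a logical vector of length nrow(iris).
cond_setosa <- iris$Species == "setosa"
cond_long   <- iris$Sepal.Length > 5

setosa_only <- iris[cond_setosa, ]              # the 50 setosa rows
setosa_long <- iris[cond_setosa & cond_long, ]  # both conditions must hold
not_setosa_or_narrow <- iris[!cond_setosa | iris$Sepal.Width <= 4, ]

nrow(setosa_only)             # 50
table(setosa_long$Species)    # only the setosa count is non-zero
```

Wrapping each comparison in parentheses, as the quiz answers do, is optional here but makes the combined conditions easier to read.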

dplyr Package for Data Manipulation

  • select() function:
    • Select columns by their location: iris %>% select(1, 2, 5)
    • Select columns by their name: iris %>% select(Sepal.Length, Sepal.Width, Species)
    • Exclude specific columns: iris %>% select(-1, -2, -5) or iris %>% select(-Sepal.Length, -Sepal.Width, -Species)
  • filter() function:
    • Subset rows based on conditions: iris %>% filter(Species == 'setosa')
    • Combine multiple conditions: iris %>% filter((Species == 'setosa') & (Sepal.Length > 5))
  • %>% (pipe operator): Passes the result on its left as the first argument of the function on its right, so dplyr operations can be chained
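
The select() and filter() calls listed above might be combined as in the sketch below. It assumes the dplyr package is installed; the result names are hypothetical.

```r
library(dplyr)   # assumption: dplyr is installed (install.packages("dplyr"))

by_position <- iris %>% select(1, 2, 5)
by_name     <- iris %>% select(Sepal.Length, Sepal.Width, Species)
dropped     <- iris %>% select(-Sepal.Length, -Sepal.Width, -Species)

setosa_long <- iris %>% filter(Species == "setosa" & Sepal.Length > 5)

# Order matters: filter on Species first, then drop the column.
# Reversing the two steps fails because Species no longer exists.
setosa_measurements <- iris %>%
  filter(Species == "setosa") %>%
  select(-Species)

names(by_position)   # "Sepal.Length" "Sepal.Width" "Species"
names(dropped)       # "Petal.Length" "Petal.Width"
```

Both functions return a data frame, which is what allows one call to be piped into the next.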

Exercise 2 Notes

  • Create a subset of iris with the last two variables: iris %>% select(ncol(iris) - 1, ncol(iris))
  • Create a subset of iris with petal length greater than 6: iris %>% filter(Petal.Length > 6)
  • Create a subset of iris with the two sepal variables and sepal width greater than 4: iris %>% select(Sepal.Length, Sepal.Width) %>% filter(Sepal.Width > 4)
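
One possible way to run the three exercises is sketched below, assuming dplyr is installed. It uses ncol(iris) for the column count, since dim(iris) returns both the row and column counts and cannot be used directly as a column position; the result names are hypothetical.

```r
library(dplyr)   # assumption: dplyr is installed

# Last two variables: positions ncol(iris) - 1 and ncol(iris), i.e. 4 and 5.
last_two <- iris %>% select(ncol(iris) - 1, ncol(iris))

# Rows with petal length greater than 6.
long_petal <- iris %>% filter(Petal.Length > 6)

# Sepal width greater than 4, keeping only the two sepal variables.
wide_sepal <- iris %>%
  filter(Sepal.Width > 4) %>%
  select(Sepal.Length, Sepal.Width)

names(last_two)    # "Petal.Width" "Species"
names(wide_sepal)  # "Sepal.Length" "Sepal.Width"
```

In the last exercise, select() and filter() can run in either order because Sepal.Width is kept by the selection.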

Related Documents

Lec5.pdf

Description

This quiz covers the basics of data subsetting in R using data.frame objects, logical conditions, and the dplyr package for data manipulation. You'll learn how to select, exclude, and extract specific rows and columns from datasets, as well as apply logical operations for filtering data effectively.
