R Data Types and Data.Table Subsetting

UsefulJoy

16 Questions

What does the %ge% operator represent in R?

Greater than or equal

What is the purpose of subsetting data in analyses?

To exclude outliers or select specific participants

What is the rule for data merges in R?

One join at a time, with the x dataset on the left

What type of join results in a dataset with only rows present in both x and y?

Natural join

What is the purpose of reshaping data?

To prepare data for repeated measures/longitudinal/panel data

What is a characteristic of wide data?

Each individual entity occupies its own row, and each of its variables occupies a single column

What is an advantage of long data?

It is easier to perform functions like filtering, aggregating and transforming

What does the %in% operator represent in R?

In: tests whether each value on the left appears in the set of values on the right

What is the purpose of the by argument in the data.table subsetting structure?

To group the data by a specific variable

What is the data type in R used for storing logical data?

Logical

What is the purpose of using logical operators in data management?

To find outliers and filter data based on conditions

What is the default treatment of TRUE and FALSE in arithmetic operations?

TRUE is treated as 1 and FALSE is treated as 0

What is the purpose of the i argument in the data.table subsetting structure?

To specify the row(s) to retrieve

What is the data type in R used for storing text type data?

Character

What is the purpose of the j argument in the data.table subsetting structure?

To specify the column(s) to retrieve

What is the data type in R used for storing real numbers?

Numeric

Study Notes

Data Structure and Manipulation

  • Data.table subsetting structure: DT[i, j, by], where DT is the data.table, i is the rows, j is the columns, and by is the grouping variable.
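
The DT[i, j, by] structure can be illustrated with a minimal sketch. The table, its column names (id, condition, score), and its values are hypothetical and chosen only for illustration; the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical example data: six participants in two study conditions
DT <- data.table(
  id        = 1:6,
  condition = c("control", "control", "control", "treat", "treat", "treat"),
  score     = c(10, 12, 14, 20, 22, 27)
)

# i: keep only the rows where score is above 10
high <- DT[score > 10]

# j: retrieve a single column as a vector
scores <- DT[, score]

# j + by: compute the mean score within each condition
means <- DT[, .(mean_score = mean(score)), by = condition]

# i, j, and by together: mean score per condition among rows with score > 10
means_high <- DT[score > 10, .(mean_score = mean(score)), by = condition]
print(means_high)
```

Here i filters rows first, j is then evaluated on the remaining rows, and by repeats the j computation once per group.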

Data Types in R

  • Logical: used for logical data (TRUE or FALSE).
  • Integer: used for whole numbers (e.g., 0, 1, 2).
  • Numeric: used for real numbers (e.g., 1.1, 4.8) and can also be used for integer data, but is less efficient.
  • Factor: a special representation for discrete (categorical) data, stored as integer codes with labels (e.g., study condition coded as 0, 1, 2).
  • Character: used for text type data (e.g., names, qualitative data).
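
A short base R sketch of the types listed above. The variable names and the factor labels ("control", "low", "high") are made up for illustration.

```r
# Each value below is created so that R stores it with a different type
flag  <- TRUE    # logical
count <- 2L      # integer (the L suffix requests integer storage)
ratio <- 4.8     # numeric (double precision)
name  <- "Ada"   # character

# Factor: study condition coded 0, 1, 2, with hypothetical labels
cond <- factor(c(0, 1, 2, 1), levels = c(0, 1, 2),
               labels = c("control", "low", "high"))

class(flag)    # "logical"
class(count)   # "integer"
class(ratio)   # "numeric"
class(name)    # "character"
class(cond)    # "factor"

# A factor is stored as integer codes plus a set of labels
typeof(cond)   # "integer"
levels(cond)   # "control" "low" "high"

# In arithmetic, TRUE counts as 1 and FALSE as 0,
# so sum() counts how many values meet a condition
n_above <- sum(c(1.1, 4.8, 7.2) > 4)
print(n_above)  # 2
```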

Operators

  • Logical operators: return TRUE or FALSE and are used for data management, such as finding outliers, checking values, and recoding continuous variables.
  • Examples of operators:
    • >= or %ge%: Greater than or equal
    • %gl%: Greater than AND less than
    • %gel%: Greater than or equal AND less than
    • %gle%: Greater than AND less than or equal
    • %gele%: Greater than or equal AND less than or equal
    • %in%: In
    • %!in% or %nin%: Not in
    • %c%: Chain operations on the RHS together
    • %e%: Set operator, to use set notation
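
Apart from %in%, the %...% operators in this list are not part of base R and need an add-on package. The sketch below therefore uses only base R and shows what the range and membership tests are assumed to compute, based on the descriptions in the list. The vector x and the variable names are illustrative.

```r
x <- c(1, 3, 5, 7, 9)

# Greater than or equal (the %ge% description)
ge <- x >= 5

# Greater than AND less than (the %gl% description)
gl <- x > 3 & x < 9

# Greater than or equal AND less than or equal (the %gele% description)
gele <- x >= 3 & x <= 7

# In, and not in (the %in% and %!in% / %nin% descriptions)
is_in  <- x %in% c(3, 9)
not_in <- !(x %in% c(3, 9))

# Recoding a continuous variable with a logical test
size <- ifelse(x >= 3 & x < 7, "middle", "other")
print(size)
```

Each comparison returns a logical vector of the same length as x, so the result can be used directly to filter rows or to count matches with sum().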

Subsetting Data

  • Subsetting data: a common task in analyses, often used to exclude outliers or select specific participants.
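  • The order of subsetting matters: when a cutoff is computed from the data, excluding outliers before selecting a subgroup can keep different rows than selecting the subgroup first, so check the result of each order.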
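
A small base R sketch of why the order can matter. The data, the group labels, and the outlier rule (more than 15 points above the median) are all hypothetical and serve only to illustrate the point.

```r
# Hypothetical scores for two groups
dat <- data.frame(
  group = c("a", "a", "a", "a", "a", "b", "b", "b"),
  score = c(10, 11, 12, 13, 30, 40, 41, 42)
)

# Hypothetical outlier rule: keep values at most 15 points above the median
is_ok <- function(x) x <= median(x) + 15

# Order 1: exclude outliers using all data, then select group "a"
outliers_first <- dat[is_ok(dat$score), ]
outliers_first <- outliers_first[outliers_first$group == "a", ]

# Order 2: select group "a", then exclude outliers within that group
subgroup_first <- dat[dat$group == "a", ]
subgroup_first <- subgroup_first[is_ok(subgroup_first$score), ]

nrow(outliers_first)   # 5: the cutoff of 36.5 keeps the score of 30
nrow(subgroup_first)   # 4: the cutoff of 27 drops the score of 30
```

The two orders differ because the median, and therefore the cutoff, is computed on different sets of rows.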

Merging Data

  • Rules for data merges in R:
    • One join at a time
    • The x dataset is always on the left, and the y dataset is always on the right
  • Types of joins:
    • Natural join: resulting data has only rows present in both x and y (all = FALSE)
    • Full outer join: resulting data has all rows in x and all rows in y (all = TRUE)
    • Left outer join: resulting data has all rows in x (all.x = TRUE)
    • Right outer join: resulting data has all rows in y (all.y = TRUE)
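
The four join types can be sketched with merge() and its all, all.x, and all.y arguments. The two data frames and their columns (id, age, score) are hypothetical.

```r
# Hypothetical datasets: ids 2 and 3 appear in both, 1 only in x, 4 only in y
x <- data.frame(id = c(1, 2, 3), age = c(30, 41, 25))
y <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

# Natural join: only rows present in both x and y
natural <- merge(x, y, by = "id", all = FALSE)

# Full outer join: all rows in x and all rows in y
full <- merge(x, y, by = "id", all = TRUE)

# Left outer join: all rows in x; unmatched y columns become NA
left <- merge(x, y, by = "id", all.x = TRUE)

# Right outer join: all rows in y; unmatched x columns become NA
right <- merge(x, y, by = "id", all.y = TRUE)

print(full)
```

Comparing nrow() before and after each merge is a quick way to confirm that a join kept the rows you expected.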

Reshaping Data

  • Necessary for repeated measures/longitudinal/panel data
  • Two types of data structures:
    • Wide: each measure has a separately-named variable for each time point it was measured
    • Long: time point (or wave) is a variable, and IDs will have multiple rows
  • Characteristics of wide data:
    • Easy to read and interpret
    • Each individual entity occupies their own row, and each of their variables occupies a single column
    • Generally considered people-friendly
  • Characteristics of long data:
    • Machine-friendly data structure
    • Easier to perform functions like filtering, aggregating, and transforming
    • Often used in data analysis and modeling
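
One way to move between the two structures is with melt() and dcast() from data.table, shown here as a minimal sketch. The table and its column names (id, score_t1, score_t2) are hypothetical, and the data.table package is assumed to be installed.

```r
library(data.table)

# Wide: one row per id, one separately-named score column per time point
wide <- data.table(
  id       = 1:3,
  score_t1 = c(10, 12, 14),
  score_t2 = c(11, 15, 13)
)

# Wide to long: time point becomes a variable, and each id has multiple rows
long <- melt(
  wide,
  id.vars       = "id",
  measure.vars  = c("score_t1", "score_t2"),
  variable.name = "wave",
  value.name    = "score"
)

# Aggregating by wave is a single grouped call on long data
wave_means <- long[, .(mean_score = mean(score)), by = wave]

# Long back to wide: one column per wave again
wide_again <- dcast(long, id ~ wave, value.var = "score")
print(long)
```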

This quiz covers the basics of data types in R, including logical, integer, numeric, and factor, as well as the data.table subsetting structure in R programming.
