Data Types in R

LovableOrphism

3 Questions

What is the purpose of Cronbach's alpha?

To assess the internal consistency or reliability of a scale

Define Data to Ink Ratio in data visualization.

The data-to-ink ratio is the share of a graph's ink that displays data rather than decoration. The aim is to maximize it so the graph stays efficient and clear.

In a Simple Linear Regression equation $y = b_0 + b_1 x + \epsilon$, $b_0$ represents the ______.

Intercept

Study Notes

Data Types in R

  • Character: Text data (e.g., names, qualitative data).
  • Numeric: Real numbers, stored as double precision by default (e.g., 1.1, 3.14, and also a plain 2).
  • Integer: Whole numbers, written with an L suffix (e.g., 0L, 1L, 2L).
  • Complex: Numbers with real and imaginary parts.
  • Logical: Boolean values, TRUE or FALSE.
  • Dates: Represented as Date objects, created from text with as.Date() (e.g., as.Date("2024-06-01")).
  • Factors: Categorical data stored as integers with labels.
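
The types listed above can be checked with class(). The following is a minimal sketch in base R; all variable names and values are illustrative and not taken from the notes.

```r
# Illustrative values, one per data type.
name   <- "Ada"                  # character
height <- 1.72                   # numeric (stored as double)
count  <- 3L                     # integer: note the L suffix
z      <- 2 + 3i                 # complex
passed <- TRUE                   # logical
day    <- as.Date("2024-06-01")  # Date object built from a string
group  <- factor(c("low", "high", "low"), levels = c("low", "high"))

class(name); class(height); class(count)
class(z); class(passed); class(day); class(group)

# Without the L suffix, a whole number is still a double.
class(2)
is.integer(2)

# A factor stores integer codes plus a set of labels (levels).
as.integer(group)
levels(group)
```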

Operators in R

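  • Comparison Operators (each returns TRUE or FALSE):
    • ==: Equal to.
    • !=: Not equal to.
    • >=: Greater than or equal to.
    • %in%: Element is in a vector.
    • %!in%: Element is not in a vector. This is not a base R operator; use !(x %in% y), or define it yourself.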
  • Boolean Values:
    • TRUE and FALSE: Used in logical operations.
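
As a rough illustration of these operators, the sketch below compares a small vector against a set of allowed values. The vectors x and allowed are made up, and the %!in% helper is a hypothetical user-defined operator, since base R does not ship one.

```r
x <- c(2, 5, 8)
allowed <- c(5, 8, 13)

x == 5              # element-wise equality
x != 5              # element-wise inequality
x >= 5              # greater than or equal to
x %in% allowed      # is each element of x found in allowed?
!(x %in% allowed)   # base R way to express "not in"

# Hypothetical helper: define a "not in" operator ourselves.
`%!in%` <- function(lhs, rhs) !(lhs %in% rhs)
x %!in% allowed
```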

Merging Data

  • Full outer join: The result has all rows in x and all rows in y. Argument to merge(): all = TRUE.
  • Left outer join: The result has all rows in x, with NA where y has no match. Argument: all.x = TRUE.
  • Right outer join: The result has all rows in y, with NA where x has no match. Argument: all.y = TRUE.
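
These arguments belong to base R's merge(). A small sketch follows, assuming two made-up data frames x and y that share an id column.

```r
# Two illustrative data frames sharing the key column "id".
x <- data.frame(id = c(1, 2, 3), score = c(10, 20, 30))
y <- data.frame(id = c(2, 3, 4), group = c("a", "b", "c"))

inner <- merge(x, y, by = "id")                # only ids present in both
full  <- merge(x, y, by = "id", all = TRUE)    # all rows from x and y
left  <- merge(x, y, by = "id", all.x = TRUE)  # all rows from x
right <- merge(x, y, by = "id", all.y = TRUE)  # all rows from y

full
```

Rows without a match on the other side are kept in the outer joins and filled with NA.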

Long vs. Wide Format

  • Long Format: Multiple observations per ID/grouping variable. Used for repeated measures or longitudinal data.
  • Wide Format: Each subject's repeated measures are in a single row with multiple columns.
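
To make the difference concrete, here is a minimal sketch that converts a wide data frame to long format with base R's reshape(). The data and column names (score_t1, score_t2) are invented for illustration; packages such as tidyr offer alternatives that are not shown here.

```r
# Wide format: one row per subject, one column per time point.
wide <- data.frame(id = c(1, 2), score_t1 = c(5, 7), score_t2 = c(6, 9))

# Long format: one row per subject per time point.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_t1", "score_t2"),  # columns holding the repeated measure
  v.names   = "score",                    # name of the stacked value column
  timevar   = "time",                     # name of the new time column
  times     = c(1, 2),                    # value of time for each varying column
  idvar     = "id"
)

long
```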

Scale Scoring

  • Scoring:
    • Summing Scores: Add individual item scores directly, for example with rowSums().
    • Using rowMeans: Calculate the mean of item scores for each participant; set na.rm = TRUE to exclude missing data.
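
Both scoring approaches can be sketched in a few lines of base R. The item data frame below is invented, with one missing answer to show how the two approaches differ.

```r
# Three participants, three items; participant 2 skipped q2.
items <- data.frame(q1 = c(4, 2, 5), q2 = c(3, NA, 5), q3 = c(5, 1, 4))

# Sum score: NA whenever any item is missing.
total_score <- rowSums(items)

# Mean score: na.rm = TRUE averages only the answered items.
mean_score <- rowMeans(items, na.rm = TRUE)

total_score
mean_score
```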

Cronbach's alpha

  • Purpose: Used to assess the internal consistency or reliability of a scale.
  • Why We Do Reliability: To ensure that the scale measures consistently across different items.
  • Calculation: Collect the item scores in a data frame and pass it to the alpha() function from the psych package.
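
To show what the statistic measures, the sketch below computes raw Cronbach's alpha by hand in base R, using the standard formula based on item variances and the variance of the total score. The item data are invented, and this is a substitute for, not a reproduction of, what psych does internally.

```r
# Five participants, three items (illustrative data).
items <- data.frame(
  q1 = c(4, 2, 5, 3, 4),
  q2 = c(3, 2, 5, 3, 4),
  q3 = c(5, 1, 4, 2, 4)
)

k         <- ncol(items)          # number of items
item_vars <- sapply(items, var)   # variance of each item
total_var <- var(rowSums(items))  # variance of the sum score

# alpha = k / (k - 1) * (1 - sum of item variances / variance of total)
alpha_manual <- (k / (k - 1)) * (1 - sum(item_vars) / total_var)
alpha_manual

# If the psych package is installed, psych::alpha(items) should report
# the same quantity as its raw alpha.
```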

Data to Ink Ratio

  • Definition: The share of a graph's ink that displays data rather than decoration.
  • Goal: Maximize data-to-ink ratio to ensure the graph is efficient and not cluttered.
  • Good Practices: Remove unnecessary borders, gridlines, and background colours; use simple, clean designs.

Types of Graphs/Plots

  • Histogram: Visualise the distribution of a continuous variable.
  • Density: Show the distribution of a continuous variable using a smooth curve.
  • Stacked dot plot: Display individual data points, useful for small datasets.
  • Bar chart: Compare counts or proportions across categories.
  • Scatterplot with regression line: Show the relationship between two continuous variables.
  • Violin plot: Combine a boxplot and a density plot to show the data distribution.
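
Most of these plots can be drawn with base R graphics, as in the sketch below. The data are simulated and the choice of base graphics is an assumption; the notes do not say which plotting system is expected. Violin plots need an add-on package and are not shown.

```r
set.seed(1)
x <- rnorm(100)                                       # continuous variable
y <- 2 + 0.5 * x + rnorm(100, sd = 0.5)               # second continuous variable
g <- sample(c("a", "b", "c"), 100, replace = TRUE)    # categorical variable

# Histogram of a continuous variable.
h <- hist(x, main = "Histogram")

# Density: a smooth estimate of the same distribution.
d <- density(x)
plot(d, main = "Density")

# Stacked dot plot: round first so that equal values stack.
stripchart(round(x, 1), method = "stack", pch = 16, main = "Stacked dot plot")

# Bar chart of counts per category.
counts <- table(g)
barplot(counts, main = "Bar chart")

# Scatterplot with a fitted regression line.
plot(x, y, main = "Scatterplot")
abline(lm(y ~ x))
```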

GLM Part 1

  • Line of Best Fit: The line that best represents the data in a scatter plot by minimizing the sum of the squared residuals.
  • Residuals: The differences between the observed values and the values predicted by the regression model.
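  • Unstandardised Coefficients (B): The expected change in the outcome, in its original units, for a one-unit increase in the predictor. Useful for understanding the real-world impact of predictors.
  • Standardised Coefficients (Beta): The same change expressed in standard deviation units. Useful for comparing the relative importance of predictors measured on different scales.
  • Simple Linear Regression: Equation: $y = b_0 + b_1 x + \epsilon$.
  • Multiple Linear Regression: Equation: $y = b_0 + b_1 x_1 + \dots + b_k x_k + \epsilon$, with one coefficient per predictor.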
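
The ideas in this section can be tried out with lm(). The sketch below uses simulated data; the variable names and the helper z() are illustrative assumptions, and standardising the variables before fitting is only one way to obtain Beta coefficients.

```r
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Simple linear regression: the line of best fit minimises squared residuals.
simple <- lm(y ~ x1)
coef(simple)               # unstandardised b0 (intercept) and b1 (slope)
res <- residuals(simple)   # observed values minus predicted values

# Multiple linear regression: one coefficient per predictor.
multiple <- lm(y ~ x1 + x2)
coef(multiple)

# Hypothetical helper: convert a variable to z-scores.
z <- function(v) (v - mean(v)) / sd(v)

# Standardised (Beta) coefficients: refit on z-scored variables.
std <- lm(z(y) ~ z(x1) + z(x2))
coef(std)
```

With a single predictor, the standardised slope equals the correlation between the predictor and the outcome, which is a quick way to sanity-check the result.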

Learn about the different data types in R, including character, numeric, integer, complex, and logical, as well as special data types like dates and factors.
