R Data Types and Data.Table Subsetting

Questions and Answers

What is the purpose of 'psych::' in the function to find Cronbach's alpha?

To specify the package where the alpha function is located (correct)
To ensure the function works with large datasets
To create a scatterplot of the data
To select a specific type of plot for data visualization

What type of plot is used to compare the distribution of data between groups?

Barplot
Histogram
Scatterplot
Violin plot (correct)

What is the purpose of labeling axes with quantiles in data visualization?

To provide more data to the reader (correct)
To remove unnecessary ink
To make the plot more visually appealing
To simplify the plot

What type of plot is used to show the distribution of data using a smooth density function?

Density plot

What is the purpose of using shapes on scatterplots?

To distinguish between categorical variables

What is the purpose of a QQ plot?

To check if two sets of quantiles come from the same distribution

What is the purpose of using a deviates plot?

To check if a variable follows a normal distribution

What is the purpose of using themes in data visualization?

To remove unnecessary ink

What type of plot is used to show the raw data for small datasets?

Dot plot

What is the goal of data visualization in terms of data-to-ink ratio?

Less ink, more data

Which type of regression is most suitable for analyzing the number of children people have?

Poisson Regression

What is the link function used in Poisson regression?

η = g(λ) = ln(λ)

What is the assumption about the mean and variance in Poisson regression?

Mean is equal to variance

What does an Incidence Rate Ratio (IRR) of 2 indicate?

A one-unit increase in the predictor is associated with twice as many expected events of the outcome

What is the purpose of exponentiating coefficients in Poisson regression?

To interpret the results on the original scale

Which type of regression is suitable for analyzing a binary outcome, such as the presence or absence of major depression?

Binary Logistic Regression

Why is linear regression rarely used for count outcomes?

All of the above

What is the distribution assumed in Poisson regression?

Poisson distribution

What is a common application of Poisson regression?

Examining risk factors for the number of accidents someone gets into

What is a potential issue with Poisson regression?

It requires a large sample size

What is the purpose of the 'by' argument in the data.table subsetting structure DT[ i , j , by ]?

To specify the grouping variable for aggregation

What is the most efficient data type to store whole numbers in R?

Integer

What is the purpose of the 'factor' data type in R?

To store categorical data as a fixed set of levels, which can be given a specific order

What is the result of using logical operators in R?

A logical value (TRUE or FALSE)

What is the convention for treating Boolean values in arithmetic operations in R?

TRUE is treated as 1 and FALSE is treated as 0

What is the main purpose of using logical operators in data management?

To find outliers and values that meet specific conditions

What is the purpose of the 'i' argument in the data.table subsetting structure DT[ i , j , by ]?

To specify the row(s) to select
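The roles of i, j and by can be mimicked in plain Python to show the order of operations: i keeps the matching rows, by splits those rows into groups, and j computes a result within each group. This is a hedged Python sketch of the idea, not data.table itself; the helper `dt_query` and the example columns are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def dt_query(rows, i=None, j=None, by=None):
    """Hypothetical stand-in for DT[i, j, by] on a list of dict rows."""
    # i: keep only the rows that satisfy the condition
    selected = [r for r in rows if i is None or i(r)]
    if by is None:
        # no grouping: apply j to all selected rows at once
        return j(selected) if j else selected
    # by: split the selected rows into groups by the grouping column
    groups = defaultdict(list)
    for r in selected:
        groups[r[by]].append(r)
    # j: compute the result separately within each group
    return {key: j(g) for key, g in groups.items()}


rows = [
    {"id": 1, "group": "a", "stress": 3},
    {"id": 2, "group": "a", "stress": 7},
    {"id": 3, "group": "b", "stress": 4},
    {"id": 4, "group": "b", "stress": 10},
]

# Roughly analogous to DT[stress < 9, mean(stress), by = group]
result = dt_query(
    rows,
    i=lambda r: r["stress"] < 9,
    j=lambda g: mean(r["stress"] for r in g),
    by="group",
)
print(result)  # the stress score of 10 is dropped by i before grouping
```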

What is the difference between the 'numeric' and 'integer' data types in R?

Numeric is used for real numbers and integer is used for whole numbers

What is the purpose of subsetting data in analyses?

To exclude outliers; to select only participants who meet specific criteria

What is the rule for data merges in R?

One join at a time, and the x dataset is always on the left

What is the purpose of reshaping data?

To prepare data for repeated measures/longitudinal/panel data analysis

What is a characteristic of wide data?

Each individual entity occupies a single row

What is the advantage of using the rowMeans() function?

With na.rm = TRUE, it still returns a mean for a row even when some of the items are missing

What is the disadvantage of literally adding items together to get a total score?

If a participant misses any single item, they will be missing on the entire subscale

What is the recommended approach to scoring questionnaire scales?

Using rowMeans() and multiplying the results by the number of items

What is the purpose of the psych::alpha() function?

To calculate the reliability (internal consistency) of a scale
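Cronbach's alpha compares the sum of the item variances with the variance of the total score: α = k/(k − 1) × (1 − Σ item variances / variance of total). A minimal Python sketch of that formula, assuming complete data and made-up scores; it is not a re-implementation of psych::alpha(), which also reports item-level statistics.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all the same length, no missing values."""
    k = len(items)
    n = len(items[0])
    # total score for each participant (sum across items)
    totals = [sum(col[p] for col in items) for p in range(n)]
    # sum of the individual item variances (sample variances, n - 1 denominator)
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Two hypothetical items answered by four participants
items = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(items)  # approximately 0.75 for these scores
```

When items move together, the total-score variance grows faster than the summed item variances, which pushes alpha towards 1.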

What is the result of a natural join?

The merged data contain only rows present in both x and y
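The join types differ only in which unmatched rows they keep. A minimal Python sketch on lists of dictionaries, assuming a single key column; the `merge` helper is hypothetical (not R's or data.table's merge()), but it follows the same rule of one join at a time with x on the left, padding missing columns with None in place of NA.

```python
def merge(x, y, key, how="inner"):
    """Hypothetical helper: join two lists of dict rows on `key`, x on the left.
    how: "inner" (natural join), "left", "right" or "full" (outer joins)."""
    # collect every column name so unmatched rows can be padded with None
    columns = []
    for row in x + y:
        for col in row:
            if col not in columns:
                columns.append(col)

    def pad(row):
        return {col: row.get(col) for col in columns}

    # index the y rows by key value for lookup
    y_index = {}
    for row in y:
        y_index.setdefault(row[key], []).append(row)
    x_keys = {row[key] for row in x}

    out = []
    for row in x:
        matches = y_index.get(row[key], [])
        for m in matches:
            out.append(pad({**row, **m}))  # row present in both x and y
        if not matches and how in ("left", "full"):
            out.append(pad(row))  # unmatched x row kept by left/full joins
    if how in ("right", "full"):
        for row in y:
            if row[key] not in x_keys:
                out.append(pad(row))  # unmatched y row kept by right/full joins
    return out


x = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
y = [{"id": 2, "stress": 5}, {"id": 3, "stress": 7}]

for how in ("inner", "left", "right", "full"):
    print(how, [r["id"] for r in merge(x, y, "id", how)])
```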

What is the characteristic of long data?

Each individual entity occupies multiple rows

What is the purpose of the 'Call' section in the output of a linear regression model?

To serve as a handy reminder of the variables and outcome used in the model

In the 'Coefficients' section of a linear regression output, what does the 'Estimate' column represent?

The model parameter estimates (regression coefficients)

What is the purpose of the link function in generalized linear models (GLMs)?

To transform between the expected outcome and the linear predictor η (eta)

What is the assumption of linear regression referred to as 'L.I.N.E.'?

Linear relationship, Independent errors, Normally distributed errors, Equal variance

What is the purpose of the density plot of residuals in assessing model diagnostics?

To verify the normally distributed errors assumption

What type of regression is used when the outcome variable is a count variable?

Poisson regression

What is the purpose of the 'Residuals' section in the output of a linear regression model?

To summarize the distribution of the residuals (minimum, quartiles, median, maximum); the residual standard error and its degrees of freedom are reported further down the output

What is the relationship between the F-statistic and the t-statistic in a linear regression model with one predictor?

The F-statistic equals the square of the t-statistic (F = t²), so both tests give the same p-value

What is the purpose of the QQ plot of residuals in assessing model diagnostics?

To identify outliers in the data

What is the definition of homoscedasticity in linear regression?

The variance of the error term is constant for each value of the predictor

What does the subscript i in the equation yi = b0 + b1 * xi + εi indicate?

That each person has their own value of y and x, and there is some unexplained residual.

What is the purpose of squaring the residuals in linear regression?

Because we don't care whether they are above or below the line.

What is the main difference between simple linear regression and multiple linear regression?

The number of predictor variables.

How do you interpret the regression coefficient b1 in multiple linear regression?

The change in y for a one-unit change in x1, controlling for all other predictors.

What is the generalized linear model (GLM) an extension of?

The linear model.

What is the purpose of the lm() function in R?

To fit a linear model.

What is the primary output of the summary() function when used with a linear model object in R?

A quick summary of the model.

What is the normal distribution also known as?

Gaussian distribution.

What are the parameters of a normal distribution?

Mean and standard deviation.

What is a probability distribution?

A function that describes the probability of a value occurring.

What is the primary advantage of using a Generalized Linear Model (GLM) over traditional linear regression for binary outcomes?

It addresses the issue of non-normality and restricts predictions to be within 0 and 1

What is the link function used in logistic regression?

Logit function

What is the primary assumption of the Bernoulli distribution in logistic regression?

The outcome is 0 or 1, and μ, the probability that it is 1, ranges from 0 to 1

What is the interpretation of an odds ratio greater than 1 in logistic regression?

A positive relationship between the predictor and outcome variables

What is the purpose of checking for separation in logistic regression?

To detect whether a predictor variable perfectly predicts the outcome, which makes the estimates unreliable

What is the advantage of using logistic regression over traditional linear regression for predicting binary outcomes?

It is better suited to handle non-normality and bounded outcomes

What is the relationship between the odds ratio and the probability of the outcome occurring?

The odds ratio indicates the multiplicative change in the odds (not the probability) of the outcome occurring for a one-unit change in the predictor

What is the purpose of the marginal effect in logistic regression?

To quantify the instantaneous effect of a change in the predictor variable

What is the primary assumption of independent errors in logistic regression?

The errors are uncorrelated

What is the requirement for the sample size in logistic regression?

The sample size should be large enough to ensure normality of the parameter distributions

What is a consequence of missing data?

Bias in results

Which type of missing data does not bias the results of a complete-case analysis?

Missing Completely at Random (MCAR)

What is the name of the method where only complete cases are analyzed?

Listwise deletion

What is the condition required for recovering unbiased estimates in MAR data?

The missing data mechanism is conditionally independent of the estimate

What is the characteristic of data that are missing not at random?

The missingness mechanism is associated with the estimate

What is the consequence of analyzing only complete cases in MAR data?

Biased estimates

What is the assumption required for unbiased estimates in listwise deletion?

The data are missing completely at random

What is the name of the approach that involves analyzing only complete cases?

Listwise deletion

What is the purpose of the likelihood ratio test in model comparison?

To compare nested models and determine whether the additional parameters significantly improve the fit

Why can't we use the log likelihood to compare non-nested models?

Because it never decreases as additional predictors are added

What is the advantage of using the BIC over the AIC?

It has a stronger penalty for complex models

What is the purpose of fitting polynomials of different degrees?

To compare the fit of different polynomial degrees

What is the critical step in using the likelihood ratio test and AIC/BIC for model comparison?

Ensuring the observations are identical across models

What is the role of ontology in research?

It sets the assumptions about the nature of the world and the phenomenon being studied

What is the purpose of epistemology in research?

It sets the assumptions about knowledge and how it is acquired

Why is it important to consider both ontology and epistemology in research?

Because they shape our understanding of the phenomenon being studied

What is the relationship between ontology and epistemology?

They are complementary and influence each other

What is the main difference between fixed effects and random effects in regression analysis?

Fixed effects have a constant coefficient for all individuals, while random effects have varying coefficients for each individual

What is the intraclass correlation coefficient (ICC) used for?

To compare the variability between individuals to the variability within individuals

What is the purpose of the meanDeviations() function?

To calculate the between-person and within-person versions of a repeated measures variable
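The between/within split can be sketched in Python: the between-person part is each person's own mean, and the within-person part is each observation's deviation from that mean. The helper `mean_deviations` and the stress scores are hypothetical; this illustrates the idea rather than the implementation of meanDeviations().

```python
from collections import defaultdict
from statistics import mean


def mean_deviations(ids, values):
    """Hypothetical helper: split a repeated-measures variable into a
    between-person part (person mean) and a within-person part (deviation)."""
    by_person = defaultdict(list)
    for pid, v in zip(ids, values):
        by_person[pid].append(v)
    person_mean = {pid: mean(vs) for pid, vs in by_person.items()}
    # between: the person's mean repeated on each of their rows
    between = [person_mean[pid] for pid in ids]
    # within: how far each observation sits from that person's own mean
    within = [v - person_mean[pid] for pid, v in zip(ids, values)]
    return between, within


ids = [1, 1, 1, 2, 2, 2]
stress = [2, 4, 6, 5, 5, 8]
between, within = mean_deviations(ids, stress)
```

Adding the two parts back together recovers the original scores, so no information is lost by the split.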

What is the advantage of using restricted maximum likelihood over maximum likelihood?

It is less biased and provides better variance estimates

What is the purpose of an intercept-only model?

To serve as a baseline for comparing the fit of more complex models

In a linear mixed model, what is assumed about the distribution of individual units' deviations from the fixed effect?

They follow a normal distribution with mean 0 and a standard deviation that is estimated from the data (the random-effect SD)

Which of the following research paradigms suggests that there is no fixed social reality?

Constructionism

What is the main assumption of linear mixed models?

That the random effects follow a normal distribution

What is the primary concern of qualitative research?

Understanding and interpretation of social phenomena

What is the purpose of reflexive thematic analysis?

To identify and code patterns in qualitative data

What is the difference between covariance and correlation?

Covariance measures how two variables vary together in their original units, while correlation is the covariance standardized by both standard deviations, so it ranges from −1 to 1 and indicates the strength of the linear relationship

What is a covariance matrix?

A set of covariance values for each pair of variables, with each variable's variance on the diagonal
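The difference between covariance and correlation shows up when a variable is rescaled: covariance changes with the units, correlation does not. A minimal Python sketch with made-up data, using sample (n − 1) formulas; the function names are illustrative.

```python
from math import sqrt


def covariance(x, y):
    # sample covariance: sum of products of deviations, divided by n - 1
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    # correlation = covariance rescaled by both standard deviations
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def covariance_matrix(columns):
    # entry [r][c] is cov(column r, column c); the diagonal holds the variances
    return [[covariance(a, b) for b in columns] for a in columns]


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
y_scaled = [10 * v for v in y]  # same relationship, different units

print(covariance(x, y), covariance(x, y_scaled))    # covariance grows tenfold
print(correlation(x, y), correlation(x, y_scaled))  # correlation stays at 1
```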

What is the primary distinction between critical realism and constructionism?

Critical realism assumes a fixed reality, while constructionism assumes a subjective reality

What are the four elements of trustworthiness in qualitative research?

Credibility, transferability, dependability, and confirmability

What is the purpose of the 'random effects' heading in the output of a linear mixed model?

To display the variance components of the random effects

What is the purpose of qualitative sampling?

To select participants based on their competence and relevance to the study

Why is dual coding not relevant in qualitative research?

Because qualitative research assumes a subjective reality

What is the key element of qualitative research?

Defensibility

When is it possible to recover unbiased estimates?

When data are missing at random (MAR)

What is the purpose of multiple imputation?

To address missing data by generating multiple datasets

What is the formula to determine total uncertainty in the average estimate in multiple imputation?

T = V̄ + B + B/m

What is an issue with using imputed datasets with general linear models?

All of the above

What is the purpose of examining missing data before imputation?

To understand the patterns of missing data

What does the aggr() function in the VIM package in R show?

All of the above

What is the implication of mean positive affect being a cause of missingness?

Mean positive affect is MNAR

What is the purpose of pooling the results from the analyses run on each imputed dataset?

To generate an overall estimate with some estimate of uncertainty

What is the consequence of having small sample sizes when using imputed datasets with general linear models?

Increased uncertainty due to sampling variation

Why is it important to examine the patterns of missing data?

To develop an effective strategy for addressing missing data

What is the primary advantage of using linear mixed models over repeated measures ANOVA?

They can handle data with continuous time points

What is the purpose of a margin plot in identifying patterns of missing data?

To show the values of one variable when the other variable is missing

What is the assumption of linear regression regarding observations?

Observations are independent of each other

What is the difference between fixed effects and random effects in linear mixed models?

Fixed effects assume identical coefficients for each participant, while random effects allow for different coefficients per participant

What is the purpose of using linear mixed models instead of traditional linear regression?

To relax the assumption of independence

What is the characteristic of data that is clustered within a higher-order unit?

The data is clustered within a higher-order unit

What is the purpose of examining the distribution of stress when negative affect is missing in a margin plot?

To compare the distribution of stress when negative affect is missing or not

What is the difference between fixed effects and random intercepts?

Fixed effects give every participant the same intercept, while random intercepts allow each participant to have their own intercept

What is the purpose of using linear mixed models in repeated measures data?

To relax the assumption of independence and account for clustering within participants

What is the characteristic of data that is repeated measures data?

The data is collected at discrete time points

What is the main purpose of calculating the Mahalanobis distance in a multivariate normal distribution?

To identify multivariate outliers

In a linear mixed model, what does the subscript 'j' indicate?

Between-person variance

What is the purpose of the likelihood ratio test (LRT)?

To compare the fit of two nested models

What is the consequence of not including a random intercept in a linear mixed model?

The model will not account for between-person variance

What is the difference between marginal and conditional effects in a linear mixed model?

Marginal effects include only fixed effects, while conditional effects include both fixed and random effects

What is the solution to convergence warnings in a linear mixed model?

All of the above

What is the purpose of including random slopes in a linear mixed model?

To account for within-person variance

What is the characteristic of a nested model?

One model is a restricted or constrained version of the other

What is the consequence of having a singularity warning in a linear mixed model?

The model is over-parameterized

What is the purpose of using the chi-squared distribution to evaluate the Mahalanobis distance?

To test for multivariate normality
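Mahalanobis distance measures how far a point lies from the mean after accounting for the covariance between variables. Under multivariate normality, squared distances approximately follow a chi-squared distribution with degrees of freedom equal to the number of variables. A Python sketch for two variables, where the chi-squared upper tail has the closed form exp(−x/2); the helper names, mean and covariance values are illustrative assumptions.

```python
from math import exp


def mahalanobis_sq_2d(point, mean, cov):
    # squared Mahalanobis distance d^2 = (x - mu)^T S^-1 (x - mu) for two variables
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


def chi2_sf_df2(x):
    # upper-tail probability of a chi-squared distribution with 2 df
    return exp(-x / 2)


# Two strongly correlated standardized variables (hypothetical)
mu = (0.0, 0.0)
sigma = [[1.0, 0.8], [0.8, 1.0]]

d_typical = mahalanobis_sq_2d((1, 1), mu, sigma)   # follows the correlation
d_unusual = mahalanobis_sq_2d((1, -1), mu, sigma)  # goes against it
print(d_typical, chi2_sf_df2(d_typical))
print(d_unusual, chi2_sf_df2(d_unusual))
```

Both points are the same Euclidean distance from the mean, but the one that breaks the correlation pattern is much further away in Mahalanobis terms, which is why it flags multivariate outliers that univariate checks miss.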

Study Notes

Data Types and Operators

Data types in R:
- Logical: used for logical data, i.e., TRUE or FALSE
- Integer: used for whole numbers, e.g., 0, 1, 2
- Numeric: used for real numbers, e.g., 1.1, 4.8; can also be used for integer data, but it's a less efficient format
- Factor: a special representation of numeric data when the data are fundamentally discrete (categorical); values are stored as integer codes, each with a level label
- Character: used for text type data, e.g., names, qualitative data
Operators:
- Logical operators: used to compare values and return TRUE or FALSE
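- Examples of comparison operators: == (equal to), != (not equal to), <, >, <=, >=, and %in% (value is contained in a set); %!in% (not in a set) is not part of base R and must be defined or loaded from a package
- Boolean values: TRUE or FALSE; in arithmetic, TRUE is treated as 1 and FALSE as 0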
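Because TRUE counts as 1 and FALSE as 0, a comparison can be summed to count the matching values or averaged to get their proportion. Python follows the same convention (True == 1), so a short sketch with made-up stress scores illustrates it; in R the equivalents would be sum(stress > 8) and mean(stress > 8).

```python
stress = [2, 9, 4, 12, 7]

# a comparison returns one Boolean per value, like stress > 8 in R
high = [s > 8 for s in stress]

# True counts as 1 and False as 0, so summing counts the matches
n_high = sum(high)
# dividing by the number of values gives the proportion meeting the condition
prop_high = sum(high) / len(high)

# the same Booleans can subset the values that meet the condition
high_values = [s for s, keep in zip(stress, high) if keep]
```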

Data Management

Subsetting data:
- A common task in analyses, e.g., excluding outliers, selecting specific participants
- Order of subsetting can matter
Merging data:
- Rules: one join at a time, x dataset on the left, y dataset on the right
- Types of joins: natural, full outer, left outer, right outer
Reshaping data:
- Necessary for repeated measures/longitudinal/panel data
- Wide format: each measure has a separate variable for each time point
- Long format: time point is a variable, IDs have multiple rows
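Reshaping moves the time information between column names (wide) and a column of its own (long). A minimal Python sketch, assuming wide columns are named like `stress_1`, `stress_2`; the helpers and the naming convention are hypothetical.

```python
def wide_to_long(rows, id_col, measure, times):
    """Hypothetical helper: stack stress_1, stress_2, ... into one 'stress'
    column plus a 'time' column, giving one row per ID per time point."""
    long_rows = []
    for row in rows:
        for t in times:
            long_rows.append({id_col: row[id_col], "time": t,
                              measure: row[f"{measure}_{t}"]})
    return long_rows


def long_to_wide(rows, id_col, measure):
    """Reverse direction: one row per ID, with a separate column per time point."""
    wide = {}
    for row in rows:
        entry = wide.setdefault(row[id_col], {id_col: row[id_col]})
        entry[f"{measure}_{row['time']}"] = row[measure]
    return list(wide.values())


wide = [
    {"id": 1, "stress_1": 3, "stress_2": 5},
    {"id": 2, "stress_1": 6, "stress_2": 4},
]
long = wide_to_long(wide, "id", "stress", [1, 2])
```

In the long result each ID occupies multiple rows, which is the layout repeated measures models expect.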

Scoring Questionnaire Scales

Two ways to score questionnaire scales:
- Add items together to get a sum total score
- Calculate an average of all items
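Using the rowMeans() function:
- With na.rm = TRUE, missing items are skipped and each person's mean is based on the items they answered
- Multiply the results by the number of items to get a total score on the same scale as a sum score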
psych::alpha() function: used to calculate Cronbach's alpha, a measure of scale reliability
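The difference between the two scoring approaches appears as soon as one item is missing. A minimal Python sketch with hypothetical helper names, using None in place of R's NA; the prorated version mirrors rowMeans(items, na.rm = TRUE) multiplied by the number of items.

```python
def sum_score(responses):
    # literal sum: any missing item (None) makes the whole score missing, like NA
    return None if any(r is None for r in responses) else sum(responses)


def prorated_score(responses, n_items):
    # average the answered items, then rescale to the metric of a full sum score
    answered = [r for r in responses if r is not None]
    if not answered:
        return None  # nothing answered (rowMeans gives NaN in this case)
    return sum(answered) / len(answered) * n_items


complete = [3, 4, 4, 5]
partial = [3, 4, None, 5]

print(sum_score(complete), prorated_score(complete, 4))  # both scores agree
print(sum_score(partial), prorated_score(partial, 4))    # only the prorated score survives
```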

Data Visualization

Types of plots:
- Bivariate plots: show the relationship between two variables
- Univariate plots: show the distribution of a single variable
- Violin plots: used to compare the distribution of data between groups
- Histograms: show the distribution of a single variable
- Density plots: show the distribution of a single variable
- Dot plots: show the distribution of a single variable
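- QQ plots: used to compare two sets of quantiles (e.g., a variable's quantiles against those of a normal distribution) to check whether they come from the same distribution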
Best practices:
- Aim for a high data-to-ink ratio
- Use themes to achieve this
- Labelling axes with quantiles can provide more data to the reader
- Shapes can be used to quickly identify categorical variables

Linear Regression

Simple linear regression:
- Equation: yi = b0 + b1 * xi + εi
- Parameters: b0 (intercept), b1 (slope), εi (residual)
Multiple linear regression:
- Equation: yi = b0 + b1 * x1i + ... + bk * xki + εi
- Parameters: b0 (intercept), b1, ..., bk (slopes), εi (residual)
Line of best fit: the regression line that minimizes the sum of squared residuals
Residuals: the difference between the observed and predicted values
Interpretation of R output:
- Estimate: the estimated regression coefficients
- Std. Error: the standard error of each coefficient
- t value: the estimate divided by its standard error
- Pr(>|t|): the p-value for each coefficient
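For a single predictor, the line of best fit and the statistics in the summary can be computed by hand, which also shows why F = t² when there is one predictor. A Python sketch of ordinary least squares on made-up data; in R, lm() and summary() do all of this.

```python
from math import sqrt


def simple_ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx      # slope that minimizes the sum of squared residuals
    b0 = my - b1 * mx   # intercept: the line passes through (mean x, mean y)
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sigma = sqrt(sse / (n - 2))       # residual standard error
    se_b1 = sigma / sqrt(sxx)         # standard error of the slope
    t = b1 / se_b1                    # t value = estimate / standard error
    ssr = sum((b0 + b1 * a - my) ** 2 for a in x)  # variation explained by the line
    f = ssr / (sse / (n - 2))         # F statistic with 1 and n - 2 df
    return b0, b1, residuals, t, f


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, residuals, t, f = simple_ols(x, y)
print(b0, b1, t ** 2, f)  # with one predictor, F equals t squared
```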

Generalized Linear Models (GLMs)

GLMs: extend linear regression to different outcomes
Examples of GLMs:
- Linear regression: continuous, normally distributed variables
- Logistic regression: binary 0/1 variables
- Poisson regression: count variables
Link function: connects the expected value of the outcome to the linear predictor (η = g(μ)), so the model can stay linear on the link scale
Inverse link function: transforms the linear predicted value back to the original scale of the outcome

Poisson Regression

Poisson regression: used for count variables
Assumptions:
- Poisson distribution
- Mean and variance are equal
- Linear relationship on the link scale (ln)
- No need to worry about normally distributed errors or equal variance
Link function: η = ln(λ)
Incidence rate ratios (IRRs): the ratio of the expected counts for a one-unit change in the predictor
How to do Poisson regression in R:
- Use the glm() function with the family = poisson argument

Interpreting IRRs in Poisson Regression

IRRs are interpreted as a multiplicative change in the outcome for each one unit change in the predictor score.
An IRR of 1 means no change in the outcome, equivalent to a coefficient of 0 on the link (log) scale.
To interpret Poisson regression outcomes, coefficients need to be exponentiated to take them out of log space.
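A Python sketch of what exponentiating does, using hypothetical coefficients on the log scale (they are not output from any real model): the IRR exp(b1) is exactly the ratio of expected counts one unit apart, wherever on the predictor scale you look.

```python
from math import exp, log

# Hypothetical coefficients on the link (log) scale
b0 = log(1.5)  # intercept: expected count of 1.5 when the predictor is 0
b1 = log(2)    # slope on the log scale


def expected_count(x):
    # inverse link for Poisson regression: lambda = exp(eta) = exp(b0 + b1 * x)
    return exp(b0 + b1 * x)


irr = exp(b1)  # exponentiate to take the slope out of log space

# a one-unit increase multiplies the expected count by the IRR
print(irr, expected_count(3) / expected_count(2))
```

A coefficient of 0 on the log scale gives exp(0) = 1, the "no change" IRR.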

Binary Logistic Regression

Binary logistic regression is used for outcomes with only two values: 0 or 1.
It is useful for questions such as predicting disease occurrence, treatment outcomes, or probability of events.
Linear regression is not suitable for binary outcomes because:
- Straight lines can predict impossible values.
- Binary variables or residuals do not follow a normal distribution.

GLM Solutions

Link functions transform linear predicted values to ensure they never go below 0 or above 1.
The Bernoulli distribution is used instead of the normal distribution, with a single parameter: the average probability of an event occurring (p or μ).

Logistic Regression

The link function is defined as η = g(μ) = ln(μ / (1 − μ)), known as the logit function.
The probability that the outcome will be 1 is denoted as μ, ranging from 0 to 1.
Assumptions of logistic regression include:
- Bernoulli distribution of the outcome.
- Linear relationship on the link scale.
- Independent variables and errors.
- No outliers or separation.
- Large sample size.

Performing Logistic Regression in R

Use the glm() function with the 'family = binomial' argument.

Odds Ratio and Marginal Effect

The odds ratio indicates the factor by which the odds of the outcome occurring are multiplied for a one-unit change in the predictor.
An odds ratio > 1 indicates a positive relationship, while < 1 indicates a negative relationship.
The marginal effect is the instantaneous effect of a change in the predictor at a particular point, equivalent to the slope of the straight line that touches (is tangent to) the probability curve at that point.
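A Python sketch with hypothetical log-odds coefficients shows the distinction: the odds ratio exp(b1) is the same at every value of the predictor, while the change in probability, and the marginal effect b1 × p × (1 − p), depends on where on the curve you are.

```python
from math import exp, log


def logit(mu):
    # link: log odds, eta = ln(mu / (1 - mu))
    return log(mu / (1 - mu))


def inv_logit(eta):
    # inverse link: maps any eta back to a probability between 0 and 1
    return 1 / (1 + exp(-eta))


# Hypothetical coefficients on the log-odds scale
b0, b1 = -1.0, 0.5


def prob(x):
    return inv_logit(b0 + b1 * x)


def odds(p):
    return p / (1 - p)


odds_ratio = exp(b1)


def marginal_effect(x):
    # slope of the probability curve at x: b1 * p * (1 - p)
    p = prob(x)
    return b1 * p * (1 - p)


# the odds ratio is constant, but the change in probability is not
print(odds(prob(1)) / odds(prob(0)), odds(prob(3)) / odds(prob(2)))
print(prob(1) - prob(0), prob(3) - prob(2))
```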

Missing Data

Missing data are common but problematic, leading to biased results and loss of efficiency
Types of missing data:
- Missing Completely at Random (MCAR): missingness is independent of observed and unobserved data
- Missing at Random (MAR): missingness depends on observed data
- Not Missing at Random (NMAR, also called MNAR): missingness depends on unobserved data
Consequences:
- Listwise deletion leads to inefficiency and biased results unless data are MCAR
- Multiple imputation can recover unbiased estimates for MAR data
- For NMAR data, unbiased estimates cannot be recovered from the observed data alone

Multiple Imputation

A robust approach to address missing data
Steps:
1. Start with incomplete data
2. Generate multiple datasets with imputed values
3. Analyze each dataset
4. Pool results to estimate parameters and uncertainty
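Formula for total uncertainty: T = V̄ + B + B/m, where V̄ is the average within-imputation variance, B is the between-imputation variance, and m is the number of imputed datasets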
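The pooling step can be sketched in Python: average the estimates, average their squared standard errors (V̄), take the variance of the estimates across datasets (B, with an m − 1 denominator), and combine them with the formula above. The estimates and variances are made up, and the helper `pool` is hypothetical.

```python
from statistics import mean, variance


def pool(estimates, variances):
    """Hypothetical helper that pools results from m imputed datasets.
    estimates: the parameter estimate from each dataset
    variances: the squared standard error from each dataset"""
    m = len(estimates)
    q_bar = mean(estimates)    # pooled estimate: average across datasets
    v_bar = mean(variances)    # average within-imputation variance
    b = variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = v_bar + b + b / m      # total uncertainty
    return q_bar, t


# Three hypothetical imputed-dataset results for one regression coefficient
q_bar, t = pool([0.50, 0.55, 0.45], [0.010, 0.012, 0.011])
print(q_bar, t, t ** 0.5)  # pooled estimate, total variance, pooled standard error
```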

Examining Missing Data

Use the VIM package in R to explore missing data
Functions: aggr(), marginplot()
Goals:
- Identify patterns of missing data
- Check for overlap between variables
- Identify potential issues with data

Clustered Data

Data are clustered when observations are not independent
Examples:
- Repeated measures data (longitudinal studies)
- Grouped data (people within families, schools, companies)
Statistical methods for clustered data:
- Linear mixed models
- Repeated measures ANOVA (limited to discrete time points, equal number of time points, and normal distribution)

Linear Mixed Models

Relax the assumption of independence in linear regression
Types of effects:
- Fixed effects: slope and intercept are identical for everyone
- Random effects: slopes and intercepts vary randomly for each participant
Benefits:
- Handles clustered data
- Allows for varying slopes and intercepts
- Can handle continuous time and missing data

Intraclass Correlation Coefficient (ICC)

Measures the ratio of between-person variance to total variance
Interpretation:
- 0: all individual means are identical
- 1: all values are identical within individuals and vary between individuals
Example: ICC of 0.25 means 25% of variance is between people and 75% is within individuals
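For a random-intercept model, the ICC can be computed from the two standard deviations reported under the random effects (the ID intercept SD and the residual SD) by squaring them into variances. A Python sketch with made-up values chosen to reproduce the 0.25 example; the function name is illustrative.

```python
from math import sqrt


def icc_from_sds(id_sd, residual_sd):
    # ICC = between-person variance / (between-person + within-person variance)
    between = id_sd ** 2
    within = residual_sd ** 2
    return between / (between + within)


# Between-person variance 1 and within-person variance 3
print(icc_from_sds(1.0, sqrt(3)))  # about 0.25
```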

Linear Mixed Model Assumptions

Normal distribution of individual intercepts
Constant variance
Independent and identically distributed residuals

Interpreting R Output

Random effects: the ID (Intercept) Std.Dev. is the average difference between an individual's average and the population average
Residual Std.Dev. is the average difference between an individual score and the predicted score
Fixed effects: the (Intercept) Estimate is the fixed effect of the intercept

Model Comparison

Model comparison first involves checking that all models use the same observations
LRTs (Likelihood Ratio Tests) are used to compare nested models (m0 vs m1, m0 vs m2, ..., m0 vs malt)
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are used to evaluate all models
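Given each model's log likelihood and number of parameters, the comparisons reduce to a few formulas. A Python sketch with hypothetical log likelihoods for two nested models fitted to the same 100 observations; for models differing by one parameter, the chi-squared (1 df) tail probability equals erfc(√(stat/2)).

```python
from math import erfc, log, sqrt


def aic(loglik, k):
    # AIC = 2k - 2 logL (lower is better)
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    # BIC = k ln(n) - 2 logL; its penalty exceeds AIC's once n > 7
    return k * log(n) - 2 * loglik


def lrt_1df(loglik_small, loglik_big):
    # likelihood ratio statistic for nested models differing by one parameter
    stat = 2 * (loglik_big - loglik_small)
    # upper tail of a chi-squared distribution with 1 df
    p = erfc(sqrt(stat / 2))
    return stat, p


# Hypothetical fits: m0 with 3 parameters, m1 adds one predictor (same 100 rows)
n = 100
ll_m0, ll_m1 = -120.0, -117.5
print(aic(ll_m0, 3), aic(ll_m1, 4))
print(bic(ll_m0, 3, n), bic(ll_m1, 4, n))
print(lrt_1df(ll_m0, ll_m1))
```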

Qualitative Research

Ontology and Epistemology

Ontology: assumptions about the nature of the world and the phenomenon within it
Epistemology: theory of knowledge, concerned with the mind's relation to reality
Importance of ontology and epistemology: shape how we know what is true and judge competing truth claims

Qualitative Research Philosophy

Critical Realism: fixed reality, interpreted differently by individuals
Constructionism: no fixed social reality, meaning is given through individual experiences
Two schools of thought:
- Critical Realism → thematic analysis (methodology)
- Phenomenology → interpretative phenomenological analysis

Qualitative Research Methodology

Qualitative sampling: non-probability sampling, sampling based on competence rather than representativeness
Alternative to reliability and validity: rigour and trustworthiness
Four elements of trustworthiness:
- Credibility: confidence in the accuracy of findings
- Transferability: applicability of findings in other contexts
- Dependability: consistency and replicability of findings
- Confirmability: neutrality of findings, free from researcher bias

Reflexive Thematic Analysis

Method for developing, analysing, and interpreting patterns in qualitative data
Involves systematic processes of data coding to develop themes
Steps in reflexive thematic analysis:
1. Familiarising with the dataset
2. Coding
3. Generating initial themes
4. Developing and reviewing themes
5. Refining, defining, and naming themes
6. Writing up


What is a characteristic of wide data?

What is the advantage of using the rowMeans() function?

What is the disadvantage of literally adding items together to get a total score?

What is the recommended approach to scoring questionnaire scales?

What is the purpose of the psych::alpha() function?

What is the result of a natural join?

What is the characteristic of long data?

What is the purpose of the 'Call' section in the output of a linear regression model?

In the 'Coefficients' section of a linear regression output, what does the 'Estimate' column represent?

What is the purpose of the link function in general linear models (GLMs)?

What is the assumption of linear regression referred to as 'L.I.N.E.'?

What is the purpose of the density plot of residuals in assessing model diagnostics?

What type of regression is used when the outcome variable is a count variable?

What is the purpose of the 'Residuals' section in the output of a linear regression model?

What is the relationship between the F-statistic and the t-statistic in a linear regression model with one predictor?

What is the purpose of the QQ plot of residuals in assessing model diagnostics?

What is the definition of homoscedasticity in linear regression?

What does the subscript 𝑖 in the equation yi=b0+b1∗xi+εi indicate?

What is the purpose of squaring the residuals in linear regression?

What is the main difference between simple linear regression and multiple linear regression?

How do you interpret the regression coefficient 𝑏1 in multiple linear regression?

What is the generalized linear model (GLM) an extension of?

What is the purpose of the lm() function in R?

What is the primary output of the summary() function when used with a linear model object in R?

What is the normal distribution also known as?

What are the parameters of a normal distribution?

What is a probability distribution?

What is the primary advantage of using a Generalized Linear Model (GLM) over traditional linear regression for binary outcomes?

What is the link function used in logistic regression?

What is the primary assumption of the Bernoulli distribution in logistic regression?

What is the interpretation of an odds ratio greater than 1 in logistic regression?

What is the purpose of checking for separation in logistic regression?

What is the advantage of using logistic regression over traditional linear regression for predicting binary outcomes?

What is the relationship between the odds ratio and the probability of the outcome occurring?

What is the purpose of the marginal effect in logistic regression?

What is the primary assumption of independent errors in logistic regression?

What is the requirement for the sample size in logistic regression?

What is a consequence of missing data?

What type of missing data is considered to be unbiased?

What is the name of the method where only complete cases are analyzed?

What is the condition required for recovering unbiased estimates in MAR data?

What is the characteristic of data that are missing not at random?

What is the consequence of analyzing only complete cases in MAR data?

What is the assumption required for unbiased estimates in listwise deletion?

What is the name of the approach that involves analyzing only complete cases?