Questions and Answers
Which command is used to remove all objects from the R workspace?
The function mean() can be used to calculate the median of a set of numbers.
False
What is the syntax to create a vector called 'y' containing the values 1, 5, 4, and 8?
y=c(1,5,4,8)
The function used to load a saved workspace in R is called _____
Match the following R commands with their functions:
Which function would you use to check for missing values in a dataset?
The function na.omit() removes rows that contain any missing values.
What is the main purpose of the complete.cases() function?
To create a dataframe in R using vectors for age, gender, and weight, you would use the function data.frame() with the vectors named as ___.
Match the following R functions with their purposes:
What command is used to install the Ecdat package in R?
The command library('datasets') is used to load the datasets package into the R workspace.
How can you access the weight of the first individual in the 'data' dataframe created from age, gender, and weight vectors?
What happens when you execute 'mean(y, na.rm = TRUE)' in R?
A dataframe in R can be created by importing text or Excel files.
What function is used to import a text file into R?
The command to display the first six rows of a dataframe is ____.
Match the following dataframe functions with their purposes:
When importing a dataframe from an Excel file, what is the first step you should take in RStudio?
The function 'merge()' is used to sort a dataframe according to a variable.
What would be the output of 'class(nobel)' after importing the dataframe?
To create a vector in R containing the values 29020, 32500, and 40320, one could use the command ____.
Which of the following statements regarding dataframes is false?
Study Notes
Data Analysis using R
- R is a programming language and software environment for statistical computing and graphics
- RStudio is a desktop environment for R
- Data analysis in R involves creating, importing, and using a dataset in R
- The workspace in RStudio holds objects such as variables, vectors, matrices, lists, and dataframes
- To show the objects in the workspace, use `ls()`.
- To remove all objects from the workspace, use `rm(list=ls(all=TRUE))`.
- To check the current working folder, use `getwd()`; to select the folder where the workspace will be saved, use `setwd("C:/name/name/name")`.
- To save the workspace to a file, use `save.image("name.Rdata")`.
- To load the saved workspace, use `load("name.Rdata")`.
- Variables are assigned values using the equal sign (`=`), for example `x = 1`.
- `class(x)` shows the object type.
- A vector is created using `y=c(1,5,4,8)`.
- The command `class(y)` displays the object type.
- Matrices can be created from vectors using functions like `rbind()` and `cbind()`, for example `m = rbind(y, z)`.
- R functions are used for importing and exporting data, performing operations, and generating graphs.
- Functions generally use the form `function_name(par1=value1, par2=value2, …)`.
- `?mean` displays the help for the `mean` function.
- The `is.na()` function indicates missing values; missing data is denoted `NA` (Not Available) in R.
- `mean(y, na.rm = TRUE)` ignores NAs when calculating the mean.
- A dataframe is a table with observations in rows and variables in columns.
- Dataframes can be created in R by importing from a text or Excel file.
- The `read.table` function imports text files; `header = TRUE` indicates that the first row contains column names.
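The notes above can be tried out with a short script. This is only a minimal sketch: the vector `z`, the vector `y_na`, and the small text file are hypothetical additions for illustration, and temporary paths are used in place of `"C:/name/name/name"` so the script runs on any machine.

```r
# Vectors and their type
y = c(1, 5, 4, 8)
z = c(2, 6, 3, 7)          # hypothetical second vector, not from the notes
class(y)                   # "numeric"

# Stack the two vectors as the rows of a matrix
m = rbind(y, z)
dim(m)                     # 2 rows, 4 columns

# Missing values: NA propagates through mean() unless na.rm = TRUE
y_na = c(1, 5, NA, 8)      # hypothetical vector with one missing value
is.na(y_na)                # FALSE FALSE TRUE FALSE
mean(y_na)                 # NA
mean(y_na, na.rm = TRUE)   # mean of 1, 5 and 8

# Save the workspace to a file and load it back (temporary folder assumed)
ws_file = file.path(tempdir(), "name.Rdata")
save.image(ws_file)
load(ws_file)

# Import a small text file whose first row holds the column names
txt_file = tempfile(fileext = ".txt")
writeLines(c("age weight", "25 60", "30 72"), txt_file)
imported = read.table(txt_file, header = TRUE)
class(imported)            # "data.frame"
```

`rm(list=ls(all=TRUE))` is left out of the script on purpose, because it would delete every object just created.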
Dataframe Basic Functions
- `dim()` shows the number of observations and variables.
- `rownames()` shows the names of the rows.
- `colnames()` shows the names of the columns.
- `head()` displays the first six rows of a dataframe.
- `tail()` displays the last six rows of a dataframe.
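As a quick illustration of these functions, the sketch below applies them to `mtcars`, a dataframe from R's built-in `datasets` package. Using `mtcars` here is an assumption made for the example; any dataframe would do.

```r
# mtcars is available without any import in a standard R session
size = dim(mtcars)           # number of observations, then number of variables
row_labels = rownames(mtcars)
col_labels = colnames(mtcars)

first_rows = head(mtcars)    # first six rows by default
last_rows = tail(mtcars)     # last six rows by default

# Both functions accept n to change how many rows are shown
first_three = head(mtcars, n = 3)
```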
Dataframe Advanced Functions
- `subset()` selects sections of a dataframe: `subset(dataframe, subset = logical_expression, select = list_of_variables)`.
- `merge(df1, df2, by = list_of_variables, by.x = , by.y = , …)` joins two dataframes.
- `df[order(df$nom_var),]` sorts the dataframe based on a column named "nom_var".
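The three operations can be seen together on two small dataframes. The dataframes `people` and `weights` and all their values are invented for this sketch.

```r
# Two hypothetical dataframes sharing an "id" column
people = data.frame(id = c(1, 2, 3, 4),
                    age = c(34, 27, 45, 19))
weights = data.frame(id = c(2, 3, 4, 5),
                     weight = c(70, 82, 55, 64))

# subset(): keep rows where age > 25, and only the listed columns
over_25 = subset(people, subset = age > 25, select = c(id, age))

# merge(): join on the common column; by default only ids present
# in both dataframes are kept (here 2, 3 and 4)
joined = merge(people, weights, by = "id")

# order(): sort the joined dataframe by age, ascending
sorted = joined[order(joined$age), ]
```

With its default settings `merge()` drops ids that appear in only one dataframe; the `all`, `all.x`, and `all.y` arguments change that behaviour.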
Exercise: Age and Creativity of Researchers
- Calculate the age at which scientists are highly creative using the `year_research_mid` and `year_birth` data.
- Calculate the average age using the `mean()` function.
- The dataframe includes Nobel prize winners' characteristics, including the year when the research was conducted and the year of birth.
Exercise 2: Age and Creativity of Researchers
- Calculate the average age at which scientists did their research before 1905.
- Calculate the average age at which scientists did their research after 1985.
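A sketch of the two period averages, assuming the research year is held in `year_research_mid`. The rows are invented placeholders, and `age_research`, `early`, and `late` are hypothetical names.

```r
# Toy stand-in for the nobel dataframe (placeholder values only)
nobel = data.frame(
  year_birth        = c(1860, 1870, 1910, 1950, 1955),
  year_research_mid = c(1895, 1900, 1950, 1990, 2000)
)
nobel$age_research = nobel$year_research_mid - nobel$year_birth

# Keep only the rows for each period, then average the age column
early = subset(nobel, subset = year_research_mid < 1905)
late  = subset(nobel, subset = year_research_mid > 1985)

mean_early = mean(early$age_research)
mean_late  = mean(late$age_research)
```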
Exercise 3: Lifespan Calculation
- Calculate the lifespan of each researcher as the difference between their death year and birth year (`nobel$year_death - nobel$year_birth`).
- Calculate the average lifespan using the `mean()` function on the new lifespan column.
Managing Missing Values
- Calculating the average lifespan is a problem when values are missing: `mean()` returns `NA` if any value is `NA`.
- `is.na()` identifies missing values.
- `complete.cases()` flags the rows that have no missing values, so that they can be selected.
- `na.omit()` removes rows with any missing values.
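The sketch below shows how these functions behave on a lifespan column with gaps. The rows are invented placeholders, with `NA` standing in for researchers whose year of death is not recorded.

```r
# Toy stand-in for the nobel dataframe (placeholder values only)
nobel = data.frame(
  year_birth = c(1850, 1900, 1930, 1950),
  year_death = c(1920, 1980, NA, NA)
)
nobel$lifespan = nobel$year_death - nobel$year_birth

mean(nobel$lifespan)                  # NA, because two lifespans are missing
missing = is.na(nobel$lifespan)       # TRUE where the lifespan is missing

# Option 1: ignore the missing values inside mean()
average_lifespan = mean(nobel$lifespan, na.rm = TRUE)

# Option 2: keep only the rows with no missing values
complete_rows = nobel[complete.cases(nobel), ]

# Option 3: drop every row that contains an NA
cleaned = na.omit(nobel)
```

Options 2 and 3 keep the same rows here; `na.rm = TRUE` is the lighter choice when only one summary statistic is needed.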
2) Dataframe Generation
- Create a dataframe from sample data including age, gender, and weight of 5 individuals
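A minimal sketch of this step, with invented values for the five individuals. The vector names `age`, `gender`, and `weight` become the column names of the dataframe.

```r
# Hypothetical sample data for 5 individuals
age    = c(25, 32, 47, 51, 19)
gender = c("F", "M", "F", "M", "F")
weight = c(58, 80, 65, 90, 54)

# Each vector becomes one column, each position one row
data = data.frame(age, gender, weight)

# Three equivalent ways to read the weight of the first individual
w1 = data$weight[1]
w2 = data[1, "weight"]
w3 = data[1, 3]
```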
3) Dataframes in R or Database Packages
- Various data packages already existing in R
Example 1: Motor Trend US Magazine
- Data on 32 car models (1973–74)
- A table of their data
Example 2: Weight and Food of Chickens in a Chicken Coop
- Data on the weight and feed of chickens
Example 3: Earthquake Magnitude in Fiji
- Data showing the distribution of earthquake magnitudes near Fiji since 1964
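The three examples appear to correspond to the `mtcars`, `ChickWeight`, and `quakes` dataframes in R's built-in `datasets` package. That mapping is an assumption, since the notes do not name the objects. A short sketch for loading and inspecting them:

```r
# The datasets package ships with R and is normally attached already
library(datasets)

# Packages that are not bundled with R must be installed once first, e.g.:
# install.packages("Ecdat")   # not run here: needs network access

cars_size   = dim(mtcars)       # car models and their measurements
chick_size  = dim(ChickWeight)  # chick weights over time by diet
quakes_size = dim(quakes)       # seismic events near Fiji

head(quakes)                    # first six rows, including the mag column
```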
New Tool: Google Dataset Search
- Online tool for searching data sets
Description
This quiz explores the fundamentals of data analysis using R programming and RStudio. Participants will review concepts such as workspace management, data handling, and object manipulation. Test your knowledge on how to effectively create and manage datasets in R.