Data Analysis using R
23 Questions

Questions and Answers

Which command is used to remove all objects from the R workspace?

  • rm(list=ls(all=TRUE)) (correct)
  • remove.objects()
  • rm(list=all)
  • clear()

    The function mean() can be used to calculate the median of a set of numbers.

    False

    What is the syntax to create a vector called 'y' containing the values 1, 5, 4, and 8?

    y=c(1,5,4,8)

    The function used to load a saved workspace in R is called _____

    load()

    Match the following R commands with their functions:

    • getwd() = Get the working directory
    • setwd() = Set the working directory
    • ls() = List objects in the workspace
    • save.image() = Save the current workspace

    Which function would you use to check for missing values in a dataset?

    is.na()

    The function na.omit() removes rows that contain any missing values.

    True

    What is the main purpose of the complete.cases() function?

    To select rows with no missing values.

    To create a dataframe in R using vectors for age, gender, and weight, you would use the function data.frame() with the vectors named as ___.

    a, g, w

    Match the following R functions with their purposes:

    • is.na() = Checks for missing values
    • na.omit() = Removes individuals with missing values
    • subset() = Selects specific rows or columns from a dataframe
    • data.frame() = Creates a dataframe from vectors

    What command is used to install the Ecdat package in R?

    install.packages('Ecdat')

    The command library('datasets') is used to load the datasets package into the R workspace.

    True

    How can you access the weight of the first individual in the 'data' dataframe created from age, gender, and weight vectors?

    data[1,3]

    What happens when you execute 'mean(y, na.rm = TRUE)' in R?

    It calculates the mean excluding NA values.

    A dataframe in R can be created by importing text or Excel files.

    True

    What function is used to import a text file into R?

    read.table()

    The command to display the first six rows of a dataframe is ____.

    head()

    Match the following dataframe functions with their purposes:

    • dim() = Shows the number of observations and variables
    • rownames() = Displays the names of the rows
    • colnames() = Displays the names of the columns
    • tail() = Shows the last six lines

    When importing a dataframe from an Excel file, what is the first step you should take in RStudio?

    Select the folder where the Excel file is located.

    The function 'merge()' is used to sort a dataframe according to a variable.

    False

    What would be the output of 'class(nobel)' after importing the dataframe?

    data.frame

    To create a vector in R containing the values 29020, 32500, and 40320, one could use the command ____.

    c(29020, 32500, 40320)

    Which of the following statements regarding dataframes is false?

    You can only create dataframes by importing files.

    Study Notes

    Data Analysis using R

    • R is a programming language and software environment for statistical computing and graphics
    • RStudio is a desktop environment for R
    • Data analysis in R involves creating, importing, and using datasets
    • The workspace in RStudio holds objects such as variables, vectors, matrices, lists, and dataframes
    • To show objects in a workspace, use ls().
    • To remove all objects in a workspace use rm(list=ls(all=TRUE))
    • To choose the folder where the workspace is saved, check the current working directory with getwd() and change it with setwd("C:/name/name/name")
    • To save the workspace to a file use save.image("name.Rdata")
    • To load the saved workspace use load("name.Rdata")
    • Variables are assigned values with the equal sign (=), for example x = 1; the arrow operator (x <- 1) does the same thing here and is the more common R idiom
    • class(x) shows the object type.
    • A vector is created using y=c(1,5,4,8).
    • The command class(y) displays the object type
    • Matrices can be created from vectors using functions like rbind() and cbind(), for example m = rbind(y, z), where z is a second vector of the same length as y
    • R functions are used for importing and exporting data, performing operations, and generating graphs
    • Functions generally use the form function_name(par1=value1, par2=value2, …)
    • Using ?mean displays the help for the mean function
    • Missing data is denoted NA (Not Available) in R; the is.na() function returns TRUE for each missing value
    • mean(y, na.rm = TRUE) ignores NA values when calculating the mean; without na.rm = TRUE the result is NA if any value is missing
    • A dataframe is a table with observations in rows and variables in columns
    • Dataframes can be created in R by importing from a text or Excel file
    • The read.table() function imports text files; header = TRUE indicates that the first row contains column names
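The vector, matrix, missing-value, and import notes above can be tried in a short R session. This is only a minimal sketch: the vector z, the vector y_na, and the inline two-row table are made up for illustration, and the text= argument stands in for a real file path.

```r
# Create the vector from the notes and inspect its type
y <- c(1, 5, 4, 8)
class(y)                      # "numeric"

# z is a hypothetical second vector of the same length as y
z <- c(2, 6, 3, 7)
m <- rbind(y, z)              # matrix with one row per vector
dim(m)

# Missing values: NA propagates unless na.rm = TRUE is set
y_na <- c(1, 5, NA, 8)        # hypothetical vector with one missing value
is.na(y_na)                   # FALSE FALSE TRUE FALSE
mean(y_na)                    # NA
mean_no_na <- mean(y_na, na.rm = TRUE)   # (1 + 5 + 8) / 3

# read.table() normally takes a file path; text= is used here only so
# the example runs without an external file
df <- read.table(text = "name score
anna 12
ben 15", header = TRUE)
colnames(df)
```

With header = TRUE the first line of the input becomes the column names rather than a data row.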

    Dataframe Basic Functions

    • dim() shows the number of observations (rows) and variables (columns).
    • rownames() shows names of rows.
    • colnames() shows names of columns.
    • head() displays the first six rows of a dataframe by default.
    • tail() displays the last six rows of a dataframe by default.
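As a quick illustration of these inspection functions, here is a sketch on a small made-up dataframe (the names df, id, and value are hypothetical):

```r
# Small made-up dataframe with 8 rows and 2 columns
df <- data.frame(id = 1:8, value = c(10, 20, 30, 40, 50, 60, 70, 80))

dim(df)          # 8 2: observations first, then variables
rownames(df)     # default row names "1" to "8"
colnames(df)     # "id" "value"
head(df)         # first six rows
tail(df)         # last six rows
head(df, n = 3)  # the n argument changes how many rows are returned
```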

    Dataframe Advanced Functions

    • subset() selects rows and columns of a dataframe
    • Its general form is subset(dataframe, subset = logical_expression, select = list_of_variables)
    • merge(df1, df2, by = list_of_variables, by.x = , by.y = , …) joins two dataframes on the listed key variables.
    • df[order(df$nom_var), ] sorts the dataframe by the column named "nom_var".
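The three operations above might look like this in practice. The dataframes people and weights and their columns are hypothetical, chosen only to make the behaviour visible:

```r
# Hypothetical dataframes for illustration
people  <- data.frame(id = c(1, 2, 3, 4),
                      age = c(34, 51, 27, 45))
weights <- data.frame(id = c(2, 3, 4, 5),
                      weight = c(70, 58, 82, 66))

# subset(): keep rows where age > 30, and only the id and age columns
older <- subset(people, subset = age > 30, select = c(id, age))

# merge(): join on the shared "id" column; by default only ids that
# appear in both dataframes are kept
joined <- merge(people, weights, by = "id")

# order(): sort rows by the age column, ascending
sorted <- people[order(people$age), ]
```

Note that merge() joins and order() sorts; mixing the two up is the mistake targeted by the quiz question on merge().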

    Exercise: Age and Creativity of Researchers

    • Calculate the age at which scientists are highly creative (year_research_mid minus year_birth)
    • Calculate the average age using the mean() function
    • The nobel dataframe holds Nobel prize winners' characteristics, including the year the research was conducted and the year of birth

    Exercise 2: Age and Creativity of Researchers

    • Calculate the average age at which scientists did their research before 1905.
    • Calculate the average age at which scientists did their research after 1985.
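One possible way to work through the first two exercises is sketched below. The data values are invented; only the dataframe name nobel and the column names year_research_mid and year_birth come from the notes. The sketch also assumes that "before 1905" and "after 1985" refer to the year of the research.

```r
# Made-up stand-in for the nobel dataframe (values are not real data)
nobel <- data.frame(
  year_birth        = c(1852, 1871, 1879, 1940, 1955),
  year_research_mid = c(1896, 1903, 1905, 1988, 1995)
)

# Age at the midpoint of the research
nobel$age_research <- nobel$year_research_mid - nobel$year_birth

# Average age overall, then for research before 1905 and after 1985
mean_age_all   <- mean(nobel$age_research)
mean_age_early <- mean(nobel$age_research[nobel$year_research_mid < 1905])
mean_age_late  <- mean(nobel$age_research[nobel$year_research_mid > 1985])
```

The same row selection could also be written with subset(), for example subset(nobel, year_research_mid < 1905).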

    Exercise 3: Lifespan Calculation

    • Calculate the lifespan of each researcher as the difference between death year and birth year (nobel$year_death - nobel$year_birth).
    • Calculate the average lifespan using the mean() function on the new lifespan column.

    Managing Missing Values

    • Problem: the average lifespan cannot be calculated directly when the lifespan column contains missing values, because mean() then returns NA.
    • is.na() identifies missing values.
    • complete.cases() selects rows without missing values.
    • na.omit() removes rows with any missing values.
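The lifespan exercise and the missing-value functions can be combined in one sketch. The values below are invented and the NA stands for an unknown death year; only the names nobel, year_birth, and year_death come from the notes.

```r
# Made-up stand-in for the nobel dataframe; NA marks a missing death year
nobel <- data.frame(
  year_birth = c(1852, 1871, 1879, 1940),
  year_death = c(1908, 1937, 1955, NA)
)

nobel$lifespan <- nobel$year_death - nobel$year_birth
mean(nobel$lifespan)                 # NA, because one lifespan is missing

is.na(nobel$lifespan)                # FALSE FALSE FALSE TRUE
mean_lifespan <- mean(nobel$lifespan, na.rm = TRUE)

# Two ways to keep only the rows without missing values
complete_rows <- nobel[complete.cases(nobel), ]
omitted       <- na.omit(nobel)
```

Both complete.cases() and na.omit() work on whole rows, so a row is dropped if any of its columns is NA, not just the lifespan column.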

    2) Dataframe Generation

    • Create a dataframe from sample data including age, gender, and weight of 5 individuals
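A minimal sketch of this step, using the vector names a, g, and w from the quiz; the five sets of values are made up:

```r
# Made-up values for 5 individuals
a <- c(25, 31, 47, 52, 38)            # age
g <- c("F", "M", "F", "M", "F")       # gender
w <- c(60, 82, 71, 90, 65)            # weight

data <- data.frame(a, g, w)
dim(data)        # 5 3
data[1, 3]       # weight of the first individual (row 1, column 3): 60
data$w[1]        # same value, accessed by column name
```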

    3) Dataframes in R or Database Packages

    • Several packages provide ready-made datasets in R, such as the built-in datasets package and the installable Ecdat package

    Example 1: Motor Trend US Magazine

    • Data on 32 car models (1973–74)
    • A table of their data

    Example 2: Weight and Food of Chickens in a Chicken Coop

    • Data on the weight and feed of chickens

    Example 3: Earthquake Magnitude in Fiji

    • Data displaying distribution of earthquake magnitudes near Fiji from 1964
    • Online tool for searching data sets
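The notes do not name the datasets behind these three examples. The sketch below assumes they correspond to mtcars, chickwts, and quakes from the built-in datasets package, whose descriptions match; treat that mapping as an assumption rather than something the source confirms.

```r
library(datasets)     # attached by default in a standard R session

dim(mtcars)           # 32 rows (car models) and 11 variables
head(chickwts)        # columns: weight, feed
summary(quakes$mag)   # summary of the earthquake magnitudes

# Calling data() with no arguments lists the datasets available in the
# currently loaded packages
```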

    Description

    This quiz explores the fundamentals of data analysis using R programming and RStudio. Participants will review concepts such as workspace management, data handling, and object manipulation. Test your knowledge on how to effectively create and manage datasets in R.
