Data Analysis using R
23 Questions

Questions and Answers

Which command is used to remove all objects from the R workspace?

  • rm(list=ls(all=TRUE)) (correct)
  • remove.objects()
  • rm(list=all)
  • clear()

The function mean() can be used to calculate the median of a set of numbers.

False

What is the syntax to create a vector called 'y' containing the values 1, 5, 4, and 8?

y=c(1,5,4,8)

The function used to load a saved workspace in R is called _____

load()

Match the following R commands with their functions:

  • getwd() = Get the working directory
  • setwd() = Set the working directory
  • ls() = List objects in the workspace
  • save.image() = Save the current workspace
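A minimal sketch of these workspace commands used in sequence. It assumes R's session temporary folder (tempdir()) as a stand-in for a real project folder and reuses the file name name.Rdata from the study notes; neither choice comes from the course.

```r
# Sketch: save, clear, and reload the R workspace.
# Assumption: tempdir() stands in for a real project folder.
setwd(tempdir())                  # select the folder for the workspace file
print(getwd())                    # display the current working directory

x <- 1
y <- c(1, 5, 4, 8)
print(ls())                       # list workspace objects (includes x and y)

save.image("name.Rdata")          # save every workspace object to name.Rdata

# remove all objects; all = TRUE in the notes partially matches all.names = TRUE
rm(list = ls(all.names = TRUE))
print(ls())                       # character(0): the workspace is now empty

load("name.Rdata")                # restore the saved objects
print(ls())                       # x and y are back
```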

Which function would you use to check for missing values in a dataset?

is.na()

The function na.omit() removes rows that contain any missing values.

True

What is the main purpose of the complete.cases() function?

To identify the rows that have no missing values, so that only complete rows are selected.

To create a dataframe in R using vectors for age, gender, and weight, you would use the function data.frame() with the vectors named as ___.

a, g, w

Match the following R functions with their purposes:

  • is.na() = Checks for missing values
  • na.omit() = Removes individuals (rows) with missing values
  • subset() = Selects specific rows or columns from a dataframe
  • data.frame() = Creates a dataframe from vectors

What command is used to install the Ecdat package in R?

install.packages('Ecdat')

The command library('datasets') is used to load the datasets package into the current R session.

True

How can you access the weight of the first individual in the 'data' dataframe created from age, gender, and weight vectors?

data[1,3]

What happens when you execute 'mean(y, na.rm = TRUE)' in R?

It calculates the mean excluding NA values.
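A short sketch of how NA affects mean(), using a made-up vector y with one missing value:

```r
# Made-up vector with one missing value (NA = "Not Available")
y <- c(1, 5, NA, 8)

is.na(y)                 # FALSE FALSE  TRUE FALSE
mean(y)                  # NA: a single missing value makes the mean NA
mean(y, na.rm = TRUE)    # 4.666667, the mean of 1, 5 and 8
```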

A dataframe in R can be created by importing text or Excel files.

True

What function is used to import a text file into R?

read.table()
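A self-contained sketch of read.table(). Because no data file accompanies this page, the example first writes a small invented text file to a temporary path (tempfile()); the columns id, age, and weight are hypothetical.

```r
# Write a small, invented, whitespace-separated text file
f <- tempfile(fileext = ".txt")
writeLines(c("id age weight",
             "1 25 58",
             "2 32 80"), f)

# header = TRUE: the first line holds the column names
df <- read.table(f, header = TRUE)
print(df)
print(class(df))   # "data.frame"
```

For a comma-separated file, read.table(f, header = TRUE, sep = ",") or its wrapper read.csv(f) would be used instead.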

The command to display the first six rows of a dataframe is ____.

head()

Match the following dataframe functions with their purposes:

  • dim() = Shows the number of observations and variables
  • rownames() = Displays the names of the rows
  • colnames() = Displays the names of the columns
  • tail() = Shows the last six lines
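These functions can be tried on mtcars, which ships with R in the datasets package, so nothing needs to be imported:

```r
# mtcars is part of R's datasets package, so no import is needed
dim(mtcars)              # 32 11: 32 observations (car models), 11 variables
head(rownames(mtcars))   # names of the first rows (car models)
colnames(mtcars)         # variable names, starting "mpg" "cyl" "disp"
head(mtcars)             # first six rows
tail(mtcars)             # last six rows
tail(mtcars, n = 3)      # n changes how many rows are shown
```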

When importing a dataframe from an Excel file, what is the first step you should take in RStudio?

Select the folder where the Excel file is located.

The function 'merge()' is used to sort a dataframe according to a variable.

False

What would be the output of 'class(nobel)' after importing the dataframe?

data.frame

To create a vector in R containing the values 29020, 32500, and 40320, one could use the command ____.

c(29020, 32500, 40320)

Which of the following statements regarding dataframes is false?

You can only create dataframes by importing files.

Flashcards

Dataframe in R

A data structure in R that stores a table of data, with rows representing observations and columns representing variables.

Creating a dataframe

Ways to create a dataframe in R include importing from files (text or Excel), creating it directly from vectors with data.frame(), or using datasets that come with R packages.

Importing from text/Excel

Creating a dataframe by reading data from text files, comma-separated values (CSV) files, or spreadsheet files. This is the most common way to load data.

read.table function

A function in R to import data from text-based files. It’s often used for loading data from comma separated values (.csv) or tab separated values (.tsv) files.

Dataframe dimensions

The number of rows and columns in a dataframe, useful for understanding the size of the dataset. Returned by the dim() function in R.

Row/Column Names

The names of the rows (observations) and columns (variables) in a dataframe. Used for accessing data by specific names.

head() function

Displays the first 6 rows of a dataframe to preview the data.

tail() function

Displays the last 6 rows of a dataframe to quickly check the end of data.

subset()

A function in R to retrieve specific parts of a dataframe based on logical conditions on variables.

merge()

A function used to combine two dataframes by matching rows on one or more common (key) variables.

order() function

A function that returns the row indices that would sort a column; indexing a dataframe with them, as in df[order(df$var), ], sorts its rows by that column.

NA value in R

Represents missing values in R. They often have to be handled explicitly before calculating with or otherwise working with the data.

Managing Missing Values

Dealing with 'NA' (Not Available) values in datasets, crucial for accurate analysis.

R Database

In this course, a dataset (usually a dataframe) that is created, imported, and analyzed in R.

R Workspace

The set of objects (variables, vectors, matrices, lists, and dataframes) held in memory during the current R session, as shown in RStudio's Environment pane. It is temporary unless saved, for example with save.image().

is.na()

A function to identify missing values in a dataset.

complete.cases()

Returns TRUE for each row of a dataframe that has no missing values; used to select the complete rows.

ls()

A function used to display a list of all objects in the current R workspace.

rm(list=ls(all=TRUE))

A command to remove all objects from the R workspace.

na.omit()

Removes rows with at least one missing value.

Subset of a dataframe

Extracting specific rows or columns of a dataset using criteria.

getwd()

A function used to display the current working directory in R.

setwd()

A function used to change the current working directory in R.

Dataframe generation

Creating a structured table (dataframe) with rows and columns to store data.

save.image()

A command to save the entire R workspace to a file.

Dataframe variables

Individual columns in a dataframe, each representing specific data.

load()

A function to load an R workspace previously saved with save.image().

mtcars data

A dataset from the 1974 Motor Trend US magazine giving fuel consumption and 10 aspects of design and performance for 32 car models (1973–74).

Variable (in R)

A named location in R that stores a value.

chicken data

A dataset of chicken weights together with the feed each chicken received.

quakes data

Earthquake data giving the locations and magnitudes of seismic events near Fiji.

Vector (in R)

An ordered sequence of values in R (e.g., numbers or characters).

Ecdat package

An add-on R package (Data Sets for Econometrics) that provides ready-made datasets; install it with install.packages('Ecdat').

Class of object (in R)

The class of an R object as reported by class(), for example "numeric", "character", or "data.frame".

cbind()

Combines data horizontally (columns).

DoctorAUS data

A dataset in the Ecdat package on doctor visits in Australia, an example of health-sector data.

Dataset Search Tool

Online tool for searching existing datasets.

rbind()

Combines data vertically (rows).

Function (in R)

A block of code that performs a specific task in R.

mean()

Calculates the arithmetic mean of values in R.

NA (in R)

Represents missing data values.

Study Notes

Data Analysis using R

  • R is a programming language and software environment for statistical computing and graphics
  • RStudio is an integrated development environment (IDE) for R
  • Data analysis in R involves creating or importing a dataset (called a database in this course) and then working with it
  • The workspace holds the objects of the current session: variables, vectors, matrices, lists, and dataframes
  • To show objects in a workspace, use ls().
  • To remove all objects in a workspace use rm(list=ls(all=TRUE))
  • getwd() displays the current working directory, and setwd("C:/name/name/name") selects the folder in which the workspace will be saved
  • To save the workspace to a file use save.image("name.Rdata")
  • To load the saved workspace use load("name.Rdata")
  • Variables are assigned values using the equal sign (=), for example x = 1; the arrow form x <- 1 is equivalent and is the more common R style.
  • class(x) shows the object type.
  • A vector is created using y=c(1,5,4,8).
  • The command class(y) displays the object type
  • Matrices can be created from vectors with rbind() (vectors as rows) and cbind() (vectors as columns), for example m = rbind(y, z).
  • R functions are used for importing and exporting data, performing operations, and generating graphs
  • Functions generally use the form function_name(par1=value1, par2=value2, …)
  • Using ?mean displays the help for the mean function
  • Missing data are denoted NA (Not Available) in R; is.na() returns TRUE for each missing value
  • mean(y, na.rm = TRUE) ignores NAs when calculating the mean; without na.rm = TRUE the result is NA.
  • A dataframe is a table with observations in rows and variables in columns
  • Dataframes can be created in R by importing from a text or excel file
  • The read.table function imports text files; header = TRUE indicates that the first row contains the column names
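A sketch of the vector and matrix bullets above. The second vector z is hypothetical (the notes use it in rbind(y, z) without defining it), and the = assignment style of the notes is kept:

```r
x = 1                  # assignment with =, as in the notes
class(x)               # "numeric"

y = c(1, 5, 4, 8)      # a numeric vector
z = c(2, 3, 7, 6)      # hypothetical second vector of the same length
class(y)               # "numeric"

m = rbind(y, z)        # vectors as rows: a 2 x 4 matrix with row names "y", "z"
dim(m)                 # 2 4
class(m)               # "matrix" "array" (R 4.0 or later)

m2 = cbind(y, z)       # vectors as columns: a 4 x 2 matrix
dim(m2)                # 4 2
```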

Dataframe Basic Functions

  • dim() shows the number of observations and variables.
  • rownames() shows names of rows.
  • colnames() shows names of columns.
  • head() displays the first six rows of a dataframe.
  • tail() displays the last six rows of a dataframe

Dataframe Advanced Functions

  • subset() is used to select sections of a dataframe
  • subset(dataframe, subset = logical_expression, select = list_of_variables)
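  • merge(df1, df2, by = list_of_variables, by.x = , by.y = , …) joins two dataframes on common variables; by.x and by.y give the key variables separately when their names differ between df1 and df2.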
  • df[order(df$nom_var),] sorts the dataframe based on a column named "nom_var".
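A sketch of subset(), merge(), and order() on two small invented dataframes; people, weights, and the key column id are hypothetical names:

```r
# Two small, invented dataframes sharing the key column "id"
people  <- data.frame(id = c(3, 1, 2), age = c(41, 25, 32))
weights <- data.frame(id = c(1, 2, 3), weight = c(58, 80, 75))

# subset(): rows where age > 30, keeping only the id and age columns
older <- subset(people, subset = age > 30, select = c(id, age))

# merge(): join the two dataframes on the common variable "id"
both <- merge(people, weights, by = "id")

# order(): row indices that sort by age; indexing with them sorts the rows
sorted <- people[order(people$age), ]

print(older)
print(both)
print(sorted)
```

If the key column were called person_id in weights (a hypothetical name), merge(people, weights, by.x = "id", by.y = "person_id") would match the rows instead.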

Exercise: Age and Creativity of Researchers

  • Calculate the age at which each scientist was highly creative as year_research_mid - year_birth
  • Calculate the average age using the mean() function
  • The nobel dataframe contains characteristics of Nobel Prize winners, including the year in which the research was conducted (year_research_mid) and the year of birth (year_birth)

Exercise 2: Age and Creativity of Researchers

  • Calculate the average age at which scientists did their research before 1905.
  • Calculate the average age at which scientists did their research after 1985.
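A sketch of Exercises 1 and 2. The course's nobel dataframe is not reproduced here, so the rows below are invented; only the column names year_birth and year_research_mid come from the exercise, and the results say nothing about real laureates.

```r
# Hypothetical stand-in for the course's nobel dataframe: the column names
# come from the exercise, but these five rows are invented
nobel <- data.frame(
  year_birth        = c(1860, 1875, 1900, 1930, 1945),
  year_research_mid = c(1895, 1903, 1930, 1990, 1987)
)

# Exercise 1: age at the midpoint of the research, then its average
nobel$age_research <- nobel$year_research_mid - nobel$year_birth
avg_all <- mean(nobel$age_research)

# Exercise 2: average age for research done before 1905 and after 1985
avg_before_1905 <- mean(nobel$age_research[nobel$year_research_mid < 1905])
avg_after_1985  <- mean(nobel$age_research[nobel$year_research_mid > 1985])

print(c(avg_all, avg_before_1905, avg_after_1985))   # 39.0 31.5 51.0
```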

Exercise 3: Lifespan Calculation

  • Calculate the lifespan of each researcher as the difference between the death year and the birth year (nobel$year_death - nobel$year_birth).
  • Calculate the average lifespan using the mean() function on the new lifespan column.

Managing Missing Values

  • Missing values (NA) in year_death make the lifespan NA for those researchers, so mean() returns NA unless the missing values are handled.
  • is.na() identifies missing values.
  • complete.cases() selects rows without missing values.
  • na.omit() removes rows with any missing values.
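A sketch of Exercise 3 and the missing-value functions. The rows are invented, with one NA in year_death to reproduce the problem; the name column is a hypothetical addition.

```r
# Invented rows; NA marks a missing year_death
nobel <- data.frame(
  name       = c("A", "B", "C", "D"),
  year_birth = c(1860, 1875, 1900, 1930),
  year_death = c(1940, 1950, NA, 2010)
)

# Exercise 3: lifespan of each researcher
nobel$lifespan <- nobel$year_death - nobel$year_birth
mean(nobel$lifespan)                 # NA, because one lifespan is missing

is.na(nobel$lifespan)                # FALSE FALSE  TRUE FALSE
complete.cases(nobel)                # TRUE for rows with no NA in any column
nobel_complete <- nobel[complete.cases(nobel), ]
nobel_omit     <- na.omit(nobel)     # the same rows, dropped in one step

mean(nobel_complete$lifespan)        # 78.33333
mean(nobel$lifespan, na.rm = TRUE)   # same value, without dropping rows
```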

2) Dataframe Generation

  • Create a dataframe from sample data including age, gender, and weight of 5 individuals
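A sketch of this step, with invented values for the five individuals; the vector names a, g, and w follow the quiz question about data.frame():

```r
# Invented values for five individuals
a <- c(25, 32, 41, 19, 57)        # age in years
g <- c("F", "M", "M", "F", "M")   # gender
w <- c(58, 80, 75, 62, 90)        # weight in kg

data <- data.frame(a, g, w)       # columns are named a, g, w
dim(data)                         # 5 3
data[1, 3]                        # weight of the first individual: 58
data$w[1]                         # same value, selected by column name
```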

3) Dataframes in R or Database Packages

  • R and its packages include many ready-made datasets that can be used directly

Example 1: Motor Trend US Magazine

  • Data on 32 car models (1973–74)
  • A table of fuel consumption and 10 design and performance variables for each model (the mtcars dataset)

Example 2: Weight and Food of Chickens in a Chicken Coop

  • Data on the weight and feed of chickens

Example 3: Earthquake Magnitude in Fiji

  • Data on the locations and magnitudes of earthquakes near Fiji since 1964 (the quakes dataset), used to display the distribution of magnitudes
  • Online tool for searching data sets
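A sketch of loading the built-in example datasets. mtcars and quakes are in R's datasets package; treating the chicken example as chickwts (chicken weight by feed type) is an assumption, since the notes do not name the dataset.

```r
library(datasets)                 # attached by default; shown for completeness

# quakes: 1000 seismic events near Fiji since 1964
dim(quakes)                       # 1000 5
head(quakes)                      # columns lat, long, depth, mag, stations
summary(quakes$mag)               # summary of the magnitude distribution

# Assumption: the chicken example refers to chickwts (weight and feed type)
str(chickwts)
tapply(chickwts$weight, chickwts$feed, mean)   # mean weight per feed type
```

hist(quakes$mag) would draw the distribution of magnitudes as a histogram.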

Description

This quiz explores the fundamentals of data analysis using R programming and RStudio. Participants will review concepts such as workspace management, data handling, and object manipulation. Test your knowledge on how to effectively create and manage datasets in R.
