R Data Structures: Vectors and Data Frames

Questions and Answers

In R, how are scalars treated?

Scalars are treated as vectors with a length of one.

What does the str() function do in R?

The str() function can be utilized to examine the structure of an object, providing a concise and easily understandable overview of any data structure in R.

Which of the following is a homogeneous data structure in R?

  • Data Frame
  • Recursive Vector
  • Atomic Vector (correct)
  • List

All elements in an atomic vector must share the same type or mode, except for NA and NaN.

True

Which of the following is NOT a type of atomic vector in R?

Data Frame Vectors

The function c() cannot be used to create vectors.

False

What does NaN signify in R?

NaN stands for 'Not a Number': a missing value produced by an undefined numerical calculation, such as dividing 0 by 0.

The function is.na() returns TRUE exclusively for NaNs.

False

In R, string values should be surrounded by single quotes.

False. String values may be surrounded by either single or double quotes.

What is the purpose of the backslash (\) in a string value in R?

The backslash (\) acts as an escape character: it is not part of the string value itself but changes how the following character is interpreted.

What is the function class() used for in R?

The class() function may be used to verify the class of an object.

What is the function typeof() used for in R?

The typeof() function determines the type or storage mode of any object.

Fundamental operations in R are performed on an _____ by _____ basis.

element, element

What do bracket symbols [] serve as in R?

Bracket symbols [] serve the purpose of indexing or subsetting.

What is coercion in R?

Coercion is the process of converting elements in an atomic vector to the same type.

What is the hierarchy from least to most flexible data type?

Logical -> Integer -> Double -> Character

Match the following functions to their object conversion

  • as.integer() = Integer
  • as.double() = Double
  • as.numeric() = Numeric
  • as.character() = Character
  • as.logical() = Logical

What is one commonly utilized attribute that can be added to an object in R?

One commonly utilized attribute that can be added to an object is 'names'.

What does the function names() do in R?

The function names() acts as an accessor for the names attribute.

A factor is a numeric object created to represent categorical data.

False. A factor is a vector object, built on integers and attributes, that represents categorical data.

What does the function is.factor() do in R?

It checks whether an object is a factor.

What are lists?

Lists are generic containers in R.

Lists cannot include other lists.

False

How are lists created in R?

Lists are created using the function list().

What will the typeof() of lists return?

typeof() of a list returns "list"; the storage mode of a list is the list type.

How can you select a part of a list?

The list$name notation may be used to select a part of a list.

What is the command used to modify and create factors?

factor()

How can you describe arrays and matrices?

Array or matrix = vector + higher dimension

Describe atomic vectors

The atomic vector is the simplest data structure in R. It is a collection of elements having the same type, and, by default, its only attributes are length and type/mode.

How can the dimension attribute be accessed?

The dimension attribute can be accessed using the dim() function.

Arrays can only have 2 dimensions.

False

What is an array with 2 dimensions called?

A matrix

What symbols are used for indexing or subsetting in a matrix?

[]

When subsetting a matrix, how must the two index vectors be separated?

The two vectors must be separated by a comma: [vector1, vector2]

What may subsetting a matrix result in?

Subsetting a matrix may result in a matrix-type or vector-type object, depending on the dimension of the output object.

In matrices, what do rbind() and cbind() correspond to?

For matrices, the counterparts of the c() function are the rbind() and cbind() functions.

What do rbind() and cbind() functions do?

rbind() combines matrices vertically (by rows). cbind() combines matrices horizontally (by columns).

Matrix operations do not have conformability rules

False

What is the structure of a data frame?

A list of equal-length vectors.

How are data frames constructed?

The data.frame() function is used to construct data frames.

Data frames share properties from both the _____ and the _____.

matrix, list

What function is used to invoke a spreadsheet-style data viewer on a matrix-like R object?

The View() function

What function is used to read data from CSV files?

The function read.csv() is commonly utilized to import CSV files into a data frame.

Flashcards

Atomic Vectors

Simplest data structure in R where all elements share the same type or mode.

Logical Vectors

TRUE or FALSE values.

Integer Vectors

Numeric vectors with only whole numbers.

Numeric or Double Vectors

Numeric vectors that can include decimals.

Character Vectors

Vectors containing text.

c() function

Used to create vectors

seq() function

Useful for creating regular sequences.

rep() function

Allows for the creation of a vector containing repeated values.

sort() function

Enables the arrangement of values within a vector in ascending order.

NA

Not available or a missing value

NaN

Not a Number, a missing value due to a numerical calculation (dividing 0 by 0).

String Value Quotes

String values should be surrounded by either single or double quotes.

Backslash (\)

Acts as an escape character: it is not part of the string value itself but changes how the following character is interpreted.

\n

Represents a newline.

\t

Indicates a tab.

cat() function

Used to concatenate string values and print the result.

letters object

Character vector that includes the lowercase letters a through z.

class() function

Function used to verify the class of an object.

typeof() function

Function which determines the type or storage mode of any object.

"is" functions

Functions to check if the object belongs to a specific type or class.

element-by-element operations

Fundamental operations are performed on an element-by-element basis.

Vector Recycling

Elements from the smaller vector are recycled when two vectors differ in size; the result is a vector with the same length as the longer one.

Bracket symbols []

Symbols used for indexing or subsetting vectors.

"as" functions

Functions that convert an object into a particular class.

Object properties

Every object possesses both a length and a type/mode.

Custom attributes of Objects

Serve to hold metadata pertaining to the objects.

attributes() function

Returns a comprehensive list of all supplementary attributes assigned to an object.

names() function

Acts as an accessor for the names attribute.

Factors

Vector object created to represent categorical data using integers and attributes.

factor()

Used to modify and create factors.

levels()

Returns the predefined categories of the factor object.

is.factor()

Checks whether an object is a factor.

as.factor()

Coerce an object into a factor.

relevel()

Modify the first level or reference level of an unordered factor.

Lists

Generic containers in R which can hold elements of any type.

list() function

Function used to create lists.

[ ] and $

Operators for subsetting to part of a list.

[[]] Operator

Operator especially useful for working with lists.

Atomic vector

The simplest data structure in R: a collection of elements having the same type whose only default attributes are length and type/mode.

Arrays

When a dimension attribute is added to an atomic vector, the object becomes an array.

Matrices

Arrays with two dimensions

Study Notes

Important Data Structures in R

  • Homogeneous data structures store contents of the same type, while heterogeneous data structures can store contents of different types
  • Atomic vectors and factors are 1-dimensional homogeneous structures
  • Recursive vectors or lists are 1-dimensional heterogeneous structures
  • Matrices are 2-dimensional homogeneous structures
  • Data frames are 2-dimensional heterogeneous structures
  • Arrays are N-dimensional homogeneous structures
  • Scalars are treated as vectors with a length of one in R.
  • The str() function gives a concise overview of a data structure in R

Atomic Vectors

  • Atomic vectors are the simplest data structure in R
  • All elements share the same type or mode; the missing-value markers NA (allowed in a vector of any type) and NaN (in numeric vectors) do not break this rule
  • The mode or type signifies the basic nature of an object's fundamental constituent

Types of Atomic Vectors

  • Logical vectors include TRUE, FALSE, T, F
  • Integer vectors such as 1L, 2L, 3L
  • Numeric or double vectors, for instance: 1.00, 1.25, 1.50, 1.75, 5.00
  • Character Vectors include "a", "b", "c"
  • c() creates vectors
  • seq() creates regular sequences
  • rep() creates vectors with repeated values
  • sort() arranges values in a vector in ascending order
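
The vector types and constructor functions listed above can be sketched as follows. This is a minimal illustration; the variable names (lgl, int, dbl, chr, rep_vec, sorted) are made up for the example and do not come from the lesson.

```r
# One vector of each basic atomic type
lgl <- c(TRUE, FALSE, TRUE)      # logical
int <- c(1L, 2L, 3L)             # integer (the L suffix marks an integer)
dbl <- seq(1, 2, by = 0.25)      # double: 1.00 1.25 1.50 1.75 2.00
chr <- c("a", "b", "c")          # character

typeof(lgl)   # "logical"
typeof(int)   # "integer"
typeof(dbl)   # "double"
typeof(chr)   # "character"

# rep() repeats values; sort() arranges values in ascending order
rep_vec <- rep(c("x", "y"), times = 2)   # "x" "y" "x" "y"
sorted  <- sort(c(3, 1, 2))              # 1 2 3
```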

Missing Values

  • NA denotes "not available" or a "missing value"
  • NaN signifies "Not a Number", which is a missing value from a numerical calculation
  • is.na() returns TRUE for both NA and NaN
  • is.nan() returns TRUE exclusively for NaNs
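
A short sketch of how the two tests differ, using an illustrative vector x that is not part of the lesson:

```r
# 0 / 0 is an undefined calculation and evaluates to NaN
x <- c(1, NA, 0 / 0)

na_flags  <- is.na(x)    # FALSE  TRUE  TRUE  (NA and NaN both count as missing)
nan_flags <- is.nan(x)   # FALSE FALSE  TRUE  (only NaN)
```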

Character Values

  • String values should be surrounded by either single or double quotes
  • The backslash \ acts as an escape character: it is not part of the string value itself but changes how the following character is interpreted
  • \n represents a newline, while \t indicates a tab
  • \' stands for a single quote and \" for a double quote
  • cat() concatenates string values and prints the result
  • The object letters is a character vector with "a" through "z"
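
The quoting and escaping rules above can be tried out with a few illustrative strings (the names s1, s2 and quoted are assumptions made for this sketch):

```r
s1 <- 'single quotes work'
s2 <- "double quotes work too"

# \" places a double quote inside a double-quoted string;
# the backslash itself is not stored in the value
quoted <- "She said \"hi\""
n_chars <- nchar(quoted)   # 13

# cat() prints its arguments, interpreting \n and \t
cat("line one\nline two\n")
cat("col1\tcol2\n")

first_three <- letters[1:3]   # "a" "b" "c"
```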

Class, Types and Tests

  • Every object belongs to a class
  • class() verifies the class of an object
  • A data structure is described by the type of its basic constituent
  • typeof() determines the type or storage mode of any object.
  • For most atomic vectors, the class and the mode are the same (one exception: an integer vector has class "integer" but mode "numeric")
  • "is" functions check if an object belongs to a specific type or class -- such as is.character(), is.double(), is.integer(), is.logical(), is.atomic(), is.numeric()
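
As a rough illustration of class(), typeof() and the "is" functions, assuming two small example vectors x and y that are not from the lesson:

```r
x <- c(1.5, 2)
y <- 1:3

class_x <- class(x)     # "numeric"
type_x  <- typeof(x)    # "double"
class_y <- class(y)     # "integer"
type_y  <- typeof(y)    # "integer"

# "is" functions return a single TRUE or FALSE
is.numeric(y)        # TRUE: integers count as numeric
is.double(y)         # FALSE
is.character("a")    # TRUE
is.atomic(x)         # TRUE
```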

Vector Operations

  • Fundamental operations are performed on an element-by-element basis
  • When two vectors differ in size, elements from the smaller vector are "recycled"
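
Element-by-element arithmetic and recycling can be seen in a small sketch; the vectors a and b are illustrative only:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)

# b is recycled to 10 20 10 20 before the element-wise addition
sum_ab <- a + b      # 11 22 13 24

# A scalar is a length-one vector, so it is recycled across all of a
doubled <- a * 2     # 2 4 6 8
```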

Subsetting Vectors

  • Bracket symbols [] index or subset data
  • vec01 <- c(1.00,1.25,1.25,1.50,1.00,1.75) creates the vector vec01 with the values listed
  • vec01[2] selects the second value
  • vec01[-2] removes the second value
  • vec01[1:3] selects consecutive values 1 to 3
  • vec01[c(1,6)] selects values in the first and sixth positions
  • vec01[-c(1,6)] removes values in the first and sixth positions
  • vec01[c(T,T,F,T,F,F)] selects values if TRUE
  • vec01[vec01<=1.25] is conditional filtering for values less than or equal to 1.25
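
The subsetting expressions listed above can be run as written. The sketch below stores each result in a name (second, by_cond and so on) chosen only for this example:

```r
vec01 <- c(1.00, 1.25, 1.25, 1.50, 1.00, 1.75)

second      <- vec01[2]                        # 1.25
drop_second <- vec01[-2]                       # 1.00 1.25 1.50 1.00 1.75
first_three <- vec01[1:3]                      # 1.00 1.25 1.25
ends        <- vec01[c(1, 6)]                  # 1.00 1.75
middle      <- vec01[-c(1, 6)]                 # 1.25 1.25 1.50 1.00
by_logical  <- vec01[c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)]   # 1.00 1.25 1.50
by_cond     <- vec01[vec01 <= 1.25]            # 1.00 1.25 1.25 1.00
```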

Modifying Parts of a Vector

  • Values can be assigned on certain parts of a vector.
  • vec02<-rep(NA, 5) creates a vector of five NA values
  • vec02[c(1,4)]<-"YES" sets the first and fourth elements of vec02 to "YES"
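
Running the two statements above shows one side effect worth noticing: assigning a string changes the type of the whole vector. The names type_before and type_after are illustrative.

```r
vec02 <- rep(NA, 5)
type_before <- typeof(vec02)   # "logical": a bare NA is logical

vec02[c(1, 4)] <- "YES"        # assign to positions 1 and 4 only
vec02                          # "YES" NA NA "YES" NA
type_after <- typeof(vec02)    # "character": the vector was coerced
```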

Coercion

  • Every element in an atomic vector must share the same type
  • Vectors of varying types will have elements converted to the most flexible type
  • Hierarchy from least to most flexible data type: Logical -> Integer -> Double -> Character
  • Coercion functions with the prefix "as" can convert an object into a particular class. Examples: as.integer(), as.double(), as.numeric(), as.character(), as.logical()
  • Exercise caution when coercing into a more "specific" or "less flexible" class, as this may lead to the introduction of missing values (NA)
  • Coercing a character vector to a numeric class can produce missing values
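
A minimal sketch of implicit and explicit coercion; the variable names are assumptions for illustration. suppressWarnings() is used only to keep the expected "NAs introduced by coercion" warning out of the output.

```r
# Implicit coercion: c() converts everything to the most flexible type present
t1 <- typeof(c(TRUE, 2L))       # "integer"
t2 <- typeof(c(1L, 2.5))        # "double"
t3 <- typeof(c(1.5, "a"))       # "character"
mixed <- c(TRUE, 2L, 3.5, "four")   # "TRUE" "2" "3.5" "four"

# Explicit coercion with "as" functions
ints <- as.integer(c("1", "2"))     # 1 2
chrs <- as.character(1:2)           # "1" "2"
lgls <- as.logical(c(0, 1, 2))      # FALSE TRUE TRUE

# Coercing to a less flexible type can introduce NA
nums <- suppressWarnings(as.numeric(c("1.5", "abc")))   # 1.5 NA
```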

Attributes

  • Every object possesses both a length and a type or mode
  • length() and typeof() assess these characteristics.
  • Objects can include custom attributes (such as variable names, category labels, dimensions) that serve to hold metadata
  • The attributes() function returns a comprehensive list of all supplementary attributes assigned to an object
  • A commonly utilized attribute that can be added to an object is "names"
  • The function names() acts as an accessor for the names attribute
  • There are several approaches to assigning names to vectors
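
Two of the ways to name a vector can be sketched as follows; the vector scores and its labels are invented for the example.

```r
scores <- c(90, 85, 77)
attrs_before <- attributes(scores)     # NULL: no extra attributes yet

# Approach 1: assign through the names() accessor
names(scores) <- c("ana", "ben", "cy")
attrs_after <- attributes(scores)      # a list with one element, $names

# Approach 2: name the elements when the vector is created
scores2 <- c(ana = 90, ben = 85, cy = 77)

# Names can then be used for subsetting
ben_score <- scores[["ben"]]           # 85
```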

Factors

  • Factors are integer vectors with additional attributes (levels)
  • A factor is a vector object created to represent categorical data
  • Factors are constructed on integer vectors and include extra attributes
  • Factors can be coerced to integer or numeric classes
  • Factors are restricted to a specified set of values known as levels
  • Levels can be either unordered (nominal) or ordered (ordinal)
  • Factors are essential in specific modeling processes
  • factor() modifies and creates factors
  • levels() returns the predefined categories of the factor object

Factors and Character Vectors

  • Factors are useful when you know the categories or "levels" linked to a variable
  • Factors cannot take on values beyond those specified by their levels
  • There is a slight difference in how factors are displayed compared to character vectors
  • is.factor() checks whether an object is a factor
  • as.factor() coerces an object into a factor
  • relevel() modifies the first level or reference level of an unordered factor
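
The factor functions above can be tried on a small example. The variable size and its levels are hypothetical; suppressWarnings() hides the "invalid factor level" warning that R raises when a value outside the levels is assigned.

```r
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"))

levels(size)                  # "small" "medium" "large"
codes <- as.integer(size)     # 1 3 2 1: the underlying integer codes
is.factor(size)               # TRUE
typeof(size)                  # "integer"

# relevel() moves the chosen level to the front (the reference level)
size2 <- relevel(size, ref = "large")
new_levels <- levels(size2)   # "large" "small" "medium"

# as.factor() coerces a character vector; levels are sorted alphabetically
f <- as.factor(c("b", "a", "b"))
f_levels <- levels(f)         # "a" "b"

# A value outside the levels becomes NA
size3 <- size
suppressWarnings(size3[1] <- "huge")
became_na <- is.na(size3[1])  # TRUE
```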

Lists

  • Lists are generic containers in R
  • List or recursive vectors can hold elements of any type
  • Lists are called "recursive" because they can include other lists
  • Lists are created using the function list()
  • typeof() of a list returns "list"
  • The storage mode of a list is "list"
  • Other objects derived from lists, such as data frames, will also be categorized with list as their type
  • is.list() tests whether an object is a list, and as.list() coerces an object to a list
  • A list may also have a names attribute, giving each element in the list a label
  • The list$name notation selects a part of a list

Selecting a Part of a List

  • The operators [] and $ are for subsetting
  • The [[]] operator is particularly significant for working with lists
  • [] returns a list, [[]] accesses the actual content of the list
  • The $ notation is similar to [[]] when every element of the list has a name
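
The difference between the three operators shows up clearly on a small named list. The list person and its elements are made up for this sketch.

```r
person <- list(name = "Ana", scores = c(90, 85), nested = list(a = 1))

typeof(person)                 # "list"

sub_list <- person["scores"]   # [] keeps the container: a list of length 1
content  <- person[["scores"]] # [[]] extracts the element: a numeric vector
by_name  <- person$scores      # $ behaves like [[]] for named elements

inner <- person$nested$a       # lists can contain lists
```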

Review of Atomic Vectors

  • Atomic vector is the simplest data structure in R.
  • It is a collection of elements having the same type, and, by default, its only attributes are length and type or mode.

Arrays and Matrices

  • Atomic vectors do not have a "dimension" attribute
  • The dimension attribute can be accessed using the dim() function
  • A matrix is an array with two dimensions
  • When a dimension attribute is added to an atomic vector, the object becomes an array
  • Arrays with three or more dimensions are rarely used in analyzing data
  • The array() and matrix() functions create arrays and matrices
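
A brief sketch of how a dimension attribute turns a vector into a matrix or array; the names v, m and arr are illustrative.

```r
v <- 1:6
dim_before <- dim(v)        # NULL: a plain atomic vector has no dimension

dim(v) <- c(2, 3)           # adding a dimension attribute: 2 rows, 3 columns
is.matrix(v)                # TRUE

# matrix() builds the same thing directly (values fill column by column)
m <- matrix(1:6, nrow = 2)

# array() handles any number of dimensions
arr <- array(1:24, dim = c(2, 3, 4))
arr_dim <- dim(arr)         # 2 3 4
```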

Subsetting for Matrices

  • The [] symbols are used for indexing or subsetting
  • Since there are two dimensions, an ordered pair of vectors (numeric or logical) must be specified
  • The first vector corresponds to rows, and the second vector corresponds to columns
  • Subsetting a matrix may result in a matrix-type or vector-type object, depending on the dimension of the output object
  • Parts of a matrix can be transformed by combining the syntax for subsetting and assignment
  • Conditional filtering is available when subsetting matrices
  • An empty pair of brackets [] will return the whole matrix
  • The order of numbers in brackets [] may change the arrangement of values
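
The matrix subsetting rules above, sketched on a small 2 x 3 matrix. The matrix m and the result names are assumptions for illustration.

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

cell     <- m[1, 2]          # 3: row 1, column 2
row1     <- m[1, ]           # 1 3 5: a single row drops to a vector
cols13   <- m[, c(1, 3)]     # still a 2 x 2 matrix
whole    <- m[]              # the whole matrix
reversed <- m[, 3:1]         # column order follows the index order
big      <- m[m > 4]         # 5 6: conditional filtering returns a vector

# Subsetting combined with assignment changes part of the matrix
m2 <- m
m2[m2 > 4] <- 0L
```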

Concatenating Matrices

  • The c() function is used to create vectors and/or concatenate vectors
  • For matrices, the counterparts of the c() function are the rbind() and cbind() functions
  • When rbind() is used on atomic vectors, atomic vectors are treated as row vectors
  • When cbind() is used on atomic vectors, atomic vectors are treated as column vectors
  • rbind() and cbind() have conformability conditions
  • When working with matrices, the number of columns must be the same for rbind(), and the number of rows must be the same for cbind()
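
A sketch of rbind() and cbind() on vectors and matrices, including a non-conformable case. The object names are illustrative; try() is used here only so the expected error does not stop the script.

```r
a <- 1:3
b <- 4:6

by_rows <- rbind(a, b)    # 2 x 3: each vector becomes a row
by_cols <- cbind(a, b)    # 3 x 2: each vector becomes a column

m <- matrix(1:6, nrow = 2)        # 2 x 3
taller <- rbind(m, 7:9)           # 3 x 3: same number of columns
wider  <- cbind(m, c(0L, 0L))     # 2 x 4: same number of rows

# Conformability: rbind() fails when the column counts differ
bad <- try(rbind(m, matrix(1:4, nrow = 2)), silent = TRUE)
failed <- inherits(bad, "try-error")   # TRUE
```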

Matrix Operations

  • Repetition: basic arithmetic operations are repeated on each element
  • For transpose, t() can be used
  • For matrix multiplication, %*% can be used
  • For the inverse of a square matrix, solve() can be used
  • For the determinant of a matrix, det() can be used
  • For eigenvalues and eigenvectors, eigen() can be used
  • Matrix operations have conformability rules. Always check the dimensions of matrices
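
The matrix functions above, applied to a small invertible matrix chosen for this sketch (A is not from the lesson):

```r
A <- matrix(c(2, 0, 1, 3), nrow = 2)
#      [,1] [,2]
# [1,]    2    1
# [2,]    0    3

elementwise <- A * A      # element by element: 4 1 / 0 9
product     <- A %*% A    # matrix multiplication: 4 5 / 0 9
A_t         <- t(A)       # transpose
A_det       <- det(A)     # 6
A_inv       <- solve(A)   # inverse of a square, non-singular matrix
eig         <- eigen(A)   # list with $values and $vectors

# A matrix times its inverse gives the identity (up to rounding)
A %*% A_inv
```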

Attributes of a Matrix

  • Matrices maintain the intrinsic properties of their vector counterpart – length and type.
  • Unlike atomic vectors, matrices have a dimension attribute
  • The dim() function determines the dimension of the matrix, while ncol() and nrow() determine the number of columns and rows, respectively
  • Matrices are homogeneous: coercion applies when atomic vectors of different types are bound together to form a matrix
  • Matrices do not inherit names from their vector counterparts
  • The counterparts of "names" among matrices are "rownames" and "colnames"
  • Row and column names can be accessed using rownames() and colnames(), respectively
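
A short sketch of these matrix attributes; the matrix m, its labels, and the mixed matrix are illustrative.

```r
m <- matrix(1:6, nrow = 2)

length(m)    # 6: still the length of the underlying vector
typeof(m)    # "integer"
dim(m)       # 2 3
nrow(m)      # 2
ncol(m)      # 3

rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")
named_cell <- m["r2", "b"]    # 4: names can be used for subsetting

# Binding vectors of different types coerces everything to one type
mixed <- cbind(1:2, c("x", "y"))
mixed_type <- typeof(mixed)   # "character"
```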

Data Frames

  • A data frame is a list of equal-length vectors
  • A recursive vector, or list, is a heterogeneous object. It can contain any type of object (vectors, lists, matrices, etc.)
  • A list can be constructed using the list() function

More on Data Frames

  • It is the most common way of storing data in R.
  • Most methods for analysis would require a data frame as input.
  • Data frames share properties from both the matrix and the list
  • A data frame is a type of list of equal-length vectors

Data Frame Properties

  • Data frames can have "names" attributes, meaning each vector can have a name
  • Subsetting using [n], [[n]], or $ applies on data frames
  • A data frame is a group of equal-length column vectors; has two dimensions just like a matrix
  • Subsetting using two vectors [i,j] works on data frames
  • Data frames have a value for dim(), and may have values for rownames() and colnames()
  • colnames() and names() are the same for data frames
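
The list-like and matrix-like sides of a data frame can be seen side by side. The data frame df and its columns are made up for this sketch; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

typeof(df)    # "list"
class(df)     # "data.frame"
dim(df)       # 3 2
same_names <- identical(names(df), colnames(df))   # TRUE

# List-style subsetting
as_df   <- df["name"]      # a one-column data frame
as_vec  <- df[["name"]]    # the column as a character vector
by_name <- df$name         # same as [["name"]]

# Matrix-style subsetting with [i, j]
cell <- df[2, "name"]      # "b"
rows <- df[df$id > 1, ]    # rows where id is greater than 1
```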

Creating Data Frames

  • The data.frame() function constructs data frames
  • A data frame cannot be created using the list() function alone, even if the components are equal-length vectors
  • A list of equal-length vectors can be coerced to a data frame (for example, with as.data.frame())
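
A minimal sketch of the difference between a plain list and a data frame, assuming an illustrative list lst; as.data.frame() is the standard base R coercion function.

```r
lst <- list(x = 1:3, y = c("a", "b", "c"))
list_is_df <- is.data.frame(lst)     # FALSE: list() alone gives a plain list

df_from_list <- as.data.frame(lst)   # coerce the list to a data frame
coerced_is_df <- is.data.frame(df_from_list)   # TRUE

df_direct <- data.frame(x = 1:3, y = c("a", "b", "c"))
dim(df_direct)                       # 3 2
```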

More on Data Frames and String Values

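  • In R versions before 4.0.0, the data.frame() function converts string-valued vectors to factors by default; since R 4.0.0 the default is to keep them as character vectors
  • Use the stringsAsFactors = FALSE (or TRUE) option to set this behavior explicitly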
  • View() invokes a spreadsheet-style data viewer on a matrix-like R object
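
Setting stringsAsFactors explicitly gives the same result on any R version. The sketch below uses a hypothetical column g; View() is only mentioned in a comment because it opens an interactive window.

```r
df_chr <- data.frame(g = c("a", "b"), stringsAsFactors = FALSE)
chr_class <- class(df_chr$g)    # "character"

df_fct <- data.frame(g = c("a", "b"), stringsAsFactors = TRUE)
fct_class <- class(df_fct$g)    # "factor"

# In an interactive session, View(df_chr) would open the data viewer
```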

Binding Data Frames and Matrices

  • rbind() and cbind() are functions for matrices, but can be applied on objects with matrix-like characteristics (e.g. data frames)
  • By default, these functions return matrix objects
  • rbind() and cbind() return a data frame only if one or more inputs are data frames
  • data.frame() ensures the output object is a data frame
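
A sketch of how the return type depends on the inputs; all object names are illustrative.

```r
m <- matrix(1:4, nrow = 2)
df <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)

# Only matrices and vectors as input: the result is a matrix
m_bound <- cbind(m, 5:6)
is.matrix(m_bound)            # TRUE

# At least one data frame as input: the result is a data frame
df_cols <- cbind(df, z = c(TRUE, FALSE))
df_rows <- rbind(df, data.frame(x = 3L, y = "c", stringsAsFactors = FALSE))

# data.frame() converts a matrix to a data frame
df_from_m <- data.frame(m)
```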

Key Functions for Data Frames

  • Function head() returns the first n rows of a data frame
  • Function tail() returns the last n rows of a data frame
  • The function order() takes a vector as input and returns an integer vector of positions describing how the vector should be subset to put it in sorted order
  • The function subset() is a specialized shorthand function for subsetting data frames
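
These four functions can be sketched on a small data frame; df and its columns are invented for the example.

```r
df <- data.frame(id = 1:6, score = c(70, 95, 80, 60, 88, 75))

first_two <- head(df, 2)     # rows with id 1 and 2
last_two  <- tail(df, 2)     # rows with id 5 and 6

# order() gives the row positions that sort score in ascending order
idx <- order(df$score)       # 4 1 6 3 5 2
sorted_df <- df[idx, ]

# subset() filters rows by a condition written with bare column names
high <- subset(df, score > 80)   # rows with id 2 and 5
```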

Reading Data from Files

  • The function read.csv() imports CSV files into a data frame
  • Columns with character values are converted to factors by default in R versions before 4.0.0; the as.is and stringsAsFactors arguments control this behavior
  • The function setwd() sets or changes the current working directory
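
A self-contained sketch of reading a CSV file. Because the lesson names no data file, the example first writes a small temporary file with write.csv(); the file contents and the name tmp are assumptions for illustration.

```r
# Write a small CSV to a temporary file so the example can run anywhere
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, name = c("a", "b")), tmp, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps text columns as character
dat <- read.csv(tmp, stringsAsFactors = FALSE)
str(dat)

# getwd() shows the current working directory; setwd("some/path") would
# change it (the path here is a placeholder, not a real location)
getwd()

unlink(tmp)   # remove the temporary file
```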

Review: Relationship of Data Structures

  • Homogeneous structures start from atomic vectors, which become factors when given a levels attribute, or arrays when given a dimension attribute; an array with exactly two dimensions is a matrix
  • Heterogeneous structures start from lists; a list of equal-length vectors is a data frame, which has two dimensions and matrix-like characteristics

R Language Characteristics

  • R is both a functional and an object-oriented language
  • Everything computed is an object
  • Objects are structured to suit the goals of a computation
  • Each object belongs to a class, and every member of a class shares a set of properties
  • Polymorphic functions are functions that behave differently depending on the class of the object they are given
