Questions and Answers
In R, how are scalars treated?
Scalars are treated as vectors with a length of one.
What does the str() function do in R?
The str() function can be utilized to examine the structure of an object, providing a concise and easily understandable overview of any data structure in R.
Which of the following is a homogeneous data structure in R?
- Data Frame
- Recursive Vector
- Atomic Vector (correct)
- List
True or false: All elements in an atomic vector must share the same type or mode, except for NA and NaN.
Which of the following is NOT a type of atomic vector in R?
True or false: The function c() cannot be used to create vectors.
What does NaN signify in R?
True or false: The function is.na() returns TRUE exclusively for NaNs.
True or false: In R, string values should be surrounded by single quotes.
What is the purpose of the backslash (\) in a string value in R?
What is the function class() used for in R?
What is the function typeof() used for in R?
Fundamental operations in R are performed on an _____ by _____ basis.
What do bracket symbols [] serve as in R?
What is coercion in R?
What is the hierarchy from least to most flexible data type?
Match the following functions to their object conversions.
What is one commonly utilized attribute that can be added to an object in R?
What does the function names() do in R?
True or false: A factor is a numeric object created to represent categorical data.
What does the function is.factor() do in R?
What are lists?
True or false: Lists cannot include other lists.
How are lists created in R?
What will typeof() return for a list?
How can you select a part of a list?
What is the command used to modify and create factors?
How can you describe arrays and matrices?
Describe atomic vectors
How can the dimension attribute be accessed?
True or false: Arrays can only have 2 dimensions.
What is the result of a matrix with 2 dimensions?
What symbols are used for indexing or subsetting in a matrix?
How must the vectors be separated?
What may subsetting a matrix result in?
In matrices, what do rbind() and cbind() correspond to?
What do rbind() and cbind() functions do?
True or false: Matrix operations do not have conformability rules.
What is the structure of a data frame?
How are data frames constructed?
Data frames share properties from both the _____ and the _____.
What function is used to invoke a spreadsheet-style data viewer on a matrix-like R object?
What function is used to read data from CSV files?
Flashcards
Atomic Vectors
Simplest data structure in R where all elements share the same type or mode.
Logical Vectors
TRUE or FALSE values.
Integer Vectors
Numeric vectors with only whole numbers.
Numeric or Double Vectors
Character Vectors
c() function
seq() function
rep() function
sort() function
NA
NaN
String Value Quotes
Backslash (\)
\n
\t
cat() function
letters object
class() function
typeof() function
"is" functions
element-by-element operations
Vector Recycling
Bracket symbols []
"as" functions
Object properties
Custom attributes of Objects
attributes() function
names() function
Factors
factor()
levels()
is.factor()
as.factor()
relevel()
Lists
list() function
[ ] and $
[[]] Operator
Atomic vector
Arrays
Matrices
Study Notes
Important Data Structures in R
- Homogeneous data structures store contents of the same type, while heterogeneous structures can store contents of different types
- Atomic vectors and factors are 1-dimensional homogeneous structures
- Recursive vectors or lists are 1-dimensional heterogeneous structures
- Matrices are 2-dimensional homogeneous structures
- Data frames are 2-dimensional heterogeneous structures
- Arrays are N-dimensional homogeneous structures
- Scalars are treated as vectors with a length of one in R.
- The str() function gives a concise overview of a data structure in R
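The sketch below builds one toy object for each structure in the list above and inspects a few of them with str(). The object names (vec, fac, lst, mat, df, arr, sc) are illustrative only. It also shows that a "scalar" is simply a vector of length one.

```r
# Toy objects, one per structure listed above (names are illustrative)
vec <- c(1.5, 2.5, 3.5)                      # atomic vector: 1-d, homogeneous
fac <- factor(c("lo", "hi", "lo"))           # factor: 1-d, homogeneous
lst <- list(1L, "a", TRUE)                   # list: 1-d, heterogeneous
mat <- matrix(1:4, nrow = 2)                 # matrix: 2-d, homogeneous
df  <- data.frame(n = 1:2, s = c("a", "b"))  # data frame: 2-d, heterogeneous
arr <- array(1:8, dim = c(2, 2, 2))          # array: N-d (here 3-d), homogeneous

sc <- 42          # a "scalar"
length(sc)        # 1: it is simply a vector of length one
is.vector(sc)     # TRUE

# str() prints a compact, one-line-per-component overview
str(lst)
str(df)
```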
Atomic Vectors
- Atomic vectors are the simplest data structure in R
- All elements share the same type or mode; the exceptions are NA and NaN, which represent missing values (NA can appear in a vector of any type, NaN in numeric vectors)
- The mode or type signifies the basic nature of an object's fundamental constituent
Types of Atomic Vectors
- Logical vectors include TRUE, FALSE, T, F
- Integer vectors, such as 1L, 2L, 3L
- Numeric or double vectors, for instance: 1.00, 1.25, 1.50, 1.75, 5.00
- Character vectors include "a", "b", "c"
- c() creates vectors
- seq() creates regular sequences
- rep() creates vectors with repeated values
- sort() arranges the values in a vector in ascending order (by default)
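The following sketch creates one vector of each atomic type and exercises the helper functions listed above. The variable names (lgl, int, dbl, chr) are chosen only for illustration.

```r
# One vector of each atomic type
lgl <- c(TRUE, FALSE, T, F)
int <- c(1L, 2L, 3L)
dbl <- c(1.00, 1.25, 1.50, 1.75, 5.00)
chr <- c("a", "b", "c")

typeof(lgl)  # "logical"
typeof(int)  # "integer"
typeof(dbl)  # "double"
typeof(chr)  # "character"

# Helpers for building and arranging vectors
seq(1, 2, by = 0.25)                 # 1.00 1.25 1.50 1.75 2.00
rep(c("x", "y"), times = 2)          # "x" "y" "x" "y"
sort(c(3, 1, 2))                     # 1 2 3 (ascending by default)
sort(c(3, 1, 2), decreasing = TRUE)  # 3 2 1
```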
Missing Values
- NA denotes "not available" or a "missing value"
- NaN signifies "Not a Number", a missing value resulting from a numerical calculation (such as 0/0)
- is.na() returns TRUE for both NA and NaN
- is.nan() returns TRUE exclusively for NaNs
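This short sketch contrasts is.na() with is.nan(). The vector x is a made-up example, and 0/0 is used only to produce a NaN.

```r
x <- c(1, NA, 0/0, 4)  # 0/0 produces NaN
is.na(x)               # FALSE  TRUE  TRUE FALSE: NA and NaN both count
is.nan(x)              # FALSE FALSE  TRUE FALSE: only the NaN counts
```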
Character Values
- String values should be surrounded by either single or double quotes
- The backslash \ acts as an escape character: it is not itself stored in the string value but changes the meaning of the character after it (use \\ for a literal backslash)
- \n represents a newline, while \t indicates a tab
- \' stands for a single quote and \" for a double quote
- cat() concatenates string values and prints the result
- The object letters is a character vector with "a" through "z"
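Below is a minimal sketch of quoting, escape sequences, cat(), and letters. The sample strings are arbitrary.

```r
s1 <- 'single quotes'
s2 <- "double quotes"         # both quote styles create ordinary strings

cat("Line one\nLine two\n")   # \n starts a new line
cat("col1\tcol2\n")           # \t inserts a tab
cat('It\'s "quoted"\n')       # prints: It's "quoted"
cat("He said \"hi\"\n")       # prints: He said "hi"
cat("Hello", "world\n")       # prints: Hello world (separated by a space)

nchar("a\tb")                 # 3: the escape \t is a single character
letters[1:5]                  # "a" "b" "c" "d" "e"
length(letters)               # 26
```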
Class, Types and Tests
- Every object belongs to a class
- class() returns the class of an object
- A data structure is described by the type of its basic constituent
- typeof() determines the type or storage mode of any object
- For atomic vectors, the class usually matches the type; an exception is double vectors, whose class is "numeric" while typeof() returns "double"
- "is" functions check whether an object belongs to a specific type or class, such as is.character(), is.double(), is.integer(), is.logical(), is.atomic(), and is.numeric()
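This sketch shows where class() and typeof() agree and where they differ, along with a few "is" tests. The values are toy examples.

```r
x <- c(1.5, 2.5)
class(x)               # "numeric"
typeof(x)              # "double": class and type differ here
class(1:3)             # "integer"
typeof(1:3)            # "integer"
is.numeric(1:3)        # TRUE: integers also count as numeric
is.double(1:3)         # FALSE
is.character(letters)  # TRUE
is.atomic(x)           # TRUE
```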
Vector Operations
- Fundamental operations are performed on an element-by-element basis
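- When two vectors differ in size, elements from the smaller vector are "recycled" (reused from the start) until they match the length of the longer vector; R warns if the longer length is not a multiple of the shorter one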
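Here is a small illustration of element-by-element arithmetic and recycling. The vectors a and b are arbitrary.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)
a + b            # 11 22 33 44: element by element
a * b            # 10 40 90 160
a + c(100, 200)  # 101 202 103 204: the shorter vector is recycled
# a + c(1, 2, 3) also runs, but warns because 4 is not a multiple of 3
```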
Subsetting Vectors
- Bracket symbols [] index or subset data
- vec01 <- c(1.00,1.25,1.25,1.50,1.00,1.75) creates the vector vec01 with the values listed
- vec01[2] selects the second value
- vec01[-2] removes the second value
- vec01[1:3] selects consecutive values 1 to 3
- vec01[c(1,6)] selects the values in the first and sixth positions
- vec01[-c(1,6)] removes the values in the first and sixth positions
- vec01[c(T,T,F,T,F,F)] selects the values at positions marked TRUE
- vec01[vec01<=1.25] is conditional filtering for values less than or equal to 1.25
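The same vec01 examples can be run directly. The comments show the values each expression returns.

```r
vec01 <- c(1.00, 1.25, 1.25, 1.50, 1.00, 1.75)
vec01[2]                      # 1.25
vec01[-2]                     # 1.00 1.25 1.50 1.00 1.75
vec01[1:3]                    # 1.00 1.25 1.25
vec01[c(1, 6)]                # 1.00 1.75
vec01[-c(1, 6)]               # 1.25 1.25 1.50 1.00
vec01[c(T, T, F, T, F, F)]    # 1.00 1.25 1.50
vec01[vec01 <= 1.25]          # 1.00 1.25 1.25 1.00
```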
Modifying Parts of a Vector
- Values can be assigned to selected parts of a vector
- vec02 <- rep(NA, 5) creates a vector of five NA values
- vec02[c(1,4)] <- "YES" changes the first and fourth elements of vec02 to "YES"
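The vec02 example can also be run, with one extra detail: assigning a string to part of a logical vector coerces the whole vector to character.

```r
vec02 <- rep(NA, 5)
typeof(vec02)            # "logical": a bare NA is a logical value
vec02[c(1, 4)] <- "YES"  # assign to positions 1 and 4
vec02                    # "YES" NA NA "YES" NA
typeof(vec02)            # "character": the whole vector was coerced
```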
Coercion
- Every element in an atomic vector must share the same type
- Vectors of varying types will have elements converted to the most flexible type
- Hierarchy from least to most flexible data type: Logical -> Integer -> Double -> Character
- Coercion functions with the prefix "as" can convert an object into a particular class. Examples: as.integer(), as.double(), as.numeric(), as.character(), as.logical()
- Exercise caution when coercing into a more "specific" or "less flexible" class, as this may lead to the introduction of missing values (NA)
- Coercing a character vector to a numeric class can produce missing values
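This sketch shows implicit coercion along the hierarchy and an explicit "as" conversion that introduces NA. The input values are arbitrary.

```r
c(TRUE, 2L)     # 1 2: logical -> integer
c(1L, 2.5)      # 1.0 2.5: integer -> double
c(1.5, "a")     # "1.5" "a": double -> character

as.integer(c("1", "2", "three"))  # 1 2 NA, with a coercion warning
as.logical(c("TRUE", "yes"))      # TRUE NA: "yes" is not recognized
```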
Attributes
- Every object possesses both a length and a type or mode
- length() and typeof() assess these characteristics
- Objects can include custom attributes (such as variable names, category labels, dimensions) that serve to hold metadata
- The attributes() function returns a comprehensive list of all supplementary attributes assigned to an object
- A commonly utilized attribute that can be added to an object is "names"
- The function names() acts as an accessor for the names attribute
- There are several approaches to assigning names to vectors
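Here are three common ways to name a vector, each producing the same object. setNames() comes from the stats package, which R attaches by default. The vector names v1, v2, and v3 are illustrative.

```r
# 1. Name the elements while creating the vector
v1 <- c(a = 1, b = 2, c = 3)

# 2. Assign names afterwards with names()<-
v2 <- c(1, 2, 3)
names(v2) <- c("a", "b", "c")

# 3. setNames() returns a named copy
v3 <- setNames(c(1, 2, 3), c("a", "b", "c"))

names(v1)        # "a" "b" "c"
attributes(v2)   # a list with one element, $names
length(v3)       # 3
typeof(v3)       # "double"
```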
Factors
- Factors are integer vectors with additional attributes (levels)
- A factor is a vector object created to represent categorical data
- Factors are constructed on integer vectors and include extra attributes
- Factors can be coerced to integer or numeric classes
- Factors are restricted to a specified set of values known as levels
- Levels can be either unordered (nominal) or ordered (ordinal)
- Factors are essential in specific modeling processes
- factor() creates and modifies factors
- levels() returns the predefined categories of the factor object
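This sketch shows that a factor is an integer vector with a levels attribute, and how an ordered (ordinal) factor differs. The size labels are made up for illustration.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)       # "small" "medium" "large"
typeof(sizes)       # "integer": codes that point into the levels
as.integer(sizes)   # 1 3 2 1
attributes(sizes)   # $levels and $class ("factor")

# Ordered (ordinal) factor: comparisons follow the level order
ord <- factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE)
ord[1] < ord[2]     # TRUE
```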
Factors and Character Vectors
- Factors are useful when you know the categories or "levels" linked to a variable
- Factors cannot take on values beyond those specified by their levels
- There is a slight difference in how factors are displayed compared to character vectors
- is.factor() checks whether an object is a factor
- as.factor() coerces an object into a factor
- relevel() changes the first level, or reference level, of an unordered factor
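A brief sketch of the factor behaviors described above follows. The values are toy data, and the out-of-level assignment is deliberate: it triggers a warning rather than an error.

```r
f <- factor(c("b", "a", "c", "a"))
levels(f)               # "a" "b" "c": levels are sorted by default
f                       # printed without quotes, plus a "Levels:" line
f[2] <- "z"             # "z" is not a level: R warns and stores NA
is.factor(f)            # TRUE
is.factor(c("a", "b"))  # FALSE: a character vector is not a factor

g <- as.factor(c("x", "y", "x"))
levels(g)                      # "x" "y"
levels(relevel(g, ref = "y"))  # "y" "x": "y" is now the reference level
```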
Lists
- Lists are generic containers in R
- List or recursive vectors can hold elements of any type
- A list is considered "recursive" because it can include other lists
- Lists are created using the function list()
- typeof() returns "list" for lists; the storage mode is list type
- Other objects derived from lists, such as data frames, will also be categorized with list as their type
- The functions is.list() and as.list() identify or create lists
- A list may also have a names attribute, giving each element in the list a label
- The list$name notation selects a part of a list
Selecting a Part of a List
- The operators [] and $ are for subsetting
- The [[]] operator is particularly significant for working with lists
- [] returns a list, while [[]] accesses the actual content of a list element
- The $ notation is similar to [[]] when every element of the list has a name
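The sketch below contrasts [], [[]], and $ on a small named list. The list rec and its element names are hypothetical.

```r
rec <- list(id = 7L, tags = c("a", "b"), inner = list(x = 1))
typeof(rec)          # "list"
is.list(rec)         # TRUE
class(rec[2])        # "list": single brackets keep the list wrapper
rec[[2]]             # "a" "b": double brackets return the element itself
rec$tags             # same result as rec[["tags"]]
rec$inner$x          # 1: lists can contain lists (recursive)
typeof(data.frame(a = 1:2))  # "list": data frames are built on lists
```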
Review of Atomic Vectors
- Atomic vector is the simplest data structure in R.
- It is a collection of elements having the same type, and, by default, its only attributes are length and type or mode.
Arrays and Matrices
- Atomic vectors do not have a "dimension" attribute
- The dimension attribute can be accessed using the dim() function
- A matrix is an array with two dimensions
- When a dimension attribute is added to an atomic vector, the object becomes an array
- Arrays with three or more dimensions are rarely used in analyzing data
- The array() and matrix() functions are ways of creating arrays and matrices
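Here is a minimal sketch of the dimension attribute. Setting dim() on a plain vector turns it into a matrix. The class() comment assumes R 4.0.0 or later, where matrices also carry the "array" class.

```r
v <- 1:12
dim(v)                 # NULL: a plain atomic vector has no dimension attribute
dim(v) <- c(3, 4)      # adding the attribute turns v into a 3 x 4 matrix
class(v)               # "matrix" "array" (R 4.0.0 and later)

m <- matrix(1:6, nrow = 2)           # 2 x 3 matrix, filled column by column
a <- array(1:24, dim = c(2, 3, 4))   # 3-dimensional array
dim(m)                 # 2 3
dim(a)                 # 2 3 4
```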
Subsetting for Matrices
- The [] symbols are used for indexing or subsetting
- Since there are two dimensions, an ordered pair of vectors (numeric or logical), separated by a comma, must be specified
- The first vector corresponds to rows, and the second vector corresponds to columns
- Subsetting a matrix may result in a matrix-type or vector-type object, depending on the dimension of the output object
- Parts of a matrix can be transformed by combining the syntax for subsetting and assignment
- Conditional processing is available by subsetting in matrices
- An empty pair of brackets [] will return the whole matrix
- The order of numbers in brackets [] may change the arrangement of values
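The following sketch demonstrates the matrix subsetting rules above on a toy 2 x 3 matrix. A copy, m2, is used for the conditional assignment so that m stays unchanged.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3, filled column by column
m[2, 3]                      # 6: row 2, column 3
m[1, ]                       # 1 3 5: a single row drops to a vector
m[, 2:3]                     # still a 2 x 2 matrix
m[1, , drop = FALSE]         # keeps the 1 x 3 matrix shape
m[]                          # empty brackets: the whole matrix
m[c(2, 1), ]                 # rows in reversed order

m2 <- m
m2[m2 > 3] <- 0              # conditional assignment: values above 3 become 0
```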
Concatenating Matrices
- The c() function is used to create and/or concatenate vectors
- For matrices, the counterparts of the c() function are the rbind() and cbind() functions
- When rbind() is used on atomic vectors, the atomic vectors are treated as row vectors
- When cbind() is used on atomic vectors, the atomic vectors are treated as column vectors
- rbind() and cbind() have conformability conditions
- When working with matrices, the number of columns must be the same for rbind(), and the number of rows must be the same for cbind()
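This sketch shows how rbind() and cbind() treat atomic vectors, and illustrates the column-count rule for rbind(). The vectors x and y are arbitrary.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)
r <- rbind(x, y)            # 2 x 3: x and y become rows (named "x", "y")
k <- cbind(x, y)            # 3 x 2: x and y become columns
dim(r)                      # 2 3
dim(k)                      # 3 2
r3 <- rbind(r, c(7, 8, 9))  # fine: every piece has 3 columns
# rbind(r, k) fails: r has 3 columns but k has only 2
```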
Matrix Operations
- Basic arithmetic operations are applied element by element (the operation is repeated on each element)
- For transpose, t() can be used
- For matrix multiplication, %*% can be used
- For the inverse of a square matrix, solve() can be used
- For the determinant of a square matrix, det() can be used
- For eigenvalues and eigenvectors, eigen() can be used
- Matrix operations have conformability rules, so always check the dimensions of matrices
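The operations listed above are shown here on two toy 2 x 2 matrices. Floating-point results such as the inverse are compared with a tolerance rather than exactly.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # rows: (2, 1) and (1, 3)
B <- matrix(1:4, nrow = 2)             # rows: (1, 3) and (2, 4)

A * B            # element-by-element product
A %*% B          # matrix product: 2 x 2 times 2 x 2
t(B)             # transpose: rows become (1, 2) and (3, 4)
det(A)           # 5, i.e. 2*3 - 1*1
Ainv <- solve(A) # inverse of the square matrix A
A %*% Ainv       # the 2 x 2 identity matrix, up to rounding error
eigen(A)$values  # eigenvalues of A
# A %*% matrix(1:3) would fail: 2 columns cannot multiply 3 rows
```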
Attributes of a Matrix
- Matrices maintain the intrinsic properties of their vector counterpart: length and type
- Unlike atomic vectors, matrices have a dimension attribute
- The dim() function determines the dimension of the matrix, while ncol() and nrow() determine the number of columns and rows, respectively
- Matrices are homogeneous: coercion applies when atomic vectors of different types are bound together to form a matrix
- Matrices do not inherit names from their vector counterparts
- The counterparts of "names" for matrices are "rownames" and "colnames"
- Row and column names can be accessed using rownames() and colnames(), respectively
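This sketch illustrates the matrix attributes above: element names are dropped, dimensions and row and column names are added, and mixed types are coerced. The names r1, r2, c1, and c2 are illustrative.

```r
v <- c(a = 1, b = 2, c = 3, d = 4)
m <- matrix(v, nrow = 2)   # column-major: rows (1, 3) and (2, 4)
names(m)                   # NULL: the element names are not carried over
length(m)                  # 4: vector properties are kept
typeof(m)                  # "double"
nrow(m)                    # 2
ncol(m)                    # 2
rownames(m) <- c("r1", "r2")
colnames(m) <- c("c1", "c2")
m["r2", "c1"]              # 2

mixed <- cbind(1:2, c("x", "y"))   # integer and character columns
typeof(mixed)                      # "character": the numbers were coerced
```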
Data Frames
- Data frames are lists of equal-length vectors
- A recursive vector, or list, is a heterogeneous object: it can contain any type of object (vectors, lists, matrices, etc.)
- A list can be constructed using the list() function
More on Data Frames
- It is the most common way of storing data in R.
- Most methods for analysis would require a data frame as input.
- Data frames share properties from both the matrix and the list
- A data frame is a type of list of equal-length vectors
Data Frame Properties
- Data frames can have "names" attributes, meaning each vector can have a name
- Subsetting using [n], [[n]], or $ applies to data frames
- A data frame is a group of equal-length column vectors; it has two dimensions, just like a matrix
- Subsetting using two vectors [i,j] works on data frames
- Data frames have a value for dim(), and may have values for rownames() and colnames()
- colnames() and names() return the same values for data frames
Creating Data Frames
- The data.frame() function constructs data frames
- A data frame cannot be created using the list() function, even if the components are equal-length vectors
- A list of equal-length vectors can, however, be coerced to a data frame (for example with as.data.frame())
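The sketch below builds a small data frame and checks the list-like and matrix-like properties described above. The column names and values are toy data.

```r
df <- data.frame(id = 1:3, score = c(9.5, 7.0, 8.25), grp = c("a", "b", "a"))
typeof(df)                          # "list"
class(df)                           # "data.frame"
dim(df)                             # 3 3
identical(names(df), colnames(df))  # TRUE
df[2]                               # a one-column data frame (list-style)
df[[2]]                             # the score vector itself
df$score                            # same vector via $
df[df$grp == "a", c("id", "score")] # matrix-style [i, j] subsetting

lst <- list(id = 1:3, grp = c("a", "b", "a"))
class(lst)                          # "list": list() does not make a data frame
class(as.data.frame(lst))           # "data.frame" after coercion
```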
More on Data Frames and String Values
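- In R versions before 4.0.0, the data.frame() function converted string-valued vectors to factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE
- Set the stringsAsFactors argument (for example stringsAsFactors = FALSE) to choose the behavior explicitly
- View() invokes a spreadsheet-style data viewer on a matrix-like R object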
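Setting stringsAsFactors explicitly makes the result independent of the R version's default, as this short sketch shows. View() is commented out because it needs an interactive session.

```r
df1 <- data.frame(grp = c("a", "b"), stringsAsFactors = TRUE)
df2 <- data.frame(grp = c("a", "b"), stringsAsFactors = FALSE)
class(df1$grp)   # "factor"
class(df2$grp)   # "character"
# View(df1)      # opens the spreadsheet-style viewer (interactive sessions)
```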
Binding Data Frames and Matrices
- rbind() and cbind() are functions for matrices, but can be applied to objects with matrix-like characteristics (e.g. data frames)
- By default, these functions return matrix objects
- rbind() and cbind() return a data frame only if one or more inputs are data frames
- data.frame() ensures the output object is a data frame
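This sketch checks which class rbind(), cbind(), and data.frame() return for different inputs. The objects are toy examples.

```r
x <- 1:2
y <- c("a", "b")
class(cbind(x, y))                     # "matrix" "array": numbers coerced to character
df <- data.frame(x = 1:2)
class(cbind(df, y))                    # "data.frame": one input is a data frame
class(rbind(df, data.frame(x = 3L)))   # "data.frame"
class(data.frame(x, y))                # "data.frame"
```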
Key Functions for Data Frames
- Function head() returns the first n rows of a data frame (n = 6 by default)
- Function tail() returns the last n rows of a data frame
- The function order() takes a vector as input and returns an integer vector of positions describing how the vector should be rearranged into sorted order; it is commonly used to sort the rows of a data frame
- The function subset() is a specialized shorthand function for subsetting data frames
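Here are the key data frame functions applied to a small made-up table. The names and ages are toy data.

```r
df <- data.frame(name = c("Ana", "Ben", "Cy", "Di"),
                 age  = c(31, 25, 40, 25))
head(df, n = 2)                      # first two rows
tail(df, n = 1)                      # last row
order(df$age)                        # 2 4 1 3: ties keep their original order
df[order(df$age), ]                  # rows sorted by age
subset(df, age > 30, select = name)  # rows for Ana and Cy, name column only
```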
Reading Data from Files
- The function read.csv() imports CSV files into a data frame
- In R versions before 4.0.0, columns with character values were saved as factors by default unless the as.is or stringsAsFactors argument was used; since R 4.0.0 the default is stringsAsFactors = FALSE
- The function setwd() sets or changes the current working directory
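This self-contained sketch writes a tiny CSV to a temporary file and reads it back, so no real data file is assumed. The file contents are made up. setwd() is not called, to avoid changing the session's directory.

```r
path <- tempfile(fileext = ".csv")            # temporary file for the demo
writeLines(c("id,grp", "1,a", "2,b"), path)   # hypothetical CSV contents
dat <- read.csv(path, stringsAsFactors = TRUE)
str(dat)          # 'data.frame': 2 obs. of 2 variables
class(dat$grp)    # "factor", because stringsAsFactors = TRUE
getwd()           # current working directory; setwd() would change it
```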
Review: Relationship of Data Structures
- Homogeneous structures start from atomic vectors, which become factors when given a levels attribute, or arrays when given a dimension attribute; an array with exactly two dimensions is a matrix
- Heterogeneous structures are lists, which relate to data frames: lists with two dimensions and matrix-like characteristics
R Language Characteristics
- R is both a functional and an object-oriented language
- Everything computed is an object
- Objects structure themselves to suit the goals of computations
- Each object belongs to a class, and every member of a class shares a set of properties
- Polymorphic functions are functions that behave differently depending on the class of their input (for example print() and summary())
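As a small illustration of polymorphism, summary() produces a different kind of result depending on the class of its input. The inputs are toy vectors.

```r
x_num <- c(1, 2, 3, 4)
x_fac <- factor(c("a", "b", "a", "a"))
summary(x_num)   # numeric input: Min., quartiles, Median, Mean, Max.
summary(x_fac)   # factor input: counts per level (a = 3, b = 1)
class(x_num)     # "numeric"
class(x_fac)     # "factor"
```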