Questions and Answers
Which statement best describes R's capability as a tool?
- R can evaluate complicated mathematical expressions. (correct)
- R is solely for data visualization.
- R is primarily a word processor.
- R can only handle statistical analysis.
R is used only for reading files and does not perform any calculations.
False (B)
What are the two main roles R serves as mentioned in the introduction?
Calculator and data analysis tool
In R, the term ______ refers to the various types of data that can exist within an object.
Match the following features of R with their descriptions:
How can R be utilized as a calculator?
What is the significance of functions in R?
What types of objects can be examined in R?
Describe the process of reading files in R.
What is one simple and useful function in R and its purpose?
Flashcards
R as a calculator
R can perform mathematical calculations including complex expressions.
R functions
R provides pre-built blocks of code to perform specific tasks.
Reading files in R
Functions allow importing data from external files into R.
Data types in R
Exploring R objects
What is R?
Functions in R
Reading data in R
Types of objects in R
Examining an object
Study Notes
Introduction to R
- R is a calculator
- R can evaluate complex mathematical expressions
- Variables are assigned using the <- operator (e.g., x <- 10)
- Basic arithmetic operations and functions are available (e.g., 1+1, sqrt())
- Functions for creating sequences (seq(), rep())
- Functions for calculating absolute values (abs())
- Scientific notation shifts the decimal point (e.g., 1e2 is 100, 1e-2 is 0.01)
- Element-wise product can be performed with a*b; u*v
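A minimal sketch of the calculator features listed above. The variable names and the sample vectors a and b are illustrative assumptions, not values from the notes.

```r
# Assign a value to a variable with the <- operator
x <- 10

# Basic arithmetic and built-in math functions
total <- 1 + 1          # addition
root  <- sqrt(x * 10)   # square root of 100
dist  <- abs(-3.5)      # absolute value

# Sequences and repetition
s <- seq(1, 9, by = 2)  # odd numbers from 1 to 9
r <- rep(0, 3)          # the value 0 repeated three times

# Scientific notation shifts the decimal point
big   <- 5e2            # 500
small <- 5e-2           # 0.05

# Element-wise product of two vectors of equal length (hypothetical a and b)
a <- c(1, 2, 3)
b <- c(4, 5, 6)
prod_ab <- a * b        # multiplies matching positions
```

Each right-hand side can also be typed directly at the console, where R prints the result immediately.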
Functions in R
- R has built-in functions for performing various tasks, including mathematical calculations.
- Functions can be used to perform simple arithmetic operations or complex analyses.
Reading Files in R
- read.delim() is used for tab-separated files (.txt). The default decimal separator is ".".
- read.table() reads files in tabular format to create a data frame.
- read.csv() reads comma-separated values files (.csv) into a data frame.
- read.csv2() is used when the decimal separator is "," and the field separator is ";".
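The difference between the readers is easiest to see on a file with decimal commas. The sketch below writes a small temporary file first so that it is self-contained; the column names and values are made up for illustration.

```r
# Write a small semicolon-separated file with decimal commas to a temporary path
path <- tempfile(fileext = ".csv")
writeLines(c("id;weight", "1;70,5", "2;82,3"), path)

# read.csv2() assumes sep = ";" and dec = ","
df <- read.csv2(path)
str(df)

# The same file through read.table(), with the separators spelled out
df2 <- read.table(path, header = TRUE, sep = ";", dec = ",")
```

Reading this file with read.csv() instead would put each whole line into a single column, because no comma-separated fields would be found.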
Exploring Data
- View(x) displays the data frame's contents.
- head(x) shows the top 6 rows; head(x, n=n) shows the first n rows.
- tail(x) displays the last 6 rows; tail(x, n=n) shows the last n rows.
- names(x) shows the names of the variables in the data frame.
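As a quick illustration, the sketch below uses R's built-in mtcars data set as a stand-in for x; any data frame would work the same way.

```r
x <- mtcars                    # built-in example data frame, stand-in for your own data

first_six  <- head(x)          # first 6 rows by default
first_ten  <- head(x, n = 10)  # first 10 rows
last_three <- tail(x, n = 3)   # last 3 rows
vars       <- names(x)         # column (variable) names

# View(x) opens a spreadsheet-style viewer; it is meant for interactive sessions
```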
Types of data in R
- R supports several basic data types (numeric, character and logical) and data structures (vectors, matrices and data frames).
Data Types
- Scalars: Single value (e.g., a <- 5)
- Vectors: Multiple values (e.g., v <- c(1,2,3))
- Matrices: Two-dimensional array of values (e.g., m <- matrix(v, 3, 2))
- Lists: Can contain varied data types (e.g., q <- list(a=v, b=x, c=u))
- Data frames: Table-like structure, typically used for storing data with different variable types.
Data Structures
- Vectors store multiple values of the same type.
- Matrices are two-dimensional structures.
- Lists can contain elements of different data types.
- Data frames are tabular structures, organized into rows and columns.
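The structures described above can be built and inspected as follows. This is a small sketch; the list contents and the data frame columns are illustrative choices.

```r
a <- 5                                   # scalar: in R, a vector of length one
v <- c(1, 2, 3)                          # numeric vector
m <- matrix(1:6, nrow = 3, ncol = 2)     # 3 x 2 matrix, filled column by column
q <- list(a = v, b = "text", c = TRUE)   # list mixing numeric, character and logical
df <- data.frame(name   = c("Ahmed", "Laila"),
                 smoker = c(TRUE, FALSE)) # data frame with columns of different types

# A vector holds one type only, so mixed input is coerced to a common type
mixed <- c(1, "a", TRUE)                 # every element becomes a character string

class(v); class(m); class(q); class(df); class(mixed)
```

The coercion in mixed is the practical reason to reach for a list or a data frame when values of different types need to stay together.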
Operations (Arithmetic)
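- 3 + exp(4) * 2^2
- (3 + exp(4)) * 2^2
- R follows operator precedence rules when evaluating expressions: exponentiation is applied before multiplication, multiplication before addition, and parentheses override that order, so the two expressions above give different results.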
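A short sketch comparing the two expressions; the variable names are illustrative.

```r
# Without parentheses: 2^2 is evaluated first, then the product, then the sum
without_parens <- 3 + exp(4) * 2^2

# With parentheses: the sum is evaluated first, then multiplied by 2^2
with_parens <- (3 + exp(4)) * 2^2

# The two results differ by 3 * 2^2 - 3
difference <- with_parens - without_parens
print(c(without_parens, with_parens, difference))
```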
Data Frames
- A table-like structure with different variable types.
Data Merging
- cbind() merges by columns.
- rbind() merges by rows.
- merge() joins two datasets based on a common variable.
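The three functions behave quite differently, which a small example makes concrete. The data frames patients and visits below are hypothetical.

```r
patients <- data.frame(id = c(1, 2, 3), age = c(34, 51, 29))
visits   <- data.frame(id = c(2, 3, 4), visits = c(5, 1, 2))

# cbind(): add a column; the new column must have one value per existing row
wide <- cbind(patients, smoker = c(TRUE, FALSE, FALSE))

# rbind(): add a row; the column names must match
tall <- rbind(patients, data.frame(id = 4, age = 40))

# merge(): join on the shared variable; by default only ids found in both are kept
joined <- merge(patients, visits, by = "id")

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA
all_rows <- merge(patients, visits, by = "id", all = TRUE)
```

Unlike cbind(), merge() matches rows by value rather than by position, so the two inputs do not need to be in the same order.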
Data Manipulation
- na.omit() removes rows with NA (missing) values.
- complete.cases() returns TRUE for each row without missing values, which can be used to filter out incomplete rows.
- apply, colMeans, rowMeans and mean are used to calculate and analyze data.
- summary() provides summary statistics for data frames or vectors.
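A minimal sketch of these functions on a made-up data frame with two missing values.

```r
df <- data.frame(height = c(170, NA, 182, 165),
                 weight = c(65, 80, NA, 58))

keep   <- complete.cases(df)   # logical vector: TRUE where a row has no NA
clean1 <- df[keep, ]           # filter rows using that vector
clean2 <- na.omit(df)          # same rows, dropped in one step

col_means <- colMeans(df, na.rm = TRUE)  # per-column mean, ignoring NA
row_means <- rowMeans(clean1)            # per-row mean of the complete rows
col_max   <- apply(clean1, 2, max)       # apply any function over columns (margin 2)

summary(df)                    # quartiles, mean and NA count per column
```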
Functions Creation
- User-defined functions are created in R using the function() syntax.
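As an illustration, the sketch below defines a hypothetical helper, cv(), that computes the coefficient of variation. The name and the na.rm default are choices made for this example.

```r
# Hypothetical helper: coefficient of variation (standard deviation / mean)
cv <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# The value of the last expression in the body is returned
result <- cv(c(2, 4, 4, 4, 5, 5, 7, 9))
print(result)
```

Once defined, cv() is called like any built-in function, and arguments with defaults (here na.rm) can be omitted.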
Data Types in R (continued)
- Character data: Text data (e.g., name <- c("Ahmed", "Laila")).
- Logical data: TRUE/FALSE values (e.g., smoker <- c(TRUE, FALSE, FALSE)).
- Numeric data stores numbers.
- Ordering variables: sort(c(4,2,6))
- colnames(m)=paste("X",1:ncol(m),sep="") renames the column names.
- rownames(m)=1:nrow(m) renames the row names.
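A short sketch of sorting and renaming; the matrix m here is a small made-up example.

```r
# sort() orders values ascending by default
sorted      <- sort(c(4, 2, 6))
sorted_desc <- sort(c(4, 2, 6), decreasing = TRUE)

# Rename the columns of a matrix to X1, X2, ... and number its rows
m <- matrix(1:6, nrow = 3, ncol = 2)
colnames(m) <- paste("X", 1:ncol(m), sep = "")
rownames(m) <- 1:nrow(m)
print(m)
```

Row and column names are stored as character strings, so the row names set above are "1", "2" and "3" rather than numbers.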
Data Preparation
- Data import from different formats: read.csv, read.table.
Missing Values Analysis
- is.na(data) identifies missing values.
- sum(is.na(data)) gives the total count of missing values; apply(is.na(data), 2, sum) gives the count by column.
- Common functions include na.omit(), complete.cases() and colSums(is.na(data)) to find or drop the missing values in a column or data set.
- Handling missing values, including replacing them with imputation methods (mean, median etc).
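The counting and imputation steps can be sketched as follows. The data frame is made up, and mean imputation is shown only as one of the simple options mentioned above.

```r
data <- data.frame(age = c(25, NA, 40, 31),
                   bmi = c(22.5, 27.1, NA, NA))

total_missing <- sum(is.na(data))            # missing values in the whole data frame
per_column    <- colSums(is.na(data))        # missing values per column
per_column2   <- apply(is.na(data), 2, sum)  # same counts via apply()

# Mean imputation: replace each NA in a column with that column's observed mean
imputed <- data
imputed$age[is.na(imputed$age)] <- mean(data$age, na.rm = TRUE)
```

Note that the counts are taken on is.na(data), not on data itself; summing data directly would add up the values instead of counting the gaps.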
Outliers
- Identifying: boxplot(data).
- Removal: data[data > bench] selects the values above a benchmark bench; data[data <= bench] keeps the rest.
- Filtering data using calculated quartiles (Q1, Q3, IQR).
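One way to turn the quartiles into a filter is sketched below. It assumes the common 1.5 x IQR fence convention, which the notes do not specify, and the values are made up.

```r
values <- c(12, 14, 15, 15, 16, 18, 19, 21, 95)

q1  <- unname(quantile(values, 0.25))  # first quartile
q3  <- unname(quantile(values, 0.75))  # third quartile
iqr <- IQR(values)                     # q3 - q1

# Assumed convention: anything beyond 1.5 * IQR from the quartiles is an outlier
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

outliers <- values[values < lower | values > upper]
kept     <- values[values >= lower & values <= upper]
```

The same fences are what boxplot() uses by default to decide which points to draw individually.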
Descriptive Statistics
- Calculating measures like mean, median, min, max, range, IQR, standard deviation, variance.
- Using functions such as mean(), median(), min(), max(), range(), IQR(), sd(), var().
- Determining the mode: Mode(), available in the DescTools library.
- summary() summarises numeric, logical and/or factor variables.
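A minimal sketch of these measures on a made-up vector. The mode is computed here with base R's table() as a stand-in, so that the example runs without installing DescTools.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

m   <- mean(x)
med <- median(x)
rng <- range(x)    # smallest and largest value
iqr <- IQR(x)
s   <- sd(x)       # sample standard deviation (divides by n - 1)
v   <- var(x)      # sample variance

# Mode without extra packages: the value(s) with the highest frequency
counts <- table(x)
mode_x <- as.numeric(names(counts)[counts == max(counts)])

summary(x)
```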
Tables and Plots
- Creating frequency tables, contingency tables (crosstabulations)
- Creating histograms to plot data (including specific quantitative or qualitative variable(s)).
- Constructing boxplots to identify data distribution.
- Using table(), chisq.test(), fisher.test(), oddsratio(), assocstats(), and cor().
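Frequency tables, histograms and boxplots can be sketched with the built-in mtcars data set, used here purely as example data. The plot = FALSE argument makes hist() and boxplot() return their computed values instead of drawing; drop it to see the actual plots.

```r
# Frequency table of one qualitative variable (number of cylinders)
freq <- table(mtcars$cyl)

# Contingency table (crosstabulation) of two variables
cross <- table(cylinders = mtcars$cyl, gears = mtcars$gear)

# Histogram of a quantitative variable: returns the bin breaks and counts
h <- hist(mtcars$mpg, plot = FALSE)

# Boxplot: returns the five summary values (lower whisker, hinges, median, upper whisker)
b <- boxplot(mtcars$mpg, plot = FALSE)

print(freq)
print(cross)
```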
Association and Correlation
- Calculate and display correlations of variables.
- Create contingency tables for association analysis, using measures like odds ratio, relative risk or chi-squared test.
- Relevant libraries: vcd, vcdExtra and nnet.
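A minimal sketch of association and correlation checks, using only base R functions and the built-in mtcars data set as example data. The oddsratio() and assocstats() functions come from the add-on package vcd and are left out so that the example runs without extra installs.

```r
# 2 x 2 contingency table: transmission type against engine shape
tab <- table(transmission = mtcars$am, engine = mtcars$vs)

# Chi-squared test of independence
chi <- chisq.test(tab)

# Fisher's exact test; for a 2 x 2 table it also reports an odds ratio estimate
fis <- fisher.test(tab)

# Pearson correlation between two numeric variables
r <- cor(mtcars$mpg, mtcars$wt)

print(tab)
print(chi$p.value)
print(fis$estimate)
print(r)
```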
Data Manipulation & Visualisation
- Data manipulation using data.frame(), cbind(), rbind(), dplyr.
- Scatter plots: The plot() function can be used.
Additional Considerations:
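- Load the necessary libraries at the start of your script using the library() command.
- Use attach() to make variables in a data frame directly accessible.
- Appropriately handle data types when using specific functions (e.g., converting factors to numerical values with as.numeric()). Note that as.numeric() applied directly to a factor returns its internal level codes; use as.numeric(as.character(f)) to recover the original numbers.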
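The factor conversion is a common stumbling block, so here is a small sketch with a made-up factor f.

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "10", "30"))

# as.numeric() on the factor itself gives the internal level codes
codes <- as.numeric(f)

# Converting to character first recovers the numbers shown in the labels
values <- as.numeric(as.character(f))

print(codes)
print(values)
```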