week03.pdf
La Trobe University
CSE5DEV Data Exploration and Analysis
Week 3: Data Wrangling & R Programming

Overview
1 CSE5DEV Syllabus
2 Week Overview
3 Data Wrangling
4 Basics of R Programming

Subject Syllabus
Lecture 1: Introduction
Lecture 2: Data Collection & R Programming
Lecture 3: Data Wrangling & R Programming
Lecture 4: Data Cleaning & Normalisation
Lecture 5: Data Visualisation
Lecture 6: Data Exploration 1 (Analysis)
Lecture 7: Data Exploration 2 (Analysis)
Lecture 8: Data Exploration 3 (Analysis)
Lecture 9: Correlation & Pattern Discovery (Analysis)
Lecture 10: Case Study 1
Lecture 11: Case Study 2
Lecture 12: Revision

Data Science Project
Almost all data science and analysis projects go through the same set of stages:
Stage 1: Identify the problem (question)
• What is the goal?
• What do you want to estimate?
• How to track house prices across different areas?
Stage 2: Collect & prepare the data
• Data resources
• Data representation
• Clean and normalise the data
Stage 3: Explore the data
• Descriptive statistics
• Visualisation
• Does the result make sense?
Stage 4: Communicate the results
• What are the findings?
• What did we learn?
• Report the findings

Week 3 Overview: Data Wrangling & R Programming
This week covers the basics of data wrangling and R programming.
Learning outcomes:
• Learn about data representation.
• Learn how to convert data from one format to another.
• Learn R conditional statements.
• Learn how to use R packages.
What have we learned so far?
Data can come in different formats, but a computer program expects your data to be organised in a well-defined structure.

Theory
• Data collection: working with data
1 Data sources: PC, internet, external.
2 Data formats: text, CSV, URL, etc.
3 Data values: qualitative or quantitative.
4 Data categories: experimental or observational.

R Programming
1 Install R and RStudio, create an R Markdown file, write and run basic code, etc.
2 Data types and data structures (vector, factor, matrix and data frame).
3 View, access, change, etc.
4 Import data into the R environment (text files and CSV files).

Note
The above steps (reading, viewing, accessing, changing, etc.) are crucial for Lectures 3 to 11. If you DON'T know how to perform them in R, please let us know as soon as possible.

Data Wrangling
Data wrangling can be defined as the process of organising data in a consistent representation or format that can be easily used and presented.
Figure: CSV file → R code in the RStudio environment (import CSV file, view data, data type, data structure, access data) → transform data into a readable format.

Data Wrangling
Example: Consider the country population dataset (data1.csv). The same data can be organised in different representations, as shown in the next slides.
Example: format-1.
Example: format-2.
Example: format-3.
Example: format-4.
Example: format-5.

Data Wrangling
From the previous examples, we have seen that
• The same data can be organised in different representations or formats.
• Each format shows the same values of four variables: country, year, population and cases.
• Different formats show the values in different representations.

Data Wrangling
Q: What type of representation will be used in CSE5DEV labs?
A: Tabular representation (observations-by-features).
Figure: Image from R for Data Science

Data Wrangling: Tabular representation
In CSE5DEV, we use the data frame data structure.
Figure: Image from R for Data Science

Data Wrangling: Tabular representation
Organising data as observations-by-features is considered the most convenient and standard representation for data analysis.

Data Wrangling: Tabular data
Types of features/attributes:
It is important to recognise the types of values each feature/attribute takes in order to understand which operations make sense for it.
Example
• Can we compute an average eye colour?
• How do we compute the difference between phone numbers?
• Can we say today is 'twice as hot/cold' as yesterday?
This is similar to problems like 6 apples / 4 people = 1.5 apples per person, but 10 people / 4 car seats = 3 cars.

Data Wrangling: Tabular data
Qualitative vs. quantitative attributes:
Attribute values can be split into two types:
Qualitative attributes
Attributes that take values from a (finite) set of categories are called categorical or qualitative attributes. In some sense, they describe an object/observation rather than measure its properties.
Quantitative attributes
Attributes that represent quantities are called numerical or quantitative attributes. They provide concrete, quantifiable measurements of an object/observation.

Data Wrangling: Tabular data
Qualitative: nominal vs. ordinal:
Qualitative attributes can be split further into two types:
Nominal attributes
Examples: zip codes, eye colour, operating system, gender.
Values of such attributes just specify names, without any particular order or relation between them (except for = and ≠). Binary attributes are nominal attributes with only two values (Yes/No or 0/1). They can be symmetric or asymmetric, based on whether or not their values are equally informative.
Ordinal attributes
Examples: ratings, grades, street/avenue numbers.
Values of such attributes have some order, even though they don't specify an exact quantity.

Data Wrangling: Tabular data
Quantitative: interval vs. ratio:
Quantitative attributes can also be split into two types:
Interval attributes
Examples: calendar dates, azimuth direction, Fahrenheit temperatures.
Such attributes represent quantities with meaningful differences (or fixed intervals) between their values, but no multiplicative relations.
Ratio attributes
Examples: mass, length, distance, currency, age, electrical current.
Such attributes represent quantities that have meaningful ratios between their values. Unlike interval attributes, ratio ones usually have an 'absolute zero'.
We can also split quantitative attributes into discrete and continuous ones. All qualitative attributes are considered discrete.

Data Wrangling: Tabular data
Summary of attribute types:
The types of attributes can be characterised by the operations that can be applied to them:
• Comparison (= and ≠): every type
• Ordering (> and <): every type except nominal
• Differences (-) and addition (+): only quantitative
• Division (/) and multiplication (×, ·): only ratio
Other operations (e.g., mean, median, correlation) may also be inapplicable to some types while applicable to others.

Data Wrangling: Tabular data
Technical formats:
Tabular data can be stored or collected in several standard formats, such as:
• Comma-separated values (CSV) file
• Flat file or delimited text file (e.g., space or tab delimited)
• XML or other log files
• Proprietary formats (e.g., FCS for biological data or MAT files for Matlab data)
• Database tables
Non-tabular data: transactional data (term matrix, text documents), structured signals, multidimensional signals, nonparametric representations.

Data Wrangling: Tabular representation
In a tabular representation, we need to make sure that
• Each variable has its own column.
• Each observation has its own row.
• Each value has its own cell.
Figure: Image from R for Data Science

Data Wrangling: Tabular representation
If the data is not in a tabular representation, then we need to perform a couple of processes to convert it into one. Examples of the processes are:
• Gathering and Spreading.
• Separating and Uniting.
• Filtering.
• Grouping.
• Mutating.

Data Wrangling: Tabular representation
Example: Gathering process - gather columns into a new pair of variables.
Figure: Image from R for Data Science
• gather(data, key, value, ...)
• data is the data frame you are working with.
• key is the name of the key column to create.
• value is the name of the value column to create.
• ... is a way to specify which columns to gather from.
R code: gather() function
    gather(data, "year", "cases", 2:3)

Data Wrangling: Tabular representation
Example: Spreading process - spreading is the opposite of gathering.
Figure: Image from R for Data Science
• spread(data, key, value)
• data is your data of interest.
• key is the column whose values will become variable names.
• value is the column whose values will fill in under the new variables created from key.
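As a rough sketch of how gathering and spreading relate, the snippet below builds a small wide table, gathers it into long form, and spreads it back. It assumes the tidyr package is installed. The data frame `wide` and all of its values are made up for illustration; they are not the contents of data1.csv.

```r
library(tidyr)  # assumed to be installed

# Hypothetical wide table: one column of cases per year (made-up values)
wide <- data.frame(
  country = c("A", "B", "C"),
  `1999` = c(10, 20, 30),
  `2000` = c(15, 25, 35),
  check.names = FALSE,       # keep "1999" and "2000" as column names
  stringsAsFactors = FALSE
)

# Gather columns 2 and 3 into a key column "year" and a value column "cases"
long <- gather(wide, "year", "cases", 2:3)

# Spread the key/value pair back into one column per year
back <- spread(long, year, cases)
```

Here `long` has one row per country and year, while `back` has the same layout as `wide`. Recent tidyr releases document pivot_longer() and pivot_wider() as the successors of gather() and spread(); the older functions still work and are the ones used in these slides.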
R code: spread() function
    spread(data, key, value)
Figure: Image from R for Data Science

Data Wrangling: Tabular representation
Example: Separating process - pulls apart one column into multiple columns, by splitting wherever a separator character appears.
Figure: Image from R for Data Science
• separate(data, col, into, sep)
• data is the data frame of interest.
• col is the column that needs to be separated.
• into is a vector of names of the columns the data will be separated into.
• sep is the character at which you want to split the data.
R code: separate() function
    separate(data, rate, c("cases", "population"), sep = "/")

Data Wrangling: Tabular representation
Example: Uniting process - the inverse of separate. It combines multiple columns into a single column.
Figure: Image from R for Data Science
• unite(data, col, ..., sep)
• data is the data frame of interest.
• col is the name of the column you wish to add.
• ... is the names of the columns you wish to unite.
• sep is how you wish to join the data in the columns.
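To make the separate and unite descriptions above concrete, here is a minimal sketch that splits a combined column and then joins it again. It assumes tidyr is installed; the data frame `df` and its values are hypothetical.

```r
library(tidyr)  # assumed to be installed

# Hypothetical table where "rate" stores cases and population as one string
df <- data.frame(
  country = c("A", "B"),
  rate = c("10/1000", "25/2000"),
  stringsAsFactors = FALSE
)

# Split rate at "/" into two new columns (both stay character by default)
parts <- separate(df, rate, into = c("cases", "population"), sep = "/")

# Join the two columns again, using the same separator
joined <- unite(parts, "rate", cases, population, sep = "/")
```

Because separate() leaves the new columns as character strings, they need converting (for example with as.numeric(), or by passing convert = TRUE to separate()) before any arithmetic.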
R code: unite() function
    unite(data, "year", century, year, sep = "")
Figure: Image from R for Data Science

Data Wrangling: Five main verbs
• Select - select variables by their names.
• Filter - choose rows that satisfy some criteria.
• Arrange - reorder the rows.
• Mutate - create transformed or derived variables.
• Summarise - collapse rows down to summaries.

Overview
5 Basics of R Programming

Basics of R Programming
In previous lectures, we have learned
• How to read data from a file.
• Variables, variable names and data types.
• Data structures: vector, factor, matrix and data frame.
• View, access, change, etc.
    dat <- read.csv("data.csv", header = TRUE, sep = ",")
• names() - shows the column names of a data frame.
• head() - shows the first 6 rows.
• tail() - shows the last 6 rows.
• dim() - returns the dimensions of a data frame.
• nrow() - number of rows.
• ncol() - number of columns.
• str() - structure of a data frame: name, type and preview of the data in each column.
• sapply(dataframe, class) - shows the class of each column in the data frame.

Basics of R Programming
In this lecture, we will learn how to write R code for the following tasks:
• Logical conditions to select subsets
• Conditional execution: if statements
• Repetitive execution: for loops, repeat and while
• Packages
• Format transformation

Basics of R Programming: View data
Example: read data from file.
    dat <- read.csv("data.csv", header = TRUE, sep = ",")
    names(dat)
    ##  [1] "Model" "mpg"   "cyl"   "disp"  "hp"    "drat"  "wt"    "qsec"  "vs"
    ## [10] "am"    "gear"  "carb"
    head(dat)
    ##               Model  mpg cyl disp  hp drat    wt  qsec vs am gear carb
    ## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    ## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    ## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    ## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    ## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    ## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
• We may need to extract data that satisfy certain criteria.
• For example, we may want to select the rows whose disp value is less than or equal to 160.
• We can use logical condition operators to select a subset of the data.

Logical condition operators: Conditional operators
Conditional operators are used to compare values or expressions. They return TRUE (1) or FALSE (0).

Logical condition operators: Conditional operators
Examples: Conditional operators for two variables: x and y.
    x <- 4
    y <- 15
    x < y
    ## [1] TRUE
    x > y
    ## [1] FALSE
    x <= 5
    ## [1] TRUE
    y >= 20
    ## [1] FALSE
    y == 16
    ## [1] FALSE
    x != 5
    ## [1] TRUE

Logical condition operators: Conditional operators
Examples: Conditional operators for a vector x
    x <- c(3, 5, 1, 2, 7, 6, 4)
    x < 5   # is x less than 5
    ## [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE
    x <= 5  # is x less than or equal to 5
    ## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
    x > 3   # is x greater than 3
    ## [1] FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
    x >= 3  # is x greater than or equal to 3
    ## [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
    x == 2  # is x equal to 2
    ## [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
    x != 2  # is x not equal to 2
    ## [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

Logical condition operators: Conditional operators
Useful functions: all, any and which
• The all and any functions check whether all entries, or at least one entry, of a logical vector are TRUE, respectively.
    x <- c(3, 5, 1, 2, 7, 6, 4)
    any(x == 2)
    ## [1] TRUE
    all(x == 2)
    ## [1] FALSE
    all(x < 10)
    ## [1] TRUE
• The function which returns the indices of the entries that are TRUE.
    x <- c(3, 5, 1, 2, 7, 6, 4)
    which(x == 2)  # the fourth element of x is equal to 2
    ## [1] 4
    which(x < 3)   # the third and fourth elements of x are less than 3
    ## [1] 3 4
    y <- which(x < 3)
    print(y)
    ## [1] 3 4
    print(typeof(y))
    ## [1] "integer"

Logical condition operators: Logical operators
Logical operators can be used to combine two or more conditions. In this subject, we will only use the element-wise operators: !, & and |. These operators compare vectors element by element and return TRUE (1) or FALSE (0) for each element.
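Because TRUE behaves as 1 and FALSE as 0 in arithmetic, a logical vector can be summed to count matches or averaged to get a proportion. The short sketch below illustrates this with the same vector x used in the slides; the variable names are chosen here for illustration only.

```r
x <- c(3, 5, 1, 2, 7, 6, 4)

above_three <- x > 3               # logical vector, one entry per element of x
n_above <- sum(above_three)        # TRUE counts as 1, so this counts the matches
share_above <- mean(above_three)   # proportion of entries that are TRUE

in_range <- (x > 2) & (x <= 6)     # element-wise AND of two conditions
n_in_range <- sum(in_range)        # how many elements satisfy both conditions
```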
Logical condition operators: Logical operators
Examples: Logical operators for a vector x
    x <- c(3, 5, 1, 2, 7, 6, 4)
    (x > 2) & (x <= 6)  # is x greater than 2 and less than or equal to 6
    ## [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE
    (x < 2) | (x > 5)   # is x less than 2 or greater than 5
    ## [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
    !(x > 3)            # not (x greater than 3)
    ## [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE

Logical condition operators: Logical operators
Consider the following example:
    x <- c(5, 3, 7, 9, 10)
• We want to extract the values of the vector x that are greater than 5 (7, 9, 10). There are two methods:
1 Method 1
    x <- c(5, 3, 7, 9, 10)
    ind <- x > 5  # is x greater than 5
    print(ind)
    ## [1] FALSE FALSE  TRUE  TRUE  TRUE
    print(x[ind])
    ## [1]  7  9 10
2 Method 2
    x <- c(5, 3, 7, 9, 10)
    x[x > 5]
    ## [1]  7  9 10

Logical condition operators: Logical condition operators
• We may need to extract data that satisfy certain criteria.
• For example, we may want to select the rows whose disp value is less than or equal to 160.
• We can use logical condition operators to select a subset of the data.
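The selection described above can be tried without data.csv. This sketch assumes that R's built-in mtcars data set holds the same cars as the lecture's file; the rows printed in the slides suggest so, but that is an assumption. In mtcars the model names are stored as row names rather than in a Model column, and the variable names below are illustrative.

```r
# Built-in data set, used here as a stand-in for data.csv (an assumption)
cars <- mtcars

# Rows whose disp is at most 160; the empty slot after the comma keeps all columns
small <- cars[cars$disp <= 160, ]

# Combine two conditions with the element-wise AND operator
small_low_hp <- cars[cars$disp <= 160 & cars$hp < 110, ]

# Keep only the wt column of the matching rows (returns a plain numeric vector)
wt_values <- cars[cars$disp <= 160 & cars$hp < 110, "wt"]
```

The logical vector before the comma picks rows, and the optional name after the comma picks columns, which is the same pattern as `dat[condition, ]` and `dat[condition, "wt"]` in the slides.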
    s <- dat[dat$disp <= 160, ]
    print(s)
    ##             Model  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    ## 1       Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    ## 2   Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    ## 3      Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    ## 8       Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    ## 9        Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    ## 18       Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    ## 19    Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    ## 20 Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    ## 21  Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    ## 26      Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    ## 27  Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    ## 28   Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    ## 30   Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    ## 32     Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Logical condition operators: Logical condition operators
• We may need to extract data that satisfy certain criteria.
• For example, we may want to select the rows whose disp value is less than or equal to 160 AND whose hp is less than 110.
    z <- dat[dat$disp <= 160 & dat$hp < 110, ]
    print(z)
    ##             Model  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    ## 3      Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    ## 8       Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    ## 9        Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    ## 18       Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    ## 19    Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    ## 20 Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    ## 21  Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    ## 26      Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    ## 27  Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    ## 32     Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Logical condition operators: Logical condition operators
• We may need to extract data that satisfy certain criteria.
• For example, we may want to select only the wt column for the rows whose disp value is less than or equal to 160 AND whose hp is less than 110.
    x <- dat[dat$disp <= 160 & dat$hp < 110, "wt"]
    print(x)
    ## [1] 2.320 3.190 3.150 2.200 1.615 1.835 2.465 1.935 2.140 2.780
These are the wt values of the ten rows returned when the same condition is used without a column name:
    z <- dat[dat$disp <= 160 & dat$hp < 110, ]
    print(z)
    ## (the same ten rows as on the previous slide)

Conditional execution: if statements
A conditional execution (or if statement) executes some code (or statements) only if some condition is met.
• If statements have this syntax:
• if (condition) {expressions 1 if true} else {expressions 2 otherwise}
    x <- c(2, 3)
    if (sqrt(9) > 2) {mean(x)} else {sum(x)}
    ## [1] 2.5
    if (sqrt(9) > 4) {mean(x)} else {sum(x)}
    ## [1] 5
    if (sqrt(6) < 3) {mean(x * 3)} else {sum(x)}
    ## [1] 7.5
    x <- c(2, 1, 3, 6, 1)
    y <- ifelse(x > 3, mean(x), sum(x))  # tests every element of x
    print(y)
    ## [1] 13.0 13.0 13.0  2.6 13.0

Conditional execution: if statements
• We can use an if statement without else.
For example,
    team_A <- 8  # Number of goals scored by Team A
    team_B <- 6  # Number of goals scored by Team B
    if (team_A > team_B) {
      print("Team A wins")
    }
    ## [1] "Team A wins"
• We can test multiple conditions using else if, as follows:
    team_A <- 4  # Number of goals scored by Team A
    team_B <- 4  # Number of goals scored by Team B
    if (team_A > team_B) {
      print("Team A won")
    } else if (team_A < team_B) {
      print("Team B won")
    } else {
      print("Team A & B got the same number of goals")
    }
    ## [1] "Team A & B got the same number of goals"

Repetitive execution: for loops, repeat and while
Repetitive execution constructs are used to repeatedly perform some computation (calculation, printing, etc.) based on predefined rules. Examples of R repetitive execution constructs are
1 for loop: iterates over a vector.
• for (variable in vector) { commands }
2 repeat: runs a block of code repeatedly until some condition is met.
• repeat { expression; if (condition) {break} }
3 while: evaluates an expression as long as a stated condition is TRUE.
• while (condition) { expression }

Repetitive execution: Example: for loops
    for (i in 1:4) print(i)
    ## [1] 1
    ## [1] 2
    ## [1] 3
    ## [1] 4
    i <- 1
    for (i in seq(5, 10, 2)) {print(i)}
    ## [1] 5
    ## [1] 7
    ## [1] 9
    x <- c(2, 1, 3, 6, 1)
    nr <- length(x)  # vector length
    print(nr)
    ## [1] 5
    for (i in 1:nr) {print(x[i])}  # print each vector element
    ## [1] 2
    ## [1] 1
    ## [1] 3
    ## [1] 6
    ## [1] 1

Repetitive execution: Example: for loops
    fruit <- c('Apple', 'Orange', 'Passion fruit', 'Banana')
    for (p in fruit) {
      print(p)
    }
    ## [1] "Apple"
    ## [1] "Orange"
    ## [1] "Passion fruit"
    ## [1] "Banana"
    a <- matrix(10:15, 3)
    print(a)
    ##      [,1] [,2]
    ## [1,]   10   13
    ## [2,]   11   14
    ## [3,]   12   15
    nrr <- nrow(a)  # number of rows in a
    for (i in 1:nrr) {print(a[i, 2])}
    ## [1] 13
    ## [1] 14
    ## [1] 15

Repetitive execution: Example: repeat loop
    x <- 4
    repeat {print(x); x <- x + 2; if (x > 10) {break}}
    ## [1] 4
    ## [1] 6
    ## [1] 8
    ## [1] 10
    result <- c("Hello CSE5APG")
    i <- 1
    # repeat function
    repeat {
      print(result)
      # update expression
      i <- i + 1
      # test condition
      if (i > 4) {break}
    }
    ## [1] "Hello CSE5APG"
    ## [1] "Hello CSE5APG"
    ## [1] "Hello CSE5APG"
    ## [1] "Hello CSE5APG"

Repetitive execution: Example: while loop
    x <- 0
    while (x < 5) {print(x); x <- x + 1}
    ## [1] 0
    ## [1] 1
    ## [1] 2
    ## [1] 3
    ## [1] 4
    i <- 1
    j <- 1
    mat <- matrix(0, nrow = 3, ncol = 2)
    print(mat)
    ##      [,1] [,2]
    ## [1,]    0    0
    ## [2,]    0    0
    ## [3,]    0    0
    while (i <= 3) {
      j <- 1
      while (j <= 2) {
        mat[i, j] <- i * 2 + 1
        j <- j + 1
      }
      i <- i + 1
    }
    print(mat)
    ##      [,1] [,2]
    ## [1,]    3    3
    ## [2,]    5    5
    ## [3,]    7    7

Packages
Packages are collections of R functions, data, and compiled code developed by the community to add or perform specific functionality.
• Some packages are installed with R and loaded automatically when R starts.
• Other packages must be installed before we can use them. To install a package, run the following ONLY ONCE:
• install.packages("package_name")
• To use an installed package, we need to load it using the library function, as follows:
• library(package_name)

Packages: Data Wrangling
Example: Package for the five main verbs
• Select - select variables by their names.
• Filter - choose rows that satisfy some criteria.
• Arrange - reorder the rows.
• Mutate - create transformed or derived variables.
• Summarise - collapse rows down to summaries.
The above processes can be used only if the relevant package has been installed and loaded into R: dplyr provides these five verbs, and tidyr provides the gathering, spreading, separating and uniting functions. For example, for tidyr:
• To install the package, run: install.packages("tidyr")
• To load the package, run: library(tidyr)
The same two steps with "dplyr" make the five verbs available.

Packages: Data Wrangling
• Step 1: Create a data frame:
    df <- data.frame(color = c("blue", "black", "blue", "blue", "black"), value = 1:5)
• Step 2: Apply the following functions:
• filter()
• arrange()
• select()
• mutate()

End of Week 3
See you next lecture (Week 4): Data Cleaning & Normalisation
Table: CSE5DEV Timetable. Check LMS.