Full Transcript

Week 5: Data Visualisation (CSE5DEV)

Outline
- CSE5DEV Syllabus Week-Overview
- Data Visualisation
- Examples of Data Visualisation

Subject Syllabus
- Lecture 1: Introduction
- Lecture 2: Data Collection & R Programming
- Lecture 3: Data Wrangling & R Programming
- Lecture 4: Data Cleaning & Normalisation
- Lecture 5: Data Visualisation

Learning outcomes
- Learn about the benefit of visualisation.
- Learn about data visualisation methods.
- Learn how to use charts and graphs.

What we have learned so far: Theory
Data can come in different formats, but a computer program expects the data to be organised in a well-defined structure.
- Data source and format: CSV data, text data, etc.
- Variable names, data types and data structure.
- Data representation: tabular representation (observations by features).
- Data cleaning and normalising.
What we have learned so far: R Programming
- Install R and RStudio, create an R Markdown file, write and run basic code, etc.
- Data types and data structures (vector, factor, matrix and data frame).
- View, access, change, etc.
- Import data into the R environment (text files and CSV files).
- Correct or change the format of the data to make it tidy.
- Clean the data.
- Handle missing values.
- Normalise the data.

Base R Cheat Sheet
(CC BY Mhairi McNeill. Updated: 3/15. RStudio® is a trademark of RStudio, Inc.)

Getting Help: accessing the help files
- ?mean : Get help on a particular function.
- help.search('weighted mean') : Search the help files for a word or phrase.
- help(package = 'dplyr') : Find help for a package.

More about an object
- str(iris) : Get a summary of an object's structure.
- class(iris) : Find the class an object belongs to.

Using Libraries
- install.packages('dplyr') : Download and install a package from CRAN.
- library(dplyr) : Load the package into the session, making all its functions available to use.
- dplyr::select : Use a particular function from a package.
- data(iris) : Load a built-in dataset into the environment.

Working Directory
- getwd() : Find the current working directory (where inputs are found and outputs are sent).
- setwd('C://file/path') : Change the current working directory.
- Use projects in RStudio to set the working directory to the folder you are working in.

Vector Functions
- sort(x) : Return x sorted.
- rev(x) : Return x reversed.
- table(x) : See counts of values.
- unique(x) : See unique values.

Selecting Vector Elements
By position
- x[4] : The fourth element.
- x[-4] : All but the fourth.
- x[2:4] : Elements two to four.
- x[-(2:4)] : All elements except two to four.
- x[c(1, 5)] : Elements one and five.
By value
- x[x == 10] : Elements which are equal to 10.
- x[x < 0] : All elements less than zero.
- x[x %in% c(1, 2, 5)] : Elements in the set 1, 2, 5.
Named vectors
- x['apple'] : Element with name 'apple'.

Matrices
- m <- matrix(x, nrow = 3, ncol = 3) : Create a matrix from x.

Maths Functions
- log(x) : Natural log.
- exp(x) : Exponential.
- sum(x) : Sum.
- mean(x) : Mean.
- median(x) : Median.
- max(x) : Largest element.
- min(x) : Smallest element.
- quantile(x) : Percentage quantiles.
- round(x, n) : Round to n decimal places.
- signif(x, n) : Round to n significant figures.
- rank(x) : Rank of elements.
- var(x) : The variance.
- sd(x) : The standard deviation.
- cor(x, y) : Correlation.

Data Frames
- df <- data.frame(x = 1:3, y = c('a', 'b', 'c')) : A special case of a list where all elements are the same length.
- Matrix subsetting: df[ , 2] (second column), df[2, ] (second row), df[2, 2] (single element).
- nrow(df) : Number of rows.
- ncol(df) : Number of columns.
- dim(df) : Number of rows and columns.
- cbind : Bind columns.
- rbind : Bind rows.

Statistics
- t.test(x, y) : Perform a t-test for difference between means.
- pairwise.t.test : Perform pairwise t-tests between groups.
- prop.test : Test for a difference between proportions.
- aov : Analysis of variance.

Other sections listed on the sheet: Creating Vectors, Programming (For Loop, While Loop, If Statements, Functions), Reading and Writing Data, Conditions, List subsetting.
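To make a few of the cheat sheet entries above concrete, here is a minimal sketch in base R. The vector `x` and its names are made-up values chosen only for illustration; they do not come from the lecture data.

```r
# Hypothetical named vector; the values are invented for this example.
x <- c(apple = 10, pear = -2, plum = 5, fig = 10, kiwi = 1)

x[4]                  # by position: the fourth element
x[-4]                 # everything except the fourth element
x[x == 10]            # by value: elements equal to 10
x[x %in% c(1, 2, 5)]  # elements whose value is in the set 1, 2, 5
x["apple"]            # by name

sorted_x <- sort(x)   # ascending order
counts   <- table(x)  # how often each distinct value occurs

# A small data frame: a list of equal-length columns.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df[, 2]   # second column
df[2, ]   # second row
dim(df)   # number of rows, then number of columns
```

Running the lines one at a time in the RStudio console is a quick way to check that each subsetting rule behaves as the sheet describes.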
Data Visualisation

Given an example dataset, what is the best way to explore its variables?
- If your data involves a small number of samples, you might just print them on screen or on paper and inspect them quickly before doing any analysis.
- If you have a huge dataset, you need some assistance to explore the data.
- We can use R to display a huge dataset as tables, but a table only lets us explore the pattern of one specific parameter.
- What if we want to explore the patterns or relationships of one or several parameters?

Visualising data via graphics is an important stage of data analysis:
- to gain valuable insights that we cannot find by just scanning the raw data on paper or in a spreadsheet.
- to understand basic properties of the data.
- to suggest possible modelling strategies.
- to see and understand trends, outliers, etc.
- to "debug" an analysis if an unexpected result occurs.
- to communicate your findings to others.

Big Data and Data Visualisation
- In the Big Data era, data visualisation methods and technologies are essential to analyse massive amounts of information and make data-driven decisions.
- Tables can be used where users need to see the pattern of a specific parameter, while charts can be used to show patterns or relationships in the data for one or more parameters.

Advantages of Data Visualisation
- Graphics and figures can be easily communicated to and understood by readers.
- It can be accessed quickly by a wider audience.
- It conveys a lot of information in a small space.
- It makes your report more visually appealing.
- It can convert raw data into insights.
- It can help to find simple and complex patterns in data.

Data visualisation charts can be used for four basic presentation types:
- Comparison
- Distribution
- Relationship
- Composition

The following are the most common charts in data visualisation: Bar Chart, Line Chart, Scatterplot, Pie Chart, Histogram, Box Plot, Heat Map, Gauge, Maps, Indicators, Tables, Tree Map, Funnel Chart, Area Chart, Radar or Spider.
Common Graph Types

Line chart
The x-axis represents a regular interval and the y-axis shows the observations, ordered by the x-axis and connected by a line.
Figure: Hourly temp for Jan 1-15

Bar chart
The x-axis represents the categories, which are spaced evenly. The y-axis represents the quantity for each category, drawn as a bar from the baseline to the appropriate level on the y-axis.
[Figure: bar chart; x-axis cut (Fair, Good, Very Good, Premium, Ideal); y-axis 0 to 20000]

Histogram
The x-axis represents discrete bins or intervals for the observations. The y-axis represents the frequency or count of observations in the dataset that belong to each bin.

Box plot
The x-axis can be used to represent the data sample; multiple box plots can be drawn side by side on the x-axis if desired. The y-axis represents the observation values.
Figure: Month by temp boxplot

Scatterplot
The x-axis represents the observation values for the first sample. The y-axis represents the observation values for the second sample.
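The axis conventions described above can be sketched with ggplot2, the package used later in this lecture. This is only an illustration: the weather and flight data behind the slide figures are not part of the transcript, so the sketch substitutes datasets that ship with ggplot2 (`economics`, `diamonds`) and with base R (`iris`). The object names `p_line`, `p_bar` and so on are hypothetical.

```r
library(ggplot2)

# Line chart: observations ordered along a regular interval (here, dates).
p_line <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()

# Bar chart: one bar per category; bar height is the number of rows in it.
p_bar <- ggplot(diamonds, aes(x = cut)) + geom_bar()

# Histogram: a continuous variable split into bins; height is the bin count.
p_hist <- ggplot(diamonds, aes(x = price)) + geom_histogram(bins = 30)

# Box plot: one box per group on the x-axis, observation values on the y-axis.
p_box <- ggplot(diamonds, aes(x = color, y = price)) + geom_boxplot()

# Scatterplot: first variable on the x-axis, second variable on the y-axis.
p_scatter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

print(p_scatter)  # print any of the objects above to draw it
```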
Figure: Arrival Delays vs Departure Delays (scatterplot)

Examples of Data Visualisation

In this lecture, we will learn
- How to use R to visualise single or several variables.
- How to use the R ggplot2 plotting package to create simple and complex plots from data in a data frame structure.

To this end, it is assumed that you know how to
- import data
- organise data
- clean data
- normalise data

Recall: data variable values can be
- Numeric:
    - Discrete: integer values.
    - Continuous: any value in a pre-defined range (float, double).
- Categorical: values are selected from a predefined number of categories.
    - Ordinal: categories can be meaningfully ordered.
    - Nominal: categories have no order.
    - Binary: a special case of nominal, with only 2 possible categories.
- Date: datetime, timestamp.
- Text.
- Multidimensional data.
- Time series: data points indexed in time order.

Recall: R Factors
Factors are the data objects used to categorise data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as Male/Female or True/False. Factors are created with the factor() function, which takes a vector as input.

To visualise the variables of a given dataset, we need to:
- Step 1: Import data into RStudio.
- Step 2: Install and load the ggplot2 package.
- Step 3: Use ggplot2 to plot data variables.
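As a reminder of how the factor recap plays out in practice, here is a minimal sketch. The vectors `sex` and `cut_quality` hold made-up values chosen for illustration. The second factor is ordered, which corresponds to the ordinal case in the variable-type recap.

```r
# Hypothetical survey answers; values are invented for this example.
sex <- c("Male", "Female", "Female", "Male", "Female")

# Nominal factor: the levels are the sorted unique values.
sex_f <- factor(sex)
levels(sex_f)
table(sex_f)

# Ordinal factor: state the order of the levels explicitly.
cut_quality <- c("Good", "Ideal", "Fair", "Good")
cut_f <- factor(cut_quality,
                levels = c("Fair", "Good", "Ideal"),
                ordered = TRUE)

cut_f < "Ideal"    # comparisons are meaningful for ordered factors
as.integer(cut_f)  # the underlying level codes
```

Factors matter for plotting because ggplot2 uses the level order to arrange categories on axes and in legends.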
Step 1: Import data into RStudio
In this lecture, we will use the following datasets to create graphs and figures.
- Salary: salary attributes: Age, Education, Experience, and other attributes.
- data w: weather dataset.
- Iris: flower dataset: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species.
- diamonds: diamond attributes: Cut, Colour, Clarity, Price, and other attributes.
- cars: automobile attributes: Fuel Consumption, Design Performance, and other attributes.

Reading Data from CSV Files
Read the data from a CSV file that includes a header row (for example data1.csv) and save it into dat. By default, dat will be a data frame. In the call below, replace data_name.csv with the name of your file.
    dat <- read.csv("data_name.csv", header = TRUE)
The read.csv() function creates a data frame from the data in the .csv file. If we pass header = TRUE, the function uses the very first row to name the variables in the resulting data frame.

Verify the results: use the names() function to print the names of the columns.
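The import-and-verify step can be tried without any course files. The sketch below first writes a tiny, made-up CSV to a temporary file (the path `csv_path` and its contents are assumptions for illustration only), then reads it back with `read.csv()` and inspects the result.

```r
# Write a tiny, invented CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("exper,wage,sex",
             "5,12.5,F",
             "10,18.0,M",
             "2,9.75,F"),
           csv_path)

# header = TRUE: the first row supplies the column names.
dat <- read.csv(csv_path, header = TRUE)

class(dat)  # the result is a data frame
names(dat)  # column names taken from the header row
str(dat)    # column types and a preview of the values
head(dat)   # first rows of the data
```

With a real file, only the path passed to `read.csv()` changes; the verification calls stay the same.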
Step 2: Install and load the ggplot2 package
- In this subject, we will use the ggplot2 package to visualise our data.
- ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
- You provide the data, tell ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
We can install and load ggplot2 as follows.
Install the ggplot2 package (needed only once):
    install.packages("ggplot2")
Load ggplot2:
    library("ggplot2")
Note: we can install several packages in one command by passing a character vector of package names, for example pkgs <- c("ggplot2", "dplyr"):
    install.packages(pkgs)

Step 3: Use ggplot2 to plot data variables
ggplot2 uses a grammar of graphics to build plots. The grammar specifies the plot's building blocks, their types and other features:
- Data: the input data. It should be a data frame.
- Aesthetic mapping (aes): the mapping of variables to visual properties of the graph.
- Geometric object (geom): points, lines, bars, etc.
- color: controls the point colour.
- size: controls the point size.
- alpha: controls the point transparency. Transparency ranges from 0 (completely transparent) to 1 (completely opaque). Adding a degree of transparency can help visualise overlapping points.

ggplot2: other functions
- + : add layers, scales, coords and facets.
- ggsave(filename, plot = last_plot(), device = NULL, path = NULL, scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE, ...): save a plot to disk.

Examples of geometric objects:
- geom_bar(): bar chart
- geom_point(): scatterplot
- geom_line(): line diagram, connecting observations in order by x-value
- geom_boxplot(): box-and-whisker plot
- geom_path(): line diagram, connecting observations in original order
- geom_smooth(): add a smoothed conditional mean
- geom_histogram(): histogram

ggplot2: step by step example. Step 1: specify the dataset and the mapping.
    # load package
    library(ggplot2)
    # read the salary data and save it in dat
    dat <- read.csv("salary_data.csv", header = TRUE)
    # call ggplot, specify the dataset and the mapping
    ggplot(data = dat, mapping = aes(x = exper, y = wage))
Step 1: plot output. [Figure: empty plot; x-axis exper (0 to 40); y-axis 0 to 40]

ggplot2: step by step example. Step 2: add geoms. Geoms are the geometric objects (points, lines, bars, etc.) that can be placed on a graph. We add them using geom_ functions. In this example, we add points using the geom_point() function to create a scatterplot.
Step 2: plot output. [Figure: scatterplot; x-axis exper (0 to 40); y-axis 0 to 40]

ggplot2: step by step example. Step 3: change point colour and shape.
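The slide code for Steps 2 and 3 is not preserved in this transcript. The sketch below shows one way those two steps might look. It is an assumption rather than the lecture's own code: the data frame `dat` is a made-up stand-in that only borrows the column names `exper` and `wage`, and the colour name, `alpha` and `size` values are illustrative choices.

```r
library(ggplot2)

# Invented stand-in for the lecture's salary data (same column names only).
set.seed(1)
dat <- data.frame(exper = sample(0:45, 60, replace = TRUE))
dat$wage <- 5 + 0.15 * dat$exper + rnorm(60, sd = 2)

# Step 2: add a point layer to the empty plot to get a scatterplot.
p_step2 <- ggplot(data = dat, mapping = aes(x = exper, y = wage)) +
  geom_point()

# Step 3: blue, larger, semi-transparent points.
# These are fixed settings, so they go outside aes().
p_step3 <- ggplot(data = dat, mapping = aes(x = exper, y = wage)) +
  geom_point(color = "cornflowerblue", alpha = 0.7, size = 3)

print(p_step3)
```

Placing `color`, `alpha` and `size` outside `aes()` applies one constant value to every point; placing a variable inside `aes()` instead maps it to the data, which is what the grouping step later relies on.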
In this example, we change the point colour to blue, make the points larger, and make them semi-transparent.
Step 3: plot output. [Figure: scatterplot; x-axis exper (0 to 40); y-axis 0 to 40]

ggplot2: step by step example. Step 4: add a line of best fit. We can add a best-fit line using the geom_smooth() function. The line can be linear, quadratic, or nonparametric. We can also control the thickness of the line and its colour, and show the confidence interval. In this example, we use a linear regression line by setting method = "lm" (where lm stands for linear model).
Step 4: plot output. [Figure: scatterplot with fitted line; x-axis exper (0 to 40); y-axis 0 to 40]

Step 5: grouping. In grouping, we map variables to the colour, shape, size, transparency, and other visual characteristics of geometric objects. In this example, we add gender (the sex variable) to the plot and represent it by colour.
Step 5: plot output. [Figure: scatterplot coloured by sex (F, M); x-axis exper (0 to 40); y-axis 0 to 40]

Step 6: scales. Scale functions (which start with scale_) control how variable ranges are mapped to the visual characteristics of the plot. In this example, we change the x-axis and y-axis scaling and the colours.
Step 6: plot output. [Figure: x-axis exper (0 to 50); y-axis in dollars ($0 to $30); legend sex (F, M)]

Step 7: facets. Facets generate a separate graph for each level of a variable (or combination of variables). We use a facet function with a formula that starts with a tilde (~) to create several graphs based on variable values. In this example, facets are defined by the eight levels of the sector variable.
Step 7: plot output. [Figure: one panel per sector; x-axis exper (0 to 50); y-axis in dollars ($0 to $25); legend sex (F, M)]

Step 8: labels. Labels make graphs easier to interpret and more informative. In this example, we use the labs() function to add labels for the axes and legend, as well as a title, subtitle, and caption.
    # add informative labels
    ggplot(data = dat, mapping = aes(x = exper, y = wage, color = sex)) +
      geom_point(alpha = .7) +
      geom_smooth(method = "lm", se = FALSE) +
      scale_x_continuous(breaks = seq(0, 60, 10)) +
      scale_y_continuous(breaks = seq(0, 30, 5), labels = scales::dollar) +
      scale_color_manual(values = c("indianred3", "cornflowerblue")) +
      facet_wrap(~sector) +
      labs(title = "Relationship between wages and experience",
           subtitle = "Current Population Survey",
           caption = "source: http://mosaic-web.org/",
           x = "Years of Experience",
           y = "Hourly Wage",
           color = "Gender")
Step 8: plot output. [Figure: title "Relationship between wages and experience"; subtitle "Current Population Survey"; one panel per sector; x-axis Years of Experience (0 to 50); y-axis in dollars ($0 to $25); legend Gender (F, M); caption "source: http://mosaic-web.org/"]

Step 9: themes. We can add a theme to change the graph's appearance. Theme functions (which start with theme_) control background colours, fonts, grid lines, legend placement, and other non-data-related features of the graph. In this example, we use theme_minimal().
    # use a minimalist theme
    ggplot(data = dat, mapping = aes(x = exper, y = wage, color = sex)) +
      geom_point(alpha = .6) +
      geom_smooth(method = "lm", se = FALSE) +
      scale_x_continuous(breaks = seq(0, 60, 10)) +
      scale_y_continuous(breaks = seq(0, 30, 5), labels = scales::dollar) +
      scale_color_manual(values = c("indianred3", "cornflowerblue")) +
      facet_wrap(~sector) +
      labs(title = "Relationship between wages and experience",
           subtitle = "Current Population Survey",
           caption = "source: http://mosaic-web.org/",
           x = "Years of Experience",
           y = "Hourly Wage",
           color = "Gender") +
      theme_minimal()
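Steps 4 to 7 of the walkthrough are described without their code in this transcript. The sketch below reconstructs them one layer at a time so that each addition can be seen in isolation. It is an assumption, not the lecture's original code: `dat` is a made-up stand-in with the same column names (`exper`, `wage`, `sex`, `sector`), and it uses four invented sector values rather than the eight in the real data.

```r
library(ggplot2)

# Invented stand-in for the salary data; only the column names follow the slides.
set.seed(42)
n <- 120
dat <- data.frame(
  exper  = sample(0:45, n, replace = TRUE),
  sex    = sample(c("F", "M"), n, replace = TRUE),
  sector = sample(c("clerical", "manag", "prof", "sales"), n, replace = TRUE)
)
dat$wage <- pmax(1, 6 + 0.2 * dat$exper + rnorm(n, sd = 3))

# Step 4: scatterplot plus a linear line of best fit (with confidence band).
p4 <- ggplot(dat, aes(x = exper, y = wage)) +
  geom_point(color = "cornflowerblue", alpha = 0.7, size = 3) +
  geom_smooth(method = "lm")

# Step 5: grouping. Colour is mapped to sex, so each group gets its own line.
p5 <- ggplot(dat, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.7, size = 3) +
  geom_smooth(method = "lm", se = FALSE)

# Step 6: scales. Custom axis breaks, dollar labels and manual colours.
p6 <- p5 +
  scale_x_continuous(breaks = seq(0, 60, 10)) +
  scale_y_continuous(breaks = seq(0, 30, 5), labels = scales::dollar) +
  scale_color_manual(values = c("indianred3", "cornflowerblue"))

# Step 7: facets. One panel per level of sector.
p7 <- p6 + facet_wrap(~sector)

print(p7)
```

Because a ggplot object can be stored and extended with `+`, each step reuses the previous plot instead of repeating the full call, which is a convenient way to build up a complex figure.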
ggplot2: step by step example. Step 9: plot output.
[Figure: title "Relationship between wages and experience"; subtitle "Current Population Survey"; one panel per sector (clerical, const, manag, manuf, other, prof, sales, service); x-axis Years of Experience (0 to 50); y-axis in dollars ($0 to $25); legend Gender (F, M); caption "source: http://mosaic-web.org/"]

ggplot2: examples of plots for the Diamonds dataset.

Diamonds dataset: scatter plot
[Figure: scatter plot; x-axis carat (0 to 5); y-axis 0 to 20000; legend cut (Fair, Good, Very Good, Premium, Ideal)]

Diamonds dataset: bar chart
    ggplot(diamonds, aes(cut)) + geom_bar(fill = "#0073C2FF")
[Figure: bar chart; x-axis cut (Fair, Good, Very Good, Premium, Ideal); y-axis 0 to 20000]

Diamonds dataset: histogram
    ggplot(diamonds, aes(x = price)) + geom_histogram()
[Figure: histogram; x-axis price (0 to 20000); y-axis 0 to 10000]

Diamonds dataset: box plot
    ggplot(diamonds, aes(x = color, y = price)) + geom_boxplot()
[Figure: box plots; x-axis color (D, E, F, G, H, I, J); y-axis 0 to 15000]

ggplot2: examples of plots for the Iris dataset

Iris dataset: scatter plot with a smoothed line per species
    ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
      geom_point() +
      geom_smooth()
[Figure: x-axis Sepal.Length (5 to 8); y-axis 2.0 to 4.5; legend Species (setosa, versicolor, virginica)]

Iris dataset: scatter plot with connecting lines
    ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
      geom_point() +
      geom_line()
[Figure: x-axis Sepal.Length (5 to 8); y-axis 2.0 to 4.5; legend Species (setosa, versicolor, virginica)]

Iris dataset: scatter plot with line type, point size and point shape mapped to Species
    ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
      geom_smooth(aes(linetype = Species)) +
      geom_point(aes(size = Species, shape = Species))
[Figure: x-axis Sepal.Length (5 to 8); y-axis 2.0 to 4.5; legend Species (setosa, versicolor, virginica)]

Iris dataset: scatter plot with manually chosen shapes and colours
    ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
      geom_point(aes(shape = Species), size = 3) +
      scale_shape_manual(values = c(16, 17, 18)) +
      scale_color_manual(values = c("purple", "black", "orange"))
[Figure: x-axis Sepal.Length (5 to 8); y-axis 2.0 to 4.5; legend Species (setosa, versicolor, virginica)]

ggplot2 Iris dataset. [Figures]

ggplot2 Population dataset.
[Figure: title "Population by age 1900 to 2002"; x-axis Year (1900 to 2000); y-axis 0 to 300; legend Age Group (>64, 55-64, 45-54, 35-44, 25-34, 15-24, 5-14, <5); caption "source: Census Bureau, 2003, HS-3"]

In the coming weeks, we will learn
- How to choose the right chart based on data type.
- How to answer analysis questions using charts and figures.
- How to visualise summarised data.

End of Week 5. See you next lecture (Week 6): Data Exploration.

Table: CSE5DEV Timetable
Activity       Class Type     Day       Start     End      Duration   Campus
CSE5DEV-LT01   Lecture        Monday    11:00am   1:00pm   2:00       On-Line
CSE5DEV-CL     Computer Lab   Monday    2:00pm    4:00pm   2:00       On-Line
CSE5DEV-CL     Computer Lab   Tuesday   2:00pm    4:00pm   2:00       On-Line