week01.pdf
La Trobe University
October 8, 2023

Outline
Section 1: General Information
Section 2: Introduction
Section 3: Basics of R Programming

Lecturer info
▶ Name: Dr Kiki Adhinugraha
▶ Email: [email protected]
▶ Website: https://scholars.latrobe.edu.au/kadhinugraha
▶ Consultation Time: Wed 01:00 PM-02:00 PM, PS1-215A, by appointment only
▶ Research interests: Spatial Data Science

Subject materials
▶ Subject Homepage: Go to LMS: https://lms.latrobe.edu.au/course/view.php?id=135911
▶ Lecture slides: All slides will be available on the Subject Homepage (LMS).
▶ Recommended textbooks
  ▶ Hastie et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2009, Springer.
  ▶ Downey. Think Stats: Exploratory Data Analysis. 2011, Amazon.
  ▶ Fischetti. Data Analysis with R. 2015, Packt Publishing.
▶ Other materials: I may point you to some other supporting materials through the LMS News & Announcements Forum. Everything you are required to know is covered by the subject slides.

Timetable
▶ Lecture
  ▶ Thursday - 9:00 AM to 11:00 AM - DMC-01-C121.
▶ Labs
  ▶ Lab-1: Thursday - 1:00 PM to 3:00 PM - BG-106.
  ▶ Lab-2: Thursday - 3:00 PM to 5:00 PM - BG-106.
  ▶ Lab-3: Friday - 9:00 AM to 11:00 AM - BG-104.

Prerequisites:
▶ CSE4DBF: DATABASE FUNDAMENTALS.
▶ MAT4NLA: NUMBER SYSTEMS AND LINEAR ALGEBRA.
▶ Or any programming subject.

Lab coding: This subject uses the R programming language.
▶ This subject is not about programming, but we do need a language for the practical parts.
▶ R: At a minimum, you should be comfortable with basic data types and structures, reading data, writing output, loading packages, visualisation, and implementing algorithms.
▶ Note: I will cover the basics of R in Lecture 1, Lecture 2 and Lecture 3.
▶ Warning: The assignments and labs are all in R, so you will struggle if you are not comfortable with R.

Maths :)
This subject has some maths in it. We need to understand certain algorithms and data analysis tools.
▶ Equations (summations, recursions), e.g. $\sum_{i=0}^{n} 2$.
▶ Arithmetic and partial sum simplifications, e.g. $\sum_{i=0}^{n} 2 = 2n + 2$.
If you are uncomfortable with the above, it is advisable that you review your maths subjects.

Attending lectures:
▶ Everything you are required to know is covered by the subject slides.
▶ BUT the subject slides are designed assuming that you are attending the weekly lectures.
▶ During the lectures, you can ask questions at any time.
▶ If you cannot attend:
  ▶ All lectures will be recorded and are available to watch on the Subject Homepage (LMS).
  ▶ You can post your questions in the LMS subject Q & A Forum.

Attending labs
▶ The focus in all labs is on application rather than on theory.
▶ Labs consolidate your understanding and help you translate your knowledge into running code.
▶ Practising and coding are crucial: implement all lab tasks.
▶ Please finish the lab tasks for better learning.
Remember that computer skills need a systematic approach. Therefore, every week’s learning is built on the previous week’s learning.

Assessments: Assignments
This subject requires you to complete and submit 2 assignments:
▶ Assignment 1 - worth 15% of your final mark. It covers Week 1 to Week 6.
▶ Assignment 2 - worth 25% of your final mark. It covers Week 4 to Week 10.
Both assignments will be written reports in R programming.

Assessments: Exam
The final exam will be online.
▶ One 2-hour examination, worth 60%, in the Semester 2 exam period.
▶ Examinable materials
  ▶ Everything we cover in the lectures and labs is examinable unless explicitly stated otherwise.
▶ Requirements for passing: To obtain a pass, you must:
  ▶ Accumulate at least 50% over all forms of assessment.

Sources of help (We are here to help!)
▶ See me as soon as possible if you are having difficulties which may prevent you from completing the subject.
▶ Ask lab demonstrators!
▶ Weekly consultation times.
▶ Discussion forum and each other!
▶ Extensions:
  ▶ Please do not wait until the day before it is due.
  ▶ Extensions are only granted in exceptional circumstances and require the official process of “Special Consideration” to be followed.

Academic Misconduct
▶ Please read the University Plagiarism Statement in the subject guide very carefully.
▶ In short, cheating, whether by fabrication, falsification of data, or representing the work of someone else as your own, is an offence subject to University disciplinary procedures.
▶ Plagiarism may result in charges of academic misconduct, which carry a range of penalties including cancellation of results and exclusion from the subject.
▶ Exact penalties are decided in formal plagiarism hearings.
▶ All assignments, weekly quizzes and the exam must be done individually.

Student feedback on subject survey
The Student Feedback on Subjects (SFS) Survey is part of the quality assurance process that occurs across the university. In this survey you are invited to tell us about your learning experiences in this subject. Your views will be taken seriously and will assist us to enhance this subject for the next group of students. Your feedback will also contribute to the text for “Summary of Previous Student Feedback” below, so please take the time to tell us your views. The surveys are anonymous and will be distributed prior to the end of the teaching period.

CSE5DEV LMS: https://lms.latrobe.edu.au/course/view.php?id=106176

CSE5DEV content overlap: There are two or more subjects that might have a very minor overlap with CSE5DEV. These are: CSE5DMI and CSE5ML.
[Figure: overlap between CSE5DEV, CSE5DM and CSE5ML]
▶ To find out whether your subject has any overlap with another subject(s), please check the Subject Learning Guide (SLG) and Subject Website.
▶ If you find that CSE5DEV overlaps with an elective subject, you should NOT enrol in that elective subject.
General Subject Goals
Key Goal: The goal of this subject is to equip graduate students with in-depth practical knowledge and a solid understanding of the latest data exploration techniques and tools in order to find practical solutions to real-world problems.
The goals of CSE5DEV are:
▶ Theory:
  ▶ Understand the basics of data types and notations.
  ▶ Understand data exploration and analysis steps.
▶ Practice:
  ▶ Learn to implement visualisation techniques in the context of data exploration and analysis.
  ▶ Learn to implement various tools and techniques to solve a variety of problems.

Data exploration and analysis process steps
Process steps: In data exploration and analysis, we often need to execute different steps to achieve our goal. We need to know the right data to draw accurate conclusions and inform decision makers. The process steps are also known as Problem Solving.
Problem solving: Problem solving is the process of identifying a problem, developing possible solutions, and performing the appropriate action(s).

Data exploration and analysis process steps
Problem solving can be summarised as follows:
▶ What is the question(s)?
▶ Design a solution method.
▶ Implement the solution method.
▶ Testing.
▶ User evaluation.
▶ Refinement.

What Is the Question?
Question: We often start with high-level questions. For example,
▶ How to track house prices across different areas?
▶ How to track customer behaviour in different groups?
▶ What is going to be the fuel price in the next month?
Understanding the objectives and requirements is crucial to a successful data exploration project. In order to answer the above question(s), we need to:
▶ understand the data format.
▶ understand the structure and size of the data.
▶ know which variables suggest interesting relationships.
▶ know which observations are usual and unusual.

Design, implement and communicate
In this data exploration and analysis subject you will learn:
▶ How to format and organise data.
▶ How to clean and normalise data.
▶ How to use statistical techniques for the exploratory analysis of data.
▶ How to use visualisation tools to begin uncovering the structure of your data.
▶ How to implement various tools and techniques in the R programming language.
▶ How to communicate your results using the R programming language.

Subject Syllabus
Lecture 1: Introduction
Lecture 2: Data Collection & R Programming
Lecture 3: Data Wrangling & R Programming
Lecture 4: Data Cleaning & Normalisation
Lecture 5: Data Visualisation
Lecture 6: Data Exploration Analysis 1
Lecture 7: Data Exploration Analysis 2
Lecture 8: Data Exploration Analysis 3
Lecture 9: Correlation & Pattern Discovery Analysis
Lecture 10: Case Study 1
Lecture 11: Case Study 2
Lecture 12: Revision

Data Science Project
Almost all data science and analysis projects require the same set of stages to be performed. These are:
Stage 1: Identify the problem (question)
  ▶ What is the goal?
  ▶ What do you want to estimate?
  ▶ How to track house prices across different areas?
Stage 2: Collect & prepare the data
  ▶ Data resources
  ▶ Data representation
  ▶ Clean and normalise the data
Stage 3: Explore the data
  ▶ Descriptive statistics
  ▶ Visualisation
  ▶ Does the result make sense?
Stage 4: Communicate the results
  ▶ What are the findings?
  ▶ What did we learn?
  ▶ Report the findings

Data can be explored using manual tools, automation tools, or both.
▶ Manual tools: Excel, Notepad, MS-Word.
▶ Automation tools: a programming language, e.g. R.
▶ Hybrid: manual tools and automation tools.
[Figure: data exploration tools, manual (Excel) and automation (programming)]

Users vs. Programmers
▶ Users see computers as a set of tools: word processor, email, Excel, notes, etc.
▶ Programmers learn computer languages to write programs.
▶ Programmers use some tools that allow them to build new tools.
▶ Programmers often build tools for lots of users and/or for themselves.

What is a program?
Program: A program is a set of actions (or rules) to accomplish a specific task.
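As a small illustration of a program as a set of actions, the following R sketch performs three actions in order: it picks an upper limit n, adds the constant 2 once for every i from 0 to n, and compares the total with the closed form 2n + 2 from the maths prerequisites. The value of n and the variable names are assumptions chosen for illustration; they are not taken from the slides.

```r
# Action 1: choose the upper limit of the summation (illustrative value).
n <- 10

# Action 2: add the constant 2 once for each i = 0, 1, ..., n (n + 1 terms).
total <- sum(rep(2, times = n + 1))

# Action 3: compare with the closed form 2n + 2 and report both results.
closed_form <- 2 * n + 2
print(total)
print(total == closed_form)
```

Changing n and re-running the script is a quick way to convince yourself that the simplification holds for other values as well.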
Program Development Cycle
The process of creating a program that works correctly typically involves five phases, known as the program development cycle. In order, these are:
▶ Design the program steps
▶ Write R code
▶ Correct syntax errors
▶ Run the program
▶ Correct logic/output errors

What is a programming language?
Programming language: A programming language comprises a set of instructions to produce various kinds of output. Programming languages are used in computer programming to implement algorithms.
Examples of computer programming languages are:
▶ R
▶ C, C++
▶ Java
▶ Python

What is R?
R: R is a high-level programming language that uses a set of instructions or rules for instructing a computer to perform specific tasks.

R Features
▶ R is a free software environment for statistical computing and graphics.
▶ Can be easily extended with 15,000+ packages available on CRAN (as of Jun 2019).
▶ Many other packages are provided in Bioconductor, R-Forge, GitHub, etc.
▶ Many R manuals and books are available on CRAN
  ▶ An Introduction to R
  ▶ The R Language Definition
  ▶ ...

Why do we use R?
▶ R is easy to understand and implement.
▶ R is widely used in both academia and industry.
▶ R was ranked #1 in the KDnuggets 2016 poll on Top Analytics and Data Science software (actually R has been #1 in a row from 2011 to 2019!).
▶ The CRAN Task Views provide collections of packages for different tasks
  ▶ Machine learning & Deep learning
  ▶ Statistical learning
  ▶ Visualisation
  ▶ Optimisation
  ▶ ...

R for CSE5DEV
To use R in CSE5DEV Labs, we need to do the following steps:
▶ Step-1: We need to install the R programming language.
▶ Step-2: We need to install RStudio, an integrated development environment (IDE) for writing R code.

R for CSE5DEV
Before you can try to write any R programs, you need to:
▶ Make sure that R is installed on your computer and properly configured.
▶ If you are working in a Uni computer lab, this has been done already.
▶ If you are using your own computer, you can follow the instructions in the next slides to install R from the Internet.

Step 1 - download and install R
https://cran.r-project.org/

Basics of R Programming

Step 2 - download and install RStudio
https://rstudio.com/products/rstudio/download/

How do RStudio and R work?
[Figure: the CSE5DEV student writes R code in the RStudio interface; RStudio runs R (the R programming software) in the background on the computer; the output appears on the PC monitor.]
Note: you ONLY need to write and run your code in the RStudio interface.

RStudio Interface
[Screenshots of the RStudio interface]

R - Elementary arithmetic operators
R - Numeric functions
R - Special values
R - Data Types

Program Development Cycle: Design the program steps, Write R code, Correct syntax errors, Run the program, Correct logic/output errors

Program Design Steps
▶ Designing a program is regarded as the most important part of the Program Development Cycle.
▶ Define the task(s) that the program is to perform.
▶ Determine the steps that need to be implemented to perform the task(s).
▶ There are several ways to design a program, such as pseudocode and flowcharts.

Flowchart
What is a flowchart? A flowchart is a diagram that graphically describes the steps that take place in a program.
▶ It shows steps in sequential order.
▶ It shows the steps as boxes of various kinds, and their order by connecting them using arrows.

Flowchart - example
Q: Calculate the average of two numbers: num1 and num2.

Flowchart - example
Q: Calculate the sum of two numbers: A and B.

Flowchart - example
Q: A school timetable.

Input, Processing, and Output
Computer programs: A computer program usually performs the following three steps:
▶ Take input(s).
▶ Perform some process on the input(s).
▶ Produce output(s).

Input, Processing, and Output - example
Example: Calculate the average of two numbers: num1 and num2.
Input: num1, num2
Process: average = (num1 + num2) / 2
Output: average

Input, Processing, and Output - example
R: Calculate the average of two numbers: num1 and num2.

Input, Processing, and Output - example
R: Calculate the sum of two numbers: A and B.

Do’s and Don’ts
▶ Please attend lectures and labs regularly.
▶ Please do ask questions whenever you have any doubt. Again, participation is very important to understand this subject well.
▶ Practice makes perfect. So, whenever you are given an exercise, please try and practise it.
▶ Come to lectures and labs on time.
▶ Avoid using mobile phones during lectures and labs.
▶ If you wish to communicate with me, please use your La Trobe email (this is due to the Privacy Act).

End of Week 1
See you next lecture (Week 2): Data Collection & R Programming
Table: CSE5DEV Timetable (check LMS)
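The two R slides for the input, processing, and output examples carry no code in this transcript. The sketch below is one minimal way to write them, assuming the inputs are simply assigned in the script; the input values and the names `average` and `sum_ab` are illustrative assumptions, not the lecturer's code.

```r
# Example 1: average of two numbers
# Input: two numbers (illustrative values, not from the slides).
num1 <- 10
num2 <- 20

# Process: apply the formula average = (num1 + num2) / 2.
average <- (num1 + num2) / 2

# Output: show the result in the console.
print(average)

# Example 2: sum of two numbers
# Input: two numbers A and B (illustrative values).
A <- 3
B <- 4

# Process: add the two inputs.
sum_ab <- A + B

# Output: show the result in the console.
print(sum_ab)
```

Each example keeps the three steps visibly separate, which mirrors the input, process, and output breakdown and makes it easier to trace logic errors back to a specific step.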