Python for Data Science Lecture Notes PDF
Indian Institute of Technology, Madras
Ragunathan Rengasamy
Summary
These lecture notes provide an introduction to Python for data science. The course covers fundamental programming concepts, data manipulation, analysis, and visualization techniques. The document also discusses data science principles and steps for solving problems using Python.
Full Transcript
INDEX
Week 1
1. Introduction to Python for Data Science
2. Introduction to Python
3. Introduction to Spyder - Part 1
4. Introduction to Spyder - Part 2
5. Variables and Datatypes
6. Operators
Week 2
7. Jupyter setup
8. Sequence_data_part_1
9. Sequence_data_part_2
10. Sequence_data_part_3
11. Sequence_data_part_4
12. Numpy
Week 3
13. Reading data
14. Pandas Dataframes I
15. Pandas Dataframes II
16. Pandas Dataframes III
17. Control structures & Functions
18. Exploratory data analysis
19. Data Visualization-Part I
20. Data Visualization-Part II
21. Dealing with missing data
Week 4
22. Introduction to Classification Case Study
23. Case Study on Classification Part I
24. Case Study on Classification Part II
25. Introduction to Regression Case Study
26. Case Study on Regression Part I
27. Case Study on Regression Part II
28. Case Study on Regression Part III
Supporting material for Week 4
29. Module : Predictive Modelling
30. Linear Regression
31. Model Assessment
32. Diagnostics to Improve Linear Model Fit
33. Cross Validation
34. Classification
35. Logistic Regression
36. K - Nearest Neighbors (kNN)
37. K - means Clustering
38. Logistic Regression ( Continued )
39. Decision Trees
40. Multiple Linear Regression

Python for Data Science
Prof. Ragunathan Rengasamy
Department of Computer Science and Engineering
Indian Institute of Technology, Madras
Lecture - 01
Why Python for Data Science?
Welcome to this course on Python for Data Science. This is a four-week course in which we are going to teach you some very basic programming aspects of Python. Since this course is geared towards data science, towards the end we will also use what has been taught to work through two case studies: one is what we call a function approximation case study, and the other is a classification case study.
We will then show you how to solve those case studies using the programming platform you have learned. In this first introductory lecture, I am going to talk about why we use Python for data science. (Refer Slide Time: 01:10) To answer that, we first look at what data science is. This is something you may have seen in other NPTEL course videos and elsewhere. Data science is basically the science of analyzing raw data and deriving insights from it. You could use multiple techniques to derive these insights: simple statistical techniques, more sophisticated machine learning techniques, and so on. Nonetheless, the key focus of data science is deriving insights, whatever techniques you choose to use. There is a lot of excitement about data science because it has been shown that you can get very valuable insights from large data: how different variables change together, how one variable affects another, and so on, which are not easy to see through very simple computation. So you need to invest some time and energy in understanding how to look at this data and derive insights from it. From a utilitarian viewpoint, proper data science allows industries to make better decisions. These decisions could be in multiple fields; for example, companies could make better purchasing decisions, better hiring decisions, better decisions about how to operate their processes, and so on. So the decisions could span multiple verticals in an industry. Data science is not only useful from an industrial perspective; it is also useful in the sciences themselves.
There, you look at lots of data to model your system or to test your hypotheses or theories about systems. When we talk about data science, we start by assuming that we have a large amount of data for the problem of interest. We inspect the data, clean and curate it, and then transform and model it before we can derive insights that are valuable to the organization or that test a theory. (Refer Slide Time: 03:47) Now, coming to a more practical view of what we do once we have data, I have four bullet points that roughly describe the steps you would follow when solving a data science problem. You start with data that someone gives you, and you are trying to derive insights from it. The very first step is to bring this data into your system: you read the data into the programming platform so that you can use it. Data could come in multiple formats; for example, it could be in a simple Excel sheet or some other format, so we will teach you how to pull data into your programming platform from multiple data formats. That is really the first step: simply reading the data. Once you read the data, you often have to do some processing, because some of the data may not be correct. For example, we all know that a mobile number has 10 digits, so if there is a column of mobile numbers and one row has just five digits, you know something is wrong. This is a very simple check; in real data processing, it gets much more complicated.
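The mobile-number check just described can be sketched with Python's standard `csv` module. This is a minimal illustration, not the course's own code: the column names `name` and `mobile`, the sample rows, and the helper `find_invalid_mobiles` are all hypothetical, and the data is read from a string so that the example runs on its own.

```python
import csv
import io

# Hypothetical sample data standing in for a file on disk; with a real file
# you would pass open("contacts.csv", newline="") to csv.DictReader instead.
RAW_CSV = """name,mobile
Asha,9876543210
Ravi,98765
Meena,9123456789
"""


def find_invalid_mobiles(csv_text):
    """Return (file_row, name, mobile) for rows whose mobile is not exactly 10 digits."""
    reader = csv.DictReader(io.StringIO(csv_text))
    invalid = []
    # start=2 because row 1 of the file is the header line
    for file_row, row in enumerate(reader, start=2):
        mobile = row["mobile"].strip()
        if not (len(mobile) == 10 and mobile.isdigit()):
            invalid.append((file_row, row["name"], mobile))
    return invalid


bad_rows = find_invalid_mobiles(RAW_CSV)
print(bad_rows)  # [(3, 'Ravi', '98765')]
```

Real cleaning rules are usually more involved (country codes, spaces, leading zeros), which is exactly the extra complexity the lecture alludes to.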
So when you bring the data in and try to process it, you will find errors such as this. How do you remove such errors? How do you clean the data? This is one activity that usually precedes doing more useful things with the data. It is not the only issue: there could also be data that is missing. For example, a variable may have a value recorded in most situations, but in some situations the value is missing. What do you do then? Do you throw the record away, or do you do something to fill in the value? These are all data processing and cleaning steps, and in this course we will show you the tools available in Python for this processing and cleaning. At this point, you have got the data into the system, processed and cleaned it, and arrived at a data file or data structure that is complete enough to work with. The next step is to summarize the data. The usual, very simple technique is to compute basic statistical measures; for example, the mean, median, or mode of a particular column, or its variance. We will teach you how to use these statistical quantities to summarize the data. Once you summarize the data, another activity that is usually taken up is visualization. Visualization means looking at the data more pictorially to get insights before you bring heavy-duty algorithms to bear on it. This is a creative aspect of data science: the same data could be visualized by different people in different ways.
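The two options for missing values mentioned above (throwing the record away or filling the gap) and the simple summary measures can be sketched with the standard-library `statistics` module. The readings are made-up numbers, `None` is assumed to mark a missing value, and filling with the mean is only one possible choice, used here for illustration.

```python
import statistics

# Hypothetical column of readings; None marks a missing value.
readings = [4.0, 7.0, None, 7.0, 2.0, None, 10.0]

# Option 1: throw away the records that have a missing value
complete = [x for x in readings if x is not None]

# Option 2: fill each gap, here with the mean of the observed values
mean_value = statistics.mean(complete)
filled = [mean_value if x is None else x for x in readings]

# Simple summary measures computed on the complete values
summary = {
    "mean": statistics.mean(complete),
    "median": statistics.median(complete),
    "mode": statistics.mode(complete),
    "variance": statistics.variance(complete),  # sample variance, n - 1 denominator
}
print(summary)  # {'mean': 6.0, 'median': 7.0, 'mode': 7.0, 'variance': 9.5}
```

Which option is appropriate depends on why the values are missing; dropping records shrinks the data, while filling them can bias measures such as the variance.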
Some visualizations are not only eye-catching but also much more informative than others. Plotting the data so that some of its attributes or aspects become apparent is what we mean by visualization, and we will teach you the Python tools for visualizing data. At this point, you have taken the data, cleaned it, obtained a set of data points or a data structure you can work with, and computed a basic summary that gives you some insights. You have also looked at the data visually and gained more insights. But when you have a large amount of data, the last step is deriving the insights that are not readily apparent either through visualization or through a simple summary. How do we then apply more sophisticated analytics so that these insights come out? That is where machine learning comes in. As this course progresses, you will go through all of these steps, so that you are ready to approach data science problems in a structured way and use Python as a tool to solve some of them. (Refer Slide Time: 08:57) Now, why Python for doing all of this? The number one reason is that there are Python libraries already geared towards many of the tasks we talked about, which makes programming easy and lets you get interesting outcomes very quickly. As we discussed on the previous slide, you need to do data manipulation and pre-processing, and Python has many libraries of functions for data wrangling and manipulation.
From a data summary viewpoint, many of the statistical calculations you want to do are already pre-programmed, and you simply invoke them on your data to produce a summary. For the next step, visualization, there are Python libraries that can be used for plotting. Finally, for the more sophisticated analysis we talked about, all kinds of machine learning algorithms are already pre-coded and available as Python libraries. So once you understand a bit about these functions and get comfortable working in Python, applying machine learning algorithms to these problems becomes easy: you simply call these libraries and run the algorithms. (Refer Slide Time: 10:29) On the previous slide, we talked about the flow of getting the data in, cleaning it, and going all the way to insights, and in parallel we said why Python makes all of this easy. Going a little further, Python has other advantages beyond these basic data science activities. Python provides many libraries and is continuously improved, so new algorithms keep being added to the set of libraries; in that sense, it is very varied. There is also a good user community, so issues with new libraries get fixed and you get robust libraries to work with. Data also comes at different scales. The examples you will see in this course are reasonably small, but in real-life problems you will look at much larger data, which we call big data. Python can integrate with big data frameworks such as Hadoop and Spark. Python also supports more sophisticated programming styles, such as object-oriented programming and functional programming.
Even with all of these sophisticated tools and abilities, Python is still a reasonably simple language to learn, and it is reasonably fast to prototype in. It also lets you work with data on your local machine or in the cloud. These are all things one looks for in a programming platform that can solve real-life problems. So these are not only toy examples; you can build real data science applications with Python. (Refer Slide Time: 12:49) As another pointer to why we believe many students and professionals in India should learn Python: as you know, there are paid tools for machine learning with all of these libraries, and there are also open-source tools. In India, based on a survey, most people prefer open-source tools for a variety of reasons, one being that they are free to use. But if a tool is free to use and does not have a robust user community, it is not really very useful; that is where Python really scores. Python is both open source and backed by a robust user community that can help people working in it, and both of these are advantages. (Refer Slide Time: 13:48) If you think of other competing languages for machine learning and look at this chart for India, about 44 percent of the people surveyed said they use or prefer Python, with R a close second. In fact, R was much more preferred a few years back, but over the last few years Python has started to become the programming platform of choice in India. So it is a good language to learn, because there are many more job opportunities when you are comfortable with Python.
With this, I will stop this brief introduction on why Python for data science. I hope I have given you a sense that, while we are going to teach you Python as a programming language, each module we teach is actually geared towards data science. As we teach Python, we will make the connections to how you would use it in data science, and all of this will culminate in the two case studies that bring these ideas together. They will give you an understanding both of how a data science problem is solved and of how it is solved in Python, which is currently the programming platform of choice in India. I hope this short four-week course helps you quickly get onto this programming platform and learn data science, after which you can enhance your skills with a much more detailed understanding of both the programming language and data science techniques. Thank you.

Lecture - 02
Introduction to Python
Welcome to the lecture on Introduction to Python. (Refer Slide Time: 00:17) In this lecture, we are going to see briefly what data science is, look at the commonly used tools for data science, look at the history of Python, and then look at what an IDE means. (Refer Slide Time: 00:33) We live in a world that is drowning in data; wherever we are, data is being generated from various sources. When you browse a few websites, the websites track every user's clicks, and this forms part of web analytics. Another instance where data gets generated is when you use a smartphone: you are basically building up a record of your location.
All this information is stored somewhere and collected in the form of data. We also have sensors in electronic devices that record real-time information, and e-commerce websites that collect purchasing habits. Whenever you log into one of these e-commerce sites, you will see recommendations based on your previous purchase history or viewing history. (Refer Slide Time: 01:19) So let us see what data science is all about. It is an interdisciplinary field that brings together computer science, statistics, and mathematics to get useful inferences and insights from data. These insights are crucial from a business perspective because they help you make better business decisions. (Refer Slide Time: 01:40) Many tools are currently used in data science, and they can be bucketed into three categories. The first category is data preprocessing and analysis; tools and software that fall under this category are Python, R, MS Excel, SAS, and SPSS. These tools are used to preprocess and analyze the data. Apart from data preprocessing and analysis, a fair share of effort goes into data exploration and visualization, which is done even before you analyze your data. Commonly used data exploration and visualization tools are Tableau, QlikView, and, of course, MS Excel. The next bucket is for when you have huge chunks of data. When you collect data on a real-time basis, you collect it every second or every minute, and if you want to store and preprocess all of this data, your regular desktop or computing system might not be sufficient.
That is when you use parallel or distributed computing, where the work is distributed across different systems; popular tools for big data are Apache Spark and Apache Hadoop. In this course, we are going to focus mainly on tools for data preprocessing and analysis, and specifically on Python. (Refer Slide Time: 03:08) Let us look at the evolution of Python. Python was developed by Guido van Rossum in the late 1980s at the National Research Institute for Mathematics and Computer Science (CWI) in the Netherlands. There are different versions of Python: the first version was released in 1991, the second in 2000, and the third in 2008, with version 3.7 being the latest at the time of this lecture. Now let us look at the advantages of using Python. (Refer Slide Time: 03:41) Python has features that make it well suited for data science. The first and foremost is that it is an open-source tool, and the Python community provides immense support and development for its users. Python was developed under an Open Source Initiative (OSI) approved license, which makes it free to use and distribute, even for commercial purposes. (Refer Slide Time: 04:05) The next feature is that Python's syntax is fairly simple to understand and code in, which removes many barriers when you switch to a new programming language. Another important advantage is that, with distributions such as Anaconda, many libraries get installed along with Python itself, and these libraries are designed with specific data science tasks and activities in mind. Python also integrates well with most cloud platform service providers, which is a huge advantage if you are looking to work with big data.
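Because several 3.x releases exist (this lecture names 3.7 as the latest, and the Spyder demonstration uses 3.6), it can help to check which version your own installation runs. This is a minimal sketch; the minimum of 3.6 below is an assumption taken from the version used in these lectures, not a stated course requirement.

```python
import platform
import sys

# Assumption: 3.6 is used as the minimum only because the Spyder
# demonstrations in these lectures run on Python 3.6.
MIN_VERSION = (3, 6)

print("Python version:", platform.python_version())

# sys.version_info compares element by element with a plain tuple
meets_minimum = sys.version_info >= MIN_VERSION
print("At least 3.6:", meets_minimum)
```

Running the same two lines in Spyder's console also tells you which interpreter Spyder itself is using, which is not always the one installed system-wide.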
If you download Python from its website and install it, you will see that most of the scripting is done in a shell. There are applications that provide better graphical user interfaces for end users, and these are called integrated development environments. (Refer Slide Time: 04:57) So let us see what an integrated development environment is. An IDE, as it is abbreviated, is a software application that consists of the tools required for development, consolidated and brought together under one roof. IDEs are designed to simplify software development. This is very useful because, as an end user who is not a developer, you might want all the tools available at a single click, and an IDE is very beneficial in that case. The features provided by IDEs include tools for managing, compiling, deploying, and debugging software; these form the core features of any IDE. (Refer Slide Time: 05:44) Let us look at the features of an IDE in more depth. Any IDE should have three important features: a source code or text editor, a compiler, and a debugger. These three features form the crux of any software development. IDEs can also have additional features such as syntax and error highlighting, code completion, and version control. (Refer Slide Time: 06:09) Let us see which IDEs are commonly used for Python. The most frequently used are Spyder, PyCharm, Jupyter Notebook, and Atom, and the choice basically depends on what the user is comfortable with. (Refer Slide Time: 06:24) Let us look at Spyder first. Spyder is an IDE supported across Linux, Mac, and Windows platforms. It is also open-source software, and it is bundled with the Anaconda distribution, which comes with many Python libraries pre-installed.
So if you want to work with Spyder, you do not have to install the libraries separately; the necessary libraries are taken care of by Anaconda. Another important feature of Spyder is that it was developed specifically for data science, in Python and for Python. (Refer Slide Time: 06:57) This is how the Spyder interface looks: you have the scripting window, the console output here, and a variable explorer here. We will look at all of these features in the next few lectures. (Refer Slide Time: 07:11) Other features of Spyder include a code editor with robust syntax and error highlighting; it also helps with code completion and navigation, has a debugger, and provides integrated documentation that can be viewed within the interface or on the web. Another advantage of Spyder is that its interface is very similar to those of MATLAB and RStudio, so if you have already worked with these and are looking to switch to Python, the transition will be seamless. (Refer Slide Time: 07:44) Now let us look at the second IDE, PyCharm. PyCharm is also supported across all operating systems: Linux, Mac, and Windows. It has two editions: the Community edition, which is open-source software, and the Professional edition, which is paid. PyCharm supports only Python, and it is packaged with the Anaconda distribution, which comes with built-in Python libraries; however, if you want to install PyCharm separately, that can also be done. (Refer Slide Time: 08:14) This is how the PyCharm interface looks: you have a very well-defined structure for organizing your directories, and you have the scripting window here. (Refer Slide Time: 08:25) Let us look at some of the features of PyCharm.
First, it has a code editor that provides syntax and error highlighting. It also has code completion and navigation features, and a unit testing tool that helps you test your code piece by piece. It also has a debugger and version control. (Refer Slide Time: 08:48) Now let us look at the next IDE, Jupyter Notebook. Jupyter Notebook is very different from the earlier two IDEs in that it is a web application that allows you to create and manipulate code documents; these documents are called notebook documents, which is how Jupyter Notebook gets its name. Jupyter is supported across all operating systems and is available as open-source software. (Refer Slide Time: 09:18) This is the Jupyter interface. You can see a few cells here with input and some output; let me zoom in and show you how the interface looks. (Refer Slide Time: 09:33) Here you can see some of the code that has been written, and if you scroll up, you can see some narrative text about what has been written. (Refer Slide Time: 09:41) Jupyter is bundled with the Anaconda distribution, but it can also be installed separately. It primarily supports Julia, Python, R, and Scala. The name Jupyter takes the first two letters from Julia, the next two from Python, and then the R from R. A notebook consists of an ordered collection of input and output cells, as we saw earlier, and these can contain narrative text, code, plots, and any kind of media. (Refer Slide Time: 10:13) One of the key features of Jupyter Notebook is that it allows sharing of code and narrative text through output formats such as HTML, Markdown, or PDF.
If you work in an education environment or would like a better presentation tool, you can use these output formats to present. However, although Jupyter has features that give it a very good aesthetic appeal, it lacks some important features of a good IDE. By a good IDE, I mean one that has a source code editor, a compiler, and a debugger, and Jupyter does not provide all three of these. (Refer Slide Time: 10:50) The next IDE we are going to look at is Atom. Atom is an open-source text and source code editor that is again supported across all operating systems. It supports programming languages such as Python, PHP, and Java, it is very well suited for developers, and it lets users install plug-ins or packages. One common drawback of text editors and source code editors is that they do not come with the basic libraries of any programming language installed; you have to install these packages as and when you need them. That is one major drawback of using any kind of text editor or source code editor. However, Atom does provide installable packages suited for data science, code completion, code navigation, and debugging. So if you are a developer and want to code in a text editor environment, you can go ahead with Atom, but you will have to install all these packages as and when you require them. (Refer Slide Time: 11:52) This is the interface of Atom; it is a proper text editor interface. (Refer Slide Time: 12:00) So how do you choose the best IDE? That is an important question. It basically depends on your requirements, but it is a good habit to work with different IDEs first to understand what your own requirements are.
So if you are new to Python, it is better to work across all these IDEs (and there are several others out there), see what suits you, and then decide which IDE to use. In this course, however, we are going to use Spyder, primarily because it is very good software developed specifically for data science and Python, and its interface is very appealing and easy to use for beginners. (Refer Slide Time: 12:43) To summarize, in this lecture we saw the popular tools used in the data science environment. We also saw how Python evolved, which integrated development environments are commonly used, what each of these IDEs has to offer, and some of their common pros and cons. Thank you.

Lecture - 03
Introduction to Spyder Part - 1
(Refer Slide Time: 00:17) Welcome to the lecture on Introduction to Spyder. In this lecture, we are going to see what the Spyder interface looks like, how to set the working directory, and how to create and save a Python file. (Refer Slide Time: 00:28) (Refer Slide Time: 00:32) Let us see what Spyder looks like. On my left, you can see a snapshot of the screen that appears once you open Spyder. The Python version I am using to illustrate this lecture is 3.6. Once you open it, you will get a short description of the author name and when the file was created. There are a few windows here, so let us see what each of them is. The entire interface is split into three windows. The window on my left is called the scripting window, and all the lines of code and commands you type will be displayed here.
So you write all your commands and code here. On my right, I have two windows. The top section is where you will find tabs labelled File explorer, Help, and Variable explorer. Under File explorer, once you set the directory, any files that exist in your current working directory will be displayed. Under Variable explorer, you will see all the objects and variables you have used in your code, along with their name, type, and size. Name is the name of the variable, type is its data type, and size tells you whether it is an array or a single value. For an array, the first few values are displayed under the heading Value; if it is a single value, that value is displayed there. The section at the bottom is the console. The console is an output window where you will see all your printed statements and outputs. You can also perform elementary operations in the console, but the disadvantage is that you will not be able to save them, whereas whatever you type in the scripting window can always be saved. We will look at how to save the commands you have typed in the scripting window as the lecture proceeds. (Refer Slide Time: 02:34) Now let us see how to set the working directory. There are three ways to set a working directory: the first is using an icon, the second is using the built-in library os, and the third is using the command cd, which stands for change directory. (Refer Slide Time: 02:47) Let us see how to set a working directory using the icon. If you look at the top section, you will see an icon of an open folder; you can choose a working directory by clicking on this icon. Once you click it, you will be prompted to choose a location or folder.
You can then choose a suitable folder or location, and once you click on it, your directory is set. This is an easy method if you do not want to type commands every single time; you just point and click. (Refer Slide Time: 03:28) Now let us look at the second and third methods. For the second, you need to import a library called os, which stands for operating system. Before you use a function from this library to change the directory, you need to import it; import is the statement you use to load a library into your environment. Once you have loaded os into your environment, you can use the function chdir, which means change directory. You write the name of the library, os in this case, followed by a dot and then chdir. Within the parentheses, you give the path in single or double quotes; you can copy the entire path of your directory and paste it there, or type it out. The third method is the command cd, which also means change directory: you give a space after the command and then the path. This is how you set a working directory. (Refer Slide Time: 04:32) Once you set the working directory, any folders, subfolders, or other files inside it will be displayed under File explorer. I have a couple of files in this directory, so they are displayed here for me, but if you are opening a new folder, you are likely to see this space empty. You can check all your files and subdirectories here under File explorer. (Refer Slide Time: 05:09) We have seen how to set a working directory; now let us see how to create a file. There are two ways to go about it. The first is to click an icon that looks like a page folded on the right, which you can find on the toolbar.
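The `os.chdir` method described above can be tried as follows. This sketch changes into a temporary folder created with `tempfile` so that it runs anywhere; with your own project you would pass your folder's path instead (the path in the comment is hypothetical). `os.listdir` gives a listing similar to what the File explorer pane shows, and the last line switches back to the original directory.

```python
import os
import tempfile

# Remember the starting directory so we can restore it at the end.
original_dir = os.getcwd()

# A fresh temporary folder stands in for your own project folder.
# With a real folder you would write something like:
#     os.chdir(r"C:\Users\you\python_course")   # hypothetical Windows path
target_dir = tempfile.mkdtemp()

os.chdir(target_dir)                   # second method: os.chdir(path)
print("Working directory:", os.getcwd())

# Put one file in the folder so the listing has something to show,
# similar to what Spyder's File explorer pane displays.
with open("notes.txt", "w") as fh:
    fh.write("first note\n")
print("Files here:", os.listdir("."))  # ['notes.txt']

os.chdir(original_dir)                 # switch back to where we started
```

On Windows, a raw string (as in the comment) or forward slashes keep backslashes in the path from being read as escape sequences. The third method, typing `cd <path>` in Spyder's IPython console, works because IPython treats it as its `%cd` magic command.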
On the icon bar, towards your extreme left, you will see a page that is folded on the right; if you click on it, a new script file will open. I have also shown you a zoomed-in version of the icon so that you can see how it looks; the moment you click on it, a new script file will pop up. (Refer Slide Time: 05:39) The second method is by clicking on the File menu and then selecting New file. You can see the File menu here, and from it you click on New file. Apart from these two methods, you always have the fallback option of using the keyboard shortcut Ctrl + N. All three methods right away open a script file for you. Till now, we have set the working directory and created a script file. Now let us type a few pieces of code before we save our script file, but even before we go there, let us look at what a variable means. (Refer Slide Time: 06:00) A variable is an identifier that contains known information; the known information contained within an identifier is referred to as a value. A variable name points to a memory address or storage location, and this location is used to refer to the stored value. A variable name can be descriptive or can consist of a single letter. We will look into the conventions for naming a variable in the lectures to come. (Refer Slide Time: 06:47) So, let us go ahead and create a few variables. You will see a snapshot of code on my left, and I have zoomed in on the lines of code on my right. Here I am assigning a value of 11 to a. In Python, the assignment operator that you use to assign a value is the equal to sign (=). So, I am storing a value of 11 in a, where a is my variable name, and I am saying b = 8 * 10. This is a multiplication, and the multiplication operator in Python is the asterisk (*).
Once I create both variables, I would like to print the values of a and b. Because I want to print two values together, I separate them with a comma inside the print statement: print displays the output, and since I want two outputs here, the two objects are separated by a comma. If you want to print just one object, you can give that single object inside the parentheses. (Refer Slide Time: 07:59) Now, let us go ahead and save our script file. To save your script file, click on the File menu again, and you can see three different options there. Let me zoom in a bit to show you the list of options. The first option is Save, whose keyboard shortcut is Ctrl + S. If you already have a file and you are making some changes to it, you can simply click on Save to save those changes. If you have multiple files open and are making changes in all of them, you can use the option Save all, which saves the changes made across all the open files. The third option is Save as: if you are creating a new file and would like to name it and save it, you use Save as. So, let us see how to save a new script file for the very first time. (Refer Slide Time: 09:07) Once you click on Save as, it will prompt you to give a name for the file. You can choose the directory where you want to save it, or if you are already in your working directory you can just save it there. .py is the extension used for a Python script file. Once you do this, click on Save and your file is saved.
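The short script built in this lecture can be typed into the editor as it is; once saved as a .py file (the file name is up to you), it prints both values on one line, separated by a space.

```python
# Assign values with the assignment operator =
a = 11
b = 8 * 10     # * is the multiplication operator

# Separate several objects with commas to print them together
print(a, b)    # prints: 11 80

# A single object needs no comma
print(a)       # prints: 11
```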
(Refer Slide Time: 09:35) So, to summarize, in this lecture we saw how the interface of Spyder looks, how to set the working directory, and how to create and save Python script files. Thank you. Python for Data Science Prof. Ragunathan Rengasamy Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture – 04 Introduction to Spyder Part -2 Welcome to the 2nd lecture on Introduction to Spyder. (Refer Slide Time: 00:17) In this lecture, we are going to see how to execute a Python file, how to execute a few pieces of code using Run selection, and how to add comments; we will also see how to reset the environment and clear the console. (Refer Slide Time: 00:34) So, let us begin with file execution. On the icon bar at the top you will see a green triangle pointing to the right; this is called the Run file option, and it runs an entire file at once. The equivalent keyboard shortcut is F5. If you want to run a section of code or a few lines of code, you can click on the Run selection option, which runs the chosen lines. The equivalent keyboard shortcut is F9 or Ctrl + Enter, after selecting the lines. So, let us see how each of these options works and what output each of them gives. (Refer Slide Time: 01:15) I am first starting with the Run file option, which runs an entire file at once. Once your code is ready, click on the green triangle icon and you will see an output here. In this case, I am running the same script file that I used in my earlier lecture, which says a=11, b=8*10 and print(a,b). So, the output I expect to get is the values of a and b. Once you run the code, you will see the values stored in the environment, which you can find under Variable explorer.
In Variable explorer you can see my values of a and b; they are of integer type, the size is 1 because both have only one value, and the corresponding value is displayed. After I run the file, I have the output on my console; let me zoom in to show you how the output looks. (Refer Slide Time: 02:13) This is my output: a is 11 and b is 110. Since these were contained within the print statement, I get only these two values as my output. Another thing to note is that whenever you click on the Run file option, a corresponding function is run: runfile is the function that runs an entire file at once. If you are not using the icon or the keyboard shortcut, you can also run your file using this command. The input to the runfile function is the name of the file along with its entire path. There is also another parameter called wdir, which means working directory, where you specify the directory in which the file resides. For me it resides on the desktop, so I have given that directory. So, if you do not want to use the icon or the keyboard shortcut, you can also use the runfile command. In this case, since the output is contained within the print statement, you will see only the output of a and b. (Refer Slide Time: 03:23) Now, let us see how to execute a few pieces of code using the Run selection option or the F9 key. To my earlier code, I am assigning a value of 14 to a; I select the line and then press F9. I am using the shortcut here, but you can also use this icon. Once you select the line and press F9, you will see the corresponding output displayed in your console, which says a = 14. Whenever you use the Run selection command or F9, the selected lines of code are displayed in your console.
What I have shown you here is the console output. (Refer Slide Time: 04:08) Now run the second line, which is b=a*10, by pressing F9; once you do that, you will see the corresponding code in your console. Then run the line print(a,b), and you will see the corresponding output as well. As you may have noticed, every time you select a line and run it with Run selection, the corresponding code is also displayed in the console. However, if you use the Run file command, which runs an entire file at once, these lines of code are not printed in the console. This is one of the differences between the Run file and Run selection commands. You can also use Run selection to debug: if you want to go through each line to find bugs, faults or mistakes you may have made, you can use the Run selection option. This is one of the major advantages of Run selection; however, if you have a large code, say 1000 lines, it becomes impractical to go through it with Run selection. (Refer Slide Time: 05:20) Now, let us move on to commenting script files. Adding comments aids the understanding of the algorithms used to develop a code. On my right you can see a snapshot of a very trivial example that describes how the volume of a cylinder is calculated. To comment any line, you begin it with '#'. Here I have described the title of the task that I am going to do, which is to calculate the volume of a cylinder. Apart from describing the task or its objective, you can also define what each of the variables in your code means. Here I have said dia is diameter, len is length and vol is volume.
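A commented script in the spirit of the cylinder example might look like the following. The diameter and length values are assumptions, since the slide's values are not in the transcript, and the length variable is named `length` instead of `len`, because `len` would shadow Python's built-in `len()` function. The last line shows the other use of comments: a line starting with `#` is inert and is not executed.

```python
import math

# Task: calculate the volume of a cylinder
# dia    : diameter of the cylinder (assumed value)
# length : length (height) of the cylinder (assumed value)
# vol    : volume of the cylinder

dia = 5
length = 10

# volume = pi * r^2 * h, where the radius r is dia / 2
vol = math.pi * (dia / 2) ** 2 * length
print(vol)

# vol = 0    <- commented out, so this line is inert
```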
This is a very good practice, because if you give your code to someone or revisit it in the future, you will want to know what you did and why you did it. You can also comment multiple lines instead of just one. (Refer Slide Time: 06:13) To comment multiple lines, select the lines to be commented and press Ctrl + 1. This is the keyboard shortcut; alternatively, go to the Edit menu and select Comment/Uncomment. You can see the keyboard shortcut displayed next to the Comment/Uncomment option under the Edit menu. As I said earlier, you can add descriptions to make your code more comprehensible, but apart from that, if you are in the early stages of developing a code, where you are trying and testing things out, you can also use commenting as a way of making a few lines inert. (Refer Slide Time: 06:58) What I mean by this is the following. Let us take the previous example, where I have a=14, b=8*10 and print(a,b). I am trying and testing out what happens if I comment out the line with a. So, I am making the first line inert and then running the successive lines. If you are playing with your code in the development stage, you can use commenting as a way of keeping lines inert; this is another use of commenting. Till now, we have seen how to execute an entire file at once and how to execute a few lines of code, and we have also looked at commenting as a way of adding descriptions to your code. Now let us see how to clear the console and the environment. (Refer Slide Time: 07:43) If you have an overpopulated console, where you have printed multiple lines of code and multiple outputs, you might want to clear it and start afresh.
Let us take this example, where I have run the same code as earlier. This is how my console looks right now, and I want to clear it. You can type "%clear" in the console, and once you hit Enter your entire console is cleared. Alternatively, the Ctrl + L keyboard shortcut also works. So, this is how to clear the console. (Refer Slide Time: 08:24) Once you clear the console, this is how it looks; I have a snapshot here that shows the console after clearing it. An important point to note is that whenever I clear my console, only the output window is cleared; Variable explorer remains intact and all the variables are still there. Clearing the console only means clearing, or flushing out, the output window. Now, let us see whether, instead of just clearing the console, there is a way to clear the environment as well. (Refer Slide Time: 08:57) I might be interested in removing or deleting a few variables from my environment. To begin with, I have two variables in my environment, a and b, with values of 14 and 140 respectively. Let us see how to remove or delete these variables. (Refer Slide Time: 09:15) To remove a single variable, you type del, followed by a space and the variable name. You can type this in the console; del stands for delete. Once you hit Enter, you will see that the variable, in this case b, has been removed from the environment. Instead of removing a single variable, we can also remove multiple variables from the environment. We still use the same command del, but instead of giving one variable, you give two, and you have to ensure that you separate the variables with a comma.
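The same del commands work in a script as well as in the console. In this sketch, checking whether a name appears in `globals()` is simply a way of confirming from code what Variable explorer shows; it is not part of the lecture's procedure.

```python
a = 14
b = 140

# Remove a single variable: del, a space, then the variable name
del b
print("b" in globals())                    # False: b no longer exists

# Recreate b, then remove several variables at once, separated by commas
b = 140
del a, b
print("a" in globals(), "b" in globals())  # False False
```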
If you are typing along with me, please ensure that b is also present in the environment before you type this code. Once you hit Enter, you can see that both variables have been deleted and the environment has been flushed out. (Refer Slide Time: 10:18) Instead of dropping variables one by one, you might also be interested in clearing the entire environment at once. There are two ways to go about it. The first is to use the "%reset" command in the console. Once you type %reset and hit Enter, it will prompt you with a line that reads "Once deleted, variables cannot be recovered. Proceed (y/[n])?", where y stands for yes and n stands for no. This is to ensure that you have not typed %reset accidentally; it is an extra layer of checking to make sure that you do not flush out important variables from your environment. If you would like to proceed, type 'y'; otherwise type 'n'. Once you hit Enter, you will see that the entire environment has been cleared. (Refer Slide Time: 11:07) This is how you do it using a command. (Refer Slide Time: 11:13) Now, let us see how to clear the environment using an icon. Above Variable explorer there are a couple of icons, and the one on the extreme right looks like an eraser. When you click on this icon, it prompts you with a dialog box, and if you confirm, all the variables are removed. While removing the variables, it also shows a message saying that it is removing all variables. Till now, we have seen how to execute an entire file or a few lines of code, how to comment the code that you have written, and how to clear the console and the environment. (Refer Slide Time: 11:56) Now, let us take a look at some of the basic libraries in Python. There are four major libraries that come installed with the Python distribution used here.
These are NumPy, which stands for Numerical Python; pandas, whose name comes from panel data and which provides DataFrames; Matplotlib, which is used for visualization; and sklearn (scikit-learn), which is used for machine learning. These four major libraries are important for solving a data science problem. They are parent libraries, and there are also sub-libraries contained within them. To access the contents of a library, you first need to import it; in this case I am importing NumPy. dir lists the directory, that is, the names defined in the library, in this case numpy. I am saving this entire listing to a variable called content and then printing the object content. Once you run these three lines of code, all the sub-libraries are printed on your console. This is one way to access the sub-libraries. However, it is a little tedious, because you have to skim through all of them to know what they consist of, and your console also gets overpopulated. (Refer Slide Time: 13:12) Under the Help tab, you have a search box titled Object; let me zoom in to show you how it looks. (Refer Slide Time: 13:19) Under Object, you mention your library name. In this case I have entered NumPy, and the moment you hit Enter, the documentation pops up. (Refer Slide Time: 13:30) The documentation tells you what NumPy does, what it provides and how to use the documentation. If you scroll down, you can also see a list of sub-libraries available under NumPy, such as linear algebra, Fourier transform routines, polynomial tools and so on. If you want the specific documentation for a sub-library, type the library name in the search box, followed by a dot and then the sub-library name.
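The three lines described above can also be run from a script. This is a minimal sketch, assuming NumPy is installed. Printing only a slice of `content` keeps the console from becoming overpopulated, and a module's `__doc__` attribute holds its documentation text, the docstring that the Help pane renders (`help(numpy.linalg)` prints it in the console).

```python
import numpy

# dir() returns a sorted list of the names defined in the numpy module
content = dir(numpy)
print(len(content))       # how many names there are
print(content[:20])       # only the first 20, to keep the console readable

# Sub-libraries are reached with a dot, e.g. numpy.linalg or numpy.lib
print("linalg" in content)

# The docstring of a sub-library is its documentation text
print((numpy.linalg.__doc__ or "")[:300])
```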
For example, if I want to access the sub-library lib from numpy, I write 'numpy.lib' in the Object search box. This is how you get detailed documentation of the libraries and sub-libraries in Python. (Refer Slide Time: 14:23) So, to summarize, in this lecture we saw how to execute Python script files, how to comment single and multiple lines of code, how to clear the console and the environment, and how to access some of the basic libraries in Python. Thank you. Python for Data Science Prof. Ragunathan Rengasamy Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture – 05 Variables and Data Types (Refer Slide Time: 00:18) Welcome to the lecture on Variables and Data Types. In this lecture, we are going to look at how to name variables and some of the common rules and conventions for naming a variable. We are also going to look at some of the basic data types that we will use throughout this course and in Python. As part of this, we will look at how to identify the data type of an object, how to verify whether an object is of a certain data type, and how to coerce objects to a new data type. (Refer Slide Time: 00:43) So, let us begin with naming variables. Values are assigned to variables using the assignment operator, equal to (=). The variable name should be short and descriptive, because there is an intent behind creating the variable and it is supposed to convey information. Avoid using variable names that clash with inbuilt functions. As I said earlier, variable names are meant to indicate the intent and purpose of their use to the end user, so it is better to avoid one-character variable names. One-character variable names are usually used in iterations, functions and looping constructs. (Refer Slide Time: 01:23) Variables can be named alphanumerically.
If I have a variable called age whose value is 55, I can write age entirely in lower case or begin it with an upper case letter. You can also add a number to it: if I am creating another variable called age2, I can add the number after the letters. One thing you may have noticed here is that the name should always begin with a letter; if you begin it with a number, Python throws an error saying invalid syntax. (Refer Slide Time: 02:01) The only other special character allowed in a variable name is the underscore. If I want to create a variable that conveys the employee ID, I can separate emp and id with an underscore. If you use any other special character, you will get an error that says cannot assign to operator. Although the underscore is allowed, it is better not to begin or end a name with one. Beginning or ending with an underscore will not give you an error, since Python accepts it, but it is usually not a practice that is followed. (Refer Slide Time: 02:46) There are a few case types that are commonly accepted while naming variables. The first is camel case, which can be lower or upper. In the first example, age of the employee is written with the E in capital; this is lower camel case. The example on the right is upper camel case, where the letter a in age of the employee is a capital letter. The next case type is snake case, where I separate age and emp with an underscore. An underscore can be used between two sets of letters or between two letters, and that is snake case.
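The naming rules above can be checked from code. In this small sketch, `str.isidentifier()` reports whether a string is a syntactically valid Python name, and `compile()` is used only to show that an invalid name is rejected as a syntax error; the candidate names are illustrative.

```python
candidates = ["age", "Age", "age2", "2age", "emp_id", "emp-id"]

# str.isidentifier() is True only for syntactically valid names
results = {name: name.isidentifier() for name in candidates}
for name, ok in results.items():
    print(f"{name:8} valid name: {ok}")

# Using a name that starts with a digit is a syntax error
try:
    compile("2age = 55", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)

# Case styles discussed above (all valid names)
ageOfEmp = 55   # lower camel case
AgeOfEmp = 55   # upper camel case
age_emp = 55    # snake case
```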
The letter after the underscore should always be in lower case, but the first letter of the variable name can be in lower or upper case. The next case type is Pascal case; again I take AgeEmp, where the first letter of Age is in upper case and the first letter of Emp is also in upper case. These are the commonly used case types in Python. Python will not throw an error if you do not follow these conventions, but they are the conventions found in books and generally accepted. There are other case types that use a hyphen, but those are not allowed in Python, because Python reads a hyphen as the minus operator. (Refer Slide Time: 04:06) If you want to assign values to multiple variables, you can create all the variables at once and assign the values in sequence. If you run this command, you will see that the variables chemistry, mathematics and physics have been assigned their values accordingly. (Refer Slide Time: 04:24) Now, let us look at the commonly used data types in Python. (Refer Slide Time: 04:27) The first basic data type is Boolean, which represents the two values of logic and is associated with conditional statements. The output values you get with a Boolean data type are True or False, and it is represented as bool. The next data type is integer, which consists of the set of all integers, that is, positive or negative whole numbers; it is represented by int. The next data type is complex, which contains a real and an imaginary part. Any expression of the form a+ib is of complex data type (Python writes the imaginary unit as j, as in 3+4j). It consists of all complex numbers and is represented by complex. The float data type consists of all real numbers represented as floating-point numbers, and it is represented by float. The string data type consists of all strings and characters: anything enclosed between single or double quotes is treated as a string.
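The multiple assignment and the five basic data types described above can be tried together. The marks for the three subjects are made-up values for illustration; note again that Python writes the imaginary unit as `j`.

```python
# Assign values to several variables in one statement, in order
chemistry, mathematics, physics = 78, 85, 90
print(chemistry, mathematics, physics)

# The basic data types
passed = True          # bool: True or False
marks = -12            # int: positive or negative whole numbers
z = 3 + 4j             # complex: real part 3, imaginary part 4
height = 150.6         # float: real numbers with a decimal point
name = "Ram"           # str: anything inside single or double quotes
tier = '1'             # still a str, because it is quoted

for value in (passed, marks, z, height, name, tier):
    print(repr(value), type(value).__name__)
```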
The value enclosed between the quotes can be a number, a special character, letters, anything; whatever is contained within quotes is treated as a string. It is represented by str. (Refer Slide Time: 05:38) Before we go ahead with data type operations, it is important to know the difference between a statically typed and a dynamically typed language. A statically typed language is one where the type of a variable is known at compile time, and you have to declare the data types of the variables upfront; examples of such languages are C, C++ and Java. In contrast, a dynamically typed language is one where the data type need not be declared upfront, and the type of a variable is known only at run time: whenever you assign a value to a variable, its data type becomes known the moment that line runs. Examples of such languages are Python and PHP. So, Python, which we are using here, is a dynamically typed language. (Refer Slide Time: 06:29) Let us go further and see how to identify the data type of an object. To find the data type of an object, you use the function type and give the object as the input. An object can be a variable or any of the data structures, such as an array. In this case, we are just going to look at how to find the data type of variables. Let us take a small example with three variables. The first is Employee_name, the name of the employee, whose value is "Ram". The next is the age of the employee, represented by the variable name Age, whose value is 55. The third variable is Height, the height of the employee, whose value is 150.6. If you want to check the data type, you can give type(Employee_name); Employee_name is a string, because it is enclosed between double quotes.
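A short sketch of dynamic typing using the `type` function described above: the same name can be bound to values of different types without any declaration, and `type()` reports the type of whatever value the name currently holds. The variable `x` is purely illustrative; the other three are the lecture's example variables.

```python
# No declaration needed: the type comes from the value assigned at run time
x = 10
print(type(x))        # <class 'int'>
x = "ten"             # rebinding the same name to a string is allowed
print(type(x))        # <class 'str'>

# The lecture's example variables
Employee_name = "Ram"
Age = 55
Height = 150.6
print(type(Employee_name), type(Age), type(Height))
```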
If you give type(Age), Age is an integer, so the output is int; if you give type(Height), which is a floating-point number, the data type is float. (Refer Slide Time: 07:40) Now, let us see how to verify the data type of an object. If you want to verify whether an object belongs to a certain data type, you give type of the object, in this case a variable name, followed by the keyword is and the data type name. The data type name is the representation of the data type: int for integer, str for string and so on. I am going to use the same example that we used earlier. Here I would like to verify whether Height is of integer data type, so I give type(Height), followed by is and the keyword int, which is the abbreviation for the integer data type. One important thing to keep in mind while verifying the data type is that the output is Boolean: it is either True or False. We are only checking whether a given variable belongs to a desired data type; we are not trying to assign or change it, just cross-verifying it. Hence the output is either True or False; here it is False, since Height is a float. Next, I want to know whether Age belongs to the float data type, which is False, because Age is actually an integer. Next, I want to check whether Employee_name is a string; the output is True, because it is a string. (Refer Slide Time: 09:02) Till now, we have seen how to find the data type of an object and how to verify whether an object has a desired data type. Now we are going to look at how to coerce objects to new data types. If I want to convert an object to another data type, I use the data type name as a function: you replace the data type with the abbreviation used for that data type, and give the object as the input.
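The verification checks described above can be written out in full. Each `type(...) is ...` expression only answers a yes/no question and converts nothing, so its result is itself a Boolean.

```python
Employee_name = "Ram"
Age = 55
Height = 150.6

# type(obj) is <type> answers a yes/no question; nothing is converted
print(type(Height) is int)           # False: Height is a float
print(type(Age) is float)            # False: Age is an int
print(type(Employee_name) is str)    # True

# The result of the check is itself a bool
check = type(Age) is int
print(type(check))                   # <class 'bool'>
```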
The converted value can be stored in the same variable or in a different variable. In this case Height is of float data type, and let us say I want to convert it to an integer data type. I write int(Height), which converts it into an integer, but if I want this change to be reflected in my environment, I have to store it in a variable, which can have the same name or a different name. Here I convert Height to an integer and store it in a new variable called ht. If I get the type of ht, the output is int, which means the height value has been converted to an integer data type. If I want the change reflected in the same variable name, I just say Height=int(Height). Earlier Height was float; once the operation is done, if you check the type of Height again, the output you get is int. This is how you coerce existing objects to new data types. (Refer Slide Time: 10:37) One important point here is that only a few types of coercion are accepted. Let us say I have a variable called Salary_tier, which is of string data type. Salary_tier contains an integer enclosed between single quotes: its value is the number 1, but since it is enclosed within single quotes it is still treated as a string. If I want to change it to an integer, I just say Salary_tier=int(Salary_tier), which converts the data type of Salary_tier from string to integer. (Refer Slide Time: 11:15) However, not all types of coercion are possible. For instance, if I try to convert the Employee_name, Ram, to an integer or a float, I will get an error. If the value enclosed between the quotes is a valid number of the target type, such as int('1') or float('150.6'), then such coercions are possible; note that int('150.6') still fails, because that text is not a whole number.
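The coercions described above can be written as code. `int()` drops the decimal part, so 150.6 becomes 150. A failed conversion raises a `ValueError`; the loop at the end tries each conversion and reports which ones fail, including `int('150.6')`, while `float('150.6')` succeeds.

```python
Height = 150.6

# Store the converted value in a new variable; int() drops the decimal part
ht = int(Height)
print(ht, type(ht))          # 150 <class 'int'>

# Or overwrite the original name
Height = int(Height)
print(type(Height))          # <class 'int'>

# A quoted number is a str until it is coerced
Salary_tier = '1'
Salary_tier = int(Salary_tier)
print(type(Salary_tier))     # <class 'int'>

# Coercion fails when the text is not a valid number of the target type
for text, convert in [("Ram", int), ("Ram", float), ("150.6", int), ("150.6", float)]:
    try:
        print(f"{convert.__name__}({text!r}) ->", convert(text))
    except ValueError:
        print(f"{convert.__name__}({text!r}) -> ValueError")
```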
However, if the value is a string of letters or other characters, such coercions are not possible. (Refer Slide Time: 11:42) So, to summarize, in this lecture we looked at some of the common conventions for naming a variable in Python. We saw some of the basic data type related operations: getting the data type of a variable, verifying whether a variable is of a certain data type, and coercing the data type of an existing variable to a new one. Thank you. Python for Data Science Prof. Ragunathan Rengasamy Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture – 06 Operators (Refer Slide Time: 00:17) Welcome to the lecture on Operators! In this lecture, we are going to see what an operator is and what an operand is. We will also look at the different types of operators used in Python: arithmetic, assignment, relational, logical and bitwise. We will also look at the precedence of operators and how to use them in an expression. (Refer Slide Time: 00:37) So, let us see what operators and operands are. An operator is a special symbol that helps you carry out an assignment operation or some kind of computation; the computation can be arithmetic or logical. The value that the operator operates on is called an operand. Let us take a small example, 2 + 3, to illustrate what an operator and an operand are. Here the plus sign is an operator, and it denotes addition. You see two numbers before and after the operator; 2 and 3 in this case are called operands. (Refer Slide Time: 01:19) Let us look at arithmetic operators. Arithmetic operators are used to perform mathematical operations between two operands. Let us take an example to illustrate their use: create two variables a and b with values 10 and 5.
In the previous lectures we have already seen how to create a variable, so we are going to use the same procedure here. The first operator we are going to look at is the addition operator. It is denoted by the plus symbol, and this is how an addition operation is carried out: if I want to add two variables, I separate them with the plus symbol. In this case a is 10 and b is 5, so the sum you get is 15. The next operation is subtraction, denoted by a hyphen (the minus sign). I have the corresponding example here: I subtract b from a, and the output I get is 5. (Refer Slide Time: 02:17) The next arithmetic operation we are going to look at is multiplication. It is denoted by an asterisk; if you want to multiply two variables, separate them with an asterisk. The product you get in this example is 50: a is 10 and b is 5 again, and 10 times 5 is 50. The next operation is division, denoted by a forward slash. You separate the variables with a forward slash, and what you get is the quotient: in this case 10 divided by 5 gives a quotient of 2, which Python 3 displays as 2.0, because the / operator always returns a float. The next operation is getting a remainder, denoted by the percentage symbol. Let us take the same example: to get the remainder when I divide a by b, I separate the variables with the percentage symbol, and that returns the remainder. In this case a is 10 and b is 5; 10 is divisible by 5, and hence I get a remainder of 0. The next operation is the exponent, denoted by a double asterisk. If I want to raise a variable to the power of another variable, I use this operation: to raise a to the power of b, I write a double asterisk b, that is, a ** b.
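The arithmetic operators above can be applied to a = 10 and b = 5 in a few lines. The comments give the value each line prints; note the 2.0 for division, since / always returns a float in Python 3.

```python
a = 10
b = 5

print(a + b)    # 15      addition
print(a - b)    # 5       subtraction
print(a * b)    # 50      multiplication
print(a / b)    # 2.0     division (always a float in Python 3)
print(a % b)    # 0       remainder
print(a ** b)   # 100000  exponent: 10 to the power of 5
```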
Here, since a is 10 and b is 5, a ** b raises 10 to the power of 5, and the output is 1 lakh (100000). (Refer Slide Time: 03:51) Now let us look at the hierarchy of arithmetic operators, which I have listed in decreasing order of precedence. First come parentheses. Parentheses are not really an operator, but anything enclosed within parentheses gets the topmost priority, which is why I have included them in the list. Parentheses are followed by exponentiation, then by division and multiplication, which share the same precedence, and finally by addition and subtraction, which also share the same precedence. Let us take the example expression shown on the slide. To avoid confusion, I add parentheses around the 27 by 3 square term: 27 is the numerator, the denominator is 3 square, which I write using a double asterisk (3 ** 2), and the entire term is enclosed within parentheses. Once you execute the command in your console and print the value of A, it should return 5 (displayed as 5.0, since the / operator returns a float). This is how you use operators in an expression; you can also try out this example yourself. (Refer Slide Time: 05:03) Now let us look at assignment operators. An assignment operator is used to assign a value to a variable. The first assignment operator that we are going to look into is the equal to symbol (=). This is the most commonly used assignment operator, because whenever you create a variable you want to assign a value to it; we learnt how to use it in our earlier lectures. It assigns the value of the right side operand to the left side operand: the left side operand is a variable name and the right side operand is the value given to the variable. In this example I retain the same variable names a and b, and I assign a value of 10 to a.
In this case 10 is the right side operand and a is the left side operand; the same holds for b = 5. The next operator that we are going to look into is the += operator. It first adds the right operand to the left operand and then stores the result in the left operand. Taking the same variables a and b, writing a += b translates to a = a + b, as indicated within parentheses on the slide. So the addition a + b happens first, and its value is stored in a; hence the value of a is updated from 10 to 15. The next operator is the minus equal to operator (-=). It is similar to the += operator: it subtracts the right operand from the left operand and stores the result in the left operand. So a -= b translates to a = a - b; with a as 10 and b as 5, the difference is 5, and that is what I print. (Refer Slide Time: 07:19) The asterisk equal to operator (*=) multiplies the left operand by the right operand and stores the result in the left operand. Again I retain the same values, a as 10 and b as 5, so a * b is computed first, giving a product of 50, and when I print a it shows the updated value, 50. Forward slash equal to (/=) means division: it divides the left operand by the right operand and stores the result in the left operand. Here a /= b translates to a = a / b, and if you print a, you can see that its value has been updated from 10 to 2 (displayed as 2.0, since / returns a float).
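The compound assignment operators above can be checked with a short sketch. As in the lecture, a is reset to 10 before each operator; the `results` dictionary is only an illustrative way to collect the outcomes and is not part of the slides.

```python
b = 5
results = {}

a = 10
a += b              # same as a = a + b
results["+="] = a   # 15

a = 10
a -= b              # same as a = a - b
results["-="] = a   # 5

a = 10
a *= b              # same as a = a * b
results["*="] = a   # 50

a = 10
a /= b              # same as a = a / b; the result is a float
results["/="] = a   # 2.0

print(results)
# → {'+=': 15, '-=': 5, '*=': 50, '/=': 2.0}
```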
(Refer Slide Time: 08:09) Now let us see what relational, or comparison, operators are. A relational operator tests for numerical equality or inequality between two operands. The value that a relational operator returns is Boolean, which means it returns either True or False. All the relational operators that we are going to look into have the same precedence, which means they all have the same priority. Let us create two variables x and y, assigning a value of 5 to x and 7 to y. The first relational operation is strictly less than. It is denoted by the angle bracket with its tip towards the left (<). Let us see how this operator works. With the values of x and y already assigned, I write the relation x < y, which checks whether x is strictly less than y. In our case x is 5 and y is 7; 5 is strictly less than 7, and hence the output is True. The next operation is less than or equal to, denoted by the same angle bracket with its tip towards the left, followed by an equal to symbol (<=). The greater than operators work the same way: strictly greater than is denoted by the angle bracket with its tip towards the right (>), and greater than or equal to by the angle bracket with its tip towards the right followed by an equal to symbol (>=). For example, if I print x >= y, I am asking whether x is greater than y or x is equal to y; both conditions are false, and hence the output is also False. The next operation is equal to equal to, denoted by a double equal to symbol (==). It checks whether the left hand side operand is exactly equal to the right hand side operand. Here x and y are 5 and 7 respectively, and 5 is not equal to 7, hence the output is False.
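A minimal sketch of the relational operators with x = 5 and y = 7, as in the lecture. Every comparison returns a Boolean; the `comparisons` tuple is just an illustrative way to collect them.

```python
x = 5
y = 7

comparisons = (
    x < y,    # True: 5 is strictly less than 7
    x <= y,   # True
    x > y,    # False
    x >= y,   # False: x is neither greater than nor equal to y
    x == y,   # False: 5 is not equal to 7
)
print(comparisons)
# → (True, True, False, False, False)
```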
The next operation is not equal to, denoted by an exclamation mark followed by an equal to symbol (!=). Here the output is True as long as x is not equal to y. This operator is frequently used in loops, when you want to keep iterating as long as a certain condition holds. (Refer Slide Time: 11:31) The next set of operators are the logical operators. A logical operator is used when the operands are conditional statements, and its output is Boolean, True or False. Strictly from the point of view of Python, logical operators are designed to work with single (scalar) Boolean values, not element by element. So, if you want to compare two arrays element by element, a logical operator cannot be used. The first logical operation is logical OR, written as the word or in lowercase. Let us take a small example, retaining the same values for x and y: x is 5 and y is 7. If I give (x > y) or (x < y), I get the output True. Why does this happen? A logical OR gives the output True when at least one of the statements is satisfied. Here x > y is not satisfied, so it is False; however, x < y is satisfied, so it is True. The inputs to the or operator are therefore False and True, and whenever one operand is True the result is True; hence the output is True. The next is logical AND, written as the word and, again in lowercase. Let us take the same expressions that I considered above, replacing or with and.
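The or and and examples can be checked directly with the lecture's values. The names `either` and `both` are illustrative only.

```python
x = 5
y = 7

either = (x > y) or (x < y)    # False or True  -> True
both = (x > y) and (x < y)     # False and True -> False

print(either)  # → True
print(both)    # → False
```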
For the same conditions, I now get a different output: False. Why does this happen? If you look at the conditional statements, the first is x > y, which we know is false, so it gives a Boolean value of False. The second conditional statement is x < y, which, as I said earlier, is True. A logical AND expects both conditions to be satisfied; unless both values are True, it never returns True. So whenever the operands are False and True, or True and False, the output is False. The next logical operator is not, written as the word not, again in lowercase. It negates the statement that follows it. I have taken the example of x == y. We know x is 5 and y is 7, so they are not equal, and this conditional statement gives False. Negating it, not False, gives True, which is why the output is True. Another point to note: make it a habit to enclose each conditional or relational statement within parentheses. With and, or and not this mainly improves readability, because relational operators are evaluated before logical ones, but with the bitwise operators covered next, leaving out the parentheses can lead to wrong results or errors. (Refer Slide Time: 14:55) Let us move on to bitwise operators. Bitwise operators are used when the operands are integers. These integers are treated as strings of binary digits, that is, they are binary encoded. When you apply a bitwise operator to two integers, the operator compares their binary codes bit by bit, which is how the operator got its name. Another advantage of bitwise operators is that they can also operate on conditional statements.
These conditional statements can compare scalar values, or they can compare arrays. If you would like to compare arrays, you would use a bitwise operator: we saw earlier that logical operators cannot handle arrays, and this is where bitwise operators step in. Throughout the course we are going to use two bitwise operators. The first is bitwise OR, represented by a pipe (|), and the second is bitwise AND, represented by an ampersand (&). (Refer Slide Time: 15:53) Create two variables x and y with values 5 and 7. The slide shows the binary codes for 5 (00000101) and 7 (00000111), which we will use in our example. Here 0 corresponds to False and 1 corresponds to True. In a bitwise OR, a bit is set to 1 in the result if it is 1 in either of the operands, whereas in a bitwise AND, a bit is set to 1 only if it is 1 in both operands. Let us take an example to see what these two statements mean. (Refer Slide Time: 16:23) I am going to illustrate a bitwise OR on integers, using the pipe symbol between x and y; x and y are my operands. The output is 7. Let us see how this output was achieved. I have created two arrays here; their cells contain the individual bits of the binary codes for 5 and 7, color coded for reference. The first position of both binary codes is 0. Both serve as my input operands, and since both have 0 in this position, the result also contains 0. (Refer Slide Time: 17:07) Now let us take the second position. It also has 0 in both binary codes, so the corresponding position in the resulting binary code is also 0. I have highlighted the positions using circles to show you which cells I am referring to.
You can also see that positions 3, 4 and 5 contain 0 in both binary codes, so the corresponding positions of the result also contain 0. Another important point to note is that the sixth position of both binary codes is 1: in both the binary code for 5 and the binary code for 7, the sixth position is 1, and hence I copy 1 into the sixth position of the result. (Refer Slide Time: 17:59) Moving further, if you compare the 7th position of the two binary codes, you can see that it is 0 for 5 and 1 for 7. Here the two operands differ, so let us see how to fill in the corresponding position of the result. Since we are using an OR operator, the result is True when at least one of the conditions is True. Recall that 0 corresponds to False and 1 corresponds to True, so we have one True condition, and the result contains the True value, which is 1. Now we are left with the last position, which is 1 for both binary codes, and I copy this value to the corresponding position of the result. This is the binary code that you get when you apply a bitwise OR between the two integers: 00000111, the binary code for 7, which is the output we stated at the start. This is how a bitwise OR operator works between two integers. (Refer Slide Time: 19:17) We can also use bitwise operators on conditional statements. If I were to use the bitwise OR on relational statements, this is how it would look. I am giving two conditional statements here.
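A minimal sketch of the bit-by-bit walkthrough and of the conditional example, assuming the same x = 5 and y = 7. `format(n, "08b")` prints the 8-bit binary code used on the slide. The bitwise AND result (5) is not worked out in the lecture; it simply follows from the same bit-by-bit rule. The names `bit_or`, `bit_and` and `combined` are illustrative.

```python
x = 5
y = 7

# 8-bit binary codes, as on the slide
print(format(x, "08b"))   # → 00000101
print(format(y, "08b"))   # → 00000111

bit_or = x | y            # 1 wherever either operand has a 1
bit_and = x & y           # 1 only where both operands have a 1
print(format(bit_or, "08b"), bit_or)    # → 00000111 7
print(format(bit_and, "08b"), bit_and)  # → 00000101 5

# Bitwise OR on two conditions; the parentheses matter because
# | binds more tightly than the comparison operators
combined = (x < y) | (x == y)   # True | False -> True
print(combined)  # → True
```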
The first checks whether x is less than y, and the second checks whether x is equal to y (x == y). Here x is less than y, so the first statement is True, whereas the second, x == y, is False. Since the first condition is True, the bitwise OR of the two is also True. (Refer Slide Time: 19:53) Now let us look at the precedence of operators, which I have listed in decreasing order of precedence. As I mentioned earlier, parentheses are not an operator, but any expression enclosed within parentheses gets the topmost priority, which is why parentheses occupy the first line. After parentheses comes exponentiation, followed by division and multiplication, which share the same precedence, and then addition and subtraction, which also share the same precedence. These are followed by bitwise AND and then bitwise OR; after that come the relational operators, which all share the same precedence. (Refer Slide Time: 20:31) Then come logical NOT, logical AND and logical OR. This is the decreasing order of precedence for all the operators put together. (Refer Slide Time: 20:45) So, to summarize, in this lecture we saw the important operators: what arithmetic, assignment, relational, logical and bitwise operators do. For each of them we took an example to illustrate how the operator works and what the nature of its output is. Thank you. Python for Data Science Prof. Ragunathan Rengasamy Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture – 7 Jupyter Setup (Video Start Time: 00:15) This is a small demo on how to install Jupyter and how to use the Jupyter web app. Let us start by installing the Jupyter package: you can just give pip install jupyter. I already had it in my cache.
That is why you see it being taken from the cache and installed. In this course we will be using two IDEs, Spyder and Jupyter, and some modules use Spyder whereas others use Jupyter. This is to get you accustomed to both IDEs; from there on you can choose whichever you are comfortable with. So this is going to be a small demo introducing you to Jupyter. Let us launch Jupyter Notebook now. Since this is a web app, it opens in your default browser. This is how the interface looks, and you will also see that it opens directly to the C drive, where you see Desktop, Documents, Downloads and so on. If you do not want it to open to the C drive, there is a way to set the directory from the terminal itself. Let us see how to do that. I have come to my command prompt, and I need to move out of the C drive by typing the corresponding drive letter; for me it is D:. From there I change to the directory where I want my code to be saved, with cd D:\Jupiter codes. Now you can launch Jupyter Notebook from here by typing jupyter notebook, and you will see that the directory has already been set from the command prompt. It is empty because we do not have any files there yet. Let us start by creating a Python file: click on New and choose Python 3. If you recall from the earlier lectures, we emphasized that a Jupyter notebook is basically a collection of input cells, and each cell can hold code, text, plots or any kind of description. By default, if you click on a cell, you will see that its type is Code.
So, I am just assigning a value of 5 to the variable a and in order for you to execute this cell you can just click on run. So, this means that this line of code has already been executed. What you will also see is that against the first cell you see a letter In now In stands for input. We also see a number within these square indices. Now this refers to the line of code that you have run. If this is the first line of code that you have run then it is one. Now if I again run this then it becomes 2 though you are running the same line again and again now this number will keep changing how many of our times you hit run. So, that is the only idea of this number. So, that is how you interpret this number that is within the square indices. So, now let us print the value of a and I am going to give the statement print. So, here you will see that the output is displayed just below the cell the value of a is 5 and the put is printed below the cell and this becomes a third line that you have run and hence the number within the square indices is 3. So, let us just add some text. Now in order for you to add some text you just have to come out of the cell and you will see that the cell is highlighted as green and once you come out it gets highlighted as blue. Now you can click on the drop-down and change the option from code to markdown and this allows you to add texture. So, again you have to run the cell for the text to get displayed aAnd now you will see that the text is displayed. Though we had typed it in a cell it still does not look like a code. It is a description more or less and that is the advantage of using Jupiter because it allows you to have narrative text and you can also add descriptions. Now instead of just typing it I am just going to make it in bold. So, whenever you start with a hash and include a space. So, that will basically make your text bold and you can use these wherever you want to add any heading or a title to your notebook. 
You can see that the font is large and bold. You can change the heading level by adding another hash: with two hashes instead of one, after running the cell again, the text is still bold but the size has reduced. This is how you add text or a description to your notebook. Now, say I want to add a cell of code above the sample code description. With the cell selected (not in edit mode), I just press A on the keyboard, which adds a cell above. Similarly, to add a cell below, select the cell and press B. So A adds a cell above and B adds a cell below; that is one way to remember it. To delete a cell, come out of it and press D twice. These are some operations in Jupyter Notebook that you should be familiar with. Now let us save this file as sample code. I have saved it, and you can also change the name from here; this is another way of doing it. If you look at the file list, you will see that all your files are stored with the extension .ipynb, which stands for IPython notebook and is specific to Jupyter. Throughout the course we will be using Jupyter along with the Spyder interface, just for ease of demonstration, but you can continue using Jupyter separately as a web application. Thank you. (Video End Time: 07:05) Python for Data Science Prof. Ragunathan Rengasamy Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture – 8 Sequence Data Part 1 Hello all, welcome to the lecture on the sequence data type. Before getting into the details, let us start with what sequence data types are. (Refer Slide Time: 00:26) Basically, a sequence data type allows you to create or store multiple values in an organised and efficient fashion.
There are several sequence types, for example strings, Unicode strings, lists, tuples, arrays and range objects. Two other data types, dictionaries and sets, are containers for non-sequential data. In this lecture we are going to look at examples of both the sequential and non-sequential data types. First, let us look at the string data type. A string is a sequence of one or more characters; it can contain letters, numbers and symbols, and it can be either a constant or a variable. Strings are an immutable sequence in Python, which means their characters cannot be changed after the string is created. To create a string, enclose a sequence of characters inside single, double or triple quotes. Let us see an example. I am creating a string that contains the sequence of characters making up the word learning and storing it in a variable called strsample, so strsample is now a string. We can print strings by simply calling the print function: when we print strsample, we get the output learning. Let us move on to the next sequence data type, the list. (Refer Slide Time: 01:58) A list in Python can be created by placing the sequence inside square brackets. Here I am creating a list called lstnumber that contains only numbers. If you look at this example, one element is repeated, that is, the list has duplicate values. A list may contain duplicate values at distinct positions, so duplicate values can be passed as part of the sequence when the list is created. So lstnumber is a list of numbers that also contains duplicate values. Let us print the list and see the output.
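A minimal sketch of the string and list examples, reusing the lecture's names strsample and lstnumber. The attempted item assignment is an added illustration of why strings are called immutable: Python raises a TypeError.

```python
# A string: characters enclosed in single, double or triple quotes
strsample = 'learning'
print(strsample)          # → learning

# Strings are immutable: assigning to a position raises TypeError
try:
    strsample[0] = 'L'
except TypeError as err:
    print("TypeError:", err)

# A list of numbers, including duplicate values at distinct positions
lstnumber = [1, 2, 3, 3, 3, 4, 5]
print(lstnumber)          # → [1, 2, 3, 3, 3, 4, 5]
```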
You can see the output here: it has the values 1, 2, 3, 3, 3, 4 and 5. A single list may also contain data types like integers, strings and other objects, so a list can contain elements of multiple data types. Lists are also mutable, and hence they can be altered even after their creation. Now I am creating another list called lstsample, which has both numbers and strings, for example 1, 2, a, sam and 2; lstsample therefore has multiple data types. Let us print it and see. One advantage of using lists is that you can have elements of multiple data types. Next we will look at another sequence data type, the array. An array is a collection of items stored at contiguous memory locations, and we can use arrays to store multiple items of the same data type together. Arrays in Python can be created by importing the array module. So let us import it with from array import *, which lets me call the functions from the array module without prefixing them with the module name. To use the array function, we specify the data type and the list of values as arguments. Let us see how to create an array using the array function. As the first argument I specify the data type, as the second argument I give the values as a list, and I store the output as an object called arraysample. Now let us print the values of the array; here I am using a for loop to do so. (Refer Slide Time: 04:41) If you just use the print function on your array name, it gives you back the same type code and list that you gave as input. If you want the individual values of your array, you can use a for loop: in the first iteration it prints the first value, then the loop moves on to the next iteration, and so on until it has printed all the values in arraysample.
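A minimal sketch of the mixed-type list and the array example. It imports only the `array` class (`from array import array`) instead of the lecture's `from array import *`; both make `array(...)` callable directly. The reassignment `lstsample[2] = 'b'` is an added illustration of list mutability, and the name arraysample follows the lecture's naming pattern.

```python
from array import array   # the lecture uses: from array import *

# A list can mix data types
lstsample = [1, 2, 'a', 'sam', 2]
print(lstsample)          # → [1, 2, 'a', 'sam', 2]

# Lists are mutable: an element can be changed after creation
lstsample[2] = 'b'

# An array holds items of one type; 'i' is the type code for signed int
arraysample = array('i', [1, 2, 3, 4])
print(arraysample)        # → array('i', [1, 2, 3, 4])

# Print the individual values with a for loop
for value in arraysample:
    print(value)
```

Printing the array object directly echoes the type code and values, while the loop prints 1, 2, 3 and 4 on separate lines.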
Now we have the values: 1, 2, 3 and 4. This is how we create an array. The array we created here is a one-dimensional array; we will see how to create and work with multidimensional arrays in an upcoming session. (Refer Slide Time: 05:32) Here I specified the data type as integer, using the type code i. Similarly, if you want to create an array of a different data type, you can use the corresponding type code; the type codes listed below can be used to create arrays of different data types. For each type code, I have also given the corresponding Python type and the minimum size in bytes. For example, if you want to create an array with the data type float, you can use the type code f. (Refer Slide Time: 06:13) Next we move on to the tuple, which is also a sequence data type. A tuple in Python is similar to a list, but the difference between the two is that we cannot change the elements of a tuple once it is assigned, whereas the elements of a list can be changed. In Python, a tuple is created by placing all the elements inside parentheses, separated by commas. A tuple can have any number of items, and they can be of different data types; here the tuple I am creating contains integers as well as strings. A tuple can also be created without parentheses, which is known as tuple packing. Here is an example: I have not used parentheses to create the tuple, but have just given the values separated by commas. If you print the value of your tuple, this is how it will look. So we