
Week 1 (Lecture Slides).pdf

Introduction to Python
Python for Data Science

Popular tools used in data science
• Data pre-processing and analysis
  ◦ Python, R, Microsoft Excel, SAS, SPSS
• Data exploration and visualization
  ◦ Tableau, QlikView, Microsoft Excel
• Parallel and distributed computing in case of big data
  ◦ Apache Spark, Apache Hadoop

Evolution of Python
• Python was developed by Guido van Rossum in the late 1980s at the National Research Institute for Mathematics and Computer Science (CWI) in the Netherlands
• Python editions
  ◦ Python 1.0
  ◦ Python 2.0
  ◦ Python 3.0

Python as a programming language
• Supports multiple programming paradigms
  ◦ Functional, structured, object-oriented (OOP), etc.
• Dynamic typing
  ◦ Type safety is checked at runtime
• Reference counting
  ◦ An object is deallocated as soon as no references to it remain (its reference count drops to zero)
• Late binding
  ◦ Methods are looked up by name at runtime
• Python's design is guided by 20 aphorisms described in "The Zen of Python" by Tim Peters
• The standard CPython interpreter is managed by the Python Software Foundation
• Other interpreters include Jython (formerly JPython, Java), IronPython (C#/.NET), Stackless Python (C, used for concurrency through microthreads) and PyPy (written in RPython, a restricted subset of Python, with JIT compilation)
• Much of the standard library is written in Python itself
• High standards of readability
• Cross-platform (Windows, Linux, Mac)
• Supported by a large community
• Better error handling

Python vs Java
• Java is statically typed, i.e. type safety is checked during compilation (static compilation)
  ◦ Thus, developing code in Java takes more time
• Python is dynamically typed, so it skips this compile-time type checking and avoids Java's long compilation time
• Dynamically typed code tends to be less verbose and therefore more readable

Advantages of using Python
• Python has several features that make it well suited for data science
• Open source and community development
  ◦ Released under an Open Source Initiative (OSI) approved license (the PSF License), making it free to use and distribute, even commercially
• Simple syntax that is easy to understand and code
• Libraries designed for specific data science tasks
• Integrates well with most cloud platform service providers

Coding environment
• A program can be written using a terminal, a command prompt (cmd), a text editor or an Integrated Development Environment (IDE)
• The program is saved in a file with the appropriate extension (.py for Python, .m for MATLAB, etc.) and executed in the corresponding environment (Python, MATLAB, etc.)
• An IDE is a software product built specifically to support software development in one or more programming languages
• Python 2.x reached end of life in 2020 and is no longer supported
• Python 3.x is an enhanced version of 2.x and is the only line maintained after 2020
• Install a basic Python version, or use the online Python console at https://www.python.org/
• Execute the following in a terminal or command prompt and view the outputs:
  ◦ Basic print statement
  ◦ Naming conventions for variables and functions, operators
  ◦ Conditional operations, looping statements (nested)
  ◦ Function declaration and calling
  ◦ Installing modules

Integrated development environment (IDE)
• A software application that combines the tools required for development into one cohesive unit
• Designed to simplify software development
• Utilities provided by IDEs include tools for managing, compiling, deploying and debugging software
• An IDE usually comprises
  ◦ Source code editor
  ◦ Compiler (or interpreter)
  ◦ Debugger
  ◦ Additional features such as syntax and error highlighting and code completion
• Supports building and executing the program, and debugging the code, from within the environment
• The best IDEs provide version control features
• Eclipse+PyDev, Sublime Text, Atom, GNU Emacs, Vi/Vim, Visual Studio and Visual Studio Code are general-purpose IDEs and editors with Python support
• Python-specific editors include PyCharm, Jupyter, Spyder and Thonny

Spyder
• Supported on Linux, Mac OS X and Windows
• Available as open source
• Can be installed separately or through the Anaconda distribution
• Developed for Python, specifically for data science
• Features include
  ◦ Code editor with robust syntax and error highlighting
  ◦ Code completion and navigation
  ◦ Debugger
  ◦ Integrated documentation
• Interface similar to MATLAB and RStudio

PyCharm
• Supported on Linux, Mac OS X and Windows
• Available as Community (free, open source) and Professional (paid) editions
• Designed specifically for Python
• Can be installed separately or through the Anaconda distribution
• Features include
  ◦ Code editor with syntax and error highlighting
  ◦ Code completion and navigation
  ◦ Unit testing
  ◦ Debugger
  ◦ Version control

Jupyter Notebook
• Web application for creating and manipulating documents called "notebooks"
• Supported on Linux, Mac OS X and Windows
• Available as open source
• Bundled with the Anaconda distribution, or can be installed separately
• Supports Julia, Python, R and Scala
• A notebook is an ordered collection of input and output cells containing code, text, plots, etc.
• Code and narrative text can be shared through output formats such as PDF and HTML
  ◦ Useful as an education and presentation tool
• Lacks most of the features of a good IDE
Source: https://jupyter.org/

How to choose the best IDE?
• Choose according to your requirements
• Working with different IDEs helps you understand your own requirements

Introduction to Spyder
In this lecture
• How does Spyder look?
• How to set the working directory?
• How to create a Python file and save it?
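The first lecture lists some basic commands to try: a print statement, naming, conditionals, nested loops, and declaring and calling functions. The script below is a minimal sketch covering those items, and it could be created and saved as a .py file in Spyder. The file name `basics.py` is hypothetical, and so are the functions `classify_number` and `multiplication_table`. The script also shows dynamic typing in action.

```python
# basics.py (hypothetical file name): a practice script covering the
# items listed in the first lecture.


def classify_number(value):
    """Return 'even' or 'odd' for an integer (function declaration)."""
    if value % 2 == 0:  # conditional statement using the % operator
        return "even"
    return "odd"


def multiplication_table(size):
    """Build a size x size multiplication table using nested loops."""
    table = []
    for row in range(1, size + 1):  # outer loop
        current_row = []
        for column in range(1, size + 1):  # inner (nested) loop
            current_row.append(row * column)
        table.append(current_row)
    return table


if __name__ == "__main__":
    print("Hello, Python!")  # basic print statement
    student_count = 3  # snake_case variable name
    print(student_count, "is", classify_number(student_count))  # function call
    for table_row in multiplication_table(3):
        print(table_row)

    # Dynamic typing: the same name can later refer to a value of another type;
    # types are checked at runtime, not at compile time.
    student_count = "three"
    print(type(student_count))  # <class 'str'>
```

To run the file, press F5 in Spyder or type `python basics.py` in a terminal. Installing modules happens in the terminal rather than inside the script, typically with a command such as `python -m pip install numpy`.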
Appearance of Spyder
• Python version shown: 3.6
• Panes: Files/Variables/Help, Scripts (editor) and Console

Setting working directory
• There are three ways to set a working directory
  ◦ Icon
  ◦ Using the os library
  ◦ Using the cd command
• Method 1: click the working-directory icon and choose a suitable location
• Method 2: use the os library in the console
• Method 3: type the following in the console:
  cd C:/Users/DELL/Desktop
• After setting the working directory, check its files in the file explorer pane

Creating a script file
• There are two ways of creating a script file
  ◦ Method 1: click the icon below the menu bar
  ◦ Method 2: click the "File" menu in the menu bar and select "New File"

Variable
• An identifier that holds a known piece of information
• The information is referred to as the value
• The variable name points to a memory address or storage location and is used to reference the stored value

Creating variables; saving a script file (first save and later saves): shown as screenshots in the slides

Summary
• Interface of Spyder
• Setting the working directory
• Creating and saving a Python script file

Introduction to Spyder
In this lecture
• How to execute a Python file?
• How to execute pieces of code (Run)?
• How to add comments?
• How to reset and clear the console?

Executing script files
• To run the full code:
  1. Press "Run file" in the icon bar, or
  2. Press F5
• To run a chosen line, select the line and:
  1. Press "Run selection" in the icon bar, or
  2. Press F9 (Ctrl+Enter runs the current cell)
• Run file/F5 executes the whole script and shows the result in the console
• Run selection/F9, step by step:
  ◦ Step 1: assign a new value of 14 to 'a' in the script and press F9
  ◦ Step 2: select line 2 and press F9
  ◦ Step 3: select line 3 and press F9
  ◦ After each step, the output appears in the console

Commenting lines of code
• Comments help in understanding the algorithms used while developing code
• In practice, comments are added before the code and begin with '#'
• Multiple lines can also be commented
  ◦ Select the lines to be commented and press Ctrl + 1, or
  ◦ Select "Edit" in the menu and choose "Comment/Uncomment"
• Uses: to add descriptions and to render lines of code inert during testing

Clearing console and environment
• To clear an overpopulated console, type %clear in the console, or place the cursor in the console and press Ctrl+L
• To remove a single variable or multiple variables from the environment, use del followed by the variable name(s)
• There are two ways to clear the entire environment at once
  ◦ Method 1: type %reset in the console and type 'y' after the prompt
  ◦ Method 2: click the symbol to remove variables in the environment

Basic libraries in Python
• NumPy – Numerical Python
• pandas – DataFrames for data analysis
• Matplotlib – visualization
• scikit-learn (sklearn) – machine learning
• Each library contains modules (sub-libraries)

Help in Python
• Type the name of the library in "Object" in the Help pane to list its sub-libraries
• To see the details of a sub-library, type libraryname.sublibraryname in "Object", e.g. numpy.lib

Summary
• Executing a Python script file
• Commenting lines of code
• Clearing the console and environment
• Basic libraries in Python

Variables and Data Types
In this lecture
• Naming variables
• Basic data types
  ◦ Identify the data type of an object
  ◦ Verify whether an object is of a certain data type
  ◦ Coerce an object to a new data type

Naming variables
• Values are assigned to variables using the assignment operator '='
• Variable names should be short and descriptive
  ◦ Avoid variable names that clash with built-in functions
• Names should indicate the intended use to the end user
• Avoid one-character variable names
  ◦ One-character names are usually used in looping constructs, functions, etc.
• Variables can be named alphanumerically
• However, a name cannot start with a digit: the first character must be a letter (lowercase or uppercase) or an underscore
• Other special characters
  ◦ The underscore (_) is allowed
  ◦ Using any other special character will throw an error
  ◦ Variable names should not begin or end with an underscore, even though both are allowed

Naming conventions
• Commonly accepted case types
  ◦ Camel case (lower and upper)
  ◦ Snake case
  ◦ Pascal case

Assigning values to multiple variables
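The following is a minimal plain-Python sketch of several steps from the Spyder and naming slides:

- method 2 for setting the working directory (the os library);
- assigning values to multiple variables;
- the three naming conventions;
- removing variables with del.

A temporary directory stands in for a real path such as C:/Users/DELL/Desktop. All variable names and values are hypothetical. The console commands cd, %clear and %reset are IPython magics, so they are left out of a script.

```python
import os
import tempfile

# Method 2 for setting the working directory: the os library.
# A temporary folder stands in for a path such as C:/Users/DELL/Desktop.
target_dir = tempfile.gettempdir()
os.chdir(target_dir)          # change the working directory
current_dir = os.getcwd()     # confirm where Python is now working
print(current_dir)

# Assigning values to multiple variables in one statement (hypothetical values)
emp_name, emp_age, emp_salary = "Asha", 29, 52000.0
x = y = z = 0                 # the same value bound to several names

# Common naming conventions (illustrative names)
total_salary = emp_salary * 12   # snake case
totalSalary = emp_salary * 12    # lower camel case
TotalSalary = emp_salary * 12    # upper camel / Pascal case

# Removing variables: del followed by one or more names
del totalSalary, TotalSalary
print("totalSalary" in globals())  # False after deletion
```

In Spyder's Variable explorer, the deleted names disappear from the environment, much as they do after running del in the console.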
Basic data types

Data type | Description | Values | Representation
Boolean | Represents the two values of logic; associated with conditional statements | True and False | bool
Integer | Positive and negative whole numbers | Set of all integers, Z | int
Complex | Contains a real and an imaginary part, a+ib (written a+bj in Python, e.g. 2+3j) | Set of complex numbers | complex
Float | Real numbers | Floating-point numbers | float
String | Sequence of characters | All strings or characters enclosed in single or double quotes | str

Identifying object data type
• Find the data type of an object using type()
• Syntax: type(object)

Verifying object data type
• Verify whether an object is of a certain data type
• Syntax: type(object) is datatype

Coercing object to new data type
• Convert the data type of an object to another
• Syntax: datatype(object)
• The converted value can be stored in the same variable or in a different variable
• Only some coercions are accepted
  ◦ Consider the variable 'Salary_tier', which is of string data type and contains an integer enclosed in single quotes; it can be coerced to int
  ◦ However, if the text enclosed in the quotes is not a number (for example, a word), the conversion is not possible and Python raises a ValueError

Summary
• Conventions for naming a variable
• Basic data types
  ◦ Get the data type of a variable
  ◦ Verify whether a variable is of a certain data type
  ◦ Coerce a variable to a new data type

Operators
In this lecture
• Operators and operands
• Different types of operators
  ◦ Arithmetic
  ◦ Assignment
  ◦ Relational or comparison
  ◦ Logical
  ◦ Bitwise
• Precedence of operators

Operators and operands
• Operators are special symbols that carry out an assignment operation or an arithmetic or logical computation
• The value that an operator operates on is called an operand

Arithmetic operators
• Used to perform mathematical operations between two operands
• Create two variables a and b with values 10 and 5 respectively

Symbol  Operation       Example (a = 10, b = 5)
+       Addition        a + b  → 15
-       Subtraction     a - b  → 5
*       Multiplication  a * b  → 50
/       Division        a / b  → 2.0
%       Remainder       a % b  → 0
**      Exponent        a ** b → 100000

Hierarchy of arithmetic operators
Example: $A = 7 - 2 \times \frac{27}{3^2} + 4$
Decreasing order of precedence:
1. Parentheses ()
2. Exponent **
3. Division / and multiplication * (same precedence, evaluated left to right)
4. Addition + and subtraction - (same precedence, evaluated left to right)

Assignment operators
• Used to assign values to variables

Symbol  Operation
=       Assigns the value of the right operand to the left operand
+=      Adds the right operand to the left operand and stores the result in the left operand (a = a + b)
-=      Subtracts the right operand from the left operand and stores the result in the left operand (a = a - b)
*=      Multiplies the left operand by the right operand and stores the result in the left operand (a = a * b)
/=      Divides the left operand by the right operand and stores the result in the left operand (a = a / b)

Relational or comparison operators
• Test numerical equalities and inequalities between two operands and return a Boolean value
• All relational operators have the same precedence
• Create two variables x and y with values 5 and 7 respectively

Symbol  Operation                 Example (x = 5, y = 7)
<       Strictly less than        x < y  → True
<=      Less than or equal to     x <= y → True
>       Strictly greater than     x > y  → False
>=      Greater than or equal to  x >= y → False
==      Equal to                  x == y → False
!=      Not equal to              x != y → True

Logical operators
• Used when operands are conditional statements; they return a Boolean value
• In Python, logical operators are designed to work with scalars or Boolean values

Symbol  Operation
or      Logical OR
and     Logical AND
not     Logical NOT

Bitwise operators
• Used when operands are integers
• Integers are treated as strings of binary digits
• Operate bit by bit
• Can also operate on conditional statements that compare scalar values or arrays
• Bitwise OR (|), bitwise AND (&)
• Create two variables x and y with values 5 and 7 respectively
  ◦ The binary code for 5 is 0000 0101 and for 7 is 0000 0111
  ◦ 0 corresponds to False and 1 corresponds to True
• Bitwise OR (|) sets a result bit to 1 if that bit is 1 in either operand
• Bitwise AND (&) sets a result bit to 1 only if that bit is 1 in both operands

Bitwise OR on integers (5 | 7)
Position   1 2 3 4 5 6 7 8
5          0 0 0 0 0 1 0 1
7          0 0 0 0 0 1 1 1
5 | 7      0 0 0 0 0 1 1 1   (= 7)
• Positions 1 to 5: 0 in both operands, so the result bit is also 0
• Position 6: 1 in both operands, so the result bit is 1
• Position 7: 0 in the first operand and 1 in the second; since OR needs only one True bit, the result bit is 1
• Position 8: 1 in both operands, so the result bit is 1

Bitwise operators on conditional statements
• Bitwise operators can also operate on conditional statements

Symbol  Operation
|       Bitwise OR
&       Bitwise AND
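The sketch below works through the data-type and operator slides above, using the same operand values as the slides (a = 10, b = 5 and x = 5, y = 7). The slide's actual value for Salary_tier was shown only as a screenshot, so the value '1' used here is hypothetical. The other variable names are also illustrative.

```python
# Basic data types and the type() function
flag = True
count = 42
ratio = 3.5
point = 2 + 3j                  # Python writes the imaginary unit as j
label = "tier"

print(type(count))              # <class 'int'>
print(type(point) is complex)   # True: verifying a data type

# Coercion: datatype(object)
Salary_tier = '1'               # hypothetical integer enclosed in quotes (str)
Salary_tier = int(Salary_tier)  # accepted: '1' -> 1, stored in the same variable
try:
    int("high")                 # text that is not a number cannot be coerced
except ValueError as error:
    print("Conversion failed:", error)

# Arithmetic and assignment operators with a = 10, b = 5
a, b = 10, 5
arithmetic = [a + b, a - b, a * b, a / b, a % b, a ** b]
a += b      # a = a + b -> 15
a -= b      # a = a - b -> 10
a *= b      # a = a * b -> 50
a /= b      # a = a / b -> 10.0

# Relational, logical and bitwise operators with x = 5, y = 7
x, y = 5, 7
comparisons = [x < y, x <= y, x > y, x >= y, x == y, x != y]
logical = (x < y) and not (x == y)
bit_or, bit_and = x | y, x & y   # 0101 | 0111 -> 0111 (7); 0101 & 0111 -> 0101 (5)
mask = (x > 3) & (y > 10)        # bitwise AND applied to two Boolean results
print(bin(bit_or), bin(bit_and)) # 0b111 0b101
```

Note that `/` always returns a float in Python 3, which is why `a /= b` leaves `a` as 10.0 rather than 10.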
Precedence of operators
Decreasing order of precedence:
1. Parentheses ()
2. Exponent **
3. Division / and multiplication * (same precedence, evaluated left to right)
4. Addition + and subtraction -
5. Bitwise AND &
6. Bitwise OR |
7. Relational/comparison ==, !=, >, >=, …
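The hierarchy slide's worked example appears to read A = 7 − 2 × 27/3² + 4. The sketch below evaluates it in Python and adds a few cases showing how the precedence order plays out.

According to the Python Language Reference:

- `*` and `/` share one precedence level and are evaluated left to right.
- `**` groups right to left.
- Bitwise operators bind tighter than comparisons.

The variables B, C and D are illustrative additions.

```python
# Evaluate A = 7 - 2 x 27 / 3^2 + 4 following Python's precedence rules
A = 7 - 2 * 27 / 3 ** 2 + 4
# Step by step: 3 ** 2 -> 9; then * and / left to right: 2 * 27 -> 54, 54 / 9 -> 6.0;
# then + and - left to right: 7 - 6.0 -> 1.0, 1.0 + 4 -> 5.0
print(A)                       # 5.0

# Parentheses override the default order
B = (7 - 2) * 27 / 3 ** 2 + 4  # 5 * 27 -> 135, 135 / 9 -> 15.0, 15.0 + 4 -> 19.0

# Multiplication and division share one level and run left to right
print(8 / 4 * 2)               # 4.0, not 1.0
# Exponentiation groups right to left
print(2 ** 3 ** 2)             # 512, i.e. 2 ** 9

# Bitwise & binds tighter than |, and both bind tighter than comparisons
x, y = 5, 7
C = x | y & 2                  # y & 2 -> 2, then 5 | 2 -> 7
D = x & y == 5                 # evaluated as (5 & 7) == 5 -> True
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing at runtime.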
