Lec-1-Introduction to Python.pdf
Introduction to Python
Python for Data Science

Popular tools used in data science
• Data pre-processing and analysis
  ◦ Python, R, Microsoft Excel, SAS, SPSS
• Data exploration and visualization
  ◦ Tableau, QlikView, Microsoft Excel
• Parallel and distributed computing in the case of big data
  ◦ Apache Spark, Apache Hadoop

Evolution of Python
• Python was developed by Guido van Rossum in the late 1980s at the National Research Institute for Mathematics and Computer Science (CWI) in the Netherlands
• Python versions
  ◦ Python 1.0 (1994)
  ◦ Python 2.0 (2000)
  ◦ Python 3.0 (2008)

Python as a programming language
• Supports multiple programming paradigms
  ◦ Functional, structured, object-oriented, etc.
• Dynamic typing
  ◦ Type safety is checked at run time
• Reference counting
  ◦ In CPython, an object is deallocated as soon as no references to it remain
• Late binding
  ◦ Methods are looked up by name at run time
• Python's design is guided by the aphorisms of the Zen of Python (PEP 20) by Tim Peters; it announces 20 aphorisms, of which 19 are written down
• The standard CPython interpreter is managed by the Python Software Foundation
• Other interpreters include Jython, formerly JPython (Java), IronPython (C#), Stackless Python (C, used for concurrency) and PyPy (written in a subset of Python, with JIT compilation)
• Much of the standard library is written in Python itself
• High standards of readability
• Cross-platform (Windows, Linux, Mac)
• Supported by a large community
• Better error handling

Python vs Java
  ◦ Java is statically typed, i.e. type safety is checked during compilation
  ◦ Thus developing code in Java tends to take more time
  ◦ Python is dynamically typed and has no separate compilation step, which offsets the compilation time that Java requires
  ◦ Dynamically typed code tends to be less verbose and therefore more readable

Advantages of using Python
• Python has several features that make it well suited for data science
• Open source and community development
  ◦ Developed under an Open Source Initiative approved license (the Python Software Foundation License), making it free to use and distribute, even commercially
• The syntax is simple to understand and to write
• Libraries designed for specific data science tasks
• Combines well with the majority of cloud platform service providers

Coding environment
• A software program can be written using a terminal, a command prompt (cmd), a text editor or an Integrated Development Environment (IDE)
• The program needs to be saved in a file with an appropriate extension (.py for Python, .m for MATLAB, etc.) and can be executed in the corresponding environment (Python, MATLAB, etc.)
• An Integrated Development Environment (IDE) is a software product developed solely to support software development in one or several programming languages
• Python 2.x was supported until 2020 (end of life: 1 January 2020)
• Python 3.x is an enhanced version of 2.x; after 2020, only 3.x releases from 3.6.x onward are maintained
• Install the basic Python distribution or use the online Python console at https://www.python.org/
• Execute the following and view the outputs in a terminal or command prompt
  ◦ Basic print statement
  ◦ Naming conventions for variables and functions, operators
  ◦ Conditional operations, looping statements (nested)
  ◦ Function declaration and calling
  ◦ Installing modules

Integrated development environment (IDE)
• Software application consisting of a cohesive set of tools required for development
• Designed to simplify software development
• Utilities provided by IDEs include tools for managing, compiling, deploying and debugging software

Coding environment: IDE
• An IDE usually comprises
  ◦ Source code editor
  ◦ Compiler or interpreter
  ◦ Debugger
  ◦ Additional features such as syntax and error highlighting and code completion
• Offers support for building, executing and debugging the program from within the environment
• The best IDEs also provide version control features
• Eclipse+PyDev, Sublime Text, Atom, GNU Emacs, Vi/Vim, Visual Studio and Visual Studio Code are general-purpose editors and IDEs with Python support
• Python-specific editors include PyCharm, Jupyter, Spyder and Thonny

Spyder
• Supported across Linux, Mac OS X and Windows platforms
• Available as open source
• Can be installed separately or through the Anaconda distribution
• Developed for Python, and specifically for data science
• Features include
  ◦ Code editor with robust syntax and error highlighting
  ◦ Code completion and navigation
  ◦ Debugger
  ◦ Integrated documentation
• Interface similar to MATLAB and RStudio

PyCharm
• Supported across Linux, Mac OS X and Windows platforms
• Available as a community (free, open source) and a professional (paid) version
• Built specifically for Python
• Can be installed separately or through the Anaconda distribution
• Features include
  ◦ Code editor with syntax and error highlighting
  ◦ Code completion and navigation
  ◦ Unit testing
  ◦ Debugger
  ◦ Version control

Jupyter Notebook
• Web application that allows creation and manipulation of documents called 'notebooks'
• Supported across Linux, Mac OS X and Windows platforms
• Available as open source
• Bundled with the Anaconda distribution or can be installed separately
• Supports Julia, Python, R and Scala, among other languages
• A notebook consists of an ordered collection of input and output cells that contain code, text, plots, etc.
• Allows sharing of code and narrative text through output formats such as PDF and HTML
  ◦ Education and presentation tool
• Lacks most of the features of a good IDE
Source: https://jupyter.org/

How to choose the best IDE?
• It depends on your requirements
• Working with different IDEs helps us understand our own requirements

THANK YOU
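
Supplementary example for the "Python as a programming language" slide. The sketch below illustrates dynamic typing, reference counting and late binding in a few lines. It is only an illustration: the names Resource, Greeter and greet_loudly are hypothetical and do not come from the lecture, and the immediate call to __del__ when the last reference disappears is behaviour of the standard CPython interpreter that other interpreters need not share.

```python
# Dynamic typing: a name can be rebound to objects of different types,
# and type errors surface only when the offending operation runs.
value = 42
first_type = type(value).__name__    # "int"
value = "forty-two"
second_type = type(value).__name__   # "str"

try:
    "1" + 1                          # str + int is rejected at run time
    type_error_raised = False
except TypeError:
    type_error_raised = True

# Reference counting (CPython-specific assumption): the object is
# deallocated as soon as its last reference is removed.
events = []


class Resource:  # hypothetical class, for illustration only
    def __del__(self):
        events.append("deallocated")


resource = Resource()
alias = resource                     # second reference to the same object
del resource                         # one reference (alias) still remains
still_alive = events == []
del alias                            # last reference gone, __del__ runs


# Late binding: the method is looked up by name each time it is called.
class Greeter:  # hypothetical class, for illustration only
    def greet(self):
        return "hello"


greeter = Greeter()
before = greeter.greet()


def greet_loudly(self):
    return "HELLO"


Greeter.greet = greet_loudly         # rebind the name on the class
after = greeter.greet()              # the existing instance sees the new method

print(first_type, second_type, type_error_raised)
print(still_alive, events)
print(before, after)
```

Because the lookup happens at call time, the instance created before the rebinding still picks up the new method.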
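
Supplementary example for the "Coding environment" slide, which asks you to try a basic print statement, naming conventions, operators, conditional operations, nested loops and function declaration and calling. The following is one possible starter script, not the lecture's own exercise: all variable names, values and the describe function are made up for illustration. Save it as a .py file and run it from a terminal or command prompt.

```python
# Basic print statement
print("Hello, Python for Data Science")

# Naming convention: lowercase words joined by underscores (PEP 8)
sample_count = 7
sample_total = 31

# Operators
quotient = sample_total // sample_count    # floor division
remainder = sample_total % sample_count    # modulo
mean_value = sample_total / sample_count   # true division, gives a float

# Conditional operations
if remainder == 0:
    divisibility = "divisible"
elif remainder < sample_count / 2:
    divisibility = "small remainder"
else:
    divisibility = "large remainder"
print(quotient, remainder, divisibility)

# Nested looping statements: build a 3 x 3 multiplication table
table = []
for row in range(1, 4):
    line = []
    for col in range(1, 4):
        line.append(row * col)
    table.append(line)
print(table)


# Function declaration and calling
def describe(values):
    """Return the minimum, maximum and mean of a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)


low, high, mean = describe([2, 4, 9])
print(low, high, mean)

# Installing modules is done from the terminal, not inside the script,
# for example (numpy is only an example package):
#   python -m pip install numpy
```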