Module 2 Lesson 1.pdf
Module 2: Intro to Python Libraries

Topics
▪ Lesson 1: Python Packages
▪ Lesson 2: NumPy
▪ Lesson 3: Pandas

Lesson 1: Python Packages

▪ Python has a standard library with built-in functions and offers a vast variety of other packages.
▪ A 'package' is a directory of scripts; in Python, a package is a directory of Python scripts.
▪ Each script is a module.
▪ A module can define functions, methods, objects, and data types.

Python libraries
▪ Data mining
   o Scrapy
   o Beautiful Soup
▪ Data processing and modelling
   o NumPy
   o SciPy
   o Pandas
   o scikit-learn
   o TensorFlow
▪ Data visualization
   o Matplotlib
   o Seaborn
   o Plotly
   o pydot

▪ Most data science work can be done with five Python libraries:
   o NumPy
   o Pandas
   o SciPy
   o scikit-learn
   o Seaborn
▪ The Python ecosystem offers many other tools that can be helpful for data science work.
▪ Data scientists and software engineers involved in data science projects that use Python will use many of these tools, as they are essential for building high-performing ML models in Python.

▪ NumPy (Numerical Python)
   o Fundamental package for numerical computation in Python.
   o Contains a powerful N-dimensional array object.
   o Has an active community of more than 1,000 contributors.
   o It is a general-purpose array-processing package that provides high-performance multidimensional objects called arrays, and tools for working with them.
   o Partly addresses the slowness of pure Python numerical code by providing these multidimensional arrays, along with functions and operators that work efficiently on them.
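The idea of an N-dimensional array with efficient whole-array operations can be sketched as follows. This is a minimal illustration that assumes NumPy is installed; the array contents and variable names are invented for the example and are not part of the lesson.

```python
import numpy as np

# Illustrative data (not from the lesson): a 2-D array with 2 rows and 3 columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # number of dimensions
print(matrix.shape)  # (rows, columns)

# Vectorized arithmetic: the multiplication is applied to every element
# without writing an explicit Python loop.
doubled = matrix * 2

# The same result computed with plain Python lists and nested loops.
doubled_loop = [[value * 2 for value in row] for row in matrix.tolist()]

# Aggregations also work on whole arrays; axis=0 sums down each column.
column_sums = matrix.sum(axis=0)
print(doubled)
print(column_sums)
```

The vectorized form is shorter, and on large arrays it is typically much faster than the loop version, because the work runs in precompiled routines.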
▪ NumPy (Numerical Python)
   o Features:
      ⮚ Provides fast, precompiled functions for numerical routines
      ⮚ Array-oriented computing for better efficiency
      ⮚ Supports an object-oriented approach
      ⮚ Compact and faster computations with vectorization
   o Applications:
      ⮚ Extensively used in data analysis
      ⮚ Creates a powerful N-dimensional array
      ⮚ Forms the base of other libraries, such as SciPy and scikit-learn
      ⮚ Can replace MATLAB when used with SciPy and Matplotlib

▪ SciPy (Scientific Python)
   o Another free and open-source Python library, extensively used in data science for high-level computations.
   o SciPy has an active community of about 900 contributors.
   o It is widely used for scientific and technical computations because it extends NumPy and provides many user-friendly and efficient routines for scientific calculations.
   o Features:
      ⮚ Collection of algorithms and functions built on the NumPy extension of Python
      ⮚ High-level commands for data manipulation and visualization
      ⮚ Multidimensional image processing with the scipy.ndimage submodule
      ⮚ Includes built-in functions for solving differential equations
   o Applications:
      ⮚ Multidimensional image operations
      ⮚ Solving differential equations and computing Fourier transforms
      ⮚ Optimization algorithms
      ⮚ Linear algebra

▪ Pandas (Python Data Analysis)
   o A must in the data science life cycle.
   o Along with NumPy and Matplotlib, it is the most popular and widely used Python library for data science.
   o With an active community of 1,700 contributors, it is heavily used for data analysis and cleaning.
   o Pandas provides fast, flexible data structures, such as DataFrames, which are designed to work with structured data quickly and intuitively.
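A DataFrame holding structured data can be sketched as follows. This is a minimal illustration that assumes pandas is installed; the column names and values are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical records; the column names and values are invented for illustration.
frame = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Lima"],
    "sales": [10.0, 20.0, 30.0, 40.0],
})

# Select rows with a boolean condition on a column.
large = frame[frame["sales"] > 15.0]

# Group rows by a key column and aggregate each group.
totals = frame.groupby("city")["sales"].sum()

print(large)
print(totals)
```

Filtering and grouping are each a single expression here, which is the kind of quick, intuitive handling of tabular data the slide describes.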
▪ Pandas (Python Data Analysis)
   o Features:
      ⮚ Eloquent syntax and rich functionality that give you the freedom to deal with missing data
      ⮚ Enables you to create your own function and run it across a series of data
      ⮚ High-level abstraction
      ⮚ Contains high-level data structures and manipulation tools
   o Applications:
      ⮚ General data wrangling and cleaning
      ⮚ ETL (extract, transform, load) jobs for data transformation and data storage, as it has excellent support for loading CSV files into its DataFrame format
      ⮚ Used in a variety of academic and commercial areas, including statistics, finance, and neuroscience
      ⮚ Time-series-specific functionality, such as date range generation, moving window, linear regression, and date shifting

▪ Scikit-learn
   o Machine learning library that provides almost all the machine learning algorithms you might need.
   o Scikit-learn is designed to interoperate with NumPy and SciPy.

▪ TensorFlow
   o High-performance numerical computations with a vibrant community of about 2,000 contributors. It is used across various scientific fields.
   o A framework for defining and running computations that involve tensors.
   o Features:
      ⮚ Better computational graph visualizations
      ⮚ Reduces error by 50% to 60% in neural machine learning
      ⮚ Parallel computing to execute complex models
      ⮚ Seamless library management backed by Google
      ⮚ Quicker updates and frequent new releases to provide you with the latest features

▪ Libraries for Data Visualization
   o Matplotlib
   o Seaborn
   o Plotly
   o pydot

▪ Matplotlib: powerful and beautiful visualizations
   o Plotting library for Python.
   o Very vibrant community of about 1,000 contributors.
   o Extensively used for data visualization.
   o Provides an object-oriented API, which can be used to embed plots into applications.
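The object-oriented API mentioned above can be sketched as follows. This is a minimal illustration that assumes Matplotlib is installed; the data points are made up, and the non-interactive "Agg" backend is chosen here only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt

# Made-up data points for the illustration.
x_values = [0, 1, 2, 3, 4]
y_values = [0, 1, 4, 9, 16]

# Object-oriented style: work with explicit Figure and Axes objects
# instead of relying on implicit global plotting state.
fig, ax = plt.subplots()
ax.plot(x_values, y_values, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object-oriented Matplotlib example")
ax.legend()

# Render the figure to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Because the figure is an ordinary object, an application can keep a reference to it, update it, and write it wherever it needs to, which is what makes embedding plots practical.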
(Image source: Medium.com)

▪ Matplotlib
   o Features:
      ⮚ Usable as a MATLAB replacement, with the advantage of being free and open-source
      ⮚ Supports dozens of backends and output types, which means you can use it regardless of which operating system you are using or which output format you want
      ⮚ Low memory consumption and better runtime behavior
      (Image source: DataFlair)
   o Applications:
      ⮚ Correlation analysis of variables
      ⮚ Visualizing 95 percent confidence intervals of models
      ⮚ Data scientists also use other libraries, such as Plotly and pydot, along with Matplotlib
      (Image source: Wikipedia)

▪ Seaborn
   o Python data visualization library based on Matplotlib.
   o Provides a high-level interface for drawing attractive and informative statistical graphics.
   (Figure: Seaborn official website)

▪ Plotly
   o For interactive graphics
   o High quality
   (Figure: Plotly official website)

▪ pydot
   o Creates and manipulates graphs in Graphviz's DOT language.
      ⮚ Graphviz is open-source graph visualization software.
   o To use pydot, we need to have Graphviz installed on the system.
   o We can install this using pip install
   o Plots tree-type graphs (Image source: https://tmilan0604.medium.com/)
   o Plots cyclic and acyclic graphs (Image source: https://stackoverflow.com/)