Data Analytics with Python
Course Code: 24UBDS301
Compiled By: Beena Kapadia

Syllabus
Unit: II
Details:
Data Manipulation with Pandas: Introduction to Pandas, Series and DataFrame, data indexing, selection, and filtering.
Advanced Data Manipulation: Handling missing data, data transformation, aggregation, and grouping.
Data Analysis with NumPy: Introduction to NumPy, array operations, broadcasting, indexing, and slicing.
Cognitive Levels: K1, K2, K3, K4

Data Manipulation with Pandas: Introduction to Pandas
Pandas is a powerful and flexible open-source library for data manipulation and analysis in Python. It provides the data structures and functions needed to work with structured data without much effort. The core data structures in Pandas are the Series and the DataFrame.

Key Features of Pandas:
1. Data Alignment: Automatically aligns data by label in arithmetic operations.
2. Integrated Handling of Missing Data: Pandas handles missing data gracefully.
3. Label-Based Slicing, Indexing, and Subsetting: Allows intuitive data manipulation.
4. Merge and Join: Powerful tools for combining data from different sources.
5. Group By: Performs split-apply-combine operations on data sets.
6. Reshaping and Pivoting: Reshapes and pivots DataFrames for different views.
7. Data Cleaning: Provides utilities to clean and preprocess data.
8. Time Series Functionality: Provides comprehensive methods for working with time series data.

Series
A Series is a one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its index. A Pandas Series is like a column in a table.
For example, creating a Series of any 3 numbers:

    import pandas as pd

    a = [1, 7, 2]
    myvar = pd.Series(a)
    print(myvar)

Output:
    0    1
    1    7
    2    2
    dtype: int64

If nothing else is specified, the values are labeled with their index numbers: the first value has index 0, the second value has index 1, and so on. This label can be used to access a specified value.
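A small sketch of accessing values through the default labels, reusing the same three numbers. It also uses .iloc[] (described later in these notes), which always selects by position regardless of what the labels are.

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a)

# Access a value through its default label (label 0 holds the first value)
first = myvar[0]
print(first)              # 1

# .iloc[] selects by position, independent of the labels
last = myvar.iloc[-1]
print(last)               # 2

# The default labels run 0, 1, 2, ...
print(list(myvar.index))  # [0, 1, 2]
```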
When you have created labels, you can access an item by referring to the label.

    import pandas as pd

    a = [1, 7, 2]
    label = ["x", "y", "z"]
    myvar = pd.Series(a, index=label)
    print(myvar)
    print(myvar["y"])   # 7

Create a simple Pandas Series from a dictionary (the keys of the dictionary become the labels):

    import pandas as pd

    calories = {"day1": 420, "day2": 380, "day3": 390}
    myvar = pd.Series(calories)
    print(myvar)

To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.

DataFrame
A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns.

Example: Create a simple Pandas DataFrame:

    import pandas as pd

    data = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    }
    # load data into a DataFrame object
    df = pd.DataFrame(data)
    print(df)

Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns. Pandas uses the loc attribute to return one or more specified row(s).

    # refer to the row index: returns row 0 as a Series
    print(df.loc[0])

    # return rows 0 and 1 using a list of indexes: returns a DataFrame
    print(df.loc[[0, 1]])

Named Indexes
With the index argument, you can name the rows yourself:

    import pandas as pd

    data = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    }
    df = pd.DataFrame(data, index=["day1", "day2", "day3"])
    print(df)

Locate Named Indexes
Use the named index in the loc attribute to return the specified row(s).

Load Files Into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.
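The data.csv file used in these notes is not reproduced here. As a self-contained sketch, pd.read_csv() also accepts a file-like object, so CSV text held in an io.StringIO buffer can stand in for a file on disk. The column names and values below are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as data.csv
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
45,109,
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.to_string())

# An empty field is read as NaN
print(df["Calories"].isna().sum())   # 1
print(df.shape)                      # (3, 3)
```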
    import pandas as pd

    df = pd.read_csv('data.csv')
    print(df.to_string())   # to_string() renders the whole DataFrame, so no rows are hidden

Another example:

    import pandas as pd

    # Creating a DataFrame from a dictionary
    data = {
        'Name': ['A', 'B', 'C'],
        'Age': [25, 30, 35],
        'Occupation': ['Engineer', 'Doctor', 'Artist']
    }
    df = pd.DataFrame(data)
    print(df)

Basic Operations on DataFrames

    # Display the first few rows of the DataFrame (5 by default)
    print(df.head())
    # Display the last few rows of the DataFrame (5 by default)
    print(df.tail())
    # Display the shape of the DataFrame as (rows, columns)
    print(df.shape)
    # Display information about the DataFrame (info() prints by itself and returns None)
    df.info()
    # Display summary statistics for numerical columns
    print(df.describe())

Accessing Data

    # Accessing a single column
    print(df['Name'])
    # Accessing multiple columns
    print(df[['Name', 'Age']])
    # Accessing rows by position
    print(df.iloc[0])      # First row
    print(df.iloc[-1])     # Last row
    print(df.iloc[0, 1])   # row 0, column 1 -> 25

iloc is a DataFrame attribute used for integer-location-based indexing, that is, selecting rows and columns by their position.

Handling NaN

    import pandas as pd

    df = pd.read_csv("data.csv")
    print(df.to_string())

    # Fill the missing values of the Calories column with that column's mean.
    # inplace=True changes df itself instead of creating a new DataFrame.
    df.fillna({'Calories': df['Calories'].mean()}, inplace=True)
    print(df.to_string())

Removing NaN rows

    import pandas as pd

    df = pd.read_csv("data.csv")
    print(df.to_string())
    print(df.shape)

    # remove the rows that have NaN in the Calories column
    df.dropna(subset=["Calories"], inplace=True)
    print(df.to_string())
    print(df.shape)

Removing duplicate rows

    # remove duplicates (without inplace=True or reassignment, df would be unchanged)
    df.drop_duplicates(inplace=True)
    print(df.to_string())
    print(df.shape)

Reading a JSON file
data.json (excerpt):

    {
      "Duration":{
        "0":60, "1":60, "2":60, "3":45, "4":45, "5":60,
        "6":60, "7":45, "8":30, "9":60, "10":60, "11":60,
        "12":60, "13":60, "14":60, "15":60, "16":60, "17":45,
        ...

    import pandas as pd

    df = pd.read_json('data.json')
    print(df.to_string())

data indexing and selection
The index property returns the index information of the DataFrame. The index information contains the labels of the rows.
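To see what the index property returns when the rows do have labels, here is a small sketch using named row labels (the day labels follow the earlier calories example). The result holds exactly those labels, and the same labels are what .loc[] uses to find rows.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# With named rows, df.index holds the row labels
print(df.index)
labels = df.index.tolist()
print(labels)                        # ['day1', 'day2', 'day3']

# The same labels are used by .loc[] to locate rows
print(df.loc["day2", "calories"])    # 380
```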
If the rows do NOT have named indexes, the index property returns a RangeIndex object with the start, stop, and step values.

    import pandas as pd

    df = pd.read_csv('data.csv')
    print(df.index)

Output:
    RangeIndex(start=0, stop=169, step=1)

1. Indexing with .loc[] and .iloc[]

.loc[]: label location

    import pandas as pd

    data = {'A': [1, 2, 3, 4, 5],
            'B': [5, 6, 7, 8, 9],
            'C': [10, 11, 12, 13, 14]}
    df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])
    # print the DataFrame
    print(df)

    # Selecting data using .loc[]
    # Select row 'a' (a single label returns a Series)
    print(df.loc['a'])
    # Select rows 'a' and 'c' (more than one row: a list inside [], so [[ ]])
    print(df.loc[['a', 'c']])

Output of df.loc['a']:
    A     1
    B     5
    C    10
    Name: a, dtype: int64

Output of df.loc[['a', 'c']]:
       A  B   C
    a  1  5  10
    c  3  7  12

.iloc[]: integer (position) location, using the same df

    # Selecting data using .iloc[]
    # Print first row
    print(df.iloc[0])
    # Print first and third rows (a list of positions: double [])
    print(df.iloc[[0, 2]])
    # Print first three rows and first two columns (single [] for slicing)
    print(df.iloc[:3, :2])
    # Print last row
    print(df.iloc[-1])
    # Print first three rows (single [] for a slice)
    print("\nFirst three rows:")
    print(df.iloc[0:3])
    # Print 1st row, 2nd column (single [])
    print(df.iloc[0, 1])   # 5

Output of df.iloc[:3, :2]:
       A  B
    a  1  5
    b  2  6
    c  3  7

Output of df.iloc[-1]:
    A     5
    B     9
    C    14
    Name: e, dtype: int64

2. Indexing with Boolean Arrays / Conditional Selection
Example, using the same df:

    # Boolean indexing (single [] around the condition)
    # Select rows where column 'A' is greater than 2
    print(df[df['A'] > 2])
    # Select rows where column 'A' is greater than 2 and column 'B' is less than 9
    # (each condition needs its own parentheses; & means "and", | means "or")
    print(df[(df['A'] > 2) & (df['B'] < 9)])

3. Indexing with .at[] and .iat[]
.at[]: Access a single value for a row/column label pair.

    # Value at row 'a' and column 'A'
    print(df.at['a', 'A'])   # 1

.iat[]: Access a single value for a row/column position pair.

    # Value at first row and first column
    print(df.iat[0, 0])      # 1

4. Using .loc[] and .iloc[] with Slices
':' is used for slicing ranges of rows or columns, while ',' separates the row selection from the column selection.

    # Using slices with .loc[]
    # Select all rows and columns 'A' to 'B' ('B' is included)
    print(df.loc[:, 'A':'B'])
    # Select all rows and only columns 'A' and 'C' (a list, not a slice)
    print(df.loc[:, ['A', 'C']])
    # Using slices with .iloc[]
    # Select first three rows and all columns (position 3 is excluded)
    print(df.iloc[:3, :])

5. Selecting Columns and Rows Directly
Selecting Columns
By label: df['column_name']
By attribute: df.column_name

    # Selecting a single column, 'A'
    print(df['A'])
    # Selecting multiple columns 'A' and 'C'
    print(df[['A', 'C']])
    # By attribute
    print(df.A)
    print(df.A, df.C)

Selecting Rows
By label or by position, using .loc[] or .iloc[].

    # Selecting a single row (single [])
    print(df.loc['b'])
    # Selecting multiple rows (a list: double [])
    print(df.loc[['b', 'd']])

6. Setting Values

    # Setting a value using .loc[]
    df.loc['a', 'A'] = 100
    print(df)
    # Setting a value using .iloc[]
    df.iloc[0, 0] = 101
    print(df)

Summary
.loc[]: Label-based indexing for rows and columns.
In a .loc[] slice, both the 'from' and the 'to' labels are inclusive.
.iloc[]: Position-based indexing for rows and columns. In an .iloc[] slice, the 'from' position is inclusive but the 'to' position is exclusive.
.at[]: Access a single value by label.
.iat[]: Access a single value by position.
Boolean indexing: Select data based on conditions.
Direct selection: Use column labels or attributes to select columns.

filtering
The filter() method filters the DataFrame by label and returns only the rows or columns that are specified in the filter. For a DataFrame it filters columns by default.

    import pandas as pd

    data = {'A': [1, 2, 3, 4, 5],
            'B': [5, 6, 7, 8, 9],
            'C': [10, 11, 12, 13, 14]}
    df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])
    # keep only columns 'A' and 'C'
    newdf = df.filter(items=["A", "C"])
    print(newdf)

Advanced Data Manipulation: Handling missing data
How is a Missing Value Represented in a Dataset?
Missing values in a dataset can be represented in various ways, depending on the source of the data and the conventions used. Some common representations are:
NaN (Not a Number): In many programming languages and data analysis tools, missing values are represented as NaN. This is the default in libraries like Pandas.
NULL or None: In databases and some programming languages, missing values are often represented as NULL or None. For instance, in SQL databases, a missing value is typically recorded as NULL.
Empty Strings: Sometimes missing values are denoted by empty strings (""). This is common in text-based data or CSV files where a field is left blank.
Special Indicators: Datasets might use specific placeholder values like -999 or 9999 to signify missing data. This is often seen in older datasets or in industries where such conventions were established.

Why is Data Missing From the Dataset?
There can be multiple reasons why certain values are missing from the data. Some of them are:
Past data might get corrupted due to improper maintenance.
Observations are not recorded for certain fields for various reasons.
There might be a failure in recording the values due to human error.
The user has intentionally not provided the values.
The participant refused to respond.

A. Handling Missing Values
1. Removal of Missing Values:
Row Removal: If a few rows in a dataset have missing values and those rows represent a small fraction of the total data, it might be acceptable to remove them. This approach is often used when the dataset is large enough that losing a few records will not significantly affect the analysis.
Example: In a dataset of 10,000 rows, if 50 rows have missing values in one column, those 50 rows can be removed.

Removing NaN rows

    import pandas as pd
    import numpy as np

    data = {
        'A': [1.5, 2.0, np.nan, 4.1, 5.1],
        'B': [np.nan, 2.3, 3.8, 4.5, 5.6],
        'C': [1.7, 2.3, 3.1, np.nan, 5.2],
        'D': [1.7, 2.4, 3.9, 4.1, 5.7]
    }
    df = pd.DataFrame(data)
    print(df)

    # remove rows with missing values (here rows 0, 2 and 3)
    df.dropna(inplace=True)
    print(df)

Column Removal: If a column has a significant number of missing values and the information it contains is not crucial, the entire column might be removed. This is more common when the missing values make up a high percentage of that column.
Example: In a dataset with 25 columns, if one column has 90% missing values, it might be dropped.

Removing NaN cols

    import pandas as pd
    import numpy as np

    data = {
        'A': [1, 2, np.nan, 4],
        'B': [np.nan, np.nan, np.nan, np.nan],
        'C': [5, 6, 7, 8]
    }
    df = pd.DataFrame(data)
    print("Original DataFrame:")
    print(df)

    # Drop columns (axis=1) where all values are NaN
    # how='any' would instead drop every column containing at least one NaN
    df_cleaned = df.dropna(axis=1, how='all', inplace=False)
    print("\nDataFrame after removing columns with all NaN:")
    print(df_cleaned)

2. Imputation of Missing Values:
Mean Imputation: Replace missing values with the mean (average) of the available data. This method is best suited for continuous numerical data that is roughly symmetric (for example, normally distributed) and free of strong outliers, since the mean is pulled toward extreme values.
Example: If the ages in a dataset are missing in some entries, the missing values can be replaced with the mean age.

Replace Missing Values With mean

    import pandas as pd
    import numpy as np

    data = {
        'A': [1.5, 2.0, np.nan, 4.1, 5.1],
        'B': [np.nan, 2.3, 3.8, 4.5, 5.6],
        'C': [1.7, 2.3, 3.1, np.nan, 5.2],
        'D': [1.7, 2.4, 3.9, 4.1, 5.7]
    }
    df = pd.DataFrame(data)
    print(df)

    # fill each column's NaN with that column's own mean
    df.fillna(df.mean(), inplace=True)
    print(df)

Median Imputation: Replace missing values with the median (middle value) of the available data. This method is robust and less sensitive to outliers, making it suitable for skewed distributions.
Example: If the incomes in a dataset are missing in some entries, the missing values can be replaced with the median income, especially if the income distribution is skewed.

Replace Missing Values With Median

    import pandas as pd
    import numpy as np

    data = {
        'A': [1.5, 2.0, np.nan, 4.1, 5.1],
        'B': [np.nan, 2.3, 3.8, 4.5, 5.6],
        'C': [1.7, 2.3, 3.1, np.nan, 5.2],
        'D': [1.7, 2.4, 3.9, 4.1, 5.7]
    }
    df = pd.DataFrame(data)
    print(df)

    # fill each column's NaN with that column's own median
    df.fillna(df.median(), inplace=True)
    print(df)

Mode Imputation: Replace missing values with the mode (most frequent value) of the available data. This method is often used for categorical data.
Example: If the 'City' column in a dataset is missing some entries, the missing values can be replaced with the most frequently occurring city.
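The 'City' scenario just described can be sketched with Series.mode(), which returns the most frequent value(s) as a Series (there can be ties), hence the [0] to take the first one. The city names are invented for illustration, and this is an alternative to counting values with value_counts().

```python
import numpy as np
import pandas as pd

# Hypothetical city data with two missing entries
df = pd.DataFrame({
    "City": ["Mumbai", "Pune", "Mumbai", np.nan, "Delhi", np.nan, "Mumbai"],
})

# mode() returns a Series, so take its first value
most_frequent = df["City"].mode()[0]
print(most_frequent)          # Mumbai

# Fill only the City column with its mode
df["City"] = df["City"].fillna(most_frequent)
print(df)
```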
Replace Missing Values With mode

    import numpy as np
    import pandas as pd

    data = {'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            'Gender': ['M', 'M', 'F', np.nan, np.nan, 'F', 'M', 'F', 'M', 'M']}
    # convert to data frame
    df = pd.DataFrame(data)
    print(df)

    # value_counts() is sorted by frequency, so index[0] is the most frequent value ('M')
    df.fillna(df['Gender'].value_counts().index[0], inplace=True)
    print(df)

Replace Missing Values With zero

    import pandas as pd
    import numpy as np

    data = {
        'A': [1.5, 2.0, np.nan, 4.1, 5.1],
        'B': [np.nan, 2.3, 3.8, 4.5, 5.6],
        'C': [1.7, 2.3, 3.1, np.nan, 5.2],
        'D': [1.7, 2.4, 3.9, 4.1, 5.7]
    }
    df = pd.DataFrame(data)
    print(df)

    df.fillna(value=0, inplace=True)
    print(df)

data transformation
The transform() method allows you to execute a function for each value of the DataFrame and returns a result of the same shape. More generally, data transformation means converting values into the format that a particular function or model requires. For example:
Converting categorical data to numeric data: to build predictive models in machine learning, categorical data has to be converted into numeric form.
Reshaping: some models can only be fitted on 2-D input, so a 1D array sometimes has to be converted into a 2D array.

    # replacing categorical data with integers
    import numpy as np
    import pandas as pd

    data = {'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            'Gender': ['M', 'M', 'F', 'M', 'M', 'F', 'M', 'F', 'M', 'M']}
    df = pd.DataFrame(data)
    print(df)

    # replacing values: 'M' -> 0, 'F' -> 1 (assign the result back to the column)
    df['Gender'] = df['Gender'].replace(['M', 'F'], [0, 1])
    print(df)

    # converting a one-dimensional array to a 2D array
    import numpy as np

    # 1-D array having elements [1 2 3 4 5 6 7 8]
    arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print('Before converting to 2D array:')
    print(arr)
    print(arr.shape)    # (8,)

    # Now we can convert this 1-D array into 2-D in two ways
    # 1. having dimension 4 x 2
    arr1 = arr.reshape(4, 2)
    print('After reshaping having dimension 4x2:')
    print(arr1)
    print('\n')

    # 2. having dimension (-1, 1): -1 tells NumPy to infer that size (8), giving one column
    arr2 = arr.reshape(-1, 1)
    print('After reshaping:')
    print(arr2)
    print(arr2.shape)   # (8, 1)

aggregation
Aggregation can be used to get a summary of the columns in our dataset. Some functions used in aggregation are:

    Function      Description
    sum()         Compute sum of column values
    min()         Compute min of column values
    max()         Compute max of column values
    mean()        Compute mean of column values
    size()        Compute group sizes (a GroupBy method; on a DataFrame, size is an attribute)
    describe()    Generate descriptive statistics
    first()       Compute first value in each group
    last()        Compute last value in each group
    count()       Compute count of non-missing column values
    std()         Standard deviation of column values
    var()         Compute variance of column values
    sem()         Standard error of the mean of column values

    import pandas as pd

    data = {'height-Inch': [63, 67, 62, 71, 68],
            'weight-KG': [64, 81, 49, 90, 60],
            'Gender': ['M', 'M', 'F', 'M', 'F']}
    df = pd.DataFrame(data)
    print(df)
    print(df.describe())
    # minimum and maximum of every column
    print(df.agg(['min', 'max']))

grouping
Grouping is used to group data using some criteria from our dataset. It follows the split-apply-combine strategy:
Splitting the data into groups based on some criteria.
Applying a function to each group independently.
Combining the results into a data structure.
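The three steps can be made explicit in a short sketch that reuses the height/weight data from the aggregation example. It computes the mean weight per gender once by hand (split with a boolean mask, apply mean(), combine into a dict) and once with groupby(), which performs all three steps in a single call.

```python
import pandas as pd

data = {'height-Inch': [63, 67, 62, 71, 68],
        'weight-KG': [64, 81, 49, 90, 60],
        'Gender': ['M', 'M', 'F', 'M', 'F']}
df = pd.DataFrame(data)

# Split: one sub-DataFrame per distinct Gender value
groups = {g: df[df['Gender'] == g] for g in df['Gender'].unique()}

# Apply: compute the mean weight inside each group independently
# Combine: collect the per-group results into one dict
manual = {g: part['weight-KG'].mean() for g, part in groups.items()}
print(manual)

# groupby() performs split-apply-combine in one step and returns a Series
by_groupby = df.groupby('Gender')['weight-KG'].mean()
print(by_groupby)
```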
    # Group by 'Gender' and get the first value of height in each group
    first_values = df.groupby('Gender')['height-Inch'].first()
    print("\nFirst value of height in each group:")
    print(first_values)

    # Group by 'Gender' and get the last value of height in each group
    last_values = df.groupby('Gender')['height-Inch'].last()
    print("\nLast value of height in each group:")
    print(last_values)

    # Group by 'Gender' and get the standard error of the mean of weight in each group
    serr_mean = df.groupby('Gender')['weight-KG'].sem()
    print("\nStandard error of the mean of weight-KG:")
    print(serr_mean)

Data Analysis with NumPy: Introduction to NumPy
NumPy is a Python library used for working with arrays. NumPy is short for "Numerical Python".
If you have Python and PIP already installed on a system, then installing NumPy is very easy. Install it using this command:

    C:\Users\Your Name>pip install numpy

Once NumPy is installed, import it in your applications by adding the import keyword:

    import numpy

NumPy arrays can be created with many different element data types (integer, float, complex, boolean, string, datetime), but all elements of a single array share one data type.
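Because every element of an array shares one dtype, NumPy picks a common type when the inputs are mixed. A small sketch: integers mixed with a float are upcast to float64, and astype() converts an array to another dtype by returning a new array. The integer dtype is checked through dtype.kind because its exact width (int32 or int64) can vary by platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
print(ints.dtype.kind)       # i  (a signed integer type)

# One float among integers upcasts the whole array to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)           # float64
print(mixed)

# astype() returns a new array with the requested dtype
as_float = ints.astype(np.float64)
print(as_float.dtype)        # float64
print(ints.dtype.kind)       # still i: the original array is unchanged
```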
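Example

    import numpy

    arr = numpy.array([1, 2, 3, 4, 5])
    print(arr)
    print(type(arr))   # <class 'numpy.ndarray'>

    import numpy

    arr = numpy.array(["a", "b", "c", "d", "e"])
    print(arr)
    print(type(arr))

When the input mixes types, NumPy converts every element to one common type (here, strings):

    import numpy as np

    arr = np.array([1, 2, 'three', 4])
    print("Array:")
    print(arr)   # ['1' '2' 'three' '4']
    print("Data type of the array elements:", arr.dtype)

Create arrays with different data types:

    import numpy as np

    int_array = np.array([1, 2, 3], dtype=np.int64)
    float_array = np.array([1.0, 2.0, 3.0], dtype=np.float64)
    complex_array = np.array([1+2j, 3+4j], dtype=np.complex64)
    bool_array = np.array([True, False, True], dtype=np.bool_)
    # np.bytes_ / np.str_ replace np.string_ / np.unicode_, which NumPy 2.0 removed
    string_array = np.array(['a', 'b', 'c'], dtype=np.bytes_)
    unicode_array = np.array(['a', 'b', 'c'], dtype=np.str_)
    datetime_array = np.array(['2023-01-01', '2024-01-01'], dtype=np.datetime64)

    # type() reports the container, which is always numpy.ndarray
    print(type(int_array))        # Output: <class 'numpy.ndarray'>
    print(type(float_array))      # Output: <class 'numpy.ndarray'>
    print(type(complex_array))    # Output: <class 'numpy.ndarray'>
    print(type(bool_array))       # Output: <class 'numpy.ndarray'>
    print(type(string_array))     # Output: <class 'numpy.ndarray'>
    print(type(unicode_array))    # Output: <class 'numpy.ndarray'>
    print(type(datetime_array))   # Output: <class 'numpy.ndarray'>
    # The element type is stored in the dtype attribute instead
    print(float_array.dtype, bool_array.dtype)   # Output: float64 bool

NumPy arrays are mutable (their elements can be changed in place), regardless of whether they were created from a list or a tuple; once the data is in a NumPy array, the list/tuple distinction no longer matters. Note that np.insert() does not modify the array in place: arrays have a fixed size, so it returns a new array with the value inserted, which is why the result is stored in arr1.

    import numpy as np

    arr = np.array([1, 2, 3, 4, 5])    # created from a list
    arr1 = np.insert(arr, 2, 10)       # insert 10 before position 2
    print(arr1)                        # [ 1  2 10  3  4  5]

    arr = np.array((1, 2, 3, 4, 5))    # created from a tuple
    arr1 = np.insert(arr, 2, 10)
    print(arr1)                        # [ 1  2 10  3  4  5]
    arr[0] = 100                       # in-place change also works for a tuple-created array

0-D Arrays
A 0-D (zero-dimensional) array, also called a scalar array, contains a single value and has no dimensions (its shape is ()). In NumPy, a 0-D array is essentially a scalar value wrapped in an array structure.

    import numpy as np

    a = np.array(23)
    print(a)

1-D Arrays
A 1-D (one-dimensional) array is a sequence of elements arranged along a single axis, like a single row or column.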
Example: Create a 1-D array containing the values 1, 2, 3, 4, 5:

    import numpy as np

    arr = np.array([1, 2, 3, 4, 5])
    print(arr)

2-D Arrays

    import numpy as np

    arr = np.array([[1, 2, 3], [4, 5, 6]])
    print(arr)

The ndim attribute gives the number of dimensions:

    import numpy as np

    a = np.array(55)
    b = np.array([1, 2, 3, 4, 5])
    c = np.array([[1, 2, 3], [4, 5, 6]])
    print(a.ndim)   # 0
    print(b.ndim)   # 1
    print(c.ndim)   # 2

Higher Dimension array

    import numpy as np

    arr = np.array([1, 2, 3, 4], ndmin=5)
    print(arr)                                 # [[[[[1 2 3 4]]]]]
    print('number of dimensions :', arr.ndim)  # 5

Array indexing

    import numpy as np

    arr = np.array([1, 2, 3, 4])
    print(arr[0])            # 1
    print(arr[2] + arr[3])   # 7

    arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
    print('2nd element on 1st row: ', arr[0, 1])   # 2

    arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
    print(arr)
    print(arr[0, 1, 2])      # 6

    arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
    print('Last element from 2nd dim: ', arr[1, -1])   # 10

Array slicing

    import numpy as np

    arr = np.array([1, 2, 3, 4, 5, 6, 7])
    print(arr[1:5])   # [2 3 4 5]
    print(arr[4:])    # [5 6 7]

Reshaping means changing the shape of an array. The shape of an array is the number of elements in each dimension.
    import numpy as np

    arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
    print(arr.shape)     # (12,)
    print(arr.ndim)      # 1
    newarr = arr.reshape(4, 3)
    print(newarr)
    print(newarr.ndim)   # 2

array operations
The operations covered below are:
Basic Operations
Slicing and Indexing
Mathematical Functions
Boolean Operations
Aggregations

Basic Operations

    import numpy as np

    array_from_list = np.array([10, 20, 30, 40, 50])
    print("array_from_list: ", array_from_list)

    zeros_array = np.zeros((3, 3))
    print("zeros_array (3,3): ")
    print(zeros_array)

    ones_array = np.ones((2, 2))
    print("ones_array: ")
    print(ones_array)

    range_array = np.arange(10)          # 0 up to, but not including, 10
    print("range_array: ", range_array)
    range_array = np.arange(start=20, stop=30, step=2)
    print(range_array)                   # [20 22 24 26 28]

    # Create an array with evenly spaced values - linear space
    linspace_array = np.linspace(10, 20, 5)
    print("linspace_array: ", linspace_array)

    a = np.array([10, 20, 30])
    b = np.array([40, 50, 60])

    # Element-wise addition
    sum_array = a + b
    print("sum_array", sum_array)                 # sum_array [50 70 90]
    # Element-wise subtraction
    diff_array = a - b
    print("diff_array", diff_array)               # diff_array [-30 -30 -30]
    # Element-wise multiplication
    product_array = a * b
    print('product_array', product_array)         # product_array [ 400 1000 1800]
    # Element-wise division (the result is always float)
    quotient_array = a / b
    print("quotient_array", quotient_array)       # quotient_array [0.25 0.4  0.5 ]

    # Scalar addition, subtraction, multiplication, and division
    scalar_add = a + 2
    print("scalar_add: ", scalar_add)   # scalar_add: [12 22 32]
    scalar_sub = a - 2
    print("scalar_sub: ", scalar_sub)   # scalar_sub: [ 8 18 28]
    scalar_mul = a * 2
    print("scalar_mul: ", scalar_mul)   # scalar_mul: [20 40 60]
    scalar_div = a / 2
    print("scalar_div: ", scalar_div)   # scalar_div: [ 5. 10. 15.]
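The two range-creation functions in the Basic Operations code treat the end value differently, which is easy to miss. A small sketch: np.arange() excludes stop, while np.linspace() includes it by default, and its endpoint=False option excludes it and changes the spacing.

```python
import numpy as np

# arange(start, stop, step): stop is excluded
steps = np.arange(20, 30, 2)
print(steps)                  # [20 22 24 26 28]

# linspace(start, stop, num): num evenly spaced values, stop included by default
points = np.linspace(10, 20, 5)
print(points.tolist())        # [10.0, 12.5, 15.0, 17.5, 20.0]

# endpoint=False excludes stop, so the spacing becomes (20 - 10) / 5 = 2.0
points_open = np.linspace(10, 20, 5, endpoint=False)
print(points_open.tolist())   # [10.0, 12.0, 14.0, 16.0, 18.0]
```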
Slicing and Indexing

    import numpy as np

    arr = np.array([1, 2, 3, 4, 5, 6, 7])
    print(arr[1:5])     # [2 3 4 5]
    print(arr[4:])      # [5 6 7]
    print(arr[:4])      # [1 2 3 4]
    print(arr[-3:-1])   # [5 6]
    print(arr[1:5:2])   # [2 4]
    print(arr[::2])     # [1 3 5 7]

    arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
    print(arr[1, 1:4])     # [7 8 9]
    print(arr[0:2, 2])     # [3 8]
    print(arr[0:2, 1:4])   # [[2 3 4] [7 8 9]]

Mathematical Functions

    import numpy as np

    d = np.array([1, 2, 3, 4, 5])
    total_sum = np.sum(d)
    print("total_sum", total_sum)       # total_sum 15
    mean_value = np.mean(d)
    print("mean_value", mean_value)     # mean_value 3.0
    std_dev = np.std(d)                 # population standard deviation (ddof=0)
    print("std_dev", std_dev)           # std_dev 1.4142135623730951
    min_value = np.min(d)
    print("min_value", min_value)       # min_value 1
    max_value = np.max(d)
    print("max_value", max_value)       # max_value 5
    sqrt_array = np.sqrt(d)
    print("sqrt_array", sqrt_array)     # sqrt_array [1. 1.41421356 1.73205081 2. 2.23606798]

Boolean Operations

    import numpy as np

    arr = np.array([1, 2, 3, 4, 5])
    # Extract elements based on condition
    filtered_array = arr[arr > 3]
    print("filtered_array", filtered_array)   # filtered_array [4 5]
    # Set elements based on condition
    arr[arr > 3] = 10
    print("arr", arr)                         # arr [ 1  2  3 10 10]

Note: Pandas and NumPy use the same axis convention.
axis=0 refers to the rows (the first axis). An operation along axis=0 runs down the rows, so an aggregation such as sum(axis=0) gives one result per column. In Pandas, axis=0 is the default in many operations (for example, dropna(axis=0) drops rows).
axis=1 refers to the columns (the second axis). An operation along axis=1 runs across the columns, so it gives one result per row (for example, dropna(axis=1) drops columns).
For NumPy aggregation functions, if no axis is given (axis=None), the operation is applied over all elements, i.e. over both rows and columns.

Aggregations

    import numpy as np

    arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    # Sum along rows (axis=1): one result per row
    row_sum = np.sum(arr, axis=1)
    print("row_sum", row_sum)   # row_sum [ 6 15 24]
    # Sum along columns (axis=0): one result per column
    col_sum = np.sum(arr, axis=0)
    print("col_sum", col_sum)   # col_sum [12 15 18]

broadcasting
Broadcasting is a powerful feature in NumPy that allows arrays of different shapes to be used together in arithmetic operations.
1: Adding a Scalar to an Array: When you add a scalar value to a NumPy array, the scalar is broadcast to the shape of the array.
2: Adding Arrays of Different Shapes: You can add arrays of different shapes if they are compatible according to the broadcasting rule. Shapes are compared dimension by dimension starting from the right; two dimensions are compatible when they are equal or one of them is 1 (a missing dimension counts as 1). The size of each dimension of the output shape is the maximum size of the corresponding dimensions of the input arrays.

Example 1: Adding a Scalar to an Array

    import numpy as np

    # Array of shape (3,)
    arr = np.array([1, 2, 3])
    # The scalar 5 is broadcast to the shape of the array, (3,)
    result = arr + 5
    print("Array:", arr)
    print("Scalar addition result:", result)   # [6 7 8]

Example 2: Adding Arrays of Different Shapes

    import numpy as np

    arr1 = np.array([[1], [2], [3]])   # shape (3, 1); example values
    arr2 = np.array([10, 20, 30])      # shape (3,), treated as (1, 3)
    # Broadcasting addition: the result has shape (3, 3)
    result = arr1 + arr2
    print("Array 1:")
    print(arr1)
    print("Array 2:")
    print(arr2)
    print("Broadcasting addition result:")
    print(result)

When performing the addition, NumPy first treats the 1-D array of shape (3,) as an array of shape (1, 3). It then repeats arr2 along the rows and arr1 along the columns, so both behave like (3, 3) arrays, and the result has shape (3, 3).
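The compatibility rule can be checked without doing any arithmetic: np.broadcast_shapes() returns the resulting shape, or raises ValueError when the shapes are incompatible. A small sketch, assuming NumPy 1.20 or later (where np.broadcast_shapes was added).

```python
import numpy as np

# (3, 1) with (3,): the (3,) is treated as (1, 3), giving (3, 3)
print(np.broadcast_shapes((3, 1), (3,)))    # (3, 3)

# A scalar behaves like shape (), which broadcasts to any shape
print(np.broadcast_shapes((2, 4), ()))      # (2, 4)

# (2, 3) with (2,): trailing sizes 3 and 2 differ and neither is 1
try:
    np.broadcast_shapes((2, 3), (2,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)                           # False

# The same rule in an actual addition: column vector plus row vector
col = np.array([[1], [2], [3]])             # shape (3, 1)
row = np.array([10, 20, 30])                # shape (3,)
total = col + row
print(total.shape)                          # (3, 3)
```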