Summary

This document provides lecture notes on Numpy, covering various functions like arange, linspace, random, shuffle, where, diff, compress, and clip for data science programming.

Full Transcript

Programming for Data Science
Lecture 8
Dr. Faten Khalifa
Numpy-Part2

Numpy
np.arange() np.linspace()

import numpy as np
print(np.arange(10))
print("-----------------")
print(np.arange(10,100,5))    # start from 10, stop before 100, step by 5
print("----------------")
print(np.arange(100,10,-5))   # count down from 100, stop before 10
print("-----------------")
print(np.linspace(1,100,10))  # start, stop (included), number of elements
print("------------------")
print(np.linspace(0,1,5))

Numpy
np.random.random() np.random.shuffle() np.random.randint() np.unique()

print(np.random.random(10))   # 10 random floats in [0, 1)
print("----------------")
x=np.arange(1,10)
print(x)
np.random.shuffle(x)          # shuffles x in place and returns None
print(x)
print("---------------")
y=np.random.randint(1,5,10)   # 10 random integers from 1 to 4 (the upper bound 5 is excluded)
print(y)
print("-------------")
print(np.unique(y))           # sorted distinct values

Numpy
np.where() np.diff() np.compress() np.clip()

x=np.random.randint(1,6,30)
print(x)
print(np.where(x==3))         # returns a tuple holding the indices where x equals 3
print("--------------")
print(np.diff([2,3,8,9,5,1])) # differences between consecutive elements
print(np.diff([1,1,1,1,1,1]))
print("--------------")
y=np.arange(1,20)
print(y)
print(np.compress(y%5==0,y))  # numpy.compress(condition, a, axis=None, out=None)
print(np.compress([True,False,True,True,False],y))  # a short condition only tests the first 5 elements

Numpy
np.where() np.diff() np.compress() np.clip() (continued)

y = np.arange(1, 19)
y = y.reshape(3, 6)           # reshape into 3 rows and 6 columns
print(y)
print("--------------")
# print(np.compress(y%5==0,y))  # ERROR: the condition must be a 1-D array
print(np.compress((y%5==0).flatten(),y.flatten()))
print("--------------")
condition = [True, False, True]
print(np.compress(condition, y, axis=0))  # keep rows 0 and 2 of y
print("--------------")
z=np.random.randint(1,30,10)
print(z)
print(np.clip(z,7,15))        # limits the values of z to the range [7, 15]
# Any value less than 7 is clipped to 7.
# Any value greater than 15 is clipped to 15.
# Values already between 7 and 15 remain unchanged.
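The np.compress and np.clip calls above are easier to check with fixed inputs than with random ones. The following is a minimal sketch, not part of the original slides: the names grid and samples and the sample values are illustrative assumptions.

```python
import numpy as np

# Fixed 3x6 grid (values 1..18) so the results are reproducible.
grid = np.arange(1, 19).reshape(3, 6)

# With axis=None, np.compress needs a 1-D condition, so the 2-D mask
# and the array are both flattened first.
multiples_of_5 = np.compress((grid % 5 == 0).ravel(), grid.ravel())

# With axis=0, a 1-D condition selects whole rows: here rows 0 and 2.
picked_rows = np.compress([True, False, True], grid, axis=0)

# np.clip moves out-of-range values to the nearest bound.
samples = np.array([2, 7, 11, 15, 29])  # hypothetical values
clipped = np.clip(samples, 7, 15)

print(multiples_of_5)  # [ 5 10 15]
print(picked_rows)     # rows [1..6] and [13..18]
print(clipped)         # [ 7  7 11 15 15]
```

Boolean indexing, grid[grid % 5 == 0], returns the same values as the flattened np.compress call and is often the shorter way to write it.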
Numpy
np.repeat() np.tile()

import numpy as np
x=np.repeat([0,1,2],3)  # np.repeat(a, repeats, axis=None): repeats each element
y=np.tile([0,1,2],3)    # np.tile(A, reps): repeats the whole array
print(x)
print(y)
print(x.shape)
print(y.shape)

arr=np.repeat([[3,2]],5,axis=0)  # try axis=1 to repeat along the columns instead
print(arr)
print(arr.shape)
print("--------------")
arr2=np.tile([[1, 2], [3, 4]], (2, 3))  # 2 copies down, 3 copies across
print(arr2)
print(arr2.shape)

Numpy
Splitting is the reverse operation of joining.

a = np.array([ [4, 8], [6, 1] ])
b = np.array([ [3, 5], [7, 2] ])
print(a)
print("--------------")
print(b)
print("--------------")
print(np.hstack( (a,b) ) )              # join side by side
print("--------------")
print(np.vstack( (a,b) ) )              # stack b below a
print("------------------")
print(np.concatenate((a,b) ,axis=1))    # same result as hstack
print("------------------")
x=np.concatenate( (a,b) ,axis=None)     # flatten both arrays, then join
print( x )
print("------------------")
print(np.split(x,4))  # split into 4 equal parts
# np.split(ary, indices_or_sections, axis=0)

Numpy
Sum of elements treating Not a Number (NaN) as zeros

simple_array=np.array([1,2,3,4,np.nan,5,2,np.nan])
print(simple_array)
print(np.isnan(simple_array))     # do not use == with nan
print(np.nansum(simple_array))    # NaN treated as 0
print(np.sum(simple_array))       # nan: a single NaN poisons the plain sum
print(np.nan_to_num(simple_array))
print(np.nanprod(simple_array))   # NaN treated as 1
print(np.nancumsum(simple_array))
print(np.nancumprod(simple_array))
print(np.nanmean(simple_array))   # NaN values ignored
print(np.nanmax(simple_array))
print(np.nanmin(simple_array))

Numpy
Compute the cumulative product

one_d_array = np.array([1,2,3,4,5,6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])
print(two_d_array)
print(np.cumprod(two_d_array,axis=0))  # np.cumproduct is an old alias that NumPy 2.0 removed

Compute the cumulative sum

print(np.cumsum(two_d_array,axis=0))

Numpy
np.ndenumerate is a NumPy function that returns an iterator for traversing a multi-dimensional array, yielding both the index and the value of each element.

one_d_array = np.array([1,2,3,4,5,6])
two_d_array = np.array([one_d_array, one_d_array + 10, one_d_array + 20])
print(two_d_array)
for idx,i in np.ndenumerate(two_d_array):
    print(idx, i)

Numpy
Copy vs. View

Copy

arr1= np.array([1,2,3,4,5])
arr2=np.copy(arr1)  # (OR) arr2=arr1.copy()
arr1[0]=6           # modify in place; a plain arr1=6 would only rebind the name
print("arr1>> ", arr1)
print("arr2>> ", arr2)  # unchanged: the copy owns its data
print("--------------")
arr1[0]=66; arr2[0]=55
print("arr1>> ", arr1)
print("arr2>> ", arr2)

View

arr1= np.array([1,2,3,4,5])
arr2=arr1.view()
arr1[0]=6           # modify in place; a plain arr1=6 would only rebind the name
print("arr1>> ", arr1)
print("arr2>> ", arr2)  # also changed: the view shares arr1's data
print("--------------")
arr1[0]=66; arr2[0]=55
print("arr1>> ", arr1)  # both show 55, the last value written
print("arr2>> ", arr2)

Numpy
Sort elements of ndarray

first=np.array([3,5,7,2,9,4,1])
sorted_first=np.sort(first)
print(sorted_first)

first=np.array([[3,5,7,2],[9,4,1,6],[8,5,6,1]])
print(first)
print("-----------------------")
print(np.sort(first,axis=0))     # sort each column
print("-----------------------")
print(np.sort(first,axis=1))     # sort each row
print("-----------------------")
print(np.sort(first,axis=None))  # sort the flattened array

Numpy

temperatures = np.array([29.3, 42.1, 18.8, 16.1, 38.0, 12.5, 12.6, 49.9, 38.6, 31.3, 9.2, 22.2 ]).reshape(2, 2, 3)
print(temperatures)
print("---------------------")
print(temperatures[1,:,:])  # second 2x3 block
print("-----------------------")
table = np.array([ [5, 3, 7, 1], [2, 6, 7 ,9], [1, 1, 1, 1], [4, 3, 2, 0],])
print(table)
print(table.max())
print(table.max(axis=0))  # max of each column
print(table.max(axis=1))  # max of each row

Applications
Curving Test Grades

The scenario is this: You’re a teacher who has just graded your students on a recent test. Unfortunately, you may have made the test too challenging, and most students did worse than expected. To help everybody out, you’re going to curve everyone’s grades.
import numpy as np
CURVE_CENTER = 80
grades = np.array([72, 35, 64, 88, 51, 90, 74, 12])
print("mean of original grades = ", grades.mean())
def curve(grades):
    average = grades.mean()
    change = CURVE_CENTER - average  # shift needed to move the mean to CURVE_CENTER
    new_grades = grades + change
    print(new_grades)
    # Lower bound is the original grade (nobody loses points); upper bound is 100.
    return np.clip(new_grades,grades,100)
new_grades=curve(grades)
print("new grades =",new_grades)
print("mean of new grades=", new_grades.mean())

Masking and Filtering

We have 5 groups of students, and each group is represented as one row of a 2-dimensional array. An excellent grade is any grade >= 85. Count how many students in each group earned an excellent grade.

grades=np.linspace(1,100,60,dtype=int).reshape(5,-1)  # -1: NumPy infers the number of columns (12)
print(grades)
print("-----------------")
mask=(grades>=85)
print(mask)
print("-----------------")
print(np.sum(mask,axis=1))  # count of True values in each row (group)
print("-----------------")
print(grades[mask])         # the excellent grades themselves, as a 1-D array

By default, linspace generates floating-point numbers. Specifying a dtype of int forces the function to round down, so the 60 values spread between 1 and 100 come out as whole integers.

array.reshape() can take -1 as one of its dimension sizes. That signifies that NumPy should just figure out how big that particular axis needs to be based on the size of the other axes.

Structured arrays

Originally, you learned that array items all have to be the same data type, but that wasn’t entirely correct. NumPy has a special kind of array, called a record array or structured array, with which you can specify a type and, optionally, a name on a per-column basis. This makes sorting and filtering even more powerful, and it can feel similar to working with data in Excel, CSVs, or relational databases.
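Structured arrays

arr = np.array([[1, 2.4],[5, 7.8]], dtype=int)  # a regular array has one dtype: the floats are truncated to int
print(arr)
print("-----------------")
data = np.array([("joe", 32, 6),("mary", 15, 20),("felipe", 80, 100), ("AI_teams", 38, 9001)],
                dtype=[("name", str, 10), ("age", int), ("points", int)])  # one (name, type) pair per field
print(data)
print(data["name"])                        # select a field by name
print(data[data["points"]>50]["name"])     # names of the records with more than 50 points
print("-----------------")
print(np.sort(data[data["age"] > 20],order="age"))  # records with age > 20, sorted by age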
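To make the filtering and sorting steps above concrete, here is a minimal sketch that reuses the lecture's four records and states the results it should produce. The variable names high_scorers and adults_by_age are illustrative, and the "U10" type string is assumed here as an equivalent spelling of (str, 10).

```python
import numpy as np

# One (field name, type) pair per column; "U10" is a Unicode string of up to 10 characters.
record_type = [("name", "U10"), ("age", int), ("points", int)]
data = np.array(
    [("joe", 32, 6), ("mary", 15, 20), ("felipe", 80, 100), ("AI_teams", 38, 9001)],
    dtype=record_type,
)

# Filter with a boolean mask built from one field, then read another field.
high_scorers = data[data["points"] > 50]["name"]

# Keep the records with age > 20, then sort them by the "age" field.
adults_by_age = np.sort(data[data["age"] > 20], order="age")

print(data.dtype.names)                # ('name', 'age', 'points')
print(high_scorers.tolist())           # ['felipe', 'AI_teams']
print(adults_by_age["name"].tolist())  # ['joe', 'AI_teams', 'felipe']
```

Each record behaves like a row and each field like a named column, which is why the filter and sort steps read much like operations on a spreadsheet or a database table.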
