Python Programming for Business Analytics Lecture 6 PDF
The University of Manchester, Alliance Manchester Business School
Manuel López-Ibáñez
Summary
This document is a lecture on numerical analysis in Python using NumPy. It covers NumPy vectors and matrices, indexing and slicing, broadcasting and reshaping, and aggregations.
Full Transcript
BMAN73701 Programming in Python for Business Analytics
Week 3: Lecture 2
Numerical Analysis
Prof. Manuel López-Ibáñez
manuel.lopez-ibanez@manchester.ac.uk
Office hours: Mon 4pm-5pm, Fri 9am-10am

Contact
Manuel López-Ibáñez (or Lopez-Ibanez)
Discussion Board in Blackboard
Office hours (AMBS 3.050): Mon 4pm-5pm, Fri 9am-10am
https://calendly.com/manuel-lopez-ibanez
Email: manuel.lopez-ibanez@manchester.ac.uk
– 22 711 emails received last year!
– Allow me 2-3 days to reply

What is new?
From basics to advanced analytics
Complete Python code available in BB
Labs: less exercise-like, more case-study-like

Plan
Week 3: Numerical Analysis (NumPy)
Week 4, Lecture 1: Data Exploration and Visualization (Pandas + Matplotlib)
Week 4, Lecture 2: Data preprocessing and preparation (Pandas + Scikit-learn)
Week 5: Machine learning (Scikit-learn)

Week 3: Lecture 2, Numerical Analysis
Part 1: Vectors and matrices
Part 2: Broadcasting
Part 3: Aggregations and other useful operations

Summing up 10M numbers
    a = list(range(10**7))
For-loop:
    s = 0
    for i in a:
        s += i
    Time: 0.1 seconds
sum() over a list:
    s = sum(a)
    Time: 0.05 seconds
np.sum():
    x = np.array(a)
    s = np.sum(x)
    Time: 0.0009 seconds

NumPy
Good integration with Pandas and Matplotlib
Scikit-learn (ML) is built on top of NumPy
Fast computations
Large multi-dimensional arrays (vectors and matrices)
Reference docs: https://docs.scipy.org/doc/numpy/reference/index.html
    import numpy as np  # Just a shortcut.
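The timing comparison on the "Summing up 10M numbers" slide can be reproduced roughly as follows. This is only a sketch: the helper names `time_call` and `loop_sum` are made up for illustration, the array is 10**6 elements rather than the slide's 10**7 so that it runs quickly, and the measured times will differ from the slide's figures depending on the machine.

```python
import time
import numpy as np

def time_call(fn):
    """Hypothetical helper: run fn() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def loop_sum(values):
    """Sum the values with an explicit Python for-loop."""
    s = 0
    for v in values:
        s += v
    return s

n = 10**6  # assumption: smaller than the slide's 10**7 to keep the run short
a = list(range(n))
x = np.array(a, dtype=np.int64)  # int64 avoids overflow on platforms with 32-bit default ints

s_loop, t_loop = time_call(lambda: loop_sum(a))
s_builtin, t_builtin = time_call(lambda: sum(a))
s_numpy, t_numpy = time_call(lambda: np.sum(x))

print(f"for-loop: {t_loop:.4f} s")
print(f"sum():    {t_builtin:.4f} s")
print(f"np.sum(): {t_numpy:.4f} s")
```

All three approaches return the same total; only the running time changes. Note that the conversion `np.array(a)` is done outside the timed call, so the NumPy figure measures the summation alone.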
NumPy arrays
    import numpy as np
Vector a = (0 1 2)
    In: a = np.array([0,1,2])
    Out: array([0, 1, 2])
    In: a.shape
    Out: (3,)    # 1D (3 columns)
Matrix A = [[0 1 2], [3 4 5]]
    In: A = np.array([[0, 1, 2],
                      [3, 4, 5]])
    Out: array([[0, 1, 2],
                [3, 4, 5]])
    In: A.shape
    Out: (2,3)   # 2D (rows, columns)

NumPy arrays ≠ Lists
List:
    In: a = list(range(4))
    Out: [0, 1, 2, 3]
    In: a * 2
    Out: [0, 1, 2, 3, 0, 1, 2, 3]
    In: a + a
    Out: [0, 1, 2, 3, 0, 1, 2, 3]
    In: a.append(10)
        a
    Out: [0, 1, 2, 3, 10]
NumPy array:
    In: a = np.arange(4)
    Out: array([0, 1, 2, 3])
    In: a * 2
    Out: array([0, 2, 4, 6])
    In: a + a
    Out: array([0, 2, 4, 6])
    In: a = np.append(a, 10)
        a
    Out: array([0, 1, 2, 3, 10])

Indexing: List vs NumPy
A = [[0 1 2], [3 4 5]]
    # A list of lists
    A = [[0,1,2],
         [3,4,5]]
    Return first row, first column:   A[0][0]
    Return first row:                 A[0]
    Return first column:              ??
    # A NumPy matrix
    A = np.array([[0,1,2],
                  [3,4,5]])
    Return first row, first column:   A[0, 0]
    Return first row:                 A[0, :]
    Return first column:              A[:, 0]

Slice Notation
x[START:END:STEP]
Start counting at START (default 0)
Stop counting before END (default len or num. rows/columns)
Increment by STEP (default 1)
    x[:]    is the same as  x[0:len(x):1]
    x[0:2]  same as  x[[0,1]] or x[range(2)]
    x[1:]   same as  x[range(1,len(x))]
    x[:5]   same as  x[range(5)]
    x[:]    same as  x[range(len(x))]

Slice Notation
For a matrix A with 3 rows (0, 1, 2) and 3 columns (0, 1, 2):
    Expression      Shape
    A[0:2, 1:3]     (2,2)
    A[:2, 1:]       (2,2)
    A[2]            (3,)
    A[2, :]         (3,)
    A[2:, :]        (1,3)
    A[:, :2]        (3,2)
    A[:, [0,1]]     (3,2)
    A[:2, 1]        (2,)
    A[:2, 1:2]      (2,1)

Quiz
    x = np.array([ [3,2,1],
                   [4,5,6] ])
    In: x[:, 1]
    Out: array([2, 5])
    In: x[-1, :]
    Out: array([4, 5, 6])
    In: x[:2, 1:]
    Out: array([[2, 1],
                [5, 6]])
    In: x[::-1, [2,1,0]]
    Out: array([[6, 5, 4],
                [1, 2, 3]])

Boolean Indexing
    # A NumPy vector
    x = np.array([0,3,1,4,2,5])
    x > 2       [False,True,False,True,False,True]
    x[x > 2]    [3,4,5]
    # A list
    x = [0,3,1,4,2,5]
    x > 2       Error
    x[x > 2]    Error!!!
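The indexing, slicing and Boolean-indexing rules above can be checked with a short script. This is a sketch under stated assumptions: the 3x3 matrix `M` and the names `first_col_list`, `slice_shapes` and `list_comparison_ok` are invented for illustration, and the list comprehension is just one possible answer to the "??" on the indexing slide, not necessarily the one given in the lecture.

```python
import numpy as np

# The same data as a list of lists and as a NumPy matrix.
A_list = [[0, 1, 2], [3, 4, 5]]
A = np.array(A_list)

# First column: a list of lists needs a loop or comprehension,
# whereas a NumPy matrix can be sliced directly.
first_col_list = [row[0] for row in A_list]
first_col = A[:, 0]

# Shapes produced by different slices of an illustrative 3x3 matrix.
M = np.arange(9).reshape((3, 3))
slice_shapes = {
    "M[0:2, 1:3]": M[0:2, 1:3].shape,  # 2 rows, 2 columns
    "M[2, :]": M[2, :].shape,          # an integer index drops that dimension
    "M[2:, :]": M[2:, :].shape,        # a slice keeps the dimension
    "M[:2, 1]": M[:2, 1].shape,
    "M[:2, 1:2]": M[:2, 1:2].shape,
}

# Boolean indexing: the comparison yields a Boolean mask, which selects elements.
x = np.array([0, 3, 1, 4, 2, 5])
mask = x > 2
selected = x[mask]

# The same comparison on a plain list raises TypeError in Python 3.
try:
    [0, 3, 1, 4, 2, 5] > 2
    list_comparison_ok = True
except TypeError:
    list_comparison_ok = False

for expression, shape in slice_shapes.items():
    print(expression, shape)
print(selected)
```

The difference between `M[2, :]` and `M[2:, :]` is worth noticing: indexing with an integer removes a dimension, while slicing keeps it, which matters later when shapes have to match for broadcasting.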
    # A NumPy matrix
    A = np.array([[0,3,1],
                  [4,2,5]])
    A[A > 2]    [3,4,5]

Copies vs Views
A = [[0 1 2], [3 4 5]]
    B = A              # No copy is made: B refers to the same array as A (behaves like a view)
    B[0, 0] = 10
    A[0, 0] value?

    B = A.copy()       # Creates a copy
    B[0, 0] = 20
    A[0, 0] value?
Slices are views, not copies! (Same as B = A)
    A[0, 0] = 1
    row0 = A[0, :]     # returns first row
    row0[0] = 5        # change first element of first row
    A[0, 0] value?
    slice_copy = A[0, :].copy()   # Creates a copy!

Example: Total gains and losses
Given a matrix $X$ where $X_{tj}$ is the net profit of department $j$ in time period $t$ (with $T$ time periods and $D$ departments), calculate:
    $\mathit{gains} = \sum_{t=1}^{T} \sum_{j=1}^{D} X_{tj}$  if $X_{tj} > 0$
    $\mathit{losses} = \sum_{t=1}^{T} \sum_{j=1}^{D} X_{tj}$  if $X_{tj} < 0$
For-loop:
    gains = 0
    losses = 0
    for t in range(X.shape[0]):
        for j in range(X.shape[1]):
            if X[t,j] > 0:
                gains += X[t,j]
            else:
                losses += X[t,j]
No loops!
    gains = np.sum(X[X > 0])
    losses = np.sum(X[X < 0])
    Time: 0.004 seconds

Element-wise operators
(Most) operations are element-wise (+ - / * **)
    A * 2          # multiply each element of A by 2
    A ** 2         # square each element of A
    A * B          # Element-wise (not matrix product)
    np.dot(A, B)   # Matrix product: A × B
    A + x          # Matrix (2,3) + Vector (3,): Broadcasting!
    [[3, 2, 1],     [0, 1, 2]  = ?
     [4, 5, 6]]  +
The vector is added to every row, as if it were the matrix [[0, 1, 2], [0, 1, 2]].

Recap
NumPy arrays represent vectors and matrices ≠ Lists!
Indexing NumPy arrays:
– Slicing similar to lists
– More powerful than lists
– Boolean indexing
Element-wise mathematical operations
Mathematical operations that apply to many elements of a NumPy array are much faster than indexing with for-loops

Part 2: Broadcasting

Broadcasting
http://www.scipy-lectures.org/intro/numpy/operations.html#broadcasting
Creative Commons Attribution 4.0 International License (CC-by) http://creativecommons.org/licenses/b

Broadcasting
    x = np.array([0,10,20,30])
    y = np.array([0,1,2])
    In: x + y
    Out: ValueError: operands could not be broadcast together with shapes (4,) (3,)
    In: x.reshape((4,1)) + y
    Out: array([[ 0,  1,  2],
                [10, 11, 12],
                [20, 21, 22],
                [30, 31, 32]])

Reshaping
    x = np.array([0,10,20,30])
    In: x
    Out: array([ 0, 10, 20, 30])
    In: x.reshape((4,1))
    Out: array([[ 0],
                [10],
                [20],
                [30]])
    In: x.reshape((2,2))
    Out: array([[ 0, 10],
                [20, 30]])

Example: Total of outer product
Given two vectors $x$, $y$, calculate the sum of every pairwise product of their elements:
    $\mathit{total} = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i \cdot y_j = \sum \left( x \times y^T \right)$
    x = (1 10 20 30)
    y = (2 3 4 5)
Two solutions: for-loops vs. no loops!
For-loops:
    n = x.shape[0]
    total = 0
    for i in range(n):
        for j in range(n):
            total += x[i] * y[j]
No loops:
    total = np.sum(x.reshape((n,1)) * y)
Here x.reshape((n,1)) is stretched to [[1, 1, 1, 1], [10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30]] and y to four rows of [2, 3, 4, 5]; the two are multiplied element-wise and then summed.

Broadcasting and reshape
    In: X = np.array([[1,2,3],[4,5,6]])
        a = np.array([0,1,0])
        print(X * a)
    Out: [[0 2 0]
          [0 5 0]]
    In: b = np.array([0,1])
        print(X * b)
    Out: Error: operands could not be broadcast together with shapes (2,3) (2,)
    In: b = b.reshape((2,1))
        print(X * b)
    Out: [[0 0 0]
          [4 5 6]]

Recap
Broadcasting allows mathematical operations between NumPy arrays of different shapes
If shapes cannot be broadcast: Error!
Mathematical operations using broadcasting are:
– Shorter to write
– Faster to execute than for-loops

Part 3: Aggregations and other useful operations

Aggregations (Reductions)
Some operations aggregate: sum, min, max, mean, all, any,...
A = [[0 1 2], [3 4 5]]   (axis = 0 runs down the rows, axis = 1 runs across the columns)
    np.min(A)             returns 1 number
    np.min(A, axis = 0)   min along rows, returns 1 number per column
    np.min(A, axis = 1)   min along columns, returns 1 number per row

Maximum Mean Squared Error
$P$ is a $k \times n$ matrix with entries $P_{11}, \dots, P_{kn}$: each row gives $n$ predictions by one of $k$ ML methods
$obs = (obs_1, \dots, obs_n)$ are the actual observed values
Calculate the maximum Mean Squared Error (MSE), given as:
    $\mathit{maxMSE} = \max_{k} \frac{1}{n} \sum_{i=1}^{n} \left( P_{ki} - obs_i \right)^2$
where the maximum is taken over the rows (methods) of $P$.
    maxMSE = np.max(np.mean((P-obs)**2, axis = 1))

Functions vs. Methods
Most functions in NumPy have an equivalent method
    np.min(A)             A.min()
    np.min(A, axis = 0)   A.min(axis = 0)
Methods can only be applied to NumPy arrays!
    np.min([1,2,3])   OK
    [1,2,3].min()     Error
Some methods modify the array in-place!

Boolean arrays: Any vs. All
    b = np.array([1, 1, 0, 0])   # 1 is True, 0 is False
    np.logical_not(b)
    np.logical_and(b, b)
    np.logical_or(b, b)
    np.all(b)   # all True?
    np.any(b)   # any True?
    B = np.array([[1,0],[0,1]])
    np.all(B, axis = 0)   all True along rows? returns 1 value per column
    np.any(B, axis = 1)   any True along columns? returns 1 value per row

A = [[-1 2 3], [-4 -5 6]]
    Are there negative values?                       np.any(A < 0)
    Are all values negative?                         np.all(A < 0)
    Which columns have only negative values?         np.all(A < 0, axis = 0)
    Which rows have at least one negative value?     np.any(A < 0, axis = 1)

Sorting
Direct sorting: np.sort(array, axis=)
    np.sort(a)           sort ascending
    -np.sort(-a)         sort descending
    np.sort(A, axis=0)   sort each column (along rows)
    x = np.array([11,12,10,9])
    In: np.sort(x)
    Out: [9,10,11,12]
    In: -np.sort(-x)
    Out: [12,11,10,9]

Indirect Sorting
Direct sorting (ascending): np.sort(array, axis=)
Indirect sorting:
    np.argsort(array, axis=)
    np.argmin(array, axis=)
    np.argmax(array, axis=)
    x = np.array([12,11,10,9])
    In: np.sort(x)
    Out: [9,10,11,12]
    In: np.argsort(x)
    Out: [3,2,1,0]
    In: x[np.argsort(x)]   # same result as np.sort(x)
    In: np.max(x)
    Out: 12
    In: np.argmax(x)
    Out: 0

Minimisation by indirect sorting
Find the x value that produces the minimum of cos(x) between [0, 6] with a precision of 0.0000001
    x = np.arange(0, 6, 0.0000001)
    y = np.cos(x)
    i = np.argmin(y)
    print(x[i])
    Out: 3.1415926999999

numpy.random
Sub-module of NumPy with lots of functions for random number generation, random distributions, etc.
Different from the built-in module (import random). Better not to mix the two, to avoid confusion!
Basic functions:
    np.random.permutation(array)
    np.random.rand(N)
    np.random.randn(N)
    np.random.seed(N)

SciPy
Library for numerical analysis and scientific computations
scipy.optimize (BMAN60101): Multi-variate numerical optimization, linear programming, non-linear optimization, differential evolution
scipy.linalg: Faster (than np.linalg) linear algebra, matrix inversion, determinant, norms, decompositions, eigenvectors, etc.
scipy.stats (BMAN71791): Statistical distributions, tests, trimmed mean, geometric mean, interquartile range, etc.
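The aggregation, Boolean and indirect-sorting operations from Part 3 can be tried together in one short sketch. The assumptions here are illustrative only: the prediction matrix `P` and observations `obs` are made-up numbers, and the minimisation uses a grid step of 0.001 instead of the slide's 0.0000001 so that it needs little memory.

```python
import numpy as np

# Matrix from the any/all slide.
A = np.array([[-1, 2, 3],
              [-4, -5, 6]])

# Aggregating along an axis collapses that axis.
col_min = A.min(axis=0)   # one value per column
row_min = A.min(axis=1)   # one value per row

# Boolean reductions on the mask A < 0.
cols_all_negative = np.all(A < 0, axis=0)
rows_any_negative = np.any(A < 0, axis=1)

# Maximum MSE over methods: rows of P are methods, columns are observations.
# P and obs are made-up numbers, not data from the lecture.
P = np.array([[1.0, 2.0, 3.0],
              [2.0, 2.0, 2.0]])
obs = np.array([1.0, 2.0, 4.0])
mse_per_method = np.mean((P - obs) ** 2, axis=1)  # obs is broadcast across rows
max_mse = np.max(mse_per_method)
worst_method = np.argmax(mse_per_method)          # index of the method with largest MSE

# Indirect sorting: argsort returns the indices that would sort the array.
x = np.array([12, 11, 10, 9])
order = np.argsort(x)
sorted_via_order = x[order]

# Minimisation on a grid (coarser step than on the slide, by assumption).
grid = np.arange(0, 6, 0.001)
x_min = grid[np.argmin(np.cos(grid))]

print(col_min, row_min)
print(cols_all_negative, rows_any_negative)
print(max_mse, worst_method)
print(order, sorted_via_order)
print(x_min)
```

Using `np.argmax` next to `np.max` follows the same idea as the minimisation slide: the aggregate gives the value, while the `arg` variant tells you where it occurs, which is usually what is needed to look up the corresponding method or x value.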

Recap
Aggregations (reductions): sum, min, max, mean, all, any,...
Along axis=0 or axis=1
np.sort()
np.argsort()   indirect sorting
np.argmin()   np.argmax()
SciPy: lots of useful mathematical and statistical functions using NumPy arrays

Going further
NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html
Advanced numerical tutorial: http://www.scipy-lectures.org

Next week
Python library for data manipulation and analysis
Advanced customisation requires using Matplotlib functions
Complex plots require Matplotlib concepts