LBOBGDT Big Data Techniques and Technologies - Introduction to Programming with Python PDF

9/18/2024
LBOBGDT Big Data Techniques and Technologies
Bobby Reyes

Introduction to Programming with Python
◼ Scalar Objects (int, float, complex, bool, string, NoneType)
◼ Operators and Precedence of Operators
◼ Variables, Assignment statement
◼ print() statement
◼ input() statement

What is Python?
◼ Python…
    ◼ …is a general-purpose interpreted programming language.
    ◼ …is a language that supports multiple approaches to software design, principally structured and object-oriented programming.
    ◼ …provides automatic memory management and garbage collection.
    ◼ …is extensible.
    ◼ …is dynamically typed.

Some History
◼ “Over six years ago, in December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas… I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).” – Python creator Guido van Rossum, from the foreword to Programming Python (1st ed.)
◼ Goals:
    ◼ An easy and intuitive language just as powerful as major competitors
    ◼ Open source, so anyone can contribute to its development
    ◼ Code that is as understandable as plain English
    ◼ Suitability for everyday tasks, allowing for short development times

Programming Basics
◼ code or source code: The sequence of instructions in a program.
◼ syntax: The set of legal structures and commands that can be used in a particular programming language.
◼ output: The messages printed to the user by a program.
◼ console: The text box onto which output is printed.
    ◼ Some source code editors pop up the console as an external window, and others contain their own console window.
Python Programs
▪ a program is a sequence of definitions and commands
    ◦ definitions evaluated
    ◦ commands executed by Python interpreter in a shell
▪ commands (statements) instruct interpreter to do something
▪ can be typed directly in a shell or stored in a file that is read into the shell and evaluated

Compiling and Interpreting
◼ Two basic approaches to executing a program in a programming language:
◼ Compiler: Many languages require you to compile (translate) your program into a form that the machine understands.
    source code (Hello.java) → compile → byte code (Hello.class) → execute → output
◼ Interpreter: Python is instead directly interpreted into machine instructions.
    source code (Hello.py) → interpret → output

Compiling vs. Interpreting
Compiled
▪ Faster programs
▪ Compiled code runs directly on CPU
▪ Can communicate directly with hardware
▪ Longer development
▪ Edit / compile / test cycle is longer!
▪ Harder to debug
▪ Usually requires a special compilation
▪ (almost always) takes more code to get things done
▪ More control over program behavior
Interpreted
▪ Slower programs
▪ Sometimes as fast as compiled, rarely faster
▪ Faster development
▪ Easier debugging
▪ Debugging can stop anywhere, swap in new code, more control over state of program
▪ (almost always) takes less code to get things done
▪ Less control over program behavior

The Python Interpreter
Install Python, open a terminal (cmd window) and run the python interpreter:
◼ Python is an interpreted language
◼ The interpreter provides an interactive environment to play with the language
◼ Outputs and results of expressions are printed on the screen
    C:\> python
    Python 3.11.4 (default: …
    Type “help”, “copyright”, …
    >>>                      ← python interpreter prompt
    >>> 3 + 7
    10
    >>> 3 < 15
    True
    >>> 'print me'
    'print me'
    >>> print('print me')
    print me
    >>>

Hello, World!
◼ Install Anaconda3 and launch JupyterLab.
In JupyterLab, open a Console and enter the following code:
    print("Hello World!")
▪ Then press Shift+Enter to execute. Output:
    Hello World!
◼ Open a Text File (editor) and enter the following code:
    print("Hello World!")
  Save the file as hello.py.
◼ To execute, type %run hello.py in a console.
◼ Congratulations! You have written your first Python program!

Hello World!
◼ In a Jupyter Notebook:
    print("Hello World!")     ← in Code cell
    Hello World!              ← Output
◼ To execute the Hello, World! (hello.py) Python script:
    %run hello.py
    Hello World!

Google Colab
◼ Can also work with a hosted Jupyter Notebook service, e.g. colab.google.com

Objects
▪ programs manipulate data objects
▪ objects have a type that defines the kinds of things programs can do to them
    ◦ Ana is a human so she can walk, speak English, etc.
    ◦ Chewbacca is a wookie so he can walk, “mwaaarhrhh”, etc.
▪ objects are
    ◦ scalar (cannot be subdivided)
    ◦ non-scalar (have internal structure that can be accessed)

Scalar Objects
▪ int – represents integers, ex. 5
▪ float – represents real numbers, ex. 3.27
▪ complex – represents complex numbers, ex. 3.27 + 2j
▪ bool – represents the Boolean values True and False
▪ str – represents a string, a sequence of characters enclosed in “” or ‘’, ex. “This is it!”
▪ NoneType – special and has one value, None
▪ can use the type() function to see the type of an object
    >>> type(5)
    int
    >>> type(3.0)
    float

Numbers: Integers
◼ Integer – represents whole numbers, positive or negative, without decimals, of unlimited length
    >>> 132224
    132224
    >>> 132323 ** 4
    306578259430545516241
    >>>

Numbers: Floating Points
◼ Floating point numbers represent real numbers with decimal points
◼ int(x) converts x to an integer
◼ float(x) converts x to a floating point
◼ The interpreter shows a lot of digits
    >>> 1.23232
    1.23232
    >>> print(1.23232)
    1.23232
    >>> 1.3E7
    13000000.0
    >>> int(2.0)
    2
    >>> float(2)
    2.0

Numbers: Complex
◼ Built into Python
◼ A complex number comprises a real part and an imaginary part
◼ Denoted as a + bj, where a is the real part, b is the imaginary part, and j is the imaginary unit
◼ Same operations are supported as for integer and float
◼ Use the complex() function to create a complex data type or convert / cast into complex data type
    >>> x = 3 + 2j
    >>> y = -1j
    >>> x + y
    (3+1j)
    >>> x * y
    (2-3j)
    >>> z = complex(2,5)
    >>> z
    (2+5j)

String Literals
◼ think of as a sequence of case sensitive characters
◼ Can use single or double quotes, and three quotes for a multi-line string
◼ can compare strings with ==, >, < etc.
    >>> 'I am a string'
    'I am a string'
    >>> "So am I!"
    'So am I!'
    >>> '''This is the first line.
    Here is the second line.'''
    'This is the first line.\nHere is the second line.'
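As a quick check of the scalar types and conversions described above, the following minimal sketch collects one sample value per type and inspects it with type(). The variable names (samples, type_names, multi) are illustrative and not from the slides.

```python
# One sample value for each scalar type listed on the slide.
samples = [5, 3.27, 3.27 + 2j, True, "This is it!", None]

# type(obj).__name__ gives the bare type name as a string.
type_names = [type(obj).__name__ for obj in samples]
print(type_names)

# Conversions between numeric types.
as_int = int(2.0)        # float -> int
as_float = float(2)      # int -> float
z = complex(2, 5)        # build a complex number from real and imaginary parts
print(as_int, as_float, z)

# A triple-quoted string keeps its line break as a newline character.
multi = '''This is the first line.
Here is the second line.'''
print(repr(multi))
```

Note that a plain Python prompt prints type(5) as <class 'int'>, while IPython and Jupyter consoles show the shorter int form used on the slide.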
Type Conversions (Cast)
▪ can convert object of one type to another
▪ float(3) converts integer 3 to float 3.0
▪ int(3.9) truncates float 3.9 to integer 3

Printing to Console
▪ to show output from code to a user, use the print function
    In : 3+2
    Out: 5
    In : print(3+2)
    5

Expressions
▪ combine objects and operators to form expressions
▪ an expression has a value, which has a type
▪ syntax for a simple expression: <object> <operator> <object>

Operators on ints and floats
◼ Arithmetic operators we will use:
    + - * /     addition, subtraction/negation, multiplication, division
    // %        integer division (quotient), modulus (a.k.a. remainder)
    **          exponentiation
▪ i+j → the sum
▪ i-j → the difference
▪ i*j → the product
    (for these three: if both are ints, result is int; if either or both are floats, result is float)
▪ i/j → division; result is float
▪ i//j → integer division; the quotient when i is divided by j
▪ i%j → modulus; the remainder when i is divided by j
▪ i**j → exponentiation; i to the power of j

A quick note on the increment operator shorthand
▪ Python has a common idiom that is not necessary, but which is used frequently and is therefore worth noting:
    x += 1
  is the same as:
    x = x + 1
▪ This also works for other operators:
    x += y    # adds y to the value of x
    x *= y    # multiplies x by the value y
    x -= y    # subtracts y from x
    x /= y    # divides x by y

Precedence of Operators
▪ precedence: Order in which operations are evaluated
▪ PEMDAS rule:
▪ Parentheses used to tell Python to do these operations first
    **          Exponentiation (higher precedence)
    * / // %    Multiplication / Division
    + -         Addition / Subtraction
    ◦ If at same level, operations evaluated left to right
    ◦ Thus 1 + 3 * 4 is 13
    ◦ Exercise: What is the value of the following expressions?
    1 + 2 * 3 ** 4
    10 / 4 // 3
    10 // 4 / 3

String Concatenation
◼ + is overloaded to do concatenation
    >>> x = 'hello'
    >>> x = x + ' there'
    >>> x
    'hello there'

Substrings and Methods
▪ square brackets are used to perform indexing into a string to get the value at a certain index/position
    s = "abc"
    index:   0  1  2     ← indexing always starts at 0
    index:  -3 -2 -1     ← last element always at index -1
▪ len(String) – returns the length (number of characters) in the String
▪ str(Object) – returns a String representation of the Object
    >>> s = '012345'
    >>> s[3]
    '3'
    >>> s[1:4]
    '123'
    >>> s[2:]
    '2345'
    >>> s[:4]
    '0123'
    >>> s[-2]
    '4'
    >>> x='string'; len(x)
    6
    >>> str(10.3)
    '10.3'

String Manipulation
▪ can slice strings using [start:stop:step]
▪ if you give two numbers, [start:stop], step=1 by default
▪ you can also omit numbers and leave just colons
    s = "abcdefgh"
    s[3:6]    → evaluates to "def", same as s[3:6:1]
    s[3:6:2]  → evaluates to "df"
    s[::]     → evaluates to "abcdefgh", same as s[0:len(s):1]
    s[::-1]   → evaluates to "hgfedcba", same as s[-1:-(len(s)+1):-1]
    s[4:1:-2] → evaluates to "ec"

Math Commands
◼ Python has useful commands (also called functions) for performing calculations.
    Command name            Description
    abs(value)              absolute value
    ceil(value)             rounds up
    cos(value)              cosine, in radians
    floor(value)            rounds down
    log(value)              logarithm, base e
    log10(value)            logarithm, base 10
    max(value1, value2)     larger of two values
    min(value1, value2)     smaller of two values
    round(value)            nearest whole number
    sin(value)              sine, in radians
    sqrt(value)             square root

    Constant                Description
    e                       2.7182818...
    pi                      3.1415926...
◼ Note: Python is case-sensitive; abs is different from Abs, ABS
◼ To use many of these commands, you must write the following at the top of your Python program:
    from math import *

Variables
◼ variable: A named piece of memory that can store a value.
◼ Usage:
    ◼ Compute an expression's result,
    ◼ store that result into a variable,
    ◼ and use that variable later in the program.
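The compute, store, and reuse pattern just described can be sketched with a few of the math functions listed above. This is only an illustration: the value of radius is made up, and the sketch uses an explicit import math rather than the slide's from math import * so that each name shows where it comes from.

```python
import math  # explicit module import; the slide uses `from math import *`

radius = 2.5                        # hypothetical input value
area = math.pi * radius ** 2        # compute an expression and store the result

# Reuse the stored value later in the program.
rounded_up = math.ceil(area)        # rounds up
rounded_down = math.floor(area)     # rounds down
root = math.sqrt(16)                # square root, always returns a float

print(area, rounded_up, rounded_down, root)
```

abs(), max(), min() and round() are built-in functions and need no import; ceil, floor, sqrt, log, sin, cos and the constants e and pi live in the math module.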
◼ Variable names:
    ◼ Must begin with a letter (a - z, A - Z) or underscore _
    ◼ Other characters can be letters, numbers or _
    ◼ Are case sensitive: capitalization counts!
    ◼ Can be any reasonable length
◼ By convention, it is common to have:
    ◼ Variable names that start with lower case letters, and
    ◼ Class names beginning with a capital letter
  but you can do whatever you want.
◼ Some keywords are reserved and cannot be used as variable names because they serve a built-in Python purpose
    ◼ e.g. and, while, continue, break

Abstracting Expressions
▪ why give names to values of expressions?
▪ to reuse names instead of values
▪ easier to change code later
    pi = 3.14159
    radius = 2.2
    area = pi*(radius**2)
◼ Python is dynamically typed: the type of the variable is derived from the value it is assigned.

Variables
◼ assignment statement: Stores a value into a variable (binds name to value).
◼ Syntax:
    name = value
◼ Examples:
    x = 5          In memory: x → 5
    gpa = 3.14     In memory: gpa → 3.14
◼ A variable that has been given a value can be used in expressions.
    x + 4 is 9
▪ Assignment can be done en masse:
    x = y = z = 1
▪ Multiple assignments can be done on one line:
    x, y, z = 1, 2.39, 'cat'
◼ Exercise: Evaluate the quadratic equation for a given a, b, and c.
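One possible reading of the exercise above is to compute the two roots of a·x² + b·x + c = 0 with the quadratic formula. The sketch below takes that reading as an assumption; the coefficient values are made up, and cmath.sqrt is used (instead of math.sqrt) so that a negative discriminant yields complex roots rather than an error.

```python
import cmath  # complex-aware square root

# Hypothetical coefficients, bound with a single multiple assignment.
a, b, c = 1, -3, 2

# Store the intermediate result once and reuse it for both roots.
discriminant = b ** 2 - 4 * a * c
sqrt_disc = cmath.sqrt(discriminant)

root1 = (-b + sqrt_disc) / (2 * a)
root2 = (-b - sqrt_disc) / (2 * a)

print(root1, root2)
```

Because cmath always returns complex values, real roots print with a zero imaginary part; taking root1.real recovers the plain float when the discriminant is non-negative.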
Programming vs Math
▪ in programming, you do not “solve for x”
    pi = 3.14159
    radius = 2.2
    area = pi*(radius**2)
    radius = radius+1

Changing Bindings
▪ can re-bind variable names using new assignment statements
▪ previous value may still be stored in memory but lost the handle for it
▪ value for area does not change until you tell the computer to do the calculation again
    pi = 3.14159
    radius = 2.2
    area = pi*(radius**2)
    radius = radius+1
  In memory (as drawn on the slide): pi → 3.14; radius → 2.2, re-bound to 3.2; area → 15.1976

Swapping Values of Variables
▪ To swap the values of two variables (say x and y), you normally need a temporary variable
    >>> print(x,y)
    10 20
    >>> temp = x
    >>> x = y
    >>> y = temp
    >>> print(x,y)
    20 10

Strings
▪ strings are “immutable” – cannot be modified
    s = "hello"
    s[0] = 'y'              → gives an error
    s = 'y'+s[1:len(s)]     → is allowed, s bound to new object
  In memory: "hello" is unchanged; s now refers to the new object "yello"

print
◼ print : Produces text output on the console.
◼ Syntax:
    print("Message")
    print(Expression)
  Prints the given text message or expression value on the console and moves the cursor down to the next line.
    print(Item1, Item2,..., ItemN)
  Prints several messages and/or expressions on the same line.
◼ Examples:
    print("Hello, world!")
    age = 45
    print("You have", 65 - age, "years until retirement")
  Output:
    Hello, world!
    You have 20 years until retirement

Example: print Statement
◼ Elements separated by commas print with a space between them
    >>> print('hello', 'there')
    hello there
◼ The end parameter can be used to append any string at the end of the output of the print statement in Python.
    >>> print('hello', end=''); print('there')
    hellothere

String Formatting
◼ Similar to C’s printf; uses the % operator
◼ Can usually just use %s for everything; it will convert the object to its String representation.
    >>> "One, %d, three" % 2
    'One, 2, three'
    >>> "%d, two, %s" % (1,3)
    '1, two, 3'
    >>> "%s two %s" % (1, 'three')
    '1 two three'
    >>>

input
◼ input : Takes input from the user.
◼ You can assign (store) the result of input into a variable.
◼ Input read is returned as a string; typecast / convert the input string to the correct type before using it in arithmetic operations
◼ Example:
    age = int(input("How old are you? "))
    print("Your age is", age)
    print("You have", 65 - age, "years until retirement")
  Output:
    How old are you? 53
    Your age is 53
    You have 12 years until retirement
◼ Exercise: Write a Python program that prompts the user for his/her amount of money, then reports how many Nintendo Wiis the person can afford, and how much more money he/she will need to afford an additional Wii.

input: Example
input.py:
    print("What's your name?")
    name = input("> ")
    print("What year were you born?")
    birthyear = int(input("> "))
    print("Hi " + name + "!", "You are", 2024 - birthyear, "y.o.")
In console:
    % python input.py
    What's your name?
    > Michael
    What year were you born?
    > 1980
    Hi Michael! You are 44 y.o.

Coding Best Practices & Guidelines
1. Focus on code readability
2. Choose meaningful variable and function names. Acceptable naming conventions: Camel Case (e.g. userName) vs Snake Case (e.g. user_name)
3. Avoid using a single identifier for multiple purposes
4. Use comments and whitespace effectively
5. Use indentation and consistent formatting
6. Prioritize documentation (include README.txt file, docstrings)
7. Efficient data processing; avoid unnecessary loops and iterations
8. Effective version control and collaboration
9. Try to formalize exception handling (try-except block)
10. Standardize headers for different modules

9/22/2024
Program Flow Control: Branching and Iteration
◼ Conditional Execution – if, else, elif
◼ Repeated Execution (loops) – for, while

Python Programs
▪ Recall:
▪ a program is a sequence of definitions and commands
    ◦ definitions evaluated
    ◦ commands executed by Python interpreter in a shell
▪ interpreter executes each instruction in order
    ◦ use tests to change flow of control through sequence
    ◦ stop when done
▪ control the flow of programs using:
    ◦ Conditional statements: if, else, elif
    ◦ Loops: for, while

Comparison Operators
◼ Many logical expressions use relational operators
◼ comparisons evaluate to a Boolean
    Operator   Meaning                     Example        Result
    ==         equals                      1 + 1 == 2     True
    !=         does not equal              3.2 != 2.5     True
    <          less than                   10 < 5         False
    >          greater than                10 > 5         True
    <=         less than or equal to       126 <= 100     False
    >=         greater than or equal to    5.0 >= 5.0     True

Logical Operators
◼ Logical expressions can be combined with logical operators:
    not a   → True if a is False; False if a is True
    a and b → True if both are True
    a or b  → True if either or both are True

    A       B       A and B   A or B
    True    True    True      True
    True    False   False     True
    False   True    False     True
    False   False   False     False

    Operator   Example             Result
    not        not 7 > 0           False
    and        9 != 6 and 2 < 3    True
    or         2 == 3 or -1 < 5    True

Precedence of Operators
▪ precedence: Order in which operations are evaluated
▪ Parentheses used to tell Python to do these operations first
    **                  Exponentiation (highest; binds tighter than a unary sign on its left, so -2 ** 2 is -4)
    + - (unary)         Unary operators
    * / // %            Multiplication / Division / Modulo
    + -                 Addition / Subtraction
    < > <= >= == !=     Relational operators
    not                 Unary logical operator
    and, then or        Binary logical operators
▪ If at same level, operations are evaluated from left to right

Conditional Execution: if Statement
◼ if statement: Executes a group of statements only if a certain condition is True. Otherwise, the statements are skipped.
◼ Syntax:
    if condition :
        statements        ← a code block
◼ Example:
    gpa = 3.4
    if gpa > 2.0 :
        print("Your application is accepted.")

If-else Statement
◼ if-else statement: Executes one block of statements if condition is True, and a second block of statements if condition is False.
◼ Syntax:
    if condition :
        statements1
    else:
        statements2
◼ Example:
    gpa = 1.4
    if gpa > 2.0 :
        print("Welcome to Mars University!")
    else:
        print("Your application is denied.")

Nested if Statements
◼ Possible to nest if-else statements; i.e. combine multiple if-else statements within each other.
◼ Example:
    if temperature > 21:
        if wind > 3.2:
            if pressure > 66:
                rain = True
            else:
                rain = False
        else:
            rain = False
    else:
        if wind > 7.2:
            rain = True
        else:
            if pressure > 79:
                rain = True
            else:
                rain = False
◼ Multiple conditions can be chained with elif ("else if"):
  Syntax:
    if condition1 :
        statements1
    elif condition2 :
        statements2
    else :
        statements3

Indentation
▪ matters in Python
▪ how you denote blocks of code
▪ Code blocks are logical groupings of commands.
They are always preceded by a colon :
    x = float(input("Enter a number for x: "))
    y = float(input("Enter a number for y: "))
    if x == y:
        print("x and y are equal")
        if y != 0:
            print("therefore, x / y is", x/y)
    elif x < y:
        print("x is smaller")
    else:
        print("y is smaller")
    print("thanks!")

Example of if Statements
    import math
    x = 30
    if x …

    >>> [x**3 for x in range(8) if x%2 == 1]
    [1, 27, 125, 343]
    >>> {x**3 for x in range(8) if x%2 == 1}
    {1, 27, 125, 343}
    >>> print( [[(i,j) for i in range(3)] for j in range(4)] )
    [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)], [(0, 3), (1, 3), (2, 3)]]

Iterating Over a List
▪ Example: compute the sum of elements of a list
▪ common pattern: iterate over list elements
  By index:
    L = [1,2,3,4,5]
    total = 0
    for i in range(len(L)):
        total += L[i]
    print(total)
  Directly over the elements:
    L = [1,2,3,4,5]
    total = 0
    for i in L:
        total += i
    print(total)
▪ Note: list elements are indexed 0 to len(L)-1; range(n) goes from 0 to n-1

List Functions
◼ L.insert(i,x)
    ◼ Insert item x at a given position i.
    ◼ Similar to a[i:i]=[x]
◼ L.append(x)
    ◼ Add item x at the end of the list.
◼ L.remove(x)
    ◼ Removes the first item from the list with value x
◼ L.pop(i)
    ◼ Remove item at position i and return it. If no index i is given, then remove the element at the end of the list.
◼ L.index(x)
    ◼ Return the index in the list of the first item with value x.
◼ L.count(x)
    ◼ Return the number of times x appears in the list
◼ L.sort()
    ◼ Sorts items in the list in ascending order
◼ L.reverse()
    ◼ Reverses items in the list

Lists: Modifying Contents
◼ L[i] = a reassigns the ith element to the value a
◼ In the example below, since x and y point to the same list object, both are changed
◼ The method append also modifies the list
    >>> x = [1,2,3]
    >>> y=x
    >>> x[1] = 15
    >>> x
    [1, 15, 3]
    >>> y
    [1, 15, 3]
    >>> x.append(12)
    >>> y
    [1, 15, 3, 12]

Lists: Modifying Contents
◼ The method append modifies the list and returns None
◼ Use + to concatenate lists; returns a new list
    >>> x = [1,2,3]
    >>> y=x
    >>> z = x.append(12)
    >>> z == None
    True
    >>> y
    [1, 2, 3, 12]
    >>> x = x + [9,10]
    >>> x
    [1, 2, 3, 12, 9, 10]
    >>> y
    [1, 2, 3, 12]
    >>>

Using Lists as Stacks
◼ You can use a list as a stack
    >>> a = ["a", "b", "c", "d"]
    >>> a
    ['a', 'b', 'c', 'd']
    >>> a.append("e")
    >>> a
    ['a', 'b', 'c', 'd', 'e']
    >>> a.pop()
    'e'
    >>> a.pop()
    'd'
    >>> a
    ['a', 'b', 'c']
    >>>

Lists of Lists of Lists of …
▪ can have nested lists
▪ side effects still possible after mutation
    warm = ['yellow', 'orange']
    hot = ['red']
    brightcolors = [warm]
    brightcolors.append(hot)
    print(brightcolors)
    hot.append('pink')
    print(hot)
    print(brightcolors)
  Output:
    [['yellow', 'orange'], ['red']]
    ['red', 'pink']
    [['yellow', 'orange'], ['red', 'pink']]
▪ Accessing an element requires two levels of indexing.
    >>> brightcolors[1][1]
    'pink'

Tuples
◼ Tuples are immutable versions of lists
◼ Denoted by parentheses
◼ One strange point is the format to make a tuple with one element: ',' is needed to differentiate from the mathematical expression (2)
    >>> x = (1,2,3)
    >>> x[1:]
    (2, 3)
    >>> y = (2,)
    >>> y
    (2,)
    >>> te = ()

Sets
◼ A set is another Python data structure that is an unordered collection with no duplicates.
◼ Initialize a set by placing elements inside curly braces { }, or use the set() function
◼ Set operations include union, intersection, membership testing and eliminating duplicate entries.
    >>> setA = {"a", "b", "c", "d"}
    >>> setA
    {'a', 'b', 'c', 'd'}
    >>> setB = set(["c", "d", "e", "f"])
    >>> "a" in setA
    True
    >>> "a" in setB
    False

Sets
    >>> setA - setB          → set Difference; in A but not in B
    {'a', 'b'}
    >>> setA | setB          → set Union
    {'a', 'c', 'b', 'e', 'd', 'f'}
    >>> setA & setB          → set Intersection
    {'c', 'd'}
    >>> setA ^ setB          → Symmetric Difference
    {'a', 'b', 'e', 'f'}
    >>>

Dictionaries
◼ A set of key-value pairs
◼ Dictionaries are mutable
    >>> d = {'one' : 1, 'two' : 2, 'three' : 3}
    >>> d['three']
    3

Dictionaries: Add/Modify
◼ Entries can be changed by assigning to that entry
    >>> d
    {1: 'hello', 'two': 42, 'blah': [1, 2, 3]}
    >>> d['two'] = 99
    >>> d
    {1: 'hello', 'two': 99, 'blah': [1, 2, 3]}
◼ Assigning to a key that does not exist adds an entry
    >>> d[7] = 'new entry'
    >>> d
    {1: 'hello', 7: 'new entry', 'two': 99, 'blah': [1, 2, 3]}

Dictionaries: Deleting Elements
◼ The del statement deletes an element from a dictionary
    >>> d
    {1: 'hello', 2: 'there', 10: 'world'}
    >>> del(d[2])
    >>> d
    {1: 'hello', 10: 'world'}

Iterating Over a Dictionary
    >>> address={'Wayne': 'Young 678', 'John': 'Oakwood 345', 'Mary': 'Kingston 564'}
    >>> for k in address.keys():
            print(k,":", address[k])
    Wayne : Young 678
    John : Oakwood 345
    Mary : Kingston 564
    >>>
    >>> for k in sorted(address.keys()):
            print(k,":", address[k])
    John : Oakwood 345
    Mary : Kingston 564
    Wayne : Young 678
    >>>

Copying Dictionaries and Lists
◼ The built-in list function will copy a list
    >>> l1 = [1]
    >>> l2 = list(l1)
    >>> l1[0] = 22
    >>> l1
    [22]
    >>> l2
    [1]
◼ The dictionary has a method called copy
    >>> d = {1 : 10}
    >>> d2 = d.copy()
    >>> d[1] = 22
    >>> d
    {1: 22}
    >>> d2
    {1: 10}

Data Type Summary
    Integers: 2323, 32343565
    Floating Point: 32.3, 3.1E2
    Complex: 3 + 2j, 1j
    Boolean: True, False, 3 < 5
    String: 'Hello World!'
    Lists: l = [ 1, 2, 3]
    Tuples: t = (1, 2, 3)
    Sets: s = {1, 2, 3}
    Dictionaries: d = {'hello' : 'there', 2 : 15}
◼ Lists, Tuples, and Dictionaries can store any type (including other lists, tuples, and dictionaries!)
◼ Of these three, only lists and dictionaries are mutable (sets are mutable as well)
◼ All variables are references

9/22/2024
Functions

Good Programming
▪ more code is not necessarily a good thing
▪ measure good programmers by the amount of functionality
▪ introduce functions
▪ mechanism to achieve decomposition and abstraction

Abstraction and Decomposition: Video Wall Example
▪ An LED display is a black box
▪ ABSTRACTION IDEA: do not need to know how an LED display works to use it
▪ A video wall projects a large image decomposed into segments displayed using separate LED displays
▪ All LED displays work together to produce a larger image
▪ DECOMPOSITION IDEA: different devices work together to achieve an end goal

APPLY THESE CONCEPTS TO PROGRAMMING!

Create Structure with Decomposition
▪ in the video wall example, use separate devices
▪ in programming, divide code into modules, which
    ◦ are self-contained
    ◦ are used to break up code
    ◦ are intended to be reusable
    ◦ keep code organized
    ◦ keep code coherent
▪ this lecture, achieve decomposition with functions
▪ in another module, achieve decomposition with classes

Suppress Details with Abstraction
▪ in the video wall example, instructions for how to use it are sufficient; no need to know how to build one
▪ in programming, think of a piece of code as a black box
    ◦ cannot see details
    ◦ do not need to see details
    ◦ do not want to see details
    ◦ hide tedious coding details
▪ achieve abstraction with function specifications or docstrings

Functions
▪ write reusable pieces/chunks of code, called functions
▪ function characteristics:
    ◦ has a name
    ◦ has parameters (0 or more)
    ◦ has a docstring (optional but recommended)
    ◦ has a body
    ◦ returns something as a result
▪ functions are not run in a program until they are “called” or “invoked” in a program

How to Write and Call/Invoke a Function
To create a function, you need to define it using the def keyword.
    def is_even( i ):
        """
        Input: i, a positive int
        Returns True if i is even, otherwise False
        """
        print("inside is_even")
        return i%2 == 0

    is_even(3)

Function Basics
    def Max(x,y) :
        if x > y :
            return x
        else :
            return y

    print(Max(3,5))
    5
    print(Max('hello', 'there'))
    there

Multiple Return Values
▪ can return multiple values by simply separating them with commas in the return statement; comma-separated values are treated as tuples
    def test():
        return 'abc', 100, [0, 1, 2]

    # function call in Main program
    a, b, c = test()
    print(a)
    print(b)
    print(c)
  Output:
    abc
    100
    [0, 1, 2]
▪ Can also return multiple values as a list using []

Variable Scope
▪ formal parameter gets bound to the value of actual parameter when function is called
▪ new scope/frame/environment created when entering a function
▪ scope is a mapping of names to objects
    def f( x ):
        x = x + 1
        print('in f(x): x =', x)
        return x

    # main
    x = 3
    z = f( x )
▪ Step by step (from the slide diagrams):
    ◦ before the call: global scope has f → some code, x → 3, z not yet bound
    ◦ on the call f(x): a new f scope is created with x → 3
    ◦ after x = x + 1: f scope has x → 4; global x is still 3
    ◦ f returns 4
    ◦ after the call: the f scope is gone; global scope has x → 3 and z → 4

Python Scope: Resolving Names
◼ The scope of a name (variables, functions, objects, etc.) defines the area of a program in which you can unambiguously access that name.
◼ Two general scopes:
    ◼ Global scope: names defined in this scope are available to the whole code
    ◼ Local scope: names defined in this scope are only visible to the code within the scope.
◼ Python uses the LEGB rule in deciding the order in resolving names, i.e. mapping names to objects:
    ◼ Local (or function) scope: Defined inside function/class
    ◼ Enclosing (or nonlocal) scope: Defined inside enclosing functions (nested function concept)
    ◼ Global (or module) scope: Defined at the uppermost level (program, script, or module)
    ◼ Built-in scope: Reserved names in Python built-in modules, available everywhere.

Exer: Understanding Variable Scope
    def f( x ):
        print('in f(x), upon entry: x =', x)
        w = 3
        x = x + 4
        # y = y + 5
        print("in f(x), before exit: w, x, y = ", w, x, y)
        # print("in f(x), before exit: z = ", z)
        return x

    x=1
    y=2
    z=f(x)
    print("In main, before exit: x, y, z = ", x, y, z)
    #print("In main, before exit: w = ", w)
◼ Exercise:
    1. Run the above code. What is the output?
    2. Restart the kernel, uncomment 1 line, and execute the code. What is the output? Why?
    3. Restore the comment (#) and repeat #2 for another commented line.

Functions are Objects
◼ Can be assigned to a variable
◼ Can be passed as a parameter
◼ Can be returned from a function
◼ Functions are treated like any other variable in Python
◼ The def statement simply assigns a function to a variable

Function names are like any variable
◼ Functions are objects
◼ The same reference rules hold for them as for other objects
    >>> x = 10
    >>> x
    10
    >>> def x () :
    ...     print('hello')
    ...
    >>> x
    <function x at 0x…>
    >>> x()
    hello
    >>> x = 'blah'
    >>> x
    'blah'

Functions as Parameters
    def foo(f, a) :
        return f(a)

    def bar(x) :
        return x * x

    foo(bar, 3)
    9
◼ Note that the function foo takes two parameters and applies the first as a function with the second as its parameter

High-Order Functions
◼ map(func,seq) – executes a specified function for each item in an iterable seq. The item is sent to the function as a parameter.
    - for all i, applies func(seq[i]) and returns the corresponding sequence of the calculated results.
    >>> def double(x):
            return 2*x
    >>> lst = range(10)
    >>> result = map(double, lst)
    >>> print(list(result))
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

High-Order Functions
◼ filter(boolfunc,seq) – returns a sequence containing all those items in seq for which boolfunc is True.
    >>> def even(x):
            return x%2 == 0
    >>> lst = range(10)
    >>> result = filter(even,lst)
    >>> print(list(result))
    [0, 2, 4, 6, 8]

High-Order Functions
◼ reduce(func,seq) – applies func to the items of seq, from left to right, two at a time, to reduce the seq to a single value. The reduce function is defined in the “functools” module.
    >>> from functools import *
    >>> def plus(x,y):
            return x + y
    >>> lst = ['h', 'e', 'l', 'l', 'o']
    >>> result = reduce(plus,lst)
    >>> print(result)
    hello

Functions Inside Functions
◼ Since they are like any other object, you can have functions inside functions
    >>> def foo (x,y) :
            def bar (z) :
                return z * 2
            return bar(x) + y
    >>> foo(2,3)
    7

Functions Returning Functions
    >>> def foo (x) :
            def bar(y) :
                return x + y
            return bar
    >>> f = foo(3)
    >>> print(f)
    <function foo.<locals>.bar at 0x…>
    >>> print(f(2))
    5

Parameters: Defaults
◼ Parameters can be assigned default values
◼ They are overridden if a parameter is given for them
◼ The type of the default doesn’t limit the type of a parameter
    >>> def foo(x = 3) :
    ...     print(x)
    ...
    >>> foo()
    3
    >>> foo(10)
    10
    >>> foo('hello')
    hello

Parameters: Named
◼ Call by name
◼ Any positional arguments must come before named ones in a call
    >>> def foo (a,b,c) :
            print(a, b, c)
    >>> foo(c = 10, a = 2, b = 14)
    2 14 10
    >>> foo(3, c = 2, b = 19)
    3 19 2

Recursion
◼ It is possible for a function to call itself! The function is said to be recursive.
◼ Example: Factorial
◼ The factorial of a positive integer n, denoted as n!, is defined as:
    n! = n × (n − 1) × … × 2 × 1
◼ Expressing the definition of n! recursively:
    n! = n × (n − 1)! for n > 1, with 1! = 1
◼ Define the Factorial Function
    def factorial(n):
        return 1 if n …

◼ A lambda expression returns a function object
◼ The body can only be a simple expression, not complex statements
    >>> f = lambda x,y : x + y
    >>> f(2,3)
    5
    >>> lst = ['one', lambda x : x * x, 3]
    >>> lst[0]
    'one'
    >>> lst[1](4)
    16
    >>> lst[2]
    3

Where to use functions and lambda?
▪ In data science, processing and cleaning data takes up a significant amount of time in the pipeline.
▪ Usually functions and lambda expressions are helpful in modifying and applying a specific processing step to multiple files, rows or columns in the data.

Modules
◼ The highest level structure of Python
◼ Each file with the py suffix is a module
◼ Each module has its own namespace

Modules: Imports
    import mymodule             Brings all elements of mymodule in, but must refer to them as mymodule.<elem>
    from mymodule import x      Imports x from mymodule right into this namespace
    from mymodule import *      Imports all elements of mymodule into this namespace

Module: Function Basics
calculation.py:
    def Max(x,y) :
        if x > y :
            return x
        else :
            return y
In console:
    >>> import calculation
    >>> calculation.Max(3,5)
    5
    >>> calculation.Max('hello', 'there')
    'there'

Module: Functions as Parameters
funcasparam.py:
    def foo(f, a) :
        return f(a)

    def bar(x) :
        return x * x
In console:
    >>> from funcasparam import *
    >>> foo(bar, 3)
    9
◼ Note that the function foo takes two parameters and applies the first as a function with the second as its parameter

9/16/2024
Text and File Processing

Strings
◼ string: A sequence of text characters in a program.
◼ Strings start and end with quotation mark " or apostrophe ' characters.
◼ Examples:
    "hello"
    "This is a string"
    "This, too, is a string. It can be very long!"
◼ A string may not span across multiple lines or contain a " character.
    "This is not
    a legal String."
    "This is not a "legal" String either."
◼ A string can represent special characters by preceding them with a backslash.
    ◼ \t tab character
    ◼ \n new line character
    ◼ \" quotation mark character
    ◼ \\ backslash character
◼ Example: "Hello\tthere\nHow are you?"

Indexes
◼ Characters in a string are numbered with indexes starting at 0:
◼ Example:
    name = "P. Diddy"
    index      0  1  2  3  4  5  6  7
    character  P  .     D  i  d  d  y
◼ Accessing an individual character of a string:
    variableName [ index ]
◼ Example:
    print(name, "starts with", name[0])
  Output:
    P. Diddy starts with P

String Properties
◼ len(string) - number of characters in a string (including spaces)
◼ str.lower(string) - lowercase version of a string
◼ str.upper(string) - uppercase version of a string
◼ Example:
    name = "Martin Douglas Stepp"
    length = len(name)
    big_name = str.upper(name)
    print(big_name, "has", length, "characters")
  Output:
    MARTIN DOUGLAS STEPP has 20 characters

input
◼ input : Reads a string of text from user input.
◼ Example:
    name = input("Howdy, pardner. What's yer name? ")
    print(name, "... what a silly name!")
  Output:
    Howdy, pardner. What's yer name? Sixto Dimaculangan
    Sixto Dimaculangan ... what a silly name!

Text Processing
◼ text processing: Examining, editing, formatting text.
◼ often uses loops that examine the characters of a string one by one
◼ A for loop can examine each character in a string in sequence.
◼ Example:
    for c in "booyah":
        print(c)
  Output:
    b
    o
    o
    y
    a
    h

Strings and Numbers
◼ ord(text) - converts a single-character string into its numeric code.
◼ Example: ord("a") is 97, ord("b") is 98, ...
◼ Characters map to numbers using standardized mappings such as ASCII and Unicode.
◼ chr(number) - converts a number into a single-character string.
◼ Example: chr(99) is "c"
◼ Exercise: Write a program that performs a rotation cypher.
◼ e.g. "Attack" when rotated by 1 becomes "buubdl"

The File Object
◼ Many programs handle data, which often comes from files.
◼ File handling in Python can easily be done with built-in file objects.
◼ The file object provides all of the basic functions necessary to manipulate files.
◼ Exercise: Open up notepad or notepad++. Write some text and save the file to a location and with a name you'll remember, say 'Practice_File.txt'.

The open() function
◼ Before you can work with a file, you first have to open it using Python's built-in open() function.
◼ The open() function takes two arguments: the name of the file that you wish to use and the mode in which to open it. The result of open() is a file object that is used to work on this file.
    fh = open('Practice_File.txt', 'r')
◼ By default, the open() function opens a file in read mode; this is what the 'r' above signifies.
◼ There are a number of different file opening modes. The most common are: 'r' = read, 'w' = write, 'r+' = both reading and writing, 'a' = appending.
◼ Exercise: Use the open() function to read the file in.

The close() function
◼ Likewise, once you're done working with a file, you can close it with the close() function.
◼ Using this function will free up any system resources that are being used by having the file open.
    fh.close()

Reading in a file and printing to screen example
◼ Using what you have now learned about for loops, it is possible to open a file for reading and then print each line in the file to the screen using a for loop.
◼ Use a for loop and the variable name that you assigned the open file to in order to print each of the lines in your file to the screen.
◼ Example:
    fh = open('Practice_File.txt', 'r')
    for line in fh:
        print(line, end='')   # each line already ends with '\n'
  Output:
    The first line of text
    The second line of text
    The third line of text
    …

The read() function
◼ However, you don't need to use any loops to access file contents. Python has built-in file reading commands:
◼ The read() function takes an optional argument, which is the number of characters (bytes, for a file opened in binary mode) to read. If you skip it, it will read the whole file content and return it as a string.
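As a rough illustration of the optional argument to read(), the sketch below writes a small throwaway file first so that it does not depend on Practice_File.txt. The file name practice_demo.txt and the variable names are assumptions made for this example only.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical name) so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "practice_demo.txt")
fh = open(path, "w")
fh.write("The first line of text\nThe second line of text\n")
fh.close()

fh = open(path, "r")
first_six = fh.read(6)   # at most 6 characters, starting at the current position
rest = fh.read()         # no argument: everything from the current position to the end
at_eof = fh.read()       # at end of file, read() returns an empty string
fh.close()

print(repr(first_six))   # 'The fi'
print(repr(rest))
print(repr(at_eof))      # ''
```

Note that each call continues from where the previous one stopped, which is why the second read() does not return the first six characters again.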
1. .read() - returns the entire contents of the file as a single string
    fh = open('Practice_File.txt', 'r')
    print(fh.read())
  Output:
    The first line of text
    The second line of text
    The third line of text
    The fourth line of text
    The fifth line of text
   .read(6) - reads n = 6 characters
    fh = open('Practice_File.txt', 'r')
    print(fh.read(6))
  Output:
    The fi

readline() functions
◼ Other built-in file reading commands:
2. .readline() - returns one line at a time
    fh = open('Practice_File.txt', 'r')
    print(fh.readline())
  Output:
    The first line of text
3. .readlines() - returns a list of lines
    fh = open('Practice_File.txt', 'r')
    print(fh.readlines())
  Output:
    ['The first line of text\n', 'The second line of text\n', 'The third line of text\n', 'The fourth line of text\n', 'The fifth line of text\n']

The write() function
◼ Likewise, there are two similar built-in functions for getting Python to write to a file:
1. .write() - writes a specified sequence of characters to a file
    fh = open('Practice_File_W.txt', 'w')
    fh.write('I am adding this string')
    fh.close()
2. .writelines() - writes a list of strings to a file
    testList = ['First line\n', 'Second line\n']
    fh = open('Practice_File_W.txt', 'w')
    fh.writelines(testList)
    fh.close()

Example Line-by-line Processing
◼ Reading a file line by line and writing to an output file:
    fh1 = open('Practice_File.txt', 'r')
    fh2 = open('Write_File.txt', 'w')
    count = 0
    for line in fh1.readlines():
        fh2.write(line)
        count += 1
    fh2.write('The file contains ' + str(count) + ' lines.')
    fh1.close()
    fh2.close()
◼ Exercise: Write a program to process a file of DNA text, such as: ATGCAATTGCTCGATTAG
◼ Count the percent of C+G present in the DNA.

Data Conversion and Parsing
◼ A file, specifically a text file, consists of strings. However, especially in engineering and science, we work with numbers. Thus, we need to convert (cast) input strings to int or float.
◼ Another challenge is having multiple numbers on one string (line), separated by special characters or simply by spaces, as in '10.0 5.0 5.0'. We can use the .split(delimiter) method of a string, which returns a list of the substrings separated by the given delimiter.
    instr = '10.0 5.0 5.0'
    outlst = [float(substr) for substr in instr.split(' ')]
    print(outlst)
  Output:
    [10.0, 5.0, 5.0]
▪ Other useful methods for working with strings:
▪ delimiter.join(list) – joins the strings in a list, placing the delimiter between them
▪ .rstrip('\n') – removes occurrences of '\n' at the end of a string

Termination of Input
◼ Two ways to stop reading input:
1. By reading a definite number of items.
2. By the end of the file. EOF indicator – at end of file, functions like read() and readline() return an empty string ''.
    fp = open("pointlist.txt")            # open file for reading
    pointlist = []                        # start with empty list
    nextline = fp.readline()              # first line of pointlist.txt is number of lines that follow; skip
    nextline = fp.readline()              # read following line, has two real values
                                          # denoting x and y values of a point
    while nextline != '':                 # until end of file
        nextline = nextline.rstrip('\n')  # remove occurrences of '\n' at the end
        (x, y) = nextline.split(' ')      # get x and y (note that they are still strings)
        x = float(x)                      # convert them into real values
        y = float(y)
        pointlist.append((x, y))          # add tuple at the end
        nextline = fp.readline()          # read the next line
    fp.close()
    print(pointlist)
  Output:
    [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

LBOBGDT Big Data Techniques and Technologies
Bobby Reyes
Testing and Debugging
9/16/2024

Python Programs
▪ Recall:
▪ a program is a sequence of definitions and commands
    ◦ definitions evaluated
    ◦ commands executed by Python interpreter in a shell
▪ commands (statements) instruct interpreter to do something
▪ can be typed directly in a shell or stored in a file that is read into the shell and evaluated

Where Programming Goes Wrong
▪ syntactic errors
    ◦ common and easily caught
▪ static semantic errors
    ◦ some languages check for these before running program
    ◦ can cause unpredictable behavior
▪ logic errors - no semantic errors but different meaning than what programmer intended
    ◦ program crashes, stops running
    ◦ program runs forever
    ◦ program gives an answer but different than expected

Aim For High Quality Programs
▪ DEFENSIVE PROGRAMMING
    ◦ Write specifications for functions
    ◦ Modularize programs
    ◦ Check conditions on inputs/outputs (assertions)
▪ TESTING/VALIDATION
    ◦ Compare input/output pairs to specification
    ◦ "It's not working!"
    ◦ "How can I break my program?"
▪ DEBUGGING
    ◦ Study events leading up to an error
    ◦ "Why is it not working?"
    ◦ "How can I fix my program?"

Classes of Tests
▪ Unit testing
    ◦ validate each piece of program
    ◦ testing each function separately
▪ Regression testing
    ◦ add test for bugs as you find them
    ◦ catch reintroduced errors that were previously fixed
▪ Integration testing
    ◦ does overall program work?
    ◦ tend to rush to do this

Debugging
▪ steep learning curve
▪ goal is to have a bug-free program
▪ tools
    ◦ built in to IDLE and Anaconda
    ◦ Python Tutor
    ◦ print statement
    ◦ use your brain, be systematic in your hunt

print Statements
▪ good way to test hypothesis
▪ when to print
    ◦ enter function
    ◦ parameters
    ◦ function results
▪ use bisection method
    ◦ put print halfway in code
    ◦ decide where bug may be depending on values

Handle Exceptions
▪ An alternative to putting check conditions in your code.
▪ Syntax:
    try:
        ......   # a block with possible errors
        ......   # if there are function calls here
        ......   # and error occurs in the function, we can handle error here
    except exceptionname:   # exceptionname is optional
        ......   # this is error handling block
        ......   # when there is an error, execution jumps here
▪ This try-except block has two parts:
▪ A group of code (after try:) that possibly generates run-time errors, and one or more blocks to handle the errors.
▪ When the first error occurs in the try part, the execution jumps to the except part.
If an except clause matches the error that occurred, its block is executed. except:, without an exception name, matches all errors and can be used to handle all of them in one block.

Logic Errors - HARD
▪ think before writing new code
▪ draw pictures, take a break
▪ explain the code to
    ◦ someone else
    ◦ a rubber ducky

DON'T
▪ Write entire program
▪ Test entire program
▪ Debug entire program
DO
▪ Write a function
▪ Test the function, debug the function
▪ Write a function
▪ Test the function, debug the function
▪ *** Do integration testing ***

DON'T
▪ Change code
▪ Remember where bug was
▪ Test code
▪ Forget where bug was or what change you made
▪ Panic
DO
▪ Backup code
▪ Change code
▪ Write down potential bug in a comment
▪ Test code
▪ Compare new version with old version

LBOBGDT Big Data Techniques and Technologies
Bobby Reyes
Objects and Classes
9/16/2024

Objects
▪ Python supports many different kinds of data
    1234
    3.14159
    "Hello"
    [1, 5, 7, 11, 13]
    {"CA": "California", "MA": "Massachusetts"}
▪ each is an object, and every object has:
    ◦ a type
    ◦ an internal data representation (primitive or composite)
    ◦ a set of procedures for interaction with the object
▪ an object is an instance of a type
    ◦ 1234 is an instance of an int
    ◦ "hello" is an instance of a string

Object Oriented Programming (OOP)
▪ EVERYTHING IN PYTHON IS AN OBJECT (and has a type)
▪ can create new objects of some type
▪ can manipulate objects
▪ can destroy objects
    ◦ explicitly using del or just "forget" about them
    ◦ python system will reclaim destroyed or inaccessible objects – called "garbage collection"

What are Objects?
▪ objects are a data abstraction that captures…
    (1) an internal representation
        ◦ through data attributes
    (2) an interface for interacting with object
        ◦ through methods (aka procedures/functions)
        ◦ defines behaviors but hides implementation
▪ Internal representation should be private
▪ correct behavior may be compromised if you manipulate internal representation directly

Advantages of OOP
▪ bundle data into packages together with procedures that work on them through well-defined interfaces
▪ divide-and-conquer development
    ◦ implement and test behavior of each class separately
    ◦ increased modularity reduces complexity
▪ classes make it easy to reuse code
    ◦ many Python modules define new classes
    ◦ each class has a separate environment (no collision on function names)
    ◦ inheritance allows subclasses to redefine or extend a selected subset of a superclass' behavior

Defining a Class
▪ use the class keyword to define a new type
▪ Example
    class Coordinate(object):
        # define attributes here

    class UCSBstudent:
        age = 21
        schoolname = 'UCSB'
▪ similar to def, indent code to indicate which statements are part of the class definition
▪ the word object means that Coordinate is a Python object and inherits all its attributes (inheritance next lecture)
    ◦ Coordinate is a subclass of object
    ◦ object is a superclass of Coordinate

What are Attributes?
▪ data and procedures that "belong" to the class
▪ data attributes
    ◦ think of data as other objects that make up the class
    ◦ for example, a coordinate is made up of two numbers
▪ methods (procedural attributes)
    ◦ think of methods as functions that only work with this class
    ◦ how to interact with the object
    ◦ for example you can define a distance between two coordinate objects but there is no meaning to a distance between two list objects

Constructors: Defining How to Create an Instance of a Class
▪ first have to define how to create an instance of object
▪ use a constructor (a special method) called __init__ to initialize some data attributes
    class Coordinate(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

Creating an Instance of a Class
    c = Coordinate(3,4)
    origin = Coordinate(0,0)
    print(c.x)
    print(origin.x)
▪ data attributes of an instance are called instance variables
▪ don't provide argument for self, Python does this automatically

What is a Method?
▪ procedural attribute, like a function that works only with this class
▪ Python always passes the object as the first argument
    ◦ convention is to use self as the name of the first argument of all methods
▪ the "." operator is used to access any attribute
    ◦ a data attribute of an object
    ◦ a method of an object

Define a Method for Coordinate Class
Syntax:
    def name(self, parameter, ..., parameter):
        statements
Example:
    class Coordinate(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y
        def distance(self, other):
            x_diff_sq = (self.x-other.x)**2
            y_diff_sq = (self.y-other.y)**2
            return (x_diff_sq + y_diff_sq)**0.5
▪ must access the object's fields through the self reference
▪ other than self and dot notation, methods behave just like functions (take params, do operations, return)

How to Use a Method
    def distance(self, other):
        # code here
Using the class:
▪ conventional way
    c = Coordinate(3,4)
    zero = Coordinate(0,0)
    print(c.distance(zero))
▪ equivalent to
    c = Coordinate(3,4)
    zero = Coordinate(0,0)
    print(Coordinate.distance(c, zero))

Example Point Class
    name = value
◼ Example (point.py):
    class Point:
        x = 0
        y = 0

    # main
    p1 = Point()
    p1.x = 2
    p1.y = -5
◼ fields can be declared directly inside class (as shown here) or in constructors (more common)
◼ Python does not really have encapsulation or private fields
◼ relies on caller to "be nice" and not mess with objects' contents

Using a Class
    import class
◼ client programs must import the classes they use
point_main.py:
    from point import *

    # main
    p1 = Point()
    p1.x = 7
    p1.y = -3

    p2 = Point()
    p2.x = 7
    p2.y = 1

    # Python objects are dynamic (can add fields any time!)
    p1.name = "Tyler Durden"

Exercise
point.py:
    from math import *

    class Point:
        x = 0
        y = 0

        def set_location(self, x, y):
            self.x = x
            self.y = y

        def distance_from_origin(self):
            return sqrt(self.x * self.x + self.y * self.y)

        def distance(self, other):
            dx = self.x - other.x
            dy = self.y - other.y
            return sqrt(dx * dx + dy * dy)

Calling Methods
◼ A client can call the methods of an object in two ways:
◼ (the value of self can be an implicit or explicit parameter)
    1) object.method(parameters)
    or
    2) Class.method(object, parameters)
◼ Example:
    p = Point(3, -4)
    p.move(1, 5)
    Point.move(p, 1, 5)

toString and __str__
    def __str__(self):
        return string
◼ equivalent to Java's toString (converts object to a string)
◼ invoked automatically when str or print is called
Exercise: Write a __str__ method for Point objects that returns strings like "(3, -14)"
    def __str__(self):
        return "(" + str(self.x) + ", " + str(self.y) + ")"

Complete Point Class
point.py:
    from math import *

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def distance_from_origin(self):
            return sqrt(self.x * self.x + self.y * self.y)

        def distance(self, other):
            dx = self.x - other.x
            dy = self.y - other.y
            return sqrt(dx * dx + dy * dy)

        def move(self, dx, dy):
            self.x += dx
            self.y += dy

        def __str__(self):
            return "(" + str(self.x) + ", " + str(self.y) + ")"

Operator Overloading
◼ operator overloading: You can define special methods so that Python's built-in operators can be used with your class.
◼ See also: http://docs.python.org/ref/customization.html
    Operator  Class Method                Operator  Class Method
    -         __sub__(self, other)        ==        __eq__(self, other)
    +         __add__(self, other)        !=        __ne__(self, other)
    *         __mul__(self, other)        <         __lt__(self, other)
    /         __truediv__(self, other)    >         __gt__(self, other)
                                          <=        __le__(self, other)
                                          >=        __ge__(self, other)
    Unary operators
    -         __neg__(self)
    +         __pos__(self)

Generating Exceptions
    raise ExceptionType("message")
◼ useful when the client uses your object improperly
◼ types: ArithmeticError, AssertionError, IndexError, NameError, SyntaxError, TypeError, ValueError
◼ Example:
    class BankAccount:
        ...
        def deposit(self, amount):
            if amount < 0:
                raise ValueError("negative amount")
            ...

Inheritance
    class name(superclass):
        statements
◼ Example:
    class Point3D(Point):   # Point3D extends Point
        z = 0
        ...
◼ Python also supports multiple inheritance
    class name(superclass, ..., superclass):
        statements
(if > 1 superclass has the same field/method, conflicts are resolved in left-to-right order)

Calling Superclass Methods
◼ methods: class.method(object, parameters)
◼ constructors: class.__init__(parameters)
    class Point3D(Point):
        z = 0
        def __init__(self, x, y, z):
            Point.__init__(self, x, y)
            self.z = z
        def move(self, dx, dy, dz):
            Point.move(self, dx, dy)
            self.z += dz

Python Libraries For Data Science
The appeal of Python is in its simplicity and beauty, as well as the many essential third-party packages and toolkits used in scientific computing and data science applications:
◼ NumPy provides the ndarray object for efficient storage and computation for multidimensional data arrays.
◼ SciPy contains a wide array of numerical tools such as numerical integration and interpolation.
◼ Pandas provides a DataFrame object along with a powerful set of methods to manipulate, filter, group, and transform data.
◼ Matplotlib provides a useful interface for creation of publication-quality plots and figures.
◼ Scikit-Learn provides a uniform toolkit for applying common machine learning algorithms to data.

LBOBGDT Big Data Techniques and Technologies
Bobby Reyes
Python Libraries for Data Science: NumPy, Pandas, Matplotlib
10/2/2024

Data Science Project Cycle
(figure; the only surviving text reads ", and perform feature extraction and data cleaning")

Python Libraries For Data Science
The appeal of Python is in its simplicity and beauty, as well as the many essential third-party packages and toolkits available that are used in scientific computing and data science applications:
◼ NumPy (Numerical Python) provides the ndarray object for efficient storage and computation for multidimensional data arrays.
◼ SciPy (Scientific Python) contains a wide array of numerical tools such as numerical integration and interpolation.
◼ Pandas (Python Data Analysis or Panel Data) provides a DataFrame object along with a powerful set of methods to manipulate, filter, group, and transform data.
◼ Matplotlib (plotting library with a MATLAB-like interface) provides capabilities for creation of publication-quality plots and figures.
◼ Scikit-Learn (SciPy Toolkit) provides a uniform toolkit for applying common machine learning algorithms to data.
◼ Seaborn provides a high-level interface for drawing attractive statistical graphics.

NumPy (Numeric Python)
◼ introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
◼ provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
◼ many of its operations are executed in compiled C or Fortran code, not Python
◼ many other Python libraries are built on NumPy
Link: http://www.numpy.org/

SciPy (Scientific Python)
◼ collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
◼ part of SciPy Stack
◼ built on NumPy
Link: https://www.scipy.org/scipylib/

Pandas (Python Data Analysis or Panel Data)
◼ provides a DataFrame object along with a powerful set of methods to manipulate, filter, group, and transform data
◼ aims to be the fundamental high-level building block for doing practical, real world data analysis in Python
◼ built on top of NumPy and intended to integrate well within a scientific computing environment with many other 3rd party libraries
Link: http://pandas.pydata.org/

SciKit-Learn (SciPy Toolkit)
◼ provides machine learning algorithms: classification, regression, clustering, model validation etc.
◼ built on NumPy, SciPy and matplotlib
Link: http://scikit-learn.org/

Matplotlib
◼ Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats
◼ a set of functionalities similar to those of MATLAB
◼ line plots, scatter plots, barcharts, histograms, pie charts etc.
◼ relatively low-level; some effort needed to create advanced visualization
Link: https://matplotlib.org/

Seaborn
◼ Matplotlib is a powerful, but sometimes unwieldy, Python library.
◼ Seaborn is based on Matplotlib; it provides a high-level interface to Matplotlib and makes it easier to produce attractive statistical graphics
◼ Some IDEs incorporate elements of this "under the hood" nowadays.
◼ Similar (in style) to the popular ggplot2 library in R
Link: https://seaborn.pydata.org/

Loading Python Libraries
    # Import Python Libraries
    import numpy as np
    import scipy as sp
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import sklearn as skl

Understanding Data Types in Python
▪ Programs manipulate data objects
▪ EVERYTHING IN PYTHON IS AN OBJECT
▪ Everything means everything! - including functions and classes

Understanding Data Types in Python
▪ Every object has:
    ◦ a type
    ◦ an internal data representation (primitive or composite)
    ◦ a set of procedures for interaction with the object; i.e. the kinds of things programs can do to them
▪ For example, an integer object (in CPython) contains four pieces:
    ◦ ob_refcnt – reference count
    ◦ ob_type – encodes type of variable
    ◦ ob_size – size of data members
    ◦ ob_digit – actual integer value
▪ Here PyObject_HEAD is the part of the structure containing the reference count, type code, and other pieces.

Understanding Data Types in Python
▪ A List can hold elements (data) of mixed data types
▪ A List is mutable
▪ But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information. That is, each item is a complete Python object.

NumPy
Recall Slide: NumPy (Numeric Python)
◼ introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
◼ provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
◼ many of its operations are executed in compiled C or Fortran code, not Python
◼ many other Python libraries are built on NumPy
Link: http://www.numpy.org/

The ndarray Data Structure
◼ NumPy adds a new data structure to Python – the ndarray
◼ An N-dimensional array is a homogeneous collection of "items" indexed using N integers
◼ Defined by:
    1. the shape of the array, and
    2. the kind of item the array is composed of

Array Shape and Item Types
◼ Shape:
    ◦ ndarrays are rectangular
    ◦ The shape of the array is a tuple of N integers (one for each dimension)
◼ Item Types:
    ◦ Every ndarray is a homogeneous collection of exactly the same data-type
    ◦ every item takes up the same size block of memory
    ◦ each block of memory in the array is interpreted in exactly the same way

Creating an Array
◼ ndarray or numpy.array (alias) is the basic data type.
    A = np.array([2,3,4])
◼ Make the array from a list or tuple (NOT from separate numbers)
    A = np.fromfile(file, dtype=float, count=-1, sep='')
    ◦ dtype: allows the construction of multi-type arrays
    ◦ count: number of values to read (-1 == all)
    ◦ sep: separator
◼ To generate using a function that acts on each element of a shape:
    np.fromfunction(function, shape, **kwargs)

Built-in Functions
    A = np.zeros( (3,4) )
    array([[ 0., 0., 0., 0.],
           [ 0., 0., 0., 0.],
           [ 0., 0., 0., 0.]])
◼ Also numpy.ones and numpy.empty (contents are left uninitialized, so the values are arbitrary)
    B = np.ones( (3,4) )
    C = np.random.random((3,4))    (fill with random numbers)
◼ More generic: ndarray.fill(value)
◼ np.putmask(): put values based on a Boolean mask array (True == replace)

arange
◼ Like range but generates arrays with values:
    A = np.arange( 1, 10, 2 )
    array([1, 3, 5, 7, 9])
◼ Can be used with floating point numbers, but because of precision issues it is better to use:
    a = np.linspace(start, end, numberOfValues)
◼ Note that with linspace the "end" value is included in the output.
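The difference in how the two functions treat the end value can be seen in a short sketch. This assumes NumPy is installed; the variable names are chosen for illustration only.

```python
import numpy as np

# arange: start, stop, step. The stop value itself is NOT generated.
a = np.arange(1, 10, 2)

# linspace: start, end, number of values. The end value IS generated (by default).
b = np.linspace(0.0, 1.0, 5)

print(a.tolist())   # [1, 3, 5, 7, 9]
print(b.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With linspace the number of values is fixed up front, so rounding in the step size cannot add or drop an element at the end of the range.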
ndarray
    np.ndim(A)     Number of axes (dimensions)
    np.shape(A)    Length of each dimension
    np.size(A)     Total number of elements
    A.dtype        Data type in the array (standard or numpy)
◼ print(A) will print the array nicely, but if the array is too large to print nicely, the central part is replaced with "...".
◼ Options are set with np.set_printoptions, including np.set_printoptions(threshold = None)

Platform Independent Save
◼ Save/Load data in NumPy .npy / .npz format
    np.save(file, A, allow_pickle=True, fix_imports=True)
    A = np.load(file, mmap_mode=None, allow_pickle=True, fix_imports=True, encoding='ASCII')

Indexing Arrays
◼ Data locations are referenced using a tuple [row, col] (for 2D):
    arrayA[1,2] not arrayA[1][2]
◼ Can use numpy arrays to pull out values:
    A = np.random.random((3,4))
    J = np.array( [ [ 0, 2], [ 1, 2 ] ] )
    A[J]

Indexing Arrays
◼ Can also use Boolean arrays, with "True" values indicating values we want:
    mask = np.array([False,True,False,True])
    A = np.array([1,2,3,4])
    A[mask] == [2,4]
◼ NumPy has something called 'Structured arrays' which allow named columns, but these are better done through their wrappers in Pandas.

Slicing Arrays
◼ Slicing arrays is almost the same as slicing lists, except you can specify multiple dimensions
◼ This means we can slice across multiple dimensions, one of the most useful aspects of numpy arrays:
    A = arrayA[1:3,:]    array of the 2nd and 3rd row, all columns.
    B = arrayA[:, 1]     array of all values in second column.
◼ You can also use … to represent "the rest":
    A[4,...,5,:] == A[4,:,:,5,:]

Shape Changing
◼ To take the current values and force them into a different shape, use reshape, for example:
    A = np.arange(12).reshape(3,4)
◼ resize changes the underlying array.
◼ np.squeeze() removes dimensions of length 1
◼ arrayA.flat gives you all elements as an iterator
◼ arrayA.ravel() gives you the array flattened
◼ arrayA.T gives you the array with rows and columns transposed (note: an attribute, not a function)
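The indexing, slicing, and shape-changing operations above can be tried on one small array. The sketch below assumes NumPy is installed and uses a 3 x 4 array of the values 0 to 11; the variable names are illustrative.

```python
import numpy as np

# Values 0..11 arranged as 3 rows x 4 columns:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
A = np.arange(12).reshape(3, 4)

element = A[1, 2]       # row 1, column 2
rows = A[1:3, :]        # 2nd and 3rd rows, all columns
column = A[:, 1]        # all values in the second column

mask = A % 2 == 0       # Boolean array, True where the value is even
evens = A[mask]         # 1-D array of the selected values

print(element)              # 6
print(rows.tolist())        # [[4, 5, 6, 7], [8, 9, 10, 11]]
print(column.tolist())      # [1, 5, 9]
print(evens.tolist())       # [0, 2, 4, 6, 8, 10]
print(A.T.shape)            # (4, 3)
print(A.ravel().tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Boolean indexing always returns a one-dimensional result here, because the selected positions need not form a rectangle.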
