Summary

This document presents an introduction to Python functions, explaining concepts like fruitful and void functions. The document also details how to define your own Python functions, positional and keyword arguments, and how to handle exceptions in Python. It includes several code examples with questions and answers.

Full Transcript

Grundlagen der Programmierung in Python (Fundamentals of Programming in Python)
Bio Data Science, 2024
Dmitrij Turaev

Contents

1 Functions encapsulate functionality to perform one task
  1.1 Methods are functions that are bound to an object
  1.2 Use help and dir to inspect objects
2 Functions and methods can be nested
  2.1 Fruitful functions vs. void functions
3 We can define functions ourselves
  3.1 Docstrings document function behavior
  3.2 Type hints indicate the types of values
4 Positional arguments and keyword arguments
5 Default argument values
6 All functions return something
7 Functions allow to logically structure the code
8 Functions have their own namespaces
  8.1 The scope defines the visibility of names
  8.2 The global keyword lets you assign to global scope
9 Exceptions are errors that occur at runtime (during program execution)
  9.1 Catching exceptions
  9.2 Python error messages: Tracebacks
10 Development environment
  10.1 Some useful VS Code options/extensions
  10.2 Some useful Jupyter options
  10.3 Managing Python files

What will be the output of this code?
1. [1, 10, 2, 3]
2. [1, 2, 3]
3. [1, 10, 3]
4. None of the above

1. ['a', 'b', 'z'] abc
2. ['a', 'b', 'z'] None
3. TypeError exception
4. None of the above

Bonus question: What does the code list("abc") do? Will list(s) also work?

Functions are code blocks that perform one task

    def count_GC(sequence):
        """Calculate GC content in nucleotide sequence."""
        # https://en.wikipedia.org/wiki/GC-content
        count_G = sequence.count("G")
        count_C = sequence.count("C")
        return (count_G + count_C)/len(sequence)

    seq = input("Enter nucleotide sequence")
    print("GC content is:", count_GC(seq))

TODO:
1. Test the code above. What is the advantage of using a function, compared to sequential code execution?
2. You know from the previous lecture that a function call is an expression. What does this mean?

Solution:
1. The main advantage is that functions allow you to reuse code without writing it again, simply by calling the function.
Functions improve the program structure and lead to cleaner and more maintainable code (code that is easy to read, modify or extend).
2. Every function call evaluates to a single value, the return value of the function. As any value is an expression, a function call is an expression.

1 Functions encapsulate functionality to perform one task

Indexing and slicing let you get single characters or substrings from strings. For more complex tasks this is not enough. E.g., we might need to determine the string length, or to convert the string to upper case. For this, we can use functions (e.g. len) and methods (e.g. str.upper).

Functions are blocks of code that do one specific task. They usually accept one or more input arguments and always return a value. Python has many useful built-in functions, like:

    type(4)                 # Returns object type
    isinstance(3, float)    # Determines if object is instance of given type (returns boolean)
    len("acgt")             # Returns string length
    input("input value: ")  # Returns user input
    abs(-4)                 # Returns absolute value

Other built-in functions are max(), min(), str(), int(), bool(), etc.

Get help using the ? syntax in IPython, e.g.:

    len?

Use the interactive interpreter to test and play around with things.

These functions are standalone functions: You pass them an argument (an object like an integer or a string), and they operate on it. The syntax is:

    function(arg1, arg2,...)

1.1 Methods are functions that are bound to an object

Some functions are bound to specific objects like strings. Such functions are called methods. Methods always receive at least one argument, which is the object they are bound to. They have a different syntax:

    arg1.function(arg2,...)
    # List methods
    a = [2, 4, 1]
    a.append(7)   # Call the `append` method of list `a`
    a.pop()       # Behind the scenes, the `pop` method receives one argument, the list `a`

    # String methods
    "acgattg".count("g")  # Call a string method with one additional argument
    "abc".upper()         # Some string methods don't require/allow additional arguments
    "abc".isupper()       # A function call always includes parentheses
    "abc".isupper         # Without parentheses, you output the function object, instead of calling it

(A Python function is represented by an object, like an integer, or a string, or a list, or a module, etc.)

These methods exist only in the object namespace, and are not directly accessible from the global namespace:

    help(isupper)  # NameError

Autocompletion lists available methods:

    a = "abc"
    a.  # press Tab → list available methods for this object

Self-test:
a. Why is a function call an expression?
b. Two examples of built-in functions?
c. What are methods?
d. Two examples of list methods?
e. Two examples of string methods?

1.2 Use help and dir to inspect objects

Collecting information about objects and other parts of your program is crucial for successful coding. Python provides many helpful functions for this, like type, id, and print. Two more highly useful functions are help and dir.

help(): Start interactive help utility
help(object): Get help about object (displays the object's docstring)

In IPython, the question mark is a shortcut to get help. You can use it on any object (like a function object):

    abs?             # Get help on function
    str?             # Get help on type
    str.count?       # Get help on method
    help(str.lower)  # Get help on method; same as `str.lower?`

A double question mark tries to pull out more information about an object and, if possible, displays the Python source code of this object.

    import collections  # Standard library module with some additional types
    collections?
    collections??       # View source code, if available

dir(object): List object attributes.
Generally, an attribute is any name following a dot. E.g., in the code z = 7; z.real, the object real is an attribute of the object z. An attribute can be any object. Many attributes are methods: Try type("abc".count) or type([1,2,3].append)

    dir(str)  # List attributes of 'str' type

When using dir on an object, you'll see methods that start and end with two underscores. These are "special methods" or dunder (double underscore) methods. They determine the behavior of objects in different situations, e.g. with operators. For example, this is why:
- using + on integers and using + on strings has different effects
- function objects and some other objects can be called (are callable), while other objects aren't
- some objects are iterable and can be used in loops, while others can't, etc.

Self-test:
a. How do you get help on a Python object?
b. How do you list object attributes?
c. What other functions do you know that are useful for code introspection? (hint)

TODO:
1. Convert an upper case string (e.g. "GACCT") to lower case
2. Convert a lower case string (e.g. "accgagct") to upper case
3. Split the string "acganncgtnngat" on the sequence "nn"
4. Split the string "acganncgtnngat" on the first occurrence of the sequence "nn" from the right
5. Set the letters "tata" in the sequence "acgtgactatagc" to upper case
6. Determine if the sequence "ACCGCAT" starts with the letters "ACC" and ends with the letters "CAT"
7. Remove leading and trailing whitespace from the sequence " acg gat gcc "
8. Do other objects like integers have methods, too?

Solution:

    "GACCT".lower()
    "accgagct".upper()
    "acganncgtnngat".split("nn")
    "acganncgtnngat".rsplit("nn",1)
    "acgtgactatagc".replace("tata", "TATA")
    "ACCGCAT".startswith("ACC") and "ACCGCAT".endswith("CAT")
    " acg gat gcc ".strip()
    dir(7)  # inspect integer object

1. AttributeError exception
2. True
3. False
4. None of the above

2 Functions and methods can be nested

You know that expressions can be nested, e.g.
3 + (4 * 5), and can also include functions. The evaluation begins at the innermost function and the return value becomes the argument of the outer functions.

    len(bin(abs(min(-16, 23))))

What does this code do? Execute it step-by-step to see how each function modifies the result.

Methods are functions, and can also be nested. Every method call evaluates to an object, on which another method is called:

    header = ">gi|1253402|Ecoli|Helper protein\n"  # example header line from fasta file
    header[1:].lower().strip().split('|').index("ecoli")

What does this code do? Execute it step-by-step to see how each method modifies the result.

Solution:

    >>> header = ">gi|1253402|Ecoli|Helper protein\n"
    >>> tmp1 = header[1:]       # what type does the object named `tmp1` have?
    >>> tmp2 = tmp1.lower()     # what type does the object named `tmp2` have?
    >>> tmp3 = tmp2.strip()     # OK, you get the idea...
    >>> tmp4 = tmp3.split("|")
    >>> result = tmp4.index("ecoli")
    >>> print(result)

1. 0
2. 1
3. 2
4. 3
5. 4
6. None

2.1 Fruitful functions vs. void functions

It's important to know that every function does something and returns something.

The function does something. A function consists of Python code that gets executed. For example, the function can print text, perform calculations, read and write files, etc.

Every function has one single return value. This is the value that the function returns to the caller. In other words, each function call is an expression that evaluates to the return value. If you write result = function(argument), you bind a name to this value. (This is really important. When you finish the chapter and execute all code examples, go back to these sentences, read them a few times more and make sure that you understand what this means.)

All functions return something, but not all return something useful. The book "Think Python" calls them fruitful and void functions (greenteapress.com).

One example of fruitful functions are string methods.
They return a value that you can use (note that the terms "value" and "object" are sometimes used interchangeably, though this is not strictly correct):

    print("abcde".upper())    # Print return value
    result = "abcde".upper()  # Bind name to return value
    print(result)

Other functions do something, but return the value None. "Think Python" calls them void functions:

    nums = [3, 1, 20, 15]
    sorted_nums = nums.sort()  # sort list
    print(sorted_nums)         # print sorted list?!..
    print(nums)                # print sorted list!

It's important to remember that some functions/methods return None! If you forget the distinction between fruitful and void functions, you may try to use the return value of a void function, leading to bugs of this kind:

    print([3, 1, 20, 15].sort())  # print return value of `list.sort` method

1. What type is the return value of the str.upper method?
2. What type is the return value of the list.sort method?

Solution:
Test it in the interpreter:
1. >>> result = 'abcde'.upper(); type(result) → Method str.upper returns a string
2. >>> result = [3, 1, 20, 15].sort(); type(result) → Method list.sort sorts the list in place and returns None

The reason for this behavior is: Python strings are immutable. Their methods don't modify the original string, but return the result, e.g. a new string object. You can print it, bind a name to it, or use it as part of an expression. Mutable types like lists have many methods that operate in place, on the original object, and return None. Using their return value isn't helpful.

There is also a built-in function sorted. What does it return?
    nums = [3, 20, 1, 15]
    print(sorted(nums))

The convention is that Python functions:
- Either operate on data in place and return None (this is true for methods of mutable data types)
- Or do not modify the original data, and return the result (such functions usually end in "ed": sorted, reversed)

"You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python." (Python tutorial: Data structures)

1. Can you nest any kind of functions/methods?
2. How can you tell if a function is fruitful or void, e.g. len?
3. Is the print function a fruitful or a void function?
4. What does the function reversed do? Compare the three options of reversing a list in Python, and pick your favorite.

Solution:
1. Nesting is only possible with fruitful functions. They evaluate to a value that becomes the argument of another function, e.g.: >>> len("abc".upper()) → "abc".upper() evaluates to a string, which becomes the argument of len. The return value of void functions isn't useful, e.g.: >>> a = [1,3,2]; len(a.sort()) leads to TypeError
2. Fruitful functions return a value that is not None. You can test it in the interpreter: >>> type(len("abc")) → returns an integer.
3. Test it in the interpreter: >>> result = print("Hello, world"); type(result). This may be a little bit confusing: the print function does something (it prints something to the screen), and returns None. (Fun fact: In Python 2, print was a statement, not a function. This was changed in Python 3 for more consistency and flexibility, see Stackoverflow).
4. Get help on the function using a question mark, reversed?. Test the function with some examples in the interpreter.

Self-test:
a. What is the difference between "fruitful" and "void" functions?
b. Why can some functions/methods be nested, while others can't?
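The difference between in-place methods and functions that return a new object can be checked directly in the interpreter. The following is a minimal sketch using only built-ins; the variable names are chosen for illustration and do not come from the lecture.

```python
nums = [3, 1, 20, 15]

# sorted() is fruitful: it returns a new list and leaves `nums` untouched
ordered = sorted(nums)
print(ordered)        # [1, 3, 15, 20]
print(nums)           # [3, 1, 20, 15]

# list.sort() is void: it sorts in place and returns None
result = nums.sort()
print(result)         # None
print(nums)           # [1, 3, 15, 20]

# reversed() returns an iterator, so wrap it in list() to see the items
backwards = list(reversed(nums))
print(backwards)      # [20, 15, 3, 1]

# Three ways to reverse a list: reversed() and slicing create new objects,
# list.reverse() modifies the list in place and returns None
by_slice = nums[::-1]
nums.reverse()
print(by_slice == nums == backwards)   # True
```

Which of the three reversing options is preferable depends on whether the original list should stay unchanged.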
3 We can define functions ourselves

We saw built-in functions like dir() and object methods like str.upper(). We can also define functions ourselves, using one of two keywords, def or lambda. (lambda is a short form of function definition, sometimes used for short single-use functions; it has no other advantages, and we won't use it for now.) The basic syntax is:

    def function_name(parameter1, parameter2,...):
        #...more python statements that make up the function body...
        return value

A function consists of a function header and a function body:

    def fahr_to_celsius(temp):        # header: def statement, function name, parameter names
        return ((temp - 32) * (5/9))  # body: return statement with return value

A parameter is a variable in the function definition. When a function is called, the arguments are the actual values that you pass to the function ("Arguments are actual"):

    fahr_to_celsius(451)  # function call

Here, the function parameter temp is initialized with the argument 451 that is passed during the function call. Within the function, the name temp is bound to the passed integer object.

TODO:
Read the function below.
1. How many parameters does the function definition have?
2. How many lines does the function body have?
3. What does the function do?
4. How many arguments are passed during function call?
5. You can see that a return statement is not required. However, you know that every function returns something. What is the return value of this function?

    def print_greeting(myname, friend_name):
        print("hello", myname)
        print("your friend is", friend_name)

    print_greeting("Alice", "Bob")

Solution:
1. 2 parameters (myname and friend_name)
2. 2 lines
3. The function prints two lines and returns None (see Stackoverflow for details)
4. 2 arguments ("Alice" and "Bob")
5.
You can test it in the interpreter: result = print_greeting("Alice", "Bob"); type(result) or simply type(print_greeting("Alice", "Bob")) → function returns None

You know that the Python interpreter executes code sequentially, line by line. Here is what happens when it meets a function definition:
1. When the interpreter sees a function definition, it executes only the function header (def...), but skips the function body.
2. This creates a new callable function object. (Like any object, it has an identity (memory address) and a type.) It can be called any time later using the () syntax.
3. When the function is called, the function parameters are bound to the passed argument values (no data is copied, which is faster and saves memory). Then the function body is executed. After that, the flow of execution comes back to pick up where it left off.

    # Like any Python object, a function object has a type and an identity
    print("Object ", print_greeting, "says hello.")
    print("Type is", type(print_greeting))

The flow of execution is explained in Think Python: How to Think Like a Computer Scientist (also try the interactive edition).

Self-test:
a. What is the difference between function parameters and function arguments?
b. Which part of the function is executed during function definition, and which part during function call?

3.1 Docstrings document function behavior

Explain what this function does:

    def validate_base_sequence(base_sequence):
        """Returns True for valid nucleotide sequence"""
        seq = base_sequence.upper()
        return len(seq) == (seq.count("T") + seq.count("C")
                            + seq.count("A") + seq.count("G"))

    validate_base_sequence("ACCGT")  # function call

Note:
- After the function is defined, Tab autocompletion of the function name works in the interpreter/IDE
- Parentheses allow a multi-line expression. You can also use a backslash instead (makes the interpreter ignore the invisible \n character)

A docstring is a string literal that is the first statement in a function/method, class, or module.
It serves as documentation and describes what this function/module/class does.
- Every function should have a docstring (PEP 8)
- Use """triple double quotes""" around docstrings
- Docstrings are similar to comments (stackexchange.com), but have some special properties, e.g. they are picked up by the help() function:

    help(validate_base_sequence)
    validate_base_sequence?  # IPython syntax

There are different ways to write docstrings (Stackoverflow). One-line docstrings are OK for simple functions (Python docs). More complex functions should have multi-line docstrings, including a one-line summary, a detailed explanation, an explanation of the arguments and of the return value, and usage examples (Google style guide). Example:

    """Validates a nucleotide sequence.

    Validates a DNA or RNA sequence by comparing it to allowed
    nucleotide symbols. Returns False if the numbers don't add up.
    Function is case-insensitive.

    Args:
        base_sequence: String with DNA or RNA sequence
        RNAflag: Optional; if True, base_sequence is RNA, otherwise it's DNA

    Returns:
        A boolean indicating if base_sequence is a valid or invalid
        nucleotide sequence.

    >>> validate_base_sequence("acGaTTag")
    True
    >>> validate_base_sequence("AAACAGGXG")
    False
    """

More examples: geeksforgeeks.org

Pro tip: You can let ChatGPT write docstrings for you. By automating docstring creation, you can focus on coding, while ensuring your code is properly documented according to common Python standards.

Solution:
The function receives a string, converts it to upper case, counts the number of letters "T", "C", "A" and "G" and compares their sum to the length of the string. If the string consists only of these letters, it returns True, otherwise False.
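To see how a docstring travels with the function object, the multi-line docstring can be attached to an actual function. This is a minimal sketch, not the lecture's own code: it assumes the two-parameter signature with an optional RNAflag that the docstring describes, and it uses the standard library doctest module to run the `>>>` examples.

```python
import doctest


def validate_base_sequence(base_sequence, RNAflag=False):
    """Validates a nucleotide sequence.

    Args:
        base_sequence: String with DNA or RNA sequence
        RNAflag: Optional; if True, base_sequence is RNA, otherwise it's DNA

    Returns:
        True if base_sequence contains only valid nucleotide symbols.

    >>> validate_base_sequence("acGaTTag")
    True
    >>> validate_base_sequence("AAACAGGXG")
    False
    """
    seq = base_sequence.upper()
    return len(seq) == (seq.count("U" if RNAflag else "T")
                        + seq.count("C") + seq.count("A") + seq.count("G"))


# The docstring is stored on the function object; help() displays it
print(validate_base_sequence.__doc__.splitlines()[0])

# doctest runs the `>>>` examples from the docstring and reports mismatches;
# it prints nothing when all examples pass
doctest.run_docstring_examples(
    validate_base_sequence,
    {"validate_base_sequence": validate_base_sequence},
)
```

Keeping usage examples in the docstring means the documentation can be checked automatically whenever the function changes.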
3.2 Type hints indicate the types of values

You might see syntax like this:

    def validate_base_sequence(base_sequence: str) -> bool:
        # function body
        pass

- This syntax is called type hints, introduced in Python 3.5 (Stackoverflow) and expanded ever since
- The hints indicate the intended type of the arguments and of the return value
- This function accepts one argument, which is supposed to be a string. The function is supposed to return a boolean
- The Python interpreter doesn't enforce this information at runtime; it simply ignores the type hints
- Type hints are useful in static code analysis (without actually running the code): Type checkers like MyPy can analyse the code and discover possible bugs, e.g. if the function is called with the wrong argument type (YouTube). This helps to prevent runtime exceptions, and is especially useful for large software projects
- Type hints also help the programmer, because they force you to think about the types in your program (YouTube)
- If you write useful code that goes beyond a short script, you should use type hints (MyPy cheat sheet)
- Type-checking functionality is built into IDEs like PyCharm and can be activated in VS Code

Self-test:
a. What is a docstring?
b. What are type hints?

4 Positional arguments and keyword arguments

So far, we called functions passing arguments in the same order as the parameters in the function definition, e.g. def print_greeting(myname, friend_name). Arguments passed this way are called positional arguments. Functions can also be called using keyword arguments (also called named arguments) of the form kwarg=value.
    def validate_base_sequence(base_sequence, RNAflag):
        """Returns True for valid nucleotide sequence"""
        seq = base_sequence.upper()
        return len(seq) == (seq.count("U" if RNAflag else "T")  # ternary operator
                            + seq.count("C") + seq.count("A") + seq.count("G"))

    # Different ways to call the function
    validate_base_sequence("AGCTG", False)                       # positional arguments
    validate_base_sequence("AGCUG", RNAflag=True)                # one positional and one keyword argument
    validate_base_sequence(base_sequence="AGCUG", RNAflag=True)  # two keyword arguments
    validate_base_sequence(RNAflag=True, base_sequence="AGCUG")  # keyword arguments can have any order

The ternary operator x if condition else y evaluates to x if the condition is true, and to y otherwise.

You can call a function using positional arguments and/or keyword arguments.
- Positional arguments are arguments passed to a function in the correct positional order
- Keyword arguments are explicitly passed by their name
- Keyword arguments make the function call more explicit, and you can rearrange their order in the function call

There is a special notation related to positional and keyword arguments in function definitions. E.g. sorted? or help(sorted) gives:

    sorted(iterable, /, *, key=None, reverse=False)

The / notation indicates that the preceding parameters are positional-only. You see this if you try to call sorted(iterable=[3,1,2]). (The reason for that restriction is that CPython, which is the reference implementation of the Python programming language and is written in C and Python, has many built-in and standard library functions that are implemented in C and accept only positional parameters.)

The * syntax indicates that subsequent parameters are keyword-only. You see this if you try to call sorted(['abc','a','ab'], len). The correct syntax is sorted(['abc','b','cd'], key=len) or sorted(['abc','b','cd'], reverse=True).
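The / and * markers can also be used in your own function definitions (the / marker requires Python 3.8 or later). The following sketch uses a hypothetical helper gc_fraction with a hypothetical parameter ignore_case, neither of which appears in the lecture. The try/except blocks only serve to show the error messages without stopping the script.

```python
def gc_fraction(sequence, /, *, ignore_case=True):
    """Return the fraction of G and C in `sequence` (hypothetical helper)."""
    if ignore_case:
        sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


print(gc_fraction("acgt"))                      # 0.5
print(gc_fraction("acgt", ignore_case=False))   # 0.0

# `sequence` is positional-only, so passing it by name fails
try:
    gc_fraction(sequence="acgt")
except TypeError as error:
    print("TypeError:", error)

# `ignore_case` is keyword-only, so passing it by position fails
try:
    gc_fraction("acgt", False)
except TypeError as error:
    print("TypeError:", error)
```

Making a flag like ignore_case keyword-only forces callers to write the name, which keeps calls such as gc_fraction("acgt", ignore_case=False) readable.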
[Additional information] Sometimes you see a notation using asterisks in the function definition:

    def foo(kind, *args, **kwargs):
        pass

This is a common idiom to allow an arbitrary number of arguments to functions, as positional or keyword arguments. Example:

    def send(message, *args, **kwargs):
        print(message)
        print(args)
        print(kwargs)

    send("hello", "Mars", "Venus", "Jupyter", favorite="Saturn", secret_favorite="Pluto")

args is a collection of type tuple, kwargs is a collection of type dict. (A tuple is similar to a list, but immutable. A dict is a collection where objects are accessed not by position (=index), but by a key.)

The pass statement can be used as a placeholder for code, or to just do nothing. It lets you run the program even with some code still missing (Stackoverflow). This is useful for program design and incremental development: You draft a broad picture first, and fill in the details later. E.g., you are writing a program that makes tea. You know that you need to get a cup, but don't know yet exactly how this works. You can prepare the function and implement it later:

    def get_cup():
        """Get cup and make it ready for tea"""
        # the logic might be sth like this:
        # check the cupboard for clean cups
        # make sure it's not a coffee cup
        # if no clean cups, wash one
        pass  # TODO: implement later

The advantage is that you have a runnable program at all times. You can add code and immediately test it. If you don't have a runnable program, you can't test your code, so you don't know if it's doing the right thing.

Self-test:
a. What is the difference between positional arguments and keyword arguments (named arguments)?
b. How is the pass statement useful?

5 Default argument values

When you define a function, you can provide default argument values. This allows optional arguments: If the function is called without this argument, the parameter is initialized with its default value. Parameters with default values must be specified after the parameters without default values.
    def validate_base_sequence(base_sequence, RNAflag=False):
        """return True for valid nucleotide sequence"""
        seq = base_sequence.upper()
        return len(seq) == (seq.count("U" if RNAflag else "T")
                            + seq.count("C") + seq.count("A") + seq.count("G"))

    validate_base_sequence("AGCTG")
    validate_base_sequence("AGCUG", True)          # possible, but ugly
    validate_base_sequence("AGCUG", RNAflag=True)  # much better

- Required arguments are arguments that must be passed to the function
- Optional arguments are arguments that have a default value and can be omitted from the function call
- Functions should behave in the way one would expect ("principle of least surprise"). This is how built-in functions behave, and this is how you should set the defaults for your own functions
- Many functions/methods have optional arguments that provide useful modifications to what the function does. Read the documentation so that you don't reinvent the wheel

1. Split the string "this is a test string" on the first two spaces (Hint: str.split?)
2. Some functions like print accept an unlimited number of arguments and additional optional arguments. Which arguments of print are optional arguments? What are their default values?
3. Example 1: What will be the output of the code?
4. Example 2: How many optional arguments does the function have?
5. Example 2: How many positional/named arguments are in the function call?

    # Example 1
    print("shu", "bi", "du", sep="?", end="MOO")
    print("hooray!")

    # Example 2
    def send(message, to, cc=None, bcc=None):  # function definition
        pass

    send("Hello", to="World", bcc="Cthulhu", cc="Zeus")  # function call

Solution:
1. "this is a test string".split(maxsplit=2)
2. Optional arguments are sep, end, file and flush, default values are sep=' ', end='\n', file=sys.stdout, flush=False
3. Test it in the interpreter
4. 2 optional arguments
5. 1 positional argument and 3 named arguments

Self-test:
a. What is the difference between required arguments and optional arguments?
b. What are default argument values?
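The interplay of required and optional arguments can be tried out with a small function. This is a sketch with a hypothetical helper format_header and hypothetical parameter names (not from the lecture); it assumes a FASTA-style header where fields are joined by a separator.

```python
def format_header(seq_id, organism="unknown", sep="|"):
    """Build a FASTA-style header line (hypothetical helper)."""
    return ">" + sep.join([seq_id, organism])


# Only the required argument is passed; both defaults are used
print(format_header("gi1253402"))              # >gi1253402|unknown

# The first optional argument is overridden by position
print(format_header("gi1253402", "Ecoli"))     # >gi1253402|Ecoli

# A keyword argument lets you skip `organism` and override only `sep`
print(format_header("gi1253402", sep=";"))     # >gi1253402;unknown

# Built-ins follow the same pattern: `maxsplit` is optional in str.split
print("this is a test string".split(maxsplit=2))   # ['this', 'is', 'a test string']
```

Note that overriding sep without touching organism is only possible with a keyword argument, which is one reason why optional arguments are usually passed by name.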
6 All functions return something

Every function returns something. Each function call is evaluated to a single return value. The rest of the code can then make use of the returned value.

Implicitly: If there is no return statement, the function returns None.

    def be_loud():
        print("Yes, I'm a function!")
        print("Hello, world!")

    result = be_loud()  # bind name (variable) "result" to return value of the function
    print(result)

Explicitly: The value passed to the return statement is returned.

    def add_or_subtract_two(some_num, do="a"):
        """Add or subtract 2 to/from a number"""
        if do == "a":
            return some_num + 2
        elif do == "s":
            return some_num - 2
        else:
            return False  # something went wrong?

    result = add_or_subtract_two(17, do="s")
    print(result)

Whenever a return statement occurs, the function immediately exits and the value is returned to the caller.

A function always returns one object. To return multiple values you can use e.g. a list, but it's often done via a tuple. Tuples are similar to lists, i.e. they are sequences (ordered collections) of objects, but immutable:

    (1, 3, "tohuwabohu")  # tuple with three items (parentheses syntax)
    1, 3, "shubidu"       # it's possible to omit the parentheses in some cases

Example of function returning multiple values as tuple:

    def get_min_max(lst):
        """Returns smallest and largest item"""
        return min(lst), max(lst)  # return tuple with two items

Example of string method returning multiple values as tuple:

    "datafile.trimmed.filtered.mapped.txt".partition(".")

1. What is the difference between printing something in the function, and returning a value from the function (example in Code 1)? (hint)
2. What's wrong with the function in Code 2? Pick A, B, C or D.
   A. You should never use print in a function.
   B. Once the function gets to the return statement, it will return the value, and the remaining statements won't be executed.
   C. You must calculate the value of x+y+z before you return it.
   D. A function cannot return a number.
3.
Draw a diagram of the function calls in Code 3

    # Code 1
    def first_item(list_of_numbers):
        """You really don't need a function for this, it's just an example"""
        print(list_of_numbers)
        return list_of_numbers[0]

    # Code 2 (example from https://runestone.academy/runestone/books/published/thinkcspy/Functions/Functionsthatreturnvalues.html)
    def add_3(x, y, z):
        return x + y + z
        print("the answer is", x + y + z)

    # Code 3
    def smallest_number(list_of_numbers):
        # print(min(list_of_numbers))
        return min(list_of_numbers)

    smallest_number([9, 7, 2])

Solution:
1. If all you want is to print something, then you don't really need to return it. However, if you want to use the returned value (the output) later in your program, you can do that only if you return the object. (Every function call is an expression that evaluates to a value, remember? This is the value returned by the return statement.) Apart from that, there is a small difference in how Python behaves in an interactive session vs. executing the code as a script (see realpython.com).
2. The correct answer is "B": The code after the return statement won't be executed.
3. Diagram of function calls:

Functions can call each other

Python keeps track of function calls in the "call stack". It remembers which functions were called in which order, and when a function returns, it passes the returned value to the calling code. In this example, the function smallest_number calls the function min, which returns a value to the caller (the function smallest_number). This example doesn't make a lot of sense, because smallest_number doesn't do anything but call the min function, but it's easy to think of a function like double_smallest_number that does more.

Self-test:
a. What do functions without a return statement return?
b. What type of objects can a function return?
c. How can multiple objects be returned?
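Returning a tuple and passing return values up the call stack can be combined in one short example. This sketch reuses get_min_max from the text; the body of double_smallest_number is an assumption, since the lecture only mentions the name.

```python
def get_min_max(lst):
    """Returns smallest and largest item"""
    return min(lst), max(lst)   # one tuple object with two items


def double_smallest_number(list_of_numbers):
    """Call another function and build on its return value (assumed body)."""
    smallest, _ = get_min_max(list_of_numbers)
    return 2 * smallest


result = get_min_max([9, 7, 2])
print(result)                                # (2, 9)

# Tuple unpacking binds one name to each item of the returned tuple
smallest, largest = get_min_max([9, 7, 2])
print(smallest, largest)                     # 2 9

# Call chain: double_smallest_number -> get_min_max -> min / max
print(double_smallest_number([9, 7, 2]))     # 4

# str.partition also returns a tuple: (before, separator, after)
print("datafile.trimmed.filtered.mapped.txt".partition("."))
```

Each function in the chain returns its value to its caller, so the result of min travels through get_min_max to double_smallest_number before it is printed.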
Random number generator (xkcd.com/221)

Note that the function name is preceded by int, which determines the return type of the function. Languages that explicitly define the type of variables are called statically typed. Python is dynamically typed: A variable name can be re-bound to an object of any type, i.e. the type can change dynamically during program execution.

7 Functions allow to logically structure the code

A function is a logical unit that should do one useful thing. A well-structured program using functions substantially improves the quality of the code. It's easier to discover and correct bugs, and it's much easier to use and modify this program.

Functions:
- Improve program structure/design: code becomes more modular, easier to understand and less bug-prone
- Make code reusable: one function can be used in many places and scripts
- Reduce code duplication: code gets shorter, more robust (= less bug-prone), easier to modify
- Can be small: small functions are easier to read, write and test, and generally more useful than large and complex functions that do many things. Even 1-line functions are OK. For example, you can define the function random_base() instead of writing "TCAG"[random.randint(0,3)]
- Can call each other for complicated tasks (compare this to Bash's way of doing things: small "functions" that can be easily chained together)

General considerations for writing functions:
- Functions should do one thing ("Single responsibility principle"). If you have difficulties summarizing your function's purpose in a single sentence, you are probably doing it wrong.
- A function should either do something or return something, not both. Either it should modify the state of an object (e.g. change the value of a list or write data to a file) and return None, or it should return some information/value. Doing both will lead to confusion.
- Try to keep your functions small, ideally no more than 20 lines of code.
For example, if you have conditions (in if/else statements, while loops etc.), you can refactor them as an additional function that returns True or False.
Use descriptive function names that tell what the function does; this is better than a short, enigmatic name.
Your function should accept no more than 3-4 arguments. If you need to pass it more data, you can group this data in a collection type (like a tuple), and pass it as one argument.
Prevent code duplication: If two functions are partly doing the same thing, your logic is probably off.

Essential usage examples are given in Python Crash Course cheat sheets → "Beginner’s Python Cheat Sheet - Functions"

TODO:
1. In the paper "Ten simple rules on writing clean and reliable open-source scientific software" (Hunter-Zinck et al. 2021), read "Rule 3: Reduce complexity when possible". What is the main message of this rule?
2. Below are two suggestions for how to structure the code, Code 1 and Code 2. Which one do you prefer?

source

Solution:
1. The main message is to follow best practices for writing functions when writing scientific software.
2. Functions should only perform a single task, so Code 2 is preferable in most cases (unless it's a very simple function). If your function name contains the word "and", you can probably split it into two functions. Such program design makes the code easier to read, test and modify.

Self-test:
a. What is the single responsibility principle?
b. Is code duplication a good thing or a bad thing?

Maybe they had too many arguments

8 Functions have their own namespaces

We can define names (variables) to refer to objects (values). What happens to names that are defined within functions?
def test():
    z = 7
    print("test function says: z is", z)

def test2():
    print("test2 function says: z is ", z)

test()
test function says: z is 7

test2()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In, line 1
----> 1 test2()

Cell In, line 6, in test2()
      5 def test2():
----> 6     print("test2 function says: z is ", z)

NameError: name 'z' is not defined

Functions have separate namespaces:
The test2 function cannot "see" the z that was defined in the test function
The namespace of test has a variable z, the namespace of test2 doesn't

A namespace is a collection of names mapping to objects (Python tutorial). Modules, classes and functions have their own namespaces.

Names live in the namespace where they were defined:
Built-in types like int, str etc. and built-in functions like len, abs etc. live in the built-in namespace, accessible from everywhere
Each module/interactive interpreter has its own global namespace. Every name defined outside a function or class is global.
  Each imported module has its own global namespace. E.g. import random; random.random refers to the random function defined in the global namespace of the random module
When you execute a function, a new local namespace is created, and deleted when the function returns. All names defined in this function live in its local namespace.

There is no relation between names in different namespaces. This allows you to create a name in a function without worrying that it will clash with an existing name.

[Additional information] The global and local namespaces are implemented as dictionaries. A dictionary is a collection of (key, value) pairs, where each key maps to a value. The global and local namespaces are dictionaries where the keys are variable names and the values are objects. These dictionaries can be inspected using the built-in functions locals() and globals() (Python docs), which is useful for debugging (Stackoverflow).
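To make the dictionary view of namespaces more concrete, here is a small sketch using the built-in functions locals() and globals() mentioned above. The names show_local_names, label and answer are made up for this illustration and do not appear in the lecture code.

```python
def show_local_names():
    """Return a copy of this function's local namespace."""
    z = 7
    label = "local"
    # locals() returns a dictionary that maps local names to objects
    return dict(locals())


local_names = show_local_names()
print(local_names)  # {'z': 7, 'label': 'local'}

# Names defined outside of any function live in the global namespace.
answer = 42
print("answer" in globals())  # is the name `answer` a key of the global dictionary?
print("z" in globals())       # is `z` visible here, outside of the function?
```

The two membership checks are left without expected output on purpose: make a hypothesis first, then run the code (the result for z depends on whether a global z was defined earlier in your session).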
The scope determines the order in which the namespaces are searched to resolve a name

8.1 The scope defines the visibility of names

So, different name↔object mappings exist in different namespaces, thus avoiding potential name collisions. But what happens if you reference the name x somewhere in your program, and x exists in multiple namespaces?

The variable scope refers to the level in the namespace hierarchy that is searched for the variable name, and defines which namespaces will be searched in what order. The search is first performed in the local namespace. In case of a hit, the variable is said to have "local scope". Otherwise, the search then continues in the closest enclosing namespace, moving outwards until it reaches the module's global namespace, before moving on to the built-in namespace (referencing Python's predefined functions and constants, like len and abs), which is the end of the line.

“Scope” are the rules for finding name bindings. "Namespaces" are a way for implementing scope. Think of namespaces as a way to organize code so that names don’t clash, while the scope answers the question: “For this piece of code here, if I wrote some name, which value would it resolve to?” (quora.com)

To better understand how this works, we can perform experiments on a toy example to test our assumptions. (Restart the kernel to remove any previous user-defined names.) What will be the output of each of the code cells below? Make a hypothesis before you execute the code.

# IPython magic: remove all user-defined names
%reset -f

def namespace_testfunction():
    a = 6
    print("namespace_testfunction: a =", a)

def test2():
    print("test2: a =", a)

namespace_testfunction()
#test2()  # won't work
namespace_testfunction: a = 6

We know that test2 can't access the name a, because it's local to another function. What about access from the global namespace?

print(a)  # can `a` be accessed from global namespace?
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In, line 1
----> 1 print(a)  # can `a` be accessed from global namespace?

NameError: name 'a' is not defined

Not possible. The search for names is performed in the current namespace, and then in the outer namespaces. OK, can we modify a global name in the function?

a = 5                     # bind `a` in global namespace
print("a =", a)           # access name in global namespace
namespace_testfunction()  # will the local `a` override the global `a`?
print("a =", a)           # did the function call change the global namespace?
a = 5
namespace_testfunction: a = 6
a = 5

No, this is not possible. In the function, the new name binding creates a new name/object pair with local scope that shadows the same name in the global namespace. (Some objects like lists and dictionaries are mutable, which means that they can change their value. Mutable objects can be modified in functions, if the mutable object is modified in place, instead of creating a new name binding. For details, see next lecture.)

Can we at least access the global name in a function?

test2()  # can we access a global variable within a function?
test2: a = 5

Yes, because the search for names is performed in the current namespace, and then in the outer namespaces.

We see that:
Every function has its own local namespace.
A name defined in a function lives in its namespace and is local to the function (it has local scope).
The a that lives in the function is different from the a that lives in the global namespace (global variable).
The global name can be referenced from any function.
A name in the local scope can shadow the same name in an enclosing scope. ("When a name is used in a code block, it is resolved using the nearest enclosing scope. The set of all such scopes visible to a code block is called the block’s environment", Python docs)

Here is a pitfall that you should know about. What will be the output of the code below?
(Make a hypothesis before you execute the code.)

def namespace_testfunction():
    a = a + 1
    print("in function", a)

a = 5
print(a)
namespace_testfunction()
print(a)
namespace_testfunction()
5
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In, line 7
      5 a = 5
      6 print(a)
----> 7 namespace_testfunction()
      8 print(a)
      9 namespace_testfunction()

Cell In, line 2, in namespace_testfunction()
      1 def namespace_testfunction():
----> 2     a = a + 1
      3     print("in function", a)

UnboundLocalError: cannot access local variable 'a' where it is not associated with a value

Weird, why do we suddenly get an error? Due to the assignment a = ... in the function, Python assumes that we intend to use a as a local variable. Knowing this, the lookup for a in the line a = a + 1 is performed only in the local namespace, where a is not defined!

8.2 The global keyword lets you assign to global scope

What will be the output of the code below? (Make a hypothesis before you execute the code.)

def namespace_testfunction():
    b = a + 1
    print("in function", a, b)

a = 5
b = 10
print(a, b)
namespace_testfunction()
print(a, b)
namespace_testfunction()
5 10
in function 5 6
5 10
in function 5 6

As expected, we can reference a global name in the function, but a new name binding has local scope. In very rare cases you may want to change this behavior, because you need to assign to a global variable in a function. Python has a special keyword, global. If a name is declared global, it will be interpreted as a global variable, and re-binding this name (assigning to this variable) will happen in the global scope.
def namespace_testfunction():
    global b
    b = a + 1
    print("in function", a, b)

a = 5
b = 10
print(a, b)
namespace_testfunction()
print(a, b)
namespace_testfunction()
5 10
in function 5 6
5 6
in function 5 6

This is rarely used, because it can lead to bugs that are hard to track down.
Recommendation: Don't use global unless you know exactly what you're doing.

[Additional information] Python 3 introduced a new keyword nonlocal, which is used even more rarely than global. It lets you perform name bindings in an outer (but non-global) scope. This is relevant in case of nested function definitions, if an inner function wants to modify a name binding in the enclosing function scope (Stackoverflow).

Self-test:
a. What is a namespace?
b. What namespaces do you know?
c. What is variable scope?
d. What are the rules for name resolution in Python?

source

1. 10
2. 20
3. NameError exception
4. None of the above

9 Exceptions are errors that occur at runtime (during program execution)

The function below returns False if called with an unknown argument:

def add_or_subtract_two(some_num, do="a"):
    """Add or subtract 2 to/from a number"""
    if do == "a":  # add
        return some_num + 2
    elif do == "s":  # subtract
        return some_num - 2
    else:
        return False  # Inform user that something went wrong

result = add_or_subtract_two(17, do="r")
print(result)
False

But is this a good idea? The user might erroneously think that False is a valid result calculated by the function. However, we didn't use the function as intended.
If something unintended happens, it's better if the code breaks with an error to show that something went wrong:

def add_or_subtract_two(some_num, do="a"):
    """Add or subtract 2 to/from a number"""
    if do == "a":
        return some_num + 2
    elif do == "s":
        return some_num - 2
    else:
        raise Exception("Sorry, no idea what to do")  # Inform the user that something went wrong

result = add_or_subtract_two(17, do="r")
print(result)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In, line 10
      7     else:
      8         raise Exception("Sorry, no idea what to do")  # Inform the user that something went wrong
---> 10 result = add_or_subtract_two(17, do="r")
     11 print(result)

Cell In, line 8, in add_or_subtract_two(some_num, do)
      6         return some_num - 2
      7     else:
----> 8         raise Exception("Sorry, no idea what to do")

Exception: Sorry, no idea what to do

Now the user can be completely sure that something went wrong.

Two types of errors can prevent program execution:

1. Syntax errors (parsing errors) are perhaps the most common kind of complaint you get while you are still learning Python:

>>> while True print("Hello world")
  File "<stdin>", line 1
    while True print("Hello world")
               ^
SyntaxError: invalid syntax

The parser repeats the offending line and displays a little arrow pointing at the earliest point where the error was detected. The error occurred nearby: in this example, it is detected at print(), since a colon (:) is missing before it. File name and line number are printed so you know where to look in case the input came from a script.

2. Exceptions. Even if a statement or expression is syntactically correct, it may cause an error during execution (at runtime). Such errors are called exceptions; they disrupt the normal flow of the program.
When a Python script encounters a situation that it cannot cope with, it raises an exception. An exception is a Python object that represents an error.
There is a limited number of exception subtypes, all of which have a name (like every Python object).
The raise statement allows the programmer to force a specified exception to occur (Python tutorial); this is useful in advanced software development.

Example:

1/0                      # do you think this'll work?
raise ZeroDivisionError  # command to generate ("raise") an exception
raise ZeroDivisionErr    # to demonstrate that `ZeroDivisionError` is an object name (also try `type(ZeroDivisionError)`)

9.1 Catching exceptions

In case of an unhandled exception, the program terminates.
We rarely need to raise exceptions (unless it's advanced software development), but we sometimes need to handle exceptions, i.e. tell the program what to do if an exception occurs (Python tutorial).
The try... except statement specifies what happens if an exception occurred in the try block:

try:
    print("Hi")
    add_or_subtract_two(17, do="r")
    print("Bye")
except:
    print("Sorry, this doesn't work (yet?)")

# Continue as usual
print("Yay, we are still live!")
Hi
Sorry, this doesn't work (yet?)
Yay, we are still live!

This is called catching an exception: We tell explicitly how to handle this situation.
It is often preferable to catch only specific exception subtypes.
E.g., if you expect that a division by zero can occur in your program:

my_list = [3, 4, 0]

try:
    my_list[0] / my_list[2]
except ZeroDivisionError:  # specifies the exception subtype
    print("Division by zero, this can happen here.")

print("Continue program execution.")
Division by zero, this can happen here.
Continue program execution.

This code catches the division-by-zero situation, but not other types of problems, e.g.
if the list is too short or contains a wrong type ("TypeError"):

my_list = [3, 4, "hello"]

try:
    my_list[0] / my_list[2]
except ZeroDivisionError:
    print("Division by zero, never mind.")

print("Continue program execution.")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In, line 4
      1 my_list = [3, 4, "hello"]
      3 try:
----> 4     my_list[0] / my_list[2]
      5 except ZeroDivisionError:
      6     print("Division by zero, never mind.")

TypeError: unsupported operand type(s) for /: 'int' and 'str'

What is the exception type in case the list is too short, and the item can't be accessed?

The except clause may specify a variable after the exception name. The variable is bound to an exception object, which allows you to get detailed information about the error without disrupting the program flow:

my_list = [3, 4, "hello"]

try:
    my_list[0] / my_list[2]
except TypeError as e:  # binds name "e" to exception object
    print("No can do:", e)

print("Continue program execution.")
No can do: unsupported operand type(s) for /: 'int' and 'str'
Continue program execution.

[Additional information] The try... except statement has an optional else clause. It is useful for code that must be executed if the try clause does not raise an exception. This is better than adding more code to the try clause, because it avoids accidentally catching an exception that we didn't intend to catch. The try... except statement is only used to protect code where we expect that things can go wrong in a particular way. (Python tutorial)

my_list = [3, 4, 6]

try:
    print("This may fail:", my_list[0] / my_list[2])
except ZeroDivisionError:
    print("Division by zero, never mind.")
else:
    print("This executes only if there was no exception:", my_list[2] / my_list[0])

print("Continue program execution.")
This may fail: 0.5
This executes only if there was no exception: 2.0
Continue program execution.
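The pieces from this section (raising an exception, catching only a specific subtype, binding it with as, and the else clause) can be combined in one short sketch. Two things here are assumptions for illustration and not part of the lecture code: the function raises ValueError instead of the generic Exception, and the helper describe_result is a made-up name.

```python
def add_or_subtract_two(some_num, do="a"):
    """Add or subtract 2 to/from a number."""
    if do == "a":
        return some_num + 2
    elif do == "s":
        return some_num - 2
    else:
        # Assumption: ValueError replaces the generic Exception used in the
        # lecture, so that callers can catch exactly this kind of problem.
        raise ValueError(f"unknown operation: {do!r}")


def describe_result(some_num, do):
    """Hypothetical helper: return a message instead of terminating."""
    try:
        result = add_or_subtract_two(some_num, do=do)
    except ValueError as e:  # catches ValueError only, nothing else
        return f"No can do: {e}"
    else:
        # Runs only if the try block raised no exception
        return f"Result: {result}"


print(describe_result(17, "a"))  # Result: 19
print(describe_result(17, "r"))  # No can do: unknown operation: 'r'
```

Keeping only the risky call inside the try block, and the follow-up code in else, avoids accidentally catching an exception that comes from somewhere else.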
9.2 Python error messages: Tracebacks

When your program results in an exception, Python prints the current traceback to tell you what went wrong.
A traceback is a report containing the sequence of function calls made in your code at a specific point. (What Python calls traceback is also known by names like stack trace, stack traceback or backtrace.)
Reading and understanding the traceback provides helpful information about what went wrong.

Some interesting exceptions you saw (or will see):
NameError: name 'abcd' is not defined
IndexError: string index out of range
SyntaxError: invalid syntax
ZeroDivisionError: division by zero
AttributeError: 'str' object has no attribute 'isuper'

There are a number of useful built-in exceptions that cover many different situations.
Error messages tell you exactly what went wrong. It's much worse if something goes wrong silently, and you get a wrong result.

Self-test:
a. What are exceptions?
b. What is exception handling?
c. What are tracebacks?

10 Development environment

Text editors are OK for simple scripts, but IDEs (integrated development environments) offer helpful options for code development. Many code editors can be extended by plugins to have IDE-like capabilities. Most IDE features are based on "understanding" the code.

IDEs offer options like:
Syntax highlighting
Autocompletion
Real-time code analysis: linting, code style analysis
Project management (overview over all files that make up a project)
Debugging
Dependency checks, testing assistance etc.
Some Python-specific or Python-aware IDEs are (recommendations in bold):
Jupyter notebooks/JupyterLab + plugins – "JuPyteR" stands for Julia, Python and R, but supports a multitude of other languages ("kernels"); well-suited for exploratory data analysis, reproducible workflows and presentations (nature.com)
Code editor + plugins:
  VS Code + Python plugin – very popular, extensible via plugins; Jupyter notebooks support, many useful plugins like real-time collaboration
  Sublime, vim, Atom, Notepad++ (on Windows),...
Python-specific IDEs:
  Spyder – Python-specific IDE for scientific programming, e.g. real-time code analysis/linting, part of Anaconda distribution (GitHub)
  PyCharm (Community edition) – complete Python-specific IDE, very popular, recommended for software development
  Others: Wing IDE, Eric, etc.
Universal IDEs: Eclipse (+ PyDev plugin), Netbeans, etc.
Non-GUI text editor like nano or vim for quick fixes (vim can be customized via plugins)
Cloud-based IDEs (may become even more important in future): Codespaces, Google Colab, Kaggle,...

It's worthwhile to spend some time learning the abilities of an IDE, to make code development easier and more efficient. Note that while Jupyter is great for exploratory data analysis, there are downsides to it for software development, for which Spyder or VS Code are better suited (both are a good choice).

10.1 Some useful VS Code options/extensions

Ruff linter (also see VS Code docs)
Black formatter (Ruff also includes a formatter and may replace Black in the long run)
Code spell checker
Markdownlint
Remote SSH
File Utils

VS Code has many options, and it's impossible and unnecessary to learn them all at once. Start somewhere, and move further when you have the chance. Examples:
Commenting in/out, indenting/unindenting blocks of code
Right mouseclick on a function call → "Go to definition"
Code outline explorer (Explorer → Outline)
Comment lines with # TODO, # FIXME etc.
are highlighted (of course, there is also an extension...)

10.2 Some useful Jupyter options

Jupyter was designed as an open platform and makes it easy to extend its functionality via extensions. JupyterLab and Jupyter notebooks are two different components of the Jupyter ecosystem, and both have their own extensions.

Jupyter notebook extensions are installed as one single package. The question which extensions are useful comes down to personal preferences. Examples:
Hinterland – enable code autocompletion menu for every keypress in a code cell
Variable Inspector – show all variables and their type, size, shape, and value
Codefolding – add code folding capability
Autopep8 – reformat code to comply with the PEP8 guidelines

Other possibly interesting extensions:
Scratchpad – an additional cell that acts like a scratch notebook where you can execute code without having to modify your notebook
Spellchecker – highlight incorrectly spelled words in markdown cells
Snippets – create code snippets and examples of code to insert into your notebook
Collapsible Headings – organize your notebook into sections that can be collapsed
Table of Contents 2 – easier navigation
zenmode – remove the menu to focus on the code

JupyterLab was designed to replace Jupyter notebooks in the long run. It has a built-in extension manager, which you can use to search, install or disable extensions. Some functionality is built-in, e.g. autocompletion, table of contents and the debugger. Examples of extensions:
Variable inspector
Code formatter
Spellchecker
Jupyterlab-LSP – makes JupyterLab look a lot more like an IDE; looks promising, but is not yet integrated with JupyterLab (towardsdatascience.com, jupyter.org, jupyterlab-lsp.readthedocs.io)
Jupytext, nbdime,... more extensions

Real programmers (xkcd.com/378)

10.3 Managing Python files

Create a Python projects directory in your home folder (or use e.g. the Documents folder)
Create subdirectories for each project, e.g.
~/python/proj1, ~/python/proj2 etc.
During development you can create file versions (with names like test01.py, test02.py etc.) to be able to go back to earlier versions
Regularly back up your data according to the 3-2-1 strategy
