Python Data Types - First Part PDF
FH Wiener Neustadt
2024
Dmitrij Turaev
Summary
This document introduces Python programming, focusing on data types and name bindings. It covers basic concepts like sequence, decision, and repetition. Examples and solutions for problems are included. This document is part of a Bio Data Science course.
Full Transcript
Grundlagen der Programmierung in Python (Fundamentals of Programming in Python)
Bio Data Science, 2024
Dmitrij Turaev

Hello, Python: Data types and name bindings

Contents
1 Programming is based on a few simple concepts
2 A brief history of Python
3 Python is a high-level programming language
4 The interactive interpreter lets you run Python code interactively
5 Know your interpreter
6 Expressions are values, statements are code
7 First Python program: "Hello, world"

1 Programming is based on a few simple concepts

Programming means writing a set of instructions that tell the computer how to solve a problem, i.e. how to convert inputs into outputs. This is similar to writing a cooking recipe that tells you how to convert food products (the input) into a tasty dish (the output).

- First, you need to clearly define what you want to do (what the inputs and outputs are).
- Then you can think about how to do it (give precise instructions for converting inputs into outputs). You can write this as pseudocode first (this can be done on paper) and implement it as program code later.
- The key is to break up a large problem into smaller problems that are easy to solve.

All computer programs can be made from a few simple ideas:

1. Sequence – a list of instructions executed one after another
2. Decision – if a condition is true, then outcome A, else outcome B
3. Repetition – repeat instructions a given number of times, or until something happens

[Figure: Main control flow constructs (T = True, F = False)]

Here is an example.

Problem: Given two strings ransomNote and magazine, return true if ransomNote can be constructed using the letters from magazine, and false otherwise. Each letter in magazine can only be used once in ransomNote.
Input: string ransomNote, string magazine
Output: true or false

Example 1:
    Input: ransomNote = "aa", magazine = "aab"
    Output: true

Example 2:
    Input: ransomNote = "aa", magazine = "abb"
    Output: false

Example 3:
    Input: ransomNote = "i love you", magazine = "the quick brown fox jumped over the lazy dog"
    Output: true

Give step-by-step instructions for solving this problem.

Solution: Here is one possible approach:

    For each letter in ransomNote:
        For each letter in magazine:
            If it's the same letter, delete it from magazine, and stop the inner loop
        If the letter wasn't found, return "false" and stop
    Return "true" and stop

There are two nested loops here, because for each letter in ransomNote you need to look through possibly all letters of magazine until a match is found. If it's found, you stop the inner loop, go to the next letter of ransomNote (the outer loop), and then again search possibly all letters of magazine, etc.

If you apply this algorithm in your mind to the examples above, you'll see that it works. That's great! But you probably noticed that we repeated the same operation (searching all letters of magazine, the inner loop) many times (as many times as there are letters in ransomNote). Let's see if we can do better. Here is another approach:

    For each letter in magazine:
        Count how often it occurred (save the count in a count table)
    For each letter in ransomNote:
        If it doesn't occur in the count table, or its count is 0, return "false" and stop
        Otherwise, reduce its count by 1
    Return "true" and stop

Again, there are two loops, but they are not nested. Instead, you go once through all letters of magazine, and then you go once through all letters of ransomNote. This is a much more efficient solution: assuming that looking up a letter in the count table takes roughly constant time, the work grows with the sum of the two string lengths, whereas in the nested version it can grow with their product.

2 A brief history of Python

Python was created around 1990 by Guido van Rossum, a Dutch programmer. It was strongly influenced by several preceding languages.
I know what you're thinking: Guido loved snakes. That's not true, though: Guido loved the British TV show Monty Python's Flying Circus. The official Python documentation claims that it helps if you also like Monty Python.

- Python 2.0 was released in 2000. The final version of Python 2 was Python 2.7.18, released in 2020. Some people are still using it, but there will be no further Python 2 releases; development has focused completely on Python 3.
- Python 3.0 was released in 2008. It introduced several changes that made it backwards incompatible (Python 2 code didn't work with Python 3). Some of the core functionality was changed to make the language more consistent and future-proof. Not all software projects survive such a breaking point; Python did. All important libraries have been ported to Python 3.
- Python has a regular release cycle (PEP 602). New Python 3 versions introduce new features, but remain largely backwards compatible.
- Python development is based on PEPs, Python Enhancement Proposals. "A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment" (PEP 1).
- Currently, Python is probably the most popular scripting language for data science and machine learning. There are several interesting competitors, but they haven't caught up yet.

[Figure: Wrong Python version?]

3 Python is a high-level programming language

From the program in its human-readable form of source code, a compiler or assembler can derive machine code, a form consisting of instructions that the computer can directly execute. Alternatively, a computer program may be executed with the aid of an interpreter. (Wikipedia: Computer program)

In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer.
In contrast to low-level programming languages, it may use natural language elements, be easier to use, or may automate (or even hide entirely) significant areas of computing systems (e.g. memory management), making the process of developing a program simpler and more understandable than when using a lower-level language. The amount of abstraction provided defines how "high-level" a programming language is. (Wikipedia: High-level programming language)

The reality is slightly more complicated, and the distinction between compiled and interpreted languages is not always clear. For example, Python is considered an interpreted language. However, Python code is compiled to bytecode before execution. The bytecode is a low-level, platform-independent representation of your source code. Python scripts have the file extension .py, and bytecode files have the file extension .pyc. The bytecode is then sent for execution to the Python Virtual Machine, which is specific to the target machine.

The default implementation of the Python language engine (the program that runs Python programs) is CPython, which is written in the C programming language (Stackoverflow). CPython compiles the Python source code into bytecode, and this bytecode is then executed by the CPython virtual machine.

[Figure: High level vs. low level languages]

Tradeoff: Efficiency (execution speed) ⟷ Readability (high level of abstraction)

Self-test:
a. What is the difference between high-level and low-level languages?
b. What is the difference between interpreted and compiled languages?

[Figure: High level or low level?]

4 The interactive interpreter lets you run Python code interactively

- Start the interactive Python interpreter by entering python in the terminal.
- Use quit(), exit() or Ctrl + D to exit the interpreter.

Which Python version are you using?

Solution:

    $ which python
    $ python --version

A REPL (Read-Evaluate-Print Loop) is an interactive way to execute Python code. You just type your commands and hit return for the computer to evaluate them.
The Python interpreter:

1. Reads the user input (your Python commands)
2. Evaluates your code (to work out what you mean)
3. Prints any results (so you can see the outcome)
4. Loops back to step 1 (to continue the conversation)

Programmers use the REPL to execute pieces of code, often to test ideas and explore problems. Because of the instant feedback you get, the REPL makes it easy to explore all the functionality that Python has to offer.

The interpreter tells you it's waiting for instructions by presenting you with three chevrons (>>>). The easiest way to use Python is as a calculator, to add, multiply, subtract and divide numbers. Type something and see what happens.

    >>> 4 * 5   # multiplication
    20
    >>> 2 / 3   # division (also try this in Python 2)
    0.6666666666666666
    >>> 2 ** 3  # exponent
    8
    >>> 7 % 2   # modulo (remainder of the division)
    1

What about the square root? Additional functionality is often outsourced into modules (the generic term is "library"). Modules have to be imported to access this functionality:

    >>> import math     # import the math module
    >>> math.sqrt(16)   # function "sqrt" from the math module calculates the square root
    4.0

- A module is a file with Python code. It contains definitions (functions, classes and variables) and statements (instructions).
- The module name is the file name without the file name extension .py.
- Many modules are provided with Python as part of the standard library. Some of them are written in C ("built-in" modules), most are written in Python.
- Many more third-party modules can be installed additionally.

We just imported a Python module, called a function from this module and passed it an argument. That's pretty advanced!

Let's try something else:

    >>> "abc" + 3
    >>> "abc" + "def"
    >>> "abc" - "def"
    >>> "abc" * 3

We can see that numbers and strings behave in different ways. How does Python know which is which? Every object in Python has a type. (Bash lacks this, which is one of the reasons why its capabilities are very limited.)
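The four string expressions above can also be run in one go. The following is a minimal sketch, not part of the original notes: the helper name evaluate and the dictionary results are made up for illustration, and the lambda and try/except syntax it relies on has not been introduced yet. It records, for each expression, either the resulting value or the fact that Python raised a TypeError.

```python
def evaluate(operation):
    """Hypothetical helper: call `operation` and report its value or its TypeError."""
    try:
        return ("ok", operation())
    except TypeError as error:
        return ("TypeError", str(error))


# Each lambda wraps one expression so that it is only evaluated inside evaluate().
results = {
    '"abc" + 3': evaluate(lambda: "abc" + 3),          # str + int is not defined
    '"abc" + "def"': evaluate(lambda: "abc" + "def"),  # str + str concatenates
    '"abc" - "def"': evaluate(lambda: "abc" - "def"),  # str - str is not defined
    '"abc" * 3': evaluate(lambda: "abc" * 3),          # str * int repeats the string
}

for expression, (status, detail) in results.items():
    print(f"{expression:15} {status}: {detail}")
```

Two of the four expressions succeed and two fail, and which ones fail depends only on the types of the operands, not on their particular values.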
In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support common data types of real, integer and boolean. ... This data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored. (Wikipedia: Data type)

The Python function type returns the type (= class) of an object:

    >>> type("abc")   # determine type of object "abc"
    >>> type(123)
    >>> type(True)
    >>> type(None)
    >>> type(Nothing)

1. How many arguments does the function type accept?
2. How many arguments does the function sqrt from the math module accept?

Solution:

1. In the example above, type accepts one argument and returns the object type. Functions may accept arguments, and always return something (None if nothing else). An argument is an object that you pass to a function.
2. math.sqrt accepts one argument.

Here are some types we've seen so far:

- int – integer
- float – floating point; examples of scientific notation with base/exponent: 2.34e5, 2e-3
- str – string; " or ' enclose strings, triple quotes enclose multiline strings
- bool – boolean, only two possible values: True and False
- None – special "NoneType", only one possible value: None ("Explicit is better than implicit", Zen of Python)

None is used when you want to explicitly state that something exists but doesn't have a value. Many other programming languages call it NULL. This is sometimes useful; you'll see examples later.

What is the result type of the following expressions?

    2 + 1
    2.0 + 1
    "abc" * 2
    True and False

Solution: Find out using the type function, e.g. type(2 + 1) or type('abc' * 2). This works because first the expression (e.g. 2 + 1) is evaluated, and then the result (3) is passed to the type function, so that it receives one argument. You could also write type(3) or type("abcabc") instead.
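The exercise above can be checked in a single run. This is a small sketch rather than the course's reference solution; the variable names values and type_names are chosen here for illustration. Each expression is evaluated first, and only the resulting object is passed to type.

```python
# The expressions are evaluated when the list is built,
# so the list holds the resulting objects: 3, 3.0, 'abcabc', False and None.
values = [2 + 1, 2.0 + 1, "abc" * 2, True and False, None]

# type(obj).__name__ gives the short name of the object's type as a string.
type_names = [type(value).__name__ for value in values]

for value, name in zip(values, type_names):
    print(repr(value), "has type", name)
```

Note that 2.0 + 1 gives a float: when an int and a float are combined, the result is a float.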
Another function tests if an object has a particular type:

    >>> isinstance("abc", str)   # test if an object is an instance of a particular type/class

Almost everything in Python is an object. How can you tell?

- Easy: If you pass the object as an argument to the type function, and type doesn't throw an error, then it's an object.
- Try it: type(math). You can see that the imported module is represented by an object.

Self-test:
a. What is a Python module? Example?
b. What property does every Python object have?
c. What is a data type?
d. What Python types do you know so far?
e. Which functions provide information about the object type?

[Figure: If toilet paper was a data type]
[Figure: Difference between 0, None and not defined]

5 Know your interpreter

The interactive interpreter is very useful for running a few lines of code, e.g. performing small calculations and testing code snippets.

Like the Bash shell, the Python interpreter knows several keyboard shortcuts:

    ↑ / ↓      → scroll in command history
    Ctrl + A   → go to beginning of line
    Ctrl + E   → go to end of line
    Ctrl + →   → jump one word to the right
    Ctrl + ←   → jump one word to the left
    Ctrl + K   → delete rest of line after cursor
    Ctrl + _   → undo
    Ctrl + R   → search command history

and, of course: Tab for autocompletion.

IPython is an interactive Python shell with additional abilities. The project goal was to create a comprehensive environment for interactive and exploratory computing. A part of the project, most importantly the notebook and related tools, was split off into a separate project, Jupyter.

You can run IPython from the command line by entering ipython in an environment where it's installed. The IPython kernel (the program that runs and introspects the user's code) is also used by tools like Jupyter (jupyterlab.readthedocs.io) and Spyder.

You can always use IPython instead of Python. It has several advantages:

- Syntax highlighting: e.g. functions are green, strings are orange, and module imports are blue.
- Autocompletion: You remember the sqrt function from the math module? Wondering what other functions math has? Type math. and press Tab. When you go through the functions, IPython also tells you how many arguments they accept.
- Copy/paste of multi-line code: This is not reliably possible with the regular Python interpreter.

Here is another of IPython's superpowers: Any command that works in Bash can be used in IPython by prefixing it with the ! character.

    !which python
    !python --version
    !which ipython
    !ipython --version

IPython also has a number of "magic commands" (IPython magics), which are sometimes useful. E.g.:

- %paste or %cpaste – let you paste multi-line code snippets (if regular copy-paste doesn't work as intended)
- %lsmagic – lists all available magic functions
- %quickref – shows a quick reference sheet

The regular Python interpreter has three chevrons (>>>) as prompt, while IPython uses a numbered prompt (In [1]:). Many code examples in this and other chapters use the >>> prompt by convention. You can execute them in IPython.

[Additional information]

JupyterLab is a web-based user interface for Project Jupyter (jupyter.org). It enables you to work with Jupyter notebooks, text editors and terminals.

Jupyter notebooks (.ipynb files) are documents that combine live runnable code with narrative text (Markdown) and output/visualizations.

A notebook kernel is a "computational engine" that executes the code in the notebook file. (In this architecture, the web interface in the browser is called the "frontend", and the kernel that runs in the background is called the "backend".) The default is the IPython kernel. It is very similar to the IPython running in the terminal, but there are a few minor differences. For example, not all output is printed by default (this can be changed, see Stackoverflow).

A notebook consists of a sequence of cells.
A code cell allows you to edit and write new code, with syntax highlighting and tab completion. The programming language you use depends on the kernel, and the default kernel (IPython) runs Python code. When a code cell is executed, the code that it contains is sent to the kernel associated with the notebook. The results that are returned from this computation are then displayed in the notebook as the cell's output. The output can be text, figures, HTML tables and more. This is known as IPython's rich display capability. (Jupyter docs)

[Figure: The Jupyter notebook interface]

In addition to running your code, Jupyter stores code and output, together with Markdown notes, in an editable document called a notebook. When you save it, this is sent from your browser to the notebook server, which saves it on disk as a text document (JSON format) with the extension .ipynb.

The notebook server, not the kernel, is responsible for saving and loading notebooks, so you can edit notebooks even if you don't have the kernel for that language; you just won't be able to run code. The kernel doesn't know anything about the notebook document: it just gets sent cells of code to execute when the user runs them.

Which Jupyter version are you using?

Solution:

    !jupyter --version

6 Expressions are values, statements are code

You will often hear the terms "expressions" and "statements". What do they mean?

- An expression is something that evaluates to a single value.
- A statement is code that does something, like assigning a variable or displaying a value.

An expression is a combination of values, variables, operators, and function calls that is evaluated to one resulting value (Stackoverflow). The interpreter evaluates expressions interactively:

    >>> 2 + 3
    5
    >>> 1 + 2 + 3 * (8 ** 9) - math.sqrt(4.0)
    402653185.0
    >>> 23   # A value all by itself is also a (simple) expression
    23
    >>> 4.0
    4.0

Operator, operand, literal

An operator is a symbol that represents an action.
It tells the interpreter to perform a specific mathematical or logical operation on the operands and produce the final result. The operands are the values the operator acts on: in 2 + 3, + is the operator, and 2 and 3 are the operands.

A literal is the literal notation for representing a fixed (constant) value, e.g. 42, 3.14, 1.6e-10 (numeric literals), or "Hello, world" (string literals). If you need a number in your code, and you are not reading it from a file, or from the keyboard, or from a database, or calculating it, or importing it from a module, then you can use a numeric literal.

More examples of expressions:

    >>> min(2, 22)   # function calls are always evaluated to one value
    2
    >>> max(3, 94)
    94
    >>> round(81.5)
    82
    >>> math.pi * 2
    6.283185307179586
    >>> "foo"   # every value is also an expression
    'foo'
    >>> "foo" + "bar"
    'foobar'
    >>> "abc" * 2
    'abcabc'
    >>> None   # None is a value too, but the interpreter displays nothing for it
    >>> True   # this is a value of the type "bool"
    True

If you ask Python to print an expression, the interpreter evaluates the expression and prints the result:

    >>> print(min(max(3, 10), 5))   # evaluation order: max(3, 10) → min(10, 5) → print(5)
    5

Statements are everything that can make up a line (or several lines) of Python code. Statements usually do something. Examples of statements are loops and conditionals. In Python, expressions are also considered statements. Examples:

    >>> x = 17   # Variable assignment: does not return a value, but is simply executed
    >>> if x == 17:
    ...     print("hello")
    ...
    hello
    >>> print(42)
    42
    >>> 3 + 7
    10

Self-test:
a. What is an expression?
b. What are examples of statements?
c. What is an operator? What is an operand?

7 First Python program: "Hello, world"

The interpreter is great for testing short pieces of code. Anything longer than a few lines should go in a script, to make it reproducible.

A Python script is a text file with Python instructions. It can be saved, modified and executed. The script is run by the Python interpreter as if the commands were entered line by line. There is, however, a small difference.
If you type an expression in interactive mode, the interpreter evaluates it and displays the result:

    >>> 1 + 1
    2

But in a script, an expression all by itself doesn't do anything! You need to print the value explicitly using the print function.

1. Open a new file in a text editor (VS Code, Spyder, JupyterLab editor, vim, ...)
2. Enter these lines:

    #!/usr/bin/env python
    print("Hello, World!")

3. Save the file under an expressive file name, for example hello_world.py
   - The file extension .py is optional, but strongly recommended.
   - General rules for file names:
     - Only alphanumeric characters and _ , - and . (no umlauts, no spaces, no special characters!)
     - English names (you never know who is going to read it later)
     - Descriptive names: e.g., guess_what.py is not as good as calculate_protein_properties.py
4. Run the program in a terminal:

    python path/to/hello_world.py

   If the script is in your current directory:

    python hello_world.py

You should see the output of your first program! Although it's pretty short, it's a fully valid program. Congrats, you're officially a Python programmer now.

TODO:

1. What is the difference between the shebang lines #!/usr/bin/env python and #!/usr/bin/python? (Hint)
2. What is required to be able to run the program using the syntax $ path/to/hello_world.py?
3. How many arguments did you pass to the print function?
4. Does the print function accept more than one argument? Try it in the interpreter, then in your script. (If multiple arguments are passed to a function, they are separated by commas; e.g. min(2, 22) → 2 arguments were passed to the min function.)
5. Add some more print calls that output the values of some expressions/functions of your choice. E.g., what will print(min(2, 22)) do?

Solution:

1. First case: You want to use the currently active python executable, as given by $ which python. Second case: You want to use one specific Python executable at a fixed path, here /usr/bin/python.
2. Make it executable: $ chmod +x hello_world.py. The script also needs a shebang line as its first line, so that the shell knows which interpreter to use.
3. One argument.
4. Yes, it accepts an unlimited number of arguments. (If you look at the help message of the print function, which you can do by typing print? in IPython, you'll see that it says print(*args, sep=' ', end='\n', file=None, flush=False). The notation *args signifies an unlimited number of arguments; the rest are optional arguments, similar to command options in shell commands.)
5. It will evaluate the expression and print the resulting value.

Important note

You should be very clear on the distinction between a Python script and a Jupyter notebook (also see this blog post for details).

A Python script is a plain text file with Python code and the file extension .py. It can be opened and edited in any text editor. It is run in the terminal from the Bash command line, and the standard output also goes to the terminal (unless redirected to a file). When you run the script, it is executed completely from top to bottom. This is the traditional way to write and execute Python code, and the most useful approach for many applications.

A Jupyter notebook is a text file with a more complex format that distinguishes text and code cells, and includes code output and plots. It is edited in a specialized IDE like JupyterLab or VS Code. You can run code cells separately and out of order. Conceptually, this is very similar to working in the interactive IPython interpreter. Notebooks are well-suited for data science and exploratory data analysis, reproducible workflows, presentations and teaching.

Note that if an exercise explicitly asks you to write a Python script, you should write a Python script and demonstrate its execution in the terminal. Generally, you should be able to use both approaches, and pick the one that is more suited to your use case.

Self-test:
a. What are two ways of executing Python code?
b. What is a Python script?
c. What is the shebang line?
d. What is a function argument?
e. How do you print something in Python?
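The points made in this section about bare expressions and about the arguments of print can be tried out in one small script. This is a sketch under assumptions, not part of the original exercise: the names buffer and captured are invented here, and io.StringIO is used only so that printed text can be captured and inspected instead of going to the terminal.

```python
#!/usr/bin/env python
import io

# In a script, a bare expression is evaluated, but nothing is displayed.
1 + 1

# print accepts any number of positional arguments, separated by commas.
print("Hello,", "World!")       # two arguments, joined by the default sep=" "
print(min(2, 22))               # the expression is evaluated first, then printed
print("a", "b", "c", sep="-")   # sep changes the separator between arguments

# file redirects the output; here it goes into an in-memory text buffer.
buffer = io.StringIO()
print("a", "b", "c", sep="-", end="!\n", file=buffer)
captured = buffer.getvalue()    # the text that print wrote into the buffer
```

Saved as a .py file and run with python, this prints three lines to the terminal (Hello, World! then 2 then a-b-c), while the fourth print call writes into buffer instead.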