Bio Data Science 2024: Python Programming Course Intro

FH Wiener Neustadt

2024

Dmitrij Turaev




Contents
1 What is programming?
2 Why learn it?
3 Why Python?
3.1 Python is popular and widely used
4 Course goals
5 Course organization
6 General notes and additional resources
6.1 Python resources
6.2 A note on ChatGPT & Co
7 How to learn Python, or anything else?

Fundamentals of Programming in Python (Grundlagen der Programmierung in Python)
Bio Data Science, 2024
Dmitrij Turaev

Introduction and general info

1 What is programming?

Modern biology is not possible without computers. They allow us to process large datasets, access databases and apply complex computational methods within seconds. Many tasks can be done with GUI-based applications, such as apps or browser interfaces. When tasks become more complex, however, a GUI is too limiting, and knowing a programming language gives you a distinct advantage.

Essentially, a programming language is a language you can use to talk to a computer and tell it what to do.

A computer program is a collection of instructions that can be executed by a computer to perform a specific task. (Wikipedia: Computer program)

- Algorithm = a sequence of instructions to solve a defined problem or perform a computation (can be written on paper)
- Program = an implementation of an algorithm, written in a language a computer understands, such as Java or Python

2 Why learn it?

Obviously, two important reasons are that it's fun, and that other people will be impressed because they think you're a hacker. But there is more:

1. Automation of repetitive tasks. Maybe you want to rename thousands of files, download all records from a database, or filter sequences from a FASTA file? You may be able to do it manually, but it will take a really long time. If you're lucky, you may find a tool that does exactly the right thing. Or you can write a short Bash or Python script that does what you want.
2. Custom analysis of large-scale data. GUI-based applications for data analysis are limited by the available options. The more options there are, the more complex the GUI becomes.
Programming is more like a box of building blocks: it allows you to do almost anything using a small number of basic constructs.
3. Reproducible and transparent research. An interesting feature of programming code is that the code itself effectively serves as a lab notebook, describing everything that was done to the data to get the results. You can quickly reproduce the whole project simply by running the code.

In short, modern biology isn't possible without data processing, and data processing is very limited without programming.

"11th Grade" (xkcd.com/519). Python is even better.

3 Why Python?

- Powerful: can do anything you might want it to; many toolkits (libraries) for bioinformatics, data science and machine learning
- High-level language: a high level of abstraction speeds up development and increases productivity
- Elegant: intuitive syntax, easy to write and read
- Popular: large and friendly community
- Backed by corporate sponsors (e.g. Google)

Python is very versatile and used by many large companies like YouTube, Spotify, Instagram, and others. It's also great for data science applications like image processing, AI and machine learning, and all kinds of exploratory data analysis. Right now, it may be the best choice for bioinformatics and data science applications.

It's worth noting that Python is a general-purpose language, and other languages are better suited for specific tasks. Being a high-level language, Python is relatively slow compared to languages like C and even Java. In exchange, Python lets you be productive by expressing ideas quickly at a high level of abstraction.

- The C language is very fast, and time-critical algorithms are usually implemented in C
- Java might be better suited for GUI apps, mobile development and large software development projects

Another popular choice for data science applications is R, which excels in statistical modeling and visualization.
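Section 2 mentions filtering sequences from a FASTA file as a typical task to automate. The following is a minimal sketch of that task, and it also illustrates the difference between an algorithm and a program. The algorithm ("keep every record whose sequence has at least a minimum number of bases") could be written on paper. The code below is one possible implementation of it, expressed at the high level of abstraction described above. The function names `read_fasta` and `filter_by_length`, the 10-base threshold and the inline example records are hypothetical and chosen only for illustration. The parser also assumes simple, well-formed FASTA input.

```python
# Minimal sketch: filter FASTA records by sequence length.
# read_fasta, filter_by_length and the example records are hypothetical,
# chosen only for illustration; the parser assumes simple, well-formed FASTA.
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:  # finish the last record
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_length: int) -> List[Record]:
    """Keep only records whose sequence has at least min_length characters."""
    return [(h, s) for h, s in records if len(s) >= min_length]


example = """>seq1 short
ACGT
>seq2 long
ACGTACGT
ACGT
>seq3
ACGTACGTAC
"""

kept = filter_by_length(read_fasta(example.splitlines()), min_length=10)
# prints seq2 (12 bases) and seq3 (10 bases); seq1 (4 bases) is dropped
for header, seq in kept:
    print(f">{header}\n{seq}")
```

To use this on a real file, you could pass an open file handle instead of `example.splitlines()`, for example inside `with open("sequences.fasta") as handle:` (the file name is hypothetical). For real projects, an established library such as Biopython (`Bio.SeqIO.parse(handle, "fasta")`) handles FASTA parsing more robustly.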
Knowing both Python and R allows you to switch between them depending on the task at hand. Once you understand the concepts, it's also easier to learn other programming languages. Every language has its limitations, and there are always interesting competitors on the market; who knows what will happen in a few years ("Prediction is very difficult, especially about the future."). However, Python currently appears to be a future-proof choice even beyond data science (2024 update: index.dev, crossover.com).

3.1 Python is popular and widely used

GitHub (a popular service that allows developers to host their code online for free and collaborate during code development) releases an annual report with statistics about the latest developments on the platform:

"Python held steady in the second place position over the past year in large part due to its versatility in everything from development to education to machine learning and data science." (octoverse.github.com 2022, section: top languages)

Stack Overflow is a question-and-answer site for professional and enthusiast programmers. Users can upvote or downvote questions and answers and edit them, similar to a wiki or Reddit (Wikipedia: Stack Overflow). Its annual developer survey covers the latest trends and preferences among developers:

"2023 continues JavaScript’s streak as its eleventh year in a row as the most commonly-used programming language. Python has overtaken SQL as the third most commonly-used language, but placing first for those who are not professional developers or learning to code (Other Coders)." (Stack Overflow survey, 2023)

Popularity of Python-related questions on Stack Overflow
Inspect the most recent developments (popularity of programming languages by question views) here.

4 Course goals

Learning outcomes:
- Setting up a development environment (virtual machine)
- Understanding of computer-related terms
- Basics of Linux and Bash scripting
- Programming basics using Python
- Important programming concepts
- Writing Python scripts
- Programming style
- Software design
- Debugging
- Examples of important libraries

Final goal:
- Writing short scripts that perform useful tasks in your everyday biological data analysis
- A foundation to go deeper by yourself (more advanced shell and Python programming, other programming languages)

The goal of the course is to help you learn Python, explain important concepts and give you practical experience with the language. We'll spend the first few lectures on Linux & Bash, then switch to Python. Information is mostly presented as bullet points and code examples, with more extensive explanations where required.

Course features. Many good resources are available on the internet. Special features of this course are:
- Selection of topics and examples relevant for bio data science
- Compilation from different sources, not based on a single book; relies on many external resources that are freely available
- Lecture notes oriented towards life scientists
- Only important information, densely packed
- Additional information and links included
- Updated on the go: let me know about mistakes, broken links or incomprehensible passages

5 Course organization

You need:
- Your own laptop
- Internet access
- Server access (intranet or VPN)
- Linux (Ubuntu), preferably as a virtual machine (e.g. VirtualBox)

No previous programming knowledge is expected.

General:
- 6 ECTS → 150 h (1 ECTS = 25 h workload, oesterreich.gv.at)
- Final assignment: ~20 h (estimate), leaving 130 h over ca. 3 months → ~10 h/week
- Teaching materials are provided on Moodle
- Language: all written course materials in English; lectures in English or German

Grading: ILV (Integrierte Lehrveranstaltung, i.e. integrated course), no single exam. The final grade (100 points = maximum grade, 50 points = passing grade) is based on:
- 33%: Oral grade, the average of one or more brief oral examinations, e.g. presentations of home assignments. Home assignments are a crucial part of this course, and the course can't work without them. If you are asked to present a home assignment and you can't, you can use one "free ticket" (not graded); otherwise, you get 25 points for this oral examination. You may also be asked about this home assignment later, so you need to catch up on all home assignments that you missed.
- 33%: Written test at the end of the course (last lecture)
- 33%: Final home assignment after the last lecture (group work)

Failing to follow explicit specifications will lead to a point deduction in the evaluation. For example, the final project must be submitted in groups of 2-3 people, so submitting in a group of 4 costs points.

Note: The course is designed so that essentially everybody can reach 50%, but reaching 100% requires substantial effort.

Code is generally graded on three criteria:
- Correctness: how well the code fulfills the specifications and is free of bugs
- Design: how well the code is written/designed (clearly, efficiently, elegantly, and/or logically)
- Style: how readable the code is (comments, variable names, etc.)

The three criteria are weighted at roughly 60:30:10 (correctness:design:style).

During each lecture:
- Brief oral examinations (home assignments and questions)
- Handout with exercises and code examples

After each lecture:
- Handout with exercise solutions and self-test questions
- Home assignment for the next lecture

The home assignments can take substantial time, but practice is the only way to learn programming.

Collaborative work is encouraged (unless stated otherwise): it's recommended that you work on code in groups and discuss the results. This approach has the nice name mob programming: a group of 3-4 people works through the code together rather than distributing the tasks. If you use someone else's code, it's crucial that you always understand what it does.
- Recommendations: use VS Code Live Share, set up a Discord server
- For group assignments, groups should not exchange code with each other

Consultation hours: Monday 16:00-17:30, for questions, problems, etc. (via MS Teams)
Contact any time via e-mail or MS Teams

6 General notes and additional resources

Biodatasciencetulln Python Wiki

- Programming is not hard, but it requires practice
- The beginning might be the most confusing/hardest part
- Work together, talk to each other (online and offline), use social platforms and collaborative tools (VS Code real-time collaboration?)
- There are tons of good resources: books, blogs, tutorials, YouTube, Stack Overflow
- Try them, see which work best for you (i.e. which you understand best), and stick to them
- Look for problem-solving tutorials that discuss tasks/problems/projects. Tutorials that just roll out information are usually boring and hard to read
- Some resources like medium.com and towardsdatascience.com limit the number of free articles per month, but access might work in an incognito window or via VPN

Margaret Hamilton, lead software engineer of the Apollo project, stands next to the listings of the code that she and her team wrote and that was used to take humanity to the moon (1969)

6.1 Python resources

- software-carpentry.org: beginner-friendly tutorials
- "A Byte of Python": free beginner-friendly ebook; doesn't go too deep, but covers the basics
- E. Freeman: "Head First Learn to Code: A Brain-Friendly Guide" (Amazon): very beginner-friendly, if you like the approach of the book series
- Allen B. Downey: "Think Python: How to Think Like a Computer Scientist" (2024): a free and very nice ebook, maybe not great for complete beginners; you'll learn a lot if you can work your way through it
  - "How to Think Like a Computer Scientist: Interactive Edition": interactive edition
- E. Matthes: "Python Crash Course: A Hands-On, Project-Based Introduction to Programming" (Amazon): practical introduction for complete beginners, with exercises
  - The author provides a collection of online resources, including some nice cheat sheets
- A. Sweigart: "Automate the Boring Stuff with Python: Practical Programming for Total Beginners" (Amazon, online book): practically oriented introduction for beginners (ranges from beginner to somewhat advanced content)
  - And many other great books, see e.g. inventwithpython.com
- YouTube! Many excellent channels, for example: Corey Schafer, Tech with Tim, CS Dojo, CS50 course
- Official Python Tutorial: a good reference for topics you already know, to refresh your memory or to go deeper
- Official Python Wiki with many resources, links, book suggestions, etc., e.g. for beginning programmers
- Coding challenges, e.g. pythonchallenge.com
- Codecademy, Reddit, etc.

6.2 A note on ChatGPT & Co

1. Your learning is a function of your efforts. The goal is not to get the correct answer, but to push yourself and struggle to grasp the concepts so that you eventually absorb them. Getting an answer from ChatGPT or Stack Overflow won't have this effect, so you would be cheating yourself.
2. There are many circumstances where ChatGPT will not help you with real-life tasks. It works very well at a basic level, but often fails in more complex scenarios. You need to master programming concepts to succeed in these situations.

Therefore, it is currently not recommended to use ChatGPT for your assignments. It can act as a personal tutor with whom you discuss questions that remained unclear after class. However, it can get things wrong in many subtle ways, because it lacks an actual understanding of the world (it's akin to a very advanced autocomplete, and is not comparable to a real AGI). Apart from this, you shouldn't let your tutor do assignments for you; see it only as a learning aid.
Start being a computer expert right now (xkcd.com/627)

7 How to learn Python, or anything else?

1. Understand the details. Split the whole picture into chunks, and examine the small units intensively. If you're using code you don't understand, you'll run into trouble sooner or later.
2. Correct the mistakes. Slow down, and don't let mistakes go unnoticed.
3. Attentive repetition. That's how neural networks work, both in machine learning and in biological beings: they become better with repetition.
4. Leave your comfort zone. Practice at the edge of your capabilities.

The mysterious secret of talent: "How do you program so well?"
