Foundations: Basic Probabilities PDF - Technische Universität Darmstadt
Document Details
![AdaptiveOpArt68](https://quizgecko.com/images/avatars/avatar-10.webp)
Uploaded by AdaptiveOpArt68
Technische Universität Darmstadt
2024
Jan Peters
Tags
Summary
These lecture slides from Prof. Jan Peters at Technische Universität Darmstadt cover foundational concepts in probability theory including random variables, probability distributions, and Bayes' Rule. It is aimed at undergraduate computer science students and discusses key concepts such as expectation, variance, and distributions.
Full Transcript
Join at slido.com #2878010 Click Present with Slido or install our Chrome extension to display joining ⓘ instructions for participants while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Foundations: Basic Probabi...
Join at slido.com #2878010 Click Present with Slido or install our Chrome extension to display joining ⓘ instructions for participants while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Foundations: Basic Probabilities Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Basic Cycle of Probabilistic Modeling and Analysis Prior Assumptions Modeling probability distribution (Lecture 3) Observed Data from Model structure Experiment Estimating uncertain quantities (Lecture 4) Model parameters Lecture 6+7 Lecture 1+2 Experimental design in computer science and evaluation (Lecture 5) Model quality evaluation Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 3 Motivation In computer science we often deal with systems or processes that exhibit randomness Time a data packet takes to travel across a network Errors in data packets due to corruption from random noise Fluctuations in sensor readings Randomized algorithms are a powerful tool to tackle complex problems Randomized load balancing: Increases fault tolerance and reliability Monte Carlo methods: Used for numerical integration, optimization, and simulation Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 4 Learning Objectives Defining the building blocks of probabilities Using the types and rules of probabilities Learning about important statistics of probability distributions Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 5 Outline 1. Building Blocks of Probability Theory 2. Types and Rules of Probability Distributions 3. Statistics of Probability Distributions 4. Conclusion Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 6 Outline 1. Building Blocks of Probability Theory 2. Types and Rules of Probability Distributions 3. Statistics of Probability Distributions 4. Conclusion Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 7 Sample Spaces Characterize the random system or process we deal with The sample space, , defines the set of all possible outcomes of a process Coin flip: Errors in data packet: Hours spent fixing bugs in a day: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 8 Event & Event Spaces An event, , is a set of outcomes in the sample space Coin lands tails: Errors in data packet is 4 or less: “Tired programmer” The event space, , is the set of all defined events Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 9 Random Variables A random variable, , is a variable whose value is determined by chance – i.e., its value is unknown and/or can change Events can be defined when the random variables takes on certain values We get at most one error in the data packet: ○ We write the probability for such an event as: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 10 Probability Axioms 1. The probability of an event is non-negative: Andrey Kolmogorov 2. The probability of the sample space is equal to 1: 3. The probability of a union of mutually exclusive events is the sum of the individual probabilities: If then Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 11 Probability Distributions The sample space denotes the set of possible values the random variable can take. If is finite, then is a discrete random variable. We define the probability mass function (PMF) as the probability that is equal to a sample. with and Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 12 Probability Distributions The sample space denotes the set of possible values the random variable can take. If is infinite, then is a continuous random variable. We define the probability density function (PDF) as the relative likelihood that is close to a sample. with and Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 13 Probability Distributions Cumulative distribution function (CDF) Gives the probability that will take a value less than or equal to. Is an increasing differentiable function mapping to with and. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 14 Uniform Distribution A continuous random variable has uniform distribution, , if the probability of sample any value in an interval, e.g., , is the same. More precisely, its PDF is defined as: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 15 Categorical Distribution Describes the possible results of a random variable among finite categories, each one with a specified probability. Its PMF is given as: Example: probability of getting even or odd when throwing a six sided dice: Category Even Odd Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 16 Which function evaluated at x can be above 1 ? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Does the CDF of a categorical distribution exist? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Outline 1. Building Blocks of Probability Theory 2. Types and Rules of Probability Distributions 3. Statistics of Probability Distributions 4. Conclusion Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 19 Independence Two variables are independent if the value or occurrence of one does not affect the other. Thus, the joint distribution is defined as: If the independence condition do not hold, the variables are dependent, i.e., one affects the value of the other. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 20 Do you still remember how a cache system works? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Example - Cache System Suppose the system randomly pre-loads files into cache: Probability of requesting any file: (20% chance) Probability of any file in cache (cache hit): (10% chance) Since the events are independent, the probability of requesting a file in the cache is: (2% chance) Picture modified from: “Simultaneous and Hierarchical Cache Accesses” (GeeksforGeeks) https://www.geeksforgeeks.org/simultaneous-and-hierarchical-cache-accesses/ Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 22 Example - Cache System Now, suppose the system keeps in cache the Least Recently Used (LRU) files: Probability of requesting a “popular” file: Probability of any file in cache (cache hit): Chance of file is both popular and in cache: Since , the events are dependent. Additionally, if a popular file was requested, it’s more likely to be in cache than a random file, as Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 23 Independence Two variables are conditionally independent if given a third variable , the variables are independent. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 24 Example - Cache System Suppose we have information about when the cache is updated: Probability of a popular file is requested after an update: Probability of any file is in the cache after an update: Thus, the probability of a popular file is in cache after an update: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 25 Types of Probability Joint probability Consider two random variables and , the joint probability is the probability of both taking specific values and. Discrete random variables, the joint PMF is: For continuous random variables, the joint PDF can also be related with the CDF : Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 26 Example - Cache System In the system with cache update information: Probability of a popular file is requested after an update: Probability of the cache to be updated: Probability of popular file is requested after a cache update: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 27 Types of Probability Marginal probability The probability of a random variable, e.g., , taking a specific value regardless of other random variables values. Discrete random variable: Continuous random variables: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 28 Types of Probability Conditional probability The probability of a random variable, e.g., , when : Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 29 A system has two independent sensors, each with a 30% probability of reading 1. What is the probability that both sensors give a reading of 1? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt A system collects data from a binary sensor and transmits it via Wi-Fi. The sensor has a 30% probability of reading 1. However, when it's raining, only 20% of the 1 readings are successfully transmitted. What is the probability of receiving a 1 reading on a rainy day? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Types of Probability From Probability Axiom 3, the probability of the union of two events is: The intersection or product of two events is given by the joint distribution: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 32 Bayes’ Rule Consider the conditional probabilities between two events and : Equalizing both expressions by isolating We obtain the Bayes’ rule: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 33 Bayes’ Rule - Visual proof Credit: Akshay Pachaar (@akshay_pachaar on X) https://x.com/akshay_pachaar/status/1828772286020702655 Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 34 Bayes’ Rule Example - Spam Filtering (1) Let us define the following events: : e-mail is a spam : e-mail is flagged as a spam by spam filter We suppose. We want to know the probability of an e-mail actually being a spam when flagged as a spam, i.e.. We can use Bayes’ Rule: Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 35 Bayes’ Rule Example - Spam Filtering (2) We need to find first. Using the law of total probability: Coming back to Bayes’ Rule: It means that only 16.7% of e-mails in your spam box are actual spams! Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 36 Outline 1. Building Blocks of Probability Theory 2. Types and Rules of Probability Distributions 3. Statistics of Probability Distributions 4. Conclusion Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 37 Expectation The expectation (or expected value) of a random variable whose possible values are defined by the set is defined by for a discrete , for a continuous. We often denote or. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 38 Expectation Example We want to find the Mean Time To Failure (MTTF) of a hard drive, i.e. the expected time it takes for a hard drive to break. Let us define as the Time To Failure (time it takes for a hard drive to break). We assume that where failures/hour (failure rate). The mean time to failure is: hours. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 39 A distribution is defined by P(X = -1) = 0.8 and P(X = 1) = 0.2. What is the expected value of X? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Variance & Standard Deviation The variance of a random variable is The standard deviation of a random variable is the square root of its variance, i.e. It measures how spread out a distribution is. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 41 A random variable X has a variance of 4. What it its standard deviation? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Which distribution has the lowest/highest variance? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Summary: Properties of Expectation & Variance The expectation is linear, i.e.. The variance has also important properties: for any constant for any constant for independent and (does not hold in general!) Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 44 Moments The n-th moment of a random variable about a value is defined by The n-th central moment is the n-th moment about the mean, i.e. The n-th standardized moment is the n-th central moment divided by the n-th power of the standard deviation, i.e. Why are they important? The moments of a probability distributions are used to characterize them. For instance, the first raw moment of is its expected value and its second centered moment is its variance. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 45 Skewness The skewness of a random variable with mean and variance is the third standardized moment of : The skewness measures the asymmetry of a probability distribution, i.e. how much the distribution of differs from the distribution of. It equals 0 for a symmetrical distribution. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 46 Which PDF(s) is/are associated with a skewness of 0? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Kurtosis The kurtosis of a random variable with mean and variance is a shifted version of the fourth standardized moment of : The kurtosis is a measure of “how long” the tail of a distribution is. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 48 Covariance & Correlation The covariance between random variables and is The correlation between random variables and is They measure a degree of dependence between random variables. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 49 Let X and Y be two random variables such that: Cov(X, Y) = 6, Var(X) = 9, Var(Y) = 16.What is Corr(X, Y)? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Summary: Properties of Covariance & Correlation The covariance and correlation have a few important properties: and are independent (reciprocal does NOT hold!) Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 51 Median We say that is a median of a random variable if and. For a perfectly symmetrical distribution, the mean equals the median. The median splits the distribution in two halves of equal mass. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 52 The mean and median of a random variable are equal when... Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Quantiles, Percentiles & Quartiles The concept of splitting the distributions in shares of equal mass can be generalized to quantiles, equally splitting the distribution in a specified number of parts. A common use of quantiles are percentiles, splitting the distribution in 100 equal parts, and quartiles, splitting it in 4 equal parts. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 54 Modes Modes are values with the greatest density (or probability). They can be graphically identified as local peaks of the distribution. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 55 How many modes does this distribution have based on its PDF? Click Present with Slido or install our Chrome extension to activate this ⓘ poll while presenting. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt Outline 1. Building Blocks of Probability Theory 2. Types and Rules of Probability Distributions 3. Statistics of Probability Distributions 4. Conclusion Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 57 Recap Building Blocks of Probability Theory: Overview of sample spaces, events, and event spaces. Types of Probability Distributions: Introduction to discrete and continuous random variables, PMF, PDF, CDF, and key probability distributions (e.g., uniform, categorical). Probability Axioms and Bayes' Rule: Explanation of fundamental axioms, conditional probabilities, and Bayes' theorem. Statistics of Probability Distributions: Understanding of the key statistics: expectation, variance, standard deviation, skewness, kurtosis, and correlation. Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 58 Self-Test Questions What is a random variable? How does the CDF relate to the PDF / PMF? What is an important property of the expectation? What is the standardized centered second moment of any distribution? What does the standard deviation measure? What is the relation between the standard deviation and the variance? What does the skewness measure? What is the value of the skewness of a symmetric distribution? What does the kurtosis measure? When does the mean equal the median? What are the modes of a distribution? Prof. Jan Peters | Probabilistic Methods for Computer Science | WiSe 2024/25 | CS, TU Darmstadt 59