High-Dimensional Statistics Lecture Notes PDF
Document Details

Uploaded by ComfortingAlbuquerque
FAU
2024
Marius Yamakou
Summary
This document is a lecture PDF on high-dimensional statistics, specifically focusing on concentration bounds and sub-Gaussian random variables. The lecture, part of a winter 2024/25 semester course, outlines objectives, examples, and key concepts.
Full Transcript
Selected Topics in Mathematics of Learning: High-Dimensional Statistics
Lecturer: Marius Yamakou, Winter Semester 2024/25, Department of Data Science, FAU
November 5, 2024. Lecture 04/05.11.24, Part II: Concentration bounds.

Objectives:
- Understand concentration bounds by examining classical examples and their relevance in probabilistic analysis.
- Define sub-Gaussian random variables and identify their properties, focusing on their tail bounds and sum behavior.
- Apply key concentration inequalities, such as Hoeffding's and Chernoff's inequalities, to bound probabilities involving sums of sub-Gaussian random variables.
- Recognize and use different characterizations of sub-Gaussianity, understanding the equivalences and implications of these definitions.
- Explore sub-exponential concentration and learn how it extends concepts of concentration to a broader class of random variables with heavier tails than sub-Gaussian random variables.

Outline:
1. Concentration bounds: Classical examples
2. Sub-Gaussian random variables (tail bound; sum of sub-Gaussian RVs; Hoeffding; Chernoff)
3. Equivalent characterizations of sub-Gaussianity
4. Sub-exponential concentration
1. Concentration Bounds
Definition: A concentration bound is an inequality that provides an upper bound on the probability that a random variable deviates significantly from a central value (often its mean or median). It is crucial in high-dimensional settings where traditional low-dimensional intuition may fail due to the curse of dimensionality.
Mathematically, for a random variable X with mean µ, a concentration inequality typically takes the form
P(|X − µ| ≥ t) ≤ φ(t), t > 0,
where φ(t) is a function that decays as t increases.

1.1 Purpose of concentration bounds
Why are concentration bounds useful?
- They provide probabilistic guarantees, which are crucial in uncertain environments like machine learning, high-dimensional data analysis, and statistical estimation.
- They help estimate how closely a random variable concentrates around its mean, which is essential in high-dimensional settings where traditional low-dimensional intuition may fail due to the curse of dimensionality.
- Concentration bounds are typically non-asymptotic, providing probabilistic guarantees that hold for a finite number of observations or trials instead of relying on limits as the sample size n → ∞. This is useful in practical settings where the sample size is fixed or small, allowing analysts to make probabilistic statements about deviations without relying on large-sample approximations.

1.2 Intuition behind concentration bounds
What do concentration bounds tell us? They provide insights into how likely certain outcomes are, especially deviations from expected values.
Example: If X is a random variable with mean µ, then
P(|X − µ| ≥ t) ≤ exp(−Ct²), t > 0,
implies that large deviations from µ are exponentially unlikely.

1.3 Concentration bounds and the Law of Large Numbers
Relation to the Law of Large Numbers (LLN): The LLN states that as the number of samples n increases, the sample average converges to the expected value. Concentration bounds strengthen this by providing explicit probabilities for deviations:
P( |(1/n) Σ_{i=1}^n X_i − µ| ≥ ϵ ) ≤ exp(−C n ϵ²).
This tells us how likely the sample mean is to deviate from the true mean, even for finite n.
1.4 Example: Estimating the sample mean
Illustrating concentration with the sample mean: Let X_1, X_2, ..., X_n be i.i.d. random variables with mean µ. By applying a concentration bound, we can bound the probability that the sample mean µ̂ = (1/n) Σ_{i=1}^n X_i deviates from µ:
P(|µ̂ − µ| ≥ ϵ) ≤ exp(−C n ϵ²).
This indicates that as n increases, the probability of a large deviation becomes exponentially smaller.
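The exponential decay in n can be seen directly by simulation. The following is a minimal sketch (not from the lecture): it estimates P(|µ̂ − µ| ≥ ϵ) for i.i.d. fair-coin flips at several sample sizes and compares the result with a Hoeffding-type reference bound 2·exp(−2nϵ²) for [0, 1]-valued variables; the distribution, ϵ, and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05          # deviation threshold (chosen for illustration)
trials = 20_000     # Monte Carlo repetitions per sample size

for n in (50, 200, 800, 3200):
    # sample means of n fair Bernoulli(1/2) variables, repeated `trials` times
    means = rng.binomial(n, 0.5, size=trials) / n
    p_dev = np.mean(np.abs(means - 0.5) >= eps)
    bound = 2 * np.exp(-2 * n * eps**2)   # Hoeffding-type bound for [0, 1]-valued variables
    print(f"n={n:5d}  empirical P(|mean - 0.5| >= {eps}) = {p_dev:.4f}   bound = {bound:.4f}")
```

The empirical frequencies shrink at roughly the exponential rate the bound predicts, which is the finite-n refinement of the LLN described above.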
1.5 Understanding the "looseness" or "tightness" of concentration inequalities
Concentration inequalities provide bounds on how a random variable deviates from a typical (mean or median) value. The "looseness" of a concentration inequality refers to how far the bound provided by the inequality is from the actual probability of the event in question. If a concentration inequality is loose (i.e., not tight), it means:
- The inequality gives an upper bound that is much higher than the true probability.
- The result is less precise and potentially less useful in applications.
Causes of looseness:
- Minimal assumptions and higher moments ignored: Many concentration inequalities (e.g., Markov's and Chebyshev's) use only basic properties like the mean or variance and ignore other distributional features like skewness or kurtosis.
- Lack of tail behavior information: Inequalities that do not leverage information about the tail behavior of the distribution tend to be looser.

1.6 Common Concentration Inequalities: Markov's Inequality
Statement: For any non-negative random variable X and a > 0:
P(X ≥ a) ≤ E[X] / a.
Derivation:
- Let X ≥ 0. Using the tail formula for the expectation: E[X] = ∫_0^∞ P(X ≥ t) dt.
- Restricting the integral to [0, a] and using P(X ≥ t) ≥ P(X ≥ a) for t ≤ a: E[X] ≥ ∫_0^a P(X ≥ a) dt = P(X ≥ a) · a.
- Rearranging gives Markov's inequality: P(X ≥ a) ≤ E[X] / a.
Example: Suppose X represents the number of customers in a queue with E[X] = 5. Then P(X ≥ 15) ≤ 5/15 = 1/3.
Use case: Markov's inequality is useful when only the mean of a random variable is known. Because it uses only the mean, the bound it provides is often loose.
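As a quick numerical illustration of the queue example, the sketch below compares the Markov bound E[X]/a with a simulated tail probability. The lecture specifies only E[X] = 5, so the Poisson(5) queue-length model used here is an assumption made purely for illustration; any non-negative distribution with mean 5 satisfies the same bound.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 15
samples = rng.poisson(lam=5, size=200_000)   # assumed queue-length model with E[X] = 5

empirical = np.mean(samples >= a)            # simulated P(X >= 15)
markov = samples.mean() / a                  # Markov bound E[X]/a, roughly 1/3
print(f"simulated P(X >= {a}) = {empirical:.5f},  Markov bound = {markov:.3f}")
```

For this particular model the true tail probability is tiny, so the bound of 1/3 is extremely loose, which anticipates the two-point example that follows.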
Example (looseness of Markov's inequality): Let X be a random variable that takes the value 0 with probability 0.9 and the value 10 with probability 0.1. The mean of X is E[X] = 0.9 · 0 + 0.1 · 10 = 1.
Using Markov's inequality, for any a > 0, P(X ≥ a) ≤ E[X]/a = 1/a. Applying this for a = 5 gives P(X ≥ 5) ≤ 1/5 = 0.2.
However, we can calculate P(X ≥ 5) directly: P(X ≥ 5) = P(X = 10) = 0.1.
Conclusion: The bound provided by Markov's inequality (0.2) is much larger than the actual probability (0.1), demonstrating that Markov's inequality can be loose, or "not tight", in this case.

1.6 Common Concentration Inequalities: Chebyshev's Inequality
Statement: For a random variable X with mean µ and variance σ²:
P(|X − µ| ≥ kσ) ≤ 1/k², for any k > 0.
Derivation: Apply Markov's inequality to Y = (X − µ)²:
P((X − µ)² ≥ k²σ²) ≤ E[(X − µ)²] / (k²σ²) = σ² / (k²σ²) = 1/k².
This gives Chebyshev's inequality: P(|X − µ| ≥ kσ) ≤ 1/k².
Example: For X with µ = 10 and σ² = 4 (so σ = 2, and a deviation of 4 corresponds to k = 2):
P(|X − 10| ≥ 4) ≤ 1/4 = 0.25.
Use case: Useful when only the mean and variance are known; it provides bounds for all distributions with finite variance. It can still be quite loose because it ignores other properties like skewness or kurtosis.
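To see how tight the 0.25 figure is in a concrete case, the sketch below evaluates the same deviation event for a distribution with µ = 10 and σ² = 4. The normal model is an assumption made only for illustration; Chebyshev's bound itself requires nothing beyond a finite variance.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 10.0, 2.0    # variance sigma^2 = 4, as in the lecture example
k = 2.0                  # deviation of k*sigma = 4

x = rng.normal(mu, sigma, size=500_000)      # assumed distribution for illustration
empirical = np.mean(np.abs(x - mu) >= k * sigma)
chebyshev = 1.0 / k**2
print(f"simulated P(|X - 10| >= 4) = {empirical:.4f},  Chebyshev bound = {chebyshev:.2f}")
# For a normal distribution the true probability is about 0.046, far below 0.25,
# which illustrates the looseness discussed in Section 1.5.
```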
1.7 Key takeaways on the concept of concentration bounds so far
- Concentration bounds provide a way to quantify the deviation of random variables from their mean or median.
- They are critical for understanding behavior in random processes, deriving error bounds, and analyzing algorithms in machine learning.
- Concentration bounds strengthen the law of large numbers by giving specific probabilities for how close a sample average is to its expectation.
- Concentration bounds can be relatively loose (or tight).
- Concentration bounds are typically non-asymptotic, including non-asymptotic versions of the CLT.

1.8 Some Motivation
Lemma (Asymptotic CLT): Let X_1, X_2, ... be i.i.d. random variables with mean µ and variance σ². Consider the sum S_N = X_1 + ··· + X_N and Z_N = (S_N − E[S_N]) / √Var(S_N). Then, as N → ∞, Z_N → N(0, 1) in distribution.
Visual insight: As we add more i.i.d. random variables, the distribution of their normalized sum approaches a normal distribution, even if the original variables are not normally distributed.
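This visual insight can be reproduced numerically. The sketch below (an illustration, not part of the lecture) standardizes sums of N right-skewed exponential variables and compares a few tail probabilities with those of the standard normal; the exponential distribution and the values of N and t are assumptions chosen only to show the convergence.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
reps = 100_000

def normal_tail(t):
    # P(G > t) for G ~ N(0, 1)
    return 0.5 * math.erfc(t / math.sqrt(2))

for N in (2, 10, 100):
    x = rng.exponential(scale=1.0, size=(reps, N))   # mean 1, variance 1, right-skewed
    z = (x.sum(axis=1) - N) / math.sqrt(N)           # standardized sum Z_N
    for t in (1.0, 2.0):
        print(f"N={N:3d}  P(Z_N > {t}) = {np.mean(z > t):.4f}   N(0,1) tail = {normal_tail(t):.4f}")
```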
Consider independent Bernoulli random variables X_1, ..., X_n ~ Ber(1/2), each representing a simple coin flip with a success probability of 1/2. Define S_n = Σ_{i=1}^n X_i, the total number of successes in n trials.
By the Central Limit Theorem (CLT), the standardized sum converges in distribution to a normal distribution:
Z_n := (S_n − n/2) / √(n/4) → N(0, 1) in distribution.
This result implies that for large n, S_n behaves approximately like a normal variable centered at n/2 with variance n/4.
Using this approximation, we can bound the probability that S_n deviates from its mean. Let G ~ N(0, 1), so
P(Z_n > t) = P( (S_n − n/2)/√(n/4) > t ) = P( S_n > n/2 + t√(n/4) ) ≈ P(G > t) by the CLT.
We have
M_G(λ) := E[e^{λG}] = ∫_{−∞}^{∞} e^{λg} f_G(g) dg = ∫_{−∞}^{∞} e^{λg} (1/√(2π)) e^{−g²/2} dg = e^{λ²/2},
and, applying Markov's inequality to e^{λG},
P(G > t) = P(e^{λG} > e^{λt}) ≤ E[e^{λG}] / e^{λt} = e^{λ²/2} / e^{λt} = e^{λ²/2 − λt}.
To get the best bound, we choose λ = t, which minimizes the upper bound, and we get P(G > t) ≤ e^{−t²/2}. This yields
P( S_n > n/2 + t√(n/4) ) ≈ P(G > t) ≲ (1/2) e^{−t²/2},
where the factor 1/2 accounts for the symmetric nature of the normal distribution (since P(G > 0) = 1/2). This provides an exponential decay rate for the tail probability, highlighting the rarity of large deviations.
Setting t = α√n yields
P( S_n > (1 + α) n/2 ) ≲ (1/2) e^{−nα²/2}.
Here, we see that deviations on the order of α√n in S_n become exponentially unlikely as n grows.
Problem: Although the CLT gives a general approximation, it may not always be tight, especially for finite n or large α. Improving this bound requires more refined probabilistic techniques.
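The optimized bound P(G > t) ≤ e^{−t²/2} can be checked against the exact Gaussian tail; the short script below does this for a few illustrative values of t.

```python
import math

def gaussian_tail(t):
    # exact P(G > t) for G ~ N(0, 1)
    return 0.5 * math.erfc(t / math.sqrt(2))

for t in (0.5, 1.0, 2.0, 3.0):
    chernoff = math.exp(-t**2 / 2)   # bound obtained by optimizing over lambda = t
    print(f"t={t:.1f}  exact tail = {gaussian_tail(t):.5f}   Chernoff-type bound = {chernoff:.5f}")
```

The bound has the right exponential shape but is off by a polynomial factor in t, consistent with the remark that the argument can still be refined.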
1.8.1 Berry-Esseen Central Limit Theorem
Theorem (Berry-Esseen Central Limit Theorem): Under the assumptions of the Central Limit Theorem, with δ = E|X_1 − µ|³ / σ³, we have
|P(Z_n > t) − P(G > t)| ≤ δ / √n,
where Z_n is the standardized sum and G ~ N(0, 1) is the standard normal distribution.
Interpretation: The bound quantifies the rate at which the distribution of Z_n converges to the normal distribution.
Tight bound example: In the case of a Bernoulli random variable, since each X_i ~ Ber(1/2), the probability of each outcome is 1/2, and the sum S_n = Σ_{i=1}^n X_i follows a binomial distribution: S_n ~ Binomial(n, 1/2). For the event S_n = n/2 (exactly half successes), the probability is
P(S_n = n/2) = C(n, n/2) (1/2)^n = (1/2^n) C(n, n/2).
For large n, we can approximate C(n, n/2) using Stirling's approximation, which states that for large k, k! ≈ √(2πk) (k/e)^k. Applying Stirling's approximation to C(n, n/2) = n! / ((n/2)! (n/2)!) yields
C(n, n/2) ≈ √(2πn) (n/e)^n / ( (2π · n/2) (n/(2e))^n ) = 2^n √(2/(πn)).
Thus, P(S_n = n/2) = (1/2^n) C(n, n/2) ≈ (1/2^n) · 2^n √(2/(πn)) = √(2/(πn)).
For simplicity, this is often expressed as P(S_n = n/2) ≈ 1/√n, showing that the 1/√n rate is optimal for certain distributions.
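The √(2/(πn)) approximation is easy to verify against the exact binomial probability; the short check below uses exact integer arithmetic for the binomial coefficient (the values of n are illustrative).

```python
import math

for n in (10, 100, 1000):
    exact = math.comb(n, n // 2) / 2**n        # P(S_n = n/2) for S_n ~ Binomial(n, 1/2)
    stirling = math.sqrt(2 / (math.pi * n))    # Stirling-based approximation
    print(f"n={n:5d}  exact = {exact:.6f}   sqrt(2/(pi*n)) = {stirling:.6f}")
```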
Implication: The approximation error is O(1/√n), which is relatively large compared to an exponential bound like O(e^{−nα²/2}) that we aim to achieve in concentration inequalities.
Solution: To achieve a tighter bound, we can use concentration inequalities directly, which often yield exponential decay rates.
Method (Chernoff bound): For any λ > 0,
P(Z_n > t) = P(e^{λZ_n} > e^{λt}) ≤ E[e^{λZ_n}] / e^{λt}, t ∈ R,
by Markov's inequality; the approach relies on analyzing the moment-generating function (MGF) of Z_n.
NOTE: The moment-generating function (MGF) of Z_n, given as M_{Z_n}(λ) = E[e^{λZ_n}], is the expected value of e^{λZ_n}, where λ is a real parameter. The MGF, when it exists, is a useful tool for deriving the moments of Z_n by differentiating M_{Z_n}(λ) with respect to λ and evaluating at λ = 0.

2. Sub-Gaussian Random Variables
Definition (Sub-Gaussian Random Variable): A random variable X with mean µ = E[X] is called sub-Gaussian with parameter σ > 0 if
E[e^{λ(X − µ)}] ≤ e^{σ²λ²/2} for all λ ∈ R.
In this case, we say X is σ-sub-Gaussian and has variance proxy σ².
Examples:
- Standard Gaussian: If X ~ N(0, 1), then X is sub-Gaussian with equality.
- Rademacher random variable: A Rademacher random variable R takes values +1 or −1 with probability 1/2 each, making R a 1-sub-Gaussian random variable.
- Bounded random variable: If |X − E[X]| ≤ M almost surely, then X is M-sub-Gaussian.

Proof that the standard Gaussian is 1-sub-Gaussian with equality: To prove that if X ~ N(0, 1), then X is 1-sub-Gaussian with equality, we need to verify that
E[e^{λ(X − µ)}] = e^{σ²λ²/2},
where µ = E[X] = 0 and σ = 1.
Step 1: Set up the moment generating function. For X ~ N(0, 1), the moment generating function E[e^{λX}] can be computed directly. Since X is standard normal, E[X] = 0 and Var(X) = 1. We want to calculate E[e^{λX}].
Step 2: Calculate E[e^{λX}] for X ~ N(0, 1). By the definition of expectation for continuous random variables,
E[e^{λX}] = ∫_{−∞}^{∞} e^{λx} f_X(x) dx,
where f_X(x) = (1/√(2π)) e^{−x²/2} is the PDF of X, since X ~ N(0, 1). Thus
E[e^{λX}] = ∫_{−∞}^{∞} e^{λx} (1/√(2π)) e^{−x²/2} dx.
Step 3: Simplify the integral.
- Combine the exponential terms: E[e^{λX}] = ∫_{−∞}^{∞} (1/√(2π)) e^{λx − x²/2} dx.
- Rewrite the exponent: λx − x²/2 = −(1/2)(x² − 2λx).
- Complete the square for x² − 2λx: x² − 2λx = (x − λ)² − λ².
- Substitute back: E[e^{λX}] = ∫_{−∞}^{∞} (1/√(2π)) e^{−((x − λ)² − λ²)/2} dx.
- Separate the terms: E[e^{λX}] = e^{λ²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(x − λ)²/2} dx.
Step 4: Recognize the Gaussian integral. The integral ∫_{−∞}^{∞} (1/√(2π)) e^{−(x − λ)²/2} dx is the integral of the density of a normal distribution with mean λ and variance 1, which equals 1. So E[e^{λX}] = e^{λ²/2}.
Step 5: Conclude that X is 1-sub-Gaussian with equality. This shows that E[e^{λX}] = e^{λ²/2} = e^{σ²λ²/2} with σ = 1. Therefore, X is 1-sub-Gaussian with equality.
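The identity E[e^{λX}] = e^{λ²/2} just derived, and the Rademacher MGF E[e^{λR}] = cosh(λ) ≤ e^{λ²/2} behind the second example, can both be sanity-checked numerically. The sketch below does so by Monte Carlo for the Gaussian case and in closed form for the Rademacher case; the grid of λ values is an arbitrary choice.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(2_000_000)

for lam in (0.5, 1.0, 2.0):
    mgf_gauss = np.mean(np.exp(lam * x))   # Monte Carlo estimate of E[exp(lam * X)]
    mgf_rademacher = math.cosh(lam)        # exact E[exp(lam * R)] for a Rademacher R
    bound = math.exp(lam**2 / 2)           # sub-Gaussian bound exp(sigma^2 * lam^2 / 2), sigma = 1
    print(f"lam={lam}: Gaussian MGF ~ {mgf_gauss:.4f}, "
          f"Rademacher MGF = {mgf_rademacher:.4f}, bound = {bound:.4f}")
```

The Gaussian estimate matches the bound (equality), while cosh(λ) stays strictly below it, consistent with the two examples above.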
2.1 Tail Bound for Sub-Gaussian Random Variables
Theorem (Tail Bound for Sub-Gaussian Random Variables): If a random variable X with finite mean µ is σ-sub-Gaussian, then for any t > 0,
P(|X − µ| ≥ t) ≤ 2 exp(−t² / (2σ²)).
Intuition: This inequality tells us that the probability of large deviations from the mean µ decreases exponentially for sub-Gaussian random variables, giving strong concentration around µ.
Significance: Sub-Gaussian tail bounds are widely used in high-dimensional data analysis and concentration inequalities.
Proof (sketched below; the details are left as an exercise):
1. Start by applying Markov's inequality to the exponential moment E[exp(λ(X − µ))].
2. Use the sub-Gaussian property to bound this moment for λ in terms of σ.
3. Conclude by optimizing over λ and applying symmetry to achieve the final bound.
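The two-sided bound 2·exp(−t²/(2σ²)) can be compared with an empirical tail. The sketch below uses a centered Gaussian with σ = 1, purely as a convenient σ-sub-Gaussian example; the thresholds t are illustrative.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
sigma = 1.0
x = rng.normal(0.0, sigma, size=1_000_000)   # a 1-sub-Gaussian example with mu = 0

for t in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(x) >= t)              # simulated P(|X - mu| >= t)
    bound = 2 * math.exp(-t**2 / (2 * sigma**2))     # sub-Gaussian tail bound
    print(f"t={t}: empirical = {empirical:.5f}, bound = {bound:.5f}")
```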
2.2 Sum of Sub-Gaussian Random Variables
Proposition (Sum of sub-Gaussian random variables is sub-Gaussian): If X_1, ..., X_n are independent sub-Gaussian random variables with variance proxies σ_1², ..., σ_n², then Z = Σ_{i=1}^n X_i is sub-Gaussian with variance proxy Σ_{i=1}^n σ_i².
Proof:
Since X_i is sub-Gaussian, there exists σ_i² such that for all λ ∈ R, E[exp(λX_i)] ≤ exp(λ²σ_i²/2).
For the sum Z = Σ_{i=1}^n X_i, we apply the MGF of the sum of independent variables X_i:
E[exp(λZ)] = E[exp(λ Σ_{i=1}^n X_i)] = E[ Π_{i=1}^n exp(λX_i) ] = Π_{i=1}^n E[exp(λX_i)],
where the last equality uses independence. Using the sub-Gaussian property of each X_i, this becomes
E[exp(λZ)] = Π_{i=1}^n E[exp(λX_i)] ≤ Π_{i=1}^n exp(λ²σ_i²/2) = exp( (λ²/2) Σ_{i=1}^n σ_i² ).
Therefore, Z is sub-Gaussian with variance proxy Σ_{i=1}^n σ_i².
2.3 Hoeffding's Inequality for Sub-Gaussian Random Variables
Theorem (Hoeffding's Inequality for Sub-Gaussian Sums): Let X_1, ..., X_n be independent sub-Gaussian random variables with variance proxies σ_1², ..., σ_n². Then, for the sum Z = Σ_{i=1}^n X_i, we have the following tail bound for all t > 0:
P(|Z − E[Z]| ≥ t) ≤ 2 exp( −t² / (2 Σ_{i=1}^n σ_i²) ).
Proof:
Step 1: Moment-generating function of sub-Gaussian variables. Each X_i is sub-Gaussian, meaning that there exists σ_i² such that
E[e^{λ(X_i − E[X_i])}] ≤ e^{λ²σ_i²/2} for all λ ∈ R.
This inequality characterizes the concentration properties of each X_i and allows us to control the tail behavior of Z = Σ_{i=1}^n X_i.
Step 2: Moment-generating function of the sum Z. Since Z = Σ_{i=1}^n X_i, and by independence of the X_i, we have
E[e^{λ(Z − E[Z])}] = E[e^{λ Σ_{i=1}^n (X_i − E[X_i])}] = Π_{i=1}^n E[e^{λ(X_i − E[X_i])}].
By substituting the sub-Gaussian bound from Step 1, we get
E[e^{λ(Z − E[Z])}] ≤ Π_{i=1}^n e^{λ²σ_i²/2} = e^{(λ²/2) Σ_{i=1}^n σ_i²}.
Step 3: Applying Chernoff's bound. To bound P(Z − E[Z] ≥ t), we use Markov's inequality:
P(Z − E[Z] ≥ t) = P(e^{λ(Z − E[Z])} ≥ e^{λt}) ≤ E[e^{λ(Z − E[Z])}] / e^{λt}.
By substituting the result from Step 2, we get
P(Z − E[Z] ≥ t) ≤ e^{(λ²/2) Σ_{i=1}^n σ_i² − λt}.
Step 4: Optimizing over λ. To get the tightest bound, we choose the λ that minimizes the exponent (λ²/2) Σ_{i=1}^n σ_i² − λt. That is, we solve for λ in
d/dλ [ (λ²/2) Σ_{i=1}^n σ_i² − λt ] = λ Σ_{i=1}^n σ_i² − t = 0.
This gives λ = t / Σ_{i=1}^n σ_i². Substituting this value of λ back, we obtain
P(Z − E[Z] ≥ t) ≤ exp( −t² / (2 Σ_{i=1}^n σ_i²) ).
Step 5: Bounding the lower tail. A similar argument shows that
P(Z − E[Z] ≤ −t) ≤ exp( −t² / (2 Σ_{i=1}^n σ_i²) ).
Step 6: Combining the upper and lower tail bounds. By the union bound, i.e., P(|Z − E[Z]| ≥ t) ≤ P(Z − E[Z] ≤ −t) + P(Z − E[Z] ≥ t), we conclude
P(|Z − E[Z]| ≥ t) ≤ 2 exp( −t² / (2 Σ_{i=1}^n σ_i²) ). End of proof. □
Please note that Hoeffding's inequality is a two-sided concentration inequality: it controls both the probability that the sum deviates above its expectation and the probability that it deviates below it.
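The bound 2·exp(−t²/(2 Σ σ_i²)) is easy to compare with a simulation. The sketch below uses a sum of n independent Rademacher variables, each 1-sub-Gaussian by the earlier example, so the variance proxy of the sum is n; the choice of n and of the thresholds t is an assumption made for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(6)
n, reps = 100, 200_000

# Z = sum of n Rademacher variables (each 1-sub-Gaussian), so the variance proxy is n.
signs = rng.choice([-1, 1], size=(reps, n))
z = signs.sum(axis=1)                        # E[Z] = 0

for t in (10, 20, 30):
    empirical = np.mean(np.abs(z) >= t)
    hoeffding = 2 * math.exp(-t**2 / (2 * n))
    print(f"t={t}: empirical P(|Z| >= t) = {empirical:.5f}, Hoeffding bound = {hoeffding:.5f}")
```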
2.4 Chernoff's Inequality
Lemma (Chernoff's Inequality): Let X_i be independent Bernoulli random variables with parameters p_i. Define the sum S_N = Σ_{i=1}^N X_i and let µ = E[S_N] be its mean. Then, for any t > µ, we have
P(S_N ≥ t) ≤ e^{−µ} (eµ / t)^t.
Proof:
Step 1: To bound P(S_N ≥ t), we use Markov's inequality on an exponential function
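Although the transcript breaks off in the middle of this proof, the stated bound e^{−µ}(eµ/t)^t can already be checked against an exact binomial tail. In the sketch below all Bernoulli parameters p_i are taken equal (an assumption made for illustration), so S_N ~ Binomial(N, p).

```python
import math

N, p = 100, 0.1           # assumed identical Bernoulli parameters, so mu = N * p = 10
mu = N * p

def binom_tail(N, p, t):
    # exact P(S_N >= t) for S_N ~ Binomial(N, p)
    return sum(math.comb(N, k) * p**k * (1 - p)**(N - k) for k in range(t, N + 1))

for t in (15, 20, 30):
    chernoff = math.exp(-mu) * (math.e * mu / t) ** t
    print(f"t={t}: exact P(S_N >= {t}) = {binom_tail(N, p, t):.6f}, "
          f"Chernoff bound = {chernoff:.6f}")
```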