OFFENSIVE AI
LECTURE 2: ADVERSARIAL MACHINE LEARNING 1 – ADVERSARIAL EXAMPLES (WHITEBOX)
Dr. Yisroel Mirsky

Today's Agenda
- Intro to Adversarial Machine Learning
- Adversarial Examples
- Whitebox Attacks
- Gradient-based Attacks
- Robust Attacks
- Attacking other Models (than classifiers)

Adversarial Machine Learning

Why is ML Vulnerable?
1. We don't fully understand our models
   - Modern models (DNNs) are hard to interpret (blackboxes) and behave unexpectedly
   - This makes it hard for humans to see and understand flaws
2. We don't plan for unintended functionality
   - Training data and model parameters are invisible to the user
   - Edge cases are not handled (e.g., too many bounding boxes)
3. We naively trust the environment and our users
   - Clean data in training
   - Hardware is safe/secure
   - "Normal" data during execution (expected inputs)
   - Only good users

Game Specification
[Diagram: the defender applies operations D(train), D(rest), D(exec) and the attacker applies operations A(train), A(rest), A(exec) to the pipeline that takes the original data distribution 𝒟(train), runs the learning algorithm H to produce the learned hypothesis (model) f_θ, and has an evaluator score it on execution data 𝒟(exec) drawn from 𝒟_𝒳(exec) with labels 𝒟_𝒴(exec).]
Game rules:
- The attacker selects A* to maximize attack utility and stealth
- The defender selects D* to minimize attack utility and the harm to performance
- ...and repeat

Causative Attack: the adversary alters the training data (𝒟(train)) or the model parameters directly (θ in f_θ) with operations A(train) or A(rest), to
- Corrupt f (DoS, bias, mis-predict specific classes or features)
- Poison f
- Insert a backdoor into f_θ (triggered failure)
- Change the model's behavior (continuous failure, such as evasion)

Exploratory Attack: the adversary observes model parameters with operation A(rest) or crafts execution data (𝒟(exec)) with operations A(exec), to
- Extract information about f_θ or 𝒟(train)
- Harm the performance of f_θ
- Control f_θ in a way the user did not intend

Causative vs Exploratory: Visualization
Causative (integrity, availability):
- Change the labels or features in 𝒟(train)
- Add new samples x ∈ 𝒳 to 𝒟(train)
- Modify the parameters of f
Exploratory (integrity, confidentiality):
- Observe the parameters of f
- Modify the features of x ∈ 𝒳 for 𝒟(test) (execution)
- Observe responses from f

Causative vs Exploratory: Real-World Examples
Causative case: a company retrains its model weekly on new data. Boiling Frog Attack: the attacker slowly and covertly poisons the distribution over time [VirusTotal 2019].
Exploratory case: a company trains a model to perform spam detection. Model Evasion: the attacker buys the software and uses the detection scores to generate camouflaged emails [ProofPoint 2019].

Attacker Knowledge (recall)
- White-box: full knowledge of f (architecture, parameters, and sometimes 𝒟(train)). Can freely analyse the model, evaluate the attack, and get guarantees of success.
- Black-box: no knowledge of f (assumptions are made; sometimes the model type is known, e.g., a CNN). Can either freely perform experiments (has the software) or is limited (e.g., cloud AI services).
- Gray-box: partial knowledge of f (may know the architecture and/or 𝒟(train)).
Chakraborty A, et al. A survey on adversarial attacks and defences. 2021
Attacker Knowledge
- White-box attacks are possible in practice (model zoos, reverse engineering)
- Black-box is a more realistic assumption
Black-box adversaries:
- Non-adaptive black-box: can only access 𝒟(train) or the training distribution 𝒳~𝒟; the data is used to develop an attack that may work on f
- Adaptive black-box: can query f as an oracle; the queries are used to optimize the attack
- Strict black-box: can only observe past predictions made by f, or not even that
Chakraborty A, et al. A survey on adversarial attacks and defences. 2021

Attack Difficulty
[Diagram (Yael Mathov): for both causative and exploratory attacks, the objectives increase in complexity from confidence reduction, to misclassification, to selective and targeted misclassification. Attacker capability decreases and attack difficulty increases from white-box, to adaptive black-box, to non-adaptive black-box, to strict black-box attacks; causative methods range from modifying model parameters, to modifying existing data, to injecting data.]

A Short History
- 2009: initial works
- 2010: first domain taxonomy
- 2012: initial weakness identification in DNNs
- 2013: efficient identification of adversarial data points [Szegedy et al.]; black-box attacks
- 2014: FGSM discovered [Goodfellow et al.]
- 2015: model inversion
- 2016: DNN transferability
- 2017: real-world DNN attacks; model extraction; black-box attacks on DNNs (ZOO)
- 2018: obfuscated-gradient defences broken [Carlini N, et al.]
- 2019: why humans can't see adversarial samples
- 2020: fast-paced arms race

Intro to Adversarial Examples

Adversarial Examples: Definition
A specialized input created to confuse or fool a machine learning (ML) model. It is designed to be inconspicuous to a human/expert, but to have a significant impact on the model's prediction.
The input can be crafted with either whitebox assumptions or blackbox assumptions.
An adversarial example can be
- Digital: a precise manipulation of pixels
- Physical: a robust pattern (e.g., one that works when photographed)

Adversarial Examples: Example
What does this look like to you? How about now?
[Figure: an image x plus an adversarial perturbation δ (shown magnified 100×) gives x′ = x + δ; a DNN is 99% sure that it's a hat.]
Dan Hendrycks, et al. Natural Adversarial Examples. 2021

Why are we interested in them?
Security (our focus): can attackers covertly control our ML models? Bypass defenses, cause critical systems to fail, commit fraud, ...
Reliability: are our models robust? Will they perform as expected? Why do they fail?
[Figure: a stop sign classified as a speed-limit sign with 99% confidence.]

Attack Goals
Objective:
- Targeted: make f(x′) = y_t, where y_t ≠ y
- Untargeted: make f(x′) ≠ y
Environment:
- Digital: direct access to the features x^(i) ∈ x (e.g., pixels)
- Physical: can only change the interpreted object (e.g., paint a stop sign)
Examples:
- Make malware be classified as benign software with minimal changes
- Make an object detector miss a stop sign without a human noticing
- Make a credit system give a teenager lots of credit with small changes
- Make a drone fly into enemy territory by minimally modifying a map

Mission Impossible: let's help good guy Tom Cruise look like bad guy Nicolas Cage.

Example Scenario
f(x): 𝒳 → C, x ∈ ℝ^(n×m×c), C = {cage, henchman, tom, other}
y: ground-truth class (tom)
y_t: target class (cage)
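To make the targeted and untargeted objectives concrete, here is a minimal sketch in Python (assuming a PyTorch classifier f over the four classes above; the class ordering and helper name are illustrative assumptions, not part of the lecture):

import torch

CLASSES = ["cage", "henchman", "tom", "other"]   # assumed index order

def attack_succeeded(f, x_adv, y_true, y_target=None):
    """Targeted:   f(x') == y_target (with y_target != y_true).
       Untargeted: f(x') != y_true."""
    with torch.no_grad():
        pred = f(x_adv.unsqueeze(0)).argmax(dim=1).item()
    if y_target is not None:
        return pred == y_target          # e.g., tom -> cage
    return pred != y_true                # any class other than the true one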
What an adversarial example is not
Finding a sample x′ that merely correlates with 𝒴 is trivial ("fooled you!") and is meaningful only for some attack scenarios.
Most attacks need to be covert
- ...to the human, not the machine
- ...but also to the machine when evading defences (foreshadowing)

Adversarial Examples: Definition
A sample x′ which is similar to x but misclassified by f.
"Similar": looks the same to a human, an expert, etc.
For now, we will focus on attacks on neural networks.
[Figure: a perturbed image x′ of Tom Cruise for which f(x′) = Cage.]

Attack Objective
More formally: min_x′ R_A(x′, S) + λ·c(x, x′)
- x: the original image of Tom Cruise that needs to be used
- x′: the modified version of x which looks like Tom but is classified as Cage with high confidence
- Targeted: S = change cruise (x) into cage, e.g., R_A(x′) = −f(x′)_cage
- Untargeted: S = change cruise (x) into anyone but cruise (henchman or cage), e.g., R_A(x′) = f(x′)_cruise
- Covert: don't change cruise (x) too much: ‖x′ − x‖ < ε
[Figure: a spectrum from cruise (x) to cage; too little change leads to attack failure, too much change leads to attack detection.]
Optimization problem, with δ = x′ − x:
- Targeted: min_δ −f(x + δ)_(y_t)  s.t. ‖δ‖ < ε
- Untargeted: min_δ f(x + δ)_y  s.t. ‖δ‖ < ε

Measuring the Adversarial Cost (perturbation perceivability)
The perturbation cost is usually measured as a p-norm of δ:
  ‖x‖_p := (Σ_{i=1..n} |x_i|^p)^(1/p), p > 0
- ℓ0 norm: for minimizing the number of features used, but without limiting the energy added to each pixel. Example: δ = (−1.2, 0) has ‖δ‖0 = 1; δ = (2.1, −1.8) has ‖δ‖0 = 2.
- ℓ1 norm: for minimizing the total energy (sum) added to the pixels. Example: δ = (−0.5, 0.5) has ‖δ‖1 = 1.
- ℓ2 norm: for minimizing the overall energy added to the pixels (more natural looking, since pixels with larger values get exponentially higher scores). Example: δ = (−0.707, 0.707) has ‖δ‖2 = 1; δ = (1, −1.73) has ‖δ‖2 ≈ 2.
- ℓ∞ norm: for minimizing the maximum value added to any one pixel (often used in clipping). Example: δ = (−1, 0.9) has ‖δ‖∞ = 1; δ = (−1.5, −1) has ‖δ‖∞ = 1.5.
(Diagrams modified from Ziv Katzir's slides)

How are adversarial examples made? (DNNs)
Recall how DNNs are trained:
1. Insert a valid input x
2. Get the prediction ŷ = f_θ(x), e.g., (0.1, 0.3, 0.6, 0, 0)
3. Compute the loss ℒ(ŷ, y), with the label y, e.g., (0, 0, 0, 1, 0)
4. Find the loss gradient ∇_θ J(θ, x, y) = ∇_θ ℒ(f_θ(x), y)
5. Update (perturb) the weights

General White-Box Attack Approach
Here we optimize the change to x, not θ:
1. Insert a valid input x
2. Get the prediction ŷ = f_θ(x)
3. Compute the loss ℒ(ŷ, y′)
4. Find the loss gradient ∇_x J(θ, x, y) = ∇_x ℒ(f_θ(x), y)
5. Update (perturb) the input with δ
Example targets y′: targeted (0, 0, 0, 1, 0); untargeted (∗, 0, ∗, ∗, ∗)

Whitebox Attacks
Goodfellow I, et al. Explaining and Harnessing Adversarial Examples. 2015
Kurakin A, et al. Adversarial machine learning at scale. 2016
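The only change from training is the variable we differentiate with respect to. A minimal PyTorch sketch of step 4 of the attack loop above, assuming f_θ is a differentiable classifier and a cross-entropy loss (both illustrative assumptions):

import torch
import torch.nn.functional as F

def input_gradient(f, x, y):
    """Return the gradient of the loss w.r.t. the input x (not the weights theta)."""
    x = x.clone().detach().requires_grad_(True)   # track gradients on the input
    loss = F.cross_entropy(f(x), y)               # L(f_theta(x), y)
    loss.backward()                               # backpropagate down to the pixels
    return x.grad.detach()                        # nabla_x L(f_theta(x), y)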
Gradient-based Attacks

Fast Gradient Sign Method (FGSM)
[Figure: decision regions y0, y1, y2 around x, showing the loss gradient and the ε-scaled sign of the gradient for the untargeted case (correct label y0) and for targeted cases y_t = y1 and y_t = y2.]
Untargeted (goal: maximize the loss for the correct label y, using the original loss function):
  δ = ε·sign(∇_x ℒ(f_θ(x), y))
  x′ = x + δ
Targeted (goal: minimize the loss for the target label y_t):
  δ = ε·sign(∇_x ℒ(f_θ(x), y_t))
  x′ = x − δ
sign gives us an ε-sized ℓ∞ step in the adversarial direction. FGSM stays within the ℓ∞ epsilon budget and within the valid pixel range (−1 to +1 after normalization).
Goodfellow I, et al. Explaining and Harnessing Adversarial Examples. 2015
Kurakin A, et al. Adversarial machine learning at scale. 2016

FGSM: Challenges
- If ε is too big, we overshoot
- As δ increases, so too does the visibility of the attack!
- FGSM takes one step in ℓ∞: do all features need to be perturbed with the same level of energy ε?

Basic Iterative Method (BIM, I-FGSM)
for n iterations:
  δ_(i+1) = α·sign(∇_x ℒ(f_θ(x′_i), y))
  x′_(i+1) = x′_i + δ_(i+1)
  x′_(i+1) = clip(x′_(i+1), [0, 255])
clip ensures that the features in x′ stay within the range of the valid pixel space ([0, 255], [0, 1], [−1, +1], ...).
Problem: how do we ensure that we search within the budget? Recall that a large perturbation δ = x′_(i+1) − x is no longer covert...

Sign Projected Gradient Descent (PGD), ℓ∞ bound for ε
With δ bounded to an ℓ∞ ball of radius ε:
for n iterations:
  δ_(i+1) = α·sign(∇_x ℒ(f_θ(x′_i), y))
  x′_(i+1) = x′_i + δ_(i+1)
  δ = x′_(i+1) − x               (the perturbation so far w.r.t. x)
  δ = clip(δ, ε)                 (project onto the ℓ∞ bound of ε around x)
  x′_(i+1) = clip(x + δ, [0, 255])
If we ever overshoot the ℓ∞ bound around x, we project the point back onto the bound. Note: the bounds do not change each iteration.
Problem: what if our budget ε is defined on an ℓ2 bound, etc.?

Sign PGD, ℓ2 bound for ε
With δ bounded to an ℓ2 ball of radius ε:
for n iterations:
  δ_(i+1) = α·sign(∇_x ℒ(f_θ(x′_i), y))
  x′_(i+1) = x′_i + δ_(i+1)
  δ = x′_(i+1) − x               (the perturbation so far w.r.t. x)
  δ = Proj_2(δ, ε)
  x′_(i+1) = clip(x + δ, [0, 255])
Proj_p(v, ε):
  if ‖v‖_p = (Σ_{i=1..n} |v_i|^p)^(1/p) ≤ ε: return v          (within the ε-ball)
  else: return v·(ε/‖v‖_p)                                     (scale down to the ε-ball boundary)
Problem: each step is still an α-sized ℓ∞ step, α·sign(∇_x ℒ(f_θ(x′_i), y)). What about taking natural steps on ℓ2, i.e., just following the gradient?

PGD, ℓ2 bound for ε, raw gradient steps
for n iterations:
  δ_(i+1) = α·∇_x ℒ(f_θ(x′_i), y)
  x′_(i+1) = x′_i + δ_(i+1)
  δ = x′_(i+1) − x               (the perturbation so far w.r.t. x)
  δ = Proj_2(δ, ε)
  x′_(i+1) = clip(x + δ, [0, 255])
Problem: ‖∇_x ℒ(f_θ(x′_i), y)‖_2 varies significantly in size with each step. In an iterative attack, direction is more important than step size! We could overshoot or use far more noise than necessary.
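Before the ℓ2-normalized and momentum variants that follow, here is a minimal PyTorch sketch of untargeted FGSM and the ℓ∞-bounded iterative attack described above; the cross-entropy loss, the [−1, 1] pixel range, and the function names are illustrative assumptions:

import torch
import torch.nn.functional as F

def fgsm(f, x, y, eps):
    """Untargeted FGSM: one eps-sized l_inf step up the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(f(x), y).backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(-1, 1).detach()             # stay in the valid pixel range

def pgd_linf(f, x, y, eps, alpha, n_iter):
    """Untargeted BIM/PGD: small sign steps, delta clipped to the l_inf eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        F.cross_entropy(f(x_adv), y).backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            delta = (x_adv - x).clamp(-eps, eps)   # project onto the l_inf eps-ball
            x_adv = (x + delta).clamp(-1, 1)       # ...and onto the valid pixel range
    return x_adv.detach()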
PGD, ℓ2 bound for ε, ℓ2-normalized steps
for n iterations:
  δ_(i+1) = α·∇_x ℒ(f_θ(x′_i), y) / ‖∇_x ℒ(f_θ(x′_i), y)‖_2
  x′_(i+1) = x′_i + δ_(i+1)
  δ = x′_(i+1) − x               (the perturbation so far w.r.t. x)
  δ = Proj_2(δ, ε)
  x′_(i+1) = clip(x + δ, [0, 255])
Each step now has a fixed radius r = α.
Problem: gradient descent is prone to getting stuck in local optima...

Momentum PGD, ℓ2 bound for ε, ℓ2-normalized steps, momentum λ
δ_0 = 0
for n iterations:
  δ_(i+1) = λ·δ_i + ∇_x ℒ(f_θ(x′_i), y) / ‖∇_x ℒ(f_θ(x′_i), y)‖_2     (the perturbation w.r.t. x′_i)
  δ_(i+1) = α·δ_(i+1)
  x′_(i+1) = x′_i + δ_(i+1)
  δ = x′_(i+1) − x               (the perturbation so far w.r.t. x)
  δ = Proj_2(δ, ε)
  x′_(i+1) = clip(x + δ, [0, 255])
Other improvements:
- Use better gradient-descent optimizers: Adam, AdaGrad, ...
- Run the attack many times with random starts (x + z for a random z in the ε-ball), then select the best result

What value of ε should be used as a budget?
Rule of thumb: small enough that a human can't notice. The value depends on the image resolution and the norm. Here epsilon is set before normalization of the model's inputs!
- Resolution 32×32×3 (e.g., CIFAR): ℓ∞: ε = 3/255 ≈ 0.01; ℓ2: ε = 255/255 ≈ 1
- Resolution 224×224×3 (e.g., ImageNet): ℓ∞: ε = 13/255 ≈ 0.05; ℓ2: ε = 1300/255 ≈ 5
[Figure: CIFAR ℓ∞ and ImageNet ℓ2 examples; at ε = 0.05 a dog image is classified as toilet paper, while at ε = 0 it is not.]
Goodfellow I, et al. Explaining and Harnessing Adversarial Examples. 2015

Carlini & Wagner Attack (CW)
Until now, staying in the ε-bound required clipping (bad for GD optimization), and achieving the target y_t was constrained to the output of f's softmax layer (which must sum to one). We did this because
  min ‖δ‖_p subject to f(x_0 + δ) = y_t, x_0 + δ ∈ [0, 1]^(n·m)
is a very difficult non-linear optimization problem. Why not just directly optimize over the attacker's limitations?
1. Capture the R_A objective with a non-negative differentiable function:
   q(x′) = max( max_{i≠t} z(x′)_i − z(x′)_t, 0 )
   where z(x)_i is the output of the i-th logit (the linear combination before the softmax activation).
   Properties: the minimum is 0 when all other logits z are smaller than the target logit; it increases when the class probability for t is less than the largest other class.
2. Capture the c objective with a differentiable function:
   δ_i = ½(tanh(ω_i) + 1) − x_i
   which forces x + δ to stay in the range [0, 1], where ω is the optimized parameter (note ½(tanh(ω) + 1) ∈ [0, 1]).
Putting it all together:
   min_ω ‖δ‖_2 + α·q(x + δ)
   with δ = ½(tanh(ω) + 1) − x and q(x′) = max( max_{i≠t} z(x′)_i − z(x′)_t, −k )
[Figure: compared to BIM at ε = 0.08, CW stops at the decision boundary and focuses on minimizing the perturbation magnitude.]
Image from Dou Z, et al. Mathematical Analysis of Adversarial Attacks. 2018

Universal Adversarial Perturbation (UAP)
One δ to rule them all: until now, we always modified a specific target sample x_0 into x′, and the δ_i crafted for x′_i does not transfer to x′_j.
Approach: optimize the pixels of a single δ across batches of samples.
Example: targeted attack with PGD-ℓ2, using the batch loss ℒ(f(X), y_t) = (1/m)·Σ_i ℒ(f(x_i), y_t)
Generate the UAP:
  δ = 0
  for each batch X_i in 𝒟:
    X′_i = X_i + δ                       (add the same δ to each sample in X_i)
    δ = δ − α·∇_δ ℒ(f_θ(X′_i), y_t)
    δ = Proj_2(δ, ε)
Attack with the UAP: x′ = clip(x + δ, [0, 255])
Moosavi S, et al. Universal adversarial perturbations. 2016
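A minimal PyTorch sketch of the UAP generation loop above (targeted, ℓ2-projected). The data loader, cross-entropy loss, pixel range, and function names are illustrative assumptions:

import torch
import torch.nn.functional as F

def l2_project(delta, eps):
    """Proj_2: scale delta back to the eps-ball boundary if it lies outside."""
    norm = delta.norm(p=2)
    return delta if norm <= eps else delta * (eps / norm)

def generate_uap(f, loader, y_target, eps, alpha, pixel_range=(-1.0, 1.0)):
    """Optimize one perturbation delta over many batches so that f(x + delta) -> y_target."""
    delta = None
    for X, _ in loader:                                   # the true labels are not needed
        if delta is None:
            delta = torch.zeros_like(X[0], requires_grad=True)
        X_adv = (X + delta).clamp(*pixel_range)
        y_t = torch.full((X.shape[0],), y_target, dtype=torch.long)
        loss = F.cross_entropy(f(X_adv), y_t)             # batch-averaged targeted loss
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad                   # targeted: descend the loss
            delta.copy_(l2_project(delta, eps))           # project onto the l2 eps-ball
        delta.grad.zero_()
    return delta.detach()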
Robust Adversarial Examples

Attacks in the Physical World
- The attacker can't select x_0 and has little control over perspective and lighting
- A universal perturbation can't be used as-is
- Examples: face recognition, and network intrusion detection (NIDS) on traffic; the attacker cannot modify the pixels/features directly!
Solution: optimize δ over many different possible transformations.
Athalye A, et al. Synthesizing Robust Adversarial Examples. 2018

Expectation over Transformation (EoT)
- A: the function that applies a transformation
- T: the set of all possible transformations (e.g., perspectives, noise, brightness, ...)
  x′ = argmax_x E_{t~T}[ log Pr(y′ | A(x, t)) ]
"Optimize over many [synthetically] augmented samples."
δ = 0
for each batch X_i in 𝒟:                 (e.g., images of the turtle's surfaces)
  X′_i = A_T(X_i + δ)                    (applies random transforms t ∈ T to each sample in X_i)
  δ = δ − α·∇_δ ℒ(f_θ(X′_i), y_t)
  δ = Proj_2(δ, ε)
Paint each surface x_k: x′_k = clip(x_k + δ, [0, 255])
Note: t must be a differentiable transform (e.g., a random resize).
Athalye A, et al. Synthesizing Robust Adversarial Examples. 2018

Adversarial Patch
What is a patch? A small portion of an image with a strong adversarial perturbation (used against classification and object recognition, e.g., the "toaster" patch).
  p′ = argmax_p E_{t~T}[ log Pr(y′ | A(p, x, t)) ]
where A applies the transformation to the patch, and then applies the patch to the image x.
Brown T, et al. Adversarial Patch. 2018
Adhikari A, et al. Adversarial Patch Camouflage Against Aerial Detection. 2020
Sharif M, et al. Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition. 2016
Thys S, et al. Fooling automated surveillance cameras: adversarial patches to attack person detection. 2019

Non-Printability Score (NPS)
Challenge: digital colour does not match printable colour.
Solution: follow the CW approach and add this term to the optimization process:
  ℒ_nps = Σ_{i,j} min_{c∈C} ‖p(i, j) − c‖
where C is the set of all printable colours.

Patch Generation: how do you apply a patch?
- Slicing: X[i:j] = P. Works when the patch is square with a frame; won't work with skew, rotate, ... transforms.
- Masking: P·M + x·(1 − M). A binary mask M is used to apply the patch pixels to selected locations. Done in GPU memory (fast).
[Figure: the patch P and mask M are rotated, scaled, and placed as t(P) and t(M), then composited with the image: t(P)·t(M) + x·(1 − t(M)).]

Patch Generation with Masking
P = 0
for each batch X_i in 𝒟:
  P_t, M_t = t(P, M)                     (apply a random transformation and placement to P, and the same to the mask M)
  X′_i = P_t·M_t + X_i·(1 − M_t)
  P = P − α·∇_P [ ℒ(f_θ(X′_i), y_t) + ℒ_nps ]
Repeat for each image in the batch with random transformations.
*Note: there is no epsilon budget on patches!

Attacking other ML Models
Adversarial examples are not just for neural networks! They also apply to classifiers in general, anomaly detectors (e.g., a sample that is confidently in the blue class but far inside the red class's zone), clustering, reinforcement learning, ... e.g., for evasion.

Attacking Regression Models
Adversarial examples apply to regression too. Consider linear regression:
- Architecture: y = θ_0 + θ_1·x^1 + θ_2·x^2 + θ_3·x^3 + θ_4·x^4
- Parameters: θ = [0, 48, −12, −4, 1]
- Model: y = 48x − 12x² − 4x³ + x⁴
- Objective: maximize y for x = 4
- Budget: ε = 0.75
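As a preface to the two attack methods on the next slides, here is a minimal Python sketch of this model and its input gradient (the function names are illustrative; the derivative follows from basic calculus):

def f(x):
    """The regression model y = 48x - 12x^2 - 4x^3 + x^4."""
    return 48*x - 12*x**2 - 4*x**3 + x**4

def grad_f(x):
    """Its derivative w.r.t. the input: dy/dx = 48 - 24x - 12x^2 + 4x^3."""
    return 48 - 24*x - 12*x**2 + 4*x**3

print(f(4), grad_f(4))   # 0 16 -> the gradient at x = 4 points in the +x direction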
Method 1: solve the optimization directly
  max_δ 48(x + δ) − 12(x + δ)² − 4(x + δ)³ + (x + δ)⁴   s.t. |δ| < ε

Method 2: attack the gradient (FGSM)
Find the derivative: ∇_x f(x) = 48 − 24x − 12x² + 4x³
Set the start values: x = 4, ε = 0.75, α = 0.8
1. δ = α·sign(∇_x f(x)) = 0.8·sign(16) = 0.8·(+1) = 0.8
2. x′ = clip_ε(x + δ) = 4.75      (keep x′ within ε of x)
Result: f(x) = 0, f(x′) ≈ 37.6

What about inputs with 2 features?
- Objective: maximize y for x = (x_1, x_2) = (4, 5)
- Budget: ε = 0.75
- Model: y = 48x_1 − 12x_1² − 4x_1³ + x_1⁴ + 39x_2 − 9x_2² − 7x_2³ + x_2⁴
FGSM:
Partial derivatives: ∂y/∂x_1 = 4(x_1³ − 3x_1² − 6x_1 + 12), ∂y/∂x_2 = 4x_2³ − 21x_2² − 18x_2 + 39
Set the start values: x = (4, 5), ε = 0.75, α = 0.8
1. δ = α·sign(∇_x f(x)) = 0.8·sign((16, −76)) = 0.8·(+1, −1) = (0.8, −0.8)
2. x′ = clip_ε(x + δ) = clip_ε((4, 5) + (0.8, −0.8)) = (4.75, 4.25)
Result: f(x) = −280, f(x′) ≈ −170.3

Attacking Object Detectors
Object detectors perform regression and classification (over a grid), so the same attack principles apply! (e.g., YOLOv1)
Examples:
Liu X, et al. DPATCH: An Adversarial Patch Attack on Object Detectors. 2019
Chow K, et al. TOG: Targeted Adversarial Objectness Gradient Attacks on Real-time Object Detection Systems. 2020

Attacking Sequential Models
Recurrent networks (LSTM, RNN, ...) are vulnerable too, including audio and NLP models.
Example attacks on speech models:
- Change the interpretation of speech
- Make any noise be recognized as speech (covertly)
- Change the identity of the speaker (covertly)
Carlini N, et al. Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. 2018

Other Ways to Evade Models
Generic evasion attacks:
- Mimicry: mix benign features into x′ so it will appear benign
- Exploit limitations: keep x′ below the noise floor, e.g., avoid an anomaly detector by sending malicious packets slowly
- Polymorphic blending: obfuscate features so they no longer reflect malicious ones (for attacking classifiers, not anomaly detectors)

Evading the Model Altogether
When planning a model, consider how the adversary may interact with it.
Example scenario: a smartphone malware detector whose main feature is checking whether an app sends data while the screen is off.
- Evasion 1: send only when the app is open
- Evasion 2: send only alongside benign data (in parallel)

Recommended Reading: Week 2
Ilyas A, et al. Adversarial examples are not bugs, they are features. Advances in Neural Information Processing Systems. 2019. https://proceedings.neurips.cc/paper/2019/file/e2c420d928d4bf8ce0ff2ec19b371514-Paper.pdf