OFFENSIVE AI LECTURE 3: ADVERSARIAL MACHINE LEARNING 2 – EXPLORATORY ATTACKS
Dr. Yisroel Mirsky ([email protected])

Today's Agenda
- Adversarial Examples - Continued
- Why They Work
- Blackbox Attacks
- Transferability & Surrogates
- Sponge Samples

Adversarial Examples: Why Do They Work?
Why are neural networks fooled so easily? Are DNNs simply not robust models? If so, why do they work well when deployed?

Two Schools of Thought
1. "Adversarial examples are the outcome of function fitting."
2. "Adversarial examples are correlated non-robust features."

Non-Robust Features
What does a network learn? Patterns that correlate to a class (train set → CAT). Robust features = CAT. But not all features are perceivable by humans: NNs also find 'non-robust' (brittle) features very important, and these too map to CAT.
(Ilyas A. et al., Adversarial Examples Are Not Bugs, They Are Features, 2019)

Non-Robust Features: Adversarial Examples are Features, not Bugs
Are perturbations a flaw of 𝑓 or of its data? Models trained on only non-robust features still work well on standard images, so non-robust features (NRF) are properties of the dataset, not noise.
But are NRF important to 𝑓? Models trained on adversarial examples (labelled with the target 𝑦𝑡) also perform well, so NRF are important to decision making (even if biased).

Function Fitting
If it's really just correlated noise, why isn't the ideal perturbation (a UAP) the same for all targeted attacks 𝑦cat → 𝑦truck?
(A. Shamir et al., The Dimpled Manifold Model of Adversarial Examples in Machine Learning, 2022)

The Nature of Function Fitting
Decision boundaries are functions: 𝑓(𝑥) = 𝑝(𝑦𝑐 | 𝑥). Train a model to fit a simple 2D dataset (features 𝑥1, 𝑥2) and observe:
- Every sample is near every other class, because a decision boundary presses up against it.
- Competing labels exert a force on the boundary through the loss function.
- The effect is more pronounced, yet harder to perceive, in higher dimensions.

So, which is it?
Probably a combination of both, and possibly other unknown factors.

Adversarial Examples: Black Box Attacks

Blackbox Attacks
Recall: a blackbox model is a target model whose internal workings (architecture, dataset, hyperparameters, ...) are not known to the attacker.
Example deployments:
- In the cloud: e.g., Machine Learning as a Service (MLaaS)
- In software tools: e.g., cyber attack detection
- In the backend: e.g., credit scoring
- In hardware: e.g., autonomous vehicles

Priori vs. A Priori Attacks
- Priori: we know something about 𝑓 before the attack (e.g., a surrogate). One query delivers the attack 𝑥′; best effort, no feedback.
- A Priori: we learn something about 𝑓 during the attack (e.g., query-based). Numerous queries converge on an ideal solution, but not too many queries!
Some attacks combine both.
(Mahmood K. et al., Back in Black: A Comparative Evaluation of Recent State-Of-The-Art Black-Box Attacks, IEEE Access 2022)
Blackbox Attacks: Taxonomy of Approaches
- Transfer-based (lowest detectability)
- Score-based
- Decision-based (highest detectability)

Transfer-based Attacks

Transferability
Do adversarial examples transfer? Is there some commonality between the decision boundaries of different models? Yes, both for all images in 𝒟(test) and for the non-robust images in 𝒟(test).
(Ozbulak U. et al., Selection of Source Images Heavily Influences the Effectiveness of Adversarial Attacks, 2021)

Transfer-based Attacks: Mechanism
1. Knowledge of training data: the adversary knows the training data distribution, 𝑥 ~ 𝒟(train).
2. Creation of a substitute model 𝑓̂(𝑥): the adversary trains an independent classifier on his/her own data.
3. Whitebox attack on the substitute: a whitebox attack is performed on the substitute (surrogate) model, producing 𝑥′ = 𝑥 + 𝛿.
4. Transferability: the adversarial sample is sent to the victim model 𝑓(𝑥) with an increased likelihood of success.
(N. Papernot et al., Practical Black-Box Attacks Against Machine Learning, AsiaCCS 2017)

Transfer-based Attacks: Three Versions
- No query access: use an open-source dataset similar to 𝒟(train). Strong: hard to detect and prevent.
- Query access: label the attacker's dataset by querying 𝑓(𝑥). Weaker: more effective, but requires queries that can be detected.
- Unbounded query access: data generated by querying 𝑓(𝑥) with a Generative Adversarial Network (GAN), e.g., DaST. Weakest: most effective, but requires many queries which are very noisy (not realistic).
(Zhou M. et al., DaST: Data-free Substitute Training for Adversarial Attacks, CVPR 2020)

Transferability: Why Is There Transferability?
Similar data distributions 𝒟(train) and 𝒟̂(train) result in:
- For school 1: the same correlated non-robust features.
- For school 2: similar function-fitting issues across architectures.
Why isn't there perfect transferability? Different architectures and different data: a whitebox adversarial example found inside the surrogate's 𝑙2 𝜖-ball need not cross the victim's boundary.
(Ambra Demontis et al., Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks, USENIX Security 2019)

Transferability: When Are Samples More Likely to Transfer?
- Gradient alignment: when the gradient at the adversarial example points in the same direction for both models.
- Low loss variance: when the loss surface around the surrogate's boundary is smooth, the attack is less likely to get stuck in a local minimum.

Transferability: Observations and Tips
- Observation 1: transferability relates to model complexity (loss variance). Tip 1: use a simpler surrogate; it reduces variability in the loss landscape.
- Observation 2: transferability relates to the 𝜖 budget. The gradients of 𝑓(𝑥) and 𝑓̂(𝑥) are not perfectly aligned, so Tip 2: use a larger perturbation when possible; maximum-confidence attacks transfer better. (A minimal sketch of such a transfer attack follows.)
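Below is a minimal sketch of the surrogate transfer attack described above: run an iterative FGSM (PGD-style) whitebox attack on a local surrogate, spend the full 𝜖 budget, and only then submit the result to the blackbox victim. It assumes PyTorch, inputs scaled to [0, 1], and already-trained `surrogate` / `victim` classifiers; the names and step sizes are illustrative, not taken from the lecture or the cited papers.

```python
import torch
import torch.nn.functional as F

def transfer_attack(surrogate, x, y, eps=8/255, alpha=2/255, steps=20):
    """Craft x' with iterative FGSM on a local surrogate, then transfer it.
    Spending the whole epsilon budget (a maximum-confidence style attack)
    tends to transfer better, per Observation 2 above."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), y)    # whitebox loss on the surrogate
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the surrogate's loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # stay inside the l_inf eps-ball
            x_adv = x_adv.clamp(0, 1)                  # stay a valid image
    return x_adv.detach()

# usage sketch: x_adv = transfer_attack(surrogate, x, y)
#               y_victim = victim(x_adv).argmax(dim=1)   # hopefully != y
```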
Transferability: What If the Attacker Has More Priori Information?
Suppose the attacker also knows the architecture of 𝑓, the training data 𝒟(train), and the hyperparameters (number of epochs, etc.). Will the resulting 𝑓̂ guarantee perfect transferability?
No: when training 𝑓 and 𝑓̂, the random seed used to initialize 𝜃 matters too!
(Ziv Katzir et al., Who's Afraid of Adversarial Transferability?, 2022)

Increasing Transferability: Approaches
1. Generate samples using a more generic model (e.g., less complex).
2. Utilize the entire epsilon budget.
3. Generate samples with better transfer diversity:
   - surrogate diversity (optimize over diverse architectures)
   - input diversity (optimize over diverse inputs)
   - output diversity (optimize over diverse outputs)
4. Perform ranking: estimate the best image to use, or the best perturbation to use.

Increasing Transferability: Surrogate Diversity
Objective: find a surrogate 𝑓̂ that is closest to the unknown 𝑓; in other words, find a surrogate that generalizes to any other model.
Ensemble approach: find a 𝛿 that works on many different surrogates:
   argmin_𝛿 Σᵢ 𝑓̂ᵢ(𝑥 + 𝛿)𝑦   s.t. ‖𝛿‖ < 𝜖
i.e., minimize the surrogates' combined score for the true label 𝑦 while keeping 𝛿 within the budget. Each surrogate model has a similar boundary with some variance; merging them helps us make a more generalized decision. In practice, 𝑥 + 𝛿 is fed through every surrogate 𝑓̂₁, …, 𝑓̂ₙ, the per-surrogate losses ℒ₁, …, ℒₙ are summed, and the total loss Σℒᵢ drives the update of 𝛿.
(Y. Liu et al., Delving into Transferable Adversarial Examples and Black-box Attacks, ICLR 2017)

Translation Invariant Attacks: Input Diversity
Problem: we can't assume that our surrogates will be similar to (aligned with) 𝑓.
Input diversity: make a 𝛿 that is feature-invariant instead of surrogate-invariant, i.e., find a UAP for images similar to 𝑥, by transforming the surrogate's inputs during optimization. (Invariant: "unaffected by changes.")

TIMI Attack: Input Diversity
Goal: given 𝑥, make a perturbation 𝛿 such that 𝑓(𝑇(𝑥) + 𝛿) ≠ 𝑦, where 𝑇 applies random transformations to the features in 𝑥. The attacker trains 𝑓̂ on 𝒟(train), generates transformed copies 𝑥₁, 𝑥₂, …, 𝑥𝑏 of 𝑥, optimizes 𝛿 over the batch, and sends 𝑥′ = 𝑥 + 𝛿 to the victim 𝑓(𝑥).

TIMI Attack: Short-cut, Image Shift Augmentations
Optimizing 𝛿 over pixel-shift augmentations is equivalent to simply applying all shifts to the gradient on 𝑥, i.e., to convolving the gradient with a kernel 𝑾. TI-FGSM: no need for batches (if doing shifts only)! (A sketch follows.)
(Dong Y. et al., Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks, CVPR 2019)
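A hedged sketch of the TI-FGSM shortcut follows: instead of averaging gradients over many shifted copies of 𝑥, the surrogate's gradient is convolved with a smoothing kernel 𝑾 before the sign step. A Gaussian kernel is used here purely for illustration; the kernel size and sigma are assumptions, not values from the lecture.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=7, sigma=3.0):
    """Build a normalized 2D Gaussian kernel W, one copy per colour channel."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    k = k / k.sum()
    return k.view(1, 1, size, size).repeat(3, 1, 1, 1)   # depthwise kernel

def ti_fgsm_step(surrogate, x_adv, x, y, eps, alpha, W):
    """One TI-FGSM step: smooth the surrogate gradient with W (approximating
    an average over all pixel-shifted inputs), then take a signed step."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    grad = F.conv2d(grad, W, padding=W.shape[-1] // 2, groups=3)
    with torch.no_grad():
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        return x_adv.clamp(0, 1)
```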
Output Diversity Attacks (ODS)
Output diversity: generalize the optimization process by adding noise 𝜎 to the surrogate's outputs 𝑓̂(𝑥) during the attack. This simulates the uncertainty of the victim 𝑓(𝑥).
(Tashiro Y. et al., Diversity Can Be Transferred: Output Diversification for White- and Black-box Attacks, NeurIPS 2020)

Transferability Ranking
What if the attacker can choose which sample to use in the attack? Then which sample should be used?
Approach: see which sample in our set affects other models (surrogates) the most.
- 𝐹₀: a set of surrogate models with architectures different from 𝑓.
- 𝜎𝑦(𝑓): the softmax value of the logit for the target label 𝑦.
- Expected Transferability (ET): a heuristic, based on a set of surrogates, that measures the likelihood of a sample transferring to an unknown model.
Ranking can also be performed over perturbations 𝛿ᵢ of a single image: different perturbations from different attack algorithms, different attack parameters, or random starts.
(M. Levy, Y. Elovici, Y. Mirsky, Transferability Ranking of Adversarial Examples, 2022)

Score-based Attacks

Score-based Attacks: Mechanism
- Repeated queries: these attacks repeatedly query the unseen classifier to craft adversarial noise, refining 𝑥ᵢ′ into 𝑥ᵢ₊₁′.
- Score vector requirement: the classifier must output a score vector 𝑦ᵢ′ ∈ [0,1]^𝑐, such as probabilities or pre-softmax logits, for these attacks to work.
Advantage over transfer attacks: no dataset knowledge is needed, and the solution has a higher guarantee of success.

Score-based Attacks: Focus of Recent Developments
- Reducing queries: decrease the number of queries needed to conduct the attack.
- Minimizing noise: reduce the magnitude of noise required for successful adversarial examples.
Approaches: gradient estimation, locality search.
Notable recent score-based attacks: qMeta, P-RGF, ZO-ADMM, TREMBA, Square Attack, TIMI, ZO-NGD, PPBA, SimBA, GFCS.

Estimating Gradients
Motivation: strong (whitebox) attacks compute the gradient of 𝑓𝜃(𝑥), but in a black box there is no access to 𝜃. Solution: estimate the gradient!

Estimating Gradients: Gradients in Optimization (one feature)
- First-order gradient: the gradient is the slope of ℒ(𝑥) at the point; it requires computing the derivative ∂ℒ(𝑥)/∂𝑥.
- Zero-th order gradient: requires only evaluating 𝑓𝜃(𝑥 + ℎ) and 𝑓𝜃(𝑥 − ℎ), where ℎ is a small constant (e.g., 0.0001).

Estimating Gradients: Zero-th Order Optimization
One step of gradient descent:
- With one feature, 𝑥 ∈ ℝ:   ∂ℒ/∂𝑥 ≈ (ℒ(𝑥 + ℎ) − ℒ(𝑥 − ℎ)) / (2ℎ)
- With 𝑛 features, 𝑥 ∈ ℝⁿ:   ∂ℒ/∂𝑥ᵢ ≈ (ℒ(𝑥 + ℎeᵢ) − ℒ(𝑥 − ℎeᵢ)) / (2ℎ), repeated for each 𝑖 ∈ {1, …, 𝑛}, where eᵢ is the standard basis vector whose 𝑖-th component is 1.
(Chen P. et al., ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models, AISec 2017)
A sketch of this estimator follows.
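The following is a small sketch of the symmetric finite-difference estimator above, written against a generic blackbox query `f_loss(x)` that returns a scalar loss (for example, the victim's score for the true label). Probing only a random subset of coordinates per call mirrors ZOO's stochastic coordinate descent; the names and batch size are illustrative assumptions, and `x` is assumed to be a float NumPy array.

```python
import numpy as np

def zoo_gradient_estimate(f_loss, x, h=1e-4, n_coords=128, rng=None):
    """Estimate dL/dx for a blackbox loss, coordinate by coordinate:
        dL/dx_i  ~=  (L(x + h*e_i) - L(x - h*e_i)) / (2h)
    Each probed coordinate costs two queries, so one call costs 2*n_coords."""
    rng = rng or np.random.default_rng()
    grad = np.zeros(x.size)
    coords = rng.choice(x.size, size=min(n_coords, x.size), replace=False)
    for i in coords:
        e = np.zeros(x.size)
        e[i] = h
        e = e.reshape(x.shape)
        grad[i] = (f_loss(x + e) - f_loss(x - e)) / (2 * h)
    return grad.reshape(x.shape)

# usage sketch (one descent step on the estimated gradient):
# g = zoo_gradient_estimate(f_loss, x_adv)
# x_adv = np.clip(x_adv - lr * g, 0.0, 1.0)
```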
ZOO Attack: Zero-th Order Optimization
Visualized for an untargeted attack: for each coordinate 𝑖, query 𝑓𝜃(𝑥 + ℎeᵢ) and 𝑓𝜃(𝑥 − ℎeᵢ), take the score of the original label 𝑦, and form the finite difference (𝑓𝜃(𝑥 + ℎeᵢ)𝑦 − 𝑓𝜃(𝑥 − ℎeᵢ)𝑦) / (2ℎ). The estimated gradient yields the perturbation 𝛿, and the adversarial example is 𝑥′ = 𝑥 − 𝛿.

ZOO Attack: Challenge
One iteration requires 2𝑛 queries on the model:
- CIFAR: 𝑛 = 32 × 32 × 3 = 3,072
- ImageNet: 𝑛 = 224 × 224 × 3 = 150,528
A successful attack requires many iterations! Too many queries make the attack impractical and non-covert.

ZOO Attack: Strategy 1, Stochastic Coordinate Descent
For each iteration, select a random coordinate (feature) to optimize (with replacement). Converges faster.

ZOO Attack: Strategy 2, Hierarchical Attack
Attack groups of coordinates together, then gradually decrease the group size: increase the resolution when the loss converges.

ZOO Attack: Strategy 3, Attack Important Coordinates First
Important pixels are in the center of the image, so sample those more often.

ZOO Attack: Results
Demonstrated on MNIST and CIFAR10.
(Chen P. et al., ZOO, AISec 2017)

Random Search: Avoiding Gradients
Instead of computing gradients, we can just follow a greedy approach.
Random search optimization:
1. Try 𝑓(𝑥ᵢ′ + 𝛿), where 𝛿 is a random update.
2. If 𝑓(𝑥ᵢ′ + 𝛿) reduces the loss, set 𝑥ᵢ₊₁′ = 𝑥ᵢ′ + 𝛿; else try another random 𝛿.
(Rastrigin L., The Convergence of the Random Search Method in the Extremal Control of a Many Parameter System, Automation and Remote Control 24, 1337–1342, 1963)

SimBA: Simple Black-box Attacks
A random-search-based attack.
Motivation: ℝ^(𝑛×𝑚×𝑐) is too large for a naive random search.
Approach: choose random search directions from an orthonormal basis 𝑄; if the loss improves, take the step, otherwise take the opposite direction and proceed. The paper offers a few different bases to use as 𝑄.
(Guo C. et al., Simple Black-box Adversarial Attacks, PMLR 2019)

SimBA: Cartesian Basis
𝑄 is the identity basis, giving pixel-wise manipulations. For 5 × 5 images, 𝑑 = 25 and 𝑄 = {𝑞₁, 𝑞₂, …, 𝑞₂₅}; for example, a smiley can be written in this basis as 𝑞₇ + 𝑞₁₇ + 0.8𝑞₄ + 0.9𝑞₁₀ + 𝑞₁₅ + 0.9𝑞₂₀ + 0.8𝑞₂₄. (A SimBA sketch with this basis follows.)
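Below is a minimal sketch of SimBA with the Cartesian (pixel) basis just described. It assumes a blackbox query `f_probs(x)` that returns a vector of class probabilities and images as float NumPy arrays in [0, 1]; the step size and iteration count are illustrative assumptions.

```python
import numpy as np

def simba_pixel(f_probs, x, y, eps=0.2, n_iters=2000, rng=None):
    """SimBA with the identity basis: pick an unused random pixel direction q,
    try +eps*q then -eps*q, and keep a step only if it lowers the probability
    of the true label y (one or two queries per direction)."""
    rng = rng or np.random.default_rng()
    x_adv = x.astype(np.float64).copy()
    p_best = f_probs(x_adv)[y]
    order = rng.permutation(x.size)          # each basis vector is tried at most once
    for i in order[:n_iters]:
        q = np.zeros(x.size)
        q[i] = eps
        q = q.reshape(x.shape)
        for sign in (+1.0, -1.0):            # try q, then -q
            cand = np.clip(x_adv + sign * q, 0.0, 1.0)
            p = f_probs(cand)[y]
            if p < p_best:                   # improvement: take the step
                x_adv, p_best = cand, p
                break
    return x_adv
```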
SimBA: DCT (Discrete Cosine Transform) Basis
Every signal (including an image) can be broken down into a sum of sinusoids, and a component of a DCT basis expresses one of these wave patterns. Again 𝑄 = {𝑞₁, 𝑞₂, …, 𝑞₂₅}, and the same smiley can be written as a weighted sum of DCT components, e.g., 0.8𝑞₆ − 0.2𝑞₈ − 6.4𝑞₁₀ + 6.4𝑞₁₆ + 5.17𝑞₁₈ + 1.97𝑞₂₀ + 𝑞₂₁ − 6.1𝑞₂₃ − 1.5𝑞₂₅.
SimBA simply keeps trying different directions until it sees an improvement. It has even been demonstrated as an attack on Google's Cloud Vision API.
(Guo C. et al., Simple Black-box Adversarial Attacks, PMLR 2019)

Square Attack
Another random-search-based attack; it works on images only and is more effective than SimBA.
Motivation: why change random pixels when we know pixels have spatial context?
Approach: randomly add transparent squares to the image until the classification changes.

Square Attack: Algorithm
Set 𝑥₀′ = 𝑥 and repeat until convergence:
1. 𝑥ᵢ₊₁′ = 𝑆(𝑥ᵢ′)   (add a square of random size, color and location to the image)
2. 𝑥ᵢ₊₁′ = Proj_𝑝(𝑥ᵢ₊₁′ − 𝑥₀′, 𝜖)   (project the current 𝛿 back onto the 𝜖-bound)
3. 𝑥ᵢ₊₁′ = clip(𝑥ᵢ₊₁′, 0, 255)
4. If 𝑓(𝑥ᵢ₊₁′)𝑦 ≥ 𝑓(𝑥ᵢ′)𝑦 (fail: the square did not reduce the loss), set 𝑥ᵢ₊₁′ = 𝑥ᵢ′ (remove this square).
Loop back to step 1. Gradually reducing the square size over time improves convergence. Results were reported on ResNet-50. (A sketch of this loop follows.)
(Maksym Andriushchenko et al., Square Attack: A Query-Efficient Black-box Adversarial Attack via Random Search, ECCV 2020)
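Here is a simplified sketch of the loop above (an l_inf, untargeted flavour). It assumes a blackbox query `f_prob_y(x)` returning the victim's confidence in the true class, images as H×W×C float arrays in [0, 1], and a schedule that shrinks the squares over time; the schedule and constants are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def square_attack(f_prob_y, x, eps=0.05, n_iters=1000, p_init=0.1, rng=None):
    """Random search with square-shaped updates: add a random square at the
    epsilon bound and keep it only if the true-class confidence drops."""
    rng = rng or np.random.default_rng()
    h, w, c = x.shape
    x_adv = x.copy()
    best = f_prob_y(x_adv)
    for i in range(n_iters):
        frac = p_init * (1 - i / n_iters)                 # shrink squares over time
        s = max(1, int(round(np.sqrt(frac) * min(h, w))))
        r = rng.integers(0, h - s + 1)
        col = rng.integers(0, w - s + 1)
        colour = rng.choice([-eps, eps], size=(1, 1, c))  # per-channel +/- eps
        cand = x_adv.copy()
        cand[r:r + s, col:col + s, :] = x[r:r + s, col:col + s, :] + colour
        cand = np.clip(cand, x - eps, x + eps)            # project delta to the eps-ball
        cand = np.clip(cand, 0.0, 1.0)                    # keep a valid image
        score = f_prob_y(cand)
        if score < best:                                  # success: keep this square
            x_adv, best = cand, score
    return x_adv
```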
Priori Meets A Priori
Priori attacks use knowledge about the model (e.g., surrogates); a priori attacks query the model and adapt. Why not do both?

With ZOO (Subspace Attack)
Approach, for each step:
- WB: update 𝑥′ using a random surrogate's gradient (the paper also uses dropout for robustness).
- BB: update 𝑥′ using ZOO coordinate descent on 𝑓 (querying 𝑥′ + ℎ and 𝑥′ − ℎ).
Repeat.
(Yan Z. et al., Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks, NeurIPS 2019)

With SimBA
Approach, for each step:
- WB: update 𝑥′ using a random surrogate's gradient.
- BB: update 𝑥′ using SimBA random search on 𝑓.
Repeat.
(Yang J. et al., Learning Black-Box Attackers with Transferable Priors and Query Feedback, NeurIPS 2020)

With SimBA, Improved (GFCS)
Approach, for each step:
- WB: update 𝑥′ using a random surrogate's gradient.
- BB: update 𝑥′ using SimBA random search.
- If no good direction is found, use ODS (apply random noise to the output of the surrogate).
Repeat.
(Lord N. A. et al., Attacking Deep Networks with Surrogate-based Adversarial Black-box Methods Is Easy, ICLR 2022)

Evolutionary Approaches: One Pixel Attack
A strict 𝑙₀ attack that uses Differential Evolution (DE) optimization.
(Su J. et al., One Pixel Attack for Fooling Deep Neural Networks, 2019)

Decision-based Attacks

Decision-based Attacks: Mechanism
- Repeated queries: iteratively modify an input until it crosses the decision boundary of the classifier.
- Hard-label: only the hard label output of the unseen classifier is needed.
  - Soft label: 𝑓(𝑥) = [0.1, 0.1, 0.0, 0.3, 0.5, 0.0], i.e., probabilities (50% cat, 30% lion, …)
  - Hard label: 𝑓(𝑥) = [0, 0, 0, 0, 1, 0], i.e., "cat"
This is a stricter black box setting; fewer assumptions make the attack stronger.

Decision-based Attacks: Objective of Recent Developments
- Reducing queries: minimize the number of queries needed for a successful attack.
- Minimizing noise: lower the noise level of the adversarial examples for subtler attacks.
Approaches: "to origin" and "from origin".
Notable recent decision-based attacks: qFool, HSJA, GeoDA, QEBA, RayS, SurFree, NonLinear-BA.

Boundary Attack (To Origin)
1. Start with 𝑥₀′ as random noise (that is already adversarial).
2. Choose a random shift 𝜂 such that 𝑥ᵢ′ + 𝜂 is a bit closer to 𝑥:
   - If the shifted point crosses the boundary (is no longer adversarial), keep 𝑥ᵢ₊₁′ = 𝑥ᵢ′ and try again.
   - Else, step forward: 𝑥ᵢ₊₁′ = 𝑥ᵢ′ + 𝜂.
3. Loop to step 2.
(Brendel W. et al., Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models, ICLR 2018)

HopSkipJump
Uses a similar strategy to the Boundary Attack but needs fewer queries. Starting from a point over the boundary (on the adversarial side): (1) binary-search to find the boundary, (2) estimate the gradient around the boundary, (3) step towards the gradient, (4) repeat from step 1.
(Chen J. et al., HopSkipJumpAttack: A Query-Efficient Decision-Based Attack, IEEE S&P 2020)

Sponge Attacks

Exploratory Attacks: Availability
How do we measure the performance of a model?
- Task performance: accuracy of 𝑓(𝑥)
- Throughput: processing time of 𝑥
- Response time: time until 𝑥 gets processed
- Power: energy consumed by 𝑓(𝑥)
- Load: CPU/GPU utilization
Adversarial examples target task performance; sponge attacks and system flaws target the rest.

Sponge Examples
The objective is power consumption, so there is no need for 𝑥′ to be covert (no 𝜖 constraint). Can be blackbox.
Overloading the system, example: one image 𝑥′ produces too many bounding boxes to process, overloading the non-maximum suppression (NMS) algorithm or the system.
(Shumailov I. et al., Sponge Examples: Energy-Latency Attacks on Neural Networks, 2020)

Recommended Reading: Week 3
Ambra Demontis et al., Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks, USENIX Security 2019.