Lecture 5 - Adv ML Defences.pdf
OFFENSIVE AI LECTURE 4: DEFENCES AGAINST ADVERSARIAL MACHINE LEARNING
Dr. Yisroel Mirsky — [email protected]

Today's Agenda
- Introduction
- Causative Attacks
- Exploratory Attacks

Introduction

Defence in Adversarial ML
- An ongoing challenge: most solutions mitigate the problem but do not solve it.
- (Figure: number of defence papers published per year against each attack.)

Common Approach: Patch the Problem
- "Let's use a different/more robust model."
- "Let's make the model robust against this attack."
- "Let's just detect the attacks and block them."
- "Let's just keep the model/defence's implementation a secret."

Why Patching Is a Bad Idea
- Do we wait for cars to start crashing first?
- What if the attack is covert? (You won't know you've been compromised!)
- The attacker can reverse-engineer a defence.
- The attacker just needs to take one step to the side, while the defender is continuously hit.
- Patching should be a last resort, not the common approach!

The Arms Race
Each player races to overwhelm the other.
- Reactive arms race: the defender deploys a defence, the attacker finds a hole in it, repeat.
- Proactive arms race: the defender considers the attacker's next step(s), triages the greatest threats first, and upgrades the defence for attacks that do not exist (yet).
  Threat level = ease of exploitation + motivation + damage to victim.
Ref: Joseph A, et al. Adversarial Machine Learning. 2019

The Principled Approach to Secure Learning
1. Threat modelling
   - Enumerate the threats and their limitations.
   - Identify the attack surface (model and system).
   - Who are the threat actors? What can/can't they do?
2. Proactive analysis
   - Exercise forward thinking: consider what an attacker will do after your next step (defence).
   - Raise the difficulty bar for the attacker on all fronts.
3. Conservative design
   - Limit the attacker's options; restrict as much as possible (e.g., prediction rates, access control, ...).
   - Avoid unnecessary assumptions (e.g., that the attacker will only attack remotely).
4. Kerckhoffs's principle: obscurity is not security!
   - "You can't attack what you can't see" is not a defence.
   - We seek defences which do not rely on the attacker's ignorance and which are provably/certifiably secure.
Ref: Joseph A, et al. Adversarial Machine Learning. 2019

Defence Strategies
- Mitigation vs. prevention: data sanitization, model hardening (aka robust learning), and security.
- Defence types (a small pre-processor/detector sketch appears at the end of this overview):
  1. Pre-processor
  2. Post-processor
  3. Trainer
  4. Transformer
  5. Detector
  6. Cyber security
  7. Physical security

Before We Proceed...
- We can't exhaustively review all [attacks and] defences.
- We will cover the main concepts and the central algorithms.
- See: https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html

Defences Against Causative Attacks

Overview: How Do We Defend?
- Attack vectors (in/outside the premises) target the model in Train Mode, Rest Mode (in storage), and Exec Mode (deployed).
- Defence vectors: data sanitization, model hardening, and cyber security.
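Before moving on, a minimal illustration of two of the defence types listed above — a pre-processor (type 1) and a detector (type 5) — using the input-squeezing idea that returns later in this lecture. `model_fn`, the bit depth, and the threshold are illustrative placeholders, not part of the lecture material:

```python
# Minimal sketch of two defence types: a pre-processor that squeezes the input
# before inference, and a detector that flags a query when the squeezed and raw
# predictions disagree too much. All names and values are illustrative.
import numpy as np

def squeeze(x, bit_depth=5):
    """Pre-processor: reduce the input's colour/bit depth."""
    levels = 2 ** bit_depth - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def detect_and_predict(model_fn, x, threshold=0.5):
    """Detector + pre-processor: compare predictions on the raw vs. squeezed input."""
    p_raw, p_sqz = model_fn(x), model_fn(squeeze(x))
    suspicious = np.abs(p_raw - p_sqz).sum() > threshold   # L1 gap between predictions
    return p_sqz, suspicious
```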
Overview: Attacker vs. Defender Objectives
The attacker and defender solve mirrored optimization problems:
- Attacker: $\min_{\mathcal{D}_1} R_A(\mathcal{D}_1, S_A) + \lambda \, c(\mathcal{D}_0, \mathcal{D}_1)$ — the first term penalizes attack failure, the second (weighted by $\lambda$) penalizes attack detection.
- Defender: $\min_{\mathcal{D}_2} R_D(\mathcal{D}_2, S_D) + \lambda \, I(\mathcal{D}_0, \mathcal{D}_2)$ — the first term penalizes attack impact, the second penalizes attack presence.
Where:
- $\mathcal{D}_0$: pristine dataset, $\mathcal{D}_1$: tampered dataset, $\mathcal{D}_2$: cleaned dataset.
- $R_D$ (Defence Risk): the negative impact $\mathcal{D}_2$ has after modifying $f$ or cleaning $\mathcal{D}_1$ (the performance of $f$ should not be harmed!).
- $I$ (Data Integrity): the presence of tampered data in $\mathcal{D}_2$ w.r.t. $\mathcal{D}_0$ (detect all attacks!).

Main Defence Approaches
- Data sanitization (filter $\mathcal{D}^{(train)}$): data sub-sampling, outlier removal.
- Model hardening (improve $H$, $f$): strategic feature selection, reduce learner sensitivity.
- Security (protect $\mathcal{D}$, $f$, $H$, $\theta$): data provenance, digital signatures.
Ref: Kearns M, et al. Learning in the presence of malicious errors. 1993

Data Sanitization: Data Sub-sampling
(Figure: train several models $f_{\theta_1}, f_{\theta_2}, f_{\theta_3}$ on sub-samples of the corrupted data and compare them with an evaluator.)
- Good: simple, has theoretical guarantees.
- Bad: expensive (must train many models), $\alpha$ must be small and the subsets $X_1 \subset \mathcal{D}$ large, and it prevents corruption attacks only.

Data Sanitization: Outlier Removal
- Score each sample by its distance $d(x_i, \mathcal{D}_c)$ to its own class $c$ (the label of $x_i$), or by $d(x_i, \mathcal{D})$ to the whole dataset, and remove outliers before training $f_\theta$.
- Outlier detectors: LOF, GMM, Isolation Forest, DBSCAN, ...
- Good: efficient, works well on dirty-label backdoors.
- Bad: harms generalization, and not all poisoned samples $x'$ are outliers.
Ref: Klivans A, et al. Learning halfspaces with malicious noise. 2009

Data Sanitization: Outlier Removal (groups)
- Observation: for evasion attacks, several samples are inserted as a group to pull the decision boundary.
- Strategy: find groups of samples with high variance and remove them — run PCA, remove the largest samples along the largest eigenvector ($\sigma_1$, then $\sigma_2$), and repeat.
- Caveat: the algorithm has theoretical guarantees only for linear classifiers.
Ref: Klivans A, et al. Learning halfspaces with malicious noise. 2009

Data Sanitization: Outlier Removal — Removal on Negative Impact (RONI)
- Setup: take a clean (or assumed clean) base set $\mathcal{D}_1$ and a calibration set $\mathcal{Z}$. For each known-clean point $x^c_i$, train $f$ on $\mathcal{D}_1 \cup \{x^c_i\}$, measure its error on $\mathcal{Z}$, and average these errors to obtain a clean baseline $E_c$.
- Detect: for each candidate point $x^z_i$ in the data we want to add ($\mathcal{D}_2$), train $f$ with $x^z_i$ included and measure its error on $\mathcal{Z}$. If the error is worse than the clean baseline $E_c$, reject the point; if it is at or below the baseline, accept it.

Model Hardening: Strategic Feature Selection
- Some features $x^{(i)} \in \boldsymbol{x}$ are harder for $A^{(train)}$ to exploit: they require a larger $\delta$ to affect, or are costly/impractical to access.
- Some features are more vulnerable to $A^{(train)}$: the model is biased towards, or reliant on, $x^{(i)}$.
- Defence objective: maximize the performance of $f$ while minimizing the performance of $A^{(train)}$.
Ref: Budhraja K, et al. Adversarial Feature Selection. 2015; Zhang F, et al. Adversarial Feature Selection Against Evasion Attacks. 2016

Adversarial Feature Selection (a sketch follows after this slide block)
1. Start with the complete set $F = \{z_1, z_2, \ldots, z_m\}$.
2. Measure the performance of $f_F$ and of $A(f_F)$ as the tuple $E$.
3. Measure $E_i$ with $F - z_i$ for each feature $z_i$.
4. Set $F = F - z_j$, where $E_j$ maximizes the performance of $f_F$ and minimizes that of $A(f_F)$.
5. Repeat until the performance of $f_F$ decreases significantly.
(Another version starts with $\emptyset$ and adds features.)

Security-by-Design Feature Selection
- Choose features which are hard to tamper with (e.g., timing features for an NIDS).
- Note: feature selection algorithms are themselves susceptible to poisoning attacks!
Refs: Xiao H, et al. Is Feature Selection Secure against Training Data Poisoning? 2015; Jagielski M, et al. Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning. 2018
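A minimal sketch of the backward-elimination loop above. `train_eval` and `attack_success` stand in for whatever learner and simulated $A^{(train)}$ are actually used, and the tuple $E$ is collapsed into a single score (accuracy minus attack success) purely for illustration:

```python
# Sketch of adversarial feature selection (backward elimination).
# train_eval(features) -> model accuracy; attack_success(features) -> attacker success rate.
# Both are placeholders for the real learner and attack simulation.
def adversarial_feature_selection(features, train_eval, attack_success,
                                  min_features=2, max_acc_drop=0.02):
    F = list(features)
    base_acc = train_eval(F)
    while len(F) > min_features:
        # Score each candidate removal: higher accuracy and lower attack success are better.
        candidates = []
        for z in F:
            reduced = [f for f in F if f != z]
            acc, atk = train_eval(reduced), attack_success(reduced)
            candidates.append((acc - atk, acc, z))
        score, acc, best_z = max(candidates, key=lambda t: t[0])
        if base_acc - acc > max_acc_drop:   # stop once accuracy degrades significantly
            break
        F.remove(best_z)
    return F
```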
Model Hardening: Reduce Learner Sensitivity
- Huber: uses an outlier-robust loss function.
- RANSAC: iteratively trains, rejecting samples that cause large errors to the model.
- Problem: both are strong against outliers, not inliers.
- TRIM: uses trimmed optimization to focus on the samples with small residuals. Find $\theta_0$ using the entire $\mathcal{D}$, then repeat:
  1. Find the subset $\mathcal{D}^* \subseteq \mathcal{D}$ such that $\mathcal{L}(f_{\theta_i}, \mathcal{D}^*)$ is minimized (the samples with low residuals).
  2. Update: $\theta_{i+1} \leftarrow \arg\min_\theta \mathcal{L}(f_\theta, \mathcal{D}^*)$.

Model Hardening: Reduce Learner Sensitivity — Observation (TRIM, RONI, ...)
- Subsets of $\mathcal{D}$ are less affected (for small $\alpha$). Why? The curse of dimensionality goes both ways: removing data (while keeping the number of dimensions) breaks the consistency of the adversarial points — exponentially.
- One step further: in 2021, SGD was shown to be more robust to poisoning.
Ref: Wang Y, et al. Robust Learning for Data Poisoning Attacks. 2021

Security (Detection): Neural Cleanse (transformer)
- Intuition: if a backdoor is present for class $c_i$, then it should be significantly easier to perturb samples into $c_i$ (with a small $\delta$). The trigger $\delta_T$ acts like a universal perturbation built into the model.
- Algorithm:
  1. Generate an adversarial patch (trigger) that is $\mathcal{L}_1$-minimal for each class $c_i \in C$: $\min_{m, \delta} \mathcal{L}\big(y_t, f(A(x, m, \delta))\big) + \lambda \cdot |m|$, for $x \in \mathcal{X}$, where $m$ is the mask for the patch/trigger.
  2. Flag $\delta_{T_i}$ as a trojan if its $\mathcal{L}_1$ norm is below a threshold.
Ref: Wang B, et al. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. 2019

Security (Detection): Activation Analysis (detector)
- (Figure: the penultimate representation — the second-to-last layer — captures high-level features. A class trained without a backdoor forms one cluster; a backdoored class, e.g., all images labeled 'speed limit' with clean and poisoned images mixed, forms 2 clusters.)
- Procedure (a sketch follows below):
  1. Train on the (possibly) poisoned data.
  2. Check the clustering of the training samples in the penultimate layer (use PCA, then k-means with k=2).
  3. Clean the training data.
  4. Retrain.
Ref: Chen B, et al. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering. 2018
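A minimal sketch of step 2 of the activation-analysis procedure above (PCA, then k-means with k=2, on one class's penultimate activations). The 10-dimensional projection and the cluster-size decision rule are illustrative simplifications of the published method:

```python
# Sketch of activation clustering for backdoor detection (per class).
# `activations` is assumed to be an (n_samples, n_features) array of penultimate-layer
# activations for training samples of a single class; names and thresholds are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def suspicious_class(activations, imbalance_threshold=0.35):
    """Cluster one class's penultimate activations; flag if a distinct minority cluster appears."""
    n_comp = min(10, activations.shape[0], activations.shape[1])
    reduced = PCA(n_components=n_comp).fit_transform(activations)   # project to a low dimension
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
    frac = np.bincount(labels, minlength=2).min() / len(labels)     # size of the smaller cluster
    # Heuristic: a clean class tends to split roughly evenly under a forced k=2,
    # while a poisoned class separates its (smaller) set of triggered samples.
    return frac < imbalance_threshold, frac
```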
Security: Cyber Security
- Data provenance: track modifications made to the data end-to-end. Do we trust our source?
- Digital signatures: used to verify the integrity of data end-to-end. Signed content cannot be altered without invalidating the signature; anyone can validate, but only the signee can sign.
- Improved cyber security: access control to models and data, network security, etc.
Ref: Baracaldo N, et al. Mitigating Poisoning Attacks on Machine Learning Models: A Data Provenance Based Approach. 2017

Defences Against Exploratory Attacks

Overview: How Do We Defend? (part 2)
Same game, different rules. The model is again attacked in Train, Rest, and Exec (deployed) modes, and defended with data sanitization, model hardening, and cyber security. At execution time:
- Attacker: $\min_{x'} R_A(x', S) + \lambda \, c(x, x')$ — attack failure plus attack detection.
- Defender: $\min_{x'} R_D(x', S) + \lambda \, c(x, x')$ — attack impact plus attack presence.
Asymmetry: the attacker normally chooses one sample; the defender must consider all samples!

Methods
- Data sanitization: clean [all] samples (impacts performance on benign samples!), or detect and remove (false positives are detrimental).
- Model hardening: obscure the gradient/parameters (little/no guarantees — Kerckhoffs's principle!), or modify the model/parameters (but often the model and the attack are implementation-specific).

Game Theory: Stackelberg Game
- Two players: a leader and a follower, one turn each.
- The defender (leader) deploys a model with data sanitization / model hardening; the attacker (follower) then computes $\{A(x_i; f)\}$ for $i \mid (x_i, y_i) \in S$ — his ideal feature vectors against the deployed $f$.
Ref: Vorobeychik Y, et al. Adversarial Machine Learning. 2018

Game Theory: Stackelberg Equilibrium
- The leader takes the best possible action given all possible attacks. Example (binary classifier): the defender chooses the most robust decision threshold $r'$ along $x^{(1)}$, minimizing his risk ($R_D$) with $f$, assuming the attacker has a budget $C = 0.25$.
- The follower makes the best possible attack given the defender's action, optimizing $R_A$ given $f$.
- Assuming ties go to the defender, this pair of choices forms an equilibrium.

Take-away #1: You can try to optimize $f$ against an attack if you know the attack's parameters (limitations).
- In practice, the optimization is hard: the defender must simulate a non-Stackelberg (adaptive) game to tune the defence's hyperparameters, but the adversary's decisions will change based on the evaluated defence's hyperparameters.

Take-away #2: The defender is not 'winning', just performing damage control (on his turn).

Take-away #3: Can't the attacker just side-step us? E.g., we defend against PGD because PGD is the strongest attack, but then he uses CW!
- Yes. To reach equilibrium, the defender must choose a defence that covers as much ground (as many attacks) as possible, not one that just mitigates the worst attack.

Game Theory: A Common Mistake (by researchers and others)
- Defenders assume a reverse Stackelberg game (= the reactive arms race!): they select a known attack method (e.g., FGSM), generate $\{A(x_i; f)\}$ for $i \mid (x_i, y_i) \in S$ against the unprotected model $f$, and develop a defence against it (e.g., against vanilla FGSM), yielding $f'$.
- Reality check: in multi-turn games, the attacker can take the defence into account and craft his attack against the defended model $\{D(x_i; f)\}$.

Why Is It a Mistake? Attackers can consider defences in their attacks.
Real scenario: the defender wants to protect a DNN from PGD.
1. The defender secures the model by monitoring neuron activations for anomalies (e.g., X. Li, et al. Robust detection of adversarial attacks on medical images. 2020, and many more).
2. The attacker "kills two birds with one stone" by incorporating the defence while making the attack: he includes the defence in the loss function (of PGD), or gradually increases $\delta$ to find a covert solution.
More take-aways — when developing a new defence:
1. Consider techniques which limit the adversary.
2. In the paper: discuss what an adversary can do to evade it, and evaluate adversarial attacks crafted against your defence!
Nearly any defence that has a gradient... can be evaded.
Refs: Athalye A, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. 2018; Levy M. The Security of Deep Learning Defences for Medical Imaging. 2021

Obfuscation: Obfuscated Gradients Give a False Sense of Security
Many defences try to hide gradients to prevent exploration:
1. Make the gradient too small to follow.
2. Make the gradient noisy to misdirect the search.
3. Make the model non-differentiable.
4. Use an ensemble model.
These defences can be evaded too: try repeatedly (some PGD attempts may succeed), use a surrogate loss function, or attack each component separately and take the union of the results. A sketch of such an adaptive attack follows.
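To make the "attacker includes the defence in the loss" point above concrete, a minimal PyTorch sketch of an adaptive PGD attack. `model` and `detector` are hypothetical stand-ins: `detector` is assumed to be a differentiable function returning a per-sample anomaly score, which the attacker simply subtracts from the attack loss:

```python
# Sketch: adaptive PGD that folds a (differentiable) detector into the attack loss,
# so the perturbation both fools the classifier and suppresses the detection score.
import torch
import torch.nn.functional as F

def adaptive_pgd(model, detector, x, y, eps=8/255, alpha=2/255, steps=40, lam=1.0):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Maximize classification loss while minimizing the detector's anomaly score.
        loss = F.cross_entropy(model(x_adv), y) - lam * detector(x_adv).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the L-infinity ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```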
Model Hardening — Generic Defence Strategy: Iterative Retraining (aka Adversarial Retraining)
The defender plays out a multi-turn (min-max) game (with himself) until a Nash equilibrium is met:
$\min_\theta \; \mathbb{E}_{(x,y) \sim \mathcal{D}} \big[ \max_{\lVert\delta\rVert \le \epsilon} \mathcal{L}(f_\theta(x + \delta), y) \big]$
(the outer minimization is the defender, the inner maximization is the attacker). In words: find the $\theta$ that minimizes the model's loss given that the attacker will optimize his attack against it.
Ref: Madry A, et al. Towards Deep Learning Models Resistant to Adversarial Attacks. 2018

Different approaches:
- Model the game via the loss directly: GAN [Goodfellow 2014] (DNN only), [Madry A, 2018].
- Model the game via the dataset — currently considered one of the most robust defences (Madry). With replacement (Madry):
  1. $f \leftarrow H(\mathcal{D})$
  2. Repeat:
  3.   $\mathcal{D}' = \emptyset$
  4.   for $i : (x_i, y_i) \in \mathcal{D}$ do $\mathcal{D}' \leftarrow \mathcal{D}' \cup A(x_i; f)$   (train on the attacked version of the dataset)
  5.   $f \leftarrow H(\mathcal{D}')$
  6. Until convergence. If $A(x_i; f) = \emptyset$, then terminate ($R_D$ is an upper bound on $R_A$).
Ref: Athalye A, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. 2018

Model Hardening: Effect of Retraining
(Figure: gradient-based defences vs. non-gradient-based defences vs. retraining.)
Note: retraining is mitigation, not prevention.
Ref: Wong, et al. Towards Deep Learning Models Resistant to Adversarial Attacks. 2020

Model Hardening: Tips & Insights on Retraining
(Figure: retraining results under the $\ell_\infty$ threat model.)
- High capacity + retraining increases robustness to attacks; low-capacity networks will be harmed ($R_D$ performance)!
- Capacity affects other aspects: high capacity mitigates transferability (more variability!), but increases the risk of model memorization (confidentiality).
- Attack selection in retraining: retraining on FGSM harms generalization to other attacks.
- Bottom line: increase capacity and use the strongest adversary!
Refs: Wong, et al. Towards Deep Learning Models Resistant to Adversarial Attacks. 2020; A Survey of Privacy Attacks in Machine Learning. 2021

Model Hardening: Downsides of Retraining
- It is more expensive to train models.
- It harms generalization (the performance of $f$ on benign samples).
- It increases vulnerability to membership inference: the decrease in generalization means overfitting to specific samples (memorization).

Model Hardening (Transformer): Defensive Distillation
- Train a complex model (teacher) that is highly expressive over $\mathcal{X}$.
- Train a simpler model (student) on the softmax layer of the teacher: the student 'distils' the teacher's knowledge, capturing only the relevant patterns in $\mathcal{X}$.
- Broken in the same year by Wagner and Carlini ("Defensive Distillation is Not Robust to Adversarial Examples"). Similar stories hold for other transformer approaches (ensembles, special layers, ...).
Refs: Papernot N, et al. Distillation as a defense to adversarial perturbations against deep neural networks. 2016; material from Ziv Katzir, 2021

Data Sanitization (Preprocessors): Compression
Dimensionality reduction to limit the attacker's search space:
- Denoising auto-encoders (Gu, S., & Rigazio, L. 2014)
- PCA-based reconstruction (Bhagoji, A. N. et al. 2017)
- GAN-based dimensionality reduction (Ilyas et al. 2017; Lee et al. 2017; Samangouei et al. 2018; Song et al. 2017)
- Image-specific methods: color depth and spatial color smoothing (Xu, W. et al. 2017), JPEG compression (Gau, C. et al. 2017)

Data Sanitization (Preprocessors): Noising (a sketch follows below)
- Noise is not effective against white-box attacks, but it is extremely effective against query-based (iterative) black-box attacks!
- (Figure: the attacker iteratively optimizes $x'_i \rightarrow x'_{i+1}$ in his own domain, while in the victim's domain each query is perturbed with noise $\sigma$ before $f(x)$ returns $y'_i$.)
- The same idea can also be applied as a post-processor.
- Black-box benchmarks (defended models): https://blackboxbench.github.io/leaderboard-imagenet.html (2023)
Ref: Qin Z, et al. Random Noise Defense Against Query-Based Black-Box Attacks. NeurIPS 2021
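A minimal sketch of the random-noise pre-processor idea above: every query is perturbed before inference, so an iterative black-box attacker's query feedback becomes unreliable. `model_fn` and `sigma` are illustrative; as noted on the slide, the same trick can instead be applied to the output (post-processor):

```python
# Sketch of a random-noise pre-processor: perturb each query before the model sees it,
# which degrades the gradient/score estimates of iterative black-box attacks.
import numpy as np

def noisy_predict(model_fn, x, sigma=0.02, rng=np.random.default_rng()):
    """Pre-processor defence: add small Gaussian noise, then query the undefended model."""
    return model_fn(x + rng.normal(0.0, sigma, size=np.shape(x)))
```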
Data Sanitization (Preprocessors): Defense-GAN
- Train a GAN to reconstruct images; doing so implicitly cleans the images of $\delta$.
- (Figure: a random seed $z$ and the input $x$ feed the generator $G$, which reconstructs $\hat{x}$; the classifier $C$ then receives the cleaned image and outputs $y$.)
- But what if the attacker adds $G$ into the PGD loss function?
Ref: Samangouei P, et al. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. 2018

Security (Detector): Anomaly Detection and Feature Squeezing
- There are so many anomaly (OOD) detectors! Yet, many can be evaded.
- Feature squeezing: apply a squeezer (bit/color-depth squeezing, smoothing, ...) and compare the model's predictions on the original and squeezed inputs.
Refs: Xu W, et al. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. NDSS 2018; Warren He, et al. Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong. 2018

Security (Detector): Trapdoor Defence
- The defender poisons his own model to create gradient honeypots whose signatures ($\delta$) are known (cf. the Carlini material in the recommended reading below).

Security: Certified Defences
- Every defence seems to be broken, and there is no guarantee against future/dynamic attacks. Can we measure a model's robustness? Usually we just evaluate on some 'evil' test set.
- A model/defence $f_\theta$ is $\epsilon$-certified for a sample $x$ if the attacker cannot succeed within the budget $\epsilon$ — e.g., $A$ is restricted to an $\ell_p$ norm ball, yet a $\delta > \epsilon$ is required to succeed.
- (Figure: $\ell_2$-certified regions around the original sample for adversarial retraining and for certified defences.)
- This is a verification-based approach: it measures the robustness of a DNN against an attack and guarantees the safety of an input before any attack occurs. It is used to measure the safety (robustness) of defences, and it has inspired certified defences as well.
- (Figure: the attack's $\delta$ placed on a 0-to-$\infty$ scale for $\epsilon$, between a certified lower bound and an estimated upper bound; example classes: speed limit, stop sign, person.)
- Examples: CROWN gives a lower-bound guarantee by propagating bounds to the class confidences; CLEVER gives an upper-bound estimate (score) via backpropagation.
Refs: Zhang H, et al. Efficient neural network robustness certification with general activation functions. NeurIPS 2018; Weng T, et al. Evaluating the robustness of neural networks: an extreme value theory approach. 2018

Security (Model Privacy): Model Extraction (black-box)
- Main approach: monitor which queries are made (a sketch follows below).
- (Juuti 2019) performs anomaly detection on the query set; (Krishna 2019) uses membership inference to help identify nonsensical user queries.
- Both papers note that both defences can be evaded by a dynamic adversary...
Refs: Juuti M, et al. PRADA: Protecting Against DNN Model Stealing Attacks. EuroS&P 2019; Krishna K, et al. Thieves on Sesame Street! Model Extraction of BERT-based APIs. 2019
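A much-simplified sketch of the query-monitoring idea above (in the spirit of PRADA, not the exact published algorithm): track the distances between a client's queries and flag the client when their distribution stops looking natural. The normality test and thresholds are illustrative choices:

```python
# Simplified query-distribution monitoring for model-extraction detection.
# Benign clients' inter-query distances tend to look roughly normal; synthetic
# extraction queries usually do not. Parameters and the test are illustrative.
import numpy as np
from scipy import stats

class QueryMonitor:
    def __init__(self, min_queries=50, p_threshold=0.01):
        self.queries, self.dists = [], []
        self.min_queries, self.p_threshold = min_queries, p_threshold

    def observe(self, x):
        """Record a query; return True if the client now looks like an extractor."""
        x = np.ravel(np.asarray(x, dtype=float))
        if self.queries:
            self.dists.append(min(np.linalg.norm(x - q) for q in self.queries))
        self.queries.append(x)
        if len(self.dists) < self.min_queries:
            return False
        _, p = stats.shapiro(self.dists[-self.min_queries:])  # test the recent distances for normality
        return p < self.p_threshold   # a very non-normal distance pattern is suspicious
```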
Security (Model Privacy): Membership Inference & Property Inference
- Recall: models tend to memorize specific instances $x$ in their boundaries.
- Main approach: use the concept of differential privacy — make sure that no single observation is more important than the others (within some bound). This is measured by the sensitivity $\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert$, where $|D - D'| = 1$, i.e., over every possible leave-one-out of $D$.
- Common approach: add Laplacian or Gaussian noise to $f(x)$ or to $\mathcal{D}$ (a sketch of the noise-addition idea follows at the end of this transcript). Unfortunately, sufficient levels of differential privacy severely harm DNN accuracy.
Refs: Jayaraman B, et al. Evaluating Differentially Private Machine Learning in Practice. USENIX 2019; MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples; Membership Inference Attacks Against Machine Learning Models; A Survey of Privacy Attacks in Machine Learning

Other Approaches
- Increase regularization: improve generalization through dropout, optimization terms, and model ensembles/stacking.
- Tamper with the prediction vector: limit the length of the confidence vector that is returned, or add noise to the confidence vector.

Take-aways
- It seems that DL must inherently memorize some data.
- Try to restrict access when possible (e.g., the number of queries and to whom).
- Confidentiality in deep learning is still an open problem.

Recommended Reading: Week 5
- Shan S. et al. Gotta Catch 'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks. 2020 — https://arxiv.org/pdf/1904.08554.pdf (get the high-level idea)
- Watch: https://nicholas.carlini.com/writing/2020/screenrecording-breaking-adversarial-example-defense.html
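As a footnote to the model-privacy slides above, a minimal sketch of the "add Laplacian noise" idea (the Laplace mechanism). The sensitivity and epsilon values are illustrative; in practice both must be derived for the specific query or training procedure being protected:

```python
# Sketch of the Laplace mechanism: release a value with noise scaled to sensitivity/epsilon,
# so no single training record dominates the released output. Values are illustrative only.
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Release `value` with Laplace noise of scale sensitivity/epsilon."""
    return value + rng.laplace(0.0, sensitivity / epsilon, size=np.shape(value))

# Example use: noise a model's confidence vector before returning it to the user.
# probs = laplace_mechanism(model_fn(x), sensitivity=1.0, epsilon=0.5)
```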