CYB. Offensive use of AI (part 2)
Document Details
University of Vigo
2024
Summary
This document discusses the offensive use of AI, focusing on attacks against AI/ML models, the OWASP Top 10 for LLMs, and MITRE ATLAS, as well as regulations and best practices for AI security. It details the main attack vectors and their corresponding countermeasures.
Full Transcript
CYB. Offensive use of AI (part 2)
Master in Artificial Intelligence 2024/25
ESEI – University of Vigo

Table of Contents
- Attacks to AI/ML
- OWASP Top 10 for LLMs
- MITRE ATLAS
- Regulations and best practices for AI security

Attacks to AI/ML

Main components of a typical AI application
- Data: "fuels" the AI → used to train and refine models
- Model: engine of an AI system (core of the AI)
  - Learns from data, makes decisions, generates outputs
  - Own models vs. open-source models vs. hybrid models
  - Local deployment vs. API-accessed models (via REST)
- Frontend: interface for users to interact with the AI model

Security issues of an AI application
- Affecting the Data
  - Data poisoning [attacker injects malicious data into training sets → manipulates the model's output (incorrect or biased results)]
  - Data exfiltration [sensitive data stolen from the AI app through unauthorized access or breaches]
- Affecting the Model
  - Inherited vulnerabilities in public/open-source models
  - Risks in models accessed via API [unauthorized access, manipulation, intellectual property theft]
  - Adversarial Machine Learning
- Affecting the Frontend
  - Prompt/input injection [attacker crafts malicious prompts/entries to exploit vulnerabilities]
  - Common software vulnerabilities and weaknesses

OWASP Top 10 for LLMs

OWASP Top 10 for LLMs (Large Language Models)
- Open Worldwide Application Security Project (OWASP Foundation)
  - Provides freely available, open-source resources (articles, methodologies, documentation, tools, and technologies) for secure application development
- OWASP Top Ten(s) project (most critical risks in software development)
  - Collaboratively created and maintained through a well-defined process involving security experts
    - Data collection (from real-world security incidents, academic research, threat intelligence reports)
    - Risk assessment and prioritization (measures prevalence + impact + ease of exploitation + trends)
    - Community collaboration (draft discussions + feedback integration)

OWASP Top 10 for LLMs (Large Language Models) (II)
Current versions
- Top 10 Web Application Security Risks (last edition 2021)
- OWASP Top 10 API Security Risks (last edition 2023)
- Top 10 Mobile Security Risks (last edition 2024)
- Top 10 for LLM Applications (last edition 2025)

OWASP Top 10 for LLMs (Large Language Models) (III)
OWASP Top 10 for LLMs and GenAI: critical security risks tied to the deployment and use of LLM and GenAI applications

Top 10 Risks for LLMs 2023-24             | Top 10 Risks for LLMs 2025
LLM01. Prompt Injection                   | LLM01. Prompt Injection [=]
LLM02. Insecure Output Handling           | LLM02. Sensitive Information Disclosure [+4]
LLM03. Training Data Poisoning            | LLM03. Supply Chain [+2]
LLM04. Model Denial of Service            | LLM04. Data and Model Poisoning [-1]
LLM05. Supply Chain Vulnerabilities       | LLM05. Improper Output Handling [-3]
LLM06. Sensitive Information Disclosure   | LLM06. Excessive Agency [+2]
LLM07. Insecure Plugin Design [out]       | LLM07. System Prompt Leakage [new]
LLM08. Excessive Agency                   | LLM08. Vector and Embedding Weaknesses [new]
LLM09. Overreliance                       | LLM09. Misinformation [changed from 2023 LLM09]
LLM10. Model Theft [out]                  | LLM10. Unbounded Consumption [changed from 2023 LLM04]

LLM01. Prompt Injection
- User input manipulates the behavior/output of an LLM in unintended ways
- Exploits the model's prompt-handling process to produce harmful outcomes: violating safety guidelines, generating biased or harmful content, enabling unauthorized access
- Jailbreaking: specific form of prompt injection → attackers bypass safety protocols completely
⇒ Details in OWASP 25 - LLM01
Additional explanation and examples (from learnprompting.org): Prompt injection, Jailbreaking

LLM01. Prompt Injection (II)
Types of prompt injection
- Direct Prompt Injections: user input directly alters the model's behavior
  - intentionally (malicious activity)
  - unintentionally (unexpected behavior caused by normal user input)
- Indirect Prompt Injections: external inputs (websites, files, databases) controlled by hostile actors alter model behavior when processed → unintended outputs
- In both cases: Multimodal Injection → malicious prompts embedded in media (images, audio, video)

LLM01. Prompt Injection (III)
[Figure] Source: Prompt Injection 101 for Large Language Models

LLM01. Prompt Injection (IV)
Related risks
- Data Disclosure (sensitive information exposure or misuse)
- Manipulation of Outputs (incorrect or biased responses)
- Unauthorized Access (gain privileges or system control)
- Critical Decision Influence (altering organizational decision-making processes)
Countermeasures
- Constrain Model Behavior (strict boundaries on model outputs and behaviors)
- Validate Output Formats (enforce expected output formats)
- Input/Output Filtering (semantic filters and content validation) — a sketch follows at the end of this LLM01 section

LLM01. Prompt Injection (V)
- Privilege Control (limit model access and functionality to the minimum) → least privilege principle
- Human Approval for High-Risk Actions (manual checks for sensitive operations)
- Segregate External Content (separate untrusted input sources)
- Adversarial Testing (penetration simulations to identify vulnerabilities)
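As an illustration of the Input/Output Filtering countermeasure above, the following is a minimal sketch (not from the slides) of a pre/post filter wrapped around an LLM call. The pattern list and the `call_llm` placeholder are assumptions for the example; real deployments would combine lexical rules with semantic classifiers.

```python
import re

# Assumed, illustrative deny-list of instruction-override phrases (not exhaustive).
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the system prompt",
    r"you are now (DAN|developer mode)",
]

def looks_like_injection(text: str) -> bool:
    """Very rough lexical filter; a real system would add semantic checks."""
    return any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)

def guarded_completion(user_input: str, call_llm) -> str:
    # Input filtering: reject obvious injection attempts before they reach the model.
    if looks_like_injection(user_input):
        return "Request rejected by input filter."
    output = call_llm(user_input)  # call_llm is a placeholder for the model API
    # Output filtering: block responses that contain markup or echoed injection phrases.
    if "<script" in output.lower() or looks_like_injection(output):
        return "Response withheld by output filter."
    return output
```

The point of the sketch is the placement of the checks (before and after the model call), not the specific patterns, which attackers can easily paraphrase around.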
LLM02. Sensitive Information Disclosure
- LLMs unintentionally expose sensitive information (Personally Identifiable Information (PII), financial records, health data, business secrets, security credentials, proprietary models) through their outputs
⇒ Details in OWASP 25 - LLM02
Related risks
- PII Leakage (personal data inadvertently disclosed during interactions)
- Proprietary Algorithm Exposure (improper model configuration → reveals proprietary algorithms or sensitive training data)
- Sensitive Business Data Disclosure (business secrets exposed through unintended model responses)

LLM02. Sensitive Information Disclosure (II)
Countermeasures
- Sanitization Techniques (mask sensitive content before training with it) — a masking sketch follows this list
- Robust Input Validation (strict validation to detect and filter sensitive data)
- Access Controls (apply least-privilege access and restrict internal data sources)
- Federated Learning (decentralized learning and differential privacy techniques to minimize risks)
- Homomorphic Encryption (process data securely without revealing it)
- User Education (educate users on safe interaction practices + ensure safe data usage policies)
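To make the Sanitization Techniques bullet concrete, here is a minimal regex-based masking sketch (not part of the slides). The two patterns cover only e-mail addresses and card-like numbers and are assumptions for illustration; real PII scrubbing would rely on a dedicated library and locale-aware rules.

```python
import re

# Illustrative patterns only; production sanitization needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def mask_pii(text: str) -> str:
    """Replace detected PII with a placeholder before the text enters training data or logs."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Contact jane.doe@example.com, card 4111 1111 1111 1111"))
# -> Contact [EMAIL], card [CARD]
```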
LLM03. Supply Chain
- External elements (software/tools, third-party pre-trained models and data, deployment platforms) can be manipulated through tampering or poisoning attacks
- Specific case of OWASP's A06:2021 – Vulnerable and Outdated Components
  - Outdated or deprecated models and software components
  - Vulnerable pre-trained models and software components
  - Licensing risks and unclear Terms & Conditions
⇒ Details in OWASP 25 - LLM03

LLM03. Supply Chain (II)
Countermeasures
- Use OWASP A06:2021 best practices
- Regularly audit security and access controls of third-party models
- Maintain model integrity and provenance (vendor-signed models and code)
- Maintain an updated AI assets inventory (models, code and libraries)

LLM04. Data and Model Poisoning
- Training data, fine-tuning data, or embedding data is manipulated to introduce vulnerabilities, backdoors or biases
- Injection of adversarial training data
- Impacts model integrity and compromises the model's security, performance, or ethical behavior
- Consequences: harmful or incorrect outputs, degraded model performance, biased or toxic content, exploitation of downstream systems
⇒ Details in OWASP 25 - LLM04

LLM04. Data and Model Poisoning (II)
Countermeasures
- Track the data pipeline, ensuring data integrity at all stages of model development
  - validate data and model providers
  - validate model outputs against trusted sources to identify poisoning early
  → Machine Learning Bill of Materials (ML-BOM) tools, OWASP CycloneDX
- Use Retrieval-Augmented Generation (RAG) → helps ensure model outputs are grounded in trusted sources (lowers the risk of hallucinated or biased outputs)

LLM05. Improper Output Handling
- Insufficient validation, sanitization and handling of outputs generated by an LLM before they are passed to other systems or components
- LLM-generated content can be manipulated by prompt inputs, leading to security vulnerabilities: Remote Code Execution (RCE), Cross-Site Scripting (XSS), SQL Injection (SQLi), phishing injection in generated e-mails, etc.
- Goal: ensure the outputs are safe before they interact with downstream systems
⇒ Details in OWASP 25 - LLM05

LLM05. Improper Output Handling (II)
Countermeasures
- Zero-Trust Approach (treat LLM output as untrusted and apply input validation to model responses)
- Follow OWASP ASVS (Application Security Verification Standard) guidelines (proper input validation, output sanitization, etc.)
- Apply context-aware encoding to LLM output (HTML encoding for web content, SQL escaping for database queries, JavaScript sanitization for user-facing code) — see the sketch after this list
- Rate limiting and anomaly detection mechanisms to prevent abuse or exploitation of LLM output generation
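A minimal sketch of the context-aware encoding idea above (not from the slides): the same LLM output is treated differently depending on the sink it is sent to. The `render_html` and `build_query` names are assumptions for the example; for databases, parameterized queries are normally preferable to manual escaping.

```python
import html

def render_html(llm_output: str) -> str:
    # HTML-encode before inserting LLM text into a web page (mitigates XSS).
    return f"<p>{html.escape(llm_output)}</p>"

def build_query(llm_output: str):
    # For databases, pass the value as a bound parameter, never concatenated SQL (mitigates SQLi).
    return "SELECT * FROM articles WHERE title = ?", (llm_output,)

untrusted = '<img src=x onerror=alert(1)> Robert"); DROP TABLE articles;--'
print(render_html(untrusted))   # tags are escaped, not executed
print(build_query(untrusted))   # value travels as data, not as SQL
```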
LLM06. Excessive Agency
- An LLM-based system has been granted more functionality, permissions or autonomy than is necessary for its intended purpose
- Can lead to the execution of harmful or unintended actions based on ambiguous or manipulated LLM outputs
- Broad range of impacts: actions performed beyond the user's control, actions that breach security boundaries or execute forbidden commands, data modified or deleted unintentionally
⇒ Details in OWASP 25 - LLM06

LLM06. Excessive Agency (II)
Countermeasures (prevention)
- Limit the extensions the LLM can call, granting them minimal permissions
- Implement human-in-the-loop approval for high-impact actions
Countermeasures (mitigation)
- Logging and monitoring of LLM and extension activity
- Rate limiting the number of extension actions
- Anomalous behavior analysis to detect unexpected actions by the LLM or its extensions

LLM07. System Prompt Leakage
- Unintended exposure of the system prompt or internal instructions used to guide the behavior of an LLM
- Can contain sensitive information (credentials, internal rules, security controls)
- Attackers exploit system prompt leakage to bypass security controls or LLM content filters
⇒ Details in OWASP 25 - LLM07
Additional explanation (from learnprompting.org): Prompt leaking. Example: leaking the Bing Chat (Sydney) system prompt

LLM07. System Prompt Leakage (II)
Countermeasures
- Externalize sensitive information (avoid embedding sensitive info into the prompts)
- Implement guardrails outside of the LLM (independent systems monitor LLM outputs; avoid relying on the system prompt itself to control output)
- Privilege separation + regular review and audit of system prompts

LLM08. Vector and Embedding Weaknesses
- Vulnerabilities associated with the use of Retrieval-Augmented Generation (RAG) [new in the 2025 edition]
- RAG combines LLMs with external knowledge sources for enhanced performance and contextual relevance
- Weaknesses in how embeddings are generated, stored or retrieved ⇒ can expose systems to data leakage, poisoning attacks, unintended behavior alteration
⇒ Details in OWASP 25 - LLM08

LLM08. Vector and Embedding Weaknesses (II)
Related risks
- Unauthorized access & data leakage (access and use of embeddings containing sensitive data)
- Cross-context information leaks (users from different contexts or applications share the same vector database)
- Embedding inversion attacks (invert embeddings to recover sensitive information) and data poisoning attacks
Countermeasures
- Fine-grained access control → ensure strict logical and access partitioning of vector databases — see the sketch after this list
- Data validation & source authentication → authenticate and verify the integrity of data before it is added to the knowledge base
- Monitoring and logging to track vector retrieval activities
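As a minimal sketch of the fine-grained access control countermeasure above (not from the slides): every stored embedding carries a tenant/partition label and retrieval only ever searches the caller's own partition. The in-memory store and the cosine-similarity ranking are assumptions standing in for a real vector database with native multi-tenancy.

```python
from dataclasses import dataclass, field
import math

@dataclass
class PartitionedVectorStore:
    """Toy vector store that hard-partitions embeddings by tenant."""
    _data: dict = field(default_factory=dict)  # tenant_id -> list of (vector, payload)

    def add(self, tenant_id: str, vector: list, payload: str) -> None:
        self._data.setdefault(tenant_id, []).append((vector, payload))

    def search(self, tenant_id: str, query: list, k: int = 3) -> list:
        # Only the caller's partition is searched: no cross-tenant leakage by construction.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / (norm + 1e-12)
        candidates = self._data.get(tenant_id, [])
        ranked = sorted(candidates, key=lambda item: cosine(item[0], query), reverse=True)
        return [payload for _, payload in ranked[:k]]
```

The design point is that the partition key is part of every read path, so a cross-context information leak requires an explicit authorization bug rather than a missing filter in one query.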
LLM09. Misinformation
- LLMs generate false or misleading information that appears credible
- Causes:
  - Hallucination (model generates content that sounds plausible but is fabricated)
  - Biases in the training data
  - Incomplete information (model fills in gaps with incorrect data when it doesn't have full context)
  - Overreliance (excessive trust in the LLM's outputs without cross-checking or verifying them)
- Consequences: security breaches, reputational damage, legal liability
⇒ Details in OWASP 25 - LLM09

LLM09. Misinformation (II)
Related risks
- Factual inaccuracies → legal liability, reputational damage
- Unsupported claims (lack of evidence-based claims)
- Misrepresentation of expertise (misleading expert advice)
- Unsafe code generation (a code-generating LLM suggests insecure libraries)
Countermeasures
- RAG to ensure verified information (pull verified data from external, trusted sources)
- Model fine-tuning using verified, high-quality datasets
- Cross-verification and human oversight (especially in high-stakes domains)

LLM09. Misinformation (III)
- Secure coding practices on LLM-generated code (enforce automated security checks on suggested code/libraries)
- User training and education (users must know the limitations of LLMs + the importance of verifying AI-generated content)

LLM10. Unbounded Consumption
- The LLM is exploited to perform excessive or uncontrolled inference operations, leading to significant resource depletion and system degradation
- LLMs are computationally intensive ⇒ vulnerable to exploitation
- Malicious activities: flooding the model with requests, draining resources, stealing intellectual property
- Can result in: Denial of Service (DoS) attacks, financial losses, service degradation, model theft
⇒ Details in OWASP 25 - LLM10

LLM10. Unbounded Consumption (II)
Related attacks
- Variable-Length Input Flood (input overflow to exceed the LLM's context window)
- Resource-Intensive Queries
- Denial of Wallet (DoW) in pay-per-use cloud models
- Model Extraction via API (prompt injection techniques to replicate the underlying model) and Functional Model Replication (synthetic data generation)

LLM10. Unbounded Consumption (III)
Countermeasures
- Input validation and rate limiting — see the sketch after this list
- Dynamic resource management → timeouts and throttling for resource-intensive queries
- Sandboxing to isolate the LLM's execution environment
- Watermarking mechanisms to detect unauthorized use or replication of the LLM's outputs
- Scalability (load balancing) and graceful degradation
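A minimal sketch of the input validation and rate limiting countermeasure above (not from the slides), combining a per-client sliding-window request budget with a simple prompt-length cap. The limits and the in-memory bookkeeping are assumptions for illustration; production systems would enforce this at the API gateway.

```python
import time
from collections import defaultdict, deque

MAX_PROMPT_CHARS = 4_000   # assumed cap, roughly bounding context-window abuse
MAX_REQUESTS = 30          # assumed per-client budget per window
WINDOW_SECONDS = 60

_recent = defaultdict(deque)   # client_id -> timestamps of recent admitted requests

def admit_request(client_id: str, prompt: str) -> bool:
    """Return True if the request may be forwarded to the LLM backend."""
    if len(prompt) > MAX_PROMPT_CHARS:   # input validation: reject oversized prompts
        return False
    now = time.monotonic()
    window = _recent[client_id]
    while window and now - window[0] > WINDOW_SECONDS:   # drop timestamps outside the window
        window.popleft()
    if len(window) >= MAX_REQUESTS:      # rate limit exceeded for this client
        return False
    window.append(now)
    return True
```

Rejected requests never reach the model, which is what limits both Denial of Service and Denial of Wallet exposure.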
Weaknesses removed from the 2023 edition of the Top 10
- 2023-LLM07: Insecure Plugin Design
  - LLM plugins processing untrusted inputs and having insufficient access control risk severe exploits like remote code execution
  - Subsumed into "2025 LLM06. Excessive Agency"
- 2023-LLM10: Model Theft
  - Unauthorized access to proprietary large language models risks theft, loss of competitive advantage, and dissemination of sensitive information
  - Partially subsumed into "2025 LLM10. Unbounded Consumption"

MITRE ATLAS

MITRE ATLAS
- ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems)
- Knowledge base of adversary tactics and techniques against AI-based systems
- Derived from MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge)
  - ATLAS excludes: Lateral Movement, Command and Control
  - ATLAS adds: ML Model Access, ML Attack Staging
- Goals
  - Provide a better understanding of threat actors' activity
  - Provide a common language/description

MITRE ATLAS (II)
MITRE ATLAS Matrix
- 14 Tactics
- 91 Techniques and subtechniques
Tools
- MITRE ATLAS Navigator/Heatmap
- MITRE ATLAS Data (YAML format) — see the loading sketch at the end of this document

MITRE ATLAS Elements
- Tactics [Why?]: general objectives of threat actors against AI-based systems
- Techniques/subtechniques [How?]: specific methods employed to achieve tactic goals
- Mitigations: best practices and countermeasures against specific techniques
- Case Studies: real-world examples, with step-by-step descriptions aligned with ATLAS tactics

Examples: Tactics and Techniques
- Reconnaissance: Search for Victim's Publicly Available Research Materials, Search for Publicly Available Adversarial Vulnerability Analysis, Search Victim-Owned Websites, Search Application Repositories, Active Scanning
- ML Model Access: AI Model Inference API Access, Physical Environment Access, Full ML Model Access, ML-Enabled Product or Service
- ML Attack Staging: Create Proxy ML Model, Backdoor ML Model, Verify Attack, Craft Adversarial Data

Examples: Case Studies
- AML.CS0019 PoisonGPT: force a public LLM to return false facts (using post-training model editing with the Rank-One Model Editing (ROME) algorithm) [blog post]
- AML.CS0014 Confusing Antimalware Neural Networks: adversarial ML attack against Kaspersky's NN-based malware detector [blog post]

Examples: Case Studies (II)
- AML.CS0003 Bypassing Cylance's AI Malware Detection: bypassing a malware detector by creating "magic" feature strings [blog post]
- AML.CS0013 Backdoor Attack on Deep Learning Models in Mobile Apps: neural payload injection in Android apps [research paper]

Regulations and best practices for AI security

Regulations and best practices for AI security
- The Framework for AI Cybersecurity Practices (FAICP), from ENISA (European Union Agency for Cybersecurity)
  - Layer I. Cybersecurity best practices to protect the ICT environments where AI systems are hosted, developed and integrated
  - Layer II. Cybersecurity practices focused on the specifics of AI, including its lifecycle, properties, specific assets and threats, and security controls
  - Layer III. Cybersecurity practices tailored for companies in critical sectors such as healthcare, automotive or energy
- The Artificial Intelligence Risk Management Framework (AI RMF 1.0) [pdf], from NIST
- The Artificial Intelligence Act - Regulation (EU) 2024/1689
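The ATLAS data referenced in the "MITRE ATLAS (II)" slide is distributed as YAML. As a rough exploration sketch (not from the slides), assuming a local copy of that distribution saved as ATLAS.yaml and a layout with top-level matrices containing tactics and techniques lists (field names are assumptions and should be checked against the actual file), it can be inspected with PyYAML:

```python
import yaml  # PyYAML

# File name and field names are assumptions; check the actual ATLAS data distribution.
with open("ATLAS.yaml", encoding="utf-8") as fh:
    atlas = yaml.safe_load(fh)

print("Top-level keys:", list(atlas.keys()))

# If the assumed layout holds, summarize each matrix; otherwise the keys printed
# above show where the tactics and techniques actually live.
for matrix in atlas.get("matrices", []):
    print(matrix.get("name"),
          "- tactics:", len(matrix.get("tactics", [])),
          "- techniques:", len(matrix.get("techniques", [])))
```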