Questions and Answers
What is one major advantage of dynamic analysis in malware analysis?
- It can analyze malware without any user interaction.
- It requires minimal system resources.
- It allows for quick identification of malicious codes.
- It provides an accurate understanding of the malware's behavior. (correct)
Which of the following features is NOT typically associated with malware analysis?
- File Metadata
- API Import and Export Functions
- Memory Addressing Patterns (correct)
- Opcode Sequences
What is a critical downside of dynamic malware analysis?
- It can only analyze known types of malware.
- It does not provide enough data about malware behavior.
- It is resource-intensive. (correct)
- It is usually less accurate than static analysis.
Which sandbox solution is noted for its ability to run in a VirtualBox environment?
Which resource is primarily associated with automated static and dynamic malware analysis for mobile apps?
What is the primary characteristic of a virus compared to other types of malware?
Which type of malware is designed to collect information without user consent?
What is a significant limitation of static analysis in malware analysis?
Why are worms considered particularly dangerous compared to other malware types?
What distinct feature does ransomware have compared to other malware categories?
What type of malware utilizes existing computers to perform malicious tasks like DDoS attacks?
Which malware type employs malicious code that activates under specific conditions?
Which of the following tools is typically used in static analysis of malware?
What is a primary criticism of the NSL-KDD dataset?
Which of the following datasets is cited as an alternative to the NSL-KDD dataset?
Which category does not represent the types of attacks in the NSL-KDD dataset?
How many general categories of attacks are represented in the NSL-KDD dataset?
What is a key characteristic of the data collection for the NSL-KDD dataset?
What is the primary goal of anomaly-based detection?
Which of the following methods can be used as part of statistical approaches for anomaly detection?
What is the difference between outlier detection and novelty detection?
What does continuous learning in anomaly detection help manage?
Which type of anomalies are characterized as anomalous individual data instances significantly different from the rest of the dataset?
Which aspect is essential for behavioral profiling in anomaly detection?
Adaptive models in anomaly detection are necessary to address which of the following?
Which machine learning approach is commonly used for novelty detection?
What is a characteristic of collective anomalies in data sets?
Which of the following is considered a typical signal for host-based anomaly detection?
What distinguishes traffic metadata from deep packet inspection in network intrusion detection?
Which metric is NOT typically considered in feature engineering for host intrusion detection?
What is a common use of protocol analyzers in network intrusion detection?
Which of the following describes the correlation of signals in anomaly detection?
Which application-level log feature is commonly analyzed for anomaly detection?
What does the term 'system scheduler changes' refer to in the context of anomaly detection metrics?
Which type of malware feature utilizes the analysis of how and when malware accesses specific memory regions to identify behavior?
What is the main purpose of Control Flow Graph (CFG) in malware analysis?
Which feature is typically analyzed to detect deviations from normal behavior in an intrusion detection system?
In the context of the Microsoft Malware Classification Challenge, what is meant by opcode n-grams?
What distinguishes Network-based IDS from Host-based IDS?
Which of the following features would likely be analyzed to measure malware's communication with remote servers?
What role does Random Forest play in malware feature selection as mentioned in the context of the classification challenge?
Which type of IDS is designed to take proactive measures against threats?
What is indicated by a malware sample having 'distinctive visual patterns' when transformed into grayscale images?
Which mechanism would typically be used to ensure a malware’s persistence on a Windows system?
Flashcards
Malware
Malicious software designed to harm or exploit computer systems and data. Examples include viruses, worms, Trojans, and ransomware.
Virus
Self-replicating malware that infects files and spreads when those files are executed. Stuxnet is the example given, although it is more commonly classified as a worm.
Worm
Self-replicating malware that spreads across networks without user interaction. Examples include SQL Slammer.
Trojan
Ransomware
Botnet
Static Malware Analysis
Spyware
Dynamic Analysis
Sandbox
Malware Behavior Monitoring
Feature Generation
Opcode Sequence
Anomaly-based detection
Baseline establishment
Behavioral profiling
Anomaly detection
Outlier detection
Novelty detection
Concept drift
Adaptive models/thresholds
Control Flow Graph (CFG)
File Headers and Sections
Image Representation
API Call Sequences and Frequencies
Memory Access Patterns
Network Traffic Patterns
System Call Behavior
Persistence Mechanisms and Registry Operations
Microsoft Malware Classification Challenge (MMC) Dataset
XGBoost for Malware Classification
Anomaly
Collective Anomaly
Anomaly Detection Techniques
Feature Engineering for Anomaly Detection
Host Intrusion Detection Metrics
Network Intrusion Detection Features
Web/Application Intrusion Detection Features
Correlating Signals for Anomaly Detection
What is the NSL-KDD dataset?
What kind of information does the NSL-KDD dataset contain?
How are attacks categorized in NSL-KDD?
What are some criticisms of the NSL-KDD dataset?
What are some alternatives to the NSL-KDD dataset?
Study Notes
CYB. Defensive AI (part 3)
- Course: Master in Artificial Intelligence
- Year: 2024/25
- Institution: ESEL – University of Vigo
AI/ML in Malware Analysis
- Malware is malicious software designed to harm, exploit, or compromise computer systems and data.
Malware: Definition and Types
- A single malware sample can combine several of the types below.
- Self-replicating:
- Viruses replicate when infected files are executed. Stuxnet is the example given, although it is more commonly classified as a worm.
- Worms spread across networks without user interaction (e.g., SQL Slammer).
- Auto-hiding malware:
- Trojans disguise themselves as legitimate software but carry malicious code (such as backdoors or data-theft routines). Examples include Qbot/Qakbot and TrickBot.
- Rootkits hide malicious software, making detection and removal difficult. Examples include Linfo, Pandora, and HIDEDRV.
- Designed to harm:
- Ransomware encrypts a victim's files and demands a ransom for decryption (e.g., CryptoLocker, Phobos/Dharma).
- Botnets are networks of compromised computers used for malicious activities such as DDoS attacks and spam (e.g., Mirai, Andromeda).
- Logic/time bombs are malicious code that activates under specific conditions and damages the system.
- Keyloggers record keystrokes to capture sensitive information.
- Cryptojacking uses a victim's computer for cryptocurrency mining (e.g., Kinsing, LoudMiner).
- Spyware collects information without consent (e.g., CoolWebSearch, Gator).
- Adware shows unwanted advertisements and collects user data (e.g., Fireball, Appearch).
Malware Analysis
- Understanding the behavior and purpose of suspicious files is key.
- Static analysis:
- Examines malware code and characteristics without executing it.
- It studies file structure, strings, metadata, and embedded resources.
- It identifies known patterns, signatures, and indicators (such as file names, hashes, strings, IP addresses, domains, and file headers).
- Tools include disassemblers and static rule engines (e.g., YARA rules).
- Static analysis detects known malware effectively through signature-based or heuristic approaches.
- However, it may miss sophisticated or polymorphic threats, whose code changes between samples and so evades fixed signatures.
- Dynamic analysis:
- Executes malware in a controlled environment (sandbox) and observes its actual behavior.
- The sandbox's isolation is crucial for preventing harm to the host system.
- It monitors file system access and changes, network communication (e.g., TCP, DNS), and system calls.
- It helps identify unknown or evolving malware.
- A drawback is that it is often resource-intensive.
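The static indicators mentioned above (hashes, strings, IP addresses) can be pulled from a file without executing it. The following is a minimal sketch using only the Python standard library; the function names, the sample bytes, and the simple IPv4 pattern are illustrative assumptions, not part of the course material.

```python
import hashlib
import re

# Loose IPv4 pattern: it does not validate octet ranges, so treat hits as candidates only.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len long (similar to the Unix strings tool)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def static_indicators(data: bytes) -> dict:
    """Collect simple static indicators from raw bytes without running them."""
    strings = extract_strings(data)
    ips = sorted({ip for s in strings for ip in IPV4_RE.findall(s)})
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "strings": strings,
        "ipv4_candidates": ips,
    }


# Hypothetical sample bytes standing in for a suspicious file.
sample = b"MZ\x90\x00\x03\x00connect to 203.0.113.7\x00\x01cmd.exe /c whoami\x00"
report = static_indicators(sample)
print(report["strings"])
print(report["ipv4_candidates"])
```

The hash can be checked against known-malware lists, while the strings and IP candidates are the kind of indicators a signature or heuristic rule would match on.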
Malware Analysis (III)
- Resources and online sandboxes:
- MalwareBazaar, VirusShare.com
- Microsoft Malware Classification Challenge (BIG 2015) on Kaggle
- Cuckoo Sandbox
- Mobile Security Framework (MobSF)
- Joe Sandbox and tool reports
- Hybrid Analysis, VirusTotal (VT API v3)
Typical features in Malware analysis
- Static features:
- Opcode Sequences (sequences of machine-instruction operation codes in the binary).
- API Import and Export Functions (functions the binary imports or exports, which hint at its capabilities).
- File Metadata (size, creation dates, certificates).
- String Analysis.
- Control Flow Graph (CFG) (graph of the possible execution paths through the code).
- File Headers and Sections (e.g., Portable Executable (PE) headers in Windows).
- Image Representation (the binary's bytes rendered as a grayscale image, exposing visual patterns).
- Permissions and Manifest Information (mobile malware).
- Dynamic features:
- API Call Sequences & Frequencies (the order and counts of API calls made at run time).
- Memory Access Patterns.
- Network Traffic Patterns (e.g., communication with malicious sites).
- System Call Behavior (specific system calls more frequent in malware than benign programs).
- Persistence Mechanisms (startup entries, scheduled tasks).
- Registry Operations
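Opcode sequences are commonly turned into numeric features by counting n-grams, that is, contiguous runs of n opcodes. A minimal sketch, assuming the opcodes have already been extracted by a disassembler; the trace and the vocabulary below are hypothetical.

```python
from collections import Counter


def opcode_ngrams(opcodes: list[str], n: int = 2) -> Counter:
    """Count contiguous opcode n-grams; each n-gram is a tuple of n opcodes."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def to_vector(counts: Counter, vocabulary: list[tuple[str, ...]]) -> list[int]:
    """Project n-gram counts onto a fixed vocabulary so every sample has the same length."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical opcode trace from a disassembled sample.
trace = ["push", "mov", "call", "push", "mov", "call", "ret"]
bigrams = opcode_ngrams(trace, n=2)

# Hypothetical vocabulary; in practice it would be built from the training set.
vocabulary = [("push", "mov"), ("mov", "call"), ("call", "ret"), ("xor", "xor")]
features = to_vector(bigrams, vocabulary)
print(features)  # [2, 2, 1, 0]
```

Fixing the vocabulary up front keeps the feature vector the same length for every sample, which is what a classifier needs.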
Microsoft Malware Classification Challenge (BIG 2015)
- Data: Contains ≈400GB of raw binary data and metadata.
- Problem: multiclass classification across 9 malware families.
- Dataset: Available at https://www.kaggle.com/c/malware-classification
- Paper: Available at https://arxiv.org/pdf/1802.10135.pdf
- Figure (not reproduced): the 9 malware families in the dataset, with the number of training samples for each.
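Image representation, listed among the static features, can be tried on hex dumps of a binary. A minimal sketch, assuming each dump line starts with an address followed by two-digit hex bytes and that `??` marks an unreadable byte (mapped to 0 here as an arbitrary choice); the sample lines, the row width, and the function names are hypothetical.

```python
def parse_hex_dump_line(line: str) -> list[int]:
    """Parse a line such as '00401000 56 8D 44 24' into byte values.

    The first token is treated as an address and skipped; '??' maps to 0.
    """
    tokens = line.split()[1:]
    return [0 if tok == "??" else int(tok, 16) for tok in tokens]


def to_grayscale_rows(values: list[int], width: int) -> list[list[int]]:
    """Reshape a flat byte list into rows of `width` pixels (0-255), zero-padding the last row."""
    padded = values + [0] * (-len(values) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


# Hypothetical two-line hex dump.
dump = [
    "00401000 56 8D 44 24 08 50",
    "00401006 8B F1 ?? ?? FF 15",
]
pixels = [b for line in dump for b in parse_hex_dump_line(line)]
image = to_grayscale_rows(pixels, width=4)
for row in image:
    print(row)
```

Each byte becomes one pixel intensity, so the resulting matrix can be handed to any image library or image classifier.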
AI/ML in Intrusion Detection
- Intrusion Detection/Prevention Systems (IDS/IPS) monitor for dangerous activities.
- IDS detects and alerts (passive).
- IPS detects and blocks (proactive).
- IDS types:
- Network-based IDS (NIDS) monitors network traffic (e.g., Snort, Suricata, Zeek).
- Host-based IDS (HIDS) monitors host activity like file system, system calls, logs (e.g., Fail2Ban, OSSEC/Wazuh).
- Signature-based IDS: based on known attack patterns or signatures.
- Behavior-based (Anomaly-based) IDS: based on deviations from "normal" baseline.
Behavior-based / Anomaly-based IDS/IPS
- Anomalies are unexpected events, such as data exfiltration, malware activity (e.g., ransomware, viruses), or botnet activity.
- Baseline establishment: defines typical/acceptable behavior by analyzing historical data.
- Behavioral profiling: continuously monitors and profiles user/system behavior, for example data transfer volumes, protocol usage, system resource usage, and login times and frequency.
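Baseline establishment can be illustrated with a simple statistical profile: summarize historical values of one metric, then flag new observations that deviate too far. A minimal sketch using a z-score; the metric (outbound MB per hour), the history values, and the threshold of 3 standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def build_baseline(history: list[float]) -> tuple[float, float]:
    """Summarize historical observations as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value: float, baseline: tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value whose distance from the baseline mean exceeds `threshold` standard deviations."""
    mu, sigma = baseline
    if sigma == 0:
        # A constant history: anything different from it is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical outbound traffic (MB per hour) observed during normal operation.
history = [100, 110, 95, 105, 98, 102, 107, 99]
baseline = build_baseline(history)

print(is_anomalous(104, baseline))  # False: within normal variation
print(is_anomalous(900, baseline))  # True: possible data exfiltration
```

A fixed baseline like this goes stale as behavior changes, which is why the notes also mention continuous profiling.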
Anomaly detection techniques
- Outlier detection: finding data points significantly different from the majority.
- Novelty detection: finding instances significantly different from training data.
- Types of Anomalies:
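- Point Anomalies: individual data instances that differ significantly from the rest of the dataset.
- Contextual Anomalies: behavior that is abnormal only in a specific context (for example, at a particular time of day).
- Collective Anomalies: a set of related data points that is anomalous as a group, even if each point looks normal on its own.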
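The three anomaly types can be contrasted on toy data. A minimal sketch, assuming hourly login counts grouped by hour of day and a per-minute series of failed logins; all values, thresholds, and function names are hypothetical, and the z-score is just one simple way to score deviation.

```python
from statistics import mean, stdev


def zscore(value: float, sample: list[float]) -> float:
    """Distance of value from the sample mean, in sample standard deviations."""
    sigma = stdev(sample)
    return 0.0 if sigma == 0 else abs(value - mean(sample)) / sigma


# Hypothetical login counts per hour, keyed by hour of day.
history = {
    3: [1, 0, 2, 1, 1, 0],         # night
    14: [40, 45, 38, 42, 41, 44],  # afternoon
}
all_values = [v for values in history.values() for v in values]


def point_anomaly(value: float, threshold: float = 3.0) -> bool:
    """Compare the value against all observations, ignoring context."""
    return zscore(value, all_values) > threshold


def contextual_anomaly(value: float, hour: int, threshold: float = 3.0) -> bool:
    """Compare the value only against observations from the same hour of day."""
    return zscore(value, history[hour]) > threshold


def collective_anomaly(series: list[float], limit: float, min_run: int) -> bool:
    """Flag a run of at least min_run consecutive values above limit."""
    run = 0
    for value in series:
        run = run + 1 if value > limit else 0
        if run >= min_run:
            return True
    return False


print(point_anomaly(40))                               # False: 40 logins is common overall
print(contextual_anomaly(40, hour=3))                  # True: 40 logins at 3 a.m. is not
print(collective_anomaly([2, 6, 7, 6, 8, 1], 5, 4))    # True: sustained elevated failures
```

The same value of 40 passes the global check but fails the 3 a.m. check, which is the defining property of a contextual anomaly.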
Anomaly detection techniques (III and IV and V)
- Various techniques and tools are used, including:
- Feature engineering for host intrusion detection (metrics/signals from host and OS activity):
- OS instrumentation (e.g., the Linux Audit daemon) and cross-platform endpoint instrumentation (e.g., osquery).
- OS signals: running processes, active/new user accounts, permission changes, DNS lookups, network connections, kernel mods, system scheduler changes, startup, daemons, etc.
- Network intrusion detection (features from traffic): traffic metadata, aggregated info, protocol analyzers.
- Web/application intrusion detection (features from logs).
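Aggregated features from traffic metadata can be computed without inspecting packet payloads. A minimal sketch that groups flow records by source address over a time window; the record layout, the addresses, and the feature names are hypothetical.

```python
from collections import defaultdict

# Each flow is (timestamp_sec, src_ip, dst_ip, dst_port, n_bytes); this layout is an assumption.
Flow = tuple[float, str, str, int, int]


def aggregate_by_source(flows: list[Flow], start: float, length: float) -> dict[str, dict[str, int]]:
    """Aggregate per-source features over the window [start, start + length)."""
    acc = defaultdict(lambda: {"flows": 0, "bytes": 0, "ports": set(), "hosts": set()})
    for ts, src, dst, port, n_bytes in flows:
        if start <= ts < start + length:
            entry = acc[src]
            entry["flows"] += 1
            entry["bytes"] += n_bytes
            entry["ports"].add(port)
            entry["hosts"].add(dst)
    return {
        src: {
            "flows": e["flows"],
            "bytes": e["bytes"],
            "distinct_dst_ports": len(e["ports"]),
            "distinct_dst_hosts": len(e["hosts"]),
        }
        for src, e in acc.items()
    }


flows = [
    (0.5, "10.0.0.5", "10.0.0.9", 21, 60),
    (0.9, "10.0.0.5", "10.0.0.9", 22, 60),
    (1.4, "10.0.0.5", "10.0.0.9", 23, 60),
    (2.0, "10.0.0.5", "10.0.0.9", 80, 60),
    (3.1, "10.0.0.7", "10.0.0.2", 443, 5000),
    (75.0, "10.0.0.7", "10.0.0.2", 443, 900),  # falls outside the first 60 s window
]
features = aggregate_by_source(flows, start=0.0, length=60.0)
print(features["10.0.0.5"])
```

A source that touches many distinct ports on one host with tiny flows, like `10.0.0.5` here, is the kind of pattern a probing detector would look for.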
NIDS Datasets
- NSL-KDD dataset: an improved version of the KDD Cup 1999 benchmark for intrusion detection.
- The underlying traffic was collected over ~9 weeks on a simulated network.
- The original KDD Cup 1999 data has ~4.9M connection records derived from raw PCAP captures, each described by 41 processed high-level features; NSL-KDD keeps the same features but removes redundant records.
- 22 attack types categorized into four broad groups: denial of service (DoS), unauthorized remote access (R2L), privilege escalation (U2R), and probing attempts.
- Criticisms and limitations, inherited from the KDD Cup 1999 data: outdated, limited, lacking context.
- Alternative datasets: UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and UGR’16. These can offer a more up-to-date representation of real-world attacks.
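When working with NSL-KDD, individual attack labels are usually collapsed into the four broad groups before training. A minimal sketch of that mapping; the label names are the ones commonly listed for the training split and should be checked against the actual files, and the group keys and function name are illustrative choices.

```python
# Attack labels grouped by category (assumed to match the training split; verify against the data).
NSL_KDD_CATEGORIES = {
    "dos": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "r2l": ["ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
            "warezclient", "warezmaster"],
    "u2r": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert the grouping into a flat label -> category lookup.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in NSL_KDD_CATEGORIES.items()
    for label in labels
}


def category_of(label: str) -> str:
    """Map a raw label to 'normal', one of the four groups, or 'unknown'."""
    # Tolerate a trailing dot, as used in KDD Cup 1999 label files.
    cleaned = label.strip().rstrip(".").lower()
    if cleaned == "normal":
        return "normal"
    return LABEL_TO_CATEGORY.get(cleaned, "unknown")


print(category_of("neptune"))        # dos
print(category_of("guess_passwd"))   # r2l
print(category_of("normal"))         # normal
```

Returning `unknown` for unmapped labels matters because the test split contains attack types that do not appear in training.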