Questions and Answers
What is one major advantage of dynamic analysis in malware analysis?
Which of the following features is NOT typically associated with malware analysis?
What is a critical downside of dynamic malware analysis?
Which sandbox solution is noted for its ability to run in a VirtualBox environment?
Which resource is primarily associated with automated static and dynamic malware analysis for mobile apps?
What is the primary characteristic of a virus compared to other types of malware?
Which type of malware is designed to collect information without user consent?
What is a significant limitation of static analysis in malware analysis?
Why are worms considered particularly dangerous compared to other malware types?
What distinct feature does ransomware have compared to other malware categories?
What type of malware utilizes existing computers to perform malicious tasks like DDoS attacks?
Which malware type employs malicious code that activates under specific conditions?
Which of the following tools is typically used in static analysis of malware?
What is a primary criticism of the NSL-KDD dataset?
Which of the following datasets is cited as an alternative to the NSL-KDD dataset?
Which category does not represent the types of attacks in the NSL-KDD dataset?
How many general categories of attacks are represented in the NSL-KDD dataset?
What is a key characteristic of the data collection for the NSL-KDD dataset?
What is the primary goal of anomaly-based detection?
Which of the following methods can be used as part of statistical approaches for anomaly detection?
What is the difference between outlier detection and novelty detection?
What does continuous learning in anomaly detection help manage?
Which type of anomalies are characterized as anomalous individual data instances significantly different from the rest of the dataset?
Which aspect is essential for behavioral profiling in anomaly detection?
Adaptive models in anomaly detection are necessary to address which of the following?
Which machine learning approach is commonly used for novelty detection?
What is a characteristic of collective anomalies in data sets?
Which of the following is considered a typical signal for host-based anomaly detection?
What distinguishes traffic metadata from deep packet inspection in network intrusion detection?
Which metric is NOT typically considered in feature engineering for host intrusion detection?
What is a common use of protocol analyzers in network intrusion detection?
Which of the following describes the correlation of signals in anomaly detection?
Which application-level log feature is commonly analyzed for anomaly detection?
What does the term 'system scheduler changes' refer to in the context of anomaly detection metrics?
Which type of malware feature utilizes the analysis of how and when malware accesses specific memory regions to identify behavior?
What is the main purpose of Control Flow Graph (CFG) in malware analysis?
Which feature is typically analyzed to detect deviations from normal behavior in an intrusion detection system?
In the context of the Microsoft Malware Classification Challenge, what is meant by opcode n-grams?
What distinguishes Network-based IDS from Host-based IDS?
Which of the following features would likely be analyzed to measure malware's communication with remote servers?
What role does Random Forest play in malware feature selection as mentioned in the context of the classification challenge?
Which type of IDS is designed to take proactive measures against threats?
What is indicated by a malware sample having 'distinctive visual patterns' when transformed into grayscale images?
Which mechanism would typically be used to ensure a malware’s persistence on a Windows system?
Study Notes
CYB. Defensive AI (part 3)
- Course: Master in Artificial Intelligence
- Year: 2024/25
- Institution: ESEL – University of Vigo
AI/ML in Malware Analysis
- Malware is malicious software designed to harm, exploit, or compromise computer systems and data.
Malware: Definition and Types
- A single malware sample can combine several of the types below.
- Self-replicating:
- Viruses attach to host files and replicate when an infected file is executed (Stuxnet is cited as an example, though it is more often classified as a worm).
- Worms spread across networks without user interaction (e.g., SQL Slammer).
- Auto-hiding malware:
- Trojans disguise themselves as legitimate software but carry malicious code, such as backdoors or data-theft routines (e.g., Qbot/Qakbot, TrickBot).
- Rootkits hide malicious software, making detection and removal difficult (e.g., Linfo, Pandora, HIDEDRV).
- Designed to harm:
- Ransomware encrypts a victim's files and demands a ransom for decryption (e.g., CryptoLocker, Phobos/Dharma).
- Botnets are networks of compromised computers used for malicious activities such as DDoS attacks and spam (e.g., Mirai, Andromeda).
- Logic/time bombs are malicious code that activates when specific conditions are met and then damages the system.
- Keyloggers record keystrokes to capture sensitive information.
- Cryptojacking hijacks a victim's computing resources to mine cryptocurrency (e.g., Kinsing, LoudMiner).
- Spyware collects information without the user's consent (e.g., CoolWebSearch, Gator).
- Adware shows unwanted advertisements and collects user data (e.g., Fireball, Appearch).
Malware Analysis
- Malware analysis aims to understand the behavior and purpose of suspicious files.
- Static analysis:
- Examines the malware's code and characteristics without executing it.
- This involves studying file structure, strings, metadata, and embedded resources.
- Identifies known patterns, signatures, and indicators (such as file names, hashes, strings, IP addresses, domains, and file headers).
- Tools used for static analysis include disassemblers and rule-based matchers (e.g., YARA rules).
- Static analysis can effectively detect known malware via signature-based approaches or heuristic analysis.
- However, static analysis may miss sophisticated or polymorphic threats, whose code changes between samples and no longer matches known signatures.
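The static indicators listed above (hashes, strings, IP addresses, file headers) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a replacement for a disassembler or YARA; the function name `static_indicators`, the regular expressions, and the sample bytes are assumptions made for the example.

```python
import hashlib
import re

# Illustrative patterns; real-world indicator extraction needs stricter rules.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def static_indicators(data: bytes) -> dict:
    """Collect simple static indicators without executing the sample."""
    strings = [m.decode("ascii") for m in PRINTABLE_RUN.findall(data)]
    ips = sorted({ip for s in strings for ip in IPV4.findall(s)})
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "is_pe": data[:2] == b"MZ",  # DOS/PE header magic bytes
        "strings": strings,
        "ipv4": ips,
    }


# Hypothetical byte blob standing in for a sample read from disk.
sample = b"MZ\x90\x00\x03\x00connect to 10.0.0.5\x00\x01\x02cmd.exe /c\x00"
report = static_indicators(sample)
print(report["is_pe"], report["strings"], report["ipv4"])
```

The hash can be looked up in a signature database, while the extracted strings and addresses feed heuristic rules. Nothing in the sample is ever executed.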
- Dynamic analysis:
- Executes the malware in a controlled environment (sandbox) and observes its actual behavior.
- The isolation is crucial for preventing harm to the host system.
- Dynamic analysis monitors file system access and changes, network communication (e.g., TCP, DNS), and system calls.
- Dynamic analysis is helpful for identifying unknown or evolving malware.
- A drawback is that dynamic analysis is often resource-intensive.
Malware Analysis (continued)
- Resources and online sandboxes:
- MalwareBazaar and VirusShare.com (malware sample repositories).
- Microsoft Malware Classification Challenge (BIG 2015) (Kaggle).
- Cuckoo Sandbox.
- Mobile Security Framework (MobSF), for automated static and dynamic analysis of mobile apps.
- Joe Sandbox and its analysis reports.
- Hybrid Analysis, VirusTotal (VT APIv3).
Typical features in Malware analysis
- Static features:
- Opcode Sequences (binary code operation codes).
- API Import and Export Functions (API calls for malicious tasks).
- File Metadata (size, creation dates, certificates).
- String Analysis
- Control Flow Graph (CFG) (flow of code).
- File Headers and Sections (e.g., Portable Executable (PE) headers in Windows).
- Image Representation (visual patterns in malware).
- Permissions and Manifest Information (Mobile malware).
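The image representation mentioned above treats each byte of a binary as one pixel intensity. A minimal sketch of that reshaping step follows, assuming a fixed row width and zero padding; the function name `bytes_to_grayscale` and the default width are illustrative choices, not something the notes prescribe.

```python
def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape raw bytes into rows of pixel intensities (0-255)."""
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes(-len(data) % width)  # zero-pad the last row
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Ten bytes laid out as a 3x4 "image"; the last two pixels are padding.
image = bytes_to_grayscale(bytes(range(10)), width=4)
for row in image:
    print(row)
```

The resulting rows can be handed to any image library or classifier. Samples from the same family tend to produce similar textures, which is what makes this representation useful.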
- Dynamic features:
- API Call Sequences & Frequencies (malware system calls).
- Memory Access Patterns.
- Network Traffic Patterns (e.g., communication with malicious sites).
- System Call Behavior (specific system calls more frequent in malware than benign programs).
- Persistence Mechanisms (startup entries, scheduled tasks).
- Registry Operations
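Sequence features such as API call sequences (dynamic) or opcode sequences (static) are often turned into n-gram counts before being fed to a classifier. The sketch below shows one way to do this with the standard library; the function name `ngram_counts` and the trace are hypothetical, and a real trace would come from a sandbox report or a disassembly.

```python
from collections import Counter


def ngram_counts(sequence: list[str], n: int = 2) -> Counter:
    """Count contiguous n-grams in a sequence of calls or opcodes."""
    grams = zip(*(sequence[i:] for i in range(n)))
    return Counter(grams)


# Hypothetical API-call trace recorded during a sandbox run.
trace = ["CreateFileW", "WriteFile", "CloseHandle",
         "CreateFileW", "WriteFile", "RegSetValueExW"]
counts = ngram_counts(trace, n=2)
print(counts.most_common(1))
```

The same function works unchanged on opcode mnemonics such as `["push", "mov", "call"]`. Only the source of the sequence differs.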
Microsoft Malware Classification Challenge (BIG 2015)
- Data: ≈400 GB of raw binary data and metadata.
- Problem: multiclass classification with 9 malware families.
- Dataset: Available at https://www.kaggle.com/c/malware-classification
- Paper: Available at https://arxiv.org/pdf/1802.10135.pdf
- The dataset documentation lists the 9 malware families together with the number of training samples for each.
AI/ML in Intrusion Detection
- Intrusion Detection/Prevention Systems (IDS/IPS) monitor for dangerous activities.
- IDS detects and alerts (passive).
- IPS detects and blocks (proactive).
- IDS types:
- Network-based IDS (NIDS) monitors network traffic (e.g., Snort, Suricata, Zeek).
- Host-based IDS (HIDS) monitors host activity such as file system changes, system calls, and logs (e.g., Fail2Ban, OSSEC/Wazuh).
- Signature-based IDS: based on known attack patterns or signatures.
- Behavior-based (Anomaly-based) IDS: based on deviations from "normal" baseline.
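The difference between passive detection (IDS) and proactive blocking (IPS) in a signature-based system can be sketched as follows. The signature table, the function name `inspect`, and the request are hypothetical; production tools such as Snort or Suricata use far richer rule languages.

```python
# Hypothetical signatures: name -> byte pattern seen in known attacks.
SIGNATURES = {
    "path-traversal": b"../../",
    "sql-injection": b"' OR '1'='1",
}


def inspect(payload: bytes, prevent: bool = False) -> tuple[str, list[str]]:
    """Return the action taken and the names of matching signatures."""
    hits = [name for name, pattern in SIGNATURES.items() if pattern in payload]
    if not hits:
        return "allow", hits
    # An IDS only raises an alert; an IPS also blocks the traffic.
    return ("block" if prevent else "alert"), hits


request = b"GET /download?file=../../etc/passwd HTTP/1.1"
print(inspect(request))                # IDS mode
print(inspect(request, prevent=True))  # IPS mode
```

A signature-based check like this only catches patterns that are already known, which is the gap behavior-based detection tries to close.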
Behavior-based / Anomaly-based IDS/IPS
- Anomalies are unexpected events, such as data exfiltration, malware activity (e.g., ransomware, viruses), or botnet activity.
- Baseline establishment: define typical/acceptable behavior by analyzing historical data.
- Behavioral profiling: continuously monitors and profiles user/system behavior.
- Monitored signals include data transfer volumes, protocol usage, system resource usage, and login times and frequency, among others.
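Baseline establishment and deviation checks can be illustrated with a simple statistical sketch: summarize historical observations, then flag new values that lie far from the mean. The z-score rule, the threshold of 3, the function names, and the transfer volumes are assumptions chosen for the example, not a method the notes specify.

```python
from statistics import mean, stdev


def build_baseline(history: list[float]) -> tuple[float, float]:
    """Summarize historical observations as (mean, standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value: float, baseline: tuple[float, float],
                 threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily outbound transfer volumes in MB for one host.
history = [120, 130, 125, 118, 135, 128, 122]
baseline = build_baseline(history)
print(is_anomalous(131, baseline))   # within the usual range
print(is_anomalous(900, baseline))   # far outside the baseline
```

In practice the baseline would be recomputed periodically so that gradual, legitimate changes in behavior do not trigger alerts.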
Anomaly detection techniques
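- Outlier detection: finding data points that differ significantly from the majority; the data being analyzed may itself contain the anomalies.
- Novelty detection: finding new observations that differ significantly from the training data, which is assumed to be free of anomalies.
- Types of Anomalies:
- Point Anomalies: individual data instances that differ significantly from the rest of the dataset.
- Contextual Anomalies: behavior that is abnormal only in a specific context.
- Collective Anomalies: a set of related data points that is anomalous as a group, even if each point looks normal on its own.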
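The distinction between outlier detection and novelty detection comes down to which data the model is fitted on. The sketch below uses a median/MAD score as a stand-in for a real detector; the scoring rule, the threshold of 5, the function names, and the login counts are all assumptions made for illustration.

```python
from statistics import median


def mad_scores(reference: list[float], points: list[float]) -> list[float]:
    """Robust z-like scores of `points` relative to `reference`."""
    med = median(reference)
    mad = median(abs(x - med) for x in reference) or 1e-9  # avoid division by zero
    return [abs(p - med) / mad for p in points]


def detect_outliers(data: list[float], threshold: float = 5.0) -> list[float]:
    """Outlier detection: the data being scored may itself be contaminated."""
    return [x for x, s in zip(data, mad_scores(data, data)) if s > threshold]


def detect_novelties(clean_train: list[float], new: list[float],
                     threshold: float = 5.0) -> list[float]:
    """Novelty detection: fit on clean data, then score unseen observations."""
    return [x for x, s in zip(new, mad_scores(clean_train, new)) if s > threshold]


logins_per_hour = [4, 5, 6, 5, 4, 60, 5, 6]  # contains one point anomaly
print(detect_outliers(logins_per_hour))
print(detect_novelties([4, 5, 6, 5, 4, 5, 6], [5, 7, 48]))
```

Median-based statistics are used here because a mean and standard deviation fitted on contaminated data would be pulled toward the outlier itself.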
Anomaly detection techniques (continued)
- Various techniques and tools are used, including:
- Host intrusion detection: feature engineering on metrics/signals from host and OS activity.
- Cross-platform endpoint instrumentation (e.g., osquery) and OS-level auditing (e.g., the Linux Audit daemon).
- OS signals: running processes, active/new user accounts, permission changes, DNS lookups, network connections, kernel module changes, system scheduler changes, startup items, daemons, etc.
- Network intrusion detection (features from traffic): traffic metadata, aggregated information, and protocol analyzers.
- Web/application intrusion detection (features from logs).
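Traffic metadata and aggregated information can be derived from flow records without inspecting packet payloads. The following sketch aggregates a few per-source features; the record layout, the function name `aggregate_by_source`, and the chosen features are hypothetical.

```python
from collections import defaultdict

# Hypothetical flow records: (source IP, destination port, bytes sent).
flows = [
    ("10.0.0.7", 443, 1200),
    ("10.0.0.7", 443, 800),
    ("10.0.0.9", 22, 300),
    ("10.0.0.9", 23, 60),
    ("10.0.0.9", 3389, 60),
]


def aggregate_by_source(records):
    """Aggregate per-source features from flow metadata (no payload access)."""
    stats = defaultdict(lambda: {"flows": 0, "bytes": 0, "ports": set()})
    for src, dport, nbytes in records:
        entry = stats[src]
        entry["flows"] += 1
        entry["bytes"] += nbytes
        entry["ports"].add(dport)
    return {
        src: {"flows": e["flows"], "bytes": e["bytes"],
              "distinct_ports": len(e["ports"])}
        for src, e in stats.items()
    }


features = aggregate_by_source(flows)
print(features["10.0.0.9"])
```

A source that touches many distinct ports with little data per flow, as the second host does here, is the kind of pattern a probing detector would look for.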
NIDS Datasets
- NSL-KDD dataset: an improved version of the KDD Cup 1999 benchmark for intrusion detection.
- The underlying traffic was collected over ~9 weeks on a simulated network.
- The original KDD Cup 1999 data comprises ~4.9M connection records extracted from raw packet captures, each described by 41 processed high-level features.
- 22 attack types are grouped into four broad categories: denial of service (DoS), unauthorized remote access (R2L), privilege escalation (U2R), and probing.
- Criticisms and limitations inherited from KDD Cup 1999: the data is outdated, limited in scope, and lacks context.
- Alternative datasets: UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and UGR’16. These offer more up-to-date representations of real-world attacks.
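The grouping of the 22 attack types into four categories can be expressed as a lookup table. The label-to-category assignment below follows the grouping commonly described for the KDD Cup 1999 / NSL-KDD training set and should be checked against the dataset documentation; the function name `categorize` is an assumption for this sketch.

```python
from collections import Counter

# Commonly described grouping of the training-set attack labels; verify
# against the dataset documentation before relying on it.
ATTACK_CATEGORY = {
    **dict.fromkeys(["back", "land", "neptune", "pod", "smurf", "teardrop"], "DoS"),
    **dict.fromkeys(["ipsweep", "nmap", "portsweep", "satan"], "Probe"),
    **dict.fromkeys(["ftp_write", "guess_passwd", "imap", "multihop", "phf",
                     "spy", "warezclient", "warezmaster"], "R2L"),
    **dict.fromkeys(["buffer_overflow", "loadmodule", "perl", "rootkit"], "U2R"),
}


def categorize(label: str) -> str:
    """Map a raw record label to 'normal', one of four groups, or 'unknown'."""
    label = label.strip().rstrip(".")  # KDD Cup 1999 labels end with a dot
    if label == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(label, "unknown")


labels = ["normal", "neptune", "satan", "guess_passwd", "rootkit", "smurf."]
print(Counter(categorize(l) for l in labels))
```

Collapsing labels this way turns the task into a five-class problem (normal plus four attack groups), which is how the benchmark is usually evaluated.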