Itcan: Automated Software Security Assurance using LLM (2024)
King Saud University
2024
Razan Esa Alghonaim
Summary
This document is a master's thesis submitted to King Saud University on automated software security assurance using large language models (LLMs). The thesis explores the use of LLMs to ensure code adherence to cybersecurity standards and identify potential vulnerabilities. The system, named "Itcan," is designed to enhance code security and aid in vulnerability remediation.
King Saud University
College of Computer & Information Sciences
Cybersecurity Master Program

Itcan: Automated software security assurance using LLM
SEC 598

By RAZAN ESA ALGHONAIM
445205958

Under the supervision of Dr. Abdulaziz Alabodi

Submitted in partial fulfillment of the requirements for the Degree of Master in Cybersecurity at the College of Computer and Information Sciences

King Saud University
1446 (2024)

DEDICATION
There is no such thing as a specific format to write a dedication section. Keep it short.

DECLARATION
We hereby declare that we are the sole authors of this report. We authorize King Saud University to lend this report to other institutions or individuals for the purpose of scholarly research.

ACKNOWLEDGMENTS

ABSTRACT
The full implementation and evaluation of the system will be completed in the next phase of the research, and the results and conclusions will be derived accordingly.

TABLE OF CONTENTS
1. Introduction
1.1 Problem Statement
1.2 Suggested Solution
1.3 Research Aim and Objectives
1.4 Report Outline
2. Background
2.1 Secure Coding Practices
2.2 Automated Security Standard Enforcement
2.3 Machine Learning (ML)
2.4 Large Language Model (LLM)
3. Literature Review
4. System Design
5. Evaluation Methodology
5.1 Research Questions
5.2 Materials List
5.3 Procedure
6. Results and Discussions
7. Conclusion
8. References

TABLE OF FIGURES
Figure 1: System design

LIST OF TABLES
Table 1: Comparison of AI-based Approaches for Ensuring Secure Coding Practices

LIST OF ABBREVIATIONS
AI: Artificial Intelligence
LLM: Large Language Model
ML: Machine Learning
NCA: National Cybersecurity Authority

1. Introduction

1.1 Problem Statement
The increasing complexity of modern software systems has elevated the risk of security issues, which can potentially lead to breaches, data loss, or system failures.
While numerous tools and frameworks exist to assist in identifying software vulnerabilities, many of them rely on manual effort or are prone to human error. Moreover, they typically focus on detecting vulnerabilities rather than on ensuring that the code complies with well-defined security standards that can prevent such vulnerabilities from arising in the first place.

In Saudi Arabia, the National Cybersecurity Authority (NCA) has established rigorous cybersecurity standards and regulations that organizations must adhere to in order to safeguard critical infrastructure and sensitive data. However, ensuring that these standards are consistently applied during development remains a significant challenge. Current methods for verifying compliance with these standards are often manual or semi-automated, which results in inefficiencies, increased costs, and a higher chance of overlooking security gaps. While some existing tools assess code for known vulnerabilities, they typically fall short in ensuring that the code adheres to specific, predefined secure coding standards, and they often do not integrate with the broader set of standards defined by cybersecurity authorities or by organizations themselves. There is therefore a critical need for a tool that can automatically verify whether source code meets these security standards, preventing violations that could lead to vulnerabilities.

Emerging technologies such as ML and LLMs offer significant potential for automating this verification process. These models can analyze code at a deeper, contextual level, enabling more accurate and adaptable assessments of code quality and security. However, there remains a gap in applying LLMs to ensure compliance with security standards in real time, within dynamic software development environments.
In response to this gap, this research aims to develop "Itcan", a tool that leverages LLMs to automatically verify compliance with both national and internal security regulations, improving the reliability and effectiveness of security verification.

1.2 Suggested Solution
This research presents an innovative approach to automated software security assurance that integrates LLMs with established secure coding standards and a rule-based engine. The system, "Itcan", takes its name from the Arabic word "إتقان", meaning "proficiency", which reflects its purpose of rigorously verifying that software code complies with stringent cybersecurity standards. The process begins with Itcan receiving source code in multiple programming languages, which makes it adaptable across diverse development environments. Itcan also incorporates secure coding standards and industry best practices as benchmarks for security compliance. The source code is then processed by Itcan's AI component, which leverages LLMs to generate context-aware prompts and apply predefined rules to assess compliance. These prompts are passed to a rule-based engine for further analysis, which focuses on compliance rather than on vulnerability detection. Upon completion of the analysis, Itcan generates a report outlining the code's adherence to security standards, accompanied by actionable recommendations for remediation. This report serves as a resource for developers and security teams, providing insights to enhance code security and ensure ongoing compliance with cybersecurity regulations.
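To make the flow of Section 1.2 more concrete, the following Python sketch mirrors its stages under simplifying assumptions. The LLM step is stubbed out: the hypothetical `derive_rules` function returns two hard-coded example rules, which are illustrative and are not taken from the NCA standards. Regular expressions stand in for the patterns that a real rule engine would use, such as Semgrep, which Section 5.2 lists as the planned engine. All class and function names are hypothetical, and this is not Itcan's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A compliance rule, e.g. one an LLM might derive from a secure coding standard."""
    rule_id: str
    pattern: str          # regex standing in for a real rule-engine pattern
    recommendation: str   # remediation advice shown in the report


@dataclass(frozen=True)
class Finding:
    rule_id: str
    line: int
    recommendation: str


def derive_rules(standard_text: str) -> list[Rule]:
    """Hypothetical stand-in for the LLM step.

    A real system would prompt an LLM with `standard_text`; here two example
    rules are returned unconditionally so the sketch runs offline.
    """
    return [
        Rule("no-eval", r"\beval\(",
             "Replace eval() with a safe parser such as ast.literal_eval()."),
        Rule("no-hardcoded-secret", r"(?i)\b(password|secret)\s*=\s*['\"]",
             "Load credentials from a secrets store or environment variable."),
    ]


def run_rule_engine(source: str, rules: list[Rule]) -> list[Finding]:
    """Toy rule engine: checks every source line against every rule pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule.pattern, line):
                findings.append(Finding(rule.rule_id, lineno, rule.recommendation))
    return findings


def build_report(findings: list[Finding]) -> str:
    """Renders findings as a plain-text compliance report."""
    if not findings:
        return "Compliant: no rule violations found."
    lines = [f"{len(findings)} violation(s) found:"]
    for finding in findings:
        lines.append(f"  line {finding.line} [{finding.rule_id}]: {finding.recommendation}")
    return "\n".join(lines)


if __name__ == "__main__":
    code = 'password = "hunter2"\nresult = eval(user_input)\n'
    rules = derive_rules("secure coding standard text")
    print(build_report(run_rule_engine(code, rules)))
```

On the two-line sample, the hard-coded credential on line 1 and the `eval` call on line 2 are each reported together with a recommendation. A structural engine such as Semgrep matches code patterns rather than raw text lines, so it would avoid the false positives that this line-by-line regex approach produces on comments and string contents.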
By automating these processes, Itcan aims to reduce the manual effort involved in security assurance, thereby improving the efficiency and accuracy of security assessments throughout the software development lifecycle.

1.3 Research Aim and Objectives
The primary aim of this research is to develop an automated system that leverages LLMs to ensure that software code complies with established cybersecurity standards, preventing security weaknesses by addressing potential violations of these standards. The research will specifically evaluate how effectively LLMs can translate regulatory and secure coding standards into practical compliance rules, and how these models can improve the precision and consistency of compliance verification. The objectives of the study are as follows:
1. To design and implement an AI-driven system that uses LLMs to automate compliance assurance, ensuring that code adheres to cybersecurity standards and best practices.
2. To evaluate the precision and recall of the system, assessing its effectiveness in minimizing false positives and false negatives.
3. To assess the system's capability to translate cybersecurity standards into actionable compliance rules and to measure its effectiveness in real-world applications.
4. To gather and analyze developers' feedback on the tool's effectiveness and practical applicability in improving software security.
5. To compare the proposed LLM-based compliance assurance system with traditional, manual, and semi-automated methods, highlighting its strengths, limitations, and potential for future advancements in software security.
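Objective 2 refers to precision and recall. The sketch below shows one way these could be computed. It assumes that each violation is identified by a hypothetical (file, line, rule) tuple, so that the tool's output and a manually annotated ground truth can be compared as sets. Precision is TP / (TP + FP), which is the share of reported violations that are real. Recall is TP / (TP + FN), which is the share of real violations that are reported. F1 is their harmonic mean. The file names and rule identifiers are invented for illustration.

```python
Violation = tuple[str, int, str]  # (file, line, rule_id); a hypothetical encoding


def precision_recall_f1(predicted: set[Violation],
                        actual: set[Violation]) -> tuple[float, float, float]:
    """Compares reported violations with an annotated ground truth."""
    tp = len(predicted & actual)   # reported and real
    fp = len(predicted - actual)   # reported but not real (false alarms)
    fn = len(actual - predicted)   # real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    predicted = {
        ("app.py", 3, "no-eval"),
        ("app.py", 10, "no-hardcoded-secret"),
        ("db.py", 5, "sql-injection"),
    }
    actual = {
        ("app.py", 3, "no-eval"),
        ("db.py", 5, "sql-injection"),
        ("db.py", 9, "sql-injection"),
        ("auth.py", 2, "weak-hash"),
    }
    p, r, f1 = precision_recall_f1(predicted, actual)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With the sample data, two of the three reported violations are real and two of the four real violations are found. This gives a precision of 2/3, a recall of 1/2, and an F1 of 4/7, printed as 0.667, 0.500, and 0.571. Matching on exact line numbers is a strict choice. A real evaluation might instead accept a report within a few lines of the annotated location.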
1.4 Report Outline
This report presents background on secure coding and LLMs in Chapter 2, reviews related literature in Chapter 3, presents the system design in Chapter 4, and details the evaluation methodology in Chapter 5. Chapter 6 (Results and Discussions) and Chapter 7 (Conclusion, including future directions) will be completed in the next semester, once the system has been implemented and evaluated.

2. Background
This chapter provides an overview of key concepts related to automated software security assurance, focusing on the role of ML and LLMs in helping developers adhere to secure coding standards and best practices.

2.1 Secure Coding Practices
Secure coding practices are the guidelines and techniques that developers use to write software that resists security vulnerabilities. These practices aim to minimize risks such as unauthorized access, data breaches, and system failures by ensuring that code is developed in line with security best practices. Common secure coding practices include input validation, proper error handling, and protection against common attacks such as SQL injection and cross-site scripting (XSS). By following these practices, developers can prevent vulnerabilities from being introduced into the software in the first place, reducing the potential for exploitation. Integrating them into the Software Development Life Cycle (SDLC) is crucial for long-term software security. The importance of secure coding is underscored by standards such as ISO/IEC 25010, which identifies security as a core quality characteristic of software products.

2.2 Automated Security Standard Enforcement
Automated Security Standard Enforcement involves using tools and techniques to ensure that software code adheres to established secure coding standards throughout the development process.
This approach reduces the manual effort and time traditionally spent reviewing code for compliance with security best practices, while improving accuracy and efficiency in identifying deviations. Vulnerability detection tools search for weaknesses in code that already exists. Automated security standard enforcement is proactive instead: it aims to prevent vulnerabilities from the outset by ensuring that code aligns with security guidelines. ML and LLMs can make compliance checking more effective, because they offer context-aware capabilities that can analyze and evaluate code at scale and provide real-time feedback on adherence to security standards. Integrating such automated tools into development workflows, such as CI/CD pipelines, is expected to streamline the process and enhance software security across the development lifecycle. However, challenges remain in adapting these tools to evolving security standards and in ensuring their reliability across different programming languages and environments.

2.3 Machine Learning (ML)
Machine Learning (ML) is a branch of artificial intelligence (AI) that enables systems to learn from data and improve their performance over time. In the context of software security, ML can automate the analysis of code for compliance with security standards, improving both efficiency and accuracy. By analyzing large datasets, ML models can identify patterns that signal adherence to, or deviation from, security best practices, enabling real-time feedback on code quality. Various ML techniques are used in security assurance tasks. Supervised learning, for instance, uses labeled datasets to train models that detect standard violations, while unsupervised learning can uncover hidden patterns in code. Methods such as decision trees, support vector machines (SVMs), and neural networks are often employed for such tasks.
Furthermore, deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have proven effective in analyzing complex codebases at scale. Nevertheless, ML faces challenges, including limited labeled data, the risk of overfitting, and limited interpretability. Advances in reinforcement learning and transfer learning may help overcome these barriers, improving model adaptability and security assurance capabilities in the future.

2.4 Large Language Model (LLM)
Large Language Models (LLMs) are deep learning models based on the transformer architecture and designed to process and generate human language. These models excel at natural language processing (NLP) and can be applied to various software engineering tasks, including checking adherence to secure coding practices. Trained on vast datasets, LLMs can learn to identify patterns in code, which can be leveraged to assess whether software complies with established security standards and best practices.

In the context of secure coding, LLMs can analyze source code and provide feedback on potential deviations from secure coding guidelines. They can suggest code improvements, highlight areas that may introduce security risks, and offer alternative approaches that align with security best practices. This makes them valuable for helping developers follow secure coding standards throughout the software development lifecycle, reducing the likelihood of introducing security flaws into the code.

While LLMs show great promise, challenges remain. These include high computational cost, potential biases in training data, and the difficulty of interpreting model outputs. Nevertheless, ongoing advances in LLM technology are expected to strengthen their support for secure coding practices, thereby improving overall software security assurance.
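The sketch below illustrates Sections 2.1 and 2.2 with a single automated rule for one secure coding practice. The practice is that SQL queries should be parameterized rather than assembled from strings, which is a common defence against SQL injection. The sketch uses Python's standard `ast` module to flag `.execute()` and `.executemany()` calls (DB-API cursor methods) whose first argument is built with an f-string, `+`, `%`, or `str.format()`. The function names and the heuristic are assumptions made for this example. The rule is deliberately simplistic. For instance, it flags the concatenation of two constant strings, and it misses queries that are assembled in a separate variable before the call. Gaps of this kind are what context-aware analysis is intended to reduce.

```python
import ast

SQL_METHODS = {"execute", "executemany"}  # DB-API cursor methods that take a query string


def _is_dynamic_string(node: ast.expr) -> bool:
    """True if the expression builds a string at runtime instead of using a literal."""
    if isinstance(node, ast.JoinedStr):                       # f"... {x} ..."
        return True
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True                                           # "..." + x  or  "..." % x
    return (
        isinstance(node, ast.Call)                            # "...".format(x)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "format"
    )


def find_unsafe_sql(source: str) -> list[int]:
    """Returns sorted line numbers of execute() calls whose query is built dynamically."""
    unsafe_lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SQL_METHODS
            and node.args
            and _is_dynamic_string(node.args[0])
        ):
            unsafe_lines.append(node.lineno)
    return sorted(unsafe_lines)


if __name__ == "__main__":
    sample = "\n".join([
        'cur.execute("SELECT * FROM users WHERE name = ?", (name,))',  # parameterized: OK
        'cur.execute(f"SELECT * FROM users WHERE name = {name!r}")',   # f-string: flagged
        'cur.execute("SELECT * FROM users WHERE id = " + user_id)',    # concatenation: flagged
        'cur.execute("DELETE FROM t WHERE id = %s" % user_id)',        # % formatting: flagged
    ])
    print(find_unsafe_sql(sample))
```

On this four-line sample, the function reports lines 2, 3, and 4 and accepts the parameterized query on line 1. The code is only parsed and never executed, which is why undefined names such as `cur` and `user_id` do not cause errors.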
3. Literature Review
This chapter provides an overview of recent studies that apply various AI techniques to software security. All selected papers were published in reputable journals and conferences, which helps ensure that the sources are relevant and reliable and reflect the latest advancements in the field.

Rajapaksha et al. present an AI-driven vulnerability detection method for C and C++ source code. It uses Concrete Syntax Trees (CSTs) to identify crucial features in source code for training a machine learning model to detect vulnerabilities. The model is trained on the NIST SATE IV dataset. The results show an F1-score of 0.96 for binary classification and 0.85 for multi-class classification.

Tihanyi et al. propose ESBMC-AI, a novel framework that integrates the Efficient SMT-based Context-Bounded Model Checker (ESBMC) with an LLM. Using Bounded Model Checking (BMC), they detect vulnerabilities in C source code and obtain counterexamples. A custom-designed prompt then combines the original source code with the detected vulnerability, and this combined input is passed to the LLM, which is directed to correct the code. Afterwards, the modified code is re-verified using BMC to confirm that the fix has been successfully implemented. The results show that the framework can automate the detection and correction of defects with high accuracy.

Mathews et al. present a framework for Android vulnerability detection using an LLM. Applying prompt engineering techniques, they evaluate three prompt configurations: a standard prompt without vulnerability details, a prompt containing a description of the vulnerability, and a prompt designed to request files as needed. The results show that the LLM exceeded expectations in detecting application vulnerabilities.

Russell et al.
develop a vulnerability detection tool for C and C++ code that uses deep representation learning to learn features from large source code repositories and identify vulnerabilities. They apply feature-extraction methods based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for function-level vulnerability classification. The results show that the CNN successfully extracted relevant features and achieved high accuracy.

Eberhardt et al. present a novel LLM-based vulnerability detection method that uses a collaborative system built on AutoGPT. In this system, a controller AI supervises the evaluation process while an evaluator AI performs the vulnerability analysis. Across 381 test cases, the LLMs achieved an F1 score of 0.92 for vulnerability detection.

Yu et al. study commonly adopted LLMs for detecting vulnerabilities in C/C++ and Python code. First, the LLMs were instructed to generate details on the detected security vulnerabilities, including the line number, type, description, and recommended fixes. The authors then compared the performance of the LLMs with that of static analysis tools in finding vulnerabilities, using five different prompts. The results demonstrate that the LLMs outperformed the static analysis tools.

Bakhshandeh et al. present a model that uses an LLM (GPT-3.5) to identify vulnerabilities in Python code using four different prompts. They then compare its findings with those of the SAST tools Bandit, Semgrep, and SonarQube. The results show that the model achieved its best result, a precision of 0.7807, when used as a SAST facilitator.

Purba et al. develop an LLM-based model that detects SQL injection and buffer overflow vulnerabilities in C/C++. They use four LLMs, including large OpenAI-hosted models and, due to privacy concerns, smaller local models. The results show that, unlike prompting, fine-tuning enhances the performance of the LLMs.

Akuthota et al.
propose a vulnerability detection model that leverages an LLM for Java code and is evaluated on the OWASP Benchmark repository. The model analyzes the code, finds vulnerabilities, and recommends remediations for the detected vulnerabilities. The results show that the model obtained an accuracy of 0.77.

Liu et al. present an LLM-based vulnerability detection model, VUL-GPT. They employ GPT to analyze the test code and apply BM25 and TF-IDF to retrieve similar code snippets and their vulnerability data from the training set. This retrieved information, together with the test code analysis, is processed by the GPT model to exploit its in-context learning. For the results, the recall 5.12% increases to 60.64%, which indicates that integrating code analysis and retrieval enhances the effectiveness of GPT models in identifying code vulnerabilities.

Thapa et al. propose transformer-based and recurrent neural network (RNN) methods for vulnerability detection and review common techniques for fine-tuning models effectively on C/C++ code, using the VulDeePecker dataset. The results show that transformer-based language models are effective in vulnerability detection.

Zhou et al. investigate the effect of using LLMs in vulnerability detection. Using diverse prompts with GPT-3.5 and GPT-4 and comparing against CodeBERT on C/C++ code, they found that GPT-4 surpassed CodeBERT by 34.8% in accuracy.

Pelofske et al. evaluate five open-source GPT models (Llama-2-70b-chat-hf, zephyr-7b-alpha, zephyr-7b-beta, Mistral-7B-Instruct-v0.1, and Turdus) in software vulnerability analysis, using test cases from the NIST SARD dataset. They designed a prompt that asks the model to identify vulnerabilities in source code and to provide, in JSON format, a description of each vulnerability and the exact vulnerable code. Their results indicate that Llama-2-70b-chat-hf outperformed the other models, scoring 1.0 in both recall and precision.

Zhang et al. study software vulnerability detection using ChatGPT, specifically GPT-4, with various prompts.
They augment the prompts with explanations of the program's data flow and APIs. After benchmarking ChatGPT against two leading vulnerability detection methods, they found that ChatGPT achieved a higher accuracy rate of 74.70%.

Kalouptsoglou et al. develop methods for classifying vulnerabilities in source code, leveraging Natural Language Processing (NLP) and expert-generated textual information. First, they prepared the code segments for analysis. The source code was then encoded using Bag-of-Words and token-sequence representations for model selection purposes. Subsequently, the candidate models were trained on the dataset's training set. The results indicate that the fine-tuned CodeBERT model outperformed the other models, attaining an F1 score of 85.5%.

Marwan and Stavros propose a new transformer-based approach for vulnerability detection in C/C++ source code, referred to as VulDetect, which is enabled by an LLM (GPT-2). The results demonstrate that the proposed model achieved an accuracy of 92.65%, surpassing two leading state-of-the-art techniques.

Yin et al. investigate the capability of fine-tuned LLMs for vulnerability detection, assessment, localization, and description. Their experiments show that fine-tuning enhances LLM effectiveness in detecting, assessing, locating, and describing vulnerabilities.

Liu et al. examine different prompting strategies for LLMs in vulnerability detection, focusing on basic prompts, prompts with code-specific information, and chain-of-thought (CoT) prompts. The researchers enhance basic prompts with semantic, structural, and data flow graph (DFG) details, using a custom code search algorithm. CoT prompts are also introduced to guide step-by-step reasoning.
Results show that combining code-specific information with CoT prompts improves detection accuracy. The study also explores the impact of prompt structure and temperature settings on LLM performance.

Noever investigates GPT-4's ability to identify software security flaws. The analysis is based on 129 code samples from eight programming languages. The results show that GPT-4 significantly outperforms traditional static code analyzers and produces fewer false positives, which suggests that GPT-4 could offer a more efficient approach to automated vulnerability detection and remediation.

Mohamad and Hesham propose a deep learning-based approach that uses Convolutional Neural Networks (CNNs) and Artificial Neural Networks (ANNs) for automated function-level vulnerability detection, eliminating the need for expert knowledge. The approach generates feature representations from abstract syntax trees (ASTs), optimizes the model parameters, and is evaluated on a public dataset. The ANN model achieved the highest performance, with an accuracy of 84.22%.

The reviewed studies apply AI techniques such as LLMs, deep learning, and transformers to software vulnerability detection, often achieving high accuracy in specific languages. However, most are limited to detection, to specific languages, or to both. In contrast, the proposed system combines LLMs with a rule-based engine to provide a flexible, multi-language solution for ensuring secure coding practices. This sets it apart from existing models that focus on individual aspects or lack adaptability.
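Several of the reviewed studies ask the model for structured output. Yu et al. request the line number, type, description, and recommended fix for each vulnerability, and Pelofske et al. request JSON. A model may not follow the requested format, so a tool that feeds LLM output into further processing should treat the reply as untrusted input. The sketch below assumes a hypothetical reply format: a JSON list of objects with `line`, `type`, `description`, and `fix` keys. It keeps only well-formed entries. The field names are illustrative and are not taken from any of the cited papers.

```python
import json
from dataclasses import dataclass

# Expected keys and their exact types in each finding (hypothetical format).
REQUIRED_FIELDS = {"line": int, "type": str, "description": str, "fix": str}


@dataclass(frozen=True)
class LlmFinding:
    line: int
    type: str
    description: str
    fix: str


def parse_llm_findings(raw_reply: str) -> list[LlmFinding]:
    """Parses an LLM reply expected to be a JSON list of findings.

    Malformed JSON yields no findings; individual items with missing keys or
    wrongly typed values are skipped instead of being trusted.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    findings = []
    for item in data:
        if not isinstance(item, dict):
            continue
        # Exact type check, so that e.g. JSON true is not accepted as a line number.
        if all(type(item.get(key)) is expected for key, expected in REQUIRED_FIELDS.items()):
            findings.append(LlmFinding(**{key: item[key] for key in REQUIRED_FIELDS}))
    return findings


if __name__ == "__main__":
    reply = json.dumps([
        {"line": 12, "type": "SQL injection",
         "description": "Query built by string concatenation",
         "fix": "Use a parameterized query"},
        {"line": "twelve", "type": "XSS"},  # malformed: dropped
    ])
    for finding in parse_llm_findings(reply):
        print(finding)
```

In this example, only the first entry survives parsing. Dropping malformed items silently keeps the sketch short. A real tool might instead log them or re-prompt the model so that findings are not lost without notice.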
Table 1: Comparison of AI-based Approaches for Ensuring Secure Coding Practices

| Ref | Algorithm / Classification method | Dataset | Granularity | Evaluation metrics | Supported language | Provides fixes |
|---|---|---|---|---|---|---|
| Rajapaksha et al. | Binary classification, multi-class classification | NIST SATE IV | Token level | F1 score | C/C++ | No |
| Tihanyi et al. | LLM (GPT-4) | FormAI | Program level | Accuracy | C | Yes |
| Mathews et al. | LLM (GPT-4) | Ghera benchmark | File level | TP | Java | No |
| Russell et al. | Deep representation learning | NIST SATE IV, GitHub public repositories, and Debian Linux distribution | Function level | Precision, Recall | C/C++ | No |
| Eberhardt et al. | LLM (GPT-3.5, GPT-4) | OWASP Benchmark Project | Source code level | TP, TN, FP, FN, Precision, F score | Java | No |
| Yu et al. | LLM (ChatGPT, Gemini, Llama) | OpenStack Nova and Neutron, Qt Base and Creator | File level | I-Score, M-Score, and IH-Score | C/C++, Python | Yes |
| Bakhshandeh et al. | LLM (ChatGPT) | SecurityEval, PyT | File level | F-measure, Recall, Precision | Python | No |
| Purba et al. | LLM (GPT-3.5) | Code gadgets and CVEfixes | Source code level | TP, TN, FP, FN, Precision, and Recall | C/C++ | No |
| Akuthota et al. | LLM (GPT-3.5) | OWASP Benchmark Project | Source code level | Accuracy | Java | Yes |
| Liu et al. (VUL-GPT) | LLM (GPT-3.5) | Devign dataset | Source code level | Accuracy, Recall, Precision, F1 score | C, C++, Java, and more | No |
| Thapa et al. | Transformer-based model | VulDeePecker dataset | Function level | FPR, FNR, F1 score | C/C++ | No |
| Zhou et al. | LLM (GPT-3.5, GPT-4) | Vulnerability fixing dataset | File level | Accuracy, Recall, Precision, F1, F0.5 | C/C++ | No |
| Pelofske et al. | LLM (GPT) | NIST SARD dataset | Source code level | Precision and Recall | C/C++ | No |
| Zhang et al. | LLM (GPT-4) | Collected vulnerability datasets | File level | Accuracy, Recall, Precision, F1 score | C/C++, Java | No |
| Kalouptsoglou et al. | Random forest classifier, NLP, Transformer-based models | Wartschinski et al. dataset | Source code level | Accuracy, Recall, Precision, F1-score | Python | No |
| Marwan and Stavros | LLM (GPT-2) | NIST SARD and SeVC dataset | Token level | Accuracy, Recall, Precision, F1 score | C/C++ | No |
| Yin et al. | LLM (DeepSeek-Coder, CodeLlama, StarCoder, WizardCoder, Mistral, Phi-2) | Big-Vul, Devign dataset, Reveal dataset, IVDetect, LineVul, and SVulD | Function level | Accuracy, Recall, Precision, F1 score | C/C++ | No |
| Liu et al. (prompting) | LLM (GPT-3.5) | Devign dataset and Reveal dataset | Source code level | Accuracy, Recall, Precision, F1 score | C/C++ | No |
| Noever | LLM (GPT-4) | Public scientific repositories and real-world challenges | Source code level | FP | Java, Python, C, and more | Yes |
| Mohamad and Hesham | Deep learning-based neural networks (CNNs, ANNs) | Draper VDISC dataset | Function level | Accuracy, Recall, Precision, F1 score, ROC-AUC | C/C++ | No |

4. System Design
This chapter presents the design of the proposed solution, outlining how the system uses LLMs to ensure secure coding practices and to generate reports with tailored recommendations.

Figure 1: System design

Itcan employs a multi-step process to enhance code security. First, Itcan takes as input source code written in various programming languages. In parallel, secure coding standards and industry best practices are incorporated to serve as a benchmark for code evaluation. The source code, along with custom code snippets generated from these standards, is fed into the AI component. The AI component, powered by LLMs, processes the code, generating prompts and applying rules to identify potential vulnerabilities. These prompts and rules are then passed to the rule-based engine, which further analyzes the code for specific vulnerabilities. Once the analysis is complete, Itcan generates a comprehensive report that details the detected vulnerabilities and provides tailored remediation recommendations. Developers and security teams can use this report to enhance code security and mitigate potential risks.

5. Evaluation Methodology

5.1 Research Questions
5.1.1.
How effectively do LLMs translate cybersecurity standards into actionable coding guidelines?
This question explores the ability of LLMs to convert cybersecurity standards into clear, valid rules that a custom engine can use to assess and enforce secure coding practices.

5.1.2. What are the precision and recall of LLM-generated rules in ensuring software security compliance?
This question evaluates the effectiveness of the LLM-generated rules in ensuring software security compliance. It focuses on precision (the proportion of reported compliance issues that are genuine violations) and recall (the proportion of actual violations that are detected), providing insight into the model's reliability and coverage.

5.1.3. How do developers evaluate the effectiveness and practical applicability of the proposed tool in enhancing software security?
This question explores the tool's real-world relevance by gathering feedback from developers on its usability, utility, and potential impact on software security practices.

5.2 Materials List
The hardware and software used in this project are listed below.

Hardware
The research uses a MacBook Pro with an M1 chip and the following specifications:
Operating System: macOS
CPU: 8 cores
Storage: 256 GB
RAM: 8 GB

Software
An open-source LLM.
Semgrep, to be used as the rule engine.
VS Code, as the IDE for the tool.
Python and JavaScript, as the programming languages.

5.3 Procedure
The procedure will be elaborated in GP2. In general, it will include the following steps:
5.3.1. Data Collection: Collect publicly available codebases from sources such as GitHub, CVE, and OWASP, ensuring that they include examples of both secure and insecure code. The data will be annotated with secure coding violations for training and evaluation.
5.3.2. Preprocessing: Clean and standardize the collected code, labeling secure coding violations and converting the code into a format compatible with the system's analysis engine.
5.3.3.
Prompt Engineering: Design specific prompts for the LLM to analyze source code, focusing on identifying deviations from secure coding standards and best practices and on recommending improvements.
5.3.4. System Development: Develop the LLM-based analysis system in combination with a rule-based engine to identify secure coding violations (e.g., improper input validation, insecure data handling). Integrate the system into a software development pipeline to help developers adhere to secure coding practices.
5.3.5. Performance Evaluation: Evaluate the system's performance using precision and recall, assessing its effectiveness in identifying secure coding violations compared with traditional static analysis tools.
5.3.6. Developer Feedback: Gather feedback from developers on the tool's usability and its effectiveness in improving software security practices according to the NCA's secure coding standards.

6. Results and Discussions
The implementation of the LLM-based system for ensuring secure coding practices will be completed in GP2, so the detailed results and their analysis will be presented and discussed in that phase.

7. Conclusion
The conclusion will be completed after the implementation and results are finalized next semester in GP2.

8. References
S. Rajapaksha, J. Senanayake, H. Kalutarage, and M. O. Al-Kadri, "Enhancing Security Assurance in Software Development: AI-Based Vulnerable Code Detection with Static Analysis," in Proc. of the International Conference on Artificial Intelligence for Security, Springer, 2024, pp. 283-298, doi: 10.1007/978-3-031-54129-2_20.

N. Tihanyi, R. Jain, M. A. Ferrag, Y. Charalambous, Y. Sun, and L. C. Cordeiro, "A New Era in Software Security: Towards Self-Healing Software via Large Language Models and Formal Verification," arXiv preprint arXiv:2305.14752, May 2023. [Online]. Available: https://arxiv.org/abs/2305.14752.

N. S. Mathews, Y. Brus, Y. Aafer, M. Nagappan, and S.
McIntosh, "LLbezpeky: Leveraging Large Language Models for Vulnerability Detection," arXiv preprint arXiv:2401.01269v2 [cs.CR], Feb. 13, 2024. [Online]. Available: https://arxiv.org/abs/2401.01269v2. R. Russell et al., "Automated Vulnerability Detection in Source Code Using Deep Representation Learning," in Proc. of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 2018, pp. 757-762, doi: 10.1109/ICMLA.2018.00120. G. Eberhardt and Á. Milánkovich, "VulnGPT: Enhancing Source Code Vulnerability Detection Using AutoGPT and Adaptive Supervision Strategies," in Proc. of the 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS- IoT), Abu Dhabi, United Arab Emirates, 2024, pp. 450-454, doi: 10.1109/DCOSS- IoT61029.2024.00072. National Cybersecurity Authority (NCA), "Home," National Cybersecurity Authority (NCA), [Online]. Available: https://nca.gov.sa/ar/. [Accessed: Nov. 1, 2024]. J. Yu, P. Liang, Y. Fu, A. Tahir, M. Shahin, C. Wang, and Y. Cai, "An Insight into Security Code Review with LLMs: Capabilities, Obstacles and Influential Factors," arXiv preprint arXiv:2401.16310v3 [cs.SE], Oct. 4, 2024. [Online]. Available: https://arxiv.org/abs/2401.16310v3. A. Bakhshandeh, A. Keramatfar, A. Norouzi, and M. M. Chekidehkhoun, "Using ChatGPT as a Static Application Security Testing Tool," arXiv preprint arXiv:2308.14434v1 [cs.CR], Aug. 28, 2023. [Online]. Available: https://arxiv.org/abs/2308.14434v1. M. D. Purba, A. Ghosh, B. J. Radford, and B. Chu, "Software Vulnerability Detection Using Large Language Models," in Proc. of the 34th IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Florence, Italy, 2023, pp. 112-119, doi: 10.1109/ISSREW60843.2023.00058. V. Akuthota, R. Kasula, S. T. Sumona, M. Mohiuddin, M. T. Reza, and M. M. Rahman, "Vulnerability Detection and Monitoring Using LLM," in Proc. 
of the 9th IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON- ECE), Thiruvananthapuram, India, 2023, pp. 309-314, doi: 10.1109/WIECON- ECE60392.2023.10456393. [Z. Liu, Q. Liao, W. Gu, and C. Gao, "Software Vulnerability Detection with GPT and In- Context Learning," in Proc. of the 8th International Conference on Data Science in Cyberspace (DSC), Hefei, China, 2023, pp. 229-236, doi: 10.1109/DSC59305.2023.00041. N. Shenoy and A. V. Mbaziira, "An Extended Review: LLM Prompt Engineering in Cyber Defense," in Proc. of theInternational Conference on Electrical, Computer and Energy Technologies (ICECET), Sydney, Australia, 2024, pp. 1-6, doi: 30 10.1109/ICECET61485.2024.10698605. C. Thapa, S. Jang, M. Ahmed, S. Camtepe, J. Pieprzyk, and S. Nepal, "Transformer-based Language Models for Software Vulnerability Detection: Performance, Model's Security and Platforms," arXiv preprint arXiv:2204.03214, 2022. [Online]. Available: https://doi.org/10.48550/arXiv.2204.03214. X. Zhou, T. Zhang, and D. Lo, "Large Language Model for Vulnerability Detection: Emerging Results and Future Directions," in Proc. of the 46th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), Lisbon, Portugal, 2024, pp. 47- 51, doi: 10.1145/3639476.3639762. E. Pelofske, V. Urias, and L. M. Liebrock, "Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models," arXiv preprint arXiv:2408.00197v1 [cs.CR], Jul. 31, 2024. [Online]. Available: https://arxiv.org/abs/2408.00197v1. C. Zhang, H. Liu, J. Zeng, K. Yang, Y. Li, and H. Li, "Prompt-Enhanced Software Vulnerability Detection Using ChatGPT," in Proc. of the 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), Lisbon, Portugal, 2024, pp. 276-277, doi: 10.1145/3639478.3643065. I. Kalouptsoglou, M. Siavvas, A. Ampatzoglou, D. Kehagias, and A. 
Chatzigeorgiou, "Vulnerability Classification on Source Code Using Text Mining and Deep Learning Techniques," in Proc. of the 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C), Cambridge, United Kingdom, 2024, pp. 47-56, doi: 10.1109/QRS-C63300.2024.00017. M. O. Shiaeles and S. Shiaeles, "VulDetect: A Novel Technique for Detecting Software Vulnerabilities Using Language Models," in Proc. of the IEEE International Conference on Cyber Security and Resilience (CSR), Venice, Italy, 2023, pp. 105-110, doi: 10.1109/CSR57506.2023.10224924. X. Yin, C. Ni, and S. Wang, "Multitask-Based Evaluation of Open-Source LLM on Software Vulnerability," IEEE Trans. on Software Engineering, vol. 50, no. 11, pp. 3071-3087, Nov. 2024, doi: 10.1109/TSE.2024.3470333. Z. Liu, Z. Yang, and Q. Liao, "Exploration on Prompting LLM With Code-Specific Information for Vulnerability Detection," in Proc. of the IEEE International Conference on Software Services Engineering (SSE), Shenzhen, China, 2024, pp. 273-281, doi: 10.1109/SSE62657.2024.00049. D. Noever, "Can Large Language Models Find and Fix Vulnerable Software?," arXiv preprint arXiv:2308.10345, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2308.10345. M. T. Sultan and H. El Sayed, "A Deep Learning-Based Approach for Automated Vulnerability Detection in Source Code," in Proc. of the IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT), Dubai, United Arab Emirates, 2023, pp. 17-22, doi: 10.1109/GCAIoT61060.2023.10385129. Semgrep, "Semgrep: The Code Search and Static Analysis Tool," [Online]. Available: https://semgrep.dev. [Accessed: Nov. 20, 2024]. 31
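Research question 5.1.2 and step 5.3.5 rely on precision and recall. The following Python function is a minimal sketch of how these metrics could be computed. It compares the violations reported by a set of rules against an annotated ground truth.

The sketch makes several illustrative assumptions. The `Finding` key, the `precision_recall` function, and the file paths are hypothetical. The choice to match on (file path, line number, violation category) is also an assumption. The actual annotation format and matching criteria are still to be defined in GP2.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """Hypothetical key for one secure coding violation (assumed, not the thesis's format)."""
    path: str
    line: int
    category: str  # e.g. "input-validation", "insecure-data-handling"


def precision_recall(reported: set[Finding],
                     ground_truth: set[Finding]) -> tuple[float, float]:
    """Score reported findings against annotated violations.

    precision = TP / (TP + FP): share of reported findings that are real violations.
    recall    = TP / (TP + FN): share of real violations that were reported.
    Both are defined as 0.0 when their denominator is zero.
    """
    tp = len(reported & ground_truth)   # reported and annotated
    fp = len(reported - ground_truth)   # reported but not annotated
    fn = len(ground_truth - reported)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy annotated ground truth: four known violations.
    annotated = {
        Finding("app/login.py", 12, "input-validation"),
        Finding("app/login.py", 40, "insecure-data-handling"),
        Finding("app/db.py", 7, "input-validation"),
        Finding("app/db.py", 31, "insecure-data-handling"),
    }
    # Toy tool output: three correct hits and two false alarms.
    detected = {
        Finding("app/login.py", 12, "input-validation"),
        Finding("app/db.py", 7, "input-validation"),
        Finding("app/db.py", 31, "insecure-data-handling"),
        Finding("app/api.py", 5, "input-validation"),
        Finding("app/api.py", 9, "insecure-data-handling"),
    }
    p, r = precision_recall(detected, annotated)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

The same function could also score the findings of a traditional static analysis tool on the same annotated code. Both systems would then be measured against one ground truth, which is the comparison that step 5.3.5 describes.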