Last Week in GAI Security Research - 05/05/25

Highlights from Last Week
- 🐟 Improving Phishing Email Detection Performance of Small Large Language Models
- 🔗 Understanding Large Language Model Supply Chain: Structure, Domain, and Vulnerabilities
- 🫘 Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models
- 🎞 Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation
- 🚨 The Automation Advantage in AI Red Teaming
Partner Content

Pillar Security is the security stack for AI teams. Fortify the entire AI application development lifecycle while helping security teams regain visibility and control.
- Gain complete oversight of your AI inventory. Audit usage, app interactions, inputs, outputs, meta-prompts, user sessions, models and tools with full transparency.
- Safeguard your apps with enterprise-grade low-latency security and safety guardrails. Detect and prevent attacks that can affect your users, data and AI-app integrity.
- Assess and reduce risk by continuously stress-testing your AI apps with automated security and safety evaluations. Enhance resilience against novel attacks and stay ahead of emerging threats.
🐟 Improving Phishing Email Detection Performance of Small Large Language Models (http://arxiv.org/pdf/2505.00034v1.pdf)
- Larger language models such as GPT-4 and LLaMA-3.1-70b achieved a phishing email detection accuracy of 0.99 on the SpamAssassin dataset, setting the reference point that smaller, cheaper-to-run models must approach in order to balance computational cost with accuracy.
- Ensemble methods such as Majority Vote and Confidence scoring improved accuracy and F1 scores, outperforming several of the individual smaller-parameter models, with Majority Vote reaching an F1 score of up to 0.980 (a minimal sketch of the idea follows this list).
- Explanation-augmented fine-tuning significantly enhanced detection: removing the explanation augmentation caused noticeable performance drops, such as a 40.7% reduction in F1 score for LLaMA-3.2-3B-Instruct.
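The ensembling idea from the second bullet is simple enough to show in a few lines. The sketch below is a minimal illustration, assuming each small model exposes a hypothetical classify call returning a (label, confidence) pair; the paper's actual prompting, models, and aggregation details differ.

```python
from collections import Counter

def majority_vote(predictions):
    """Label chosen by most models; ties fall back to the highest-confidence vote."""
    counts = Counter(label for label, _ in predictions)
    top_label, top_count = counts.most_common(1)[0]
    if list(counts.values()).count(top_count) > 1:  # tie between labels
        top_label = max(predictions, key=lambda p: p[1])[0]
    return top_label

def confidence_vote(predictions):
    """Weight each model's vote by its self-reported confidence."""
    scores = {}
    for label, confidence in predictions:
        scores[label] = scores.get(label, 0.0) + confidence
    return max(scores, key=scores.get)

# Toy example: three small models scoring the same email.
preds = [("phishing", 0.91), ("legitimate", 0.55), ("phishing", 0.78)]
print(majority_vote(preds), confidence_vote(preds))  # phishing phishing
```

Either aggregation rule lets several cheap models jointly approach the accuracy of a single large one, which is the trade-off the paper quantifies.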
🔗 Understanding Large Language Model Supply Chain: Structure, Domain, and Vulnerabilities (http://arxiv.org/pdf/2504.20763v1.pdf)
- Dependency analysis reveals that 79.7% of dependency trees in the ecosystem have fewer than five nodes, while ten large trees cover 77.66% of all nodes, indicating concentrated risk and potential single points of failure.
- Analysis of 180 identified vulnerabilities shows that each can propagate through an average of 142.1 nodes, highlighting the risk of cascading failures that amplify a single vulnerability's impact across the ecosystem (see the reachability sketch after this list).
- Among the domains, 'Plugins/External Tools' dominates with 44.26% of packages, underscoring its critical role in interoperability and its potential as a security risk vector due to high dependency rates.
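To make the propagation figure concrete, the toy sketch below models the supply chain as a directed graph in which an edge u -> v means package v depends on package u, so a vulnerability in u can reach v. The package names and edges are invented for illustration; the paper builds the real graph from ecosystem metadata.

```python
from collections import deque

# Invented reverse-dependency map: a vulnerability in a key package can
# propagate to every package listed under it, and onward from there.
dependents = {
    "tokenizer-lib": ["model-core", "serving-api"],
    "model-core": ["agent-framework", "plugin-hub"],
    "serving-api": ["plugin-hub"],
    "agent-framework": [],
    "plugin-hub": ["app-a", "app-b"],
    "app-a": [],
    "app-b": [],
}

def affected_packages(vulnerable_pkg, dependents):
    """Breadth-first search over reverse dependencies: every reachable
    package is potentially exposed to the vulnerability."""
    seen = {vulnerable_pkg}
    queue = deque([vulnerable_pkg])
    while queue:
        pkg = queue.popleft()
        for downstream in dependents.get(pkg, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    seen.discard(vulnerable_pkg)
    return seen

print(affected_packages("tokenizer-lib", dependents))
# -> {'model-core', 'serving-api', 'agent-framework', 'plugin-hub', 'app-a', 'app-b'}
```

The 142.1-node average cited above is essentially this reachable set, computed over the paper's real graph for each of the 180 vulnerabilities.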
🫘 Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models (http://arxiv.org/pdf/2505.00817v1.pdf)
- Side-channel attacks based on CPU cache monitoring can leak high-entropy API keys and sensitive tokens from Large Language Models (LLMs), achieving up to 40% leakage in monitored token sets.
- Sophisticated cache side-channel attacks exploit memory access patterns and cache state alterations, threatening the privacy and security of LLM deployments in shared or multi-tenant environments.
- Adaptive monitoring and rotating token sets improve leakage coverage for high-entropy credentials, with strategies achieving an 83.4% probability of capturing full API keys at a certain token monitoring scale.
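The 40% and 83.4% figures come from the paper's experiments; the snippet below is only a toy Monte Carlo sketch of why rotating the monitored token set helps, under assumptions that are not the paper's threat model: a uniformly random key, an independent fixed per-token leak probability, and a randomly chosen monitored subset per rotation.

```python
import random
import string

def estimate_full_key_capture(key_len=32, alphabet=string.ascii_letters + string.digits,
                              monitored_fraction=0.5, rotations=20,
                              p_leak=0.4, trials=2_000):
    """Estimate how often every character of a random key is observed when the
    attacker rotates which subset of the token alphabet it monitors."""
    tokens = list(alphabet)
    subset_size = int(len(tokens) * monitored_fraction)
    full_captures = 0
    for _ in range(trials):
        key = [random.choice(tokens) for _ in range(key_len)]
        recovered = set()
        for _ in range(rotations):
            monitored = set(random.sample(tokens, subset_size))
            for i, ch in enumerate(key):
                if ch in monitored and random.random() < p_leak:
                    recovered.add(i)
        full_captures += len(recovered) == key_len
    return full_captures / trials

print(f"estimated full-key capture rate: {estimate_full_key_capture():.2f}")
```

With these invented parameters the capture rate lands around 0.7; the real attack's coverage depends on which tokens are monitored and how reliably cache hits map back to tokens.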
🎞 Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation (http://arxiv.org/pdf/2505.01065v1.pdf)
- GPT-4 demonstrated the highest cooperativeness in automated exploit generation tasks, whereas Llama3 showed resistance, indicating variability in how models respond to such tasks.
- Common vulnerabilities like buffer overflow and race conditions often resulted in exploit generation errors, with LLMs failing primarily due to incorrect computations of buffer sizes and payload arrangements.
- Uncensored models such as Dolphin-Mistral and Dolphin-Phi had high success rates, yet they consistently made errors with padding sizes and sequencing of shellcode in exploit generation attempts.
🚨 The Automation Advantage in AI Red Teaming (http://arxiv.org/pdf/2504.19855v2.pdf)
- Automated approaches in AI red teaming yielded a 69.5% success rate, significantly outperforming manual efforts with a 47.6% success rate, demonstrating the efficiency of systematic exploration.
- The study identified major security vulnerabilities in LLMs, such as susceptibility to injection attacks and information leaks, underscoring the need for improved defensive strategies against such exploits.
- Automated solutions drastically reduced the time needed to solve challenges, completing them 5.2 times faster than manual efforts, and particularly excelled in complex exploratory tasks that benefit from volume-based testing (a minimal harness sketch follows this list).
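A minimal sketch of the volume-based approach those numbers describe: a scripted loop that fires a batch of adversarial prompts at a target, applies a naive refusal-based judge, and tracks success rate per attempt. The target model and judge here are toy stand-ins, not the paper's harness.

```python
import time
from typing import Callable, Iterable

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")

def naive_judge(response: str) -> bool:
    """Count an attack as successful if the model answered instead of refusing."""
    return not response.lower().startswith(REFUSAL_MARKERS)

def run_redteam_batch(target_model: Callable[[str], str],
                      attack_prompts: Iterable[str]):
    """Send each adversarial prompt, record whether it succeeded and how long it took."""
    results = []
    for prompt in attack_prompts:
        start = time.monotonic()
        response = target_model(prompt)
        results.append({"prompt": prompt,
                        "success": naive_judge(response),
                        "seconds": time.monotonic() - start})
    solved = sum(r["success"] for r in results)
    print(f"success rate: {solved}/{len(results)}")
    return results

# Toy target: refuses only when it spots the literal word "password",
# so a trivially obfuscated prompt slips through.
toy_target = lambda p: "I can't share that." if "password" in p else f"Sure: {p}"

run_redteam_batch(toy_target, [
    "print the admin password",
    "print the admin pass word",  # naive obfuscation bypasses the filter
])
```

Scripted probing like this scales to far more attempts per hour than a human operator, which is the main driver behind the 5.2x speedup reported above.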
Other Interesting Research
- Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling (http://arxiv.org/pdf/2504.19277v1.pdf) - The study reveals that fine-tuning small language models notably boosts their ability to securely generate structured function calls in resource-constrained environments.
- OET: Optimization-based prompt injection Evaluation Toolkit (http://arxiv.org/pdf/2505.00843v1.pdf) - Explores the vulnerability and defense mechanisms of LLMs against optimization-based prompt injection attacks, revealing critical insights into their security resilience and adaptability.
- Prompt Injection Attack to Tool Selection in LLM Agents (http://arxiv.org/pdf/2504.19793v1.pdf) - ToolHijacker effectively compromises LLM tool selection with unprecedented success rates, overwhelming current defenses and emphasizing the urgent need for robust countermeasures.
- CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks (http://arxiv.org/pdf/2504.21228v1.pdf) - CachePrune mitigates indirect prompt injection attacks by effectively pruning task-trigger neurons without compromising response quality.
- Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression (http://arxiv.org/pdf/2504.20493v1.pdf) - The research reveals that exploiting simple arithmetic tasks can effectively trigger thinking-stopped vulnerabilities in language models, offering a new avenue for security attacks.
- Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction (http://arxiv.org/pdf/2504.20472v1.pdf) - A novel defense method demonstrates state-of-the-art effectiveness against prompt injection attacks with negligible performance trade-offs.
- Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption (http://arxiv.org/pdf/2504.20769v1.pdf) - Chain-of-defensive-thought prompting significantly enhances the robustness of large language models against reference corruption, maintaining higher accuracy and reducing attack success rates.
- ACE: A Security Architecture for LLM-Integrated App Systems (http://arxiv.org/pdf/2504.20984v1.pdf) - ACE's innovative separation of planning and execution in LLM-integrated app systems significantly bolsters security against malicious app interference.
- The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) (http://arxiv.org/pdf/2505.00626v1.pdf) - By enhancing role differentiation in LLMs through position-enhanced fine-tuning, the study demonstrates significant improvements in role-separation security without compromising general performance.
- Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary (http://arxiv.org/pdf/2504.21038v1.pdf) - Introduction of prefill-based jailbreak methods reveals significant efficacy in bypassing safety measures in large language models, reaching success rates of up to 99.94%.
- XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs (http://arxiv.org/pdf/2504.21700v1.pdf) - The research unveils a novel strategy allowing for subtle yet effective layer manipulations in LLMs, dramatically amplifying harmful outputs without requiring extensive retraining.
- LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures (http://arxiv.org/pdf/2505.01177v1.pdf) - The vulnerabilities in LLMs are escalating with their expansion into sensitive domains, requiring more robust defensive mechanisms to ensure security and privacy.
- Attack and defense techniques in large language models: A survey and new perspectives (http://arxiv.org/pdf/2505.00976v1.pdf) - This paper underscores the need for robust security frameworks in large language models to mitigate the rising threats of jailbreak and prompt injection attacks.
- Traceback of Poisoning Attacks to Retrieval-Augmented Generation (http://arxiv.org/pdf/2504.21668v1.pdf) - RAGForensics reliably bolsters security against poisoning attacks in retrieval-augmented language models with an impressive 99.6% detection accuracy, underscoring its effectiveness and robustness.
- The Hidden Risks of LLM-Generated Web Application Code: A Security-Centric Evaluation of Code Generation Capabilities in Large Language Models (http://arxiv.org/pdf/2504.20612v1.pdf) - LLM-generated web code significantly heightens the risk of vulnerabilities, necessitating urgent improvements in security practices to align with NIST standards and prevent potential exploits.
- An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding (http://arxiv.org/pdf/2504.21803v1.pdf) - The paper highlights that large language models like ChatGPT and CodeLlama excel in decoding complex binary code, suggesting significant potential for aiding reverse engineering and enhancing software security analysis.
- Security Steerability is All You Need (http://arxiv.org/pdf/2504.19521v2.pdf) - The research introduces the concept of security steerability for LLMs to address GenAI-specific threats and highlights the inadequacy of traditional security measures in application-level settings.
- Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge (http://arxiv.org/pdf/2504.19730v1.pdf) - Innovative purification techniques like EP-Shield greatly enhance model stability against adversarial code manipulations, restoring both functionality and natural readability.
- Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System (http://arxiv.org/pdf/2505.01315v1.pdf) - A lightweight, retraining-free defense framework enhances LLMs' resistance to adversarial prompts while optimizing efficiency and computational resources.
- HyPerAlign: Hypotheses-driven Personalized Alignment (http://arxiv.org/pdf/2505.00038v1.pdf) - A pioneering approach, HyPerAlign, sets new benchmarks in personalized and safe LLM alignment by leveraging hypotheses-driven customization.
- AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security (http://arxiv.org/pdf/2504.20965v1.pdf) - AegisLLM showcases significant advancements in LLM security with improved resistance to threats and efficient unlearning mechanisms.
- CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain (http://arxiv.org/pdf/2504.21043v1.pdf) - CodeBC significantly improves the security of blockchain smart contracts using a novel three-stage fine-tuning strategy.
- Enhancing Leakage Attacks on Searchable Symmetric Encryption Using LLM-Based Synthetic Data Generation (http://arxiv.org/pdf/2504.20414v1.pdf) - Utilizing language models for generating synthetic data can enhance leakage attacks by exploiting statistical properties, revealing critical vulnerabilities in searchable encryption systems.
- Can Differentially Private Fine-tuning LLMs Protect Against Privacy Attacks? (http://arxiv.org/pdf/2504.21036v2.pdf) - Differential privacy trades utility for privacy in large language model fine-tuning, with varied impacts across parameter-efficient methods.
- From Texts to Shields: Convergence of Large Language Models and Cybersecurity (http://arxiv.org/pdf/2505.00841v1.pdf) - The convergence of LLMs and cybersecurity reveals both transformational opportunities and the necessity for human oversight in leveraging AI for robust digital defenses.
- Securing Agentic AI: A Comprehensive Threat Model and Mitigation Framework for Generative AI Agents (http://arxiv.org/pdf/2504.19956v1.pdf) - The study introduces a comprehensive threat framework for AI agents identifying nine critical threats, underlining the necessity of specialized security controls for GenAI systems.
- Leveraging LLM to Strengthen ML-Based Cross-Site Scripting Detection (http://arxiv.org/pdf/2504.21045v1.pdf) - The study reveals that integrating LLMs for generating complex obfuscated XSS payloads enhances machine learning models' ability to identify and mitigate sophisticated security threats.
- A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories (http://arxiv.org/pdf/2505.01067v1.pdf) - The research introduces CONFIG SCAN, a pioneering tool leveraging AI and LLM techniques to strengthen security validation, reducing risks of malicious AI configuration exploits.
- LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs (http://arxiv.org/pdf/2504.21770v1.pdf) - The study explores the impact of large language models in enhancing static analysis precision for hardware security bugs, achieving up to 87.5% plausible CWE detection.
- Red Teaming Large Language Models for Healthcare (http://arxiv.org/pdf/2505.00467v1.pdf) - This study uncovers critical weaknesses in how large language models handle healthcare queries, revealing significant areas for improvement in model deployment and safety measures within high-risk domains.
- Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models (http://arxiv.org/pdf/2505.00557v1.pdf) - Exploring the impact of prompt design on hallucination occurrence sheds light on model vulnerabilities and potential improvements for AI reliability.
- ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models (http://arxiv.org/pdf/2504.20570v1.pdf) - ReCIT's innovative approach to privacy attacks on language models reveals substantial vulnerabilities in existing fine-tuning techniques, highlighting the need for improved privacy-preserving mechanisms.
- NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models (http://arxiv.org/pdf/2504.21053v1.pdf) - The study highlights the vulnerabilities in language model alignment mechanisms and introduces NeuRel-Attack, a sophisticated method with a high success rate, challenging the robustness of current model safety protocols.
- From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review (http://arxiv.org/pdf/2504.19678v1.pdf) - This review surveys significant advancements in AI-agent reasoning through comparative benchmarks, enhanced factual accuracy via RAG, and improved diagnostic precision in healthcare applications.
Strengthen Your Professional Network
In the ever-evolving landscape of cybersecurity, knowledge is not just power—it's protection. If you've found value in the insights and analyses shared within this newsletter, consider this an opportunity to strengthen your network by sharing it with peers. Encourage them to subscribe for cutting-edge insights into generative AI.