Last Week in GAI Security Research - 01/13/25

Highlights from Last Week

  • 📝 A survey of textual cyber abuse detection using cutting-edge language models and large language models
  • 🪲 Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware
  • 📦 FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
  • 🌊 RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models
  • 💬 SpaLLM-Guard: Pairing SMS Spam Detection Using Open-source and Commercial LLMs

Partner Content

Pillar Security is the security stack for AI teams. Fortify the entire AI application development lifecycle while helping Security teams regain visibility and control.

  • Gain complete oversight of your AI inventory. Audit usage, app interactions, inputs, outputs, meta-prompts, user sessions, models and tools with full transparency.
  • Safeguard your apps with enterprise-grade low-latency security and safety guardrails. Detect and prevent attacks that can affect your users, data and AI-app integrity.
  • Assess and reduce risk by continuously stress-testing your AI apps with automated security and safety evaluations. Enhance resilience against novel attacks and stay ahead of emerging threats.

πŸ“ A survey of textual cyber abuse detection using cutting-edge language models and large language models (http://arxiv.org/pdf/2501.05443v1.pdf)

  • Pre-trained language models such as BERT and RoBERTa dominate the landscape in detecting hate speech and cyberbullying, with BERT used in 77% of the surveyed papers for these tasks.
  • Class imbalance remains a significant challenge in cyber abuse detection; 31.4% of studies address it with techniques such as oversampling, undersampling, and weighted metrics (a weighted-loss sketch follows this list).
  • Cyberbullying detection has shown promising advancements in accuracy through models that combine BERT with Bi-LSTM networks, achieving a macro-averaged F1 score of up to 0.9231.
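
One of the simplest of those rebalancing techniques is a class-weighted loss. Below is a minimal, hypothetical PyTorch sketch; the two-class setup and label counts are illustrative assumptions, not figures from any surveyed paper.

```python
import torch
import torch.nn as nn

# Hypothetical label distribution: abusive posts are the rare class.
class_counts = torch.tensor([9000.0, 1000.0])  # [non-abusive, abusive]

# Inverse-frequency weights make minority-class mistakes cost more.
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

# Toy batch: logits for 4 posts and their (mostly minority-class) labels.
logits = torch.randn(4, 2)
labels = torch.tensor([1, 0, 1, 1])
print(f"class weights: {weights.tolist()}, loss: {criterion(logits, labels).item():.4f}")
```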

🪲 Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware (http://arxiv.org/pdf/2501.04848v1.pdf)

  • MalParse, a framework that leverages large language models for Android malware analysis, achieved 77% categorization accuracy versus 49.5% for simpler approaches, without requiring pre-training on malware datasets.
  • Through hierarchical, contextual summarization of Android application components, MalParse surfaces actionable insights into malicious behaviors and traces root causes within the software structure more efficiently than competing scoping methods (see the sketch after this list).
  • The study underscores the potential of advanced prompt engineering and hierarchical-tiered summarization techniques in enhancing the accuracy and comprehensibility of large language models in cybersecurity, notably in identifying and classifying malware threats.
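
To make the tiered-summarization idea concrete, here is a simplified sketch. The hierarchical_summary helper, its prompts, and the stub LLM call are hypothetical stand-ins; MalParse's actual pipeline and prompt wording may differ.

```python
from typing import Callable

def hierarchical_summary(
    components: dict[str, str],
    llm_summarize: Callable[[str], str],
) -> str:
    """Summarize each Android component, then fuse into an app-level summary."""
    # Tier 1: per-component summaries keep each LLM context window small.
    component_summaries = {
        name: llm_summarize(f"Summarize the behavior of this Android component:\n{source}")
        for name, source in components.items()
    }
    # Tier 2: an app-level summary over the per-component summaries,
    # asking the model to trace potentially malicious behavior to its source.
    combined = "\n".join(f"{name}: {s}" for name, s in component_summaries.items())
    return llm_summarize(
        "Given these component summaries, describe any potentially malicious "
        f"behavior and its likely root cause:\n{combined}"
    )

# Usage with a stub in place of a real LLM call:
stub = lambda prompt: f"[summary of {len(prompt)} chars]"
print(hierarchical_summary({"MainActivity": "...", "SmsReceiver": "..."}, stub))
```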

📦 FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models (http://arxiv.org/pdf/2501.02968v1.pdf)

  • The FlipedRAG attack flipped the opinion polarity of generated responses in 50% of cases and induced a 20% shift in user cognition, exposing how vulnerable retrieval-augmented generation pipelines are to opinion manipulation.
  • Adversarial strategies against black-box retrieval systems show that model outputs can be steered toward biased or false information, undermining reliability (a toy corpus-poisoning illustration follows this list).
  • Defense mechanisms against opinion manipulation in RAG systems are currently inadequate, necessitating new defensive strategies to counteract the effects of adversarial retrieval attacks and safeguard against cognitive manipulation.
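
The core intuition can be shown with a toy retriever: an injected passage saturated with query terms outranks neutral context, so whatever opinion it carries dominates the generator's input. The passages and TF-IDF retriever below are illustrative assumptions; FlipedRAG's actual black-box attack optimizes adversarial text rather than naively stuffing keywords.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Policy X has mixed evidence; studies report both benefits and costs.",
    "Independent reviews of policy X describe moderate, uncertain effects.",
]
# Attacker-injected passage: stuffed with query terms plus a fixed opinion.
poisoned = corpus + [
    "Policy X effects: policy X is harmful, policy X fails, avoid policy X.",
]
query = ["What are the effects of policy X?"]

for name, docs in [("clean", corpus), ("poisoned", poisoned)]:
    vec = TfidfVectorizer().fit(docs + query)
    scores = cosine_similarity(vec.transform(query), vec.transform(docs))[0]
    top = max(range(len(docs)), key=lambda i: scores[i])
    print(f"{name}: top-ranked context -> {docs[top]}")
```

In the poisoned run, the opinionated passage wins the retrieval ranking, which is exactly the channel this class of attack exploits.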

🌊 RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models (http://arxiv.org/pdf/2501.05249v1.pdf)

  • RAG-WM detects IP infringement in RAG systems with 100% verification success and minimal impact on task performance across various large language models and datasets.
  • The approach embeds high-quality watermark texts within the retrieval-augmented generation knowledge base while remaining stealthy against adversarial paraphrasing and unrelated-content-removal attacks (a toy verification sketch follows this list).
  • Watermark detection methods using perplexity and duplicate text filtering provide robust protection for intellectual property without degrading the quality of generated content.
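
A toy version of the black-box verification step might look like the following. The watermark "facts", threshold, and rag_answer callable are hypothetical placeholders for illustration, not RAG-WM's actual entity-relation watermark construction.

```python
from typing import Callable

# Watermark "facts": fictitious entity-relation pairs unlikely to occur naturally.
WATERMARKS = {
    "Who founded the Zephyr Meridian Institute?": "Dr. Alaric Venn",
    "What element powers the Orrin-7 reactor?": "quellium",
}

def verify_watermark(rag_answer: Callable[[str], str], threshold: float = 0.5) -> bool:
    """Flag IP infringement if enough watermark facts leak from a suspect system."""
    hits = sum(
        1 for q, fact in WATERMARKS.items() if fact.lower() in rag_answer(q).lower()
    )
    return hits / len(WATERMARKS) >= threshold

# Usage with a stub standing in for a suspect deployment's query endpoint:
suspect = lambda q: "Records indicate Dr. Alaric Venn founded it in 2031."
print(verify_watermark(suspect))  # True once >= 50% of watermark facts surface
```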

💬 SpaLLM-Guard: Pairing SMS Spam Detection Using Open-source and Commercial LLMs (http://arxiv.org/pdf/2501.04985v1.pdf)

  • Fine-tuning LLMs, especially the Mixtral model, achieves near-perfect spam detection accuracy of 98.61%, with a false positive rate below 2%.
  • Few-shot prompting reduces output variability and improves LLM performance on SMS spam detection, reaching 97.18% accuracy with GPT-4 (a minimal prompt-construction sketch follows this list).
  • After fine-tuning, LLMs exhibit strong resilience to concept drift and adversarial attacks, maintaining robust performance on new spam datasets.
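
As a concrete picture of the few-shot setup, here is a minimal prompt builder. The example messages, labels, and output format are invented for illustration; the paper's actual prompts and label parsing may differ.

```python
FEW_SHOT_EXAMPLES = [
    ("WINNER!! Claim your free prize now, reply YES to 80082.", "spam"),
    ("Running 10 min late, start without me.", "ham"),
    ("URGENT: your parcel is held, pay the fee at hxxp://short.url/xyz", "spam"),
]

def build_prompt(message: str) -> str:
    """Assemble a few-shot prompt asking an LLM for a spam/ham label."""
    shots = "\n".join(f"SMS: {text}\nLabel: {label}" for text, label in FEW_SHOT_EXAMPLES)
    return f"Classify each SMS as 'spam' or 'ham'.\n\n{shots}\n\nSMS: {message}\nLabel:"

# Usage: send the prompt to any chat/completions endpoint and read the
# single-token label from the response.
print(build_prompt("You have been selected for a $1000 gift card!"))
```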

Other Interesting Research

  • Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense (http://arxiv.org/pdf/2501.02629v1.pdf) - Layer-AdvPatcher enhances LLM safety by efficiently editing toxic layers and substantially reducing vulnerabilities to jailbreak attacks while preserving essential model performance.
  • Bringing Order Amidst Chaos: On the Role of Artificial Intelligence in Secure Software Engineering (http://arxiv.org/pdf/2501.05165v1.pdf) - The study provides actionable insights into improving defect prediction, leveraging machine learning for vulnerability detection, and enhancing software development with quantum computing capabilities.
  • LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models (http://arxiv.org/pdf/2501.03446v1.pdf) - Leveraging an advanced LLM pipeline, software vulnerability repair can reach unprecedented levels of effectiveness, quality improvement, and operational efficiency.
  • CommitShield: Tracking Vulnerability Introduction and Fix in Version Control Systems (http://arxiv.org/pdf/2501.03626v1.pdf) - CommitShield's integration of natural language processing and code analysis for vulnerability fix detection significantly outperforms traditional SZZ algorithms in precision, recall, and F1-score, showcasing its efficacy in managing open-source software security issues.
  • Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models (http://arxiv.org/pdf/2501.04312v1.pdf) - The paper presents DFUZZ, a highly efficient LLM-based framework that excels in bug detection and API coverage by extracting transferable edge cases in deep learning libraries like PyTorch and TensorFlow.
  • CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection (http://arxiv.org/pdf/2501.04510v1.pdf) - CGP-Tuning demonstrates enhanced efficiency and performance in code vulnerability detection by integrating type-aware embeddings and optimizing graph-text interactions, outperforming traditional methods.
  • Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets (http://arxiv.org/pdf/2501.02628v1.pdf) - An evaluation of Stack v2 datasets reveals substantial security vulnerabilities and licensing issues, providing key insights into risks associated with large-scale LLM dataset curation and the necessity for enhanced automated code quality interventions.
  • PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models (http://arxiv.org/pdf/2501.03544v1.pdf) - PromptGuard stands out for its efficiency and effectiveness in moderating NSFW content in text-to-image models without significant image quality loss.
  • Navigating the Designs of Privacy-Preserving Fine-tuning for Large Language Models (http://arxiv.org/pdf/2501.04323v2.pdf) - GuardedTuning offers a robust and privacy-focused solution for fine-tuning large language models by balancing utility, privacy, and communication costs.
  • Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection (http://arxiv.org/pdf/2501.03940v1.pdf) - PAWN emerges as a robust, resource-efficient method for cross-domain AI-generated text detection, performing well even under adversarial conditions.
  • Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization (http://arxiv.org/pdf/2501.05079v1.pdf) - The integration of language models with GNSS interference monitoring systems significantly enhances classification accuracy and supports real-time interference detection, promoting more resilient GNSS applications.
  • HP-BERT: A framework for longitudinal study of Hinduphobia on social media via LLMs (http://arxiv.org/pdf/2501.05482v1.pdf) - HP-BERT shows strong potential for monitoring harmful sentiments like Hinduphobia on social media, especially during crises like the COVID-19 pandemic.

Strengthen Your Professional Network

In the ever-evolving landscape of cybersecurity, knowledge is not just power; it's protection. If you've found value in the insights and analyses shared within this newsletter, consider this an opportunity to strengthen your network by sharing it with peers. Encourage them to subscribe for cutting-edge insights into generative AI.

🎯
This post was generated using generative AI (OpenAI GPT-4o). Specific approaches were taken to reduce fabrications. As with any AI-generated content, mistakes might be present. Sources for all content have been included for reference.