prompt-injection

Trail of Bits used ML-centered threat modeling and adversarial testing to identify four prompt injection techniques that could exploit Perplexity’s Comet browser AI assistant to exfiltrate private Gmail data. The audit demonstrated how fake security mechanisms, system instructions, and user requests could manipulate the AI agent into accessing and transmitting sensitive user information.

We bypassed human approval protections for system command execution in AI agents, achieving RCE in three agent platforms.

In this blog post, we’ll detail how attackers can exploit image scaling on Gemini CLI, Vertex AI Studio, Gemini’s web and API interfaces, Google Assistant, Genspark, and other production AI systems. We’ll also explain how to mitigate and defend against these attacks, and we’ll introduce Anamorpher, our open-source tool that lets you explore and generate these crafted images.

This post describes attacks using ANSI terminal code escape sequences to hide malicious instructions to the LLM, leveraging the line jumping vulnerability we discovered in MCP.

Malicious MCP servers can inject trigger phrases into tool descriptions to exfiltrate entire conversation histories and steal sensitive credentials and IP.

MCP’s ’line jumping’ vulnerability lets malicious servers inject prompts through tool descriptions to manipulate AI behavior before tools are ever invoked.

prompt-injection

Using threat modeling and prompt injection to audit Comet

Prompt injection to RCE in AI agents

Weaponizing image scaling against production AI systems

Deceiving users with ANSI terminal codes in MCP

How MCP servers can steal your conversation history

Jumping the line: How MCP servers can attack you before you ever use them