Adversarial instructions are crafted to manipulate an LLM application into changing its behavior in ways that pose a security risk. Let us start with an example:

Example

(source: greshake.github.io)

In this example, the LLM application is manipulated into persuading the user to reveal their name, and then into getting them to click a link that sends that name to the attacker.

Threat

The threat here is clear. An adversarial instruction can give an agent a malicious agenda, such as getting the user to reveal private data. Worse, if the agent is connected to tools that take actions, it can execute those actions on the attacker's behalf (e.g. sending emails), with far more serious consequences.
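To make the failure mode concrete, here is a minimal sketch of how an injected instruction in untrusted content can reach a tool-using agent. The function names, the SEND_EMAIL action format, and the prompt layout are hypothetical placeholders, not any particular framework's API.

```python
def llm(prompt: str) -> str:
    """Placeholder for the model call; returns the model's next action."""
    raise NotImplementedError

def send_email(to: str, body: str) -> None:
    """Placeholder for a real side-effecting tool the agent can call."""
    raise NotImplementedError

def answer_about_page(page_text: str, user_question: str) -> str:
    # Untrusted page content is placed directly in the prompt, so instructions
    # hidden inside it are read with the same authority as the developer's own.
    prompt = (
        "You are a helpful assistant with access to a send_email tool.\n"
        "Reply either with an answer, or with 'SEND_EMAIL|<to>|<body>'.\n\n"
        f"Web page content:\n{page_text}\n\n"
        f"User question: {user_question}"
    )
    action = llm(prompt)

    # If the injected instruction convinced the model to request an email,
    # the agent executes it: the attack becomes a real side effect.
    if action.startswith("SEND_EMAIL|"):
        _, to, body = action.split("|", 2)
        send_email(to, body)
        return "Email sent."
    return action
```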

Detectors

PromptArmor's detectors for adversarial instructions are tailored to the use case of the LLM application, because the most effective attacks are those most specific to that use case. For example, against a sales bot, a strong attack would reference ways to send personalized emails.
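As a rough sketch of where such a detector sits in an application, the snippet below screens untrusted content against the application's own use case before it reaches the model. detect_injection() and the use-case string are illustrative assumptions, not PromptArmor's actual API.

```python
# Hypothetical use-case description for a sales bot.
USE_CASE = "sales assistant that drafts and sends personalized emails"

def detect_injection(text: str, use_case: str) -> bool:
    """Hypothetical detector: flags text containing instructions that try to
    redirect an application with this particular use case."""
    raise NotImplementedError

def filter_untrusted_context(chunks: list[str]) -> list[str]:
    # Screen each untrusted chunk before it is ever placed in the prompt.
    return [c for c in chunks if not detect_injection(c, use_case=USE_CASE)]
```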