These are the detectors PromptArmor runs by default to determine whether you are being attacked. They run for /analyze/input, /analyze/action, and /analyze/output.

We use a combination of heuristics, models, and signature-based detection to determine whether you are at risk. These detectors are powered and augmented by our threat intelligence, which keeps them up to date.

The things we actively check for are:

  • Adversarial Instructions: Instructions intended to mislead the LLM in order to manipulate systems
  • Anomalies: Behavior outside of expected patterns that indicates a security risk
  • Data Exfiltration: User data, content in the context window, or content outside the context window being exfiltrated to an attacker
  • Phishing: Attacks delivered through the trusted internal LLM interface, or attempts to convince a user to take an action that would constitute a phishing attack
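
To make the idea of signature-based detection more concrete, here is a toy sketch in Python that tags text with two of the categories listed above. It is not PromptArmor's implementation: the `SIGNATURES` table, the `Finding` type, the `scan` function, and both regular expressions are hypothetical and chosen only for illustration. Real detection also relies on heuristics, models, and threat intelligence, none of which are shown here.

```python
import re
from dataclasses import dataclass

# Hypothetical signatures for illustration only; these are NOT PromptArmor's rules.
SIGNATURES = {
    "adversarial_instructions": [
        # A common phrasing used to override earlier instructions.
        re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    ],
    "data_exfiltration": [
        # A markdown image whose URL carries a query string, which can be
        # used to smuggle data out when the image is rendered.
        re.compile(r"!\[[^\]]*\]\(https?://[^)\s]+\?[^)\s]*\)"),
    ],
}


@dataclass
class Finding:
    category: str       # which check matched
    matched_text: str   # the exact text that triggered the match


def scan(text: str) -> list[Finding]:
    """Return one Finding per signature that matches anywhere in the text."""
    findings = []
    for category, patterns in SIGNATURES.items():
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                findings.append(Finding(category, match.group(0)))
    return findings
```

Calling `scan` on a string returns a list of findings, which is empty when no signature matches. A fixed pattern list like this misses paraphrased attacks, so it should be treated as one layer among several, not as a complete check.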