Prompt Injection Scanner

The Prompt Injection Scanner provides advanced protection against malicious input manipulations targeting Large Language Models (LLMs). It uses state-of-the-art detection models to identify and prevent potential injection attacks.

Prompt Injection Scanner Architecture

Prompt Injection detection workflow using DeBERTa-v3

Warning: This scanner is specifically designed for user inputs and is not recommended for system prompts.

Attack Scenarios

Common Attack Types

  • Direct Injection: Attempts to overwrite system prompts
  • Indirect Injection: Manipulates external source inputs

Vulnerable Scenarios

  • RAG Systems: Vector databases containing potentially compromised documents
  • Web-Browsing Chatbots: Exposure to unfiltered internet content
  • Automated Customer Service: Processing potentially malicious email content

Configuration Options

  • threshold: Detection confidence threshold (default: 0.5)
  • match_type: Analysis mode
    • FULL: Complete text analysis
    • SENTENCE: Sentence-by-sentence scanning
  • model: ProtectAI/deberta-v3-base-prompt-injection-v2

Output Format

  • sanitized_prompt: Analyzed text
  • is_valid: Boolean indicating if injection was detected
  • risk_score: Injection probability score (0-1)

Note: The scanner uses the DeBERTa-v3 model fine-tuned on prompt injection datasets. Classification results: 0 for safe content, 1 for detected injection.

Tip: For longer prompts, experiment with different match types to optimize detection accuracy. Consider implementing additional security layers for critical applications.