Prompt Injection Scanner
The Prompt Injection Scanner provides advanced protection against malicious input manipulations targeting Large Language Models (LLMs). It uses state-of-the-art detection models to identify and prevent potential injection attacks.

Prompt Injection detection workflow using DeBERTa-v3
Warning: This scanner is specifically designed for user inputs and is not recommended for system prompts.
Attack Scenarios
Common Attack Types
- Direct Injection: Attempts to overwrite system prompts
- Indirect Injection: Manipulates external source inputs
Vulnerable Scenarios
- RAG Systems: Vector databases containing potentially compromised documents
- Web-Browsing Chatbots: Exposure to unfiltered internet content
- Automated Customer Service: Processing potentially malicious email content
Configuration Options
- threshold: Detection confidence threshold (default: 0.5)
- match_type: Analysis mode
- FULL: Complete text analysis
- SENTENCE: Sentence-by-sentence scanning
- model: ProtectAI/deberta-v3-base-prompt-injection-v2
Output Format
- sanitized_prompt: Analyzed text
- is_valid: Boolean indicating if injection was detected
- risk_score: Injection probability score (0-1)
Note: The scanner uses the DeBERTa-v3 model fine-tuned on prompt injection datasets. Classification results: 0 for safe content, 1 for detected injection.
Tip: For longer prompts, experiment with different match types to optimize detection accuracy. Consider implementing additional security layers for critical applications.