Toxicity Scanner
The Toxicity Scanner analyzes text data for harmful content, helping maintain safe and constructive online interactions. Using a transformer-based classification model, it identifies potentially toxic or offensive content before it reaches your users.

Figure: Toxicity Scanner workflow using the unitary/unbiased-toxic-roberta model
How It Works
The scanner uses the unitary/unbiased-toxic-roberta model from Hugging Face to classify text content as toxic or non-toxic. It scores the input text for toxicity and applies a configurable threshold to decide whether the content passes.
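To illustrate the scoring step, the sketch below runs the same model directly through the Hugging Face transformers pipeline. The scanner is assumed to wrap equivalent logic internally, so the exact internals may differ.

```python
# Minimal sketch of the underlying scoring step: running
# unitary/unbiased-toxic-roberta via the transformers pipeline.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="unitary/unbiased-toxic-roberta",
    top_k=None,                   # return a score for every label
    function_to_apply="sigmoid",  # independent per-label probabilities
)

scores = classifier("You are a wonderful person.")
print(scores)  # e.g. [{'label': 'toxicity', 'score': 0.0007}, ...]
```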
Key Features
- Binary classification (toxic/non-toxic)
- Confidence scoring system
- Configurable toxicity thresholds
- Multiple scanning modes (sentence-level or full text)
- Real-time content analysis
- High accuracy and low false-positive rates
Configuration Options
- Threshold: Minimum risk score at which content is flagged as toxic (default: 0.5)
- Match Type: Choose between:
  - SENTENCE - Analyzes text sentence by sentence
  - FULL - Processes entire text as one unit
- Model Parameters: Fine-tune model behavior and sensitivity (see the sketch after this list)
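The following is a minimal, self-contained sketch of a scanner exposing these options. The class name ToxicityScanner, the MatchType enum, the sentence-splitting logic, and the reliance on the model's "toxicity" label are illustrative assumptions, not the scanner's actual implementation.

```python
from enum import Enum
from transformers import pipeline


class MatchType(Enum):
    SENTENCE = "sentence"  # analyze the text sentence by sentence
    FULL = "full"          # analyze the entire text as one unit


class ToxicityScanner:
    """Illustrative sketch of a configurable toxicity scanner."""

    def __init__(self, threshold: float = 0.5, match_type: MatchType = MatchType.FULL):
        self.threshold = threshold
        self.match_type = match_type
        self._classifier = pipeline(
            "text-classification",
            model="unitary/unbiased-toxic-roberta",
            top_k=None,                   # return a score for every label
            function_to_apply="sigmoid",  # independent per-label probabilities
        )

    def _toxicity_score(self, text: str) -> float:
        results = self._classifier(text)
        # Normalize nesting across transformers versions: a single string may
        # come back as a flat list of dicts or a list containing that list.
        if results and isinstance(results[0], list):
            results = results[0]
        # Assumes the model exposes a "toxicity" label among its outputs.
        return next(r["score"] for r in results if r["label"] == "toxicity")

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        if self.match_type is MatchType.SENTENCE:
            # Naive sentence split for illustration; a real implementation
            # would use a proper sentence tokenizer.
            parts = [s.strip() for s in prompt.split(".") if s.strip()] or [prompt]
        else:
            parts = [prompt]
        risk_score = max(self._toxicity_score(part) for part in parts)
        is_valid = risk_score < self.threshold
        return prompt, is_valid, risk_score
```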
Output Format
The scanner returns a tuple containing three elements:
- sanitized_prompt: The processed text
- is_valid: Boolean indicating if the content passes toxicity checks
- risk_score: Float between 0 and 1 indicating toxicity level
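Continuing the sketch above, a typical call unpacks the three elements directly; the prompt text and printed messages are illustrative.

```python
# Usage sketch reusing the hypothetical ToxicityScanner defined earlier.
scanner = ToxicityScanner(threshold=0.5, match_type=MatchType.SENTENCE)

sanitized_prompt, is_valid, risk_score = scanner.scan(
    "Please summarize this article for me."
)

if not is_valid:
    print(f"Blocked: risk score {risk_score:.2f} is at or above the threshold")
else:
    print(f"Allowed: risk score {risk_score:.2f}")
```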
Performance Metrics
- Average processing time: ~100ms per request
- Accuracy rate: >95% on benchmark datasets
- Support for multiple languages
- Scalable to high-volume applications
Note: The scanner works best with clear, well-structured text. Very long or ambiguous content may require additional processing time. Consider implementing rate limiting for high-volume applications.
Tip: For optimal results, implement content caching and use batch processing when analyzing multiple texts. Regular model updates ensure the best detection accuracy for emerging toxic content patterns.
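As one way to apply the caching tip, the sketch below memoizes results for repeated texts, reusing the hypothetical ToxicityScanner from earlier; the cache size and keying strategy are assumptions. For true batch processing, lists of texts could instead be passed to the underlying pipeline in a single call.

```python
# Illustrative caching sketch: repeated texts hit an in-memory cache
# instead of re-running the model.
from functools import lru_cache

scanner = ToxicityScanner(threshold=0.5, match_type=MatchType.FULL)


@lru_cache(maxsize=10_000)
def scan_cached(text: str) -> tuple[str, bool, float]:
    return scanner.scan(text)


texts = ["First comment to review.", "Second comment to review."]
for text in texts:
    _, is_valid, risk = scan_cached(text)
    print(f"{is_valid=} risk={risk:.2f} :: {text}")
```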