Ban Substrings Scanner

The Ban Substrings Scanner detects, and optionally redacts, specific substrings in prompts. It supports both string-level and word-level matching, with configurable case sensitivity and all/any match logic.

Figure: Ban Substrings Scanner architecture (substring detection and redaction workflow)

Matching Modes

Available Match Types

  • String Level (STR):

    Searches for banned substrings anywhere within the text, regardless of word boundaries.

  • Word Level (WORD):

    Matches only complete words that are identical to a banned substring.
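
The difference between the two modes can be sketched in a few lines of Python. This is an illustration only, not the scanner's actual implementation: the function name find_matches is hypothetical, word boundaries are approximated with the regex \b anchor, and matching here is case-sensitive.

```python
import re

def find_matches(text, substrings, match_type="STR"):
    """Return the banned substrings found in text under the given mode.

    Hypothetical helper: "STR" and "WORD" mirror the documented mode
    names, but the matching rules below are assumptions.
    """
    found = []
    for s in substrings:
        if match_type == "WORD":
            # \b anchors require the term to stand alone as a whole word.
            hit = re.search(r"\b" + re.escape(s) + r"\b", text) is not None
        else:
            # STR: plain containment, word boundaries are ignored.
            hit = s in text
        if hit:
            found.append(s)
    return found

print(find_matches("concatenate the lists", ["cat"], "STR"))   # ['cat']
print(find_matches("concatenate the lists", ["cat"], "WORD"))  # []
```

String-level matching catches the term inside "concatenate", while word-level matching does not, which is why WORD tends to produce fewer false positives for short terms.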

Key Features

  • Flexible matching modes
  • Case-sensitive/insensitive options
  • Optional content redaction
  • Multiple substring support
  • All/Any matching logic

Configuration Options

  • substrings: List of strings to ban
  • match_type: STR or WORD
  • case_sensitive: Boolean for case sensitivity
  • redact: Boolean to enable redaction
  • contains_all: Boolean; when true, all listed substrings must be present to trigger a match; when false, any one of them is enough
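
As a rough sketch of how these options might fit together, the following Python groups them in a dataclass and applies the case_sensitive and contains_all settings. The class name BanSubstringsConfig, the function is_triggered, and all default values are assumptions made for illustration; the check covers only string-level containment to keep the focus on those two options.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BanSubstringsConfig:
    # Field names follow the documented options; defaults are assumed.
    substrings: List[str] = field(default_factory=list)
    match_type: str = "STR"        # "STR" or "WORD" (not used in this sketch)
    case_sensitive: bool = False
    redact: bool = False           # not used in this sketch
    contains_all: bool = False

def is_triggered(text: str, cfg: BanSubstringsConfig) -> bool:
    """Return True when the text counts as containing banned content."""
    if cfg.case_sensitive:
        haystack, needles = text, cfg.substrings
    else:
        # Case-insensitive: compare lowercased copies of both sides.
        haystack = text.lower()
        needles = [s.lower() for s in cfg.substrings]
    hits = [n in haystack for n in needles]
    if cfg.contains_all:
        # Every substring must be present; an empty list never triggers.
        return bool(hits) and all(hits)
    return any(hits)

cfg_any = BanSubstringsConfig(substrings=["Acme", "Globex"])
cfg_all = BanSubstringsConfig(substrings=["Acme", "Globex"], contains_all=True)
print(is_triggered("We beat ACME last year", cfg_any))  # True
print(is_triggered("We beat ACME last year", cfg_all))  # False
```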

Common Use Cases

  • Competitor name filtering
  • Harmful content prevention
  • URL blacklisting
  • Sensitive term redaction
  • Brand protection

Output Format

  • sanitized_prompt: The prompt text, with banned substrings redacted if redaction is enabled
  • is_valid: Boolean result of the check (going by the name, false when banned substrings were found)
  • risk_score: Proportion of the configured banned substrings that were found in the prompt
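
To make the three output fields concrete, here is a minimal Python sketch that produces them. It rests on several assumptions not confirmed by this page: is_valid is False when any banned substring is found, risk_score is the number of substrings found divided by the number configured, and the placeholder "[REDACTED]" stands in for whatever redaction marker the scanner really uses. The function name scan is hypothetical.

```python
import re
from typing import List, Tuple

def scan(prompt: str, substrings: List[str], redact: bool = False) -> Tuple[str, bool, float]:
    """Return (sanitized_prompt, is_valid, risk_score) for a prompt.

    Case-insensitive, string-level matching only; all semantics here
    are assumptions made for illustration.
    """
    lowered = prompt.lower()
    found = [s for s in substrings if s.lower() in lowered]

    sanitized = prompt
    if redact:
        for s in found:
            # re.escape keeps characters such as "." literal.
            sanitized = re.sub(re.escape(s), "[REDACTED]", sanitized, flags=re.IGNORECASE)

    # Assumed: share of configured substrings that appeared in the prompt.
    risk_score = len(found) / len(substrings) if substrings else 0.0
    # Assumed: the prompt is valid only when nothing banned was found.
    is_valid = not found
    return sanitized, is_valid, risk_score

result = scan("Visit evil.example now", ["evil.example", "malware"], redact=True)
print(result)  # ('Visit [REDACTED] now', False, 0.5)
```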

Note: With word-level matching, punctuation and other non-letter characters can change where word boundaries fall, so a banned term may match or be missed unexpectedly. Test regularly with sample inputs.
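
A small table of sample inputs is one way to run such tests. The Python sketch below assumes word boundaries behave like the regex \b anchor, which may differ from the scanner's own rules; the helper word_match and the sample term "Acme" are hypothetical.

```python
import re

def word_match(text: str, word: str) -> bool:
    """Whole-word, case-insensitive match using regex \\b boundaries (assumed)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Sample input mapped to the result expected under \b semantics.
samples = {
    "Call Acme today": True,
    "Acme, Inc. called": True,   # a trailing comma ends the word
    "Acme's offer": True,        # an apostrophe ends the word
    "pre-Acme era": True,        # a hyphen starts a new word
    "Acmeville": False,          # term is embedded in a longer word
    "Acme_2": False,             # underscore counts as a word character
}

failures = [text for text, expected in samples.items()
            if word_match(text, "Acme") != expected]
print(failures)  # []
```

Cases like the possessive and the underscore are where expectations most often diverge, so they are worth keeping in any regression set.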

Tip: Maintain your substring lists in external configuration files so they are easier to update and manage. For more complex pattern-matching needs, consider using regular expressions.
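
One possible way to follow this tip is sketched below in Python. The JSON layout (keys "substrings" and "patterns"), the file name, and the helper names load_banned and any_banned are all assumptions for illustration, not a format the scanner is known to read.

```python
import json
import re
from pathlib import Path

def load_banned(path):
    """Load plain substrings and regex patterns from a JSON file.

    Assumed file layout:
      {"substrings": ["Acme"], "patterns": ["https?://\\\\S*\\\\.example\\\\b"]}
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    substrings = data.get("substrings", [])
    # Compile once at load time so repeated checks stay cheap.
    patterns = [re.compile(p, re.IGNORECASE) for p in data.get("patterns", [])]
    return substrings, patterns

def any_banned(text, substrings, patterns):
    """True if any plain substring or any regex pattern matches the text."""
    lowered = text.lower()
    plain_hit = any(s.lower() in lowered for s in substrings)
    regex_hit = any(p.search(text) is not None for p in patterns)
    return plain_hit or regex_hit
```

A call such as load_banned("banned.json") would then feed any_banned at request time. Keeping the list outside the code means it can be updated without a redeploy.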