Ban Substrings Agent

The Ban Substrings Agent detects and filters specific words, phrases, and character sequences in text. It blocks exact matches of prohibited strings, which gives precise control over content and helps enforce content policies across applications.

Ban Substrings Component

Implementation Notice: Substring matching is case-sensitive by default. For more comprehensive filtering, consider enabling case-insensitive matching and accounting for common character substitutions or variations of prohibited strings.
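The effect of the case-sensitivity setting can be illustrated with a minimal sketch. `findBanned` is a hypothetical helper written for this illustration, not part of the agent's API, and lowercasing both sides is only one simple way to ignore case.

```javascript
// Hypothetical helper, not part of the Ban Substrings Agent API.
// Returns the banned substrings that occur somewhere in `text`.
function findBanned(text, bannedSubstrings, caseSensitive = true) {
  // Lowercasing both sides is a simple way to ignore case.
  const haystack = caseSensitive ? text : text.toLowerCase();
  return bannedSubstrings.filter((banned) => {
    const needle = caseSensitive ? banned : banned.toLowerCase();
    // Skip empty strings, which would otherwise match everything.
    return needle.length > 0 && haystack.includes(needle);
  });
}

const text = "Visit our website at WWW.Example.COM for more information.";
const banned = ["www.example.com"];

console.log(findBanned(text, banned, true));  // [] (case differs, no match)
console.log(findBanned(text, banned, false)); // [ 'www.example.com' ]
```

With case-sensitive matching, a trivially re-cased URL slips through, which is why the notice suggests enabling case-insensitive matching for broader coverage.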

Component Inputs

  • Input Text: The text content to be analyzed for banned substrings

    Example: "Visit our website at www.example.com for more information."

  • Banned Substrings: List of specific strings to be detected and filtered

    Example: "www.example.com,123-456-7890,admin@example.com"

  • Case Sensitive: Whether to perform case-sensitive matching

    Options: true (match exact case) or false (ignore case)

  • Is Blocked: Whether content with banned substrings should be blocked

    Options: true (block content) or false (allow but flag content)
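The Banned Substrings example above is a comma-separated string. A minimal sketch of turning such a string into a clean list might look like the following. `parseBannedSubstrings` is a hypothetical helper, and trimming whitespace, dropping empty entries, and removing duplicates are assumptions rather than documented behavior.

```javascript
// Hypothetical helper, not part of the Ban Substrings Agent API.
// Splits a comma-separated string into a list of banned substrings.
function parseBannedSubstrings(raw) {
  const unique = new Set();
  for (const item of raw.split(",")) {
    const trimmed = item.trim(); // assumption: surrounding whitespace is not significant
    if (trimmed !== "") {
      unique.add(trimmed);       // a Set drops duplicates and keeps insertion order
    }
  }
  return [...unique];
}

console.log(parseBannedSubstrings("www.example.com, 123-456-7890,,admin@example.com"));
// [ 'www.example.com', '123-456-7890', 'admin@example.com' ]
```

A comma-separated format cannot express a banned substring that itself contains a comma, so a list-valued input is the safer choice where one is available.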

Component Outputs

  • Processed Text: The input text with banned substrings potentially redacted

    Example: "Visit our website at [REDACTED] for more information."

  • Safety Status: Indicator of whether banned substrings were detected

    Values: Safe (no banned substrings), Unsafe (banned substrings detected)

  • Risk Score: Numerical evaluation of policy violation risk

    Scale: 0.0 (no risk) to 1.0 (high risk)

  • Matched Substrings: List of detected banned substrings and their positions

    Example: [substring: "www.example.com", position: 21]
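The outputs listed above could be assembled from a list of matches roughly as follows. This page does not say how the risk score is computed, so the binary scoring here (1.0 on any match, 0.0 otherwise) is an assumption modeled on the implementation example, and `summarize` is a hypothetical helper.

```javascript
// Hypothetical helper, not part of the Ban Substrings Agent API.
// Builds status fields from a list of { substring, position } matches.
function summarize(matches) {
  const detected = matches.length > 0;
  return {
    safetyStatus: detected ? "Unsafe" : "Safe",
    // Assumption: binary score. A real scorer might weight matches differently.
    riskScore: detected ? 1.0 : 0.0,
    matchedSubstrings: matches,
  };
}

const clean = summarize([]);
const flagged = summarize([{ substring: "www.example.com", position: 21 }]);

console.log(clean.safetyStatus, clean.riskScore);     // Safe 0
console.log(flagged.safetyStatus, flagged.riskScore); // Unsafe 1
```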

Common Usage Categories

Contact Information

  • Email Addresses
  • Phone Numbers
  • URLs/Websites
  • Street Addresses
  • Social Media Handles

Content Policy

  • Profanity Words
  • Competitor Names
  • Proprietary Terms
  • Sensitive Keywords
  • Command Triggers

How It Works

The Ban Substrings Agent uses string matching algorithms to identify exact occurrences of prohibited text fragments within content. Unlike semantic or topic-based approaches, it performs literal matching against a predefined list of banned strings.

Matching Techniques

  • Exact substring matching for precise identification
  • Optional case-insensitive matching to catch variations
  • Position tracking to identify where matches occur
  • Multiple match detection for comprehensive analysis
  • Configurable redaction options for identified substrings
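The techniques listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the agent's actual implementation: `findMatches` and `redact` are hypothetical names, positions are zero-based character offsets, and case-insensitive mode assumes lowercasing does not change string length (true for ASCII text).

```javascript
// Hypothetical helpers, not part of the Ban Substrings Agent API.

// Finds every occurrence of every banned substring, with its position.
function findMatches(text, bannedSubstrings, caseSensitive = true) {
  const haystack = caseSensitive ? text : text.toLowerCase();
  const matches = [];
  for (const banned of bannedSubstrings) {
    if (banned === "") continue; // an empty string would match at every index
    const needle = caseSensitive ? banned : banned.toLowerCase();
    let from = 0;
    while (true) {
      const position = haystack.indexOf(needle, from);
      if (position === -1) break;
      matches.push({ substring: banned, position, length: needle.length });
      from = position + 1; // keep scanning so repeated occurrences are found
    }
  }
  return matches.sort((a, b) => a.position - b.position);
}

// Replaces each matched span with `redactionText`.
// Expects `matches` sorted by position; overlapping spans share one marker.
function redact(text, matches, redactionText = "[REDACTED]") {
  let result = "";
  let cursor = 0; // index of the first character not yet copied or redacted
  for (const { position, length } of matches) {
    const end = position + length;
    if (end <= cursor) continue; // already covered by an earlier span
    if (position >= cursor) {
      result += text.slice(cursor, position) + redactionText;
    }
    cursor = end; // an overlapping match only extends the redacted span
  }
  return result + text.slice(cursor);
}

const text = "Visit www.example.com or WWW.EXAMPLE.COM today.";
const matches = findMatches(text, ["www.example.com"], false);

console.log(matches.map((m) => m.position)); // [ 6, 25 ]
console.log(redact(text, matches));          // Visit [REDACTED] or [REDACTED] today.
```

Redacting in a single left-to-right pass avoids the shifting offsets that come from replacing matches one at a time.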

Use Cases

  • Personal Information Protection: Prevent sharing of contact details in public forums
  • Profanity Filtering: Block specific profane or inappropriate words
  • External Link Control: Restrict sharing of specific website URLs
  • Competitor Mentions: Block mentions of competitor products or services
  • Command Prevention: Block potential system commands or injection strings

Implementation Example

    const substringFilter = new BanSubstringsAgent({
      bannedSubstrings: [
        "www.example.com",
        "contact@example.com",
        "555-123-4567"
      ],
      caseSensitive: false,
      isBlocked: true,
      redactionText: "[REDACTED]"
    });

    const inputText = "Please contact us at contact@example.com or call 555-123-4567";
    const result = substringFilter.process(inputText);

    // Output (positions shown as zero-based character offsets):
    // {
    //   processedText: "Please contact us at [REDACTED] or call [REDACTED]",
    //   safetyStatus: "Unsafe",
    //   riskScore: 1.0,
    //   matchedSubstrings: [
    //     { substring: "contact@example.com", position: 21, length: 19 },
    //     { substring: "555-123-4567", position: 49, length: 12 }
    //   ]
    // }

Best Practices

  • Include common variations and misspellings of banned substrings
  • Consider character substitutions (e.g., "@" for "a") when creating banned substring lists
  • Use case-insensitive matching for more comprehensive filtering
  • Regularly update substring lists based on emerging patterns
  • Combine with other content filtering techniques for more robust protection
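One way to account for character substitutions without listing every variant is to normalize text before matching. The sketch below is an assumption-laden illustration: the substitution table, `normalize`, and `containsBannedWord` are all hypothetical and not part of the agent's API.

```javascript
// Hypothetical substitution table; extend it to suit the content being filtered.
const SUBSTITUTIONS = new Map([
  ["@", "a"], ["4", "a"],
  ["3", "e"],
  ["1", "i"],
  ["0", "o"],
  ["$", "s"], ["5", "s"],
]);

// Lowercases the text and maps look-alike characters to plain letters.
function normalize(text) {
  return Array.from(text.toLowerCase(), (ch) => SUBSTITUTIONS.get(ch) ?? ch).join("");
}

// Compares normalized text against normalized banned words.
function containsBannedWord(text, bannedWords) {
  const normalized = normalize(text);
  return bannedWords.some((word) => word !== "" && normalized.includes(normalize(word)));
}

console.log(normalize("B@DW0RD"));                                       // badword
console.log(containsBannedWord("This is a B@DW0RD here", ["badword"])); // true
console.log(containsBannedWord("All good here", ["badword"]));          // false
```

Because digits are mapped to letters, this approach suits word-style lists such as profanity. It raises the false-positive rate for phone numbers and similar numeric strings, which are better matched without normalization.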