Ban Substrings Agent

The Ban Substrings Agent detects and filters specific words, phrases, and character sequences in text. It blocks exact matches of prohibited strings, which gives precise control over content and helps enforce content policies.

Ban Substrings Agent interface and configuration

Implementation Notice: Substring matching is case-sensitive by default. For more comprehensive filtering, consider enabling case-insensitive matching and accounting for common character substitutions or variations of prohibited strings.

Component Inputs

Input Text: The text content to be analyzed for banned substrings
Example: "Visit our website at www.example.com for more information."
Banned Substrings: List of specific strings to be detected and filtered
Example: "www.example.com,123-456-7890,admin@example.com"
Case Sensitive: Whether to perform case-sensitive matching
Options: true (match exact case) or false (ignore case)
Is Blocked: Whether content with banned substrings should be blocked
Options: true (block content) or false (allow but flag content)
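
The Banned Substrings example above is a single comma-separated string. A minimal sketch of how such a value might be split into a clean list before matching is shown below. The helper name parseBannedSubstrings is hypothetical and not part of the agent's documented interface; the agent may parse its input differently.

```javascript
// Hypothetical helper (an assumption for illustration, not the agent's API):
// split a comma-separated string into a de-duplicated list of banned substrings.
function parseBannedSubstrings(raw) {
  const entries = raw
    .split(",")
    .map((entry) => entry.trim())          // drop whitespace around each entry
    .filter((entry) => entry.length > 0);  // skip empty entries such as ",,"
  return [...new Set(entries)];            // remove duplicates, keep first-seen order
}

const parsedList = parseBannedSubstrings(
  "www.example.com, 123-456-7890,,admin@example.com,www.example.com"
);
console.log(parsedList);
// [ 'www.example.com', '123-456-7890', 'admin@example.com' ]
```

Filtering out empty entries matters because an empty string is a substring of every text and would flag all content.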

Component Outputs

Processed Text: The input text with banned substrings potentially redacted
Example: "Visit our website at [REDACTED] for more information."
Safety Status: Indicator of whether banned substrings were detected
Values: Safe (no banned substrings), Unsafe (banned substrings detected)
Risk Score: Numerical evaluation of policy violation risk
Scale: 0.0 (no risk) to 1.0 (high risk)
Matched Substrings: List of detected banned substrings and their positions
Example: [substring: "www.example.com", position: 21]

Common Usage Categories

Contact Information

Email Addresses
Phone Numbers
URLs/Websites
Street Addresses
Social Media Handles

Content Policy

Profanity Words
Competitor Names
Proprietary Terms
Sensitive Keywords
Command Triggers

How It Works

The Ban Substrings Agent uses string matching algorithms to identify exact occurrences of prohibited text fragments within content. Unlike semantic or topic-based approaches, it performs literal matching against a predefined list of banned strings.
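
To make the idea of literal matching concrete, here is a small sketch using the standard String.prototype.includes method. The function name containsBanned is hypothetical, and the agent's internal algorithm is not documented here, so treat this as an illustration of the behavior rather than the actual implementation.

```javascript
// Illustrative only: literal, case-sensitive substring checks.
function containsBanned(text, bannedSubstrings) {
  return bannedSubstrings.some((banned) => text.includes(banned));
}

const literalList = ["example.com"];

console.log(containsBanned("see example.com for details", literalList));   // true
console.log(containsBanned("see myexample.community", literalList));       // true (match inside a longer string)
console.log(containsBanned("see example dot com", literalList));           // false (paraphrase is not matched)
console.log(containsBanned("see Example.com", literalList));               // false (case differs)
```

The second and third calls show the two sides of literal matching: it can flag a banned string that appears inside a longer one, and it misses any rewording that does not contain the exact characters.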

Matching Techniques

Exact substring matching for precise identification
Optional case-insensitive matching to catch variations
Position tracking to identify where matches occur
Multiple match detection for comprehensive analysis
Configurable redaction options for identified substrings
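
The techniques listed above can be sketched in a few lines of JavaScript. The functions findMatches and redact below are hypothetical names chosen for this example, and the result shape (substring, position, length) mirrors the output shown elsewhere on this page. Positions are assumed to be 0-based character offsets.

```javascript
// Sketch: find every occurrence of every banned substring, with positions.
function findMatches(text, bannedSubstrings, caseSensitive = true) {
  // Caveat: toLowerCase() can change string length for a few Unicode
  // characters, which would shift positions; fine for ASCII input.
  const haystack = caseSensitive ? text : text.toLowerCase();
  const matches = [];
  for (const banned of bannedSubstrings) {
    if (banned.length === 0) continue; // an empty string would match everywhere
    const needle = caseSensitive ? banned : banned.toLowerCase();
    let from = 0;
    while (true) {
      const position = haystack.indexOf(needle, from);
      if (position === -1) break;
      matches.push({ substring: banned, position, length: needle.length });
      from = position + 1; // keep scanning to detect multiple matches
    }
  }
  return matches.sort((a, b) => a.position - b.position);
}

// Sketch: replace each matched span; overlapping spans share one placeholder.
function redact(text, matches, redactionText = "[REDACTED]") {
  let result = "";
  let cursor = 0;
  for (const match of matches) { // assumes matches are sorted by position
    if (match.position >= cursor) {
      result += text.slice(cursor, match.position) + redactionText;
    }
    cursor = Math.max(cursor, match.position + match.length);
  }
  return result + text.slice(cursor);
}

const sampleText = "Mail Admin@Example.com or visit www.example.com";
const sampleMatches = findMatches(
  sampleText,
  ["admin@example.com", "www.example.com"],
  false
);
console.log(sampleMatches.map((m) => [m.position, m.length])); // [ [ 5, 17 ], [ 32, 15 ] ]
console.log(redact(sampleText, sampleMatches)); // Mail [REDACTED] or visit [REDACTED]
```

Merging overlapping matches into one placeholder is a design choice made for this sketch; it avoids emitting two redaction markers for what is effectively a single span.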

Use Cases

Personal Information Protection: Prevent sharing of contact details in public forums
Profanity Filtering: Block specific profane or inappropriate words
External Link Control: Restrict sharing of specific website URLs
Competitive Intelligence: Block mentions of competitor products or services
Command Prevention: Block potential system commands or injection strings

Implementation Example

// Assumes BanSubstringsAgent has already been imported or otherwise made available.
const substringFilter = new BanSubstringsAgent({
  bannedSubstrings: [
    "www.example.com",
    "contact@example.com",
    "555-123-4567"
  ],
  caseSensitive: false,
  isBlocked: true,
  redactionText: "[REDACTED]"
});

const inputText = "Please contact us at contact@example.com or call 555-123-4567";
const result = substringFilter.process(inputText);

// Output (positions shown as 0-based character offsets into inputText):
// {
//   processedText: "Please contact us at [REDACTED] or call [REDACTED]",
//   safetyStatus: "Unsafe",
//   riskScore: 1.0,
//   matchedSubstrings: [
//     {
//       substring: "contact@example.com",
//       position: 21,
//       length: 19
//     },
//     {
//       substring: "555-123-4567",
//       position: 49,
//       length: 12
//     }
//   ]
// }
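
A caller still has to decide what to do with the result. The sketch below shows one way the Is Blocked setting might be applied to a result object, using the field names from the output above. The function handleResult and its return shape are hypothetical and are not part of the agent's documented interface.

```javascript
// Hypothetical caller-side handling (an assumption for illustration).
function handleResult(agentResult, isBlocked) {
  if (agentResult.safetyStatus === "Safe") {
    return { deliver: true, text: agentResult.processedText, flagged: false };
  }
  if (isBlocked) {
    // Block: do not pass any text downstream.
    return { deliver: false, text: null, flagged: true };
  }
  // Allow but flag: pass the processed text along and record the flag.
  return { deliver: true, text: agentResult.processedText, flagged: true };
}

const unsafeResult = {
  processedText: "Please contact us at [REDACTED]",
  safetyStatus: "Unsafe",
  matchedSubstrings: [{ substring: "contact@example.com", position: 21, length: 19 }]
};

console.log(handleResult(unsafeResult, true));  // { deliver: false, text: null, flagged: true }
console.log(handleResult(unsafeResult, false).deliver); // true
```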

Useful Resources

Best Practices

Include common variations and misspellings of banned substrings
Consider character substitutions (e.g., "@" for "a") when creating banned substring lists
Use case-insensitive matching for more comprehensive filtering
Regularly update substring lists based on emerging patterns
Combine with other content filtering techniques for more robust protection
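
As a rough illustration of the character-substitution advice above, text and banned strings can both be normalized before comparison. The substitution table and the function names below are assumptions made for this example; a real deployment would need a table tuned to its own content.

```javascript
// Hypothetical substitution table: look-alike character -> canonical letter.
const SUBSTITUTIONS = new Map([
  ["@", "a"],
  ["0", "o"],
  ["1", "l"],
  ["3", "e"],
  ["$", "s"],
  ["5", "s"]
]);

// Lowercase the text and map each look-alike character to its canonical letter.
function normalize(text) {
  return Array.from(text.toLowerCase(), (ch) => SUBSTITUTIONS.get(ch) ?? ch).join("");
}

// Compare normalized text against normalized banned strings.
function containsBannedNormalized(text, bannedSubstrings) {
  const normalizedText = normalize(text);
  return bannedSubstrings.some((banned) => normalizedText.includes(normalize(banned)));
}

console.log(normalize("P@ssw0rd")); // password
console.log(containsBannedNormalized("reset your P@ssw0rd now", ["password"])); // true
console.log(containsBannedNormalized("reset your passphrase", ["password"]));   // false
```

Normalizing both sides keeps banned strings that legitimately contain digits or "@" (such as phone numbers and email addresses) matchable, at the cost of more false positives, since unrelated text can normalize to the same characters.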
