RegEx Scanner Agent

The RegEx Scanner Agent provides pattern-based text analysis using regular expressions. It offers precise detection of specific text patterns, sensitive information, or structured data formats, enabling targeted content filtering, data validation, and security compliance.

RegEx Scanner Component

[Figure: RegEx Scanner Agent interface and configuration]

Performance Note: Complex regular expressions, especially those with nested quantifiers, can be slow on very large inputs. Consider limiting input size and simplifying patterns so they process efficiently.
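One way to act on this note is to cap the input length before any pattern runs. The sketch below is illustrative only: `safeTest`, `MAX_INPUT_LENGTH`, and the 10,000-character cap are assumptions and not part of the RegEx Scanner Agent.

```javascript
// Hypothetical guard: refuse to scan inputs above a fixed size.
const MAX_INPUT_LENGTH = 10000; // assumed cap, tune for your workload

function safeTest(pattern, text, maxLength = MAX_INPUT_LENGTH) {
  if (text.length > maxLength) {
    // Skip the regex entirely instead of risking a slow scan.
    return { scanned: false, matched: false, reason: "input too large" };
  }
  const regex = new RegExp(pattern);
  return { scanned: true, matched: regex.test(text), reason: null };
}

// Patterns are passed as strings, so backslashes are doubled.
const small = safeTest("\\d{3}-\\d{3}-\\d{4}", "call 555-123-4567");
const large = safeTest("\\d{3}-\\d{3}-\\d{4}", "x".repeat(20000));

console.log(small.matched); // true
console.log(large.scanned); // false
```

A length cap bounds the cost of well-behaved patterns. It does not protect against patterns with nested quantifiers such as `(a+)+$`, which can backtrack exponentially even on short inputs and should be rewritten instead.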

Component Inputs

  • Input Text: The text content to be scanned with regular expressions

    Example: "Please contact me at user@example.com or call 555-123-4567"

  • RegEx Patterns: One or more regular expression patterns to match against the input

    Example: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}" (email pattern)

  • Match Type: Matching strategy to apply

    Options: "search" (find any match) or "full" (match entire text)

  • Is Blocked: Whether to block or allow content that matches patterns

    Options: true (block matches) or false (allow matches)

  • Redact Matches: Whether to redact matched content in the output

    Options: true (replace matches with placeholders) or false (preserve matches)
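The difference between the two Match Type options can be sketched in plain JavaScript. This is a stand-in written for illustration, not the agent's implementation: `findMatches` is a hypothetical helper, and treating "full" as the pattern anchored to both ends of the text is an assumption.

```javascript
// Illustrative stand-in for the "search" and "full" Match Type options.
function findMatches(text, patterns, matchType) {
  const matches = [];
  for (const source of patterns) {
    if (matchType === "full") {
      // "full": the pattern must cover the entire text.
      const regex = new RegExp(`^(?:${source})$`);
      if (regex.test(text)) {
        matches.push({ pattern: source, matched: text, start: 0, end: text.length });
      }
    } else {
      // "search": collect every occurrence anywhere in the text.
      const regex = new RegExp(source, "g");
      for (const m of text.matchAll(regex)) {
        matches.push({ pattern: source, matched: m[0], start: m.index, end: m.index + m[0].length });
      }
    }
  }
  return matches;
}

const emailPattern = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}";
const text = "Please contact me at user@example.com or call 555-123-4567";

const searchHits = findMatches(text, [emailPattern], "search");
const fullHits = findMatches(text, [emailPattern], "full");
const fullOnBareEmail = findMatches("user@example.com", [emailPattern], "full");

console.log(searchHits.length, fullHits.length, fullOnBareEmail.length); // 1 0 1
```

With "search", the email is found inside the longer sentence. With "full", the same pattern only matches when the input is nothing but an email address, which is the behavior you would want for validating a single form field.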

Component Outputs

  • Scanned Text: The processed text, potentially with matches highlighted or redacted

    Example: "Please contact me at [EMAIL_REDACTED] or call [PHONE_REDACTED]"

  • Safety Status: Whether the content is considered safe, based on the pattern matches found and the Is Blocked setting

    Values: Safe (no matches or matches are allowed), Unsafe (matches are blocked)

  • Risk Score: Numerical evaluation of risk based on pattern matches

    Scale: 0.0 (no matches) to 1.0 (multiple high-risk matches)
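How the three outputs relate to the Is Blocked and Redact Matches inputs can be sketched as follows. Everything here is hypothetical: `buildOutputs`, the generic `[REDACTED]` placeholder, and the scoring rule (0.25 per match, capped at 1.0) are assumptions chosen for illustration, since this page does not describe how the agent computes its risk score.

```javascript
// Hypothetical sketch of deriving the outputs from one pattern.
function buildOutputs(text, pattern, { isBlocked, redactMatches }) {
  const regex = new RegExp(pattern, "g");
  const matchCount = [...text.matchAll(regex)].length;

  // Safe when nothing matched or when matches are allowed; Unsafe otherwise.
  const safetyStatus = matchCount > 0 && isBlocked ? "Unsafe" : "Safe";

  // Assumed scoring rule: 0.0 with no matches, 0.25 per match, capped at 1.0.
  const riskScore = Math.min(1, matchCount * 0.25);

  // Replace every match with a placeholder only when redaction is requested.
  const scannedText = redactMatches ? text.replace(regex, "[REDACTED]") : text;

  return { scannedText, safetyStatus, riskScore };
}

const phone = "\\d{3}-\\d{3}-\\d{4}";
const blocked = buildOutputs("call 555-123-4567", phone, { isBlocked: true, redactMatches: true });
const allowed = buildOutputs("call 555-123-4567", phone, { isBlocked: false, redactMatches: false });
const clean = buildOutputs("hello there", phone, { isBlocked: true, redactMatches: true });

console.log(blocked.scannedText, blocked.safetyStatus); // call [REDACTED] Unsafe
console.log(allowed.scannedText, allowed.safetyStatus); // call 555-123-4567 Safe
console.log(clean.safetyStatus, clean.riskScore); // Safe 0
```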

Common Pattern Types

PII Detection

  • Email Addresses
  • Phone Numbers
  • Social Security Numbers
  • Credit Card Numbers
  • Passport Numbers

Security Protection

  • SQL Injection Attempts
  • XSS Attack Patterns
  • Command Injection
  • Authentication Bypasses
  • Path Traversal Attacks

Pattern Examples

Pattern Type       Regular Expression
Email Address      [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
US Phone Number    (\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}
Credit Card        (?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12})
SQL Injection      ('|"|;|\b(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|DROP)\b)
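As a quick check, the four patterns can be compiled and run against a few sample strings. The sketch assumes the quantifiers are written with braces (for example `\d{3}` and `[a-zA-Z]{2,}`) and that the phone prefix is `\d{1,2}`; `matchingPatternNames` and the sample strings are made up for illustration.

```javascript
// Patterns written as regex literals, so no string escaping is needed.
const patterns = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  usPhone: /(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/,
  creditCard: /(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12})/,
  sqlInjection: /('|"|;|\b(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|DROP)\b)/,
};

const samples = [
  "Reach me at user@example.com",
  "Call (555) 123-4567 today",
  "Card: 4111111111111111",
  "name'; DROP TABLE users",
  "It's a nice day",
];

// Return the names of all patterns that match somewhere in the text.
function matchingPatternNames(text) {
  return Object.keys(patterns).filter((name) => patterns[name].test(text));
}

const report = samples.map((text) => ({ text, hits: matchingPatternNames(text) }));
for (const row of report) {
  console.log(row.hits.join(",") || "none", "<-", row.text);
}
```

Two results are worth noting. The card number also matches the phone pattern, because the phone separators are optional and any ten consecutive digits qualify. The harmless sentence "It's a nice day" triggers the SQL injection pattern because of its apostrophe. Both are false positives of the kind that refinement and testing are meant to catch.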

Use Cases

  • Data Privacy: Scan and redact personally identifiable information (PII)
  • Security: Detect potential security threats like SQL injection or XSS attacks
  • Content Filtering: Block specific content patterns or restricted terminology
  • Data Validation: Enforce proper formatting for user inputs
  • Compliance: Assist with regulatory requirements by detecting sensitive data

Implementation Example

    // Patterns are string literals, so each backslash is doubled.
    const regexScanner = new RegExScanner({
      patterns: [
        "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}", // Email
        "\\d{3}-\\d{3}-\\d{4}", // US phone format
        "\\b(SELECT|INSERT|UPDATE|DELETE)\\b.*\\bFROM\\b" // SQL keywords
      ],
      matchType: "search",
      isBlocked: true,
      redactMatches: true
    });

    const inputText = "Please contact john.doe@example.com or call 555-123-4567 for support";
    const result = regexScanner.scan(inputText);

    // Output:
    // {
    //   scannedText: "Please contact [EMAIL_REDACTED] or call [PHONE_REDACTED] for support",
    //   safetyStatus: "Unsafe",
    //   riskScore: 0.75,
    //   matches: [
    //     { pattern: "email", matched: "john.doe@example.com", position: { start: 15, end: 35 } },
    //     { pattern: "phone", matched: "555-123-4567", position: { start: 44, end: 56 } }
    //   ]
    // }

Best Practices

  • Start with simple patterns and refine them to reduce false positives
  • Test patterns extensively on representative data samples
  • Keep regular expressions as specific as possible to avoid performance issues
  • Document patterns with clear descriptions of what they detect
  • Combine with other security agents for comprehensive content protection
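
Testing a pattern on representative samples can be as simple as counting where it agrees and disagrees with hand-labeled data. The helper below is a minimal sketch: `evaluatePattern`, the `shouldMatch` label, and the sample strings are hypothetical and not part of the RegEx Scanner Agent.

```javascript
// Hypothetical helper: score one pattern against hand-labeled samples.
// Pass regexes without the g flag; test() on a global regex keeps state
// between calls and would skew the counts.
function evaluatePattern(regex, labeledSamples) {
  const result = { truePositives: 0, falsePositives: 0, falseNegatives: 0, trueNegatives: 0 };
  for (const { text, shouldMatch } of labeledSamples) {
    const matched = regex.test(text);
    if (matched && shouldMatch) result.truePositives += 1;
    else if (matched && !shouldMatch) result.falsePositives += 1;
    else if (!matched && shouldMatch) result.falseNegatives += 1;
    else result.trueNegatives += 1;
  }
  return result;
}

const labeled = [
  { text: "call 555-123-4567", shouldMatch: true },
  { text: "order 5551234567 shipped", shouldMatch: false }, // an order ID, not a phone number
  { text: "no numbers here", shouldMatch: false },
];

// A loose pattern with optional hyphens versus a stricter one with required hyphens.
const loose = evaluatePattern(/\d{3}-?\d{3}-?\d{4}/, labeled);
const strict = evaluatePattern(/\b\d{3}-\d{3}-\d{4}\b/, labeled);

console.log(loose.falsePositives, strict.falsePositives); // 1 0
```

On these samples the looser pattern flags the order ID as a phone number, while the stricter pattern does not, with no loss of true positives.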