RegEx Scanner Agent

The RegEx Scanner Agent provides pattern-based text analysis using regular expressions. It offers precise detection of specific text patterns, sensitive information, or structured data formats, enabling targeted content filtering, data validation, and security compliance.

Figure: RegEx Scanner Agent interface and configuration

Performance Note: Complex regular expressions, particularly those with nested or overlapping quantifiers such as (a+)+, can backtrack heavily and become very slow on large inputs. Consider limiting input size and optimizing patterns for efficient processing.
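One way to enforce an input size limit is a length check before any pattern runs. This is a minimal sketch; `MAX_INPUT_LENGTH` and `scanWithLimit` are hypothetical names, not documented configuration options of the agent.

```javascript
// Hypothetical guard: MAX_INPUT_LENGTH and scanWithLimit are illustrative
// names, not documented options of the RegEx Scanner Agent.
const MAX_INPUT_LENGTH = 100000; // characters; tune for your workload

function scanWithLimit(text, patterns, maxLength = MAX_INPUT_LENGTH) {
  if (text.length > maxLength) {
    // Reject oversized input before any pattern runs against it.
    throw new RangeError(`Input length ${text.length} exceeds limit of ${maxLength}`);
  }
  // Return the patterns (RegExp objects without the g flag) that match anywhere.
  return patterns.filter((re) => re.test(text));
}

const email = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;
console.log(scanWithLimit("Please contact me at user@example.com", [email]).length); // 1

try {
  scanWithLimit("x".repeat(20), [email], 10);
} catch (err) {
  console.log(err.message); // Input length 20 exceeds limit of 10
}
```

A length cap bounds the work for well-behaved patterns, but it does not by itself fix a pattern that backtracks catastrophically, so pattern design still matters.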

Component Inputs

Input Text: The text content to be scanned with regular expressions
Example: "Please contact me at user@example.com or call 555-123-4567"
RegEx Patterns: One or more regular expression patterns to match against the input
Example: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}" (email pattern)
Match Type: Matching strategy to apply
Options: "search" (find any match) or "full" (match entire text)
Is Blocked: Whether to block or allow content that matches patterns
Options: true (block matches) or false (allow matches)
Redact Matches: Whether to redact matched content in the output
Options: true (replace matches with placeholders) or false (preserve matches)
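Taking the option descriptions above at face value ("search" finds a match anywhere, "full" requires the entire text to match), the two strategies differ only in anchoring. The sketch below illustrates that difference; `buildRegex` and `isMatch` are hypothetical helpers, since the agent's internal implementation is not documented.

```javascript
// Hypothetical helpers illustrating the two match types; the agent's
// internal implementation is not documented.
function buildRegex(pattern, matchType) {
  if (matchType === "search") {
    return new RegExp(pattern); // match anywhere in the text
  }
  if (matchType === "full") {
    // Anchor both ends; the non-capturing group keeps an alternation
    // such as "a|b" from attaching each anchor to only one branch.
    return new RegExp(`^(?:${pattern})$`);
  }
  throw new Error(`Unknown matchType: ${matchType}`);
}

function isMatch(pattern, text, matchType) {
  return buildRegex(pattern, matchType).test(text);
}

const phone = "\\d{3}-\\d{3}-\\d{4}";
console.log(isMatch(phone, "call 555-123-4567", "search")); // true
console.log(isMatch(phone, "call 555-123-4567", "full"));   // false
console.log(isMatch(phone, "555-123-4567", "full"));        // true
```

"full" is the natural choice for validating a single form field, while "search" suits scanning free text.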

Component Outputs

Scanned Text: The processed text, potentially with matches highlighted or redacted
Example: "Please contact me at [EMAIL_REDACTED] or call [PHONE_REDACTED]"
Safety Status: Indication of whether the content matches the specified patterns
Values: Safe (no matches or matches are allowed), Unsafe (matches are blocked)
Risk Score: Numerical evaluation of risk based on pattern matches
Scale: 0.0 (no matches) to 1.0 (multiple high-risk matches)
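The Safety Status rule follows directly from the value descriptions above: content is Unsafe only when something matched and Is Blocked is true. The agent's actual risk formula is not documented, so the scoring below is purely a hypothetical sketch: each matched pattern carries a made-up weight in [0, 1], combined as 1 − Π(1 − w), which yields 0.0 with no matches and approaches 1.0 as more high-risk patterns match.

```javascript
// Hypothetical scoring: the agent's real risk formula is not documented.
// Each matched pattern contributes an invented weight w in [0, 1].
function riskScore(matchedWeights) {
  // Multiply the "not risky" probabilities, then take the complement.
  const remaining = matchedWeights.reduce((acc, w) => acc * (1 - w), 1);
  return 1 - remaining;
}

// Unsafe only when at least one pattern matched and matches are blocked.
function safetyStatus(matchCount, isBlocked) {
  return matchCount > 0 && isBlocked ? "Unsafe" : "Safe";
}

console.log(riskScore([]));          // 0
console.log(riskScore([0.5, 0.5]));  // 0.75
console.log(safetyStatus(2, true));  // Unsafe
console.log(safetyStatus(2, false)); // Safe
```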

Common Pattern Types

PII Detection

Email Addresses
Phone Numbers
Social Security Numbers
Credit Card Numbers
Passport Numbers

Security Protection

SQL Injection Attempts
XSS Attack Patterns
Command Injection
Authentication Bypasses
Path Traversal Attacks

Pattern Examples

Pattern Type	Regular Expression
Email Address	`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`
US Phone Number	`(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}`
Credit Card	`(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12})`
SQL Injection	`('|"|;|\b(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|DROP)\b)`
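A quick way to sanity-check these patterns is to run them as JavaScript regex literals, with their `{n}` quantifiers written out, against sample strings. The samples below are invented test data (4111111111111111 is a widely used test card number), not real personal information.

```javascript
// The table's patterns as JavaScript regex literals.
const tablePatterns = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  usPhone: /(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/,
  creditCard: /(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12})/,
  sqlInjection: /('|"|;|\b(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|DROP)\b)/,
};

// Invented sample inputs, one per pattern.
const samples = {
  email: "Please contact me at user@example.com",
  usPhone: "or call +1 (555) 123-4567",
  creditCard: "card 4111111111111111",
  sqlInjection: "1; DROP TABLE users",
};

for (const [name, re] of Object.entries(tablePatterns)) {
  const m = samples[name].match(re); // first match, or null
  console.log(name, m ? m[0] : null);
}
// email user@example.com
// usPhone +1 (555) 123-4567
// creditCard 4111111111111111
// sqlInjection ;
```

Note that the SQL injection pattern fires on any bare quote or semicolon, so it also flags ordinary text such as "O'Brien"; it works best as a coarse first filter.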

Use Cases

Data Privacy: Scan and redact personally identifiable information (PII)
Security: Detect potential security threats like SQL injection or XSS attacks
Content Filtering: Block specific content patterns or restricted terminology
Data Validation: Enforce proper formatting for user inputs
Compliance: Assist with regulatory requirements by detecting sensitive data

Implementation Example

const regexScanner = new RegExScanner({
  patterns: [
    // Backslashes are doubled because these are JavaScript string literals
    "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}",  // Email
    "\\d{3}-\\d{3}-\\d{4}",  // US Phone format
    "\\b(SELECT|INSERT|UPDATE|DELETE)\\b.*\\bFROM\\b"  // SQL keywords
  ],
  matchType: "search",
  isBlocked: true,
  redactMatches: true
});

const inputText = "Please contact john.doe@example.com or call 555-123-4567 for support";
const result = regexScanner.scan(inputText);

// Output:
// {
//   scannedText: "Please contact [EMAIL_REDACTED] or call [PHONE_REDACTED] for support",
//   safetyStatus: "Unsafe",
//   riskScore: 0.75,
//   matches: [
//     { pattern: "email", matched: "john.doe@example.com", position: { start: 15, end: 35 } },
//     { pattern: "phone", matched: "555-123-4567", position: { start: 44, end: 56 } }
//   ]
// }
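The `RegExScanner` class is the agent's own interface, and its internals are not shown here. As a rough, hypothetical approximation, the same detection, match positions, and redaction can be reproduced with built-in JavaScript regex APIs; `scanAndRedact`, `namedPatterns`, and the labels are illustrative names, and the agent's implementation may differ.

```javascript
// Hypothetical approximation of the scanner's matching and redaction.
const namedPatterns = [
  { name: "email", label: "[EMAIL_REDACTED]", regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { name: "phone", label: "[PHONE_REDACTED]", regex: /\d{3}-\d{3}-\d{4}/g },
];

function scanAndRedact(text, patterns) {
  const matches = [];
  for (const { name, regex } of patterns) {
    // matchAll needs the g flag and reports each match's index in the original text.
    for (const m of text.matchAll(regex)) {
      matches.push({ pattern: name, matched: m[0], position: { start: m.index, end: m.index + m[0].length } });
    }
  }
  matches.sort((a, b) => a.position.start - b.position.start);

  // Rebuild the text from the original positions, swapping each match for its label.
  const labels = Object.fromEntries(patterns.map((p) => [p.name, p.label]));
  let scannedText = "";
  let cursor = 0;
  for (const m of matches) {
    if (m.position.start < cursor) continue; // skip a match overlapping an earlier one
    scannedText += text.slice(cursor, m.position.start) + labels[m.pattern];
    cursor = m.position.end;
  }
  scannedText += text.slice(cursor);
  return { scannedText, matches };
}

const scan = scanAndRedact("Please contact john.doe@example.com or call 555-123-4567 for support", namedPatterns);
console.log(scan.scannedText);
// Please contact [EMAIL_REDACTED] or call [PHONE_REDACTED] for support
console.log(scan.matches.map((m) => `${m.pattern} ${m.position.start}-${m.position.end}`).join(", "));
// email 15-35, phone 44-56
```

Redacting from the recorded positions, rather than calling replace once per pattern, keeps a later pattern from matching inside a label that an earlier pattern inserted.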

Useful Resources

Best Practices

Start with simple patterns and refine them to reduce false positives
Test patterns extensively on representative data samples
Keep regular expressions as specific as possible, and avoid nested quantifiers such as (a+)+, to limit backtracking and performance issues
Document patterns with clear descriptions of what they detect
Combine with other security agents for comprehensive content protection
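Testing on representative samples can be as simple as listing strings each pattern should and should not match. The harness below is a hypothetical helper, not part of the agent, and the SSN-style pattern is an illustrative example rather than one the agent ships with.

```javascript
// Hypothetical test harness: reports missed matches (false negatives)
// and unexpected matches (false positives) for one pattern.
function testPattern(regex, { shouldMatch, shouldNotMatch }) {
  const failures = [];
  // Patterns here have no g flag; with g, test() is stateful via lastIndex.
  for (const s of shouldMatch) {
    if (!regex.test(s)) failures.push(`missed: ${s}`);
  }
  for (const s of shouldNotMatch) {
    if (regex.test(s)) failures.push(`false positive: ${s}`);
  }
  return failures;
}

// Illustrative SSN-style pattern (ddd-dd-dddd) with word boundaries.
const ssn = /\b\d{3}-\d{2}-\d{4}\b/;
const failures = testPattern(ssn, {
  shouldMatch: ["SSN 123-45-6789"],
  shouldNotMatch: ["call 555-123-4567", "order 12-345-6789"],
});
console.log(failures.length === 0 ? "all cases pass" : failures.join("\n"));
// all cases pass
```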
