RegEx Scanner Agent
The RegEx Scanner Agent provides pattern-based text analysis using regular expressions. It offers precise detection of specific text patterns, sensitive information, or structured data formats, enabling targeted content filtering, data validation, and security compliance.

RegEx Scanner Agent interface and configuration
Performance Note: Complex regular expressions, especially those with nested or overlapping quantifiers, can backtrack heavily and slow down sharply on very large inputs. Consider enforcing input size limits and simplifying patterns for efficient processing.
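One way to apply an input size limit is to check the text length before any pattern runs. The following is a minimal sketch, not the agent's own implementation: `MAX_INPUT_LENGTH` and `safeSearch` are hypothetical names, and the limit value is an assumption to tune for your workload.

```javascript
// Hypothetical guard: reject oversized inputs before any pattern runs.
const MAX_INPUT_LENGTH = 100000; // assumed limit, not a documented default

function safeSearch(pattern, text, maxLength = MAX_INPUT_LENGTH) {
  if (text.length > maxLength) {
    throw new RangeError(
      `Input of ${text.length} characters exceeds the limit of ${maxLength}`
    );
  }
  // Returns the first match (or null), like RegExp.prototype.exec.
  return new RegExp(pattern).exec(text);
}

// A nested quantifier such as (a+)+$ can backtrack exponentially on inputs
// like "aaaa...b". The pattern a+$ accepts the same strings without nesting.
console.log(safeSearch("a+$", "caaa")[0]); // aaa
```

Rewriting a nested quantifier into a flat one, as in the comment above, usually does more for performance than raising or lowering the size limit.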
Component Inputs
- Input Text: The text content to be scanned with regular expressions
Example: "Please contact me at user@example.com or call 555-123-4567"
- RegEx Patterns: One or more regular expression patterns to match against the input
Example: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}" (email pattern)
- Match Type: Matching strategy to apply
Options: "search" (find any match) or "full" (match entire text)
- Is Blocked: Whether to block or allow content that matches patterns
Options: true (block matches) or false (allow matches)
- Redact Matches: Whether to redact matched content in the output
Options: true (replace matches with placeholders) or false (preserve matches)
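The difference between the two Match Type options can be sketched in plain JavaScript. `matchesPattern` is a hypothetical helper written for illustration, and it assumes that "full" behaves like anchoring the pattern at both ends of the input.

```javascript
// Hypothetical helper illustrating the two Match Type options.
function matchesPattern(pattern, text, matchType) {
  if (matchType === "full") {
    // Wrap in a non-capturing group and anchor so the whole text must match.
    return new RegExp(`^(?:${pattern})$`).test(text);
  }
  // "search": a match anywhere in the text is enough.
  return new RegExp(pattern).test(text);
}

const emailPattern = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}";
const text = "Please contact me at user@example.com or call 555-123-4567";

console.log(matchesPattern(emailPattern, text, "search")); // true
console.log(matchesPattern(emailPattern, text, "full")); // false
console.log(matchesPattern(emailPattern, "user@example.com", "full")); // true
```

Under this reading, "search" suits scanning free text for embedded values, while "full" suits validating a single field such as an email input.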
Component Outputs
- Scanned Text: The processed text, potentially with matches highlighted or redacted
Example: "Please contact me at [EMAIL_REDACTED] or call [PHONE_REDACTED]"
- Safety Status: Indication of whether the content matches the specified patterns
Values: Safe (no matches or matches are allowed), Unsafe (matches are blocked)
- Risk Score: Numerical evaluation of risk based on pattern matches
Scale: 0.0 (no matches) to 1.0 (multiple high-risk matches)
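To make the relationship between inputs and outputs concrete, here is a rough sketch of how the three outputs could be derived. `scanText`, the named-pattern objects, and the scoring formula are all assumptions made for illustration, since the agent's actual placeholder and scoring rules are not specified here.

```javascript
// Hypothetical sketch; the agent's real scoring and placeholder rules may differ.
function scanText(text, namedPatterns, { isBlocked = true, redactMatches = true } = {}) {
  let scannedText = text;
  let matchCount = 0;
  for (const { name, pattern } of namedPatterns) {
    const regex = new RegExp(pattern, "g");
    matchCount += (text.match(regex) || []).length;
    if (redactMatches) {
      scannedText = scannedText.replace(regex, `[${name}_REDACTED]`);
    }
  }
  // Matches only make the content unsafe when they are configured to block.
  const safetyStatus = matchCount > 0 && isBlocked ? "Unsafe" : "Safe";
  // Assumed scoring: 0.0 with no matches, capped at 1.0 from four matches up.
  const riskScore = Math.min(1, matchCount / 4);
  return { scannedText, safetyStatus, riskScore };
}

const result = scanText("Please contact me at user@example.com or call 555-123-4567", [
  { name: "EMAIL", pattern: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}" },
  { name: "PHONE", pattern: "\\d{3}-\\d{3}-\\d{4}" },
]);
console.log(result.scannedText);
// Please contact me at [EMAIL_REDACTED] or call [PHONE_REDACTED]
console.log(result.safetyStatus, result.riskScore); // Unsafe 0.5
```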
Common Pattern Types
PII Detection
- Email Addresses
- Phone Numbers
- Social Security Numbers
- Credit Card Numbers
- Passport Numbers
Security Protection
- SQL Injection Attempts
- XSS Attack Patterns
- Command Injection
- Authentication Bypasses
- Path Traversal Attacks
Pattern Examples
Pattern Type | Regular Expression |
---|---|
Email Address | [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} |
US Phone Number | (\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4} |
Credit Card | (?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}) |
SQL Injection | ('|"|;|\b(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|DROP)\b) |
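As a quick sanity check, the four pattern types can be written as JavaScript regex literals with explicit `{n}` quantifiers and run against sample inputs. The sample strings and the `patterns` and `samples` objects are illustrative assumptions, not part of the agent.

```javascript
// The four pattern types as JavaScript regex literals.
const patterns = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  usPhone: /(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/,
  creditCard:
    /(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12})/,
  sqlInjection: /('|"|;|\b(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|DROP)\b)/,
};

// Illustrative inputs, one per pattern.
const samples = {
  email: "user@example.com",
  usPhone: "(555) 123-4567",
  creditCard: "4111111111111111", // widely used Visa test number
  sqlInjection: "DROP TABLE users",
};

for (const [name, regex] of Object.entries(patterns)) {
  console.log(name, regex.test(samples[name])); // each line prints true
}

// The SQL pattern also fires on ordinary punctuation:
console.log(patterns.sqlInjection.test("it's fine")); // true
```

The last line shows why broad patterns need tuning: a lone apostrophe is enough to trigger the SQL injection pattern, so harmless prose would be flagged.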
Use Cases
- Data Privacy: Scan and redact personally identifiable information (PII)
- Security: Detect potential security threats like SQL injection or XSS attacks
- Content Filtering: Block specific content patterns or restricted terminology
- Data Validation: Enforce proper formatting for user inputs
- Compliance: Assist with regulatory requirements by detecting sensitive data
Implementation Example
const regexScanner = new RegExScanner({
  patterns: [
    // Backslashes are doubled so they survive JavaScript string parsing
    "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}", // Email
    "\\d{3}-\\d{3}-\\d{4}", // US phone format
    "\\b(SELECT|INSERT|UPDATE|DELETE)\\b.*\\bFROM\\b" // SQL keywords
  ],
  matchType: "search",
  isBlocked: true,
  redactMatches: true
});
const inputText = "Please contact john.doe@example.com or call 555-123-4567 for support";
const result = regexScanner.scan(inputText);
// Output:
// {
//   scannedText: "Please contact [EMAIL_REDACTED] or call [PHONE_REDACTED] for support",
//   safetyStatus: "Unsafe",
//   riskScore: 0.75,
//   matches: [
//     { pattern: "email", matched: "john.doe@example.com", position: { start: 15, end: 35 } },
//     { pattern: "phone", matched: "555-123-4567", position: { start: 44, end: 56 } }
//   ]
// }
Best Practices
- Start with simple patterns and refine them to reduce false positives
- Test patterns extensively on representative data samples
- Keep regular expressions as specific as possible to avoid performance issues
- Document patterns with clear descriptions of what they detect
- Combine with other security agents for comprehensive content protection
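Several of these practices can be combined in a small pattern registry that documents each pattern and checks it against representative samples. This is a sketch under assumptions: `registry`, `checkRegistry`, and the sample strings are hypothetical and not part of the agent.

```javascript
// Hypothetical registry: each entry documents what it detects and carries
// samples that should and should not match.
const registry = [
  {
    name: "us_phone_dashed",
    description: "US phone numbers written as NNN-NNN-NNNN",
    regex: /\b\d{3}-\d{3}-\d{4}\b/,
    shouldMatch: ["call 555-123-4567 today"],
    shouldNotMatch: ["order 5551234567", "ref 12-345-6789"],
  },
];

// Returns a list of human-readable failures; an empty list means all samples passed.
function checkRegistry(entries) {
  const failures = [];
  for (const { name, regex, shouldMatch, shouldNotMatch } of entries) {
    for (const s of shouldMatch) {
      if (!regex.test(s)) failures.push(`${name}: missed "${s}"`);
    }
    for (const s of shouldNotMatch) {
      if (regex.test(s)) failures.push(`${name}: false positive "${s}"`);
    }
  }
  return failures;
}

console.log(checkRegistry(registry)); // []
```

Running such a check whenever a pattern changes makes false positives and missed matches visible before the pattern reaches production traffic.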