Language Scanner Agent
The Language Scanner Agent identifies and validates the languages used in text content. It provides language detection capabilities to enforce language policies, route content to appropriate processing pipelines, and enhance content management workflows.

Figure: Language Scanner Agent interface and configuration
Localization Notice: Language detection accuracy may vary for very short text samples or languages with similar linguistic patterns. For critical applications, consider setting a minimum confidence threshold and implementing additional verification steps.
Component Inputs
- Input Text: The text content to be analyzed for language identification
Example: "Buenos días, ¿cómo estás hoy?"
- Valid Languages: List of language codes that are considered acceptable
Example: "en,es,fr,de,it"
- Confidence Threshold: Minimum confidence level required for language detection
Range: 0.0 to 1.0 (default: 0.6)
- Match Type: How to handle text in which more than one language is detected
Options: "primary" (validate the dominant language) or "any" (any detected language can satisfy the policy); see the configuration sketch after this list
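As a rough illustration of how the two Match Type modes differ, the configurations below use the constructor options from the Implementation Example later on this page; treat the option names as indicative rather than a definitive API.

const strictScanner = new LanguageScanner({
  validLanguages: ["en", "es", "fr", "de", "it"],
  confidenceThreshold: 0.6,     // the default shown above
  matchType: "primary"          // only the dominant language must be valid
});

const permissiveScanner = new LanguageScanner({
  validLanguages: ["en", "es", "fr", "de", "it"],
  confidenceThreshold: 0.6,
  matchType: "any"              // content passes if any detected language is valid
});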
Component Outputs
- Processed Text: The analyzed text with potential language markup
- Safety Status: Indicator of whether the content meets language requirements; see the handling sketch after this list
Values: Safe (valid language), Unsafe (invalid language), Warning (uncertain detection)
- Risk Score: Measure of language policy violation risk
Scale: 0.0 (compliant) to 1.0 (non-compliant)
- Detected Languages: List of identified languages with confidence scores
Example: [{ language: "es", confidence: 0.94 }, { language: "pt", confidence: 0.05 }]
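A minimal sketch of consuming these outputs, branching on the three Safety Status values; the result field names mirror the Implementation Example below, and processContent, queueForReview, and rejectContent stand in for your own handlers.

// Hypothetical dispatch on the scanner's Safety Status
function handleScanResult(result) {
  switch (result.safetyStatus) {
    case "Safe":       // valid language, low risk score
      return processContent(result.processedText);
    case "Warning":    // uncertain detection, verify before use
      return queueForReview(result);
    case "Unsafe":     // language policy violation
      return rejectContent(result.riskScore);
  }
}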
Supported Languages
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Russian (ru)
- Dutch (nl)
- Chinese (zh)
- Japanese (ja)
- Korean (ko)
- Arabic (ar)
- Hindi (hi)
- Turkish (tr)
- Polish (pl)
- Vietnamese (vi)
- Thai (th)
- Swedish (sv)
- Greek (el)
- Hebrew (he)
- Finnish (fi)
- Danish (da)
- Norwegian (no)
- ...plus 70 more
How It Works
The Language Scanner Agent employs statistical language models and linguistic pattern analysis to identify the language of text content. It considers character frequency, word structure, n-gram analysis, and other language-specific features to make its determinations.
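To make this concrete, here is a toy sketch of character trigram profiling and scoring; a production detector uses much larger reference profiles per language plus smoothing, so treat this purely as an illustration of the shape of the computation.

// Build a frequency profile of character trigrams for a text sample
function trigramProfile(text) {
  const normalized = text.toLowerCase().replace(/\s+/g, " ");
  const counts = {};
  for (let i = 0; i < normalized.length - 2; i++) {
    const gram = normalized.slice(i, i + 3);
    counts[gram] = (counts[gram] || 0) + 1;
  }
  return counts;
}

// Score a sample profile against a reference profile (higher overlap = better match)
function overlapScore(sample, reference) {
  let score = 0;
  for (const gram in sample) {
    if (reference[gram]) score += Math.min(sample[gram], reference[gram]);
  }
  return score;
}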
Detection Process
- Text normalization and preprocessing
- Feature extraction (character n-grams, word patterns)
- Language model scoring against reference profiles
- Confidence calculation based on comparative scores
- Application of validation rules and policy constraints
- Generation of detailed language analysis report
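Continuing the toy sketch above, the steps might fit together as follows; referenceProfiles is assumed to map each supported language code to a precomputed trigram profile, and only the "primary" match type is modeled.

function detectAndValidate(text, referenceProfiles, config) {
  const sample = trigramProfile(text);                        // steps 1-2: normalize and extract features
  const scores = {};
  for (const lang in referenceProfiles) {                     // step 3: score against each reference profile
    scores[lang] = overlapScore(sample, referenceProfiles[lang]);
  }
  const total = Object.values(scores).reduce((a, b) => a + b, 0) || 1;
  const detected = Object.keys(scores)
    .map(lang => ({ language: lang, confidence: scores[lang] / total }))  // step 4: comparative confidences
    .sort((a, b) => b.confidence - a.confidence);
  if (detected.length === 0) {
    return { detectedLanguages: [], primaryLanguage: null, isValidLanguage: false };
  }
  const top = detected[0];
  const isValid = config.validLanguages.includes(top.language) &&         // step 5: policy constraints
    top.confidence >= config.confidenceThreshold;
  return { detectedLanguages: detected, primaryLanguage: top.language, isValidLanguage: isValid };  // step 6: report
}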
Use Cases
- Content Routing: Direct content to language-specific processing pipelines (see the routing sketch after this list)
- Policy Enforcement: Ensure content adheres to defined language requirements
- Automatic Translation: Identify content requiring translation
- Content Categorization: Organize and tag content by language
- User Experience: Direct users to appropriate language-specific resources
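For the content-routing case, a sketch of dispatching on the detected primary language; the pipeline handlers and handleUnsupportedLanguage are hypothetical, and languageScanner refers to an instance configured as in the Implementation Example below.

const pipelines = {
  en: processEnglishContent,   // hypothetical language-specific handlers
  es: processSpanishContent,
  fr: processFrenchContent
};

function routeContent(text) {
  const result = languageScanner.analyze(text);
  const handler = pipelines[result.primaryLanguage];
  if (result.safetyStatus === "Safe" && handler) {
    return handler(result.processedText);     // dispatch to the language-specific pipeline
  }
  return handleUnsupportedLanguage(result);   // e.g. queue for translation or manual review
}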
Implementation Example
const languageScanner = new LanguageScanner({
  validLanguages: ["en", "es", "fr", "de"],   // accept English, Spanish, French, and German
  confidenceThreshold: 0.7,                   // require at least 70% confidence
  matchType: "primary"                        // validate only the dominant language
});

const inputText = "Bonjour, comment allez-vous aujourd'hui?";
const result = languageScanner.analyze(inputText);

// Output:
// {
//   processedText: "Bonjour, comment allez-vous aujourd'hui?",
//   safetyStatus: "Safe",
//   riskScore: 0.0,
//   detectedLanguages: [
//     { language: "fr", confidence: 0.96 },
//     { language: "ca", confidence: 0.03 },
//     { language: "it", confidence: 0.01 }
//   ],
//   primaryLanguage: "fr",
//   isValidLanguage: true
// }
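For contrast, running the same scanner on text outside the configured languages would be expected to produce a failing result; the values below are illustrative.

const ptText = "Bom dia, como você está hoje?";   // Portuguese, not in validLanguages
const ptResult = languageScanner.analyze(ptText);
// Expected shape (illustrative values):
// { safetyStatus: "Unsafe", riskScore: 1.0, primaryLanguage: "pt", isValidLanguage: false, ... }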
Best Practices
- Use longer text samples (50+ characters) for more accurate language detection
- Set appropriate confidence thresholds based on your application's requirements
- Consider implementing fallback mechanisms for low-confidence detections (a sketch follows this list)
- Be aware that similar languages (e.g., Spanish and Portuguese) can be confused with each other when the text sample is short
- Combine with content analysis to handle multilingual documents appropriately
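A sketch of the fallback idea from the list above: when no detection clears the threshold, widen the sample or escalate to review. It assumes detectedLanguages is sorted by confidence, as in the Implementation Example, and gatherMoreContext and flagForManualReview are placeholders for your own logic.

function detectWithFallback(text, scanner, threshold) {
  const result = scanner.analyze(text);
  const top = result.detectedLanguages[0];            // assumes results are sorted by confidence

  if (top && top.confidence >= threshold) {
    return result;                                    // confident detection, use as-is
  }
  if (text.length < 50) {
    return scanner.analyze(gatherMoreContext(text));  // e.g. include surrounding sentences
  }
  return flagForManualReview(result);                 // route ambiguous content to a human
}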