Language Scanner Agent

The Language Scanner Agent identifies and validates the languages used in text content. It provides language detection capabilities to enforce language policies, route content to appropriate processing pipelines, and enhance content management workflows.

Language Scanner Agent interface and configuration

Localization Notice: Language detection accuracy may vary for very short text samples or languages with similar linguistic patterns. For critical applications, consider setting a minimum confidence threshold and implementing additional verification steps.

Component Inputs

Input Text: The text content to be analyzed for language identification
Example: "Buenos días, ¿cómo estás hoy?"
Valid Languages: List of language codes that are considered acceptable
Example: "en,es,fr,de,it"
Confidence Threshold: Minimum confidence level required for language detection
Range: 0.0 to 1.0 (default: 0.6)
Match Type: How to evaluate text in which more than one language is detected
Options: "primary" (dominant language) or "any" (any detected language)
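The inputs above can be gathered into a single configuration object. The following is a minimal sketch. It assumes Valid Languages may arrive either as the comma-separated string shown in the example or as an array like the one in the Implementation Example. `normalizeScannerConfig` is a hypothetical helper, not part of the component's documented API.

```javascript
// Hypothetical helper (not part of the component's documented API): normalizes
// the Language Scanner inputs before they are used.
function normalizeScannerConfig({
  validLanguages = [],
  confidenceThreshold = 0.6, // documented default
  matchType = "primary",
} = {}) {
  // Accept either a comma-separated string ("en,es,fr") or an array of codes.
  const rawCodes = Array.isArray(validLanguages)
    ? validLanguages
    : String(validLanguages).split(",");
  const codes = rawCodes
    .map((code) => String(code).trim().toLowerCase())
    .filter((code) => code.length > 0);

  // The documented range is 0.0 to 1.0; reject anything outside it (or NaN).
  if (
    typeof confidenceThreshold !== "number" ||
    !(confidenceThreshold >= 0 && confidenceThreshold <= 1)
  ) {
    throw new RangeError("confidenceThreshold must be between 0.0 and 1.0");
  }
  if (matchType !== "primary" && matchType !== "any") {
    throw new TypeError('matchType must be "primary" or "any"');
  }

  return {
    validLanguages: [...new Set(codes)], // remove duplicate codes
    confidenceThreshold,
    matchType,
  };
}

const config = normalizeScannerConfig({ validLanguages: "en,es,fr,de,it" });
// config.validLanguages → ["en", "es", "fr", "de", "it"]
// config.confidenceThreshold → 0.6 (default), config.matchType → "primary"
```

Accepting both forms keeps the string format from the input example working while also allowing arrays.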

Component Outputs

Processed Text: The analyzed text, possibly annotated with language markup
Safety Status: Indicator of whether the content meets language requirements
Values: Safe (valid language), Unsafe (invalid language), Warning (uncertain detection)
Risk Score: Measure of language policy violation risk
Scale: 0.0 (compliant) to 1.0 (non-compliant)
Detected Languages: List of identified languages with confidence scores
Example: [{ language: "es", confidence: 0.94 }, { language: "pt", confidence: 0.05 }]
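The sketch below shows one plausible way these outputs could relate to each other. Detections are ranked, filtered by the confidence threshold, and checked against the valid-language list according to the match type.

Several parts of it are assumptions for illustration:

- The decision rules.
- The risk formula: 0 when compliant, the detection's confidence when a disallowed language is confidently detected, and 1 minus the top confidence when nothing clears the threshold.
- The name `evaluateLanguagePolicy`.

The component's actual logic is not documented here.

```javascript
// Hypothetical sketch: derives Safety Status and Risk Score from a list of
// { language, confidence } detections. The decision rules and the risk
// formula below are illustrative assumptions, not the component's actual logic.
function evaluateLanguagePolicy(
  detected,
  { validLanguages, confidenceThreshold = 0.6, matchType = "primary" }
) {
  const allowed = new Set(validLanguages);
  // Highest-confidence detection first (sort a copy, leave the input alone).
  const ranked = [...detected].sort((a, b) => b.confidence - a.confidence);
  const confident = ranked.filter((d) => d.confidence >= confidenceThreshold);
  // "primary" judges only the dominant language; "any" judges every confident one.
  const candidates = matchType === "primary" ? confident.slice(0, 1) : confident;

  if (candidates.length === 0) {
    // No detection clears the threshold: report uncertainty.
    const top = ranked.length > 0 ? ranked[0].confidence : 0;
    return { safetyStatus: "Warning", riskScore: 1 - top };
  }
  if (candidates.some((d) => allowed.has(d.language))) {
    return { safetyStatus: "Safe", riskScore: 0 };
  }
  // Confidently detected, but not an allowed language.
  return { safetyStatus: "Unsafe", riskScore: candidates[0].confidence };
}

const detected = [
  { language: "es", confidence: 0.94 },
  { language: "pt", confidence: 0.05 },
];
evaluateLanguagePolicy(detected, { validLanguages: ["en", "es", "fr", "de", "it"] });
// → { safetyStatus: "Safe", riskScore: 0 }
evaluateLanguagePolicy(detected, { validLanguages: ["en"] });
// → { safetyStatus: "Unsafe", riskScore: 0.94 }
```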

Supported Languages

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Russian (ru)
Dutch (nl)
Chinese (zh)
Japanese (ja)
Korean (ko)
Arabic (ar)
Hindi (hi)
Turkish (tr)
Polish (pl)
Vietnamese (vi)
Thai (th)
Swedish (sv)
Greek (el)
Hebrew (he)
Finnish (fi)
Danish (da)
Norwegian (no)
+ 70 more

How It Works

The Language Scanner Agent employs statistical language models and linguistic pattern analysis to identify the language of text content. It considers character frequency, word structure, n-gram analysis, and other language-specific features to make its determinations.
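To make the character n-gram features concrete, the sketch below normalizes text and builds a relative-frequency profile of character trigrams. This corresponds roughly to the first two steps of the detection process.

The following are assumptions, not the agent's documented implementation:

- The preprocessing choices: NFC normalization, lowercasing, replacing non-letters with spaces, and padding words with spaces.
- The names `normalizeText` and `ngramProfile`.

```javascript
// Hypothetical sketch of normalization and character n-gram extraction.
// The exact preprocessing used by the agent is not documented; these choices
// are illustrative.
function normalizeText(text) {
  return text
    .normalize("NFC") // compose accented characters consistently
    .toLowerCase()
    .replace(/[^\p{L}\s]+/gu, " ") // replace digits/punctuation with spaces
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

// Returns a Map from each character n-gram to its relative frequency.
function ngramProfile(text, n = 3) {
  const counts = new Map();
  let total = 0;
  for (const word of normalizeText(text).split(" ")) {
    if (word === "") continue; // empty input yields a single "" word
    // Pad with spaces so word-initial and word-final n-grams are captured.
    const chars = Array.from(` ${word} `); // split by code point, not UTF-16 unit
    for (let i = 0; i + n <= chars.length; i++) {
      const gram = chars.slice(i, i + n).join("");
      counts.set(gram, (counts.get(gram) || 0) + 1);
      total++;
    }
  }
  // Convert raw counts to frequencies so long and short texts are comparable.
  for (const [gram, count] of counts) counts.set(gram, count / total);
  return counts;
}

normalizeText("Buenos días, ¿cómo estás hoy?"); // → "buenos días cómo estás hoy"
ngramProfile("hola hola").get("hol"); // → 0.25 (4 distinct trigrams, each seen twice)
```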

Detection Process

Text normalization and preprocessing
Feature extraction (character n-grams, word patterns)
Language model scoring against reference profiles
Confidence calculation based on comparative scores
Application of validation rules and policy constraints
Generation of detailed language analysis report
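The scoring and confidence steps can be illustrated by comparing a text profile against per-language reference profiles. This is a minimal sketch, assuming profiles are Maps from n-gram to relative frequency.

It uses cosine similarity and divides each score by the total. That is one simple choice among many; a production system might use log-probabilities or calibrated scores instead. The function names and the toy reference profiles are hypothetical.

```javascript
// Hypothetical sketch of the scoring and confidence steps: compare a text
// profile with per-language reference profiles (Maps of n-gram -> relative
// frequency) using cosine similarity, then normalize the scores so the
// confidences sum to 1.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [gram, weight] of a) dot += weight * (b.get(gram) || 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((sum, w) => sum + w * w, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

function scoreLanguages(textProfile, referenceProfiles) {
  const scores = Object.entries(referenceProfiles).map(([language, profile]) => ({
    language,
    score: cosineSimilarity(textProfile, profile),
  }));
  const total = scores.reduce((sum, s) => sum + s.score, 0);
  return scores
    .map(({ language, score }) => ({
      language,
      // Each language's share of the total similarity (all 0 if nothing matched).
      confidence: total === 0 ? 0 : score / total,
    }))
    .sort((x, y) => y.confidence - x.confidence);
}

// Toy reference profiles, made up for illustration only.
const references = {
  en: new Map([["the", 0.5], ["and", 0.5]]),
  fr: new Map([["les", 0.5], ["est", 0.5]]),
};
scoreLanguages(new Map([["les", 1]]), references);
// → [{ language: "fr", confidence: 1 }, { language: "en", confidence: 0 }]
```

Dividing by the total makes the confidences sum to 1, which matches the shape of the detected-language lists on this page. The resulting numbers are relative shares rather than calibrated probabilities.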

Use Cases

Content Routing: Direct content to language-specific processing pipelines
Policy Enforcement: Ensure content adheres to defined language requirements
Automatic Translation: Identify content requiring translation
Content Categorization: Organize and tag content by language
User Experience: Direct users to appropriate language-specific resources
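For content routing, the primary language in a scan result can serve as a lookup key into a table of pipelines. The sketch below assumes a result object with the `safetyStatus` and `primaryLanguage` fields shown in the Implementation Example. The pipeline names, the `manual-review` fallback, and `routeContent` itself are hypothetical.

```javascript
// Hypothetical routing table: language code -> pipeline name (names are made up).
const pipelines = new Map([
  ["en", "english-pipeline"],
  ["es", "spanish-pipeline"],
  ["fr", "french-pipeline"],
]);

// Picks a pipeline for a scanner result shaped like the Implementation Example
// output. Anything not "Safe", or in a language without a dedicated
// pipeline, goes to a fallback queue.
function routeContent(result, fallback = "manual-review") {
  if (result.safetyStatus !== "Safe") return fallback;
  return pipelines.get(result.primaryLanguage) ?? fallback;
}

routeContent({ safetyStatus: "Safe", primaryLanguage: "fr" }); // → "french-pipeline"
routeContent({ safetyStatus: "Safe", primaryLanguage: "de" }); // → "manual-review"
routeContent({ safetyStatus: "Warning", primaryLanguage: "fr" }); // → "manual-review"
```

A `Map` is used instead of a plain object so that unexpected codes such as `"toString"` cannot match inherited object properties.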

Implementation Example

const languageScanner = new LanguageScanner({
  validLanguages: ["en", "es", "fr", "de"],
  confidenceThreshold: 0.7,
  matchType: "primary"
});

const inputText = "Bonjour, comment allez-vous aujourd'hui?";
const result = languageScanner.analyze(inputText);

// Output:
// {
//   processedText: "Bonjour, comment allez-vous aujourd'hui?",
//   safetyStatus: "Safe",
//   riskScore: 0.0,
//   detectedLanguages: [
//     { language: "fr", confidence: 0.96 },
//     { language: "ca", confidence: 0.03 },
//     { language: "it", confidence: 0.01 }
//   ],
//   primaryLanguage: "fr",
//   isValidLanguage: true
// }

Useful Resources

Best Practices

Use longer text samples (50+ characters) for more accurate language detection
Set appropriate confidence thresholds based on your application's requirements
Consider implementing fallback mechanisms for low-confidence detections
Be aware that similar languages (e.g., Spanish/Portuguese) may be confused when the text is short
Combine with content analysis to handle multilingual documents appropriately
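The first three practices can be combined into a small gate. It trusts a detection only when the sample is long enough and the top confidence clears the threshold, and otherwise defers to a fallback. The 50-character minimum comes from the list above. The name `resolveLanguage`, the `needsReview` flag, and returning `null` as the fallback are assumptions made for this sketch.

```javascript
const MIN_SAMPLE_LENGTH = 50; // characters, per the guideline above

// Hypothetical helper: trusts the top detection only when the sample is long
// enough and its confidence clears the threshold. Otherwise it returns no
// language and flags the text for an additional verification step.
function resolveLanguage(text, detectedLanguages, { confidenceThreshold = 0.6 } = {}) {
  const top = [...detectedLanguages].sort((a, b) => b.confidence - a.confidence)[0];
  // Count code points so characters outside the Basic Multilingual Plane count once.
  const longEnough = Array.from(text.trim()).length >= MIN_SAMPLE_LENGTH;
  if (top && longEnough && top.confidence >= confidenceThreshold) {
    return { language: top.language, needsReview: false };
  }
  return { language: null, needsReview: true };
}

resolveLanguage("Olá!", [{ language: "pt", confidence: 0.9 }]);
// → { language: null, needsReview: true } (too short to trust)
resolveLanguage(
  "Bonjour, comment allez-vous aujourd'hui? Il fait très beau ce matin.",
  [{ language: "fr", confidence: 0.96 }, { language: "it", confidence: 0.01 }]
);
// → { language: "fr", needsReview: false }
```

This gate would flag the 40-character French sentence from the Implementation Example for review despite its 0.96 confidence. The length cutoff is therefore worth tuning to the content you expect.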
