OCRMYPDF Agent
The OCRMYPDF Agent integrates with the OCRmyPDF library to add an OCR text layer to scanned PDF documents. It converts non-searchable scanned PDFs into searchable documents while preserving the original visual appearance, enabling text extraction, search functionality, and improved accessibility.

OCRMYPDF Agent interface and configuration
Processing Note: OCR processing can be resource-intensive for large documents. For optimal performance, ensure your system has adequate memory and processing capacity, especially when handling multi-page documents with complex layouts.
Component Inputs
- Source Type: The type of source for the PDF document
Options: "Local File", "URL", "Google Drive", etc.
- Scanned PDF or ZIP File: Upload field for a local PDF file, or a ZIP archive containing PDFs
Accepted formats: .pdf, .zip
- Google Drive URL: Link to a PDF stored in Google Drive
Example: "https://drive.google.com/file/d/abc123/view"
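The two inputs above imply some basic checks before processing starts. The sketch below shows one way to do them in Node.js. The helper names (isAcceptedUpload, extractDriveFileId) are hypothetical, the agent's actual validation is not documented, and the Drive helper assumes share links of the form shown in the example (/file/d/<ID>/view).

```javascript
const path = require("node:path");

// Extensions accepted by the "Scanned PDF or ZIP File" input.
const ACCEPTED_EXTENSIONS = new Set([".pdf", ".zip"]);

// Returns true if the file name ends in an accepted extension (case-insensitive).
function isAcceptedUpload(fileName) {
  return ACCEPTED_EXTENSIONS.has(path.extname(fileName).toLowerCase());
}

// Extracts the file ID from a Google Drive share link of the form
// https://drive.google.com/file/d/<ID>/view. Returns null otherwise.
function extractDriveFileId(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return null; // not a valid URL at all
  }
  if (parsed.hostname !== "drive.google.com") return null;
  const match = parsed.pathname.match(/^\/file\/d\/([^/]+)/);
  return match ? match[1] : null;
}

console.log(isAcceptedUpload("scan.PDF")); // true
console.log(isAcceptedUpload("notes.docx")); // false
console.log(extractDriveFileId("https://drive.google.com/file/d/abc123/view")); // abc123
```

Checking the extension is only a first filter; a renamed file can still carry the wrong content, so a content-based check is also worthwhile.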
Component Outputs
- OCR Result: Text extracted from the PDF document
Example: Complete text content from all pages of the PDF
- Processed PDF: The PDF document with an added OCR text layer
Searchable PDF with preserved visual formatting
- Page Count: Number of pages processed
Integer value representing total pages in the document
- Processing Metadata: Information about the OCR process
Includes processing time, OCR engine details, and confidence scores
How It Works
The OCRMYPDF Agent uses the OCRmyPDF library to process scanned PDF documents. It analyzes each page, identifies text regions, runs optical character recognition (OCRmyPDF uses the Tesseract OCR engine), and adds an invisible, searchable text layer aligned with the original scanned image. The page keeps its original appearance, while its text can now be selected, searched, and extracted.
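OCRmyPDF is available both as a Python library and as the ocrmypdf command-line tool; how the agent calls it internally is not documented. As an illustration only, the sketch below invokes the real ocrmypdf CLI from Node.js. The --sidecar flag asks OCRmyPDF to also write the recognized text to a separate text file, which corresponds to the "OCR Result" output. The function names are hypothetical, and runOcrmypdf is defined but not executed here because it needs ocrmypdf and Tesseract installed.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Builds the argument list for the ocrmypdf command line.
// --sidecar writes the recognized text to a .txt file next to the output PDF.
function buildOcrmypdfArgs({ input, output, sidecar }) {
  const args = [];
  if (sidecar) args.push("--sidecar", sidecar);
  args.push(input, output); // positional: input PDF, then output PDF
  return args;
}

// Runs ocrmypdf (must be on PATH, with Tesseract installed).
// Rejects if ocrmypdf exits with a non-zero status.
async function runOcrmypdf(options) {
  const { stdout, stderr } = await execFileAsync("ocrmypdf", buildOcrmypdfArgs(options));
  return { stdout, stderr };
}

console.log(buildOcrmypdfArgs({ input: "scan.pdf", output: "scan-ocr.pdf", sidecar: "scan.txt" }));
// [ '--sidecar', 'scan.txt', 'scan.pdf', 'scan-ocr.pdf' ]
```

Passing arguments as an array to execFile (rather than building a shell string) avoids quoting problems with file names that contain spaces.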
Processing Flow
- Document acquisition from the selected source (upload, URL, or Google Drive)
- PDF validation and preparation
- Page-by-page content analysis
- OCR processing with language detection
- Text layer generation and embedding
- Final PDF assembly with searchable text
- Extraction of complete text content
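The "PDF validation and preparation" step can begin by checking the file's content rather than its name. A minimal sketch, assuming validation works on the raw upload as a Node.js Buffer: PDF files start with the bytes "%PDF-", and a ZIP archive normally starts with the local file header "PK\x03\x04". The function name sniffFileType is hypothetical and is not the agent's documented behavior.

```javascript
// Leading "magic" bytes for the two accepted formats.
const PDF_MAGIC = Buffer.from("%PDF-", "latin1");
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]); // "PK\x03\x04"

// Classifies a buffer as "pdf", "zip", or "unknown" by its first bytes,
// independent of the file name or extension.
function sniffFileType(buffer) {
  if (buffer.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC)) return "pdf";
  if (buffer.subarray(0, ZIP_MAGIC.length).equals(ZIP_MAGIC)) return "zip";
  return "unknown";
}

console.log(sniffFileType(Buffer.from("%PDF-1.7\n"))); // pdf
console.log(sniffFileType(Buffer.from("hello"))); // unknown
```

This check is deliberately strict: an empty ZIP archive (which starts with "PK\x05\x06") or a PDF with stray bytes before its header would be reported as "unknown".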
Use Cases
- Document Archiving: Make archived documents searchable
- Legal Document Processing: Convert scanned legal documents for text search
- Research Material: Make scanned academic papers and research documents searchable
- Historical Document Preservation: Make digitized historical records searchable
- Bulk Document Processing: Process multiple scanned documents in batches
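Because OCR is resource-intensive, batch processing usually benefits from a cap on how many documents are processed at once. The sketch below is a generic bounded-concurrency helper in Node.js; mapWithConcurrency is a hypothetical name, and the task it runs would be whatever processes one document (for example, the processDocument call from the implementation example).

```javascript
// Runs `task` over every item with at most `limit` tasks in flight,
// returning results in the same order as `items`.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index. Claiming is
  // safe because JavaScript runs this synchronous code on a single thread.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}

// Usage (ocrmypdfAgent and buffers are hypothetical here):
// const results = await mapWithConcurrency(buffers, 2, (buf) => ocrmypdfAgent.processDocument(buf));
```

If any task rejects, the returned promise rejects with that error; wrap the task in try/catch if one failed document should not stop reporting on the rest.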
Implementation Example
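import { readFile } from "node:fs/promises";
// OCRmyPDFAgent is supplied by the platform; its import path depends on your setup
const ocrmypdfAgent = new OCRmyPDFAgent({
  sourceType: "Local File", // one of the documented Source Type options
  outputFormat: "pdf"
});
// Process a local file: read it into a Buffer, then run OCR on it
const fileBuffer = await readFile("scanned.pdf");
const result = await ocrmypdfAgent.processDocument(fileBuffer);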
// Output:
// {
// ocrText: "This is the complete extracted text from the document...",
// processedPdf: <Buffer ...>, // PDF with OCR layer
// pageCount: 5,
// metadata: {
// processingTime: "2.3s",
// confidence: 0.95,
// engine: "Tesseract 5.0.0",
// languages: ["eng"]
// }
// }
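Results with a low confidence score or no extracted text are good candidates for manual review. The sketch below uses the field names from the example output above (ocrText, metadata.confidence); the function name reviewReasons and the 0.8 threshold are hypothetical choices, not values defined by the agent.

```javascript
// Hypothetical threshold; tune it for your documents.
const MIN_CONFIDENCE = 0.8;

// Returns reasons why an OCR result may need manual review.
// An empty array means nothing was flagged.
function reviewReasons(result, minConfidence = MIN_CONFIDENCE) {
  const reasons = [];
  const confidence = result?.metadata?.confidence;
  if (typeof confidence !== "number") {
    reasons.push("no confidence score reported");
  } else if (confidence < minConfidence) {
    reasons.push(`confidence ${confidence} is below ${minConfidence}`);
  }
  if (!result?.ocrText || result.ocrText.trim() === "") {
    reasons.push("no text was extracted");
  }
  return reasons;
}

console.log(reviewReasons({ ocrText: "Invoice 42", metadata: { confidence: 0.95 } })); // []
```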
Best Practices
- Use high-quality scans for the best OCR results
- Ensure documents are properly oriented before processing
- For multi-language documents, specify the languages to improve accuracy
- Process documents in batches for more efficient handling
- Use image preprocessing options for challenging documents with low contrast
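Several of these practices correspond to real ocrmypdf command-line flags: --language takes Tesseract language codes joined with "+" (for example eng+deu), --rotate-pages corrects page orientation, --deskew straightens skewed scans, and --clean cleans pages with unpaper before OCR. Whether the agent exposes these settings is not documented; the sketch below only shows how hypothetical option names could map onto those flags.

```javascript
// Maps hypothetical option names to real ocrmypdf flags.
function buildPreprocessingArgs({ languages = [], rotatePages = false, deskew = false, clean = false } = {}) {
  const args = [];
  // Tesseract language codes joined with "+", e.g. "eng+deu".
  if (languages.length > 0) args.push("--language", languages.join("+"));
  // Detect and correct page orientation before OCR.
  if (rotatePages) args.push("--rotate-pages");
  // Straighten slightly rotated scans.
  if (deskew) args.push("--deskew");
  // Clean pages with unpaper before OCR (unpaper must be installed).
  if (clean) args.push("--clean");
  return args;
}

console.log(buildPreprocessingArgs({ languages: ["eng", "deu"], deskew: true }));
// [ '--language', 'eng+deu', '--deskew' ]
```

These flags would be placed before the input and output file names on the ocrmypdf command line.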