OCRMYPDF Agent

The OCRMYPDF Agent integrates with the OCRmyPDF library to add an OCR text layer to scanned PDF documents. It converts non-searchable scanned PDFs into searchable documents while preserving the original visual appearance, enabling text extraction, search functionality, and improved accessibility.

Figure: OCRMYPDF Agent interface and configuration

Processing Note: OCR is CPU- and memory-intensive, and processing time grows with page count. Ensure your system has adequate memory and processing capacity, especially when handling long documents or pages with complex layouts.

Component Inputs

Source Type: The type of source for the PDF document
Options: "Local File", "URL", "Google Drive", etc.
Scanned PDF or ZIP File: Upload field for a local PDF, or a ZIP archive containing PDFs
Accepted formats: .pdf, .zip
Google Drive URL: URL link to a PDF stored in Google Drive
Example: "https://drive.google.com/file/d/abc123/view"
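
When the source is a Google Drive link, the file ID usually has to be pulled out of the URL before the file can be fetched. The following is a minimal sketch of that step, not the agent's actual implementation. The helper name extractDriveFileId is hypothetical, and the sketch assumes only two common link shapes: the /file/d/<id>/view form shown in the example above and the older open?id=<id> form.

```javascript
// Hypothetical helper (not part of the agent's documented API):
// returns the Drive file ID, or null if the URL is not a recognized Drive link.
function extractDriveFileId(driveUrl) {
  let parsed;
  try {
    parsed = new URL(driveUrl);
  } catch {
    return null; // not a valid absolute URL
  }
  if (parsed.hostname !== "drive.google.com") return null;

  // Form 1: https://drive.google.com/file/d/<id>/view
  const pathMatch = parsed.pathname.match(/^\/file\/d\/([^/]+)/);
  if (pathMatch) return pathMatch[1];

  // Form 2: https://drive.google.com/open?id=<id>
  return parsed.searchParams.get("id");
}

console.log(extractDriveFileId("https://drive.google.com/file/d/abc123/view"));
```

Returning null instead of throwing lets the caller decide how to report an invalid link to the user.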

Component Outputs

OCR Result: Text extracted from the PDF document
Example: Complete text content from all pages of the PDF
Processed PDF: The PDF document with an added OCR text layer
Searchable PDF with preserved visual formatting
Page Count: Number of pages processed
Integer value representing total pages in the document
Processing Metadata: Information about the OCR process
Includes processing time, OCR engine details, and confidence scores
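
Downstream steps may want to sanity-check these outputs before using them. The sketch below assumes the result object uses the field names shown in the Implementation Example (ocrText, pageCount, metadata.confidence) and that confidence is a number between 0 and 1. The function name checkOcrResult and the default threshold of 0.8 are illustrative assumptions.

```javascript
// Hypothetical helper: returns a list of human-readable problems found in
// an OCR result. An empty list means nothing suspicious was detected.
function checkOcrResult(result, minConfidence = 0.8) {
  const problems = [];

  if (typeof result.ocrText !== "string" || result.ocrText.trim() === "") {
    problems.push("ocrText is empty");
  }
  if (!Number.isInteger(result.pageCount) || result.pageCount < 1) {
    problems.push("pageCount is not a positive integer");
  }

  // Only compare confidence when the metadata actually provides a number.
  const confidence = result.metadata?.confidence;
  if (typeof confidence === "number" && confidence < minConfidence) {
    problems.push(`confidence ${confidence} is below ${minConfidence}`);
  }
  return problems;
}
```

A result with empty text but a valid page count often indicates blank or unreadable scans, which is worth surfacing separately from a hard failure.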

How It Works

The OCRMYPDF Agent uses the OCRmyPDF library to process scanned PDF documents. For each page, it identifies text regions, applies optical character recognition, and adds a searchable text layer underneath the original scanned image. The page looks the same as before, but its text can now be searched, selected, and extracted.
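
For orientation, here is how a single OCRmyPDF run can produce both the searchable PDF and the plain text. This is a sketch under the assumption that OCRmyPDF is driven through its command-line tool; how the agent actually calls the library is not documented here. The function name buildOcrCommand and the file names are hypothetical, while --sidecar is an OCRmyPDF option that writes the recognized text to a separate file.

```javascript
// Hypothetical helper: builds the command and argument list for one
// OCRmyPDF run. The run writes the searchable PDF to outputPdf and the
// recognized text to sidecarTxt.
function buildOcrCommand(inputPdf, outputPdf, sidecarTxt) {
  return {
    command: "ocrmypdf",
    args: ["--sidecar", sidecarTxt, inputPdf, outputPdf],
  };
}

const { command, args } = buildOcrCommand("scan.pdf", "scan.ocr.pdf", "scan.txt");
console.log(command, args.join(" "));
```

In Node.js, the command and argument list can be passed to child_process.execFile, which avoids shell quoting problems with file names that contain spaces.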

Processing Flow

Document acquisition from the selected source (upload, URL, or Google Drive)
PDF validation and preparation
Page-by-page content analysis
OCR processing with language detection
Text layer generation and embedding
Final PDF assembly with searchable text
Extraction of complete text content
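
The validation step can be illustrated with a content check that does not rely on the file extension. The sketch below is an assumption about one reasonable way to do it, not the agent's own logic. It relies on two well-known file signatures: PDF files begin with the bytes "%PDF-", and non-empty ZIP archives begin with "PK" followed by the bytes 0x03 0x04. The function name sniffUploadType is hypothetical.

```javascript
// Hypothetical helper: classifies an upload as "pdf", "zip", or "unknown"
// by inspecting its leading bytes. Accepts a Uint8Array or a Node.js Buffer.
function sniffUploadType(bytes) {
  const startsWith = (signature) =>
    bytes.length >= signature.length &&
    signature.every((byte, i) => bytes[i] === byte);

  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf"; // "%PDF-"
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip"; // "PK" 0x03 0x04
  return "unknown";
}

console.log(sniffUploadType(new TextEncoder().encode("%PDF-1.7\n")));
```

Rejecting "unknown" uploads early avoids spending OCR time on files that were renamed to .pdf but contain something else.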

Use Cases

Document Archiving: Make archived documents searchable
Legal Document Processing: Convert scanned legal documents for text search
Research Material: Make scanned academic papers and research documents searchable
Historical Document Preservation: Digitize and make searchable historical records
Bulk Document Processing: Process multiple scanned documents in batches

Implementation Example

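// Configure the agent for a locally uploaded file.
// sourceType uses one of the options listed under Component Inputs.
const ocrmypdfAgent = new OCRmyPDFAgent({
  sourceType: "Local File",
  outputFormat: "pdf"
});

// Process a locally uploaded file.
// getFileBuffer() and uploadedFile are not defined here; they are assumed
// to be supplied by the surrounding application.
const fileBuffer = await getFileBuffer(uploadedFile);
const result = await ocrmypdfAgent.processDocument(fileBuffer);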

// Output:
// {
//   ocrText: "This is the complete extracted text from the document...",
//   processedPdf: <Buffer ...>, // PDF with OCR layer
//   pageCount: 5,
//   metadata: {
//     processingTime: "2.3s",
//     confidence: 0.95,
//     engine: "Tesseract 5.0.0",
//     languages: ["eng"]
//   }
// }

Useful Resources

Best Practices

Use high-quality scans for the best OCR results
Ensure documents are properly oriented before processing
For multi-language documents, specify the languages to improve accuracy
When processing many documents, submit them together as a batch (for example, a ZIP archive of PDFs) rather than one at a time
Use image preprocessing options for challenging documents with low contrast
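
Several of these practices correspond to options of the OCRmyPDF command-line tool. The sketch below maps a small settings object to those options. It is illustrative only: the function name ocrFlagsFor and the setting names (languages, fixOrientation, deskew, clean) are hypothetical, and whether the agent exposes these options is not stated here. The flags themselves (--language, --rotate-pages, --deskew, --clean) are OCRmyPDF options.

```javascript
// Hypothetical helper: translates best-practice settings into OCRmyPDF
// command-line flags.
function ocrFlagsFor({
  languages = ["eng"],
  fixOrientation = false,
  deskew = false,
  clean = false,
} = {}) {
  // Multiple Tesseract language codes are joined with "+", e.g. "eng+deu".
  const flags = ["--language", languages.join("+")];
  if (fixOrientation) flags.push("--rotate-pages"); // correct page orientation
  if (deskew) flags.push("--deskew"); // straighten slightly tilted scans
  if (clean) flags.push("--clean"); // clean scan artifacts before OCR; needs the external unpaper tool
  return flags;
}

console.log(ocrFlagsFor({ languages: ["eng", "deu"], fixOrientation: true }));
```

Each language passed this way must have its Tesseract language data installed, so it is usually better to list only the languages that actually appear in the documents.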

Documentation