OCRMYPDF Agent

The OCRMYPDF Agent integrates with the OCRmyPDF library to add an OCR text layer to scanned PDF documents. It converts non-searchable scanned PDFs into searchable documents while preserving the original visual appearance, enabling text extraction, search functionality, and improved accessibility.

[Figure: OCRMYPDF Agent interface and configuration]

Processing Note: OCR processing can be resource-intensive for large documents. For optimal performance, ensure your system has adequate memory and processing capacity, especially when handling multi-page documents with complex layouts.

Component Inputs

  • Source Type: Where the PDF document is loaded from

    Options: "Local File", "URL", "Google Drive", etc.

  • Scanned PDF or ZIP File: Upload field for a local PDF or a ZIP archive containing PDFs

    Accepted formats: .pdf, .zip

  • Google Drive URL: Link to a PDF stored in Google Drive

    Example: "https://drive.google.com/file/d/abc123/view"
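
A Google Drive link of the form shown above carries the file ID in its path. The sketch below shows one way that ID could be extracted before downloading. The helper names `extractDriveFileId` and `toDriveDownloadUrl` are hypothetical, and the direct-download URL pattern is an assumption that generally only works for publicly shared files; the agent's own handling may differ.

```javascript
// Hypothetical helper: pulls the file ID out of a Google Drive sharing link.
// Returns null when the link is not a recognizable Drive file URL.
function extractDriveFileId(link) {
  let url;
  try {
    url = new URL(link);
  } catch {
    return null; // not a parseable URL
  }
  if (url.hostname !== "drive.google.com") return null;

  // Form 1: https://drive.google.com/file/d/<id>/view
  const pathMatch = url.pathname.match(/^\/file\/d\/([^/]+)/);
  if (pathMatch) return pathMatch[1];

  // Form 2: https://drive.google.com/open?id=<id>
  return url.searchParams.get("id");
}

// Assumed direct-download form; typically requires the file to be shared publicly.
function toDriveDownloadUrl(fileId) {
  return `https://drive.google.com/uc?export=download&id=${encodeURIComponent(fileId)}`;
}
```

Rejecting links from other hosts early keeps malformed input from reaching the download step.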

Component Outputs

  • OCR Result: Text extracted from the PDF document

    Contains the complete text content from all pages of the PDF

  • Processed PDF: The PDF document with an added OCR text layer

    Searchable PDF with preserved visual formatting

  • Page Count: Number of pages processed

    Integer value representing total pages in the document

  • Processing Metadata: Information about the OCR process

    Includes processing time, OCR engine details, and confidence scores
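
Downstream code may want to check that a result has the four outputs listed above before using it. The following is a minimal sketch: the field names (`ocrText`, `processedPdf`, `pageCount`, `metadata`) are taken from the implementation example on this page, while `validateOcrResult` and its rules are illustrative assumptions rather than part of the agent.

```javascript
// Illustrative check of the documented outputs. Field names follow this
// page's implementation example; the validation rules are assumptions.
function validateOcrResult(result) {
  if (result === null || typeof result !== "object") {
    return ["result should be an object"];
  }
  const problems = [];
  if (typeof result.ocrText !== "string") {
    problems.push("ocrText should be a string");
  }
  // Node.js Buffers are Uint8Array instances, so this accepts both.
  if (!(result.processedPdf instanceof Uint8Array)) {
    problems.push("processedPdf should be binary data");
  }
  if (!Number.isInteger(result.pageCount) || result.pageCount < 0) {
    problems.push("pageCount should be a non-negative integer");
  }
  if (result.metadata === null || typeof result.metadata !== "object") {
    problems.push("metadata should be an object");
  }
  return problems; // an empty array means the shape looks right
}
```

Returning a list of problems instead of throwing lets a caller log every mismatch at once.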

How It Works

The OCRMYPDF Agent uses the OCRmyPDF library to process scanned PDF documents. It analyzes each page, identifies text regions, applies optical character recognition, and adds a searchable text layer underneath the original scanned images. This preserves the original appearance while making the text searchable, selectable, and extractable.
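
To see the same behavior outside the agent, the `ocrmypdf` command-line tool can be called directly. The sketch below assumes `ocrmypdf` is installed and on the PATH; `buildBasicOcrArgs` and `runOcrmypdf` are hypothetical names, and this page does not state whether the agent itself uses the CLI or the Python API.

```javascript
// Builds the argument list for: ocrmypdf [--sidecar <txt>] <input> <output>
function buildBasicOcrArgs(inputPdf, outputPdf, sidecarTxt) {
  const args = [];
  if (sidecarTxt) {
    // --sidecar also writes the recognized text to a plain-text file.
    args.push("--sidecar", sidecarTxt);
  }
  args.push(inputPdf, outputPdf);
  return args;
}

// Runs the CLI; rejects if ocrmypdf is missing or exits with an error.
async function runOcrmypdf(inputPdf, outputPdf, sidecarTxt) {
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const run = promisify(execFile);
  await run("ocrmypdf", buildBasicOcrArgs(inputPdf, outputPdf, sidecarTxt));
}
```

For example, `runOcrmypdf("scan.pdf", "searchable.pdf", "scan.txt")` would produce both a searchable PDF and a text file, corresponding to the Processed PDF and OCR Result outputs.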

Processing Flow

  1. Document acquisition from the selected source (upload, URL, or Google Drive)
  2. PDF validation and preparation
  3. Page-by-page content analysis
  4. OCR processing with language detection
  5. Text layer generation and embedding
  6. Final PDF assembly with searchable text
  7. Extraction of complete text content
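
The validation in step 2 can start with a cheap check of the file's leading bytes, since the component accepts both PDF and ZIP uploads. This is a minimal sketch, not the agent's actual validation: `detectUploadKind` is a hypothetical name, and the check is deliberately strict (it expects the signature at byte 0).

```javascript
// Classifies an upload by its leading "magic" bytes.
// `bytes` is a Uint8Array (a Node.js Buffer also works).
function detectUploadKind(bytes) {
  const startsWith = (signature) =>
    bytes.length >= signature.length &&
    signature.every((value, index) => bytes[index] === value);

  // PDF files begin with the ASCII header "%PDF-".
  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf";

  // ZIP archives with at least one entry begin with "PK\x03\x04".
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip";

  return "unknown";
}
```

A file that passes this check can still be corrupt or encrypted, so it complements full PDF parsing rather than replacing it.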

Use Cases

  • Document Archiving: Make archived documents searchable
  • Legal Document Processing: Make scanned legal documents text-searchable
  • Research Material: Make scanned academic papers and research documents searchable
  • Historical Document Preservation: Digitize historical records and make them searchable
  • Bulk Document Processing: Process multiple scanned documents in batches

Implementation Example

    const ocrmypdfAgent = new OCRmyPDFAgent({
      sourceType: "File",
      outputFormat: "pdf"
    });

    // Process a locally uploaded file
    const fileBuffer = await getFileBuffer(uploadedFile);
    const result = await ocrmypdfAgent.processDocument(fileBuffer);

    // Output:
    // {
    //   ocrText: "This is the complete extracted text from the document...",
    //   processedPdf: <Buffer ...>, // PDF with OCR layer
    //   pageCount: 5,
    //   metadata: {
    //     processingTime: "2.3s",
    //     confidence: 0.95,
    //     engine: "Tesseract 5.0.0",
    //     languages: ["eng"]
    //   }
    // }

Best Practices

  • Use high-quality scans for the best OCR results
  • Ensure documents are properly oriented before processing
  • For multi-language documents, specify the languages to improve accuracy
  • Process documents in batches for more efficient handling
  • Use image preprocessing options for challenging documents with low contrast
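
Several of these practices correspond to options of the `ocrmypdf` command-line tool. The sketch below maps them to flags, assuming the CLI is used directly; `buildBestPracticeArgs` and its option names are hypothetical, and how the agent exposes these settings is not documented here.

```javascript
// Maps best-practice settings to ocrmypdf CLI flags.
function buildBestPracticeArgs(inputPdf, outputPdf, options = {}) {
  const { languages = [], rotatePages = false, deskew = false, clean = false } = options;
  const args = [];

  // Multi-language documents: language codes are joined with "+", e.g. eng+deu.
  if (languages.length > 0) args.push("--language", languages.join("+"));

  // Orientation: let OCRmyPDF detect and fix rotated pages.
  if (rotatePages) args.push("--rotate-pages");

  // Preprocessing: straighten slightly skewed scans.
  if (deskew) args.push("--deskew");

  // Preprocessing: clean pages before OCR (needs the unpaper tool installed).
  if (clean) args.push("--clean");

  args.push(inputPdf, outputPdf);
  return args;
}
```

For a rotated English and German scan, `buildBestPracticeArgs("in.pdf", "out.pdf", { languages: ["eng", "deu"], rotatePages: true })` yields arguments equivalent to `ocrmypdf --language eng+deu --rotate-pages in.pdf out.pdf`.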