LeapOCR Docs

Output Formats

Understanding different output formats for extracted data

LeapOCR supports multiple output formats to match your processing needs. Choose the format that best fits your application architecture.

Format Types

Structured Format

Returns a single JSON object with extracted fields from the entire document.

Use case: Extract specific data points across a complete document

Example output:

{
  "invoice_number": "INV-2024-001",
  "total_amount": 1234.56,
  "invoice_date": "2024-01-15",
  "vendor_name": "ACME Corp",
  "line_items": [
    {
      "description": "Service Fee",
      "amount": 1000.0
    },
    {
      "description": "Tax",
      "amount": 234.56
    }
  ]
}

Best for:

  • Invoices
  • Forms with specific fields
  • Single-record documents
  • Database insertion
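For the database-insertion case, a minimal sketch of loading the structured result into SQLite might look like the following. It uses only the Python standard library and the sample output above. The table names, column layout, and the `store_invoice` helper are assumptions made for illustration and are not part of LeapOCR.

```python
import json
import sqlite3

# Sample structured output, copied from the example above.
raw = """
{
  "invoice_number": "INV-2024-001",
  "total_amount": 1234.56,
  "invoice_date": "2024-01-15",
  "vendor_name": "ACME Corp",
  "line_items": [
    {"description": "Service Fee", "amount": 1000.0},
    {"description": "Tax", "amount": 234.56}
  ]
}
"""


def store_invoice(conn: sqlite3.Connection, payload: str) -> None:
    """Insert one structured result into two hypothetical tables."""
    doc = json.loads(payload)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT PRIMARY KEY, vendor_name TEXT, "
        "invoice_date TEXT, total_amount REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS line_items ("
        "invoice_number TEXT, description TEXT, amount REAL)"
    )
    # Top-level fields become one row in the invoices table.
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?, ?)",
        (doc["invoice_number"], doc["vendor_name"],
         doc["invoice_date"], doc["total_amount"]),
    )
    # Each nested line item becomes one row keyed by the invoice number.
    conn.executemany(
        "INSERT INTO line_items VALUES (?, ?, ?)",
        [(doc["invoice_number"], item["description"], item["amount"])
         for item in doc.get("line_items", [])],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_invoice(conn, raw)
line_total = conn.execute("SELECT SUM(amount) FROM line_items").fetchone()[0]
print(f"line items sum to {line_total:.2f}")
```

Because the whole document arrives as one object, one result maps naturally onto one parent row plus its child rows.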

Markdown Format

Returns the text content of each page as Markdown.

Use case: Convert documents to readable, formatted text

Example output:

# Invoice

**Invoice Number**: INV-2024-001
**Date**: January 15, 2024
**Total**: $1,234.56

## Line Items

- Service Fee: $1,000.00
- Tax: $234.56

Best for:

  • Document archival
  • Text analysis
  • Search indexing
  • Human-readable output
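For search indexing or text analysis, one possible approach is to split the Markdown output into sections at each heading and index the sections separately. The sketch below works on the sample output above using only the standard library. The `split_sections` helper and the section dictionary layout are illustrative assumptions, not part of LeapOCR.

```python
import re

# Sample Markdown output, copied from the example above.
markdown = """# Invoice

**Invoice Number**: INV-2024-001
**Date**: January 15, 2024
**Total**: $1,234.56

## Line Items

- Service Fee: $1,000.00
- Tax: $234.56
"""

# Matches ATX-style headings such as "# Title" or "## Title".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(text: str) -> list[dict]:
    """Group non-blank lines under the most recent heading.

    Limitation: a line starting with '#' inside a fenced code block
    would also be treated as a heading.
    """
    sections = []
    current = {"title": "", "level": 0, "body": []}
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section before starting a new one.
            if current["title"] or current["body"]:
                sections.append(current)
            current = {
                "title": match.group(2).strip(),
                "level": len(match.group(1)),
                "body": [],
            }
        elif line.strip():
            current["body"].append(line.strip())
    if current["title"] or current["body"]:
        sections.append(current)
    return sections


for section in split_sections(markdown):
    print(section["level"], section["title"], len(section["body"]))
```

Each section can then be passed to an indexer of your choice with its title as metadata.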

Per-Page Structured Format

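Returns an array of JSON objects, one per page, each holding the page number and the fields extracted from that page. As the example shows, the array sits under a top-level "pages" key.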

Use case: Extract data from multi-section documents where each page has different content

Example output:

{
  "pages": [
    {
      "page_number": 1,
      "data": {
        "patient_name": "John Doe",
        "date_of_birth": "1980-05-15"
      }
    },
    {
      "page_number": 2,
      "data": {
        "medications": ["Aspirin", "Lisinopril"]
      }
    }
  ]
}

Best for:

  • Multi-page medical records
  • Forms with page-specific sections
  • Documents with varying layouts per page
  • Page-by-page processing pipelines
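In a page-by-page pipeline, it can be useful to record which page each field came from. The following sketch walks the sample output above in page order and builds such an index. It assumes the response shape shown in the example (`pages`, `page_number`, `data`); the `index_fields_by_page` helper is hypothetical and not part of LeapOCR.

```python
import json

# Sample per-page structured output, copied from the example above.
raw = """
{
  "pages": [
    {"page_number": 1,
     "data": {"patient_name": "John Doe", "date_of_birth": "1980-05-15"}},
    {"page_number": 2,
     "data": {"medications": ["Aspirin", "Lisinopril"]}}
  ]
}
"""


def index_fields_by_page(payload: str) -> dict:
    """Map each field name to its value and the page it first appeared on."""
    result = json.loads(payload)
    index = {}
    # Sort defensively in case pages arrive out of order.
    for page in sorted(result["pages"], key=lambda p: p["page_number"]):
        for field, value in page["data"].items():
            # setdefault keeps the first occurrence if a field repeats.
            index.setdefault(
                field, {"page": page["page_number"], "value": value}
            )
    return index


fields = index_fields_by_page(raw)
for name, entry in fields.items():
    print(f"page {entry['page']}: {name} = {entry['value']}")
```

Keeping the page number next to each value makes it easier to trace an extracted field back to its source page during review.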

Comparison Table

| Format | Output Type | Best For | Data Granularity |
|---|---|---|---|
| structured | Single object | Complete document data | Document-level |
| markdown | Text per page | Readable text conversion | Page-level text |
| per_page_structured | Array of objects | Multi-section documents | Page-level data |

Choosing the Right Format

  1. Need specific fields from the document as a whole? → Use structured
  2. Converting the document to readable text? → Use markdown
  3. Need fields extracted separately for each page? → Use per_page_structured
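The decision steps above can be sketched as a small helper that returns one of the three format names. The `choose_format` function and its two boolean parameters are assumptions made for illustration; only the returned format names come from this page.

```python
def choose_format(needs_fields: bool, page_specific: bool) -> str:
    """Pick a format name from two yes/no questions.

    needs_fields:  True when you want extracted fields rather than text.
    page_specific: True when fields should be extracted per page.
    """
    if not needs_fields:
        # Plain text conversion does not need field extraction.
        return "markdown"
    return "per_page_structured" if page_specific else "structured"


print(choose_format(needs_fields=True, page_specific=False))   # structured
print(choose_format(needs_fields=False, page_specific=False))  # markdown
print(choose_format(needs_fields=True, page_specific=True))    # per_page_structured
```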