
Output Formats

Understanding different output formats for extracted data

LeapOCR supports multiple output formats to match your processing needs. Choose the format that best fits your application architecture.

Format Types

Structured Format

Returns structured JSON for each page in the document. Each page carries its own result object based on your schema.

Use case: Extract specific fields from documents where page-level structure matters

Example output:

{
  "pages": [
    {
      "page_number": 1,
      "result": {
        "invoice_number": "INV-2024-001",
        "total_amount": 1234.56,
        "invoice_date": "2024-01-15",
        "vendor_name": "ACME Corp",
        "line_items": [
          {
            "description": "Service Fee",
            "amount": 1000.0
          },
          {
            "description": "Tax",
            "amount": 234.56
          }
        ]
      }
    }
  ]
}

Best for:

  • Invoices
  • Forms with specific fields
  • Page-aware validation
  • Structured downstream processing
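As a rough illustration of page-aware validation, the sketch below walks the `pages` array from the example output above and flags pages whose line items do not add up to `total_amount`. The payload is copied from the example; the function name `check_invoice_pages` and the rounding tolerance are assumptions made for illustration, not part of LeapOCR, and the field names depend entirely on your own schema.

```python
import json
import math

# Sample payload copied from the example output above.
SAMPLE = """
{
  "pages": [
    {
      "page_number": 1,
      "result": {
        "invoice_number": "INV-2024-001",
        "total_amount": 1234.56,
        "invoice_date": "2024-01-15",
        "vendor_name": "ACME Corp",
        "line_items": [
          {"description": "Service Fee", "amount": 1000.0},
          {"description": "Tax", "amount": 234.56}
        ]
      }
    }
  ]
}
"""


def check_invoice_pages(payload: dict) -> list[int]:
    """Return page numbers whose line items do not sum to total_amount.

    Hypothetical helper: the field names mirror the example schema only.
    """
    mismatched = []
    for page in payload.get("pages", []):
        result = page.get("result", {})
        items = result.get("line_items", [])
        total = result.get("total_amount")
        if total is None or not items:
            continue  # nothing to validate on this page
        line_sum = sum(item["amount"] for item in items)
        # Allow half a cent of floating-point drift (assumed tolerance).
        if not math.isclose(line_sum, total, abs_tol=0.005):
            mismatched.append(page["page_number"])
    return mismatched


payload = json.loads(SAMPLE)
print(check_invoice_pages(payload))  # prints [] for the sample above
```

Because each page carries its own `result` object, a check like this can report the exact page that failed instead of rejecting the whole document.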

Markdown Format

Returns the text content of each page as Markdown.

Use case: Convert documents to readable, formatted text

Example output:

# Invoice

**Invoice Number**: INV-2024-001
**Date**: January 15, 2024
**Total**: $1,234.56

## Line Items

- Service Fee: $1,000.00
- Tax: $234.56

Best for:

  • Document archival
  • Text analysis
  • Search indexing
  • Human-readable output
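For search indexing or text analysis, the Markdown usually needs light post-processing. The sketch below assumes the Markdown for one page is already available as a string (the example output above is reused as `PAGE_MARKDOWN`). The helper names `bold_fields` and `plain_text` are hypothetical, and the regular expressions only cover the constructs that appear in the example, not Markdown in general.

```python
import re

# Markdown for a single page, copied from the example output above.
PAGE_MARKDOWN = """# Invoice

**Invoice Number**: INV-2024-001
**Date**: January 15, 2024
**Total**: $1,234.56

## Line Items

- Service Fee: $1,000.00
- Tax: $234.56
"""

# Matches lines of the form "**Key**: value".
BOLD_FIELD = re.compile(r"^\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.+)$", re.MULTILINE)


def bold_fields(markdown: str) -> dict[str, str]:
    """Collect "**Key**: value" lines into a dictionary."""
    return {
        m["key"].strip(): m["value"].strip()
        for m in BOLD_FIELD.finditer(markdown)
    }


def plain_text(markdown: str) -> str:
    """Strip heading, bullet, and bold markers to get indexable text."""
    text = re.sub(r"^#{1,6}\s+", "", markdown, flags=re.MULTILINE)
    text = re.sub(r"^-\s+", "", text, flags=re.MULTILINE)
    return text.replace("**", "")


print(bold_fields(PAGE_MARKDOWN))
print(plain_text(PAGE_MARKDOWN))
```

Pattern matching like this is brittle compared with schema-based extraction; if downstream code depends on specific fields, the structured format is the safer choice.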

Comparison Table

Format      Output Type       Best For                  Data Granularity
structured  JSON object/page  Structured extraction     Page-level data
markdown    Text per page     Readable text conversion  Page-level text

Choosing the Right Format

  1. Need specific data extraction? → Use structured
  2. Converting to text? → Use markdown
  3. Need a schema for each page? → Use structured
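The checklist above can be written as a small decision helper. This is only a sketch: `choose_format` and its parameters are hypothetical names, and the returned strings are the format names from the comparison table, not a confirmed API parameter.

```python
def choose_format(needs_fields: bool, needs_page_schema: bool = False) -> str:
    """Pick an output format name following the checklist above.

    Hypothetical helper: returns "structured" when specific fields or a
    per-page schema are required, otherwise "markdown".
    """
    if needs_fields or needs_page_schema:
        return "structured"
    return "markdown"


# Invoice processing needs specific fields.
print(choose_format(needs_fields=True))   # prints structured
# Archiving a report as readable text needs neither.
print(choose_format(needs_fields=False))  # prints markdown
```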
