Output Formats
Understanding different output formats for extracted data
LeapOCR supports multiple output formats to match your processing needs. Choose the format that best fits your application architecture.
Format Types
Structured Format
Returns structured JSON for each page in the document. Each page carries its own result object based on your schema.
Use case: Extract specific fields from documents where page-level structure matters
Example output:
{
"pages": [
{
"page_number": 1,
"result": {
"invoice_number": "INV-2024-001",
"total_amount": 1234.56,
"invoice_date": "2024-01-15",
"vendor_name": "ACME Corp",
"line_items": [
{
"description": "Service Fee",
"amount": 1000.0
},
{
"description": "Tax",
"amount": 234.56
}
]
}
}
]
}
Best for:
- Invoices
- Forms with specific fields
- Page-aware validation
- Structured downstream processing
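As an illustration of page-aware validation, the sketch below loads the example output shown above and checks, page by page, that the line items add up to `total_amount`. The field names come from the sample invoice schema and will differ for your own schema. The helper `check_page_totals` is a hypothetical name used here for illustration, not part of any LeapOCR SDK.

```python
import json
import math

# Sample response copied from the structured example output in this section.
SAMPLE_RESPONSE = """
{
  "pages": [
    {
      "page_number": 1,
      "result": {
        "invoice_number": "INV-2024-001",
        "total_amount": 1234.56,
        "invoice_date": "2024-01-15",
        "vendor_name": "ACME Corp",
        "line_items": [
          {"description": "Service Fee", "amount": 1000.0},
          {"description": "Tax", "amount": 234.56}
        ]
      }
    }
  ]
}
"""


def check_page_totals(response: dict, tolerance: float = 0.01) -> dict:
    """Map each page number to whether its line items sum to total_amount.

    Hypothetical helper: assumes every page result follows the sample
    invoice schema (total_amount plus an optional line_items list).
    """
    checks = {}
    for page in response["pages"]:
        result = page["result"]
        items_sum = sum(item["amount"] for item in result.get("line_items", []))
        # Compare with a small absolute tolerance to absorb float rounding.
        checks[page["page_number"]] = math.isclose(
            items_sum, result["total_amount"], abs_tol=tolerance
        )
    return checks


response = json.loads(SAMPLE_RESPONSE)
print(check_page_totals(response))  # {1: True}
```

Because each page carries its own result object, a failed check can be reported against a specific page number rather than against the document as a whole.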
Markdown Format
Returns text content from each page in markdown format.
Use case: Convert documents to readable, formatted text
Example output:
# Invoice
**Invoice Number**: INV-2024-001
**Date**: January 15, 2024
**Total**: $1,234.56
## Line Items
- Service Fee: $1,000.00
- Tax: $234.56
Best for:
- Document archival
- Text analysis
- Search indexing
- Human-readable output
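For search indexing, markdown output can be split into sections keyed by heading. The sketch below assumes the markdown text of each page has already been collected into a list of strings, one per page. The exact response shape for this format is not shown here, so that step is left out. `index_sections` is a hypothetical helper and handles only `#`-style headings.

```python
# Markdown for one page, copied from the example output in this section.
SAMPLE_PAGE = """
# Invoice
**Invoice Number**: INV-2024-001
**Date**: January 15, 2024
**Total**: $1,234.56
## Line Items
- Service Fee: $1,000.00
- Tax: $234.56
"""


def index_sections(pages: list) -> list:
    """Split each page's markdown into heading-delimited sections.

    Hypothetical helper: `pages` is assumed to be a list of markdown
    strings, one per page, in document order.
    """
    sections = []
    for page_number, text in enumerate(pages, start=1):
        current = {"page": page_number, "heading": "", "lines": []}
        for line in text.splitlines():
            if line.startswith("#"):
                # A new heading closes the previous section, if it has content.
                if current["heading"] or current["lines"]:
                    sections.append(current)
                heading = line.lstrip("#").strip()
                current = {"page": page_number, "heading": heading, "lines": []}
            elif line.strip():
                current["lines"].append(line.strip())
        # Flush the last open section on the page.
        if current["heading"] or current["lines"]:
            sections.append(current)
    return sections


for section in index_sections([SAMPLE_PAGE]):
    print(section["page"], section["heading"], len(section["lines"]))
# 1 Invoice 3
# 1 Line Items 2
```

Keeping the page number on every section lets a search hit point back to the page it came from.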
Comparison Table
| Format | Output Type | Best For | Data Granularity |
|---|---|---|---|
| `structured` | JSON object per page | Structured extraction | Page-level data |
| `markdown` | Text per page | Readable text conversion | Page-level text |
Choosing the Right Format
- Need specific data extraction? → Use `structured`
- Converting to text? → Use `markdown`
- Need a schema for each page? → Use `structured`