Output Formats
Understanding different output formats for extracted data
LeapOCR supports three output formats to match your processing needs: `structured`, `markdown`, and `per_page_structured`. Choose the format that best fits your application architecture.
Format Types
Structured Format
Returns a single JSON object with extracted fields from the entire document.
Use case: Extract specific data points across a complete document
Example output:
```json
{
  "invoice_number": "INV-2024-001",
  "total_amount": 1234.56,
  "invoice_date": "2024-01-15",
  "vendor_name": "ACME Corp",
  "line_items": [
    {
      "description": "Service Fee",
      "amount": 1000.0
    },
    {
      "description": "Tax",
      "amount": 234.56
    }
  ]
}
```
Best for:
- Invoices
- Forms with specific fields
- Single-record documents
- Database insertion
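For the database-insertion case, the sketch below parses the `structured` example output shown above and writes it to SQLite. It does not call any LeapOCR SDK; it assumes you already have the response body as a JSON string. The table layout, `create_schema`, `insert_invoice`, and `to_cents` are hypothetical names chosen for illustration, and the field names are simply those from the example.

```python
import json
import sqlite3
from decimal import Decimal

# The `structured` example output from above, as a raw JSON string.
INVOICE_JSON = """
{
  "invoice_number": "INV-2024-001",
  "total_amount": 1234.56,
  "invoice_date": "2024-01-15",
  "vendor_name": "ACME Corp",
  "line_items": [
    {"description": "Service Fee", "amount": 1000.0},
    {"description": "Tax", "amount": 234.56}
  ]
}
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables; adjust to your own schema.
    conn.execute(
        "CREATE TABLE invoices (invoice_number TEXT PRIMARY KEY, "
        "invoice_date TEXT, vendor_name TEXT, total_cents INTEGER)"
    )
    conn.execute(
        "CREATE TABLE line_items (invoice_number TEXT, "
        "description TEXT, amount_cents INTEGER)"
    )


def to_cents(value: Decimal) -> int:
    # Store money as integer cents so the database never sees binary floats.
    return int((value * 100).to_integral_value())


def insert_invoice(conn: sqlite3.Connection, raw: str) -> None:
    # parse_float=Decimal keeps 1234.56 exact instead of converting it to float.
    doc = json.loads(raw, parse_float=Decimal)

    # Sanity check before writing: the line items should add up to the total.
    items_total = sum(item["amount"] for item in doc["line_items"])
    if items_total != doc["total_amount"]:
        raise ValueError(
            f"line items sum to {items_total}, but total_amount is {doc['total_amount']}"
        )

    # `with conn` commits both inserts together, or rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO invoices VALUES (?, ?, ?, ?)",
            (doc["invoice_number"], doc["invoice_date"], doc["vendor_name"],
             to_cents(doc["total_amount"])),
        )
        conn.executemany(
            "INSERT INTO line_items VALUES (?, ?, ?)",
            [(doc["invoice_number"], item["description"], to_cents(item["amount"]))
             for item in doc["line_items"]],
        )
```

Call `create_schema` once, then `insert_invoice(conn, response_text)` for each document. Parsing amounts as `Decimal` rather than `float` makes the line-item check an exact comparison.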
Markdown Format
Returns text content from each page in markdown format.
Use case: Convert documents to readable, formatted text
Example output:
```markdown
# Invoice
**Invoice Number**: INV-2024-001
**Date**: January 15, 2024
**Total**: $1,234.56
## Line Items
- Service Fee: $1,000.00
- Tax: $234.56
```
Best for:
- Document archival
- Text analysis
- Search indexing
- Human-readable output
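As an illustration of search indexing, the sketch below builds a small inverted index that maps each word to the pages containing it. It assumes the `markdown` output has been collected into a Python list with one Markdown string per page. That shape is an assumption, and the example text is split across two pages here only for illustration. The regex tokenizer is a deliberately simple stand-in for a real full-text search engine.

```python
import re
from collections import defaultdict

# Hypothetical input: one Markdown string per page, in page order.
PAGES = [
    "# Invoice\n**Invoice Number**: INV-2024-001\n**Date**: January 15, 2024\n**Total**: $1,234.56",
    "## Line Items\n- Service Fee: $1,000.00\n- Tax: $234.56",
]

# Alphanumeric words, keeping hyphenated identifiers such as INV-2024-001 whole.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def build_index(pages: list[str]) -> dict[str, set[int]]:
    """Map each lowercase token to the 1-based page numbers it appears on."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_number, markdown in enumerate(pages, start=1):
        # Markdown markers (#, **, list dashes) are not matched by TOKEN, so they drop out.
        for token in TOKEN.findall(markdown.lower()):
            index[token].add(page_number)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the pages that contain every term in the query (AND semantics)."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For example, `search(build_index(PAGES), "service fee")` returns `{2}`. Because the index keeps page numbers, a hit can link straight back to the page it came from.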
Per-Page Structured Format
Returns a JSON object whose `pages` array holds one entry per page. Each entry contains the `page_number` and that page's extracted fields under `data`.
Use case: Extract data from multi-section documents where each page has different content
Example output:
```json
{
  "pages": [
    {
      "page_number": 1,
      "data": {
        "patient_name": "John Doe",
        "date_of_birth": "1980-05-15"
      }
    },
    {
      "page_number": 2,
      "data": {
        "medications": ["Aspirin", "Lisinopril"]
      }
    }
  ]
}
```
Best for:
- Multi-page medical records
- Forms with page-specific sections
- Documents with varying layouts per page
- Page-by-page processing pipelines
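A common step in a page-by-page pipeline is folding the per-page results back into a single record while remembering which page each field came from. The sketch below does this for the `per_page_structured` example above. `merge_pages` is a hypothetical helper, and it raises an error when two pages report different values for the same field. That behavior is a design choice for the sketch, not documented LeapOCR behavior.

```python
import json
from typing import Any

# The `per_page_structured` example output from above, as a raw JSON string.
RECORD_JSON = """
{
  "pages": [
    {"page_number": 1, "data": {"patient_name": "John Doe", "date_of_birth": "1980-05-15"}},
    {"page_number": 2, "data": {"medications": ["Aspirin", "Lisinopril"]}}
  ]
}
"""


def merge_pages(raw: str) -> tuple[dict[str, Any], dict[str, int]]:
    """Flatten per-page data into one record plus a field -> source page map.

    Raises ValueError if two pages report different values for the same field.
    """
    merged: dict[str, Any] = {}
    source_page: dict[str, int] = {}
    # Sort by page_number rather than relying on array order.
    pages = sorted(json.loads(raw)["pages"], key=lambda p: p["page_number"])
    for page in pages:
        for field, value in page["data"].items():
            if field in merged and merged[field] != value:
                raise ValueError(
                    f"{field!r} differs between page {source_page[field]} "
                    f"and page {page['page_number']}"
                )
            # First page to report a field wins; identical repeats are ignored.
            merged.setdefault(field, value)
            source_page.setdefault(field, page["page_number"])
    return merged, source_page
```

Keeping the provenance map separate from the record lets downstream steps stay page-aware, for example by sending a reviewer back to page 2 to check the medication list.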
Comparison Table
| Format | Output Type | Best For | Data Granularity |
|---|---|---|---|
| `structured` | Single object | Complete document data | Document-level |
| `markdown` | Text per page | Readable text conversion | Page-level text |
| `per_page_structured` | Array of objects | Multi-section documents | Page-level data |
Choosing the Right Format
- Need specific data extraction? → Use `structured`
- Converting to text? → Use `markdown`
- Page-specific processing? → Use `per_page_structured`