Custom Schemas
Define schemas for structured data extraction
Custom schemas allow you to define exactly what data to extract from your documents. LeapOCR uses JSON Schema format to understand your extraction requirements.
Basic Schema Structure
Schemas define the structure and types of data you want to extract.
Simple Schema
{
  "invoice_number": "string",
  "total_amount": "number",
  "invoice_date": "string"
}
Full JSON Schema
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The unique invoice identifier"
    },
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount in dollars"
    },
    "invoice_date": {
      "type": "string",
      "description": "Invoice date in ISO format"
    }
  },
  "required": ["invoice_number", "total_amount"]
}
Nested Objects
Extract complex, nested data structures.
{
  "type": "object",
  "properties": {
    "customer": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "email": { "type": "string" },
        "address": {
          "type": "object",
          "properties": {
            "street": { "type": "string" },
            "city": { "type": "string" },
            "zip": { "type": "string" }
          }
        }
      }
    }
  }
}
Arrays
Extract lists and repeating data.
{
  "type": "object",
  "properties": {
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "price": { "type": "number" },
          "total": { "type": "number" }
        }
      }
    }
  }
}
Supported Types
| Type | Description | Example |
|---|---|---|
| string | Text data | "John Doe" |
| number | Numeric values | 1234.56 |
| integer | Whole numbers | 42 |
| boolean | True/false values | true |
| array | Lists of items | ["item1", "item2"] |
| object | Nested structures | {"key": "value"} |
| null | Empty/missing values | null |
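To make the type names above concrete, here is a minimal Python sketch that checks whether a parsed JSON value matches one of them. It uses only the standard library; `matches_type` and `JSON_TYPES` are hypothetical names for illustration and are not part of LeapOCR.

```python
import json

# Hypothetical mapping from schema type names to the Python types
# that json.loads produces. Not part of any LeapOCR SDK.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "array": list,
    "object": dict,
    "null": type(None),
}


def matches_type(value, type_name):
    """Return True if a parsed JSON value has the given schema type."""
    if type_name == "boolean":
        return isinstance(value, bool)
    # bool is a subclass of int in Python, so rule it out before numeric checks.
    if isinstance(value, bool):
        return False
    if type_name == "integer":
        # JSON Schema treats 42.0 as an integer as well as 42.
        return isinstance(value, int) or (
            isinstance(value, float) and value.is_integer()
        )
    return isinstance(value, JSON_TYPES[type_name])


# Sample data reusing the example values from the table above.
result = json.loads(
    '{"name": "John Doe", "amount": 1234.56, "count": 42, "paid": true, "notes": null}'
)
print(matches_type(result["amount"], "number"))
print(matches_type(result["paid"], "integer"))
```

The explicit `bool` check matters because Python would otherwise accept `true` as a number or integer.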
Best Practices
1. Be Specific with Descriptions
{
  "properties": {
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount in USD, excluding tax"
    }
  }
}
2. Use Required Fields
{
  "required": ["invoice_number", "date", "total"],
  "properties": {
    "invoice_number": { "type": "string" },
    "date": { "type": "string" },
    "total": { "type": "number" }
  }
}
3. Provide Examples in Descriptions
{
  "properties": {
    "date": {
      "type": "string",
      "description": "Invoice date in YYYY-MM-DD format, e.g., 2024-01-15"
    }
  }
}
4. Keep Schemas Focused
Don't try to extract everything at once. Focus on the most important fields for your use case.
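The practices above can be checked mechanically before a schema is submitted. The following Python sketch is one possible approach, not a LeapOCR feature: `lint_schema` is a hypothetical helper that flags required fields missing from `properties` and fields without a description.

```python
def lint_schema(schema):
    """Return a list of warnings for an object schema (hypothetical helper)."""
    warnings = []
    properties = schema.get("properties", {})
    # Every name listed in "required" should also be defined in "properties".
    for name in schema.get("required", []):
        if name not in properties:
            warnings.append(f"required field '{name}' is not defined in properties")
    # Every field should carry a non-empty description.
    for name, spec in properties.items():
        if not spec.get("description"):
            warnings.append(f"field '{name}' has no description")
    return warnings


# Example schema with two deliberate problems: "total" is required but
# undefined, and "invoice_number" has no description.
schema = {
    "type": "object",
    "required": ["invoice_number", "date", "total"],
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {
            "type": "string",
            "description": "Invoice date in YYYY-MM-DD format, e.g., 2024-01-15",
        },
    },
}

for warning in lint_schema(schema):
    print(warning)
```

This sketch only inspects the top level of the schema; nested objects and array items would need a recursive walk.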
Real-World Examples
Medical Record
{
  "type": "object",
  "properties": {
    "patient_name": { "type": "string" },
    "date_of_birth": { "type": "string" },
    "visit_date": { "type": "string" },
    "diagnosis": { "type": "string" },
    "medications": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "dosage": { "type": "string" },
          "frequency": { "type": "string" }
        }
      }
    }
  }
}
Receipt
{
  "type": "object",
  "properties": {
    "merchant_name": { "type": "string" },
    "date": { "type": "string" },
    "total": { "type": "number" },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" }
        }
      }
    }
  }
}
Contract
{
  "type": "object",
  "properties": {
    "contract_number": { "type": "string" },
    "effective_date": { "type": "string" },
    "expiration_date": { "type": "string" },
    "parties": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "role": { "type": "string" }
        }
      }
    },
    "terms": { "type": "string" }
  }
}
Schema vs Instructions vs Templates
You can specify extraction requirements in three ways:
- Schema: Structured data extraction with defined types
- Instructions: Natural language description (e.g., "Extract the invoice total and date")
- Template: Pre-defined document type with standard fields
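Note: Only one of these can be used per request. Choose based on your needs:
- Use a schema for complex, structured extraction
- Use instructions for simple, ad-hoc extraction
- Use a template when a pre-defined document type already covers the standard fields you need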
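Because only one of the three options is accepted per request, it can help to enforce that rule on the client side before sending anything. The sketch below shows one way to do so in Python. The function `build_extraction_options` and the keys `schema`, `instructions`, and `template` are assumed names for illustration; consult the API reference for the actual request parameters.

```python
def build_extraction_options(schema=None, instructions=None, template=None):
    """Return request options containing exactly one extraction mode.

    The keys used here are assumptions for illustration, not confirmed
    LeapOCR parameter names.
    """
    candidates = {
        "schema": schema,
        "instructions": instructions,
        "template": template,
    }
    # Keep only the options the caller actually supplied.
    provided = {key: value for key, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "expected exactly one of schema, instructions, template; "
            f"got {len(provided)}"
        )
    return provided


# Ad-hoc extraction using natural-language instructions only.
options = build_extraction_options(
    instructions="Extract the invoice total and date"
)
print(options)
```

Passing two options, or none, raises `ValueError` locally instead of producing a rejected request.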