Custom Schemas

Define extraction schemas for structured data extraction

Custom schemas let you define exactly which data to extract from your documents. LeapOCR uses the JSON Schema format to express your extraction requirements.

Basic Schema Structure

Schemas define the structure and types of data you want to extract.

Simple Schema

{
  "invoice_number": "string",
  "total_amount": "number",
  "invoice_date": "string"
}
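
The shorthand appears to map each field name directly to a type. Assuming it is equivalent to a full JSON Schema object with those property types and no required fields (this page does not spell out how LeapOCR interprets it), a small hypothetical helper can expand it into the full form:

```python
import json


def expand_shorthand(shorthand: dict[str, str]) -> dict:
    """Expand a {"field": "type"} mapping into a full JSON Schema object.

    Hypothetical helper: assumes every shorthand field is optional,
    so no "required" list is emitted.
    """
    return {
        "type": "object",
        "properties": {name: {"type": type_name} for name, type_name in shorthand.items()},
    }


simple = {
    "invoice_number": "string",
    "total_amount": "number",
    "invoice_date": "string",
}
print(json.dumps(expand_shorthand(simple), indent=2))
```

The full form is where descriptions and `required` can be added, as the next example does.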

Full JSON Schema

{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The unique invoice identifier"
    },
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount in dollars"
    },
    "invoice_date": {
      "type": "string",
      "description": "Invoice date in ISO 8601 format (YYYY-MM-DD)"
    }
  },
  "required": ["invoice_number", "total_amount"]
}
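
The `required` list names fields that must be present in the result, so `invoice_date` is optional here. As a sketch of what that means for an extracted result, the hypothetical helper below reports which required fields are missing. It is not LeapOCR code, and it treats a `null` value as missing, which is stricter than JSON Schema's own `required` keyword (that keyword only checks that the key exists). The sample result is made up.

```python
def missing_required(schema: dict, result: dict) -> list[str]:
    """Return required field names that are absent or None in `result`.

    JSON Schema's "required" only checks that a key exists; treating
    None (JSON null) as missing is a stricter, hypothetical policy.
    """
    return [name for name in schema.get("required", []) if result.get(name) is None]


invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "invoice_date": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Made-up extraction result: the optional date is absent, which is allowed.
result = {"invoice_number": "INV-001", "total_amount": None}
print(missing_required(invoice_schema, result))  # ['total_amount']
```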

Nested Objects

Extract complex, nested data structures.

{
  "type": "object",
  "properties": {
    "customer": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "email": { "type": "string" },
        "address": {
          "type": "object",
          "properties": {
            "street": { "type": "string" },
            "city": { "type": "string" },
            "zip": { "type": "string" }
          }
        }
      }
    }
  }
}
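
Each nested `object` carries its own `properties`, so the schema forms a tree. A short recursive walk (a hypothetical helper, not part of LeapOCR) makes that structure visible by listing the dotted path to every leaf field:

```python
def field_paths(schema: dict, prefix: str = "") -> list[str]:
    """List dotted paths to every leaf field in a nested object schema.

    Arrays and other non-object types are treated as leaves.
    """
    paths = []
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if sub.get("type") == "object":
            # Descend into the nested object's own properties.
            paths.extend(field_paths(sub, prefix=path + "."))
        else:
            paths.append(path)
    return paths


customer_schema = {
    "type": "object",
    "properties": {
        "customer": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {
                        "street": {"type": "string"},
                        "city": {"type": "string"},
                        "zip": {"type": "string"},
                    },
                },
            },
        }
    },
}

for path in field_paths(customer_schema):
    print(path)  # customer.name, customer.email, customer.address.street, ...
```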

Arrays

Extract lists and repeating data.

{
  "type": "object",
  "properties": {
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "price": { "type": "number" },
          "total": { "type": "number" }
        }
      }
    }
  }
}
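
An `array` schema yields a list with one object per repeated row. Because this schema asks for `quantity`, `price`, and `total` separately, the extracted rows can be cross-checked afterwards. A minimal sketch, assuming each `total` should equal `quantity * price` (the schema itself does not enforce this); `inconsistent_items` is a hypothetical helper and the sample rows are made up:

```python
import math


def inconsistent_items(line_items: list[dict], abs_tol: float = 0.005) -> list[int]:
    """Return indexes of items whose total differs from quantity * price.

    The default abs_tol of half a cent absorbs rounding in printed totals
    and floating-point error in the multiplication.
    """
    bad = []
    for i, item in enumerate(line_items):
        expected = item["quantity"] * item["price"]
        if not math.isclose(expected, item["total"], abs_tol=abs_tol):
            bad.append(i)
    return bad


# Made-up extraction result for illustration.
line_items = [
    {"description": "Widget", "quantity": 3, "price": 2.50, "total": 7.50},
    {"description": "Gadget", "quantity": 2, "price": 10.00, "total": 25.00},
]
print(inconsistent_items(line_items))  # [1]
```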

Supported Types

| Type | Description | Example |
| --- | --- | --- |
| string | Text data | `"John Doe"` |
| number | Numeric values | `1234.56` |
| integer | Whole numbers | `42` |
| boolean | True/false values | `true` |
| array | Lists of items | `["item1", "item2"]` |
| object | Nested structures | `{"key": "value"}` |
| null | Empty/missing values | `null` |
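
When checking an extracted value against one of these types in Python, two JSON Schema details are easy to miss: an `integer` also satisfies `number`, and `true`/`false` are not numbers even though Python's `bool` is a subclass of `int`. A hypothetical helper that follows those rules:

```python
def matches_type(value, type_name: str) -> bool:
    """Check a decoded JSON value against a JSON Schema type name."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "null":
        return value is None
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "object":
        return isinstance(value, dict)
    if type_name in ("integer", "number"):
        # bool subclasses int in Python, but JSON true/false are not numbers.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if type_name == "integer":
            # JSON Schema draft 6 and later count 1.0 as an integer.
            return isinstance(value, int) or value.is_integer()
        return True
    raise ValueError(f"unknown JSON Schema type: {type_name!r}")


# The example values from the table above, keyed by type.
examples = {
    "string": "John Doe",
    "number": 1234.56,
    "integer": 42,
    "boolean": True,
    "array": ["item1", "item2"],
    "object": {"key": "value"},
    "null": None,
}
print(all(matches_type(v, t) for t, v in examples.items()))  # True
print(matches_type(True, "integer"))  # False
```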

Best Practices

1. Be Specific with Descriptions

{
  "properties": {
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount in USD, excluding tax"
    }
  }
}

2. Use Required Fields

{
  "required": ["invoice_number", "date", "total"],
  "properties": {
    "invoice_number": { "type": "string" },
    "date": { "type": "string" },
    "total": { "type": "number" }
  }
}

3. Provide Examples in Descriptions

{
  "properties": {
    "date": {
      "type": "string",
      "description": "Invoice date in YYYY-MM-DD format, e.g., 2024-01-15"
    }
  }
}
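
A description like this states the format you expect; it can still be worth confirming the format after extraction. A minimal sketch using Python's standard `re` and `datetime.strptime`, with `is_iso_date` as a hypothetical helper name. The regular expression is needed because `strptime` alone also accepts unpadded values such as `2024-1-5`.

```python
import re
from datetime import datetime


def is_iso_date(text: str) -> bool:
    """Return True if text is a real calendar date in strict YYYY-MM-DD form."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    try:
        # Rejects impossible dates such as February 30.
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(is_iso_date("2024-01-15"))  # True
print(is_iso_date("01/15/2024"))  # False
print(is_iso_date("2024-02-30"))  # False
```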

4. Keep Schemas Focused

Don't try to extract everything at once. Focus on the most important fields for your use case.
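
If you already maintain a large schema, one way to keep each request focused is to derive a smaller schema containing only the fields you need. A sketch, with `focus_schema` as a hypothetical helper and a made-up sample schema (the `notes` field is illustrative); it keeps the listed top-level properties and drops `required` entries that no longer apply:

```python
import copy


def focus_schema(schema: dict, keep: list[str]) -> dict:
    """Return a copy of an object schema limited to the `keep` properties."""
    unknown = set(keep) - set(schema.get("properties", {}))
    if unknown:
        raise ValueError(f"fields not in schema: {sorted(unknown)}")
    focused = {
        "type": "object",
        "properties": {name: copy.deepcopy(schema["properties"][name]) for name in keep},
    }
    # Only keep required entries for fields that survived the cut.
    required = [name for name in schema.get("required", []) if name in keep]
    if required:
        focused["required"] = required
    return focused


invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
        "notes": {"type": "string"},
    },
    "required": ["invoice_number", "date", "total"],
}
print(focus_schema(invoice_schema, ["invoice_number", "total"]))
```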

Real-World Examples

Medical Record

{
  "type": "object",
  "properties": {
    "patient_name": { "type": "string" },
    "date_of_birth": { "type": "string" },
    "visit_date": { "type": "string" },
    "diagnosis": { "type": "string" },
    "medications": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "dosage": { "type": "string" },
          "frequency": { "type": "string" }
        }
      }
    }
  }
}

Receipt

{
  "type": "object",
  "properties": {
    "merchant_name": { "type": "string" },
    "date": { "type": "string" },
    "total": { "type": "number" },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" }
        }
      }
    }
  }
}

Contract

{
  "type": "object",
  "properties": {
    "contract_number": { "type": "string" },
    "effective_date": { "type": "string" },
    "expiration_date": { "type": "string" },
    "parties": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "role": { "type": "string" }
        }
      }
    },
    "terms": { "type": "string" }
  }
}
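
The contract schema leaves both dates as plain strings. If each date description also asks for a fixed format (for example YYYY-MM-DD, as in the best practices above), a post-extraction sanity check becomes straightforward. A sketch, assuming ISO dates; `dates_in_order` is hypothetical and the sample contract is made up:

```python
from datetime import date


def dates_in_order(contract: dict) -> bool:
    """Return True if expiration_date is on or after effective_date.

    Assumes both fields were extracted as YYYY-MM-DD strings;
    date.fromisoformat raises ValueError otherwise.
    """
    start = date.fromisoformat(contract["effective_date"])
    end = date.fromisoformat(contract["expiration_date"])
    return end >= start


contract = {
    "contract_number": "C-2024-17",
    "effective_date": "2024-03-01",
    "expiration_date": "2025-02-28",
    "parties": [
        {"name": "Acme Corp", "role": "supplier"},
        {"name": "Globex Inc", "role": "customer"},
    ],
}
print(dates_in_order(contract))  # True
```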

Schema vs Instructions vs Templates

You can specify extraction requirements in three ways:

  • Schema: Structured data extraction with defined types
  • Instructions: Natural language description (e.g., "Extract the invoice total and date")
  • Template: Pre-defined document type with standard fields

Note: Only one of these can be used per request. Choose based on your needs:

  • Use schema for complex, structured extraction
  • Use instructions for simple, ad-hoc extraction
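
Because a request accepts only one of the three, client code that assembles requests can reject ambiguous input early. The sketch below is illustrative only: `build_extraction_options` and the `schema`/`instructions`/`template` keys are hypothetical placeholders, not LeapOCR's documented request format.

```python
from typing import Optional


def build_extraction_options(
    schema: Optional[dict] = None,
    instructions: Optional[str] = None,
    template: Optional[str] = None,
) -> dict:
    """Return a dict holding exactly one extraction mode.

    Hypothetical: the key names are placeholders, not LeapOCR's API.
    """
    candidates = (("schema", schema), ("instructions", instructions), ("template", template))
    provided = {key: value for key, value in candidates if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "expected exactly one of schema, instructions, template; "
            f"got {sorted(provided) or 'none'}"
        )
    return provided


print(build_extraction_options(instructions="Extract the invoice total and date"))
```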