Official JavaScript/TypeScript SDK for LeapOCR - Full type safety for Node.js and browsers

JavaScript/TypeScript SDK

Official JavaScript/TypeScript SDK for LeapOCR - Transform documents into structured data using AI-powered OCR.

Installation

npm install leapocr
# or
yarn add leapocr
# or
pnpm add leapocr

Prerequisites

Node.js 18 or higher
LeapOCR API key (sign up here)

Quick Start

import { LeapOCR } from "leapocr";

// Initialize the SDK
const client = new LeapOCR({
  apiKey: process.env.LEAPOCR_API_KEY,
});

// Process a document
const job = await client.ocr.processURL("https://example.com/document.pdf", {
  format: "structured",
});

// Wait for completion
const result = await client.ocr.waitUntilDone(job.jobId);

// Access extracted data from pages
// For structured format, result is an object
// For markdown format, result is a string
console.log("Pages:", result.pages);

// Delete the job
await client.ocr.deleteJob(job.jobId);

New to LeapOCR? Check out the Getting Started guide for a complete walkthrough.

Key Features

TypeScript First - Full type safety with comprehensive TypeScript definitions
Built-in Retry Logic - Automatic handling of transient failures
Progress Tracking - Real-time progress callbacks

For models, formats, and schemas, see Core Concepts.

Usage Examples

Processing from URL

const client = new LeapOCR({
  apiKey: process.env.LEAPOCR_API_KEY,
});

const job = await client.ocr.processURL("https://example.com/invoice.pdf", {
  format: "structured",
  model: "standard-v1",
});

const result = await client.ocr.waitUntilDone(job.jobId);

console.log(`Credits used: ${result.credits_used}`);
console.log(`Pages processed: ${result.pages.length}`);
// For structured format, result is an object
// For markdown format, result is a string
console.log("First page result:", result.pages[0].result);

Processing Local Files (Node.js)

import { readFileSync } from "fs";

const fileBuffer = readFileSync("invoice.pdf");

const job = await client.ocr.processFile(fileBuffer, "invoice.pdf", {
  format: "structured",
  model: "pro-v1",
  schema: {
    invoice_number: "string",
    total_amount: "number",
    invoice_date: "string",
    vendor_name: "string",
  },
});

const result = await client.ocr.waitUntilDone(job.jobId);

Custom Schema Extraction

See the Custom Schemas guide for detailed schema design patterns.

const schema = {
  type: "object",
  properties: {
    patient_name: { type: "string" },
    date_of_birth: { type: "string" },
    medications: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          dosage: { type: "string" },
        },
      },
    },
  },
};

const job = await client.ocr.processURL(
  "https://example.com/medical-record.pdf",
  {
    format: "structured",
    schema: schema,
  },
);

const result = await client.ocr.waitUntilDone(job.jobId);

Using Instructions

Provide natural language instructions to guide the extraction process. Instructions are useful when you need specific formatting, handling of edge cases, or contextual information:

const job = await client.ocr.processURL("https://example.com/invoice.pdf", {
  format: "structured",
  instructions:
    "Extract invoice number, date, and total amount. If the invoice number is missing, use 'N/A'. Format dates as YYYY-MM-DD.",
});

const result = await client.ocr.waitUntilDone(job.jobId);

You can combine instructions with schemas for more precise control:

const job = await client.ocr.processURL("https://example.com/receipt.pdf", {
  format: "structured",
  schema: {
    total: "number",
    items: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          price: { type: "number" },
        },
      },
    },
  },
  instructions:
    "Only extract items that have a price greater than $0. Ignore any promotional text or disclaimers.",
});

const result = await client.ocr.waitUntilDone(job.jobId);

Monitoring Job Progress

// Poll for status updates
const job = await client.ocr.processURL("https://example.com/document.pdf");

const checkStatus = setInterval(async () => {
  const status = await client.ocr.getJobStatus(job.jobId);

  console.log(`Status: ${status.status} (${status.progress * 100}% complete)`);

  if (status.status === "completed") {
    const result = await client.ocr.getJobResult(job.jobId);
    console.log(`Processing complete! Pages: ${result.pages.length}`);
    clearInterval(checkStatus);
  }
}, 2000);

Using Progress Callbacks

const result = await client.ocr.waitUntilDone(job.jobId, {
  onProgress: (status) => {
    console.log(`Progress: ${status.progress * 100}%`);
  },
  pollInterval: 1000, // Check every second
});

Using Templates

Use pre-configured templates for common document types. Templates include predefined schemas, instructions, and model settings:

// Use a template by its slug
const job = await client.ocr.processFile(fileBuffer, "invoice.pdf", {
  templateSlug: "invoice-extraction", // Reference existing template
});

const result = await client.ocr.waitUntilDone(job.jobId);

// Access extracted data from pages
result.pages.forEach((page) => {
  console.log(`Page ${page.page_number} result:`, page.result);
});

// Delete after processing
await client.ocr.deleteJob(job.jobId);

Common template use cases:

Invoice extraction
Receipt processing
Medical records
Identity documents
Custom organizational templates

Deleting Jobs

All jobs are automatically deleted after 7 days. Delete jobs immediately after processing to remove sensitive data:

// Process and delete
const job = await client.ocr.processURL("https://example.com/document.pdf", {
  format: "structured",
});

const result = await client.ocr.waitUntilDone(job.jobId);

// Access extracted data from pages
result.pages.forEach((page) => {
  console.log(`Page ${page.page_number} result:`, page.result);
});

// Delete immediately
await client.ocr.deleteJob(job.jobId);
console.log("Job deleted successfully");

Configuration

Custom Configuration

const client = new LeapOCR({
  apiKey: "your-api-key",
  baseUrl: "https://api.leapocr.com",
  timeout: 60000, // 60 seconds
  retryConfig: {
    maxRetries: 3,
    retryDelay: 1000,
  },
});

Environment Variables

export LEAPOCR_API_KEY="your-api-key"

Error Handling

The SDK provides typed errors for robust error handling:

try {
  const result = await client.ocr.waitUntilDone(job.jobId);
  console.log("Success:", result);
} catch (error) {
  if (error instanceof LeapOCRError) {
    switch (error.code) {
      case "AUTH_ERROR":
        console.error("Authentication failed - check your API key");
        break;
      case "VALIDATION_ERROR":
        console.error("Validation error:", error.message);
        break;
      case "NETWORK_ERROR":
        console.error("Network error - retrying...");
        break;
      case "PROCESSING_ERROR":
        console.error("Processing failed:", error.message);
        break;
    }
  }
}

Error Types

AUTH_ERROR - Authentication failures
VALIDATION_ERROR - Input validation errors
NETWORK_ERROR - Network/connectivity issues (retryable)
PROCESSING_ERROR - Document processing errors
TIMEOUT_ERROR - Operation timeouts

See the Troubleshooting Guide for common issues and solutions.

API Reference

Client Initialization

new LeapOCR(config: LeapOCRConfig)

Processing Methods

// Process from URL
client.ocr.processURL(url: string, options?: ProcessOptions): Promise<Job>

// Process file
client.ocr.processFile(file: Buffer | File, filename: string, options?: ProcessOptions): Promise<Job>

// Get job status
client.ocr.getJobStatus(jobId: string): Promise<JobStatus>

// Get job result
client.ocr.getJobResult(jobId: string): Promise<OCRResult>

// Wait for completion
client.ocr.waitUntilDone(jobId: string, options?: WaitOptions): Promise<OCRResult>

// Delete job
client.ocr.deleteJob(jobId: string): Promise<void>

Processing Options

interface ProcessOptions {
  format?: "structured" | "markdown" | "per_page_structured";
  model?: string; // "standard-v1", "english-pro-v1", "pro-v1"
  schema?: Record<string, any>; // JSON schema for extraction
  instructions?: string; // Processing instructions
  templateSlug?: string; // Document template slug
}

Wait Options

interface WaitOptions {
  pollInterval?: number; // Milliseconds between status checks
  timeout?: number; // Maximum wait time
  onProgress?: (status: JobStatus) => void; // Progress callback
}

TypeScript Support

The SDK is written in TypeScript and provides full type definitions:

import type {
  LeapOCR,
  LeapOCRConfig,
  ProcessOptions,
  Job,
  JobStatus,
  OCRResult,
  LeapOCRError,
} from "leapocr";

Next Steps

Explore Common Use Cases for real-world examples
Learn about Processing Models
Review Custom Schemas for structured extraction
Check the FAQ for common questions

Resources

npm: npmjs.com/package/leapocr
GitHub: github.com/leapocr/leapocr-js
Issues: GitHub Issues

JavaScript/TypeScript SDK

On this page