Tutorial

Batch Processing 1,000+ Documents with Webhooks

When you need to process hundreds or thousands of freight documents at once, the synchronous API is not practical: each request blocks until its document finishes. This tutorial covers the async batch endpoint and webhook-driven architecture.

Overview

The KabyTech batch API is designed for high-volume ingestion. Instead of waiting for each document to finish before sending the next, you submit an entire batch and receive results asynchronously via webhooks or polling.

This architecture decouples your upload speed from processing time. A batch of 1,000 AWBs can be uploaded in under a minute, while processing completes over the following 5–10 minutes depending on document complexity. Webhooks notify your system as each document finishes, so you can start consuming results immediately.

The batch endpoint supports the same document formats as the single-document API — PDF, TIFF, JPEG, PNG — and automatically enables multi-page detection for PDFs and TIFFs.

Step 1 — Submit a Batch

Create a batch by POSTing a JSON manifest that lists the documents to process. Each entry can reference a file by URL or by a previously uploaded file ID. The response returns a batch_id for tracking.

curl -X POST https://api.kabytech.com/v1/batch \
  -H "Authorization: Bearer $KABY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "webhook_url": "https://your-server.com/hooks/kaby",
    "documents": [
      { "id": "doc_001", "url": "https://s3.example.com/awb-001.pdf" },
      { "id": "doc_002", "url": "https://s3.example.com/awb-002.pdf" }
    ]
  }'

The manifest can contain up to 5,000 document references per batch. For larger volumes, create multiple batches and track them independently. The id you assign to each document is echoed back as doc_id in every result, so you can correlate results with your internal records.
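For volumes above the 5,000-document cap, the splitting can be sketched as follows. This is a minimal sketch using only Python's standard library; the function names and chunk-size constant are illustrative, and the response is assumed to contain batch_id as described above.

```python
import json
import urllib.request

API_URL = "https://api.kabytech.com/v1/batch"
MAX_DOCS_PER_BATCH = 5000  # manifest limit per batch

def chunk(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def submit_batches(documents, api_key, webhook_url):
    """Split a large document list across manifests and POST each one.

    Returns the list of batch_ids to track independently."""
    batch_ids = []
    for docs in chunk(documents, MAX_DOCS_PER_BATCH):
        body = json.dumps({"webhook_url": webhook_url, "documents": docs})
        req = urllib.request.Request(
            API_URL,
            data=body.encode(),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            batch_ids.append(json.load(resp)["batch_id"])
    return batch_ids
```

Tracking each returned batch_id separately also keeps webhook traffic attributable: every payload carries the batch_id it belongs to.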

Step 2 — Register and Handle Webhooks

The webhook_url in your batch request receives a POST for each completed document. The payload includes the document ID, extraction result, and processing metadata. Your endpoint must return a 2xx status within 10 seconds to acknowledge receipt.

# Example webhook payload
{
  "event": "document.complete",
  "batch_id": "b_xyz789",
  "doc_id": "doc_001",
  "status": "success",
  "result": {
    "awb_number": "160-12345675",
    "origin": "BKK",
    "destination": "NRT"
  },
  "processing_ms": 1230
}
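To stay inside the 10-second acknowledgement window, a common pattern is to enqueue the payload and return 200 immediately, deferring real processing to a consumer thread. Below is a minimal sketch using only Python's standard library; the port and the worker's print statement are placeholders for your own processing.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Completed-document payloads wait here for a worker to consume.
results = queue.Queue()

class KabyWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        results.put(payload)       # defer processing to the worker thread
        self.send_response(200)    # acknowledge well inside the 10 s window
        self.end_headers()

def worker():
    while True:
        payload = results.get()
        # Replace with your own routing, e.g. the handler in Step 3.
        print(payload["doc_id"], payload["status"])

# To run:
#   threading.Thread(target=worker, daemon=True).start()
#   HTTPServer(("", 8080), KabyWebhookHandler).serve_forever()
```

Acknowledging before processing means a crash in your consumer will not trigger the API's retry schedule, so persist the queue (or write payloads to durable storage) if you cannot afford to drop results.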

If your endpoint is unreachable or returns a non-2xx status, the API retries with exponential backoff: 10 seconds, 30 seconds, 2 minutes, 10 minutes, and a final attempt at 1 hour. After five failures the webhook is marked as failed and you can retrieve the result via the polling endpoint.
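When webhooks fail permanently, a polling loop can recover the outstanding results. The GET path and the non-terminal per-document status below are assumptions for illustration; confirm the exact polling endpoint and status values against the API reference.

```python
import json
import time
import urllib.request

TERMINAL = ("success", "partial", "failed")

def unfinished(documents):
    """Documents that have not yet reached a terminal status."""
    return [d for d in documents if d["status"] not in TERMINAL]

def poll_batch(batch_id, api_key, interval=30):
    """Poll a batch until every document is terminal, then return them all.

    Assumes GET /v1/batch/{batch_id} returns the manifest with per-document
    status fields; check the API reference for the real polling endpoint."""
    url = f"https://api.kabytech.com/v1/batch/{batch_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not unfinished(batch["documents"]):
            return batch["documents"]
        time.sleep(interval)
```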

Step 3 — Error Handling and Retry Strategy

Not every document will succeed on the first attempt. Common failure reasons include corrupted files, unsupported formats, and extremely low-resolution scans. The webhook payload includes a status field that can be success, partial, or failed, along with an error_code for failed documents.

A partial status means some fields were extracted but their confidence fell below the acceptance threshold. You can accept these with caution or resubmit the document with enhanced preprocessing options. For failed documents, check the error_code and retry if appropriate — transient errors such as TIMEOUT and RATE_LIMITED are safe to retry.

def handle_webhook(payload):
    """Route a completed-document payload by its status field.

    save_result, flag_for_review, retry_document, and log_permanent_failure
    are placeholders for your own integration hooks."""
    if payload["status"] == "success":
        save_result(payload["doc_id"], payload["result"])
    elif payload["status"] == "partial":
        # Extracted, but below the confidence threshold: route to review.
        flag_for_review(payload["doc_id"], payload["result"])
    elif payload["status"] == "failed":
        if payload["error_code"] in ("TIMEOUT", "RATE_LIMITED"):
            # Transient failures are safe to resubmit.
            retry_document(payload["doc_id"])
        else:
            log_permanent_failure(payload["doc_id"], payload["error_code"])

Summary

Batch processing with webhooks lets you scale from tens to thousands of documents without changing your integration architecture. Submit a manifest, receive results asynchronously, and handle errors with a simple status-based routing pattern.

Rate limits apply per API key: the default tier allows 100 concurrent documents and 10,000 per day. Contact the KabyTech team if you need higher throughput. Combine batch processing with the multi-page scanning tutorial for maximum flexibility.
