Tutorial

How to Parse Multi-Page Scanned AWBs

Many air waybills (AWBs) arrive as multi-page scans: the original, carrier copy, and shipper copy stapled together. This tutorial shows how KabyTech handles them automatically.

Overview

Scanned air waybills frequently contain two to four pages in a single PDF or TIFF file. Each page may be rotated, skewed, or scanned at a different resolution. KabyTech's multi-page pipeline detects individual pages, normalizes orientation, and merges extracted fields into a single structured result.

This tutorial walks through the three main steps: uploading the document, understanding auto-detection, and consuming the merged output. By the end you will be able to send a multi-page scan and receive a unified JSON response with all AWB fields.

Multi-page support works with PDF, TIFF, and multi-image uploads. Single-page formats like JPEG and PNG are also accepted — the API treats them as one-page documents automatically.

Step 1 — Upload the Document

The API accepts three upload methods: multipart form data, base64-encoded payloads, and public URLs. For multi-page documents, multipart is usually the simplest because you can stream the file without base64 overhead.
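
To put a number on the base64 overhead mentioned above, here is a minimal sketch. The JSON field names file_base64 and filename are hypothetical placeholders, not confirmed KabyTech parameters; only mode=multipage comes from this tutorial. The helper names are also made up for illustration.

```python
import base64
import json


def build_base64_payload(raw: bytes, filename: str) -> str:
    """Build a JSON body for a base64 upload.

    NOTE: "file_base64" and "filename" are hypothetical field names used for
    illustration only; "mode" is the parameter shown in this tutorial.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps(
        {"file_base64": encoded, "filename": filename, "mode": "multipage"}
    )


def base64_overhead(raw: bytes) -> float:
    """Return encoded size divided by raw size."""
    return len(base64.b64encode(raw)) / len(raw)


# Stand-in bytes for a scanned file; a real call would read the PDF from disk.
sample = bytes(3000)
body = build_base64_payload(sample, "awb-scan-3pages.pdf")
print(round(base64_overhead(sample), 3))
print(len(body) > len(sample))
```

Base64 encodes every 3 bytes as 4 characters, so the encoded file is roughly a third larger than the original before any JSON wrapping is added. That is the cost multipart upload avoids.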

Below is a curl example that uploads a multi-page PDF. The response includes a job_id, which you can use to poll for the result or to match against the result delivered by webhook.

curl -X POST https://api.kabytech.com/v1/parse \
  -H "Authorization: Bearer $KABY_API_KEY" \
  -F "file=@awb-scan-3pages.pdf" \
  -F "mode=multipage"

In Python, the equivalent call uses the requests library with a files dict. The mode=multipage parameter tells the API to expect and process every page rather than only the first one.

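import os

import requests

# Read the key from the same environment variable the curl example uses.
API_KEY = os.environ["KABY_API_KEY"]

# The context manager closes the file handle once the upload finishes.
with open("awb-scan-3pages.pdf", "rb") as scan:
    resp = requests.post(
        "https://api.kabytech.com/v1/parse",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": scan},
        data={"mode": "multipage"},
    )
resp.raise_for_status()  # fail loudly on HTTP errors
print(resp.json())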
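
If you poll rather than use a webhook, the loop can be sketched as follows. This is a sketch under assumptions: the tutorial does not give the job-status endpoint, so the status lookup is passed in as a function, and the helper name wait_for_job is made up for this example. The only status value taken from this tutorial is "complete".

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the body reports status == "complete"."""
    for _ in range(max_attempts):
        body = fetch_status()
        if body.get("status") == "complete":
            return body
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job not complete after {max_attempts} attempts")


# Simulated status responses so the sketch runs without network access.
# "processing" is a made-up intermediate status for this demo.
_fake = iter([{"status": "processing"}, {"status": "complete", "job_id": "j_abc123"}])
done = wait_for_job(lambda: next(_fake), interval=0.0)
print(done["job_id"])
```

In real use, fetch_status would wrap an authenticated HTTP GET for the job and return the decoded JSON body. Capping the number of attempts keeps a stuck job from blocking the caller indefinitely.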

Step 2 — Auto-Detection and Normalization

Once the file is received, the pipeline splits it into individual page images. For PDFs this uses rasterization at 300 DPI; for TIFFs the embedded frames are extracted directly. Each page image is then analyzed independently.

Auto-detection performs three operations: page-count verification (confirming how many pages the file contains), orientation correction (rotating each page by 0°, 90°, 180°, or 270°), and deskew (straightening pages scanned at a slight angle). These steps ensure the OCR engine receives clean, upright images regardless of how the original was scanned.
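
One way to picture how orientation correction and deskew relate is to split a measured page angle into a quarter-turn plus a small residual. The sketch below does only that arithmetic. It is an illustration, not KabyTech's detection algorithm, and the function name split_rotation is made up for this example.

```python
def split_rotation(angle_deg: float) -> tuple[int, float]:
    """Split a measured page angle into (quarter-turn, residual skew).

    The quarter-turn is one of 0, 90, 180, 270; the residual is the small
    leftover angle that a deskew step would straighten.
    """
    angle = angle_deg % 360.0
    nearest = round(angle / 90.0)           # nearest multiple of 90 degrees
    coarse = (nearest % 4) * 90             # wrap 360 back to 0
    residual = round(angle - nearest * 90, 2)
    return coarse, residual


for measured in (1.2, 181.2, 359.6, 90.0):
    print(measured, split_rotation(measured))
```

A page measured at 181.2° splits into a 180° turn and 1.2° of skew, while one measured at 359.6° is treated as upright with a residual of -0.4°.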

You can inspect the per-page detection metadata in the response under the pages array. Each entry includes rotation_applied, skew_angle, and confidence fields so you can audit the preprocessing.
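
An audit of that metadata might look like the sketch below, which flags pages worth a manual look. The thresholds (0.95 confidence, 1.0 degree of skew) and the function name are illustrative assumptions, not values recommended by KabyTech; the field names match the pages entries described above.

```python
def pages_needing_review(pages, min_confidence=0.95, max_skew=1.0):
    """Return (page number, reasons) for pages outside the given thresholds."""
    flagged = []
    for entry in pages:
        reasons = []
        if entry["confidence"] < min_confidence:
            reasons.append("low confidence")
        if abs(entry["skew_angle"]) > max_skew:
            reasons.append("large skew")
        if reasons:
            flagged.append((entry["page"], reasons))
    return flagged


# Example metadata in the shape this tutorial describes.
pages = [
    {"page": 1, "rotation_applied": 0, "skew_angle": 1.2, "confidence": 0.97},
    {"page": 2, "rotation_applied": 180, "skew_angle": 0.4, "confidence": 0.93},
    {"page": 3, "rotation_applied": 0, "skew_angle": 0.0, "confidence": 0.95},
]
for page_no, reasons in pages_needing_review(pages):
    print(page_no, ", ".join(reasons))
```

With these thresholds, page 1 is flagged for skew and page 2 for confidence, while page 3 passes.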

Step 3 — Merging Results Across Pages

After OCR runs on each page, the merge engine combines fields into a single AWB record. It uses field-level confidence scores to resolve conflicts. For example, if the AWB number is read differently on pages 1 and 3, the value with the higher confidence wins.
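
The highest-confidence-wins rule can be sketched in a few lines. This is a minimal illustration, not the merge engine itself: it assumes each per-page field is a dict with value and confidence keys, which is a hypothetical shape chosen for this example, and it breaks ties in favor of the earlier page.

```python
def merge_fields(pages):
    """Keep, for every field name, the candidate with the highest confidence."""
    merged = {}
    for page in pages:
        for name, candidate in page["fields"].items():
            best = merged.get(name)
            # Strict ">" means the earlier page wins a tie.
            if best is None or candidate["confidence"] > best["confidence"]:
                merged[name] = {**candidate, "source_page": page["page"]}
    return merged


# Hypothetical per-page extractions: page 3 misreads one digit of the AWB number.
per_page = [
    {"page": 1, "fields": {"awb_number": {"value": "160-12345675", "confidence": 0.98}}},
    {"page": 3, "fields": {
        "awb_number": {"value": "160-12345615", "confidence": 0.71},
        "pieces": {"value": 12, "confidence": 0.90},
    }},
]
result = merge_fields(per_page)
print(result["awb_number"]["value"], "from page", result["awb_number"]["source_page"])
print(result["pieces"]["value"], "from page", result["pieces"]["source_page"])
```

Recording source_page alongside each winning value keeps the merged record traceable back to the page it came from.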

The merged result is returned in the top-level result object, while per-page raw extractions remain available in pages[].fields for debugging. This two-level structure lets you consume the merged output directly while still being able to trace any field back to the page it came from.

{
  "job_id": "j_abc123",
  "status": "complete",
  "page_count": 3,
  "result": {
    "awb_number": "160-12345675",
    "origin": "BKK",
    "destination": "NRT",
    "pieces": 12,
    "weight": { "value": 840.5, "unit": "K" }
  },
  "pages": [
    { "page": 1, "rotation_applied": 0, "skew_angle": 1.2, "confidence": 0.97 },
    { "page": 2, "rotation_applied": 180, "skew_angle": 0.4, "confidence": 0.93 },
    { "page": 3, "rotation_applied": 0, "skew_angle": 0.0, "confidence": 0.95 }
  ]
}
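
To consume this response in application code, it can help to map the result object onto a typed record. The sketch below assumes only the fields shown in the sample above; the AwbRecord class and parse_result function are names invented for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AwbRecord:
    awb_number: str
    origin: str
    destination: str
    pieces: int
    weight_value: float
    weight_unit: str


def parse_result(body: dict) -> AwbRecord:
    """Convert a completed parse response into an AwbRecord."""
    if body.get("status") != "complete":
        raise ValueError(f"job not complete: status={body.get('status')!r}")
    merged = body["result"]
    return AwbRecord(
        awb_number=merged["awb_number"],
        origin=merged["origin"],
        destination=merged["destination"],
        pieces=int(merged["pieces"]),
        weight_value=float(merged["weight"]["value"]),
        weight_unit=merged["weight"]["unit"],
    )


# Trimmed copy of the sample response above.
raw = """{"job_id": "j_abc123", "status": "complete", "page_count": 3,
 "result": {"awb_number": "160-12345675", "origin": "BKK", "destination": "NRT",
            "pieces": 12, "weight": {"value": 840.5, "unit": "K"}}}"""
record = parse_result(json.loads(raw))
print(record)
```

Failing early on a non-complete status keeps half-finished jobs from being mistaken for empty results.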

Summary

Multi-page AWB scanning follows a straightforward three-step flow: upload the document (multipart, base64, or URL), let the API auto-detect pages and normalize orientation, then consume the merged result. The merge engine resolves conflicts using per-field confidence, so the merged record carries the highest-confidence value found for each field.

For high-volume scenarios, combine multi-page support with the batch endpoint covered in the next tutorial. If you need to inspect per-page results for quality assurance, the pages array gives you full transparency into each step of the pipeline.
