Product Announcement

Announcing Thai language AWB support — reading mixed Thai-English documents natively

KabyTech can now extract structured data from AWBs that contain Thai script, mixed Thai-English text, and transliterated Thai names, with accuracy close to that of English-only documents (96.9% versus 97.8% overall field accuracy in our benchmark).

Air Waybills are, by IATA convention, English-language documents. The standard Cargo-IMP message format uses ASCII text, and the printed AWB form has English field labels. But in practice, Thai air freight documents frequently contain Thai script. Shipper names are written in Thai. Consignee addresses mix Thai and English. Nature-of-goods descriptions include Thai product names. And supplementary documents attached to AWBs — packing lists, invoices, phytosanitary certificates — are often entirely in Thai.

Until now, AWB parsing tools have treated Thai text as noise: skipping it, replacing it with garbled characters, or failing to process the document entirely. Today we are announcing native Thai language support in KabyTech's AWB Intelligence API. Thai characters are now recognized, extracted, and returned as properly encoded UTF-8 text in the JSON response, with field-level accuracy within 1.6 percentage points of what we achieve on English-only documents.

Why Thai text appears on AWBs

Despite the IATA convention, Thai text appears on air cargo documents for several practical reasons:

  • Domestic shipper/consignee names. Thai companies often have their legal name in Thai script. While the AWB might use an English transliteration, many Thai-origin AWBs include both the Thai and English names. Some AWBs issued by domestic carriers use only the Thai name.
  • Thai addresses. Bangkok addresses in particular are notoriously complex in English transliteration. Soi (lane) names, moo (village) numbers, and tambon (sub-district) designations often appear in Thai script even on otherwise English documents because the transliteration would be ambiguous.
  • Nature of goods descriptions. Agricultural and food products frequently have Thai names that do not translate cleanly. A shipper might write "FRESH DURIAN MONTHONG" in the rate description but include the Thai name on attached documentation or in supplementary fields.
  • Handwritten annotations. Warehouse staff, customs brokers, and carrier agents add handwritten notes to AWB copies in Thai. When these annotated copies are scanned, the Thai text becomes part of the document image.
  • Domestic carrier AWBs. Thai domestic air cargo (e.g., Bangkok Airways cargo, Nok Air cargo) sometimes uses bilingual AWB forms with Thai field labels and Thai-language instructions.

The technical challenge of mixed-language AWBs

Parsing a document that contains both Thai and English text is significantly harder than parsing either language alone. Here is why:

Script detection at the character level

Thai and English use completely different writing systems. English uses the Latin alphabet with clear word boundaries (spaces). Thai uses the Thai script, which is an abugida — consonants carry inherent vowels that are modified by diacritical marks placed above, below, before, or after the base consonant. Crucially, Thai does not use spaces between words. Spaces in Thai text indicate clause or sentence boundaries, not word boundaries.

When Thai and English text appear on the same line — for example, a shipper name like "บริษัท Thai Silk Export จำกัด" — the parser must switch between two entirely different recognition models at the character level. A model trained only on English will attempt to interpret Thai characters as distorted Latin characters, producing nonsensical output.
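A minimal sketch of character-level script detection, assuming already-decoded text rather than a scanned image (so it illustrates the segmentation problem, not KabyTech's recognition models). The function names are hypothetical. Thai characters are identified by the Unicode Thai block, U+0E00 to U+0E7F, and neutral characters such as spaces and digits are attached to the neighbouring run:

```python
THAI_BLOCK = range(0x0E00, 0x0E80)  # Unicode Thai block, U+0E00..U+0E7F


def script_of(ch: str) -> str:
    """Label one character as 'thai', 'latin', or 'other' (digits, spaces, punctuation)."""
    if ord(ch) in THAI_BLOCK:
        return "thai"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def script_runs(text: str) -> list:
    """Split text into consecutive (script, substring) runs, preserving reading order."""
    runs = []  # each entry is a mutable [script, chars] pair
    for ch in text:
        script = script_of(ch)
        if runs and (script == "other" or script == runs[-1][0]):
            # Same script, or a neutral character: extend the current run.
            runs[-1][1] += ch
        elif runs and runs[-1][0] == "other":
            # A leading neutral run adopts the first real script that follows it.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chars.strip()) for script, chars in runs]


for script, chunk in script_runs("บริษัท Thai Silk Export จำกัด"):
    print(script, "|", chunk)
```

On the shipper name above this yields three runs: a Thai run, a Latin run ("Thai Silk Export"), and a second Thai run. Each run could then be handed to the matching recognizer.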

Thai diacritical marks and OCR

Thai script includes diacritical marks (tone marks, vowel marks) that are positioned above or below the base consonant. In scanned documents, especially at lower resolutions, these marks can be confused with document noise (dust, print artifacts, fax degradation). A generic OCR engine might strip these marks as noise, fundamentally changing the meaning of the word. For example, ข้าว ("rice") and ข่าว ("news") differ only in their tone mark.
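The effect of losing a tone mark can be shown with Python's standard unicodedata module. This is an illustrative sketch only: the strip_marks helper is hypothetical and stands in for an over-eager denoising step that discards small marks.

```python
import unicodedata

RICE = "\u0e02\u0e49\u0e32\u0e27"  # ข้าว: kho khai + mai tho (tone mark) + sara aa + wo waen
NEWS = "\u0e02\u0e48\u0e32\u0e27"  # ข่าว: the same letters with mai ek as the tone mark


def strip_marks(text: str) -> str:
    """Drop nonspacing marks (Unicode category Mn), which includes Thai tone marks."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# The two words differ at exactly one position, and that position is a tone mark.
differing = [(a, b) for a, b in zip(RICE, NEWS) if a != b]
for a, b in differing:
    print(unicodedata.name(a), "vs", unicodedata.name(b))

# Once the marks are stripped, both words collapse to the same string.
print(strip_marks(RICE) == strip_marks(NEWS))
```

With the marks removed, both words become ขาว, which is a different word again ("white"), so the original meaning cannot be recovered from the damaged text.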

Thai-English code-switching in addresses

Thai addresses on AWBs frequently code-switch between languages within a single field. An address might read: "123/45 ซอยสุขุมวิท 55 Sukhumvit Rd, Watthana, Bangkok 10110". The parser must handle the transition from Thai soi name to English road name within a single address line, maintaining the correct reading order and field segmentation.

How KabyTech handles Thai text

Our approach uses a three-stage pipeline specifically designed for multilingual document processing:

Stage 1: Language-aware layout analysis

Before attempting text recognition, we analyze the document layout to identify text regions and their probable language. This uses visual features: Thai script has a distinctive vertical profile (tall ascenders from vowel marks, descenders from certain consonants) that differs from Latin text. We classify each text region as Thai-primary, English-primary, or mixed, and route it to the appropriate recognition model.
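The routing decision can be sketched as follows. This is a simplified illustration, not the production classifier: it assumes an upstream visual model has already estimated the share of Thai glyphs in each region, and the Region type, the 0.2 and 0.8 thresholds, and the model names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    region_id: str
    thai_share: float  # assumed input: estimated share of Thai glyphs, 0.0 to 1.0


# Hypothetical names for the three recognizers described in this post.
ROUTES = {
    "thai-primary": "thai_model",
    "english-primary": "english_model",
    "mixed": "fusion_model",
}


def classify(region: Region, low: float = 0.2, high: float = 0.8) -> str:
    """Label a region by its estimated Thai share; the thresholds are illustrative."""
    if region.thai_share >= high:
        return "thai-primary"
    if region.thai_share <= low:
        return "english-primary"
    return "mixed"


def route(regions: list) -> dict:
    """Map each region id to the recognizer that should process it."""
    return {r.region_id: ROUTES[classify(r)] for r in regions}


print(route([
    Region("shipper_name", 0.55),
    Region("awb_number", 0.0),
    Region("handwritten_note", 0.95),
]))
```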

Stage 2: Dual-model text recognition

We run two specialized OCR models in parallel: one optimized for Thai script (including all 44 consonants, 32 vowel forms, 4 tone marks, and Thai numerals) and one optimized for English text and Latin numerals. For mixed regions, we use a fusion model that handles character-level language switching. The fusion model was trained on a corpus of 50,000+ real Thai air cargo documents, so it understands the specific patterns of Thai-English mixing that occur in AWB contexts.

Stage 3: Field-level language normalization

After text recognition, we apply field-level normalization rules. For example, AWB numbers are always numeric regardless of whether surrounding text is Thai or English. IATA airport codes are always three Latin characters. Weight values always use Latin numerals. Thai text typically appears only in name, address, and nature-of-goods fields. By applying these field-level constraints, we can correct recognition errors that would be ambiguous at the character level.
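A minimal sketch of this kind of field-level constraint, with hypothetical helper names. It assumes the usual AWB number layout of a 3-digit airline prefix followed by an 8-digit serial, and it maps Thai digits (U+0E50 to U+0E59) to Latin digits before validating. The patterns use [0-9] rather than \d because Python's \d also matches Thai digits.

```python
import re
from typing import Optional

# Map Thai digits ๐..๙ (U+0E50..U+0E59) to ASCII 0..9.
THAI_TO_LATIN_DIGITS = {0x0E50 + i: str(i) for i in range(10)}

AWB_NUMBER = re.compile(r"[0-9]{3}-?[0-9]{8}")  # 3-digit prefix, 8-digit serial
IATA_CODE = re.compile(r"[A-Z]{3}")             # three Latin capital letters


def to_latin_digits(raw: str) -> str:
    """Replace any Thai digits with their Latin equivalents."""
    return raw.translate(THAI_TO_LATIN_DIGITS)


def normalize_awb_number(raw: str) -> Optional[str]:
    """Return a cleaned AWB number, or None if the text cannot be one."""
    candidate = to_latin_digits(raw).replace(" ", "")
    return candidate if AWB_NUMBER.fullmatch(candidate) else None


def is_iata_code(raw: str) -> bool:
    """True only for exactly three ASCII capital letters."""
    return IATA_CODE.fullmatch(raw) is not None


print(normalize_awb_number("\u0e52\u0e51\u0e57-12345675"))  # Thai digits in the prefix
print(is_iata_code("BKK"), is_iata_code("\u0e1aKK"))        # second value has a Thai letter
```

A value that fails its constraint (a Thai letter inside an airport code, for instance) signals a recognition error, so the field can be re-read or flagged instead of being returned as is.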

Accuracy comparison: Thai vs. English

We benchmarked our Thai language support against a test set of 500 AWBs: 250 English-only documents and 250 documents containing Thai text (ranging from a few Thai characters to fully bilingual documents).

Metric                       English-only AWBs   Thai/Mixed AWBs
Overall field accuracy       97.8%               96.9%
AWB number accuracy          99.9%               99.9%
Shipper name accuracy        96.2%               95.1%
Consignee name accuracy      96.5%               95.4%
Address accuracy             94.8%               93.2%
Routing accuracy             99.7%               99.7%
Weight/pieces accuracy       99.4%               99.3%
Rate description accuracy    97.1%               96.5%
Nature of goods accuracy     95.8%               94.6%
Processing time (median)     1.4s                1.7s

The accuracy gap between English and Thai/mixed documents is 0.9 percentage points overall and no more than 1.6 points on any single field. The largest gap is in address accuracy (1.6 percentage points), which reflects the inherent complexity of Thai address formatting rather than a language recognition limitation. Shipper name, consignee name, and nature-of-goods fields trail by 1.1 to 1.2 points. Structured fields like AWB numbers, routing, and weight values show virtually identical accuracy regardless of the document's language.
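The per-field gaps can be recomputed directly from the benchmark table. The short script below uses only the published figures. Decimal is used so that the subtraction of one-decimal percentages stays exact.

```python
from decimal import Decimal

# (English-only %, Thai/mixed %) per row of the benchmark table above.
RESULTS = {
    "Overall field": ("97.8", "96.9"),
    "AWB number": ("99.9", "99.9"),
    "Shipper name": ("96.2", "95.1"),
    "Consignee name": ("96.5", "95.4"),
    "Address": ("94.8", "93.2"),
    "Routing": ("99.7", "99.7"),
    "Weight/pieces": ("99.4", "99.3"),
    "Rate description": ("97.1", "96.5"),
    "Nature of goods": ("95.8", "94.6"),
}

# Gap in percentage points: English-only accuracy minus Thai/mixed accuracy.
gaps = {field: Decimal(en) - Decimal(th) for field, (en, th) in RESULTS.items()}
largest = max(gaps, key=gaps.get)
under_one_point = [field for field, gap in gaps.items() if gap < 1]

for field, gap in gaps.items():
    print(f"{field:<18} {gap} pp")
print("Largest gap:", largest, gaps[largest])
```

Five of the nine rows (overall, AWB number, routing, weight/pieces, and rate description) come in under one percentage point. The name, address, and nature-of-goods fields, where free-form Thai text is most common, account for the rest.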

Median processing time increases by approximately 300 milliseconds for Thai/mixed documents due to the dual-model recognition pipeline. At 1.7 seconds, the median remains within the sub-2-second SLA.

What this means for Thai freight forwarders

Thai language support has several practical implications for our customers:

  • No more rejected documents. Previously, documents with significant Thai content sometimes failed to parse or produced garbled output. Now they process normally. If your team was pre-filtering documents to exclude Thai-heavy AWBs from automated processing, you can stop doing that.
  • Accurate Thai names for customs filing. Thai Customs e-Filing requires Thai-language shipper/consignee names for domestic entities. With Thai character support, the parsed output can be used directly for customs filing without manual re-entry of Thai names.
  • Supplementary document support. While our core product focuses on AWB parsing, the same Thai language engine powers our extraction of packing lists and invoices that accompany AWBs. Thai-language packing lists are now parseable.
  • Mixed-language search. In the Operations Portal, you can now search for shipments using Thai text. Search for a shipper's Thai name and find all their AWBs instantly.

How to enable Thai language support

Thai language support is enabled by default for all KabyTech accounts. There is no configuration change needed. When you submit a document containing Thai text, the API automatically detects the language mix and applies the appropriate recognition pipeline. The JSON response includes a language_detected field that indicates whether Thai text was found in the document.

If you want to force English-only processing (e.g., for benchmarking purposes), you can pass the language_hint: "en" parameter in your API request. But for production use, we recommend leaving the automatic detection enabled.
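The sketch below shows how a client might set the hint and read the detection result. Only the language_hint parameter and the language_detected field come from this post. The helper names, the shipper_name field, and the assumption that language_detected holds a language code or a list of codes are illustrative, so check the API reference for the actual request and response shapes.

```python
import json


def build_request_options(force_english: bool = False) -> dict:
    """Options to send with a document; leave language_hint out to keep auto-detection."""
    options = {}
    if force_english:
        options["language_hint"] = "en"  # documented override, e.g. for benchmarking
    return options


def thai_detected(response_body: str) -> bool:
    """Report whether the response says Thai text was found (assumed value shape)."""
    payload = json.loads(response_body)
    detected = payload.get("language_detected") or []
    if isinstance(detected, str):
        detected = [detected]
    return "th" in detected


# Hypothetical response body, used only to exercise the helper.
sample = json.dumps(
    {"language_detected": ["th", "en"], "shipper_name": "บริษัท Thai Silk Export จำกัด"},
    ensure_ascii=False,  # keep Thai text as readable UTF-8 instead of \u escapes
)

print(build_request_options())                    # automatic detection
print(build_request_options(force_english=True))  # English-only processing
print(thai_detected(sample))
```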

What's next for language support

Thai and English are our first two fully supported languages, reflecting our focus on the Thai air freight market. We are currently developing support for Chinese characters (Simplified and Traditional), which appear frequently on AWBs for Thailand-China routes — the single largest air cargo corridor for Thai perishable exports. Chinese language support is expected in Q3 2026.

Beyond that, our roadmap includes Japanese, Korean, and Vietnamese — covering the key air cargo routes in and out of Thailand. Each language requires a dedicated recognition model trained on air cargo-specific text, so these additions take time to develop and validate to our accuracy standards.

Try Thai language AWB parsing today

Upload a Thai or mixed Thai-English AWB and see the results. Free 30-day trial.
