Announcing Thai Language AWB Support

Read mixed Thai-English AWB documents natively, with full support for Thai shipper and consignee names, Thai addresses, and handwritten notes.

Air Waybills are, by IATA convention, English-language documents. The standard Cargo-IMP message format uses ASCII text, and the printed AWB form has English field labels. But in practice, Thai air freight documents frequently contain Thai script. Shipper names are written in Thai. Consignee addresses mix Thai and English. Nature-of-goods descriptions include Thai product names. And supplementary documents attached to AWBs — packing lists, invoices, phytosanitary certificates — are often entirely in Thai.

Until now, AWB parsing tools have treated Thai text as noise: skipping it, replacing it with garbled characters, or failing to process the document entirely. Today we are announcing native Thai language support in KabyTech's AWB Intelligence API. Thai characters are now recognized, extracted, and returned as properly encoded UTF-8 text in the JSON response, with field-level accuracy close to what we achieve on English-only documents (within 1 percentage point on most fields in our benchmark).

Why Thai text appears on AWBs

Despite the IATA convention, Thai text appears on air cargo documents for several practical reasons:

  • Domestic shipper/consignee names. Thai companies often have their legal name in Thai script. While the AWB might use an English transliteration, many Thai-origin AWBs include both the Thai and English names. Some AWBs issued by domestic carriers use only the Thai name.
  • Thai addresses. Bangkok addresses in particular are notoriously complex in English transliteration. Soi (lane) names, moo (village) numbers, and tambon (sub-district) designations often appear in Thai script even on otherwise English documents because the transliteration would be ambiguous.
  • Nature of goods descriptions. Agricultural and food products frequently have Thai names that do not translate cleanly. A shipper might write "FRESH DURIAN MONTHONG" in the rate description but include the Thai name on attached documentation or in supplementary fields.
  • Handwritten annotations. Warehouse staff, customs brokers, and carrier agents add handwritten notes to AWB copies in Thai. When these annotated copies are scanned, the Thai text becomes part of the document image.
  • Domestic carrier AWBs. Thai domestic air cargo (e.g., Bangkok Airways cargo, Nok Air cargo) sometimes uses bilingual AWB forms with Thai field labels and Thai-language instructions.

The technical challenge of mixed-language AWBs

Parsing a document that contains both Thai and English text is significantly harder than parsing either language alone. Here is why:

Script detection at the character level

Thai and English use completely different writing systems. English uses the Latin alphabet with clear word boundaries (spaces). Thai uses the Thai script, an abugida in which consonants carry an inherent vowel and other vowels are written as signs placed above, below, before, or after the base consonant. Crucially, Thai does not put spaces between words; spaces mark phrase and sentence breaks instead.

When Thai and English text appear on the same line — for example, a shipper name like "บริษัท Thai Silk Export จำกัด" — the parser must switch between two entirely different recognition models at the character level.
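To make the character-level switch concrete, here is a minimal Python sketch that splits an already-decoded string into Thai and Latin runs using the Thai Unicode block (U+0E00 to U+0E7F). It is an illustration only, not KabyTech's recognition pipeline, which works on document images; the function names script_of and split_by_script are hypothetical.

```python
from itertools import groupby


def script_of(ch: str) -> str:
    """Label one character as 'thai', 'latin', or 'other'."""
    if "\u0e00" <= ch <= "\u0e7f":       # Thai Unicode block
        return "thai"
    if ch.isascii() and ch.isalnum():    # basic Latin letters and digits
        return "latin"
    return "other"                       # spaces, punctuation, anything else


def split_by_script(text: str) -> list[tuple[str, str]]:
    """Split text into (script, run) pairs.

    Neutral characters such as spaces inherit the script of the
    character before them, so they never start a run of their own
    (except at the very beginning of the string).
    """
    labels = []
    current = "other"
    for ch in text:
        label = script_of(ch)
        if label != "other":
            current = label
        labels.append(current)

    runs = []
    pos = 0
    for label, group in groupby(labels):
        length = len(list(group))
        run = text[pos:pos + length].strip()
        if run:
            runs.append((label, run))
        pos += length
    return runs


print(split_by_script("บริษัท Thai Silk Export จำกัด"))
```

For the shipper name above, this yields three runs: a Thai run, the Latin run "Thai Silk Export", and a second Thai run. A real OCR system has to make the same decision from pixels rather than code points, which is where the difficulty lies.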

Thai diacritical marks and OCR

Thai script includes diacritical marks (tone marks, vowel marks) that are positioned above or below the base consonant. In scanned documents, especially at lower resolutions, these marks can be confused with document noise. A generic OCR engine might strip these marks as noise, fundamentally changing the meaning of the word.
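A small Python sketch shows how much a single lost mark matters. It simulates an over-aggressive denoising step by removing every nonspacing mark (Unicode category Mn) from a string; the helper name strip_nonspacing_marks is hypothetical and is not part of any OCR engine.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Remove every nonspacing mark (Unicode category Mn).

    In Thai this removes tone marks and the vowel signs written
    above or below a consonant, which is roughly what happens when
    an OCR engine discards them as specks of noise.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


horse = "\u0e21\u0e49\u0e32"   # ม้า "horse": consonant + tone mark + vowel
come = "\u0e21\u0e32"          # มา "come": same consonant and vowel, no tone mark

# Losing the tone mark turns one word into the other.
print(strip_nonspacing_marks(horse) == come)
```

The two words differ only by the tone mark U+0E49, so a recognizer that drops it returns a valid but different word, and no spell check will flag the error.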

Thai-English code-switching in addresses

Thai addresses on AWBs frequently code-switch between languages within a single field. An address might read: "123/45 ซอยสุขุมวิท 55 Sukhumvit Rd, Watthana, Bangkok 10110". The parser must handle the transition from Thai soi name to English road name within a single address line.

How KabyTech handles Thai text

Our approach uses a three-stage pipeline specifically designed for multilingual document processing:

Stage 1: Language-aware layout analysis

Before attempting text recognition, we analyze the document layout to identify text regions and their probable language. Thai script has a distinctive vertical profile (tall ascenders from vowel marks, descenders from certain consonants) that differs from Latin text. We classify each text region as Thai-primary, English-primary, or mixed, and route it to the appropriate recognition model.

Stage 2: Dual-model text recognition

We run two specialized OCR models in parallel: one optimized for Thai script (including all 44 consonants, 32 vowel forms, 4 tone marks, and Thai numerals) and one optimized for English text and Latin numerals. For mixed regions, we use a fusion model that handles character-level language switching. The fusion model was trained on a corpus of 50,000+ real Thai air cargo documents.

Stage 3: Field-level language normalization

After text recognition, we apply field-level normalization rules. For example, AWB numbers are always numeric regardless of whether surrounding text is Thai or English. IATA airport codes are always three Latin characters. Weight values always use Latin numerals. Thai text typically appears only in name, address, and nature-of-goods fields. By applying these field-level constraints, we can correct recognition errors that would be ambiguous at the character level.
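The field-level constraints described above can be sketched in a few lines of Python. This is an illustrative example, not KabyTech's implementation: the function names are hypothetical, and the AWB check assumes the common convention of a 3-digit airline prefix followed by an 8-digit serial whose last digit equals the first seven digits modulo 7.

```python
import re

# Thai digits occupy U+0E50..U+0E59; map each one to its Latin equivalent.
THAI_DIGITS = {0x0E50 + i: str(i) for i in range(10)}


def normalize_digits(value: str) -> str:
    """Replace Thai numerals with Latin numerals, leaving other text alone."""
    return value.translate(THAI_DIGITS)


def is_valid_awb_number(value: str) -> bool:
    """Check an AWB number: 11 digits, with a modulus-7 check digit."""
    digits = re.sub(r"[\s-]", "", normalize_digits(value))
    if not re.fullmatch(r"[0-9]{11}", digits):
        return False
    serial, check = digits[3:10], digits[10]
    return int(serial) % 7 == int(check)


def is_iata_airport_code(value: str) -> bool:
    """An IATA airport code is exactly three uppercase Latin letters."""
    return re.fullmatch(r"[A-Z]{3}", value.strip()) is not None


print(normalize_digits("\u0e51\u0e52.\u0e55 KG"))   # Thai numerals for 12.5
print(is_valid_awb_number("217-12345675"))
print(is_iata_airport_code("BKK"))
```

A candidate reading that fails one of these checks can be rejected in favor of the next-best candidate, which is how a constraint on the field resolves an ambiguity that the character recognizer alone cannot.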

Accuracy comparison & what this means for Thai freight forwarders

We benchmarked our Thai language support against a test set of 500 AWBs: 250 English-only documents and 250 documents containing Thai text. The accuracy gap between English and Thai/mixed documents is less than 1 percentage point on most fields. The largest gap is in address accuracy (1.6 percentage points), which reflects the inherent complexity of Thai address formatting. Structured fields like AWB numbers, routing, and weight values show virtually identical accuracy regardless of the document's language. Processing time increases by approximately 300 milliseconds for Thai/mixed documents, remaining well within the sub-2-second SLA.

Thai language support has several practical implications for our customers:

  • No more rejected documents. Previously, documents with significant Thai content sometimes failed to parse or produced garbled output. Now they process normally.
  • Accurate Thai names for customs filing. Thai Customs e-Filing requires Thai-language shipper/consignee names for domestic entities. With Thai character support, the parsed output can be used directly for customs filing without manual re-entry.
  • Supplementary document support. The same Thai language engine powers our extraction of packing lists and invoices that accompany AWBs. Thai-language packing lists are now parseable.
  • Mixed-language search. In the Operations Portal, you can now search for shipments using Thai text.

How to enable Thai language support & what's next

Thai language support is enabled by default for all KabyTech accounts. There is no configuration change needed. When you submit a document containing Thai text, the API automatically detects the language mix and applies the appropriate recognition pipeline. The JSON response includes a language_detected field that indicates whether Thai text was found in the document.
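As a rough illustration of consuming the response, the Python sketch below checks the language_detected field and prints Thai text without escaping it. Only the field name language_detected comes from this post; the sample payload, the list-of-language-codes value, the shipper key, and the helper has_thai are assumptions made for the example, so check the API reference for the actual response schema.

```python
import json

# Hypothetical response body. The real schema may differ: only the
# "language_detected" field name is taken from the announcement.
SAMPLE_RESPONSE = """
{
  "language_detected": ["th", "en"],
  "shipper": {"name": "บริษัท Thai Silk Export จำกัด"}
}
"""


def has_thai(response: dict) -> bool:
    """Return True if the (assumed) language list includes Thai."""
    return "th" in response.get("language_detected", [])


parsed = json.loads(SAMPLE_RESPONSE)
print(has_thai(parsed))

# ensure_ascii=False keeps Thai characters readable instead of
# turning them into \uXXXX escape sequences when re-serializing.
print(json.dumps(parsed["shipper"], ensure_ascii=False))
```

If you store or forward the parsed output, keeping it as UTF-8 end to end avoids reintroducing the garbled characters this release is meant to eliminate.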

Thai and English are our first two fully supported languages, reflecting our focus on the Thai air freight market. We are currently developing support for Chinese characters (Simplified and Traditional), which appear frequently on AWBs for Thailand-China routes — the single largest air cargo corridor for Thai perishable exports. Chinese language support is expected in Q3 2026. Beyond that, our roadmap includes Japanese, Korean, and Vietnamese — covering the key air cargo routes in and out of Thailand.

Try Thai language AWB parsing today

Upload a Thai or mixed Thai-English AWB and see the results. Free 30-day trial.