UPDATED - Unravelling Lexical and Narrative Patterns in the Hikayat Lonthoir: A Computational Linguistics Approach

  • Muhamad Iko Kersapati*
  • Francesco Perono Cacciafoco*
  • Bimasyah Sihite
  • Shiyue Wu
  • Khofiyana Putri Widyaningrum
  • Mohamad Atqa
  • Elvis A.B. Toni
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Hikayat Lonthoir, a rare saga manuscript collection originating from the Banda Archipelago, Maluku, Indonesia, retains significant Indigenous oral history amidst the Western colonial narrative. This study applies computational methods to analyze the historic manuscript, combining OCR-supervised transcription, corpus linguistic profiling, semantic clustering (Word2Vec + K-Means), and named entity network analysis. The dataset is validated by checking 2793 cleaned word tokens against Indonesian and Malay dictionaries, showing that 50.3% overlapped with both dictionaries, with strong cross-dictionary agreement (κ = 0.76). The lexical analysis highlights monarchy/governance, kinship, and maritime vocabulary, along with extensive morphological productivity (me-, di-, ter-, pe-/per-, -nya, -an), while semantic and network analyses identify two narrative cores, which are developed further using the Aarne–Thompson–Uther (ATU) and Stith Thompson’s Motif Index of Folk Literature classification systems. These findings demonstrate how computational methods can extract structural, thematic, and relational patterns from historical manuscripts and contribute evidence-based insights to digital philology and historical linguistics.
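To make the dictionary validation step more concrete, the following is a minimal sketch of how an overlap rate and a cross-dictionary agreement score (Cohen's κ) could be computed. It assumes that each token receives a binary found/not-found label per dictionary, which the abstract does not spell out. The token list, the two dictionary sets, and the function name `cohen_kappa` are toy, hypothetical stand-ins and not the study's data or code.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of tokens on which both dictionaries agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each dictionary's marginal hit rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both label sequences are constant and identical; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data (assumption): not taken from the manuscript corpus.
tokens = ["raja", "negeri", "perahu", "anak", "lonthoir", "orang"]
indonesian_dict = {"raja", "negeri", "perahu", "anak", "orang"}
malay_dict = {"raja", "negeri", "anak", "orang"}

# Binary membership label per token, one sequence per dictionary.
in_indonesian = [t in indonesian_dict for t in tokens]
in_malay = [t in malay_dict for t in tokens]

# Share of tokens attested in both dictionaries.
overlap_both = sum(x and y for x, y in zip(in_indonesian, in_malay)) / len(tokens)
kappa = cohen_kappa(in_indonesian, in_malay)

print(f"overlap with both dictionaries: {overlap_both:.3f}")
print(f"Cohen's kappa: {kappa:.3f}")
```

With real data, `tokens` would be the cleaned word tokens and the two sets would be loaded from the dictionaries used for validation; the computation itself would stay the same under the binary-label assumption.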

Original language: English
Article number: 1069
Journal: Information (Switzerland)
Volume: 16
Issue number: 12
Publication status: Published - 4 Dec 2025

Keywords

  • Banda Archipelago
  • Digital Philology
  • Linguistic Documentation
  • NLP
  • Oral History
  • Semantic Analysis
