UNDERSTANDING CHINESE MORAL STORIES WITH FURTHER PRE-TRAINING

Jing Qian, Gangmin Li, Yong Yue, Katie Atkinson

Research output: Contribution to journal › Article › peer-review

Abstract

The goal of moral understanding is to grasp the theoretical concepts embedded in a narrative by delving beyond the concrete occurrences and dynamic personas. Specifically, the narrative is compacted into a single statement that does not involve any characters from the original text, which calls for a more astute language model that can comprehend connotative morality and exhibit commonsense reasoning. The "pre-training + fine-tuning" paradigm is widely embraced in neural language models. In this paper, we propose an intermediary phase to establish an improved paradigm of "pre-training + further pre-training + fine-tuning". Further pre-training generally refers to continual learning on task-specific or domain-relevant corpora before the model is applied to target tasks, with the aim of bridging the gap in data distribution between the pre-training and fine-tuning phases. Our work is based on a Chinese dataset named STORAL-ZH, which consists of 4k human-written story-moral pairs. Furthermore, we design a two-step process of domain-adaptive pre-training in the intermediary phase. The first step relies on a newly collected Chinese dataset of Confucian moral culture, and the second step draws on the Chinese version of a frequently used commonsense knowledge graph (i.e., ATOMIC) to enrich the backbone model with inferential knowledge in addition to morality. Through comparison with several advanced models, including BERT-base, RoBERTa-base and T5-base, experimental results on two understanding tasks demonstrate the effectiveness of our proposed three-phase paradigm.
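To make the three-phase paradigm concrete, the following is a minimal sketch of the intermediary "further pre-training" step using Hugging Face Transformers: a generically pre-trained Chinese backbone is continued with masked language modelling on a domain-relevant corpus before fine-tuning on the target task. The corpus path, backbone choice and hyperparameters below are illustrative assumptions, not the authors' actual configuration.

```python
# Sketch of "pre-training + further pre-training + fine-tuning".
# Paths and hyperparameters are placeholders, not the paper's setup.
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Phase 1: start from a generically pre-trained Chinese backbone.
model_name = "bert-base-chinese"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

# Phase 2: further (domain-adaptive) pre-training on domain text,
# e.g. a Confucian moral-culture corpus; the file name is hypothetical.
raw = load_dataset("text", data_files={"train": "confucian_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="further-pretrained", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
model.save_pretrained("further-pretrained")

# Phase 3: fine-tune the further pre-trained checkpoint on the target
# moral-understanding task (e.g. story-moral matching) as usual.
```

The same recipe would be repeated for the second domain-adaptive step (commonsense knowledge from a Chinese ATOMIC-style resource), reusing the checkpoint produced above as the starting point.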
Original language: English
Journal: International Journal on Natural Language Computing
Volume: 12
Issue number: 2
DOIs
Publication status: Published - 29 Dec 2023
