TY - GEN
T1 - Dynamic Data Sampler for Cross-Language Transfer Learning in Large Language Models
AU - Li, Yudong
AU - Feng, Yuhao
AU - Zhou, Wen
AU - Zhao, Zhe
AU - Shen, Linlin
AU - Hou, Cheng
AU - Hou, Xianxu
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, owing to the difficulty of acquiring large-scale corpora and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpora to continuously train the LLaMA-2 model, aiming to align cross-language representations and facilitate knowledge transfer to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate that our approach accelerates model convergence and achieves superior performance. We evaluate ChatFlow on popular Chinese and English benchmarks; the results indicate that it outperforms other Chinese models post-trained on LLaMA-2-7B.
AB - Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, owing to the difficulty of acquiring large-scale corpora and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpora to continuously train the LLaMA-2 model, aiming to align cross-language representations and facilitate knowledge transfer to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate that our approach accelerates model convergence and achieves superior performance. We evaluate ChatFlow on popular Chinese and English benchmarks; the results indicate that it outperforms other Chinese models post-trained on LLaMA-2-7B.
KW - cross-language
KW - knowledge transfer
KW - Large language model
UR - http://www.scopus.com/inward/record.url?scp=85195416893&partnerID=8YFLogxK
U2 - 10.1109/ICASSP48485.2024.10446640
DO - 10.1109/ICASSP48485.2024.10446640
M3 - Conference Proceeding
AN - SCOPUS:85195416893
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 11291
EP - 11295
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -
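
Note: the abstract describes a dynamic data sampler that progressively shifts training batches from unsupervised pre-training text toward supervised fine-tuning pairs. The Python sketch below illustrates one way such a sampler could be structured; the linear schedule, the class name DynamicDataSampler, and the dataset placeholders are illustrative assumptions and not the exact recipe from the paper.

    # Minimal sketch of a dynamic data sampler, assuming a simple linear
    # schedule from unsupervised pre-training data to supervised
    # fine-tuning (SFT) data. Names and schedule are illustrative only.
    import random
    from typing import List


    class DynamicDataSampler:
        def __init__(self, pretrain_data: List[str], sft_data: List[str], total_steps: int):
            self.pretrain_data = pretrain_data  # unsupervised Chinese/English/parallel text
            self.sft_data = sft_data            # supervised instruction-response pairs
            self.total_steps = total_steps
            self.step = 0

        def sft_ratio(self) -> float:
            # Linear ramp from 0 (pure pre-training) to 1 (pure fine-tuning).
            return min(1.0, self.step / max(1, self.total_steps))

        def sample(self) -> str:
            # Draw from the SFT pool with probability sft_ratio(),
            # otherwise from the unsupervised pool, then advance the schedule.
            ratio = self.sft_ratio()
            self.step += 1
            pool = self.sft_data if random.random() < ratio else self.pretrain_data
            return random.choice(pool)


    if __name__ == "__main__":
        sampler = DynamicDataSampler(
            pretrain_data=["<unsupervised document>"],
            sft_data=["<instruction>\n<response>"],
            total_steps=1000,
        )
        batch = [sampler.sample() for _ in range(8)]
        print(batch)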