TY - GEN
T1 - Lost in Pronunciation
T2 - 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, EMNLP 2025
AU - Guo, Haotan
AU - He, Jianfei
AU - Ma, Jiayuan
AU - Na, Hongbin
AU - Wang, Zimu
AU - Zhang, Haiyang
AU - Chen, Qi
AU - Wang, Wei
AU - Shi, Zijing
AU - Shen, Tao
AU - Chen, Ling
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - Warning: this paper contains content that may be offensive or upsetting. Phonetic Cloaking Replacement (PCR), defined as the deliberate use of homophonic or near-homophonic variants to hide toxic intent, has become a major obstacle to Chinese content moderation. While this problem is well-recognized, existing evaluations predominantly rely on rule-based, synthetic perturbations that ignore the creativity of real users. We organize PCR into a four-way surface-form taxonomy and compile PCR-ToxiCN, a dataset of 500 naturally occurring, phonetically cloaked offensive posts gathered from the RedNote platform. Benchmarking state-of-the-art LLMs on this dataset exposes a serious weakness: the best model reaches only an F1-score of 0.672, and zero-shot chain-of-thought prompting pushes performance even lower. Guided by error analysis, we revisit a Pinyin-based prompting strategy that earlier studies judged ineffective and show that it recovers much of the lost accuracy. This study offers the first comprehensive taxonomy of Chinese PCR, a realistic benchmark that reveals current detectors’ limits, and a lightweight mitigation technique that advances research on robust toxicity detection.
UR - https://www.scopus.com/pages/publications/105039609017
U2 - 10.18653/v1/2025.emnlp-industry.172
DO - 10.18653/v1/2025.emnlp-industry.172
M3 - Conference Proceeding
AN - SCOPUS:105039609017
T3 - EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track
SP - 2538
EP - 2550
BT - EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track
A2 - Potdar, Saloni
A2 - Rojas-Barahona, Lina
A2 - Montella, Sebastien
PB - Association for Computational Linguistics (ACL)
Y2 - 4 November 2025 through 9 November 2025
ER -