Abstract
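The invention discloses a method, an apparatus, and an electronic device for determining attack text, and relates to the field of artificial intelligence. The method includes: obtaining a first output result of an auxiliary model when the input is clean text, and obtaining a second output result of the auxiliary model when the input is the current attack text; determining an output difference based on the first output result and the second output result; determining, as the target attack vector, the current attack vector of the current attack text that makes the output difference satisfy a preset difference condition; obtaining candidate attack text generated from the clean text, and determining the vector similarity between the candidate attack vector of the candidate attack text and the target attack vector; and, if the vector similarity satisfies a preset similarity condition, determining the candidate attack text to be the target attack text. The target attack text is used to test or train a target model. The scheme improves the attack strength of the target attack text.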
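The two-stage selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not specify the difference measure, the similarity measure, the embedding, or the thresholds, so the L1 distance, cosine similarity, the toy `embed` function, the stand-in `toy_auxiliary_model`, and the parameters `min_difference` and `min_similarity` are all hypothetical choices made here for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def embed(text: str, dim: int = 16) -> Vector:
    """Toy text vector (assumption): character counts bucketed by code point."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


def output_difference(first: Sequence[float], second: Sequence[float]) -> float:
    """L1 distance between two model outputs (one possible difference measure)."""
    return sum(abs(a - b) for a, b in zip(first, second))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (one possible vector similarity); 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_target_vector(
    clean_text: str,
    attack_texts: Sequence[str],
    auxiliary_model: Callable[[str], Sequence[float]],
    min_difference: float,
) -> Optional[Vector]:
    """Return the vector of the first attack text whose output difference
    against the clean text reaches min_difference, or None if none does."""
    first_output = auxiliary_model(clean_text)
    for attack_text in attack_texts:
        second_output = auxiliary_model(attack_text)
        if output_difference(first_output, second_output) >= min_difference:
            return embed(attack_text)
    return None


def filter_candidates(
    candidates: Sequence[str], target_vector: Sequence[float], min_similarity: float
) -> List[str]:
    """Keep candidates whose vector is similar enough to the target attack vector."""
    return [
        text
        for text in candidates
        if cosine_similarity(embed(text), target_vector) >= min_similarity
    ]


def toy_auxiliary_model(text: str) -> List[float]:
    """Stand-in classifier (assumption): exclamation marks shift the prediction."""
    p = min(1.0, text.count("!") / 3)
    return [1.0 - p, p]


# Stage 1: find a target attack vector using the auxiliary model.
target = select_target_vector(
    "good movie", ["good movie", "good movie!!!"], toy_auxiliary_model, 1.0
)
# Stage 2: keep only candidates close to that vector.
selected = filter_candidates(["good movie!!", "zzzz"], target, 0.9) if target else []
```

In this sketch only the first stage queries the auxiliary model; the second stage compares vectors alone, so candidates can be screened without running any model on them. The selected texts would then be used to test or train the target model, as the abstract states.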
| Translated title of the contribution | Method, apparatus, and electronic device for determining attack text: The invention uses an auxiliary model to screen for a target attack vector by comparing the model's outputs for clean text and for attack text, and then selects the target attack text from the candidate attack texts according to vector similarity, improving the attack effect of the attack text used to test or train the target model. |
|---|---|
| Original language | Chinese (Simplified) |
| Patent granted number | CN118885819A |
| IPC | G06F16/35;G06F18/22 |
| Publication status | Published - 1 Nov 2024 |