
攻击文本的确定方法、装置及电子设备

Translated title of the contribution: Method, apparatus, and electronic device for determining attack text. The invention uses an auxiliary model to select a target attack vector by comparing the model's outputs on clean text and on attack text. It then identifies the target attack text among candidate attack texts by vector similarity, improving the attack effectiveness of the attack text used to test or train a target model.

Research output: Patent

Abstract

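The present invention discloses a method, an apparatus, and an electronic device for determining attack text, relating to the field of artificial intelligence technology. The method comprises: obtaining a first output result of an auxiliary model when the input is clean text, and obtaining a second output result of the auxiliary model when the input is the current attack text; determining an output difference based on the first output result and the second output result; determining, as the target attack vector, the current attack vector of the current attack text for which the output difference satisfies a preset difference condition; obtaining candidate attack text generated from the clean text, and determining the vector similarity between the candidate attack vector of the candidate attack text and the target attack vector; and, if the vector similarity satisfies a preset similarity condition, determining the candidate attack text to be the target attack text. The target attack text is used to test or train a target model. The scheme improves the attack strength of the target attack text.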
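The steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not say how the output difference or the vector similarity is computed, so the L1 distance, the cosine similarity, the two thresholds, and the names `output_difference`, `select_target_vector`, `filter_candidates`, `toy_aux_model`, and `toy_embed` are all assumptions introduced here. In particular, how an attack vector is derived from an attack text is not stated, so a hypothetical `embed` callable stands in for that step.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def output_difference(first: Vector, second: Vector) -> float:
    """L1 distance between two model outputs (assumed metric, not from the source)."""
    return sum(abs(a - b) for a, b in zip(first, second))


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity (assumed similarity measure); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_target_vector(
    clean_text: str,
    attack_texts: Sequence[str],
    aux_model: Callable[[str], Vector],
    embed: Callable[[str], Vector],
    min_difference: float,
) -> Optional[Vector]:
    """Return the vector of the first attack text whose auxiliary-model output
    differs from the clean-text output by at least min_difference (assumed form
    of the preset difference condition), or None if no attack text qualifies."""
    first_output = aux_model(clean_text)
    for text in attack_texts:
        second_output = aux_model(text)
        if output_difference(first_output, second_output) >= min_difference:
            return embed(text)
    return None


def filter_candidates(
    candidates: Sequence[str],
    target_vector: Vector,
    embed: Callable[[str], Vector],
    min_similarity: float,
) -> List[str]:
    """Keep candidates whose vector is similar enough to the target attack vector
    (assumed form of the preset similarity condition)."""
    return [
        text
        for text in candidates
        if cosine_similarity(embed(text), target_vector) >= min_similarity
    ]


# Toy stand-ins for illustration only; a real auxiliary model and encoder
# would replace these.
def toy_aux_model(text: str) -> Vector:
    positive = 0.9 if "good" in text else 0.1
    return [positive, 1.0 - positive]


def toy_embed(text: str) -> Vector:
    return [float(text.count("g")), float(text.count("b")), 1.0]


if __name__ == "__main__":
    target = select_target_vector(
        "good film", ["good movie", "bad film"], toy_aux_model, toy_embed, 0.5
    )
    if target is not None:
        print(filter_candidates(["bad flick", "good flick"], target, toy_embed, 0.9))
```

In this sketch the auxiliary model acts only as a filter for choosing the reference vector. The candidates are then screened by similarity alone, without querying the target model, which matches the abstract's statement that the resulting texts are afterwards used to test or train the target model.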
Original language: Chinese (Simplified)
Patent granted number: CN118885819A
IPC: G06F16/35; G06F18/22
Publication status: Published - 1 Nov 2024
