Building and Testing cleverhans Adversarial-Example Defenses under Python 3 (FGSM Attack and Adversarial-Training Defense)

While reading up on AI security I came across cleverhans, so I built and tested it in a Python 3.6 environment.

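It was only from Ian Goodfellow's talk "Machine learning privacy and security" that I learned where the name cleverhans comes from: "A horse called Clever Hans. At first people believed the horse could do arithmetic, but in fact it was only reading people's expressions: as the number of hoof taps approached the correct answer, people's expressions grew more excited, and it knew that was the moment to stop."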

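The project is a tensorflow sub-project (https://github.com/tensorflow/cleverhans). The original code targeted Python 2.7, so after downloading it I refactored it and got it running under 3.6. There turned out to be quite a lot of code to work through. The sections below test the parts I was most interested in.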

(1) FGSM image perturbation attack

FGSM (Fast Gradient Sign Method) is a fairly typical adversarial-example generation algorithm proposed by Goodfellow et al.

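It generates the perturbed data as follows. Because $\epsilon$ is small, x' and x differ only slightly, so the human eye generally perceives no obvious difference, yet the CNN model still misclassifies the image:

$$x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big)$$

Here $J$ is the training loss, $\theta$ the model parameters, and $y$ the true label of x. The actual code is in the relevant functions of https://github.com/tensorflow/cleverhans/blob/master/cleverhans/attacks_tf.py.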

(Figure: an image before and after the perturbation.)
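To make the update rule concrete, here is a minimal sketch of one-step FGSM, x' = x + ε · sign(∇ₓJ), written in plain Python against a tiny logistic-regression model rather than a CNN. It is not the cleverhans implementation: the model, its weights `w` and `b`, the input `x`, and the helper names are all hypothetical and chosen only for illustration. For logistic regression the input gradient of the cross-entropy loss has the closed form (p - y) · w, so no autodiff library is needed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss J with respect to the input x.

    For logistic regression, dJ/dx_i = (p - y) * w_i.
    """
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps, clip_min=0.0, clip_max=1.0):
    """One-step FGSM: x' = clip(x + eps * sign(grad_x J))."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [min(clip_max, max(clip_min, xi + eps * sign(g)))
            for xi, g in zip(x, grad)]

# Hypothetical toy model and input (assumed values, not from cleverhans).
w = [2.0, -3.0, 1.0, 4.0]
b = -1.5
x = [0.6, 0.2, 0.5, 0.4]
y = 1  # true label of x

x_adv = fgsm(w, b, x, y, eps=0.2)
print("clean p(y=1):      ", predict_proba(w, b, x))
print("adversarial p(y=1):", predict_proba(w, b, x_adv))
```

Every feature moves by at most `eps`, yet the predicted class flips. The toy model has only four inputs, so it needs a fairly large `eps`; an image has many thousands of pixels whose small shifts add up, which is why a much smaller ε is enough in practice.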

In addition, this Kaggle page provides many FGSM examples (see https://www.kaggle.com/benhamner/fgsm-attack-example/code).

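Feeding the FGSM-perturbed data into an image recognition model, such as the Inception v3 model used in that code, shows that the recognition results are completely scrambled.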

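A: The figure below shows the recognition results for the original images.

B: The figure below shows the recognition results for the images after FGSM perturbation; the predicted classes differ drastically from the originals.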

(2) Defending against FGSM attacks (code related to the NIPS 2017 paper)

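While looking for defenses, I found that the code integrated into cleverhans is in fact the same as the code in tensorflow models; see https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models. In essence, the model has to be trained on perturbed images before it can classify perturbed images accurately. The original paper states its contributions as follows: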

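In the actual code, cleverhans provides two adversarially trained models: one based on Inception v3, and one that is an ensemble adversarially trained Inception ResNet v2. The test results are shown below: the perturbed images are also recognized correctly.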
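The idea of training on perturbed inputs can be sketched on a toy problem. The example below is an illustration only, not the Inception training code from adv_imagenet_models: it assumes a two-feature logistic-regression model, a made-up data set, and a simple scheme in which every gradient step uses the clean samples plus FGSM samples freshly crafted against the current parameters. All names and hyperparameters (`adversarial_train`, `eps`, `lr`, `epochs`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """FGSM for logistic regression, where grad_x J = (p - y) * w."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * ((g > 0) - (g < 0))))
            for xi, g in zip(x, grad)]

def adversarial_train(data, eps, epochs=1000, lr=1.0):
    """Gradient descent on a batch of clean plus FGSM-perturbed samples."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        # Re-craft the adversarial samples against the current parameters.
        batch = data + [(fgsm(w, b, x, y, eps), y) for x, y in data]
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in batch:
            err = predict_proba(w, b, x) - y  # dJ/dlogit
            grad_w = [gw + err * xi for gw, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * gw / len(batch) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(batch)
    return w, b

def accuracy(w, b, data):
    hits = sum((predict_proba(w, b, x) > 0.5) == (y == 1) for x, y in data)
    return hits / len(data)

# Hypothetical toy data: class 0 near the origin, class 1 near (1, 1).
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0), ([0.2, 0.3], 0),
        ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.7], 1), ([0.7, 0.8], 1)]
eps = 0.1

w, b = adversarial_train(data, eps)
adv_data = [(fgsm(w, b, x, y, eps), y) for x, y in data]
print("clean accuracy:          ", accuracy(w, b, data))
print("accuracy on FGSM samples:", accuracy(w, b, adv_data))
```

The key design point is that the adversarial samples are regenerated at every step, because a perturbation crafted against old parameters is no longer the worst case for the updated model. A toy problem this easy would likely be robust even without the adversarial samples, so the sketch shows the training loop, not the size of the benefit.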

(3) Some other examples. The cleverhans code base provides a variety of adversarial-example generation methods, as listed below:

sample_attacks/ - directory with examples of attacks:
- sample_attacks/fgsm/ - Fast gradient sign attack.
- sample_attacks/noop/ - No-op attack, which just copies images unchanged.
- sample_attacks/random_noise/ - Attack which adds random noise to images.
sample_targeted_attacks/ - directory with examples of targeted attacks:
- sample_targeted_attacks/step_target_class/ - one step towards target class attack. This is not a particularly good targeted attack, but it demonstrates how a targeted attack could be written.
- sample_targeted_attacks/iter_target_class/ - iterative target class attack. This is a pretty good white-box attack, but it does not do well in black box setting.
sample_defenses/ - directory with examples of defenses:
- sample_defenses/base_inception_model/ - baseline inception classifier, which actually does not provide any defense against adversarial examples.
- sample_defenses/adv_inception_v3/ - adversarially trained Inception v3 model from Adversarial Machine Learning at Scale paper.
- sample_defenses/ens_adv_inception_resnet_v2/ - Inception ResNet v2 model which is adversarially trained against an ensemble of different kinds of adversarial examples. Model is described in Ensemble Adversarial Training: Attacks and Defenses paper.
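As a rough illustration of how three of the listed attacks differ, the sketch below implements a no-op attack, a random-noise attack, and an iterative target-class attack against a tiny linear softmax classifier. This is not the code in the sample directories: the classifier `W`, `b`, the input `x`, the function names, and the choice of ±eps noise per feature are assumptions made for this example.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """Class probabilities of a linear softmax classifier."""
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bk
              for row, bk in zip(W, b)]
    return softmax(logits)

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def grad_loss_wrt_input(W, b, x, label):
    """Gradient of the cross-entropy loss J(x, label) with respect to x."""
    p = predict(W, b, x)
    return [sum((p[k] - (k == label)) * W[k][j] for k in range(len(W)))
            for j in range(len(x))]

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def noop_attack(x):
    """Returns the input unchanged."""
    return list(x)

def random_noise_attack(x, eps, rng):
    """Shifts every feature by +eps or -eps at random (assumed noise model)."""
    return [clip(xi + rng.choice((-eps, eps)), 0.0, 1.0) for xi in x]

def iter_target_class_attack(W, b, x, target, eps, alpha, steps):
    """Repeated signed-gradient steps that lower the loss of `target`,
    keeping every feature within eps of the original input."""
    x_adv = list(x)
    for _ in range(steps):
        grad = grad_loss_wrt_input(W, b, x_adv, target)
        for j, g in enumerate(grad):
            stepped = x_adv[j] - alpha * ((g > 0) - (g < 0))
            stepped = clip(stepped, x[j] - eps, x[j] + eps)
            x_adv[j] = clip(stepped, 0.0, 1.0)
    return x_adv

# Hypothetical 3-class model and input (assumed values).
W = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
b = [0.0, 0.0, 0.0]
x = [0.7, 0.4, 0.3]
eps = 0.25

x_noop = noop_attack(x)
x_noise = random_noise_attack(x, eps, random.Random(0))
x_iter = iter_target_class_attack(W, b, x, target=2, eps=eps, alpha=0.05, steps=10)
print("clean class:       ", argmax(predict(W, b, x)))
print("targeted-attack class:", argmax(predict(W, b, x_iter)))
```

Calling `iter_target_class_attack` with `steps=1` and `alpha=eps` gives a single-step variant in the spirit of step_target_class. Note that the targeted attack subtracts the signed gradient, because it minimizes the loss for the target class instead of maximizing the loss for the true class as untargeted FGSM does.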

It also ships several examples. Overall, it is a very good library for adversarial-example generation and adversarial training.

The attached figure shows the first of these examples.
