Notes: PyTorch Official Tutorial - Adversarial Example Generation

Reposted article, 2021-11-10 16:09

Translated from: https://pytorch.org/tutorials/beginner/fgsm_tutorial.html

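Although deep learning models keep getting faster and more accurate, a closer look at adversarial learning reveals something surprising: adding small, barely perceptible perturbations to an image can drastically change a model's performance.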

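This tutorial explores the problem using an image classifier. Specifically, we use one of the earliest and most popular attack methods, the Fast Gradient Sign Method (FGSM), to fool an MNIST classifier.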

Threat Model

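For context, there are many kinds of adversarial attacks, each with a different goal and a different assumption about what the attacker knows. In general, though, the overarching goal is to add the least amount of perturbation to the input data that causes the desired misclassification.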

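There are several kinds of assumptions about the attacker's knowledge, two of which are white-box and black-box. A white-box attack assumes the attacker has full knowledge of and access to the model, including its architecture, inputs, outputs, and weights. A black-box attack assumes the attacker can only access the model's inputs and outputs and knows nothing about the underlying architecture or weights.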

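There are also several types of goals, including misclassification and source/target misclassification. A misclassification goal means the adversary only wants the output classification to be wrong and does not care what the new classification is. A source/target misclassification goal means the adversary wants to alter an image that originally belongs to a specific source class so that it is classified as a specific target class. For a fun example, see https://momodel.cn/workspace/618b8c6d85f93acaac091381?type=app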
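The two goals differ mainly in which label the loss is evaluated on and in which direction the input is moved. As a rough sketch, using the training loss $J(\theta, \mathbf{x}, y)$ that the FGSM section introduces and a hypothetical target label $y_{\text{target}}$ (an assumption for illustration; the tutorial does not implement a targeted attack), a one-step gradient-sign attack for each goal could be written as:

$$\begin{aligned}
\text{misclassification:}\quad & x' = x + \epsilon \cdot \operatorname{sign}\left(\nabla_{x} J(\theta, \mathbf{x}, y)\right) \\
\text{source/target misclassification:}\quad & x' = x - \epsilon \cdot \operatorname{sign}\left(\nabla_{x} J(\theta, \mathbf{x}, y_{\text{target}})\right)
\end{aligned}$$

The first line steps uphill on the loss for the true label, while the second steps downhill on the loss for the target label. Only the first form is used in the rest of this tutorial.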

In this case, the FGSM attack is a white-box attack with the goal of misclassification. With this background, we can now discuss the attack in detail.

Fast Gradient Sign Attack

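One of the earliest and most popular adversarial attacks is the Fast Gradient Sign Attack (FGSM), described by Goodfellow et al. in Explaining and Harnessing Adversarial Examples. The attack is remarkably powerful yet intuitive. It exploits the very mechanism by which neural networks learn: gradients. The idea is simple: rather than using the backpropagated gradients to adjust the weights so as to minimize the loss, the attack uses them to adjust the input data so as to maximize the loss.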

Before writing any code, let's look at the famous FGSM panda example and introduce some notation.

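In the figure, $x$ is the original input image, correctly classified as "panda"; $y$ is the ground truth label for $x$; $\theta$ denotes the model parameters; and $J(\theta, \mathbf{x}, y)$ is the loss used to train the network. The attack backpropagates the gradient to the input to compute the partial derivative of the loss with respect to $x$, $\nabla_{x} J(\theta, \mathbf{x}, y)$. It then adjusts the input data by a small step ($\epsilon$, or 0.007 in this example) in the direction $\operatorname{sign}\left(\nabla_{x} J(\theta, \mathbf{x}, y)\right)$, which increases the loss. The resulting perturbed image $x'$ is classified by the target network as "gibbon", while to the human eye it is still clearly a "panda".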
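To make the update rule concrete, here is a minimal sketch on a toy model rather than a neural network: a hypothetical two-feature logistic regression with hand-picked weights `w` and bias `b` (all values are assumptions for illustration, not taken from the tutorial). The gradient of the loss with respect to the input is computed analytically, and one step of size `epsilon` along its sign increases the loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy J(theta, x, y) for a logistic model, y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_wrt_input(w, b, x, y):
    """Analytic gradient of the loss with respect to the input x: (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_step(x, grad, epsilon):
    """x' = x + epsilon * sign(grad), applied element-wise."""
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical toy parameters and input (illustration only).
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
epsilon = 0.1

x_adv = fgsm_step(x, grad_wrt_input(w, b, x, y), epsilon)
loss_before = loss(w, b, x, y)
loss_after = loss(w, b, x_adv, y)
print("x_adv:", x_adv)
print("loss before:", loss_before, "loss after:", loss_after)
```

Each input coordinate moves by exactly `epsilon` in the direction that hurts the model most, which is why the loss on the true label goes up even though the input barely changes.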

Hopefully the motivation for this tutorial is now clear, so let's move on to the implementation.

Implementation

The full code is available at https://gist.github.com/growvv/c3188af99b49315423afbd6843fcd05d

FGSM Attack

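We define a function that creates adversarial examples. It takes three inputs: the original image $x$, the perturbation magnitude $\epsilon$, and data_grad, the gradient of the loss with respect to the input. The function creates the perturbed image as follows: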

$$\text{perturbed\_image} = \text{image} + \text{epsilon} \cdot \operatorname{sign}(\text{data\_grad}) = x + \epsilon \cdot \operatorname{sign}\left(\nabla_{x} J(\theta, \mathbf{x}, y)\right)$$

Finally, to keep the data in its original range, the perturbed image is clipped to $[0, 1]$.

import torch

# FGSM attack code
def fgsm_attack(image, epsilon, data_grad):
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by adjusting each pixel of the input image
    perturbed_image = image + epsilon*sign_data_grad
    # Adding clipping to maintain [0,1] range
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # Return the perturbed image
    return perturbed_image
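As a quick sanity check of the function above, the following sketch applies it to a random tensor standing in for an MNIST image, with another random tensor standing in for the gradient (both are placeholders, not real model outputs). It restates `fgsm_attack` so that the snippet runs on its own, and checks the two properties described above: no pixel moves by more than `epsilon`, and the result stays within $[0, 1]$.

```python
import torch

def fgsm_attack(image, epsilon, data_grad):
    # Same logic as the tutorial function, restated so this snippet runs alone.
    perturbed_image = image + epsilon * data_grad.sign()
    return torch.clamp(perturbed_image, 0, 1)

torch.manual_seed(0)
image = torch.rand(1, 1, 28, 28)       # placeholder "image" with pixels in [0, 1)
data_grad = torch.randn(1, 1, 28, 28)  # placeholder gradient, not from a real model
epsilon = 0.1

perturbed = fgsm_attack(image, epsilon, data_grad)
max_change = (perturbed - image).abs().max().item()

print("max per-pixel change:", max_change)
print("min/max pixel:", perturbed.min().item(), perturbed.max().item())
```

In other words, the perturbation is bounded by `epsilon` in the max (L-infinity) norm, and clipping can only shrink it further for pixels near 0 or 1.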

Testing Function

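We call the test function with different values of $\epsilon$. For each sample that the model originally classifies correctly, the test function adds the perturbation to obtain a perturbed sample and then classifies it again. A few of the samples that are misclassified after perturbation are saved for visualization later.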

import torch.nn.functional as F

def test(model, device, test_loader, epsilon):

    # Accuracy counter
    correct = 0
    adv_examples = []

    # Loop over the test set one sample at a time (batch_size=1)
    for data, target in test_loader:

        data, target = data.to(device), target.to(device)
        # Set requires_grad attribute of tensor. Important for Attack
        data.requires_grad = True

        output = model(data)
        init_pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability

        # If the initial prediction is already wrong, skip the attack
        if init_pred.item() != target.item():
            continue
        
        loss = F.nll_loss(output, target)

        # Zero all existing gradients
        model.zero_grad()

        # Calculate gradients of model in backward pass
        loss.backward()

        # Collect datagrad
        data_grad = data.grad.data

        # Call FGSM Attack
        perturbed_data = fgsm_attack(data, epsilon, data_grad)

        # Re-classify the perturbed image
        output = model(perturbed_data)

        # Check for success
        final_pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability
        if final_pred.item() == target.item():
            correct += 1
            # Special case for saving 0 epsilon examples
            if (epsilon == 0) and (len(adv_examples) < 5):
                adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
                adv_examples.append( (init_pred.item(), final_pred.item(), adv_ex) )
        else:
            # Save some adv examples for visualization later
            if len(adv_examples) < 5:
                adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
                adv_examples.append( (init_pred.item(), final_pred.item(), adv_ex) )

    # Calculate final accuracy for this epsilon
    final_acc = correct/float(len(test_loader))
    print("Epsilon: {}\tTest Accuracy = {} / {} = {}".format(epsilon, correct, len(test_loader), final_acc))

    # Return the accuracy and an adversarial example
    return final_acc, adv_examples
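The key step in `test` is obtaining the gradient of the loss with respect to the input rather than the weights. The sketch below isolates that mechanism using a hypothetical stand-in model (a single linear layer followed by log-softmax, not the tutorial's MNIST network) and a random placeholder input and label.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in classifier: 784 inputs -> 10 log-probabilities.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
model.eval()

data = torch.rand(1, 1, 28, 28)  # placeholder input, batch_size=1
target = torch.tensor([3])       # placeholder label

# Ask autograd to track operations on the input tensor itself.
data.requires_grad = True

output = model(data)
loss = F.nll_loss(output, target)

model.zero_grad()  # clears parameter gradients only; data.grad is not touched
loss.backward()

data_grad = data.grad  # d(loss)/d(input), same shape as the input
print(data_grad.shape)
```

Because `requires_grad` is set on `data` before the forward pass, `loss.backward()` fills in `data.grad` alongside the parameter gradients; this is the tensor that `fgsm_attack` takes the sign of.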

Run Attack

Run the test for each value of $\epsilon$ in epsilons (judging from the output, 0 to 0.3 in steps of 0.05).

accuracies = []
examples = []

# Run test for each epsilon
for eps in epsilons:
    acc, ex = test(model, device, test_loader, eps)
    accuracies.append(acc)
    examples.append(ex)

Out:

Epsilon: 0      Test Accuracy = 9810 / 10000 = 0.981
Epsilon: 0.05   Test Accuracy = 9426 / 10000 = 0.9426
Epsilon: 0.1    Test Accuracy = 8510 / 10000 = 0.851
Epsilon: 0.15   Test Accuracy = 6826 / 10000 = 0.6826
Epsilon: 0.2    Test Accuracy = 4301 / 10000 = 0.4301
Epsilon: 0.25   Test Accuracy = 2082 / 10000 = 0.2082
Epsilon: 0.3    Test Accuracy = 869 / 10000 = 0.0869

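As expected, accuracy decreases as $\epsilon$ increases. Note, however, that the decrease is not linear even though the $\epsilon$ values are linearly spaced: from 0.05 to 0.1 the accuracy falls by about 9 percentage points, while from 0.15 to 0.2 it falls by about 25.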
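The non-linearity is easy to check from the reported numbers. The short sketch below copies the accuracies from the output and computes the drop between consecutive $\epsilon$ values; the `epsilons` list is inferred from that output and is an assumption about how the variable was defined.

```python
# Accuracies copied from the output above; epsilons inferred from the same output.
epsilons = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
accuracies = [0.981, 0.9426, 0.851, 0.6826, 0.4301, 0.2082, 0.0869]

# Drop in accuracy between each pair of consecutive epsilon values.
drops = [round(a - b, 4) for a, b in zip(accuracies, accuracies[1:])]

for eps, drop in zip(epsilons[1:], drops):
    print(f"up to eps={eps}: accuracy drops by {drop}")
```

The per-step drop grows until the step from 0.15 to 0.2 and then shrinks again, simply because there is less accuracy left to lose.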

Sample Adversarial Examples

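Now visualize the samples saved earlier, five for each $\epsilon$. There is no free lunch: as $\epsilon$ increases, test accuracy drops, but the perturbations become easier to perceive. In practice, an attacker has to trade off accuracy degradation against perceptibility.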

Where to go next

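This attack marks only the beginning of research on adversarial attacks, and many ideas for attacking and defending models have followed. In fact, NIPS 2017 hosted an adversarial attack and defense competition, and many of the methods used there are described in the paper Adversarial Attacks and Defences Competition. Work on defense also contributes to making machine learning models more robust in general, against both naturally perturbed inputs and maliciously crafted ones.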

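Another direction is adversarial attacks and defenses in other domains, such as speech and text. But perhaps the best way to learn adversarial machine learning is to get your hands dirty: try implementing a different attack from the NIPS 2017 competition, see how it differs from FGSM, and then try to defend the model against your own attacks.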
