IBM Adversarial Robustness Toolbox Helps to Protect Neural Networks Against Malicious Attacks

The open source framework provides best practices to defend deep neural networks against adversarial threats.

Jesus Rodriguez

Mar 9 · 6 min read

Deep neural networks (DNNs) are rapidly achieving a level of sophistication that is dazzling even the most optimistic technologists in the space. However, with that sophistication has come a lack of interpretability of DNN models, and with that come risks. If we can’t understand what a DNN is doing, how can we possibly protect it against potential vulnerabilities? Not surprisingly, sophisticated DNNs have proven to be extremely vulnerable to simple manipulations of their models and training datasets. Generative adversarial networks (GANs) have emerged as one of the fundamental techniques for attacking DNNs. The rise of GAN-based attacks has forced machine learning specialists to regularly evaluate the robustness of DNN models. IBM has been one of the most active companies in this area and, about a year ago, compiled some of its findings into an open source framework known as the Adversarial Robustness Toolbox (ART).

Generative adversarial networks (GANs) are one of the most active areas of research in the deep learning ecosystem. Conceptually, GANs are a form of unsupervised learning in which two neural networks build knowledge by competing against each other in a zero-sum game. While GANs are a great mechanism for knowledge acquisition, they can also be used to generate attacks against deep neural networks. In a well-known example, an attacker can introduce imperceptible changes into training images to trick a classification model.
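To make the "imperceptible changes" idea concrete, here is a minimal sketch of an FGSM-style perturbation against a toy linear classifier. It is not a GAN and not any specific IBM attack; the model, weights, and epsilon are all illustrative assumptions:

```python
import numpy as np

# Toy linear "image classifier": class 1 if w @ x + b > 0.
# This is a sketch of the general idea that a tiny, well-aimed
# perturbation can flip a model's decision, not a real attack.
rng = np.random.default_rng(0)
w = rng.normal(size=784)   # weights for a flattened 28x28 "image"
b = 0.0
x = rng.normal(size=784)   # stand-in input

def predict(x):
    return int(w @ x + b > 0)

original = predict(x)

# FGSM-style step: move against the current class along the sign of
# the score's gradient, bounded per pixel by a small epsilon.
epsilon = 0.1
direction = -1.0 if original == 1 else 1.0
x_adv = x + direction * epsilon * np.sign(w)

adversarial = predict(x_adv)
print(original, adversarial, np.max(np.abs(x_adv - x)))
```

Each pixel changes by at most 0.1, yet the aligned perturbation shifts the score enough to flip the prediction.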

The topic of evaluating the robustness of models against adversarial attacks has been a top priority of AI powerhouses such as OpenAI and Google. A bit under the radar, IBM has been doing a lot of work to advance the research and implementation of defenses against adversarial attacks on deep neural networks. Just last week, IBM AI researchers published two different research papers in the area of GAN protection. Today, I would like to explore some of IBM’s recent work on protecting neural networks against adversarial attacks and discuss its relevance to modern deep learning implementations.

Black-box adversarial attacks describe scenarios in which the attacker does not have complete access to the policy network. In the AI research literature, black-box attacks are classified into two main groups:

1) The adversary has access to the training environment and knowledge of the training algorithm and hyperparameters. It knows the neural network architecture of the target policy network, but not its random initialization. Researchers refer to this model as transferability across policies.

2) The adversary additionally has no knowledge of the training algorithm or hyperparameters. Researchers refer to this model as transferability across algorithms.

A simpler way to think about white-box and black-box adversarial attacks is whether the attacker is targeting a model during training or after it is deployed. Despite that simple distinction, the techniques used to defend against white-box and black-box attacks are fundamentally different. Recently, IBM has been dabbling in both attack models, from both the research and implementation standpoints. Let’s take a look at some of IBM’s recent efforts in adversarial attacks.

The Adversarial Robustness Toolbox (ART) operates by examining and clustering the neural activations produced by a training dataset, trying to discriminate legitimate examples from those likely manipulated by an adversarial attack. The current version of ART focuses on two types of adversarial attacks: evasion and poisoning. For each type of adversarial attack, ART includes defense methods that can be incorporated into deep learning models.

Developers can start using ART via its Python SDK, which doesn’t require any major modifications to the architecture of the deep neural network.
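The activation-clustering idea behind ART's poisoning defense can be sketched without the library itself. The snippet below is a simplified, numpy-only illustration (not ART's actual API); the simulated activations and the 2-means clustering are assumptions standing in for a real model's penultimate-layer outputs and ART's clustering step:

```python
import numpy as np

# Intuition: poisoned (backdoored) training examples tend to produce
# activations that form their own small cluster, separate from the
# bulk of legitimate examples of the same class.
rng = np.random.default_rng(1)

legit = rng.normal(loc=0.0, scale=1.0, size=(95, 8))   # legitimate examples
poison = rng.normal(loc=6.0, scale=1.0, size=(5, 8))   # simulated backdoors
acts = np.vstack([legit, poison])

def two_means(X, iters=20):
    """Plain 2-means clustering in numpy (stand-in for ART's clustering)."""
    centers = X[[0, -1]].copy()            # seed from opposite ends
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

labels = two_means(acts)
sizes = np.bincount(labels, minlength=2)
# Heuristic: flag the markedly smaller cluster as likely poisoned.
flagged = np.where(labels == sizes.argmin())[0]
print(sizes, flagged)
```

Here the five outlying examples end up isolated in the small cluster, which is the signal the defense looks for.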

IBM’s Autoencoder-based Zeroth Order Optimization Method (AutoZOOM) is a technique for creating more efficient black-box attacks. Initially published as a research paper, AutoZOOM also includes an open source implementation that developers can use across several deep learning frameworks. The goal of AutoZOOM is to improve the query efficiency of generating adversarial examples, and it accomplishes that using two main building blocks:

i. An adaptive random gradient estimation strategy to balance query counts and distortion.

ii. An autoencoder that is either trained offline with unlabeled data or a bilinear resizing operation for acceleration.

To achieve (i), AutoZOOM features an optimized, query-efficient gradient estimator with an adaptive scheme that uses few queries to find the first successful adversarial perturbation and then uses more queries to fine-tune the distortion and make the adversarial example more realistic. To achieve (ii), AutoZOOM implements a technique called “dimension reduction” to reduce the complexity of finding adversarial examples. The dimension reduction can be realized by an offline-trained autoencoder that captures data characteristics, or by a simple bilinear image resizer that does not require any training.
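The core of building block (i) is zeroth-order gradient estimation: approximating a gradient using only function queries. The following is a simplified sketch of that idea, not AutoZOOM's exact estimator; the quadratic loss, query budget, and smoothing parameter are illustrative assumptions:

```python
import numpy as np

# Estimate the gradient of a "black-box" loss f using only function
# evaluations, by averaging finite differences along random unit
# directions (the basic zeroth-order estimation idea).
rng = np.random.default_rng(2)

def f(x):
    # Stand-in black-box loss; a real attack would query the target model.
    return float(np.sum(x ** 2))

def zo_gradient(f, x, queries=2000, beta=1e-3):
    d = x.size
    g = np.zeros(d)
    fx = f(x)
    for _ in range(queries):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                 # random unit direction
        g += (f(x + beta * u) - fx) / beta * u
    return (d / queries) * g                   # scale to make it unbiased

x = rng.normal(size=10)
est = zo_gradient(f, x)
true = 2 * x                                   # known gradient of sum(x^2)
rel_err = np.linalg.norm(est - true) / np.linalg.norm(true)
print(rel_err)
```

The estimator's accuracy is governed by the query count, which is exactly why AutoZOOM's adaptive scheme and dimension reduction matter: fewer effective dimensions means fewer queries for the same estimation quality.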

The initial tests of AutoZOOM showed that the method is able to generate black-box adversarial examples with far fewer queries than traditional methods.

CNN-Cert is IBM’s method for certifying the adversarial robustness of convolutional neural networks. Its key innovation is deriving explicit bounds on the network output by considering the input/output relations of each building block, where the activation layers can use general activations other than ReLU. The approach has been demonstrated to be about 11 to 17 times more efficient than traditional adversarial robustness certification methods. CNN-Cert is able to handle various architectures including convolutional layers, max-pooling layers, batch normalization layers and residual blocks, as well as general activation functions such as ReLU, tanh, sigmoid and arctan.

As you can see, IBM seems to be firmly committed to advancing the conversation about adversarial attacks on deep neural networks. Efforts like ART, AutoZOOM and CNN-Cert are among the most creative recent efforts in adversarial techniques. Hopefully, we will see some of these implementations included in mainstream deep learning frameworks soon.
