Verifying Probabilistically Significant Regions

Typically, neural networks are trained under the assumption that the training data is sampled independently from a fixed underlying distribution $\mathcal{D}$ over the input space. Neural networks are trained to perform a specific task, such as regression or classification, and in practical settings these tasks usually have a ground-truth outcome, which facilitates training since the model can be explicitly evaluated against it. However, the training data is a finite, discrete set of points, and is therefore only an approximation of the underlying distribution from which it is assumed to be sampled. Consequently, for a trained neural network there remains uncertainty about how it will perform on points outside the training set.

Methods of formal verification attempt to verify the behaviour of neural networks in regions of the input space. They exploit the architecture and properties of the neural network to evaluate its behaviour not just at individual points but over regions of the input space. Crucially, methods of formal verification can only provide verification results for regions of the input space for which the ground truth is known. For the task of image classification, for instance, this may translate to a small $L_2$ $\epsilon$-neighbourhood around each training sample. A verification result of this form provides a quantitative measure of the absolute robustness of the neural network. I use the term absolute robustness because the $\epsilon$-neighbourhood has no relation to the underlying distribution $\mathcal{D}$.

Contextualised robustness, on the other hand, refers to results that are in some way related to the domain in which the neural network is deployed. For instance, contextualised robustness for an image classifier may mean a verification result regarding the brightness of an image. In the language domain, contextualised robustness may refer to the model's sensitivity to inputs with different but semantically similar words. What these examples show is that there are multiple ways a neural network can be verified contextually. I therefore go further and distinguish between partial contextual robustness and full contextual robustness. The previous examples are what I consider partial contextual robustness results, whereas I regard a full contextual robustness result as one that is related to the underlying distribution $\mathcal{D}$. Partial contextual robustness results are useful when one is interested in verifying behaviour for a specific application, but it is difficult to conclude full contextual robustness from them, since it is difficult to enumerate all the possible ways to contextually verify a model. Full contextual robustness results, however, are difficult to achieve for the simple reason that $\mathcal{D}$ is unknown; hence, we may only be able to achieve partial contextual robustness results.
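To make the distinction concrete, a typical absolute robustness query for a classifier $f$ and a labelled training sample $(x_0, y_0)$ can be written as follows (a standard formulation; the choice of the $L_2$ norm and the radius $\epsilon$ is purely illustrative):

$$\forall x \in \mathbb{R}^n: \quad \|x - x_0\|_2 \le \epsilon \;\Longrightarrow\; \arg\max_{c} f_c(x) = y_0,$$

where $f_c(x)$ denotes the score the network assigns to class $c$ for input $x$. Nothing in this condition refers to $\mathcal{D}$, which is precisely why a guarantee of this form is absolute rather than contextual.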

Despite the challenge of obtaining full contextual robustness results, I think there are methods for achieving them, and that pursuing them can shift our perspective on how we perform neural network verification. Specifically, I suggest that instead of evaluating verification methods by the absolute size of the verified regions, say by noting the radius of the $\epsilon$-neighbourhood, we should evaluate them by the probability mass of the region under the distribution $\mathcal{D}$. Concretely, a neural network that has full contextual robustness would be verified in a region with high probability mass. Immediately, this motivates a different approach to the verification problem.
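As a minimal sketch of what this shift means formally (assuming, for illustration, that $\mathcal{D}$ admits a density $p_{\mathcal{D}}$), suppose a verification method certifies correct behaviour on a region $R$ of the input space. The quantity I propose to evaluate is then not the volume or radius of $R$ but its probability mass,

$$\mu(R) \;=\; \Pr_{x \sim \mathcal{D}}[x \in R] \;=\; \int_R p_{\mathcal{D}}(x)\, \mathrm{d}x,$$

and a network with full contextual robustness would be one verified on a region with $\mu(R)$ close to one.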

An important consequence of thinking probabilistically is that single points in the input space have no probability mass. Therefore, from our perspective they do not affect the contextual robustness of neural networks. Of course this sits uneasily with current methods of formal verification, which are designed precisely to guarantee that no such adversarial points exist within a region. In a more extreme scenario, a region could contain countably many individual points that are adversarial to our neural network, yet from our probabilistic perspective they still do not affect its contextual robustness. On the one hand, this may be an uncomfortable result for those wanting to guarantee the safety and performance of neural networks. On the other hand, someone may be satisfied with these conclusions, understanding that encountering such samples in reality is extremely unlikely.
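The measure-theoretic fact behind this observation is worth stating explicitly. Assuming $\mathcal{D}$ is a continuous distribution, so that individual points carry no mass, any countable set of adversarial points $A = \{x_1, x_2, \dots\}$ satisfies

$$\Pr_{x \sim \mathcal{D}}[x \in A] \;=\; \sum_{i} \Pr_{x \sim \mathcal{D}}[x = x_i] \;=\; 0,$$

so even countably many adversarial points leave the contextual robustness of the network unchanged.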

There has already been work on developing probabilistic robustness verification results, which state that for points sampled uniformly within a region there is some (low) probability that they are adversarial. Relaxing the verification condition in this way leads to the verification of much larger (in absolute terms) regions of the input space, since individual points, having no probability mass, cannot influence the verification result. However, these probabilistic verification results are not contextual from our perspective, since the probabilities are generated using uniform sampling within the region. To be contextual from our perspective, the points would need to be sampled according to the underlying distribution $\mathcal{D}$. The reason points are sampled uniformly in these previous works is that we do not have access to $\mathcal{D}$, and so in practice it cannot be used to form these verification results. Moreover, we typically do not have enough distinct training samples within a region to accurately approximate $\mathcal{D}$ there. Therefore, there is still work to be done to develop methods of verification that provide full contextual probabilistic robustness.
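As an illustration of the uniform-sampling notion of probabilistic robustness described above, the sketch below estimates, by Monte Carlo sampling, the probability that a classifier's prediction changes for inputs drawn uniformly from an $\epsilon$-box around a point. The toy classifier, the box-shaped region, and all parameter values are illustrative assumptions rather than any particular method from the literature.

```python
import numpy as np

def estimate_violation_probability(classify, x0, y0, eps, n_samples=10_000, rng=None):
    """Monte Carlo estimate of P[classify(x) != y0] for x sampled uniformly
    from the axis-aligned eps-box around x0 (uniform sampling in an L2 ball
    would require a slightly different sampler)."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw uniform perturbations within the box and evaluate the classifier.
    perturbations = rng.uniform(-eps, eps, size=(n_samples, x0.shape[0]))
    predictions = np.array([classify(x0 + d) for d in perturbations])
    return float(np.mean(predictions != y0))

# Toy linear "network", purely for illustration.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
classify = lambda x: int(w @ x > 0.0)

x0 = rng.normal(size=8)
print(estimate_violation_probability(classify, x0, classify(x0), eps=0.05, rng=rng))
```

Turning such an estimate into a formal guarantee requires bounding the estimation error, for example with a concentration inequality on the number of samples; and replacing the uniform sampler with samples from $\mathcal{D}$ is precisely what is missing for a full contextual result.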