Masksembles for Uncertainty Estimation

Abstract

Deep neural networks have amply demonstrated their prowess, but estimating the reliability of their predictions remains challenging. Deep Ensembles are widely considered to be one of the best methods for generating uncertainty estimates, but they are very expensive to train and evaluate. MC-Dropout is a popular alternative that is less expensive but also less reliable. Our central intuition is that there is a continuous spectrum of ensemble-like models of which MC-Dropout and Deep Ensembles are extreme examples. The first uses an effectively infinite number of highly correlated models, while the second relies on a finite number of independent models.

To combine the benefits of both, we introduce Masksembles. Instead of randomly dropping parts of the network as in MC-Dropout, Masksembles relies on a fixed number of binary masks, which are parameterized in a way that allows one to change the correlations between individual models. Namely, by controlling the overlap between the masks and their density, one can choose the optimal configuration for the task at hand. This leads to a simple and easy-to-implement method with performance on par with Ensembles at a fraction of the cost. We experimentally validate Masksembles on two widely used datasets, CIFAR10 and ImageNet.

Masksembles layer

The general idea of the Masksembles layer is to pre-generate a set of binary masks before training a network and to use them to drop out the network's weights in a controllable manner during training and inference.
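A minimal NumPy sketch of this idea follows, under stated assumptions. The helper names `make_fixed_masks` and `apply_masks` are hypothetical. The masks here are drawn independently at random with a fixed seed, which is a stand-in and not the mask generation scheme of the paper. The sketch masks activations rather than weights, and it assigns one mask to each equal-sized slice of the batch.

```python
import numpy as np

def make_fixed_masks(n_masks, n_features, keep_prob, seed=0):
    """Draw the binary masks once, before training (hypothetical helper).

    Assumption: independent Bernoulli masks stand in for the paper's
    mask generation procedure, which is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    return (rng.random((n_masks, n_features)) < keep_prob).astype(np.float32)

def apply_masks(x, masks):
    """Mask each slice of the batch with its own fixed mask.

    x:     (batch, n_features), batch divisible by n_masks
    masks: (n_masks, n_features)
    """
    n_masks, n_features = masks.shape
    batch = x.shape[0]
    if batch % n_masks != 0:
        raise ValueError("batch size must be divisible by the number of masks")
    # Group consecutive rows: group i is processed with masks[i].
    grouped = x.reshape(n_masks, batch // n_masks, n_features)
    return (grouped * masks[:, None, :]).reshape(batch, n_features)

# The same masks are reused at every training step and at inference.
masks = make_fixed_masks(n_masks=4, n_features=8, keep_prob=0.5)
x = np.ones((8, 8), dtype=np.float32)
y = apply_masks(x, masks)
```

Unlike Dropout, nothing is resampled between calls: the set of submodels is fixed by `masks`.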

In this way, Masksembles offers a number of configurable parameters that allow one to span the whole spectrum of methods between the Single Model, MC-Dropout and Ensembles approaches.

Masksembles transformation from Single Model to Ensembles

The Masksembles layer is designed as a simple drop-in replacement for other popular neural network layers (such as Dropout), so it is easy to use and to integrate into existing models and architectures.

An efficient implementation (similar to [link]) of Masksembles induces no additional memory or computational overhead and provides out-of-the-box parallelization of submodel inference.
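To illustrate how all submodels can be evaluated in one batched pass, here is a small sketch. It assumes a single masked linear layer followed by a softmax as a toy stand-in for a real network. The function `predict_with_all_masks` and its arguments are hypothetical and do not reflect the referenced implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_with_all_masks(x, masks, w):
    """Evaluate every submodel on every input with one batched product.

    x:     (batch, n_features)
    masks: (n_masks, n_features), fixed binary masks
    w:     (n_features, n_classes), toy weights (assumption)
    """
    n_masks = masks.shape[0]
    # View the batch once per mask: (n_masks, batch, n_features).
    tiled = np.broadcast_to(x, (n_masks,) + x.shape)
    masked = tiled * masks[:, None, :]
    probs = softmax(masked @ w)           # (n_masks, batch, n_classes)
    return probs.mean(axis=0), probs      # averaged prediction, per-submodel predictions
```

The per-submodel outputs are kept alongside the average because the disagreement between them is what carries the uncertainty signal.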

Model Transitions

The controllable correlation between Masksembles submodels creates a trade-off between the quality of the generated uncertainty, the accuracy, and the model's computational overhead. Tuning these parameters yields configurations that satisfy the computational and performance constraints of each particular task.
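One simple way to quantify how strongly a set of masks overlaps is the mean pairwise intersection-over-union (IoU). The sketch below uses it as an illustrative proxy for submodel correlation. The function name and the choice of IoU are assumptions made for this example.

```python
import numpy as np

def mean_pairwise_iou(masks):
    """Average IoU over all pairs of masks (needs at least two masks)."""
    masks = np.asarray(masks, dtype=bool)
    ious = []
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            union = np.logical_or(masks[i], masks[j]).sum()
            inter = np.logical_and(masks[i], masks[j]).sum()
            # Two empty masks are treated as identical.
            ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))

# All masks identical and dense: every submodel is the same network.
shared = np.ones((3, 6), dtype=bool)
# Masks with no common features: submodels share nothing in this layer.
disjoint = np.repeat(np.eye(3, dtype=bool), 2, axis=1)

iou_shared = mean_pairwise_iou(shared)
iou_disjoint = mean_pairwise_iou(disjoint)
```

The first configuration sits at the Single Model end of the spectrum and the second at the Ensembles end. Intermediate overlaps fall in between.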

Example 1: binary classification task (red vs. blue). The background color depicts the entropy assigned by different models to individual points on the grid, low in violet and high in yellow. Varying the Masksembles parameters (Middle) creates a transition from Single Model to Ensembles.

Single Model

Deep Ensembles
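For reference, the entropy shown in such plots can be computed as in the sketch below. It assumes that the entropy is taken over the prediction averaged across submodels. This is a common choice, but the page does not state it explicitly.

```python
import numpy as np

def predictive_entropy(member_probs, eps=1e-12):
    """Entropy of the averaged prediction at each point.

    member_probs: (n_members, n_points, n_classes), class probabilities
                  produced by each submodel (or ensemble member).
    """
    mean_probs = np.mean(member_probs, axis=0)
    # eps guards against log(0) for classes with zero probability.
    return -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)

# Two members, two grid points, two classes.
member_probs = np.array([
    [[0.9, 0.1], [0.99, 0.01]],   # member 1
    [[0.1, 0.9], [0.99, 0.01]],   # member 2
])
entropy = predictive_entropy(member_probs)
```

At the first point the members disagree, so the averaged prediction is uniform and the entropy is at its maximum for two classes. At the second point they agree confidently and the entropy is close to zero.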

Example 2: uncertainty quality evaluated on corrupted versions of the CIFAR and ImageNet datasets. The x-axis is the severity of the distortions and the y-axis is the Expected Calibration Error (ECE) of classification. Lower is better.
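As a reminder of what the y-axis measures, here is a sketch of the standard binned ECE estimate: predictions are grouped by confidence, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The use of 10 equal-width bins is an assumption, since the binning used for the plots is not given here.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE estimate.

    confidences: predicted probability of the chosen class, in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each confidence to a bin index in 0..n_bins-1.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
    return ece

# Overconfident model: 95% confidence but only 75% accuracy.
ece_over = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 1, 1, 0])
# Calibrated model: 75% confidence and 75% accuracy.
ece_cal = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
```

A well-calibrated model has confidence that matches its accuracy in every bin, which drives the estimate toward zero.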