ETH Zürich Surveys the Priors That Drive Bayesian Deep Learning Models
It is well known in the machine learning community that choosing the right prior – an initial belief about an event, expressed as a probability distribution – is crucial for Bayesian inference. Many recent Bayesian deep learning models, however, rely on established but uninformative or weakly informative priors, which can adversely affect their inference capabilities.
In the paper Priors in Bayesian Deep Learning: A Review, a research team from ETH Zürich presents an overview of different priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The team argues that well-chosen priors can deliver desirable theoretical and empirical properties such as improved uncertainty estimation, model selection and optimal decision support, and provides advice on how to choose them.
The core idea of Bayesian modelling is to infer a posterior distribution over a model's parameters by combining a prior with the likelihood of the observed data. This approach can be used to update the probability of a hypothesis as more evidence or information becomes available.
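As a minimal illustration of this updating process (not taken from the paper), consider the textbook conjugate Beta-Bernoulli model: a Beta prior over a coin's heads probability is updated in closed form as flips are observed.

```python
# Minimal illustration (not from the paper): conjugate Bayesian updating.
# Prior belief about a coin's heads probability: Beta(a, b).
# Each observed flip updates the posterior in closed form.

def update(a, b, flips):
    """Beta-Bernoulli update: posterior (a, b) after observing 0/1 flips."""
    heads = sum(flips)
    tails = len(flips) - heads
    return a + heads, b + tails

# Start from a uniform prior Beta(1, 1) and update as evidence arrives.
a, b = 1.0, 1.0
a, b = update(a, b, [1, 1, 0, 1])   # 3 heads, 1 tail -> Beta(4, 2)
posterior_mean = a / (a + b)        # 4 / 6, pulled toward the data
print(posterior_mean)
```

The same prior-times-likelihood logic underlies the deep models below, where the posterior is no longer available in closed form.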
Although choosing the right prior is crucial for Bayesian models, in practice it is often non-trivial to map subjective prior beliefs to tractable probability distributions. Moreover, the asymptotic consistency guarantees of the Bernstein–von Mises theorem have led some researchers to believe that the prior cannot have a detrimental influence on the posterior. As a result, there is a growing tendency in contemporary Bayesian deep learning research to choose seemingly “uninformative” priors such as standard Gaussians.
This theorem does not hold in many applications, however, because its regularity conditions are often not satisfied. Moreover, in the non-asymptotic regime of practical inference, priors do have a strong influence on the posterior. Worse yet, poorly chosen priors can undermine the very properties that motivate researchers to use Bayesian inference in the first place.
Motivated by these ideas, the team argues that it is time to look at prior choices beyond the usual uninformative defaults.
The article first reviews existing prior designs for (deep) Gaussian processes. Gaussian processes (GPs) are nonparametric models that, instead of placing a distribution over the parameters of a parametric function, directly define a distribution over functions. This approach is not only well suited to problems with few observations; it also has the potential to harness the information available in increasingly large data sets. The team describes how to combine GPs with deep neural networks via parametric kernels and neural-network limits, and how to use them to build fully fledged deep models.
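To make the idea of a prior over functions concrete, here is a hedged sketch (illustrative, not from the paper): drawing sample functions from a GP prior with a standard RBF kernel, where the lengthscale and variance hyperparameters shape the prior beliefs about smoothness and amplitude.

```python
import numpy as np

# Illustrative sketch: sampling functions from a GP prior with an RBF kernel.
# The kernel hyperparameters (lengthscale, variance) encode prior beliefs
# about how smooth and how large the unknown function is.

def rbf_kernel(x, y, lengthscale=1.0, variance=1.0):
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(-3.0, 3.0, 50)
K = rbf_kernel(xs, xs) + 1e-8 * np.eye(len(xs))  # jitter for stability

# Each row is one function sampled from the GP prior, evaluated at xs.
samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)
print(samples.shape)  # (3, 50): three function draws over 50 inputs
```

Deep GP and neural-network-limit constructions replace this fixed kernel with learned or composed ones, but the prior-over-functions view is the same.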
The team also examines priors for variational autoencoders (VAEs), Bayesian latent-variable models whose architectures comprise an encoder and a decoder trained to minimize the reconstruction error between the encoded-decoded data and the original data. In such models, observations are generated from unobserved latent variables via a likelihood function. The team surveys a number of suitable distributional VAE priors that can directly replace the standard Gaussian, some structured VAE priors, and a particularly interesting VAE model: the neural process.
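As a hedged sketch of what "replacing the standard Gaussian" can mean in practice (the concrete mixture prior here is an illustrative assumption, not the paper's method): the ELBO needs log p(z) under the prior, and for a non-Gaussian prior this term is typically estimated by Monte Carlo from encoder samples.

```python
import numpy as np

# Illustrative sketch: swapping the standard-Gaussian VAE prior for a
# mixture-of-Gaussians prior. The ELBO's KL term needs log p(z); with a
# non-Gaussian prior it is usually estimated from encoder samples z.

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_std_normal(z):
    # log N(z | 0, I), summed over latent dimensions
    return -0.5 * (z ** 2 + np.log(2 * np.pi)).sum(axis=-1)

def log_mog_prior(z, means, log_weights, sigma=1.0):
    # log sum_k w_k N(z | mu_k, sigma^2 I)
    d = z[:, None, :] - means[None, :, :]
    log_comp = -0.5 * ((d / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2)).sum(-1)
    return logsumexp(log_weights[None, :] + log_comp, axis=1)

zs = np.random.default_rng(0).normal(size=(4, 2))
# With a single component at the origin, the mixture reduces to N(0, I):
print(np.allclose(log_mog_prior(zs, np.zeros((1, 2)), np.array([0.0])),
                  log_std_normal(zs)))  # True
```

Structured priors and neural processes go further, but the mechanics of substituting log p(z) in the objective are the same.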
Regarding priors in Bayesian neural networks, the team argues that standard Gaussian priors on the parameters are insufficient, and that inductive biases should instead be expressed through the choice of architecture. They also review priors defined in weight space and in function space, and explore ways to extend these ideas to ensembles of Bayesian neural networks.
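The weight-space versus function-space distinction can be illustrated with a small sketch (an assumption for illustration, not code from the paper): a Gaussian prior on the weights of an MLP implicitly induces a prior over the functions the network computes, which can be inspected by sampling.

```python
import numpy as np

# Illustrative sketch: a weight-space prior induces a function-space prior.
# Drawing MLP weights from a standard Gaussian and evaluating the network
# yields samples from the induced prior over functions.

rng = np.random.default_rng(1)
xs = np.linspace(-2.0, 2.0, 20)[:, None]   # 20 scalar inputs

def sample_mlp_function(x, hidden=50, prior_std=1.0):
    # One draw of all weights from N(0, prior_std^2) = one function draw.
    W1 = rng.normal(0.0, prior_std, size=(1, hidden))
    b1 = rng.normal(0.0, prior_std, size=hidden)
    W2 = rng.normal(0.0, prior_std, size=(hidden, 1))
    b2 = rng.normal(0.0, prior_std)
    return np.tanh(x @ W1 + b1) @ W2 + b2

draws = np.stack([sample_mlp_function(xs) for _ in range(5)])
print(draws.shape)  # (5, 20, 1): five function draws over 20 inputs
```

Inspecting such draws before seeing any data is one way to judge whether a weight-space prior encodes sensible beliefs about functions.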
The above approaches all assume that prior knowledge is available to encode into Bayesian deep learning models, but what if there is no useful prior knowledge to encode? In this case, the team suggests that researchers can instead rely on a learning-to-learn, or meta-learning, framework, which exploits previously solved tasks related to the current task to learn hyperparameters for most of the priors discussed above (Gaussian processes, variational autoencoders and Bayesian neural networks).
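A hedged, deliberately simplified sketch of this idea (an empirical-Bayes flavour chosen for illustration, not the paper's specific procedure): fit the hyperparameters of a Gaussian prior to parameters that solved related past tasks, then reuse that prior on the next task.

```python
import numpy as np

# Illustrative sketch: learning prior hyperparameters from solved tasks.
# Each row holds the parameters that worked well on one related past task;
# the fitted mean/std define a Gaussian prior for the next task.

past_task_params = np.array([[0.9, -1.1],
                             [1.1, -0.9],
                             [1.0, -1.0]])
prior_mean = past_task_params.mean(axis=0)
prior_std = past_task_params.std(axis=0) + 1e-6  # avoid a degenerate prior
print(prior_mean)  # learned prior centre, here [1.0, -1.0]
```

The same hyperparameter-learning idea applies to GP kernels and VAE priors, with the task-level objective (e.g. marginal likelihood) replacing this simple moment matching.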
The team examines many alternative priors for popular Bayesian deep learning models and demonstrates that useful priors for these models can even be learned from data alone. They hope their study will encourage researchers to choose their priors more carefully and motivate the research community to develop priors better suited to Bayesian deep learning models.
The paper Priors in Bayesian Deep Learning: A Review is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang
We know you don’t want to miss any news or research findings. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.