DeepMind is funding research at the intersection of deep and probabilistic machine learning in the OxCSML group. The students and faculty supported by these funds are:

C. J. Maddison, D. Lawson, G. Tucker, N. Heess, M. Norouzi, A. Mnih, A. Doucet, Y. W. Teh, Filtering Variational Objectives, in Advances in Neural Information Processing Systems (NeurIPS), 2017.

The evidence lower bound (ELBO) appears in many algorithms for maximum likelihood estimation (MLE) with latent variables because it is a sharp lower bound of the marginal log-likelihood. For neural latent variable models, optimizing the ELBO jointly in the variational posterior and model parameters produces state-of-the-art results. Inspired by the success of the ELBO as a surrogate MLE objective, we consider the extension of the ELBO to a family of lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We show that the tightness of such bounds is asymptotically related to the variance of the underlying estimator. We introduce a special case, the filtering variational objectives (FIVOs), which takes the same arguments as the ELBO and passes them through a particle filter to form a tighter bound. FIVOs can be optimized tractably with stochastic gradients, and are particularly suited to MLE in sequential latent variable models. In standard sequential generative modeling tasks we present uniform improvements over models trained with ELBO, including some whole nat-per-timestep improvements.
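
As a rough illustration of the idea in this abstract, the sketch below shows how a particle filter's estimate of the marginal likelihood gives a stochastic lower bound on the marginal log-likelihood. It is not the authors' implementation. It assumes a toy linear-Gaussian state-space model and a bootstrap proposal (particles are propagated through the model's own transition), and the function names, the coefficient `a`, and the observations in the usage note are all hypothetical. With a single particle, each run reduces to a one-sample ELBO under that proposal.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def particle_filter_log_evidence(xs, num_particles, rng, a=0.9):
    """Return one sample of log p_hat(x_1..x_T) from a bootstrap particle filter.

    Assumed toy model (hypothetical, for illustration only):
        z_1 ~ N(0, 1),  z_t | z_{t-1} ~ N(a * z_{t-1}, 1),  x_t | z_t ~ N(z_t, 1).
    Because p_hat is an unbiased estimator of the marginal likelihood, the
    expectation of the returned value is a lower bound on log p(x_1..x_T).
    """
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    log_evidence = 0.0
    for t, x in enumerate(xs):
        if t > 0:
            # Propagate the resampled particles through the transition model.
            particles = [rng.gauss(a * z, 1.0) for z in particles]
        # Incremental log-weights are the observation log-likelihoods.
        log_w = [log_normal_pdf(x, z, 1.0) for z in particles]
        # Accumulate the log of the average incremental weight.
        log_evidence += logsumexp(log_w) - math.log(num_particles)
        # Multinomial resampling in proportion to the weights.
        m = max(log_w)
        weights = [math.exp(lw - m) for lw in log_w]
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return log_evidence
```

Averaging the returned value over many runs, for example `particle_filter_log_evidence([0.5, -0.3, 1.2, 0.8, -0.5], 32, random.Random(0))`, estimates the bound for that number of particles. In this toy setting the average would be expected to rise as particles are added.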

@inproceedings{MadLawTuc2017b,
author = {Maddison, C. J. and Lawson, D. and Tucker, G. and Heess, N. and Norouzi, M. and Mnih, A. and Doucet, A. and Teh, Y. W.},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
title = {Filtering Variational Objectives},
year = {2017},
month = dec,
bdsk-url-1 = {https://arxiv.org/pdf/1705.09279v1.pdf}
}

C. J. Maddison, A. Mnih, Y. W. Teh, The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, in International Conference on Learning Representations (ICLR), 2017.

The reparameterization trick enables optimizing large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low variance unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack continuous reparameterizations due to the discontinuous nature of discrete states. In this work we introduce concrete random variables – continuous relaxations of discrete random variables. The concrete distribution is a new family of distributions with closed form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation graph can be refactored into a one-hot bit representation that is treated continuously, concrete stochastic nodes can be used with automatic differentiation to produce low-variance biased gradients of objectives (including objectives that depend on the log-probability of latent stochastic nodes) on the corresponding discrete graph. We demonstrate effectiveness of concrete relaxations on density estimation and structured prediction tasks using neural networks.
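
The reparameterization described in this abstract can be sketched as follows: add Gumbel noise to the log-parameters, divide by a temperature, and apply a softmax. This is a minimal standard-library sketch, not the authors' code. The function names, the clamping constant, and the example parameters in the usage note are assumptions made for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample as -log(-log(U)), U ~ Uniform(0, 1)."""
    # Clamp U away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def sample_concrete(log_alpha, temperature, rng):
    """Draw one relaxed one-hot vector on the probability simplex.

    log_alpha:   unnormalized log-probabilities of the discrete states.
    temperature: positive scalar; smaller values push samples toward one-hot.
    """
    scores = [(la + sample_gumbel(rng)) / temperature for la in log_alpha]
    # Softmax with max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `sample_concrete([math.log(1.0), math.log(2.0), math.log(7.0)], 0.5, random.Random(0))` returns three non-negative numbers that sum to one. The sample is a deterministic, differentiable function of `log_alpha` given the noise, which is what allows gradients to pass through it in an automatic differentiation framework.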

@inproceedings{maddison2016concrete,
author = {Maddison, Chris J. and Mnih, Andriy and Teh, Yee Whye},
booktitle = {International Conference on Learning Representations (ICLR)},
note = {ArXiv e-prints: 1611.00712},
title = {{The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables}},
year = {2017},
bdsk-url-1 = {https://arxiv.org/pdf/1611.00712.pdf}
}

C. J. Maddison, D. Lawson, G. Tucker, N. Heess, M. Norouzi, A. Mnih, A. Doucet, Y. W. Teh, Particle Value Functions, in ICLR 2017 Workshop Proceedings, 2017.

The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example. This risk-sensitive value function is not always applicable to reinforcement learning problems, so we introduce the particle value function defined by a particle filter over the distributions of an agent’s experience, which bounds the risk-sensitive one. We illustrate the benefit of the policy gradients of this objective in Cliffworld.
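
To make the effect of an exponential utility concrete, the sketch below evaluates a risk-sensitive value for a return distribution with finitely many outcomes. The abstract does not state the formula, so the standard exponential-utility form (1/β) log E[exp(βR)] is assumed here. The function name and the toy distribution in the usage note are hypothetical.

```python
import math


def risk_sensitive_value(returns, probs, beta):
    """Exponential-utility value (1/beta) * log E[exp(beta * R)].

    returns, probs: outcomes of the return R and their probabilities.
    beta: risk parameter; beta > 0 emphasizes high returns, beta < 0
          emphasizes low returns, and beta == 0 gives the expected return.
    """
    if beta == 0.0:
        return sum(p * r for r, p in zip(returns, probs))
    # Work in log space so that large |beta * r| does not overflow.
    terms = [math.log(p) + beta * r for r, p in zip(returns, probs) if p > 0.0]
    m = max(terms)
    return (m + math.log(sum(math.exp(t - m) for t in terms))) / beta
```

With a rare large reward, say `returns=[0.0, 100.0]` and `probs=[0.99, 0.01]`, the expected return is 1. A positive `beta` gives a value well above that and a negative `beta` gives a value below it, which is the sense in which the objective emphasizes high or low returns regardless of their probability.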

@inproceedings{MadLawTuc2017a,
author = {Maddison, C. J. and Lawson, D. and Tucker, G. and Heess, N. and Norouzi, M. and Mnih, A. and Doucet, A. and Teh, Y. W.},
booktitle = {ICLR 2017 Workshop Proceedings},
note = {ArXiv e-prints: 1703.05820},
title = {Particle Value Functions},
year = {2017},
bdsk-url-1 = {https://arxiv.org/pdf/1703.05820.pdf}
}