OxCSML : People : Sebastian Vollmer

Publications

2017

T. Nagapetyan, A. B. Duncan, L. Hasenclever, S. J. Vollmer, L. Szpruch, K. Zygalakis, The True Cost of Stochastic Gradient Langevin Dynamics, Jun-2017.
The problem of posterior inference is central to Bayesian statistics and a wealth of Markov Chain Monte Carlo (MCMC) methods have been proposed to obtain asymptotically correct samples from the posterior. As datasets in applications grow larger and larger, scalability has emerged as a central problem for MCMC methods. Stochastic Gradient Langevin Dynamics (SGLD) and related stochastic gradient Markov Chain Monte Carlo methods offer scalability by using stochastic gradients in each step of the simulated dynamics. While these methods are asymptotically unbiased if the stepsizes are reduced in an appropriate fashion, in practice constant stepsizes are used. This introduces a bias that is often ignored. In this paper we study the mean squared error of Lipschitz functionals in strongly log-concave models with i.i.d. data of growing data set size and show that, given a batchsize, to control the bias of SGLD the stepsize has to be chosen so small that the computational cost of reaching a target accuracy is roughly the same for all batchsizes. Using a control variate approach, the cost can be reduced dramatically. The analysis is performed by considering the algorithms as noisy discretisations of the Langevin SDE which correspond to the Euler method if the full data set is used. An important observation is that the scale of the step size is determined by the stability criterion if the accuracy is required for consistent credible intervals. Experimental results confirm our theoretical findings.
```
@unpublished{Nagapetyan2017,
  month = jun,
  author = {Nagapetyan, T. and Duncan, A. B. and Hasenclever, L. and Vollmer, S. J. and Szpruch, L. and Zygalakis, K.},
  eprint = {1706.02692},
  title = {{The True Cost of Stochastic Gradient Langevin Dynamics}},
  year = {2017}
}
```
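The control variate idea mentioned in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a one-dimensional model with a N(0, 1) prior and N(theta, 1) likelihood, and all function names (`plain_grad`, `cv_grad`, `compare_estimators`) are hypothetical. The control-variate estimator recentres the minibatch gradient around a fixed reference point `theta_hat` whose full-data gradient is computed once.

```python
import random
import statistics

def full_grad(theta, data):
    """Exact log-posterior gradient: N(0, 1) prior, N(theta, 1) likelihood."""
    return -theta + sum(x - theta for x in data)

def plain_grad(theta, data, batch):
    """Standard stochastic gradient: rescaled minibatch likelihood gradient."""
    scale = len(data) / len(batch)
    return -theta + scale * sum(data[i] - theta for i in batch)

def cv_grad(theta, data, batch, theta_hat, lik_grad_hat):
    """Control-variate estimator centred at a fixed reference point theta_hat.

    lik_grad_hat is the full-data likelihood gradient at theta_hat,
    computed once before sampling starts.
    """
    scale = len(data) / len(batch)
    correction = sum((data[i] - theta) - (data[i] - theta_hat) for i in batch)
    return -theta + lik_grad_hat + scale * correction

def compare_estimators(n_data=1000, batch_size=10, n_rep=200, seed=0):
    """Return (exact gradient, plain estimates, control-variate estimates)."""
    rng = random.Random(seed)
    data = [rng.gauss(1.0, 1.0) for _ in range(n_data)]
    theta_hat = sum(data) / (n_data + 1)      # posterior mode of this toy model
    lik_grad_hat = sum(x - theta_hat for x in data)
    theta = theta_hat + 0.1                   # evaluation point near the mode
    plain, cv = [], []
    for _ in range(n_rep):
        batch = rng.sample(range(n_data), batch_size)
        plain.append(plain_grad(theta, data, batch))
        cv.append(cv_grad(theta, data, batch, theta_hat, lik_grad_hat))
    return full_grad(theta, data), plain, cv

exact, plain, cv = compare_estimators()
print("exact gradient:", exact)
print("plain estimator variance:", statistics.pvariance(plain))
print("control-variate estimator variance:", statistics.pvariance(cv))
```

In this toy model the log-likelihood gradient is linear in theta, so the control variate removes the minibatch noise entirely. That is an extreme case; for non-Gaussian models the variance would be reduced near `theta_hat` rather than eliminated.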
X. Lu, V. Perrone, L. Hasenclever, Y. W. Teh, S. J. Vollmer, Relativistic Monte Carlo, in Artificial Intelligence and Statistics (AISTATS), 2017.
Project: bigbayes

Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune well. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduce a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.
```
@inproceedings{LuPerHas2016a,
  author = {Lu, X. and Perrone, V. and Hasenclever, L. and Teh, Y. W. and Vollmer, S. J.},
  booktitle = {Artificial Intelligence and Statistics (AISTATS)},
  title = {Relativistic {M}onte {C}arlo},
  month = apr,
  year = {2017},
  bdsk-url-1 = {https://arxiv.org/pdf/1609.04388v1.pdf}
}
```
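The maximum-velocity property described in the Relativistic Monte Carlo abstract can be sketched in one dimension. This is an illustrative sketch only: it assumes the relativistic kinetic energy K(p) = m c^2 sqrt(p^2 / (m^2 c^2) + 1), uses a plain leapfrog integrator, and omits momentum resampling and the Metropolis-Hastings correction. The function names and the quadratic potential are hypothetical choices made for the example.

```python
import math

def kinetic(p, mass=1.0, c=1.0):
    """Relativistic kinetic energy K(p) = mass * c^2 * sqrt(p^2 / (mass^2 c^2) + 1)."""
    return mass * c * c * math.sqrt(p * p / (mass * mass * c * c) + 1.0)

def velocity(p, mass=1.0, c=1.0):
    """dK/dp; its magnitude is bounded by c however large the momentum is."""
    return p / (mass * math.sqrt(p * p / (mass * mass * c * c) + 1.0))

def leapfrog(theta, p, grad_u, eps, n_steps, mass=1.0, c=1.0):
    """Leapfrog integration of H(theta, p) = U(theta) + K(p)."""
    p -= 0.5 * eps * grad_u(theta)            # initial half step in momentum
    for step in range(n_steps):
        theta += eps * velocity(p, mass, c)   # position moves by at most eps * c
        if step < n_steps - 1:
            p -= eps * grad_u(theta)          # full momentum step between positions
    p -= 0.5 * eps * grad_u(theta)            # final half step in momentum
    return theta, p

def potential(theta):
    """Hypothetical target: standard Gaussian, U(theta) = theta^2 / 2."""
    return 0.5 * theta * theta

def grad_potential(theta):
    return theta

theta0, p0 = 1.0, 0.5
theta1, p1 = leapfrog(theta0, p0, grad_potential, eps=0.05, n_steps=50)
print("energy before:", potential(theta0) + kinetic(p0))
print("energy after: ", potential(theta1) + kinetic(p1))
print("velocity at p = 100:", velocity(100.0))
```

Because each position update has magnitude at most `eps * c`, a very large gradient cannot produce an arbitrarily large move, which is the link to gradient clipping that the abstract refers to.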

2016

Y. W. Teh, A. H. Thiéry, S. J. Vollmer, Consistency and Fluctuations for Stochastic Gradient Langevin Dynamics, Journal of Machine Learning Research, 2016.
Project: sgmcmc

Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally expensive. Both the calculation of the acceptance probability and the creation of informed proposals usually require an iteration through the whole data set. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem by generating proposals which are only based on a subset of the data, by skipping the accept-reject step and by using a decreasing step-size sequence (δ_m)_{m≥0}. We provide in this article a rigorous mathematical framework for analysing this algorithm. We prove that, under verifiable assumptions, the algorithm is consistent, satisfies a central limit theorem (CLT) and its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence (δ_m)_{m≥0}. We leverage this analysis to give practical recommendations for the notoriously difficult tuning of this algorithm: it is asymptotically optimal to use a step-size sequence of the type δ_m ≍ m^{−1/3}, leading to an algorithm whose mean squared error (MSE) decreases at rate O(m^{−1/3}).
```
@article{TehThiVol2016a,
  author = {Teh, Y. W. and Thi\'ery, A. H. and Vollmer, S. J.},
  journal = {Journal of Machine Learning Research},
  title = {Consistency and Fluctuations for Stochastic Gradient {L}angevin Dynamics},
  year = {2016},
  bdsk-url-1 = {http://jmlr.org/papers/v17/teh16a.html},
  bdsk-url-2 = {http://www.jmlr.org/papers/volume17/teh16a/teh16a.pdf}
}
```
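The decreasing step-size scheme from the abstract above (subsampled gradients, no accept-reject step, δ_m proportional to m^{−1/3}) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the model is a N(0, 1) prior with N(theta, 1) likelihood, the constant `a`, the batch size and the function name `sgld_decreasing` are hypothetical, and the posterior mean is estimated by a step-size-weighted average of the iterates.

```python
import math
import random

def sgld_decreasing(data, n_iter=20000, batch_size=10, a=0.01, seed=0):
    """SGLD with step sizes delta_m = a * m**(-1/3) on a Gaussian toy posterior.

    Returns the step-size-weighted average of the iterates, used here as an
    estimate of the posterior mean.
    """
    rng = random.Random(seed)
    n_data = len(data)
    theta = 0.0
    weighted_sum = 0.0
    total_weight = 0.0
    for m in range(1, n_iter + 1):
        delta = a * m ** (-1.0 / 3.0)
        batch = rng.sample(range(n_data), batch_size)
        # Stochastic gradient of the log posterior from a minibatch.
        grad = -theta + (n_data / batch_size) * sum(data[i] - theta for i in batch)
        # Langevin move with no accept-reject step.
        theta += 0.5 * delta * grad + math.sqrt(delta) * rng.gauss(0.0, 1.0)
        weighted_sum += delta * theta
        total_weight += delta
    return weighted_sum / total_weight

data_rng = random.Random(1)
data = [data_rng.gauss(1.0, 1.0) for _ in range(100)]
posterior_mean = sum(data) / (len(data) + 1)   # exact for this conjugate model
estimate = sgld_decreasing(data)
print("exact posterior mean:", posterior_mean)
print("SGLD estimate:       ", estimate)
```

The constant `a` is chosen small enough that the first step is stable for this data set size; a larger data set would need a smaller value.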
S. J. Vollmer, K. C. Zygalakis, Y. W. Teh, Exploration of the (Non-)asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics, Journal of Machine Learning Research (JMLR), 2016.
Project: sgmcmc

Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally infeasible. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem in three ways: it generates proposed moves using only a subset of the data, it skips the Metropolis-Hastings accept-reject step, and it uses sequences of decreasing step sizes. In Teh et al. (2014), we provided the mathematical foundations for the decreasing step size SGLD, including consistency and a central limit theorem. However, in practice the SGLD is run for a relatively small number of iterations, and its step size is not decreased to zero. The present article investigates the behaviour of the SGLD with fixed step size. In particular we characterise the asymptotic bias explicitly, along with its dependence on the step size and the variance of the stochastic gradient. On that basis a modified SGLD which removes the asymptotic bias due to the variance of the stochastic gradients up to first order in the step size is derived. Moreover, we are able to obtain bounds on the finite-time bias, variance and mean squared error (MSE). The theory is illustrated with a Gaussian toy model for which the bias and the MSE for the estimation of moments can be obtained explicitly. For this toy model we study the gain of the SGLD over the standard Euler method in the limit of large data sets.
```
@article{VolZygTeh2016a,
  author = {Vollmer, S. J. and Zygalakis, K. C. and Teh, Y. W.},
  journal = {Journal of Machine Learning Research (JMLR)},
  title = {Exploration of the (Non-)asymptotic Bias and Variance of Stochastic Gradient {L}angevin Dynamics},
  year = {2016},
  bdsk-url-1 = {http://jmlr.org/papers/v17/15-494.html},
  bdsk-url-2 = {http://www.jmlr.org/papers/volume17/15-494/15-494.pdf}
}
```
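The fixed-step-size bias discussed in the abstract above can be made concrete with a short calculation in the spirit of a Gaussian toy model. This sketch is not taken from the paper: it assumes a one-dimensional Gaussian target N(mu, sigma2) and models the stochastic gradient error as additive noise with constant variance `grad_noise_var` (a hypothetical parameter). Under those assumptions the chain is a linear recursion, so its stationary variance has a closed form that can be checked against the fixed point of the variance recursion.

```python
def stationary_variance(h, sigma2=1.0, grad_noise_var=0.0):
    """Closed-form stationary variance of the fixed-step chain

        theta <- theta - (h/2) * (theta - mu) / sigma2 + (h/2) * xi + sqrt(h) * eta,

    where Var(xi) = grad_noise_var and Var(eta) = 1. The target variance is sigma2.
    """
    a = h / (2.0 * sigma2)
    return (h + 0.25 * h * h * grad_noise_var) / (2.0 * a - a * a)

def iterate_variance(h, sigma2=1.0, grad_noise_var=0.0, n_iter=10000):
    """Same quantity, obtained by iterating the variance recursion to its fixed point."""
    a = h / (2.0 * sigma2)
    v = 0.0
    for _ in range(n_iter):
        v = (1.0 - a) ** 2 * v + h + 0.25 * h * h * grad_noise_var
    return v

for h in (0.01, 0.1, 0.5):
    exact_grad = stationary_variance(h)                      # Euler method, full gradient
    noisy_grad = stationary_variance(h, grad_noise_var=4.0)  # with gradient noise
    print(f"h={h}: target 1.0, full gradient {exact_grad:.4f}, noisy gradient {noisy_grad:.4f}")
```

With the full gradient the stationary variance is sigma2 / (1 - h / (4 * sigma2)), so the bias grows with the step size, and the gradient noise adds a further term that is also first order in `h`.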