Bayesian inference, probabilistic programming, Monte Carlo methods
I am postdoc working in Yee Whye Teh’s group at the department of statistics in Oxford. I have a broad range of interest varying from Monte Carlo methods and probabilistic programming to random forests and variational auto-encoders.
Publications
2018
T. Rainforth
,
Nesting Probabilistic Programs, arXiv preprint arXiv:1803.06328, 2018.
We formalize the notion of nesting probabilistic programming queries and
investigate the resulting statistical implications. We demonstrate that query nesting
allows the definition of models which could not otherwise be expressed, such as
those involving agents reasoning about other agents, but that existing systems take
approaches that lead to inconsistent estimates. We show how to correct this by delineating
possible ways one might want to nest queries and asserting the respective conditions required
for convergence. We further introduce, and prove the correctness of, a new online nested
Monte Carlo estimation method that makes it substantially easier to ensure these
conditions are met, thereby providing a simple framework for designing statistically correct inference engines.
@article{rainforth2018nesting,
title = {Nesting Probabilistic Programs},
author = {Rainforth, Tom},
journal = {arXiv preprint arXiv:1803.06328},
year = {2018}
}
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Moreover, PIWAE can simultaneously deliver improvements in both the quality of the inference network and generative network, relative to IWAE.
@article{rainforth2018tighter,
title = {Tighter Variational Bounds are Not Necessarily Better},
author = {Rainforth, Tom and Kosiorek, Adam R. and Le, Tuan Anh and Maddison, Chris J. and Igl, Maximilian and Wood, Frank and Teh, Yee Whye},
journal = {arXiv preprint arXiv:1802.04537},
year = {2018}
}
T. A. Le
,
M. Igl
,
T. Rainforth
,
T. Jin
,
F. Wood
,
Auto-Encoding Sequential Monte Carlo, in International Conference on Learning Representations, 2018.
We build on auto-encoding sequential Monte Carlo (AESMC):
a method for model
and proposal learning based on maximizing the lower bound to the log marginal
likelihood in a broad family of structured probabilistic models. Our approach
relies on the efficiency of sequential Monte Carlo (SMC) for performing inference
in structured probabilistic models and the flexibility of deep neural networks
to model complex conditional probability distributions. We develop additional
theoretical insights and experiment with a new training procedure which can
improve both model and proposal learning. We demonstrate that our approach
provides a fast, easy-to-implement and scalable means for simultaneous model
learning and proposal adaptation in deep generative models.
@inproceedings{le2018autoencoding,
title = {Auto-Encoding Sequential Monte Carlo},
author = {Le, Tuan Anh and Igl, Maximilian and Rainforth, Tom and Jin, Tom and Wood, Frank},
booktitle = {International Conference on Learning Representations},
year = {2018}
}
2017
T. Rainforth
,
Automating Inference, Learning, and Design using
Probabilistic Programming, PhD thesis, University of Oxford, 2017.
Imagine a world where computational simulations can be inverted as easily as running them forwards, where data can be used to refine models automatically, and where the only expertise one needs to carry out powerful statistical analysis is a basic proficiency in scientific coding. Creating such a world is the ambitious long-term aim of probabilistic programming.
<br><br>The bottleneck for improving the probabilistic models, or simulators, used throughout the quantitative sciences, is often not an ability to devise better models conceptually, but a lack of expertise, time, or resources to realize such innovations. Probabilistic programming systems (PPSs) help alleviate this bottleneck by providing an expressive and accessible modeling framework, then automating the required computation to draw inferences from the model, for example finding the model parameters likely to give rise to a certain output. By decoupling model specification and inference, PPSs streamline the process of developing and drawing inferences from new models, while opening up powerful statistical methods to non-experts. Many systems further provide the flexibility to write new and exciting models which would be hard, or even impossible, to convey using conventional statistical frameworks.
<br><br>The central goal of this thesis is to improve and extend PPSs. In particular, we will make advancements to the underlying inference engines and increase the range of problems which can be tackled. For example, we will extend PPSs to a mixed inference-optimization framework, thereby providing automation of tasks such as model learning and engineering design. Meanwhile, we make inroads into constructing systems for automating adaptive sequential design problems, providing potential applications across the sciences. Furthermore, the contributions of the work reach far beyond probabilistic programming, as achieving our goal will require us to make advancements in a number of related fields such as particle Markov chain Monte Carlo methods, Bayesian optimization, and Monte Carlo fundamentals.
@phdthesis{rainforth2017thesis,
title = {{Automating Inference, Learning, and Design using
Probabilistic Programming}},
author = {Rainforth, Tom},
institution = {University of Oxford},
year = {2017}
}
B. T. Vincent
,
T. Rainforth
,
The DARC Toolbox: automated, flexible, and efficient delayed and risky choice experiments using Bayesian adaptive design, 2017.
Delayed and risky choice (DARC) experiments are a cornerstone of research in
psychology, behavioural economics and neuroeconomics. By collecting an agent’s preferences
between pairs of prospects we can characterise their preferences, investigate
what affects them, and probe the underlying decision making mechanisms. We present
a state-of-the-art approach and software toolbox allowing such DARC experiments to
be run in a highly efficient way. Data collection is costly, so our toolbox automatically
and adaptively generates pairs of prospects in real time to maximise the information
gathered about the participant’s behaviours. We demonstrate that this leads to improvements
over alternative experimental paradigms. The key to releasing our real
time and automatic performance is a number of advances over current Bayesian adaptive
design methodology. In particular, we derive an improved estimator for discrete
output problems and design a novel algorithm for automating sequential adaptive design.
We provide a number of pre-prepared DARC tools for researchers to use, but a
key contribution is an adaptive experiment toolbox that can be extended to virtually
any 2-alternative-choice tasks. In particular, to carry out custom adaptive experiments
using our toolbox, the user need only encode their behavioural model and design space
– both the subsequent inference and sequential design optimisation are automated for
arbitrary models the user might write.
@article{vincent2017darc,
title = {The DARC Toolbox: automated, flexible, and efficient delayed and risky choice experiments using Bayesian adaptive design},
author = {Vincent, Benjamin T and Rainforth, Tom},
year = {2017},
publisher = {PsyArXiv}
}
T. Rainforth
,
R. Cornish
,
H. Yang
,
A. Warrington
,
F. Wood
,
On the Opportunities and Pitfalls of Nesting Monte Carlo Estimators, arXiv preprint arXiv:1709.06181, 2017.
We present a formalization of nested Monte Carlo (NMC) estimation,
whereby terms in an outer estimator themselves involve calculation of separate,
nested, Monte Carlo (MC) estimators. We demonstrate that, under mild conditions,
NMC can provide consistent estimates of nested expectations, including cases involving
arbitrary levels of nesting; establish corresponding rates of convergence; and provide
empirical evidence that these rates are observed in practice. We further establish a number
of pitfalls that can arise from naive nesting of MC estimators, provide guidelines about
how these can be avoided, and lay out novel methods for reformulating certain classes of
nested expectation problems into single expectations, leading to improved convergence rates.
Finally, we use one of these reformulations to derive a new estimator for use in discrete
Bayesian experimental design problems which has a better convergence rate than existing methods.
Our results have implications for a wide range of fields from probabilistic programming to deep
generative models and serve both as an invitation for further inquiry and a caveat against careless use.
@article{rainforth2017opportunities,
title = {On the Opportunities and Pitfalls of Nesting Monte Carlo Estimators},
author = {Rainforth, Tom and Cornish, Robert and Yang, Hongseok and Warrington, Andrew and Wood, Frank},
journal = {arXiv preprint arXiv:1709.06181},
year = {2017}
}
T. Rainforth
,
Y. Zhou
,
X. Lu
,
Y. W. Teh
,
F. Wood
,
H. Yang
,
J. Meent
,
Inference Trees: Adaptive Inference with Exploration [Workshop Version], NIPS Workshop on Advances in Approximate Bayesian Inference, 2017.
We introduce inference trees (ITs), a new adaptive Monte Carlo inference method building on ideas from Monte Carlo tree search. Unlike most existing methods which are implicitly based on pure exploitation, ITs explicitly aim to balance explo- ration and exploitation in the inference process, alleviating common pathologies and ensuring consistency. More specifically, ITs use bandit strategies to adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner. This allows ITs to “hone-down” on regions of interest, but also maintain uncertainty estimates on whether regions of significant posterior mass have been missed. Though they can be used more generally, we show that ITs are particularly effective when combined with sequen- tial Monte Carlo (SMC), allowing long-range dependencies to be captured and potentially improving beyond what can be achieved by proposal adaptation alone.
@article{rainforth2017it,
title = {Inference Trees: Adaptive Inference with Exploration [Workshop Version]},
author = {Rainforth, Tom and Zhou, Yuan and Lu, Xiaoyu and Teh, Yee Whye and Wood, Frank and Yang, Hongseok and van de Meent, Jan-Willem},
journal = {NIPS Workshop on Advances in Approximate Bayesian Inference},
year = {2017}
}
B. Bloem-Reddy
,
E. Mathieu
,
A. Foster
,
T. Rainforth
,
H. Ge
,
M. Lomelí
,
Z. Ghahramani
,
Y. W. Teh
,
Sampling and inference for discrete random probability measures in probabilistic programs, NIPS Workshop on Advances in Approximate Bayesian Inference, 2017.
We consider the problem of sampling a sequence from a discrete random probability measure (RPM) with countable support, under (probabilistic) constraints of finite memory and computation. A canonical example is sampling from the Dirichlet Process, which can be accomplished using its stick-breaking representation and lazy initialization of its atoms. We show that efficiently lazy initialization is possible if and only if a size-biased representation of the discrete RPM is used. For models constructed from such discrete RPMs, we consider the implications for generic particle-based inference methods in probabilistic programming systems. To demonstrate, we implement SMC for Normalized Inverse Gaussian Process mixture models in Turing.
@article{bloemreddy2017rpm,
title = {Sampling and inference for discrete random probability measures in probabilistic programs},
author = {Bloem-Reddy, Benjamin and Mathieu, Emile and Foster, Adam and Rainforth, Tom and Ge, Hong and Lomelí, María and Ghahramani, Zoubin and Teh, Yee Whye},
journal = {NIPS Workshop on Advances in Approximate Bayesian Inference},
year = {2017}
}
X. Lu
,
T. Rainforth
,
Y. Zhou
,
Y. W. Teh
,
F. Wood
,
H. Yang
,
J. Meent
,
On Exploration, Exploitation and Learning in Adaptive Importance Sampling, NIPS Workshop on Advances in Approximate Bayesian Inference, 2017.
@article{lu2017ai,
title = {On Exploration, Exploitation and Learning in Adaptive Importance Sampling},
author = {Lu, Xiaoyu and Rainforth, Tom and Zhou, Yuan and Teh, Yee Whye and Wood, Frank and Yang, Hongseok and van de Meent, Jan-Willem},
journal = {NIPS Workshop on Advances in Approximate Bayesian Inference},
year = {2017}
}
2016
T. Rainforth
,
T. A. Le
,
J. Meent
,
M. A. Osborne
,
F. Wood
,
Bayesian Optimization for Probabilistic Programs, in Advances in Neural Information Processing Systems, 2016, 280–288.
We present the first general purpose framework for marginal maximum a
posteriori estimation of probabilistic program variables. By using a
series of code transformations, the evidence of any probabilistic
program, and therefore of any graphical model, can be optimized with
respect to an arbitrary subset of its sampled variables. To carry out
this optimization, we develop the first Bayesian optimization package
to directly exploit the source code of its target, leading to innovations
in problem-independent hyperpriors, unbounded optimization, and implicit
constraint satisfaction; delivering significant performance improvements
over prominent existing packages. We present applications of our method
to a number of tasks including engineering design and parameter optimization.
@inproceedings{rainforth2016BOPP,
title = {Bayesian {O}ptimization for {P}robabilistic {P}rograms},
author = {Rainforth, Tom and Le, Tuan Anh and van de Meent, Jan-Willem and Osborne, Michael A and Wood, Frank},
booktitle = {Advances in Neural Information Processing Systems},
pages = {280--288},
year = {2016}
}
T. Rainforth
,
R. Cornish
,
H. Yang
,
F. Wood
,
On the Pitfalls of Nested Monte Carlo, NIPS Workshop on Advances in Approximate Bayesian Inference, 2016.
There is an increasing interest in estimating expectations outside of the
classical inference framework, such as for models expressed as probabilistic
programs. Many of these contexts call for some form of nested inference to
be applied. In this paper, we analyse the behaviour of nested Monte Carlo
(NMC) schemes, for which classical convergence proofs are insufficient.
We give conditions under which NMC will converge, establish a rate of
convergence, and provide empirical data that suggests that this rate is
observable in practice. Finally, we prove that general-purpose nested
inference schemes are inherently biased. Our results serve to warn of
the dangers associated with na ̈ıve composition of inference and models.
@article{rainforth2016nestedMC,
title = {On the {P}itfalls of {N}ested {M}onte {C}arlo},
author = {Rainforth, Tom and Cornish, Robert and Yang, Hongseok and Wood, Frank},
year = {2016},
journal = {NIPS Workshop on Advances in Approximate Bayesian Inference}
}
D. Janz
,
B. Paige
,
T. Rainforth
,
J. Meent
,
F. Wood
,
Probabilistic Structure Discovery in Time Series Data, NIPS Workshop on Artificial Intelligence for Data Science, 2016.
Existing methods for structure discovery in time
series data construct interpretable, compositional kernels for
Gaussian process regression models. While the learned Gaussian
process model provides posterior mean and variance estimates,
typically the structure is learned via a greedy optimization procedure.
This restricts the space of possible solutions and leads to
over-confident uncertainty estimates. We introduce a fully Bayesian
approach, inferring a full posterior over structures, which more
reliably captures the uncertainty of the model.
@article{janz2016probstruct,
title = {Probabilistic {S}tructure {D}iscovery in {T}ime {S}eries {D}ata},
author = {Janz, David and Paige, Brooks and Rainforth, Tom and van de Meent, Jan-Willem and Wood, Frank},
year = {2016},
journal = {NIPS Workshop on Artificial Intelligence for Data Science}
}
T. Rainforth
,
C. A. Naesseth
,
F. Lindsten
,
B. Paige
,
J. Meent
,
A. Doucet
,
F. Wood
,
Interacting Particle Markov Chain Monte Carlo, in Proceedings of the 33rd International Conference on Machine Learning, 2016, vol. 48.
We introduce interacting particle Markov chain
Monte Carlo (iPMCMC), a PMCMC method
based on an interacting pool of standard and conditional
sequential Monte Carlo samplers. Like
related methods, iPMCMC is a Markov chain
Monte Carlo sampler on an extended space. We
present empirical results that show significant improvements
in mixing rates relative to both noninteracting
PMCMC samplers, and a single PMCMC
sampler with an equivalent memory and
computational budget. An additional advantage
of the iPMCMC method is that it is suitable for
distributed and multi-core architectures.
@inproceedings{rainforth2016ipmcmc,
title = {Interacting Particle {M}arkov Chain {M}onte {C}arlo},
author = {Rainforth, Tom and Naesseth, Christian A and Lindsten, Fredrik and Paige, Brooks and van de Meent, Jan-Willem and Doucet, Arnaud and Wood, Frank},
booktitle = {Proceedings of the 33rd International Conference on Machine Learning},
series = {JMLR: W\&CP},
volume = {48},
year = {2016}
}
2015
T. Rainforth
,
F. Wood
,
Canonical Correlation Forests, arXiv preprint arXiv:1507.05444, 2015.
We introduce canonical correlation forests (CCFs), a new decision tree ensemble method for classification and regression. Individual canonical correlation trees are binary decision trees with hyperplane splits based on local canonical correlation coefficients calculated during training. Unlike axis-aligned alternatives, the decision surfaces of CCFs are not restricted to the coordinate system of the inputs features and therefore more naturally represent data with correlated inputs. CCFs naturally accommodate multiple outputs, provide a similar computational complexity to random forests, and inherit their impressive robustness to the choice of input parameters. As part of the CCF training algorithm, we also introduce projection bootstrapping, a novel alternative to bagging for oblique decision tree ensembles which maintains use of the full dataset in selecting split points, often leading to improvements in predictive accuracy. Our experiments show that, even without parameter tuning, CCFs out-perform axis-aligned random forests and other state-of-the-art tree ensemble methods on both classification and regression problems, delivering both improved predictive accuracy and faster training times. We further show that they outperform all of the 179 classifiers considered in a recent extensive survey.
@article{rainforth2015canonical,
title = {Canonical Correlation Forests},
author = {Rainforth, Tom and Wood, Frank},
journal = {arXiv preprint arXiv:1507.05444},
year = {2015},
arixv = {https://arxiv.org/abs/1507.05444}
}