Machine learning, Bayesian inference, Probabilistic programming, Deep generative models
I am a Florence Nightingale Bicentennial Fellow and Tutor in Statistics and
Probability and a Junior Research Fellow at Christ Church College Oxford.
Please see my personal website for further details.
Publications
2021
A. Foster, R. Pukdee, T. Rainforth, Improving Transformation Invariance in Contrastive Representation Learning, International Conference on Learning Representations (ICLR), 2021.
We propose methods to strengthen the invariance properties of representations obtained by contrastive learning. While existing approaches implicitly induce a degree of invariance as representations are learned, we look to more directly enforce invariance in the encoding process. To this end, we first introduce a training objective for contrastive learning that uses a novel regularizer to control how the representation changes under transformation. We show that representations trained with this objective perform better on downstream tasks and are more robust to the introduction of nuisance transformations at test time. Second, we propose a change to how test time representations are generated by introducing a feature averaging approach that combines encodings from multiple transformations of the original input, finding that this leads to across the board performance gains. Finally, we introduce the novel Spirograph dataset to explore our ideas in the context of a differentiable generative process with multiple downstream tasks, showing that our techniques for learning invariance are highly beneficial.
@article{foster2021improving,
title = {Improving Transformation Invariance in Contrastive Representation Learning},
author = {Foster, Adam and Pukdee, Rattana and Rainforth, Tom},
year = {2021},
journal = {International Conference on Learning Representations (ICLR)}
}
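As a rough illustration of the feature averaging idea described in the abstract above, the following Python sketch averages the encodings of several transformed copies of a single input. The `encoder` and the three transformations are toy stand-ins invented for this example; they are not the networks or augmentations used in the paper.

```python
def feature_average(encoder, transforms, x):
    """Average the encodings of several transformed copies of x."""
    encodings = [encoder(t(x)) for t in transforms]
    dim = len(encodings[0])
    return [sum(e[i] for e in encodings) / len(encodings) for i in range(dim)]


# Hypothetical encoder: first element and sum of the input, as floats.
def toy_encoder(x):
    return [float(x[0]), float(sum(x))]


# Hypothetical nuisance transformations: identity, reversal, rotation.
toy_transforms = [
    lambda x: list(x),
    lambda x: list(reversed(x)),
    lambda x: x[1:] + x[:1],
]

averaged = feature_average(toy_encoder, toy_transforms, [1, 2, 3])
print(averaged)
```

The first feature depends on the transformation and is smoothed by averaging, while the second (the sum) is already invariant and is left unchanged.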
A. Foster, D. R. Ivanova, I. Malik, T. Rainforth, Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design, International Conference on Machine Learning (ICML, long presentation), 2021.
We introduce Deep Adaptive Design (DAD), a general method for amortizing the cost of performing sequential adaptive experiments using the framework of Bayesian optimal experimental design (BOED). Traditional sequential BOED approaches require substantial computational time at each stage of the experiment. This makes them unsuitable for most real-world applications, where decisions must typically be made quickly. DAD addresses this restriction by learning an amortized design network upfront and then using this to rapidly run (multiple) adaptive experiments at deployment time. This network takes as input the data from previous steps, and outputs the next design using a single forward pass; these design decisions can be made in milliseconds during the live experiment. To train the network, we introduce contrastive information bounds that are suitable objectives for the sequential setting, and propose a customized network architecture that exploits key symmetries. We demonstrate that DAD successfully amortizes the process of experimental design, outperforming alternative strategies on a number of problems.
@article{foster2021deep,
title = {{Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design}},
author = {Foster, Adam and Ivanova, Desi R and Malik, Ilyas and Rainforth, Tom},
year = {2021},
journal = {International Conference on Machine Learning (ICML, long presentation)}
}
M. Willetts, A. Camuto, T. Rainforth, S. Roberts, C. Holmes, Improving VAEs’ Robustness to Adversarial Attack, in International Conference on Learning Representations (ICLR), 2021.
@inproceedings{Willetts2019VAEAdvRobustness,
archiveprefix = {arXiv},
arxivid = {1906.00230},
author = {Willetts, Matthew and Camuto, Alexander and Rainforth, Tom and Roberts, Stephen and Holmes, Chris},
eprint = {1906.00230},
booktitle = {International Conference on Learning Representations (ICLR)},
title = {{Improving VAEs' Robustness to Adversarial Attack}},
year = {2021}
}
A. Camuto, M. Willetts, S. Roberts, C. Holmes, T. Rainforth, Towards a Theoretical Understanding of the Robustness of Variational Autoencoders, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
@inproceedings{Camuto2020VAERobustnessTheory,
archiveprefix = {arXiv},
arxivid = {2007.07365},
author = {Camuto, Alexander and Willetts, Matthew and Roberts, Stephen and Holmes, Chris and Rainforth, Tom},
eprint = {2007.07365},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
title = {{Towards a Theoretical Understanding of the Robustness of Variational Autoencoders}},
year = {2021}
}
2020
A. Foster, M. Jankowiak, M. O’Meara, Y. W. Teh, T. Rainforth, A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments, International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
We introduce a fully stochastic gradient based approach to Bayesian optimal experimental design (BOED). This is achieved through the use of variational lower bounds on the expected information gain (EIG) of an experiment that can be simultaneously optimized with respect to both the variational and design parameters. This allows the design process to be carried out through a single unified stochastic gradient ascent procedure, in contrast to existing approaches that typically construct an EIG estimator on a pointwise basis, before passing this estimator to a separate optimizer. We show that this, in turn, leads to more efficient BOED schemes and provide a number of different variational objectives suited to different settings. Furthermore, we show that our gradient-based approaches are able to provide effective design optimization in substantially higher dimensional settings than existing approaches.
@article{foster2020unified,
title = {{A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments}},
author = {Foster, Adam and Jankowiak, Martin and O'Meara, Matthew and Teh, Yee Whye and Rainforth, Tom},
journal = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
year = {2020}
}
T. Joy, S. M. Schmon, P. Torr, S. Narayanaswamy, T. Rainforth, Rethinking Semi-Supervised Learning in VAEs, https://arxiv.org/abs/2006.10102, 2020.
@article{Joy2020,
author = {Joy, Tom and Schmon, Sebastian M and Torr, Philip and Narayanaswamy, Siddharth and Rainforth, Tom},
journal = {https://arxiv.org/abs/2006.10102},
title = {Rethinking Semi-Supervised Learning in VAEs},
year = {2020}
}
2019
S. Webb, T. Rainforth, Y. W. Teh, M. P. Kumar, A Statistical Approach to Assessing Neural Network Robustness, in International Conference on Learning Representations (ICLR), 2019.
We present a new approach to assessing the robustness of neural networks based on estimating the proportion of inputs for which a property is violated. Specifically, we estimate the probability of the event that the property is violated under an input model. Our approach critically varies from the formal verification framework in that when the property can be violated, it provides an informative notion of how robust the network is, rather than just the conventional assertion that the network is not verifiable. Furthermore, it provides an ability to scale to larger networks than formal verification approaches. Though the framework still provides a formal guarantee of satisfiability whenever it successfully finds one or more violations, these advantages do come at the cost of only providing a statistical estimate of unsatisfiability whenever no violation is found. Key to the practical success of our approach is an adaptation of multi-level splitting, a Monte Carlo approach for estimating the probability of rare events, to our statistical robustness framework. We demonstrate that our approach is able to emulate formal verification procedures on benchmark problems, while scaling to larger networks and providing reliable additional information in the form of accurate estimates of the violation probability.
@inproceedings{webb2018statistical,
title = {{A Statistical Approach to Assessing Neural Network Robustness}},
author = {Webb, Stefan and Rainforth, Tom and Teh, Yee Whye and Kumar, M Pawan},
booktitle = {International Conference on Learning Representations (ICLR)},
month = may,
year = {2019}
}
B. Gram-Hansen, C. S. de Witt, T. Rainforth, P. H. Torr, Y. W. Teh, A. G. Baydin, Hijacking Malaria Simulators with Probabilistic Programming, in International Conference on Machine Learning (ICML) AI for Social Good workshop (AI4SG), 2019.
@inproceedings{gram2019hijacking,
title = {Hijacking Malaria Simulators with Probabilistic Programming},
author = {Gram-Hansen, Bradley and de Witt, Christian Schr{\"o}der and Rainforth, Tom and Torr, Philip HS and Teh, Yee Whye and Baydin, At{\i}l{\i}m G{\"u}ne{\c{s}}},
booktitle = {International Conference on Machine Learning (ICML) AI for Social Good workshop (AI4SG)},
year = {2019}
}
E. Mathieu, T. Rainforth, N. Siddharth, Y. W. Teh, Disentangling Disentanglement in Variational Autoencoders, in Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, USA, 2019, vol. 97, 4402–4412.
We develop a generalisation of disentanglement in VAEs—decomposition of the latent representation—characterising it as the fulfilment of two factors: a) the latent encodings of the data having an appropriate level of overlap, and b) the aggregate encoding of the data conforming to a desired structure, represented through the prior. Decomposition permits disentanglement, i.e. explicit independence between latents, as a special case, but also allows for a much richer class of properties to be imposed on the learnt representation, such as sparsity, clustering, independent subspaces, or even intricate hierarchical dependency relationships. We show that the β-VAE varies from the standard VAE predominantly in its control of latent overlap and that for the standard choice of an isotropic Gaussian prior, its objective is invariant to rotations of the latent representation. Viewed from the decomposition perspective, breaking this invariance with simple manipulations of the prior can yield better disentanglement with little or no detriment to reconstructions. We further demonstrate how other choices of prior can assist in producing different decompositions and introduce an alternative training objective that allows the control of both decomposition factors in a principled manner.
@inproceedings{pmlr-v97-mathieu19a,
title = {Disentangling Disentanglement in Variational Autoencoders},
author = {Mathieu, Emile and Rainforth, Tom and Siddharth, N and Teh, Yee Whye},
booktitle = {Proceedings of the 36th International Conference on Machine Learning},
pages = {4402--4412},
year = {2019},
volume = {97},
series = {Proceedings of Machine Learning Research},
address = {Long Beach, California, USA},
month = {09--15 Jun},
publisher = {PMLR}
}
A. Foster, M. Jankowiak, E. Bingham, P. Horsfall, Y. W. Teh, T. Rainforth, N. Goodman, Variational Bayesian Optimal Experimental Design, Advances in Neural Information Processing Systems (NeurIPS, spotlight), 2019.
Bayesian optimal experimental design (BOED) is a principled framework
for making efficient use of limited experimental resources. Unfortunately,
its applicability is hampered by the difficulty of obtaining accurate estimates
of the expected information gain (EIG) of an experiment. To address this, we
introduce several classes of fast EIG estimators by building on ideas from
amortized variational inference. We show theoretically and empirically that
these estimators can provide significant gains in speed and accuracy over
previous approaches. We further demonstrate the practicality of our approach
on a number of end-to-end experiments.
@article{foster2019variational,
title = {{Variational Bayesian Optimal Experimental Design}},
author = {Foster, Adam and Jankowiak, Martin and Bingham, Eli and Horsfall, Paul and Teh, Yee Whye and Rainforth, Tom and Goodman, Noah},
journal = {Advances in Neural Information Processing Systems (NeurIPS, spotlight)},
year = {2019}
}
F. Locatello, G. Abbati, T. Rainforth, S. Bauer, B. Schölkopf, O. Bachem, On the Fairness of Disentangled Representations, Advances in Neural Information Processing Systems (NeurIPS, to appear), 2019.
Recently there has been a significant interest in learning
disentangled representations, as they promise increased interpretability,
generalization to unseen scenarios and faster learning on downstream tasks.
In this paper, we investigate the usefulness of different notions of
disentanglement for improving the fairness of downstream prediction tasks
based on representations. We consider the setting where the goal is to predict
a target variable based on the learned representation of high-dimensional
observations (such as images) that depend on both the target variable and
an unobserved sensitive variable. We show that in this setting both the
optimal and empirical predictions can be unfair, even if the target variable
and the sensitive variable are independent. Analyzing more than 12600 trained
representations of state-of-the-art disentangled models, we observe that
various disentanglement scores are consistently correlated with increased
fairness, suggesting that disentanglement may be a useful property to encourage
fairness when sensitive variables are not observed.
@article{locatello2019fairness,
title = {{On the Fairness of Disentangled Representations}},
author = {Locatello, Francesco and Abbati, Gabriele and Rainforth, Tom and Bauer, Stefan and Sch{\"o}lkopf, Bernhard and Bachem, Olivier},
journal = {Advances in Neural Information Processing Systems (NeurIPS, to appear)},
year = {2019}
}
A. Golinski, F. Wood, T. Rainforth, Amortized Monte Carlo Integration, International Conference on Machine Learning (ICML, Best Paper honorable mention), 2019.
Current approaches to amortizing Bayesian inference focus solely on
approximating the posterior distribution. Typically, this approximation is,
in turn, used to calculate expectations for one or more target functions—a
computational pipeline which is inefficient when the target function(s) are
known upfront. In this paper, we address this inefficiency by introducing
AMCI, a method for amortizing Monte Carlo integration directly. AMCI operates
similarly to amortized inference but produces three distinct amortized
proposals, each tailored to a different component of the overall expectation
calculation. At run-time, samples are produced separately from each amortized
proposal, before being combined into an overall estimate of the expectation.
We show that while existing approaches are fundamentally limited in the level
of accuracy they can achieve, AMCI can theoretically produce arbitrarily
small errors for any integrable target function using only a single sample
from each proposal at run-time. We further show that it is able to empirically
outperform the theoretically optimal self-normalized importance sampler on a
number of example problems. Furthermore, AMCI allows not only for amortizing
over datasets but also amortizing over target functions.
@article{golinski2018amci,
title = {{Amortized Monte Carlo Integration}},
author = {Golinski, Adam and Wood, Frank and Rainforth, Tom},
journal = {International Conference on Machine Learning (ICML, Best Paper honorable mention)},
year = {2019}
}
Y. Zhou, B. Gram-Hansen, T. Kohn, T. Rainforth, H. Yang, F. Wood, A Low-Level Probabilistic Programming Language for Non-Differentiable Models, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish the discontinuous and continuous parameters in the density function, while further providing runtime checks of when discontinuity boundaries have been crossed. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by introducing a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with a sufficiently low measure of discontinuities to maintain the validity of the inference engine.
@article{zhou2018lfppla,
title = {{A Low-Level Probabilistic Programming Language for Non-Differentiable Models}},
author = {Zhou, Yuan and Gram-Hansen, Bradley and Kohn, Tobias and Rainforth, Tom and Yang, Hongseok and Wood, Frank},
year = {2019},
journal = {International Conference on Artificial Intelligence and Statistics (AISTATS)}
}
A. Golinski*, M. Lezcano-Casado*, T. Rainforth, Improving Normalizing Flows via Better Orthogonal Parameterizations, ICML Workshop on Invertible Neural Nets and Normalizing Flows, 2019.
Normalizing flows have to be designed in a manner that permits efficient
computation of the determinant of the transformation Jacobian, while
ensuring that the transformation remains invertible. Many methods achieve this
by using transformations based on matrix decompositions, with
decompositions that produce an orthogonal matrix being a popular such approach. However,
this introduces the additional difficulty of performing a constrained optimization over the space
of orthogonal matrices. Current approaches for
achieving this use methods which are prone to
numerical instability and which may introduce additional local minima into the
optimization landscape. To address this, we introduce two orthogonal matrix parameterizations stemming from Lie
group theory—the exponential map and the Cayley map—which alleviate these issues.
In particular, they are proven to not introduce new local
minima. Focusing on the example architecture of
Sylvester normalizing flows (van den Berg et al.
(2018)), we show empirically that using our suggested parameterizations leads to
significantly improved optimization and, in turn, more effective
normalizing flows.
@article{golinski2019snf,
title = {{Improving Normalizing Flows via Better Orthogonal Parameterizations}},
author = {Golinski*, Adam and Lezcano-Casado*, Mario and Rainforth, Tom},
journal = {ICML Workshop on Invertible Neural Nets and Normalizing Flows},
year = {2019}
}
Y. Zhou, B. Gram-Hansen, T. Kohn, T. Rainforth, H. Yang, F. Wood, LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models, in The 22nd International Conference on Artificial Intelligence and Statistics, 2019, 148–157.
@inproceedings{zhou2019lf,
title = {LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models},
author = {Zhou, Yuan and Gram-Hansen, Bradley and Kohn, Tobias and Rainforth, Tom and Yang, Hongseok and Wood, Frank},
booktitle = {The 22nd International Conference on Artificial Intelligence and Statistics},
pages = {148--157},
year = {2019}
}
Y. Zhou, H. Yang, Y. W. Teh, T. Rainforth, Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support, International Conference on Machine Learning (ICML, to appear), 2019.
@article{zhou2019divide,
title = {Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support},
author = {Zhou, Yuan and Yang, Hongseok and Teh, Yee Whye and Rainforth, Tom},
journal = {International Conference on Machine Learning (ICML, to appear)},
year = {2019}
}
Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is in turn used to calculate expectations for one or more target functions. In this paper, we address the inefficiency of this computational pipeline when the target function(s) are known upfront. To this end, we introduce a method for amortizing Monte Carlo integration. Our approach operates in a similar manner to amortized inference, but tailors the produced amortization artifacts to maximize the accuracy of the resulting expectation calculation(s). We show that while existing approaches have fundamental limitations in the level of accuracy that can be achieved for a given run time computational budget, our framework can produce arbitrarily small errors for a wide range of target functions with O(1) computational cost at run time. Furthermore, our framework allows not only for amortizing over possible datasets, but also over possible target functions.
@inproceedings{golinski2018amcj,
title = {{Amortized Monte Carlo Integration}},
author = {Golinski, Adam and Teh, Yee Whye and Wood, Frank and Rainforth, Tom},
booktitle = {Symposium on Advances in Approximate Bayesian Inference},
year = {2018},
month = dec
}
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Moreover, PIWAE can simultaneously deliver improvements in both the quality of the inference network and generative network, relative to IWAE.
@inproceedings{rainforth2018tighter,
title = {Tighter Variational Bounds are Not Necessarily Better},
author = {Rainforth, Tom and Kosiorek, Adam R. and Le, Tuan Anh and Maddison, Chris J. and Igl, Maximilian and Wood, Frank and Teh, Yee Whye},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2018},
month = jul
}
A. Foster, M. Jankowiak, E. Bingham, Y. W. Teh, T. Rainforth, N. Goodman, Variational Optimal Experiment Design: Efficient Automation of Adaptive Experiments, NeurIPS Workshop on Bayesian Deep Learning, 2018.
Bayesian optimal experimental design (OED) is a principled framework for making efficient use of limited experimental resources. Unfortunately, the applicability of OED is hampered by the difficulty of obtaining accurate estimates of the expected information gain (EIG) for different experimental designs. We introduce a class of fast EIG estimators that leverage amortised variational inference and show that they provide substantial empirical gains over previous approaches. We integrate our approach into a deep probabilistic programming framework, thus making OED accessible to practitioners at large.
@article{foster2018voed,
title = {{Variational Optimal Experiment Design: Efficient Automation of Adaptive Experiments}},
author = {Foster, Adam and Jankowiak, Martin and Bingham, Eli and Teh, Yee Whye and Rainforth, Tom and Goodman, Noah},
journal = {NeurIPS Workshop on Bayesian Deep Learning},
year = {2018}
}
S. Webb, A. Golinski, R. Zinkov, N. Siddharth, T. Rainforth, Y. W. Teh, F. Wood, Faithful Inversion of Generative Models for Effective Amortized Inference, in Advances in Neural Information Processing Systems (NeurIPS), 2018.
Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. Generally, they require the inversion of the dependency structure in the generative model, as the modeller must learn a mapping from observations to distributions approximating the posterior. Previous approaches have involved inverting the dependency structure in a heuristic way that fails to capture these dependencies correctly, thereby limiting the achievable accuracy of the resulting approximations. We introduce an algorithm for faithfully, and minimally, inverting the graphical model structure of any generative model. Such inverses have two crucial properties: (a) they do not encode any independence assertions that are absent from the model; and (b) they are local maxima for the number of true independencies encoded. We prove the correctness of our approach and empirically show that the resulting minimally faithful inverses lead to better inference amortization than existing heuristic approaches.
@inproceedings{webb2018minimal,
title = {Faithful Inversion of Generative Models for Effective Amortized Inference},
author = {Webb, Stefan and Golinski, Adam and Zinkov, Robert and Siddharth, N. and Rainforth, Tom and Teh, Yee Whye and Wood, Frank},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2018}
}
B. Gram-Hansen, Y. Zhou, T. Kohn, T. Rainforth, H. Yang, F. Wood, Hamiltonian Monte Carlo for Probabilistic Programs with Discontinuities, in International Conference on Probabilistic Programming, 2018.
@inproceedings{gram2018hamiltonian,
title = {Hamiltonian Monte Carlo for Probabilistic Programs with Discontinuities},
author = {Gram-Hansen, Bradley and Zhou, Yuan and Kohn, Tobias and Rainforth, Tom and Yang, Hongseok and Wood, Frank},
booktitle = {International Conference on Probabilistic Programming},
year = {2018}
}
T. Rainforth, Y. Zhou, X. Lu, Y. W. Teh, F. Wood, H. Yang, J.-W. van de Meent, Inference Trees: Adaptive Inference with Exploration, arXiv preprint arXiv:1806.09550, 2018.
We introduce inference trees (ITs), a new class of inference methods that build
on ideas from Monte Carlo tree search to perform adaptive sampling in a manner
that balances exploration with exploitation, ensures consistency, and alleviates
pathologies in existing adaptive methods. ITs adaptively sample from hierarchical
partitions of the parameter space, while simultaneously learning these partitions
in an online manner. This enables ITs to not only identify regions of high posterior
mass, but also maintain uncertainty estimates to track regions where significant
posterior mass may have been missed. ITs can be based on any inference method
that provides a consistent estimate of the marginal likelihood. They are particularly
effective when combined with sequential Monte Carlo, where they capture long-range
dependencies and yield improvements beyond proposal adaptation alone.
@article{rainforth2018it,
title = {Inference Trees: Adaptive Inference with Exploration},
author = {Rainforth, Tom and Zhou, Yuan and Lu, Xiaoyu and Teh, Yee Whye and Wood, Frank and Yang, Hongseok and van de Meent, Jan-Willem},
journal = {arXiv preprint arXiv:1806.09550},
year = {2018}
}
X. Lu, T. Rainforth, Y. Zhou, J.-W. van de Meent, Y. W. Teh, On Exploration, Exploitation and Learning in Adaptive Importance Sampling, arXiv preprint arXiv:1810.13296, 2018.
We study adaptive importance sampling (AIS) as an online learning problem and argue for the importance of the trade-off between exploration and exploitation in this adaptation. Borrowing ideas from the bandits literature, we propose Daisee, a partition-based AIS algorithm. We further introduce a notion of regret for AIS
and show that Daisee has O((log T)^(3/4) √T) cumulative pseudo-regret, where T is
the number of iterations. We then extend Daisee to adaptively learn a hierarchical partitioning of the sample space for more efficient sampling and confirm the performance of both algorithms empirically.
@article{lu2018exploration,
title = {{On Exploration, Exploitation and Learning in Adaptive Importance Sampling}},
author = {Lu, Xiaoyu and Rainforth, Tom and Zhou, Yuan and van de Meent, Jan-Willem and Teh, Yee Whye},
journal = {arXiv preprint arXiv:1810.13296},
year = {2018}
}
T. Rainforth, Nesting Probabilistic Programs, Conference on Uncertainty in Artificial Intelligence (UAI), 2018.
We formalize the notion of nesting probabilistic programming queries and
investigate the resulting statistical implications. We demonstrate that query nesting
allows the definition of models which could not otherwise be expressed, such as
those involving agents reasoning about other agents, but that existing systems take
approaches that lead to inconsistent estimates. We show how to correct this by delineating
possible ways one might want to nest queries and asserting the respective conditions required
for convergence. We further introduce, and prove the correctness of, a new online nested
Monte Carlo estimation method that makes it substantially easier to ensure these
conditions are met, thereby providing a simple framework for designing statistically correct inference engines.
@article{rainforth2018nesting,
title = {Nesting Probabilistic Programs},
author = {Rainforth, Tom},
journal = {Conference on Uncertainty in Artificial Intelligence (UAI)},
year = {2018}
}
T. Rainforth, R. Cornish, H. Yang, A. Warrington, F. Wood, On Nesting Monte Carlo Estimators, International Conference on Machine Learning (ICML), 2018.
Many problems in machine learning and statistics involve nested expectations
and thus do not permit conventional Monte Carlo (MC) estimation. For such problems,
one must nest estimators, such that terms in an outer estimator themselves involve
calculation of a separate, nested, estimation. We investigate the statistical
implications of nesting MC estimators, including cases of multiple levels of nesting,
and establish the conditions under which they converge. We derive corresponding
rates of convergence and provide empirical evidence that these rates are observed
in practice. We further establish a number of pitfalls that can arise from naive
nesting of MC estimators, provide guidelines about how these can be avoided,
and lay out novel methods for reformulating certain classes of nested expectation
problems into single expectations, leading to improved convergence rates. We
demonstrate the applicability of our work by using our results to develop a new
estimator for discrete Bayesian experimental design problems and derive error
bounds for a class of variational objectives.
@article{rainforth2017opportunities,
title = {{On Nesting Monte Carlo Estimators}},
author = {Rainforth, Tom and Cornish, Robert and Yang, Hongseok and Warrington, Andrew and Wood, Frank},
journal = {International Conference on Machine Learning (ICML)},
year = {2018}
}
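To make the idea of nesting estimators concrete, here is a minimal Python sketch of a nested Monte Carlo estimator for a quantity of the form E_y[ f( E_z[ g(y, z) ] ) ]. The function name `nested_mc`, the toy choices of f and g, and the sample sizes are assumptions made for this illustration and are not taken from the paper.

```python
import random


def nested_mc(f, g, sample_y, sample_z, n_outer, n_inner, rng):
    """Estimate E_y[f(E_z[g(y, z)])] with an inner estimator per outer sample."""
    total = 0.0
    for _ in range(n_outer):
        y = sample_y(rng)
        # Inner Monte Carlo estimate of E_z[g(y, z)] for this particular y.
        inner = sum(g(y, sample_z(rng)) for _ in range(n_inner)) / n_inner
        # The nonlinear f is applied to the inner estimate, not the exact value.
        total += f(inner)
    return total / n_outer


# Toy problem (an assumption for this sketch): y, z ~ N(0, 1),
# g(y, z) = (y + z)^2 and f(x) = x^2, so the exact value is
# E[(y^2 + 1)^2] = 6.
rng = random.Random(0)
estimate = nested_mc(
    f=lambda x: x ** 2,
    g=lambda y, z: (y + z) ** 2,
    sample_y=lambda r: r.gauss(0.0, 1.0),
    sample_z=lambda r: r.gauss(0.0, 1.0),
    n_outer=5000,
    n_inner=50,
    rng=rng,
)
print(estimate)
```

Because f is nonlinear, the estimator is biased for any finite number of inner samples; in this toy problem the bias works out to 6 / n_inner, so both sample sizes must grow for the estimate to converge to 6.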
T. A. Le, M. Igl, T. Rainforth, T. Jin, F. Wood, Auto-Encoding Sequential Monte Carlo, in International Conference on Learning Representations, 2018.
We build on auto-encoding sequential Monte Carlo (AESMC): a method for model
and proposal learning based on maximizing the lower bound to the log marginal
likelihood in a broad family of structured probabilistic models. Our approach
relies on the efficiency of sequential Monte Carlo (SMC) for performing inference
in structured probabilistic models and the flexibility of deep neural networks
to model complex conditional probability distributions. We develop additional
theoretical insights and experiment with a new training procedure which can
improve both model and proposal learning. We demonstrate that our approach
provides a fast, easy-to-implement and scalable means for simultaneous model
learning and proposal adaptation in deep generative models.
@inproceedings{le2018autoencoding,
title = {Auto-Encoding Sequential Monte Carlo},
author = {Le, Tuan Anh and Igl, Maximilian and Rainforth, Tom and Jin, Tom and Wood, Frank},
booktitle = {International Conference on Learning Representations},
year = {2018}
}
2017
T. Rainforth
,
Automating Inference, Learning, and Design using Probabilistic Programming, PhD thesis, University of Oxford, 2017.
Imagine a world where computational simulations can be inverted as easily as running them forwards, where data can be used to refine models automatically, and where the only expertise one needs to carry out powerful statistical analysis is a basic proficiency in scientific coding. Creating such a world is the ambitious long-term aim of probabilistic programming.
The bottleneck for improving the probabilistic models, or simulators, used throughout the quantitative sciences, is often not an ability to devise better models conceptually, but a lack of expertise, time, or resources to realize such innovations. Probabilistic programming systems (PPSs) help alleviate this bottleneck by providing an expressive and accessible modeling framework, then automating the required computation to draw inferences from the model, for example finding the model parameters likely to give rise to a certain output. By decoupling model specification and inference, PPSs streamline the process of developing and drawing inferences from new models, while opening up powerful statistical methods to non-experts. Many systems further provide the flexibility to write new and exciting models which would be hard, or even impossible, to convey using conventional statistical frameworks.
The central goal of this thesis is to improve and extend PPSs. In particular, we will make advancements to the underlying inference engines and increase the range of problems which can be tackled. For example, we will extend PPSs to a mixed inference-optimization framework, thereby providing automation of tasks such as model learning and engineering design. Meanwhile, we make inroads into constructing systems for automating adaptive sequential design problems, providing potential applications across the sciences. Furthermore, the contributions of the work reach far beyond probabilistic programming, as achieving our goal will require us to make advancements in a number of related fields such as particle Markov chain Monte Carlo methods, Bayesian optimization, and Monte Carlo fundamentals.
@phdthesis{rainforth2017thesis,
title = {{Automating Inference, Learning, and Design using
Probabilistic Programming}},
author = {Rainforth, Tom},
school = {University of Oxford},
year = {2017}
}
B. T. Vincent
,
T. Rainforth
,
The DARC Toolbox: automated, flexible, and efficient delayed and risky choice experiments using Bayesian adaptive design, 2017.
Delayed and risky choice (DARC) experiments are a cornerstone of research in
psychology, behavioural economics and neuroeconomics. By collecting an agent’s preferences
between pairs of prospects we can characterise their preferences, investigate
what affects them, and probe the underlying decision making mechanisms. We present
a state-of-the-art approach and software toolbox allowing such DARC experiments to
be run in a highly efficient way. Data collection is costly, so our toolbox automatically
and adaptively generates pairs of prospects in real time to maximise the information
gathered about the participant’s behaviours. We demonstrate that this leads to improvements
over alternative experimental paradigms. The key to realising our real-time
and automatic performance is a number of advances over current Bayesian adaptive
design methodology. In particular, we derive an improved estimator for discrete
output problems and design a novel algorithm for automating sequential adaptive design.
We provide a number of pre-prepared DARC tools for researchers to use, but a
key contribution is an adaptive experiment toolbox that can be extended to virtually
any 2-alternative-choice task. In particular, to carry out custom adaptive experiments
using our toolbox, the user need only encode their behavioural model and design space
– both the subsequent inference and sequential design optimisation are automated for
arbitrary models the user might write.
@article{vincent2017darc,
title = {The DARC Toolbox: automated, flexible, and efficient delayed and risky choice experiments using Bayesian adaptive design},
author = {Vincent, Benjamin T and Rainforth, Tom},
year = {2017},
publisher = {PsyArXiv}
}
B. Bloem-Reddy
,
E. Mathieu
,
A. Foster
,
T. Rainforth
,
H. Ge
,
M. Lomelí
,
Z. Ghahramani
,
Y. W. Teh
,
Sampling and inference for discrete random probability measures in probabilistic programs, NIPS Workshop on Advances in Approximate Bayesian Inference, 2017.
We consider the problem of sampling a sequence from a discrete random probability measure (RPM) with countable support, under (probabilistic) constraints of finite memory and computation. A canonical example is sampling from the Dirichlet Process, which can be accomplished using its stick-breaking representation and lazy initialization of its atoms. We show that efficient lazy initialization is possible if and only if a size-biased representation of the discrete RPM is used. For models constructed from such discrete RPMs, we consider the implications for generic particle-based inference methods in probabilistic programming systems. To demonstrate, we implement SMC for Normalized Inverse Gaussian Process mixture models in Turing.
@article{bloemreddy2017rpm,
title = {Sampling and inference for discrete random probability measures in probabilistic programs},
author = {Bloem-Reddy, Benjamin and Mathieu, Emile and Foster, Adam and Rainforth, Tom and Ge, Hong and Lomelí, María and Ghahramani, Zoubin and Teh, Yee Whye},
journal = {NIPS Workshop on Advances in Approximate Bayesian Inference},
year = {2017}
}
2016
T. Rainforth
,
T. A. Le
,
J.-W. van de Meent
,
M. A. Osborne
,
F. Wood
,
Bayesian Optimization for Probabilistic Programs, in Advances in Neural Information Processing Systems, 2016, 280–288.
We present the first general purpose framework for marginal maximum a
posteriori estimation of probabilistic program variables. By using a
series of code transformations, the evidence of any probabilistic
program, and therefore of any graphical model, can be optimized with
respect to an arbitrary subset of its sampled variables. To carry out
this optimization, we develop the first Bayesian optimization package
to directly exploit the source code of its target, leading to innovations
in problem-independent hyperpriors, unbounded optimization, and implicit
constraint satisfaction; delivering significant performance improvements
over prominent existing packages. We present applications of our method
to a number of tasks including engineering design and parameter optimization.
@inproceedings{rainforth2016BOPP,
title = {Bayesian {O}ptimization for {P}robabilistic {P}rograms},
author = {Rainforth, Tom and Le, Tuan Anh and van de Meent, Jan-Willem and Osborne, Michael A and Wood, Frank},
booktitle = {Advances in Neural Information Processing Systems},
pages = {280--288},
year = {2016}
}
T. Rainforth
,
R. Cornish
,
H. Yang
,
F. Wood
,
On the Pitfalls of Nested Monte Carlo, NIPS Workshop on Advances in Approximate Bayesian Inference, 2016.
There is an increasing interest in estimating expectations outside of the
classical inference framework, such as for models expressed as probabilistic
programs. Many of these contexts call for some form of nested inference to
be applied. In this paper, we analyse the behaviour of nested Monte Carlo
(NMC) schemes, for which classical convergence proofs are insufficient.
We give conditions under which NMC will converge, establish a rate of
convergence, and provide empirical data that suggests that this rate is
observable in practice. Finally, we prove that general-purpose nested
inference schemes are inherently biased. Our results serve to warn of
the dangers associated with naïve composition of inference and models.
@article{rainforth2016nestedMC,
title = {On the {P}itfalls of {N}ested {M}onte {C}arlo},
author = {Rainforth, Tom and Cornish, Robert and Yang, Hongseok and Wood, Frank},
year = {2016},
journal = {NIPS Workshop on Advances in Approximate Bayesian Inference}
}
D. Janz
,
B. Paige
,
T. Rainforth
,
J.-W. van de Meent
,
F. Wood
,
Probabilistic Structure Discovery in Time Series Data, NIPS Workshop on Artificial Intelligence for Data Science, 2016.
Existing methods for structure discovery in time
series data construct interpretable, compositional kernels for
Gaussian process regression models. While the learned Gaussian
process model provides posterior mean and variance estimates,
typically the structure is learned via a greedy optimization procedure.
This restricts the space of possible solutions and leads to
over-confident uncertainty estimates. We introduce a fully Bayesian
approach, inferring a full posterior over structures, which more
reliably captures the uncertainty of the model.
@article{janz2016probstruct,
title = {Probabilistic {S}tructure {D}iscovery in {T}ime {S}eries {D}ata},
author = {Janz, David and Paige, Brooks and Rainforth, Tom and van de Meent, Jan-Willem and Wood, Frank},
year = {2016},
journal = {NIPS Workshop on Artificial Intelligence for Data Science}
}
T. Rainforth
,
C. A. Naesseth
,
F. Lindsten
,
B. Paige
,
J.-W. van de Meent
,
A. Doucet
,
F. Wood
,
Interacting Particle Markov Chain Monte Carlo, in Proceedings of the 33rd International Conference on Machine Learning, 2016, vol. 48.
We introduce interacting particle Markov chain
Monte Carlo (iPMCMC), a PMCMC method
based on an interacting pool of standard and conditional
sequential Monte Carlo samplers. Like
related methods, iPMCMC is a Markov chain
Monte Carlo sampler on an extended space. We
present empirical results that show significant improvements
in mixing rates relative to both noninteracting
PMCMC samplers, and a single PMCMC
sampler with an equivalent memory and
computational budget. An additional advantage
of the iPMCMC method is that it is suitable for
distributed and multi-core architectures.
@inproceedings{rainforth2016ipmcmc,
title = {Interacting Particle {M}arkov Chain {M}onte {C}arlo},
author = {Rainforth, Tom and Naesseth, Christian A and Lindsten, Fredrik and Paige, Brooks and van de Meent, Jan-Willem and Doucet, Arnaud and Wood, Frank},
booktitle = {Proceedings of the 33rd International Conference on Machine Learning},
series = {JMLR: W\&CP},
volume = {48},
year = {2016}
}
2015
T. Rainforth
,
F. Wood
,
Canonical Correlation Forests, arXiv preprint arXiv:1507.05444, 2015.
We introduce canonical correlation forests (CCFs), a new decision tree ensemble method for classification and regression. Individual canonical correlation trees are binary decision trees with hyperplane splits based on local canonical correlation coefficients calculated during training. Unlike axis-aligned alternatives, the decision surfaces of CCFs are not restricted to the coordinate system of the input features and therefore more naturally represent data with correlated inputs. CCFs naturally accommodate multiple outputs, provide a similar computational complexity to random forests, and inherit their impressive robustness to the choice of input parameters. As part of the CCF training algorithm, we also introduce projection bootstrapping, a novel alternative to bagging for oblique decision tree ensembles which maintains use of the full dataset in selecting split points, often leading to improvements in predictive accuracy. Our experiments show that, even without parameter tuning, CCFs outperform axis-aligned random forests and other state-of-the-art tree ensemble methods on both classification and regression problems, delivering both improved predictive accuracy and faster training times. We further show that they outperform all of the 179 classifiers considered in a recent extensive survey.
@article{rainforth2015canonical,
title = {Canonical Correlation Forests},
author = {Rainforth, Tom and Wood, Frank},
journal = {arXiv preprint arXiv:1507.05444},
year = {2015},
arxiv = {https://arxiv.org/abs/1507.05444}
}
Software
2019
Y. Zhou
,
B. Gram-Hansen
,
T. Kohn
,
T. Rainforth
,
H. Yang
,
F. Wood
,
A Low-Level Probabilistic Programming Language for Non-Differentiable Models, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish the discontinuous and continuous parameters in the density function, while further providing runtime checks of when discontinuity boundaries have been crossed. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by introducing a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with a sufficiently low measure of discontinuities to maintain the validity of the inference engine.
@software{zhou2018lfpplb,
title = {{A Low-Level Probabilistic Programming Language for Non-Differentiable Models}},
author = {Zhou, Yuan and Gram-Hansen, Bradley and Kohn, Tobias and Rainforth, Tom and Yang, Hongseok and Wood, Frank},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
year = {2019},
bdsk-url-1 = {https://github.com/bradleygramhansen/PyLFPPL}
}