I am a final-year PhD student in the OxWaSp program, supervised by Professor Yee Whye Teh. My research interests lie in machine learning. In particular, I have completed projects on Tucker Gaussian Processes, Relativistic Monte Carlo for Bayesian inference, and Bayesian optimization on combinatorial spaces. I am currently working on the exploration-exploitation trade-off in adaptive importance sampling, viewed as a multi-armed bandit problem.
Publications
2018
T. Rainforth, Y. Zhou, X. Lu, Y. W. Teh, F. Wood, H. Yang, J.-W. van de Meent, Inference Trees: Adaptive Inference with Exploration, arXiv preprint arXiv:1806.09550, 2018.
We introduce inference trees (ITs), a new class of inference methods that build
on ideas from Monte Carlo tree search to perform adaptive sampling in a manner
that balances exploration with exploitation, ensures consistency, and alleviates
pathologies in existing adaptive methods. ITs adaptively sample from hierarchical
partitions of the parameter space, while simultaneously learning these partitions
in an online manner. This enables ITs to not only identify regions of high posterior
mass, but also maintain uncertainty estimates to track regions where significant
posterior mass may have been missed. ITs can be based on any inference method
that provides a consistent estimate of the marginal likelihood. They are particularly
effective when combined with sequential Monte Carlo, where they capture long-range
dependencies and yield improvements beyond proposal adaptation alone.
@article{rainforth2018it,
title = {Inference Trees: Adaptive Inference with Exploration},
author = {Rainforth, Tom and Zhou, Yuan and Lu, Xiaoyu and Teh, Yee Whye and Wood, Frank and Yang, Hongseok and van de Meent, Jan-Willem},
journal = {arXiv preprint arXiv:1806.09550},
year = {2018}
}
X. Lu, T. Rainforth, Y. Zhou, J.-W. van de Meent, Y. W. Teh, On Exploration, Exploitation and Learning in Adaptive Importance Sampling, arXiv preprint arXiv:1810.13296, 2018.
We study adaptive importance sampling (AIS) as an online learning problem and argue for the importance of the trade-off between exploration and exploitation in this adaptation. Borrowing ideas from the bandits literature, we propose Daisee, a partition-based AIS algorithm. We further introduce a notion of regret for AIS and show that Daisee has O((log T)^(3/4) √T) cumulative pseudo-regret, where T is the number of iterations. We then extend Daisee to adaptively learn a hierarchical partitioning of the sample space for more efficient sampling and confirm the performance of both algorithms empirically.
@article{lu2018exploration,
title = {{On Exploration, Exploitation and Learning in Adaptive Importance Sampling}},
author = {Lu, Xiaoyu and Rainforth, Tom and Zhou, Yuan and van de Meent, Jan-Willem and Teh, Yee Whye},
journal = {arXiv preprint arXiv:1810.13296},
year = {2018}
}
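To make the idea of partition-based adaptive importance sampling with exploration concrete, here is a minimal sketch. It is not the Daisee algorithm from the paper above: the fixed equal-width partition of [0, 1], the UCB-style bonus, the function name `partition_ais`, and the Gaussian-bump `target` are all assumptions chosen for illustration. Each iteration picks a region with probability proportional to its estimated mass plus an exploration bonus, samples uniformly inside it, and accumulates an importance weight.

```python
import numpy as np

def partition_ais(f, n_regions=10, n_iters=20000, seed=0):
    """Estimate Z = integral of f over [0, 1] with an adaptive region proposal.

    Illustrative only: the partition is fixed and the bonus is a generic
    UCB-style term, not the rule analysed in the paper.
    """
    rng = np.random.default_rng(seed)
    width = 1.0 / n_regions
    counts = np.zeros(n_regions)    # number of samples drawn in each region
    mass_sum = np.zeros(n_regions)  # running sum of f(x) * width per region
    weight_sum = 0.0
    q = np.full(n_regions, width)
    for t in range(1, n_iters + 1):
        # Exploitation: current estimate of each region's mass.
        mass_hat = mass_sum / np.maximum(counts, 1)
        # Exploration: bonus that shrinks as a region is visited more often.
        bonus = width * np.sqrt(np.log(t + 1) / (counts + 1))
        score = mass_hat + bonus
        q = score / score.sum()
        k = rng.choice(n_regions, p=q)
        x = (k + rng.random()) * width  # uniform draw inside region k
        fx = f(x)
        # Proposal density at x is q[k] / width, so the weight is f(x) * width / q[k].
        weight_sum += fx * width / q[k]
        counts[k] += 1
        mass_sum[k] += fx * width
    return weight_sum / n_iters, q

def target(x):
    # Hypothetical unnormalised target: a narrow Gaussian bump centred at 0.3.
    return np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)

z_hat, q_final = partition_ais(target)
```

Because the bonus keeps every region's probability strictly positive, the weights stay bounded and the estimate remains unbiased at every iteration, while most samples still concentrate where the estimated mass is largest.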
Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune well. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduce a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.
@inproceedings{LuPerHas2016a,
author = {Lu, X. and Perrone, V. and Hasenclever, L. and Teh, Y. W. and Vollmer, S. J.},
booktitle = {Artificial Intelligence and Statistics (AISTATS)},
title = {Relativistic {M}onte {C}arlo},
month = apr,
year = {2017},
bdsk-url-1 = {https://arxiv.org/pdf/1609.04388v1.pdf}
}
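The maximum-velocity property described in the relativistic Monte Carlo abstract above can be illustrated with a short sketch. The abstract does not give formulas, so the kinetic energy K(p) = m c^2 sqrt(|p|^2 / (m^2 c^2) + 1) used here is an assumption (the standard relativistic form), and the function names, the leapfrog scheme, and the parameters `m` and `c` are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def relativistic_velocity(p, m=1.0, c=1.0):
    """Velocity dK/dp for K(p) = m c^2 sqrt(|p|^2 / (m^2 c^2) + 1).

    Its norm is always below c, however large the momentum p is.
    """
    p = np.asarray(p, dtype=float)
    return p / (m * np.sqrt(p @ p / (m ** 2 * c ** 2) + 1.0))

def leapfrog(theta, p, grad_U, eps, n_steps, m=1.0, c=1.0):
    """Leapfrog integration where positions move with the relativistic velocity."""
    theta = np.array(theta, dtype=float)  # copies, so the inputs are not modified
    p = np.array(p, dtype=float)
    p -= 0.5 * eps * grad_U(theta)        # initial half step for momentum
    for i in range(n_steps):
        theta += eps * relativistic_velocity(p, m, c)  # moves at most eps * c
        if i < n_steps - 1:
            p -= eps * grad_U(theta)      # full momentum step
    p -= 0.5 * eps * grad_U(theta)        # final half step for momentum
    return theta, p

# Usage on a standard Gaussian potential U(theta) = |theta|^2 / 2 with a huge momentum.
theta_new, p_new = leapfrog(np.zeros(2), np.array([1e6, 0.0]),
                            grad_U=lambda th: th, eps=0.1, n_steps=10)
```

With Newtonian dynamics the same momentum would move the position by roughly eps * |p| / m per step. Here each step moves it by less than eps * c, which is the sense in which the update resembles gradient clipping.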
We tackle the problem of collaborative filtering (CF) with side information, through the lens of Gaussian Process (GP) regression. Driven by the idea of using the kernel to explicitly model user-item similarities, we formulate the GP in a way that allows the incorporation of low-rank matrix factorisation, arriving at our model, the Tucker Gaussian Process (TGP). Consequently, TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information, giving enhanced predictive performance for CF problems. Moreover we show that it is a novel model for regression, especially well-suited to grid-structured data and problems where the dependence on covariates is close to being separable.
@unpublished{kimluflateh16,
title = {Collaborative Filtering with Side Information: a Gaussian Process Perspective},
author = {Kim, H. and Lu, X. and Flaxman, S. and Teh, Y. W.},
note = {ArXiv e-prints: 1605.07025},
year = {2016}
}