I am a second-year DPhil student in the OxWaSP program, supervised by Professor Dino Sejdinovic and Professor Yee Whye Teh. My research interests lie in the use of kernel methods in the meta-learning setting. I am also interested in Gaussian processes and unsupervised/self-supervised learning.

Publications

2019

J. Ton, L. Chan, Y. W. Teh, D. Sejdinovic, Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings, arXiv e-prints:1906.02236, 2019.

Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. on estimating conditional expectations in regression. In many applications, however, we are faced with conditional distributions which cannot be meaningfully summarized using expectation only (due to e.g. multimodality). Hence, we consider the problem of conditional density estimation in the meta-learning setting. We introduce a novel technique for meta-learning which combines neural representation and noise-contrastive estimation with the established literature of conditional mean embeddings into reproducing kernel Hilbert spaces. The method is validated on synthetic and real-world problems, demonstrating the utility of sharing learned representations across multiple conditional density estimation tasks.

@unpublished{TonChaTehSej2019,
author = {Ton, Jean-Francois and Chan, Lucian and Teh, Yee Whye and Sejdinovic, Dino},
title = {{Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings}},
note = {arXiv e-prints:1906.02236},
year = {2019}
}
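The empirical conditional mean embedding at the core of this paper can be sketched in a few lines of NumPy. This is a minimal illustration only: the RBF kernels, bandwidth, regularisation value, and toy data below are illustrative assumptions, not the paper's actual configuration (which additionally learns neural representations via noise-contrastive estimation).

```python
import numpy as np

def rbf(A, B, lengthscale=0.5):
    """RBF kernel matrix between the 1-D sample vectors A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2 * lengthscale ** 2))

def cme_weights(X, x_query, lam=1e-3):
    """Weights beta(x) = (K_X + n*lam*I)^{-1} k_X(x) of the empirical
    conditional mean embedding mu_{Y|X=x} = sum_i beta_i(x) k_Y(y_i, .)."""
    n = len(X)
    K = rbf(X, X)
    return np.linalg.solve(K + n * lam * np.eye(n), rbf(X, np.atleast_1d(x_query)))

# Toy data: Y = 2X + small noise, so E[Y | X = x] is roughly 2x.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, 200)
Y = 2 * X + 0.1 * rng.standard_normal(200)

beta = cme_weights(X, 1.0)          # embedding weights for the query x = 1
cond_mean = float(beta[:, 0] @ Y)   # E[Y | X = 1], read off via f(y) = y
```

Once the weights are in hand, conditional expectations of any function f follow as `beta @ f(Y)`; recovering a full conditional *density*, as in the paper, requires the additional noise-contrastive machinery.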

Z. Li, J. Ton, D. Oglic, D. Sejdinovic, Towards A Unified Analysis of Random Fourier Features, in International Conference on Machine Learning (ICML), 2019, PMLR 97:3905–3914.

Random Fourier features is a widely used, simple, and effective technique for scaling up kernel methods. The existing theoretical analysis of the approach, however, remains focused on specific learning tasks and typically gives pessimistic bounds which are at odds with the empirical results. We tackle these problems and provide the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions. In our bounds, the trade-off between the computational cost and the expected risk convergence rate is problem specific and expressed in terms of the regularization parameter and the number of effective degrees of freedom. We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to ridge leverage scores and further reduces the required number of features. As ridge leverage scores are expensive to compute, we devise a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.

@inproceedings{LiTonOglSej2019,
author = {Li, Z. and Ton, J.-F. and Oglic, D. and Sejdinovic, D.},
title = {{Towards A Unified Analysis of Random Fourier Features}},
booktitle = {International Conference on Machine Learning (ICML)},
series = {Proceedings of Machine Learning Research},
volume = {97},
pages = {3905--3914},
year = {2019}
}
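The standard random Fourier feature construction analysed in this paper can be sketched as follows. The Gaussian (RBF) kernel, dimensions, and number of features below are illustrative choices; the point is that inner products of the random features approximate the exact kernel, with error shrinking as O(1/sqrt(D)).

```python
import numpy as np

def rff_features(X, D, lengthscale=1.0, rng=None):
    """Map rows of X (n, d) to D random Fourier features whose inner
    products approximate the RBF kernel exp(-||x - y||^2 / (2 l^2))."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.standard_normal((d, D)) / lengthscale  # spectral samples ~ N(0, I / l^2)
    b = rng.uniform(0, 2 * np.pi, D)               # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))

Phi = rff_features(X, D=5000, rng=1)
K_approx = Phi @ Phi.T

# Exact RBF Gram matrix for comparison.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-d2 / 2.0)

max_err = np.abs(K_approx - K_exact).max()
```

The leverage-score sampling studied in the paper replaces the plain spectral draws of `W` with a data-dependent importance distribution; the feature map itself keeps this same shape.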

H. Chai, J. Ton, M. Osborne, R. Garnett, Automated Model Selection with Bayesian Quadrature, in International Conference on Machine Learning (ICML), 2019, PMLR 97:931–940.

We present a novel technique for tailoring Bayesian quadrature (BQ) to model selection. The state-of-the-art for comparing the evidence of multiple models relies on Monte Carlo methods, which converge slowly and are unreliable for computationally expensive models. Previous research has shown that BQ offers sample efficiency superior to Monte Carlo in computing the evidence of an individual model. However, applying BQ directly to model comparison may waste computation producing an overly-accurate estimate for the evidence of a clearly poor model. We propose an automated and efficient algorithm for computing the most-relevant quantity for model selection: the posterior probability of a model. Our technique maximizes the mutual information between this quantity and observations of the models’ likelihoods, yielding efficient acquisition of samples across disparate model spaces when likelihood observations are limited. Our method produces more-accurate model posterior estimates using fewer model likelihood evaluations than standard Bayesian quadrature and Monte Carlo estimators, as we demonstrate on synthetic and real-world examples.

@inproceedings{chai2019automated,
author = {Chai, H. and Ton, J.-F. and Osborne, M. and Garnett, R.},
title = {{Automated Model Selection with Bayesian Quadrature}},
booktitle = {International Conference on Machine Learning (ICML)},
series = {Proceedings of Machine Learning Research},
volume = {97},
pages = {931--940},
year = {2019}
}
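A single Bayesian quadrature step, the building block this paper extends to model selection, has a closed form when the GP prior uses an RBF kernel and the base measure is a standard normal. The sketch below is a minimal illustration under those assumptions; the lengthscale, node placement, and jitter are arbitrary choices, not the paper's acquisition-driven design.

```python
import numpy as np

def bq_estimate(f, nodes, lengthscale=1.0, jitter=1e-6):
    """GP-based estimate of int f(x) N(x; 0, 1) dx.

    Places an RBF-kernel GP prior on f; the posterior mean of the integral
    is z^T K^{-1} f(nodes), where z_i = int k(x_i, x) N(x; 0, 1) dx is the
    kernel mean, available in closed form for the RBF kernel.
    """
    l2 = lengthscale ** 2
    K = np.exp(-(nodes[:, None] - nodes[None, :]) ** 2 / (2 * l2))
    z = lengthscale / np.sqrt(l2 + 1) * np.exp(-nodes ** 2 / (2 * (l2 + 1)))
    return z @ np.linalg.solve(K + jitter * np.eye(len(nodes)), f(nodes))

nodes = np.linspace(-4, 4, 20)
estimate = bq_estimate(lambda x: x ** 2, nodes)  # true value: E[X^2] = 1
```

The paper's contribution sits on top of this primitive: rather than spending such evaluations evenly per model, it chooses where to evaluate each model's likelihood by maximising mutual information with the model *posterior*.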

2018

J. Ton, S. Flaxman, D. Sejdinovic, S. Bhatt, Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features, Spatial Statistics, vol. 28, 59–78, 2018.

The use of covariance kernels is ubiquitous in the field of spatial statistics. Kernels allow data to be mapped into high-dimensional feature spaces and can thus extend simple linear additive methods to nonlinear methods with higher order interactions. However, until recently, there has been a strong reliance on a limited class of stationary kernels such as the Matérn or squared exponential, limiting the expressiveness of these modelling approaches. Recent machine learning research has focused on spectral representations to model arbitrary stationary kernels and introduced more general representations that include classes of nonstationary kernels. In this paper, we exploit the connections between Fourier feature representations, Gaussian processes and neural networks to generalise previous approaches and develop a simple and efficient framework to learn arbitrarily complex nonstationary kernel functions directly from the data, while taking care to avoid overfitting using state-of-the-art methods from deep learning. We highlight the very broad array of kernel classes that could be created within this framework. We apply this to a time series dataset and a remote sensing problem involving land surface temperature in Eastern Africa. We show that without increasing the computational or storage complexity, nonstationary kernels can be used to improve generalisation performance and provide more interpretable results.

@article{TonFlaSejBha2018,
author = {Ton, J.-F. and Flaxman, S. and Sejdinovic, D. and Bhatt, S.},
title = {{Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features}},
journal = {Spatial Statistics},
year = {2018},
volume = {28},
pages = {59--78}
}
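Where Bochner's theorem licenses the standard (stationary) random Fourier feature map, the nonstationary generalisation rests on Yaglom's theorem, which samples *pairs* of frequencies. A minimal sketch of such a feature map is below; the frequency pairs are drawn at random from two Gaussians of different scales purely for illustration, whereas the paper learns them from data with neural-network machinery.

```python
import numpy as np

def nonstationary_features(X, D, scale1=1.0, scale2=3.0, rng=None):
    """Nonstationary random feature map based on Yaglom's spectral theorem.

    Each feature uses a pair of frequencies (w1, w2); the induced kernel
    k(x, y) = phi(x)^T phi(y) is positive semi-definite but depends on x
    and y separately, not only on the difference x - y. The two sampling
    scales here are arbitrary stand-ins for a learned spectral measure.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W1 = rng.standard_normal((d, D)) * scale1
    W2 = rng.standard_normal((d, D)) * scale2
    c = np.cos(X @ W1) + np.cos(X @ W2)
    s = np.sin(X @ W1) + np.sin(X @ W2)
    return np.hstack([c, s]) / np.sqrt(2 * D)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))
Phi = nonstationary_features(X, D=500, rng=1)
K = Phi @ Phi.T  # a valid (PSD) nonstationary Gram matrix
```

Because the kernel is an explicit inner product of finite feature vectors, positive semi-definiteness holds by construction, and downstream GP or ridge regression can work in the feature space at linear cost in the number of data points.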