Gaussian Processes, probabilistic inference, deep generative models

I am a third-year DPhil student supervised by Prof. Yee Whye Teh. My research lies broadly in scalable probabilistic inference and interpretable machine learning. Currently I focus on deep generative models and representation learning, especially on using deep generative models to learn disentangled factors of variation in data. I am also interested in gradient-based inference for generative models with discrete units, which ties in closely with interpretability. Previously, I worked on scaling up inference for Gaussian processes (GPs). This included regression models for collaborative filtering motivated by a scalable approximation to a GP. It also included a method that uses variational sparse GP methods to scale up the compositional kernel search of the Automatic Statistician.

Publications

2018

H. Kim, A. Mnih, Disentangling by Factorising, 2018.

We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon beta-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.

@unpublished{KimMnih18,
author = {Kim, H. and Mnih, A.},
note = {ArXiv e-prints: arXiv:1802.05983},
title = {Disentangling by Factorising},
year = {2018}
}
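
The abstract describes encouraging the distribution of representations to be factorial. FactorVAE does this by penalising the total correlation, i.e. the KL divergence between the aggregate distribution of codes q(z) and the product of its marginals. Estimating that penalty needs samples from the product of marginals. The NumPy sketch below illustrates the dimension-wise shuffling trick that produces such samples from a batch of codes. The function name `permute_dims` and the toy data are illustrative assumptions, and the discriminator that estimates the density ratio is omitted.

```python
import numpy as np


def permute_dims(z: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Shuffle each latent dimension independently across the batch.

    z has shape (batch_size, latent_dim). Permuting every column with its
    own random permutation keeps each marginal q(z_j) intact but breaks the
    dependence between dimensions, so the rows behave like (approximate)
    samples from the product of marginals prod_j q(z_j).
    """
    batch_size, latent_dim = z.shape
    permuted = np.empty_like(z)
    for j in range(latent_dim):
        permuted[:, j] = z[rng.permutation(batch_size), j]
    return permuted


rng = np.random.default_rng(0)
# Hypothetical correlated 2-D codes: the second dimension is a noisy copy of the first.
z1 = rng.normal(size=1000)
z = np.stack([z1, z1 + 0.1 * rng.normal(size=1000)], axis=1)
z_perm = permute_dims(z, rng)

print(np.corrcoef(z.T)[0, 1])       # close to 1: dimensions are dependent
print(np.corrcoef(z_perm.T)[0, 1])  # close to 0: dependence removed
```

Each column of `z_perm` contains exactly the same values as the corresponding column of `z`, so only the cross-dimension dependence changes.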

H. Kim, Y. W. Teh, Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes, in Artificial Intelligence and Statistics (AISTATS), 2018.

Automating statistical modelling is a challenging problem in artificial intelligence. The Automatic Statistician takes a first step in this direction by employing a kernel search algorithm with Gaussian processes (GPs) to provide interpretable statistical models for regression problems. However, this does not scale, because model selection has O(N^3) running time in the number of data points N. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets. In doing so, we derive a cheap upper bound on the GP marginal likelihood that, together with the variational lower bound, sandwiches the marginal likelihood. We show that the upper bound is significantly tighter than the lower bound and thus useful for model selection.

@inproceedings{KimTeh18,
author = {Kim, H. and Teh, Y. W.},
booktitle = {Artificial Intelligence and Statistics (AISTATS)},
title = {Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes},
year = {2018}
}
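
The abstract says an upper bound and the variational lower bound together sandwich the GP marginal likelihood. The NumPy sketch below illustrates that idea on toy data. It uses a squared-exponential kernel and a Nyström approximation Q_nn = K_nm K_mm^{-1} K_mn. It compares the exact log marginal likelihood with the standard collapsed variational lower bound used by sparse GPs.

The upper bound here is an assumption: a simple bound that follows from two facts. First, K_nn - Q_nn is positive semidefinite. Second, K_nn <= Q_nn + t I in the positive-semidefinite order, where t = tr(K_nn - Q_nn). It is not necessarily the bound derived in the paper, and it need not be tighter than the lower bound. The helper names, inducing-point locations and data are hypothetical. Dense linear algebra is used for clarity, so this sketch itself still costs O(N^3).

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gauss_logpdf(y, cov):
    """log N(y | 0, cov) via a dense Cholesky factorisation (the O(N^3) step)."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))


def sandwich(x, y, z, noise_var):
    """Return (lower, exact, upper) for the GP log marginal likelihood log p(y)."""
    n = len(x)
    eye = np.eye(n)
    k_nn = se_kernel(x, x)
    k_nm = se_kernel(x, z)
    k_mm = se_kernel(z, z) + 1e-8 * np.eye(len(z))  # jitter for numerical stability
    q_nn = k_nm @ np.linalg.solve(k_mm, k_nm.T)     # Nystrom approximation of k_nn
    t = np.trace(k_nn - q_nn)                       # >= largest eigenvalue of k_nn - q_nn

    exact = gauss_logpdf(y, k_nn + noise_var * eye)
    # Collapsed variational lower bound for sparse GPs.
    lower = gauss_logpdf(y, q_nn + noise_var * eye) - t / (2 * noise_var)
    # Simple upper bound: k_nn >= q_nn bounds the log-determinant term,
    # and k_nn <= q_nn + t * I bounds the quadratic term.
    logdet_q = np.linalg.slogdet(q_nn + noise_var * eye)[1]
    quad = y @ np.linalg.solve(q_nn + (noise_var + t) * eye, y)
    upper = -0.5 * logdet_q - 0.5 * quad - 0.5 * n * np.log(2 * np.pi)
    return lower, exact, upper


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3.0, 3.0, size=200))
y = np.sin(2.0 * x) + 0.3 * rng.normal(size=200)
z = np.linspace(-3.0, 3.0, 6)  # hypothetical inducing-point locations
lower, exact, upper = sandwich(x, y, z, noise_var=0.09)
print(f"{lower:.2f} <= {exact:.2f} <= {upper:.2f}")
```

A scalable implementation would exploit the low-rank structure of Q_nn (for example via the matrix inversion and determinant lemmas) instead of forming N x N matrices.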

2016

H. Kim, X. Lu, S. Flaxman, Y. W. Teh, Collaborative Filtering with Side Information: a Gaussian Process Perspective, 2016.

We tackle the problem of collaborative filtering (CF) with side information, through the lens of Gaussian Process (GP) regression. Driven by the idea of using the kernel to explicitly model user-item similarities, we formulate the GP in a way that allows the incorporation of low-rank matrix factorisation, arriving at our model, the Tucker Gaussian Process (TGP). Consequently, TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information, giving enhanced predictive performance for CF problems. Moreover, we show that it is a novel model for regression, especially well-suited to grid-structured data and problems where the dependence on covariates is close to being separable.

@unpublished{kimluflateh16,
title = {Collaborative Filtering with Side Information: a Gaussian Process Perspective},
author = {Kim, H. and Lu, X. and Flaxman, S. and Teh, Y. W.},
note = {ArXiv e-prints: arXiv:1605.07025},
year = {2016}
}
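
The abstract notes that TGP is well suited to grid-structured data and to dependence on covariates that is close to separable. The NumPy sketch below shows the general reason separable kernels help on grids. A product kernel over (user, item) pairs yields a Kronecker-structured Gram matrix, so quantities such as the log-determinant can be computed from the small per-factor matrices. This illustrates a generic property of product kernels, not the TGP model itself. The 1-D covariates, lengthscales and noise variance are hypothetical.

```python
import numpy as np


def se_kernel(a, b, lengthscale):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


# Hypothetical 1-D side information for 4 users and 5 items.
users = np.linspace(0.0, 1.0, 4)
items = np.linspace(0.0, 2.0, 5)
k_users = se_kernel(users, users, lengthscale=0.5)
k_items = se_kernel(items, items, lengthscale=1.0)

# Enumerate the full user-by-item grid (user-major order) and build the Gram
# matrix of the separable kernel k((u, v), (u', v')) = k_u(u, u') * k_v(v, v').
grid_u = np.repeat(users, len(items))
grid_v = np.tile(items, len(users))
full = se_kernel(grid_u, grid_u, 0.5) * se_kernel(grid_v, grid_v, 1.0)

# On a grid, that Gram matrix is exactly a Kronecker product.
print(np.allclose(full, np.kron(k_users, k_items)))  # True

# The eigenvalues of kron(A, B) + s*I are lam_i * mu_j + s, so the
# log-determinant only needs eigenvalues of the small factor matrices.
noise_var = 0.1
lam_u = np.linalg.eigvalsh(k_users)
lam_v = np.linalg.eigvalsh(k_items)
logdet_fast = np.log(np.outer(lam_u, lam_v) + noise_var).sum()
logdet_dense = np.linalg.slogdet(full + noise_var * np.eye(len(full)))[1]
print(np.isclose(logdet_fast, logdet_dense))  # True
```

With n_u users and n_v items, the two eigendecompositions cost O(n_u^3 + n_v^3) rather than O((n_u n_v)^3) for the dense matrix.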

H. Kim, Y. W. Teh, Scalable Structure Discovery in Regression using Gaussian Processes, in Proceedings of the 2016 Workshop on Automatic Machine Learning, 2016.

Automatic Bayesian Covariance Discovery (ABCD) of Lloyd et al. (2014) provides a framework for automating statistical modelling as well as exploratory data analysis for regression problems. However, ABCD does not scale, due to its O(N^3) running time. This is undesirable not only because the average size of data sets is growing fast, but also because bigger data potentially carries more information. That implies a greater need for expressive models that can discover sophisticated structure. We propose a scalable version of ABCD that brings big data within the scope of automated statistical modelling.

@inproceedings{KimTeh2016a,
author = {Kim, H. and Teh, Y. W.},
booktitle = {Proceedings of the 2016 Workshop on Automatic Machine Learning},
title = {Scalable Structure Discovery in Regression using Gaussian Processes},
year = {2016},
bdsk-url-1 = {http://www.jmlr.org/proceedings/papers/v64/kim_scalable_2016.html},
bdsk-url-2 = {http://www.jmlr.org/proceedings/papers/v64/kim_scalable_2016.pdf}
}
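
ABCD searches over kernels composed from base kernels with sums and products. It fits a GP for every candidate, which is where the O(N^3) cost per model comes from. The plain-Python sketch below shows the shape of a greedy compositional search of this kind. It is a simplified assumption throughout. The base-kernel set, the string representation and the toy `score` function are hypothetical stand-ins. A real implementation would score each candidate by fitting a GP and computing a marginal-likelihood-based criterion. ABCD's actual grammar also has further base kernels and operators (e.g. changepoints).

```python
# Hypothetical, simplified base-kernel set.
BASE_KERNELS = ("SE", "LIN", "PER")


def expand(expr: str) -> list[str]:
    """One search step: apply S -> S + B and S -> S * B for every base kernel B."""
    candidates = []
    for base in BASE_KERNELS:
        candidates.append(f"({expr} + {base})")
        candidates.append(f"({expr} * {base})")
    return candidates


def greedy_search(score, depth: int) -> str:
    """Greedily grow a kernel expression and return the best one seen.

    `score` maps an expression to a model-selection criterion (higher is
    better). In ABCD this step would fit a GP to each candidate, costing
    O(N^3) for N data points.
    """
    current = max(BASE_KERNELS, key=score)
    best = current
    for _ in range(depth):
        current = max(expand(current), key=score)
        if score(current) > score(best):
            best = current
    return best


def toy_score(expr: str) -> float:
    """Hypothetical score: rewards a linear trend plus periodicity, penalises length."""
    return ("LIN" in expr) + ("PER" in expr) - 0.01 * len(expr)


print(greedy_search(toy_score, depth=2))  # (LIN + PER)
```

Keeping the best expression seen so far, rather than the last one expanded, stops the search from returning an over-complicated kernel when deeper expansions score worse.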