I am a second-year DPhil student in the OxWaSP programme, supervised by Professor Dino Sejdinovic and Dr Christopher Yau. My research interests lie in kernel methods for machine learning, as well as their potential applications in deep learning. I am currently interested in using kernel methods to learn on distributions with symmetric noise invariances.

Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We construct a Bayesian distribution regression formalism that accounts for this uncertainty, improving the robustness and performance of the model when group sizes vary. We frame the model in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on an astrostatistics problem in which velocity distributions are used to predict galaxy cluster masses, quantifying the distribution of dark matter in the universe.
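As a rough illustration of the distribution regression setting (a minimal sketch, not the Bayesian model from the paper): each group of observations is summarised by its empirical kernel mean embedding, evaluated at a set of landmark points, and a regressor is then fit on these embeddings. The RBF kernel, the landmark grid, and the ridge/MAP linear regression below are all simplifying assumptions for illustration.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    # RBF kernel matrix between the rows of X and the rows of Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mean_embeddings(groups, landmarks, gamma=1.0):
    # Empirical kernel mean embedding of each group, evaluated
    # at a fixed set of landmark points (one feature per landmark).
    return np.stack([rbf(g, landmarks, gamma).mean(axis=0) for g in groups])

# Toy data: groups of varying size; the label is the mean of the
# underlying distribution each group was sampled from.
rng = np.random.default_rng(0)
mus = rng.uniform(-2, 2, size=30)
groups = [rng.normal(mu, 1.0, size=(rng.integers(5, 50), 1)) for mu in mus]
y = mus

landmarks = np.linspace(-4, 4, 20)[:, None]
Phi = mean_embeddings(groups, landmarks)

# Ridge regression on the embeddings, i.e. the MAP estimate of the
# weights under a Gaussian prior.
lam = 1e-3
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
y_hat = Phi @ w
```

Note that this sketch treats every group's embedding as equally reliable; the point of the paper is precisely that small groups give noisier embeddings, which a Bayesian treatment can propagate into the regression.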

@unpublished{LawSutSejFla2017,
author = {Law, H.C.L. and Sutherland, D.J. and Sejdinovic, D. and Flaxman, S.},
title = {{Bayesian Distribution Regression}},
note = {ArXiv e-prints: 1705.04293},
year = {2017}
}

H. Law,
C. Yau,
D. Sejdinovic,
Testing and Learning on Distributions with Symmetric Noise Invariance, in Advances in Neural Information Processing Systems (NIPS), 2017.

Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rarely the case that all possible differences between samples are of interest – discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the corruption of the input distributions by symmetric additive noise. Such features lend themselves to a straightforward neural network implementation and can thus also be learned given a supervised signal.
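For background, the standard (non-invariant) MMD that this work builds on can be estimated directly from two samples. Below is a minimal sketch of the unbiased estimator of the squared MMD with an RBF kernel; the data, bandwidth, and sample sizes are illustrative assumptions, and the noise-invariant distances proposed in the paper are not shown.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    # RBF kernel matrix between the rows of X and the rows of Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_unbiased(X, Y, gamma=1.0):
    # Unbiased estimate of the squared MMD between the distributions
    # generating samples X and Y (diagonal terms are excluded).
    m, n = len(X), len(Y)
    Kxx = rbf(X, X, gamma); np.fill_diagonal(Kxx, 0.0)
    Kyy = rbf(Y, Y, gamma); np.fill_diagonal(Kyy, 0.0)
    Kxy = rbf(X, Y, gamma)
    return (Kxx.sum() / (m * (m - 1))
            + Kyy.sum() / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y_same = rng.normal(0.0, 1.0, size=(200, 1))   # same distribution as X
Y_diff = rng.normal(1.0, 1.0, size=(200, 1))   # mean-shifted distribution
# The estimate is near zero when the distributions match and
# clearly positive under the mean shift.
```

A two-sample test would compare such an estimate against a permutation null; the paper's contribution is to replace this distance with ones that ignore differences explainable by additive symmetric noise.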

@inproceedings{LawYauSej2017,
author = {Law, H.C.L. and Yau, C. and Sejdinovic, D.},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
title = {{Testing and Learning on Distributions with Symmetric Noise Invariance}},
year = {2017}
}