Statistical Machine Learning, Kernel Method, Nonparametric Association Measures
Qinyi was a DPhil student supervised by Dino Sejdinovic and Sarah Filippi. Her research interests lie in large-scale nonparametric association measures, in particular those based on representations of probability distributions in reproducing kernel Hilbert spaces (RKHSs). Nonparametric testing of independence, conditional independence and multivariate interactions are all of interest. Qinyi has also worked on Bayesian nonparametric testing utilising distance measures in the RKHS.
Qinyi graduated in 2020 with the thesis entitled "Kernel Based Hypothesis Tests: Large-Scale Approximations and Bayesian Perspectives".
Publications
2018
Q. Zhang, S. Filippi, A. Gretton, D. Sejdinovic, Large-Scale Kernel Methods for Independence Testing, Statistics and Computing, vol. 28, no. 1, 113–130, Jan. 2018.
Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large-scale methods give performance comparable to existing methods whilst using significantly less computation time and memory.
@article{ZhaFilGreSej2018,
author = {Zhang, Q. and Filippi, S. and Gretton, A. and Sejdinovic, D.},
journal = {Statistics and Computing},
year = {2018},
month = jan,
volume = {28},
number = {1},
pages = {113--130},
title = {{Large-Scale Kernel Methods for Independence Testing}}
}
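For intuition, the sketch below shows one of the three approximations contrasted above: an HSIC-style independence statistic computed from random Fourier features, with a permutation null. This is a minimal illustration, not the paper's implementation; the Gaussian-kernel lengthscales, feature count, permutation count and function names are illustrative placeholders.

import numpy as np

def rff(X, num_features, lengthscale, rng):
    # Random Fourier features approximating a Gaussian kernel (illustrative).
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def hsic_rff(X, Y, num_features=100, lengthscale_x=1.0, lengthscale_y=1.0, seed=0):
    # Biased HSIC estimate via random features: roughly O(n D^2) instead of O(n^2).
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    phi_x = rff(X, num_features, lengthscale_x, rng)
    phi_y = rff(Y, num_features, lengthscale_y, rng)
    phi_x -= phi_x.mean(axis=0)   # centring the features centres the kernel matrices
    phi_y -= phi_y.mean(axis=0)
    C = phi_x.T @ phi_y / n       # empirical cross-covariance in feature space
    return np.sum(C ** 2)         # squared Frobenius norm approximates HSIC

def independence_test(X, Y, num_permutations=200, seed=0, **kwargs):
    # Permutation null: shuffling Y breaks any dependence on X. Returns a p-value.
    rng = np.random.default_rng(seed)
    stat = hsic_rff(X, Y, **kwargs)
    null = [hsic_rff(X, Y[rng.permutation(len(Y))], **kwargs)
            for _ in range(num_permutations)]
    return (1 + sum(s >= stat for s in null)) / (1 + num_permutations)

Since the features are fixed across permutations, the null statistics and the observed statistic use the same approximation, which keeps the permutation test valid under the feature-map approximation.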
2017
Q. Zhang, S. Filippi, S. Flaxman, D. Sejdinovic, Feature-to-Feature Regression for a Two-Step Conditional Independence Test, in Uncertainty in Artificial Intelligence (UAI), 2017.
Algorithms for causal discovery, and more broadly for learning the structure of graphical models, require well-calibrated and consistent conditional independence (CI) tests. We revisit CI tests based on two-step procedures involving regression with a subsequent (unconditional) independence test (RESIT) on regression residuals, and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
@inproceedings{ZhaFilFlaSej2017,
author = {Zhang, Q. and Filippi, S. and Flaxman, S. and Sejdinovic, D.},
title = {{Feature-to-Feature Regression for a Two-Step Conditional Independence Test}},
booktitle = {Uncertainty in Artificial Intelligence (UAI)},
year = {2017}
}
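As a rough illustration of the two-step procedure discussed above, the sketch below regresses each variable on the conditioning set with kernel ridge regression and then applies a permutation HSIC test to the residuals. This is the baseline RESIT scheme the paper revisits, not the proposed RKHS-valued (feature-to-feature) regression; lengthscales, regularisation, permutation counts and function names are placeholder assumptions.

import numpy as np

def gaussian_kernel(A, B, lengthscale=1.0):
    # Gaussian (RBF) Gram matrix between rows of A and B; inputs are 2-D arrays.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * lengthscale**2))

def kernel_ridge_residuals(target, Z, lengthscale=1.0, reg=1e-3):
    # Residuals of a kernel ridge regression of `target` on the conditioning set Z.
    n = len(Z)
    K = gaussian_kernel(Z, Z, lengthscale)
    alpha = np.linalg.solve(K + reg * n * np.eye(n), target)
    return target - K @ alpha

def hsic_biased(X, Y, lx=1.0, ly=1.0):
    # Biased HSIC statistic on exact Gram matrices (quadratic cost; small n only).
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n
    K = gaussian_kernel(X, X, lx)
    L = gaussian_kernel(Y, Y, ly)
    return np.trace(K @ H @ L @ H) / n**2

def resit_ci_test(X, Y, Z, num_permutations=200, seed=0):
    # Step 1: regress X and Y on Z. Step 2: permutation HSIC test on the residuals.
    rng = np.random.default_rng(seed)
    rx = kernel_ridge_residuals(X, Z)
    ry = kernel_ridge_residuals(Y, Z)
    stat = hsic_biased(rx, ry)
    null = [hsic_biased(rx, ry[rng.permutation(len(ry))])
            for _ in range(num_permutations)]
    return (1 + sum(s >= stat for s in null)) / (1 + num_permutations)

The paper's point is precisely that this scalar-residual scheme can inflate false discoveries beyond additive-noise models; its extension replaces the scalar response with an RKHS feature map of the response, which this sketch omits.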