OxCSML : People : Dino Sejdinovic

I am an Associate Professor in Statistics at the University of Oxford, a Fellow of Mansfield College, and a Turing Fellow. I conduct research at the interface between machine learning and statistical methodology, with an emphasis on nonparametric and kernel methods.

Publications

2023

S. Bouabid, J. Fawkes, D. Sejdinovic, Returning The Favour: When Regression Benefits From Probabilistic Causal Knowledge, arXiv preprint arXiv:2301.11214, 2023.

@article{bouabid2023returning,
  title = {Returning The Favour: When Regression Benefits From Probabilistic Causal Knowledge},
  author = {Bouabid, Shahine and Fawkes, Jake and Sejdinovic, Dino},
  journal = {arXiv preprint arXiv:2301.11214},
  year = {2023}
}

2022

S. Bouabid, D. Watson-Parris, D. Sejdinovic, Bayesian inference for aerosol vertical profiles, in NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2022.

@inproceedings{bouabid2022bayesian,
  title = {Bayesian inference for aerosol vertical profiles},
  author = {Bouabid, Shahine and Watson-Parris, Duncan and Sejdinovic, Dino},
  year = {2022},
  booktitle = {NeurIPS Workshop on Tackling Climate Change with Machine Learning}
}

S. Bouabid, D. Watson-Parris, S. Stefanović, A. Nenes, D. Sejdinovic, AODisaggregation: toward global aerosol vertical profiles, arXiv preprint arXiv:2205.04296, 2022.

@article{bouabid2022aodisaggregation,
  title = {AODisaggregation: toward global aerosol vertical profiles},
  author = {Bouabid, Shahine and Watson-Parris, Duncan and Stefanovi{\'c}, Sofija and Nenes, Athanasios and Sejdinovic, Dino},
  journal = {arXiv preprint arXiv:2205.04296},
  year = {2022}
}

J. Fawkes, R. J. Evans, D. Sejdinovic, Selection, ignorability and challenges with causal fairness, in Conference on Causal Learning and Reasoning, 2022, 275–289.

@inproceedings{fawkes22selection,
  title = {Selection, ignorability and challenges with causal fairness},
  author = {Fawkes, Jake and Evans, R. J. and Sejdinovic, Dino},
  booktitle = {Conference on Causal Learning and Reasoning},
  pages = {275--289},
  year = {2022},
  publisher = {PMLR},
  archiveprefix = {arXiv},
  eprint = {2202.13774}
}

J. Fawkes, R. Evans, D. Sejdinovic, Selection, Ignorability and Challenges With Causal Fairness, arXiv preprint arXiv:2202.13774, 2022.

@article{fawkes2022selection,
  title = {Selection, Ignorability and Challenges With Causal Fairness},
  author = {Fawkes, Jake and Evans, Robin and Sejdinovic, Dino},
  journal = {arXiv preprint arXiv:2202.13774},
  year = {2022}
}

2021

S. L. Chau, S. Bouabid, D. Sejdinovic, Deconditional Downscaling with Gaussian Processes, in Advances in Neural Information Processing Systems (NeurIPS), 2021.
Refining low-resolution (LR) spatial fields with high-resolution (HR) information is challenging as the diversity of spatial datasets often prevents direct matching of observations. Yet, when LR samples are modeled as aggregate conditional means of HR samples with respect to a mediating variable that is globally observed, the recovery of the underlying fine-grained field can be framed as taking an "inverse" of the conditional expectation, namely a deconditioning problem. In this work, we introduce conditional mean processes (CMP), a new class of Gaussian Processes describing conditional means. By treating CMPs as inter-domain features of the underlying field, a posterior for the latent field can be established as a solution to the deconditioning problem. Furthermore, we show that this solution can be viewed as a two-staged vector-valued kernel ridge regressor and show that it has a minimax optimal convergence rate under mild assumptions. Lastly, we demonstrate its proficiency in a synthetic and a real-world atmospheric field downscaling problem, showing substantial improvements over existing methods.
```
@inproceedings{ChaBouSej2021,
  title = {{Deconditional Downscaling with Gaussian Processes}},
  author = {Chau, Siu Lun and Bouabid, Shahine and Sejdinovic, Dino},
  year = {2021},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}
}
```
S. L. Chau, J. Ton, J. Gonzalez, Y. W. Teh, D. Sejdinovic, BayesIMP: Uncertainty Quantification for Causal Data Fusion, in Advances in Neural Information Processing Systems (NeurIPS), 2021.
While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging. In this paper, we study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable. As data arises from multiple sources and can vary in quality and quantity, principled uncertainty quantification becomes essential. To that end, we introduce Bayesian Interventional Mean Processes, a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space, while taking into account the uncertainty within each causal graph. To demonstrate the utility of our uncertainty estimation, we apply our method to the Causal Bayesian Optimisation task and show improvements over state-of-the-art methods.
```
@inproceedings{ChaTonGonTehSej2021,
  title = {{BayesIMP: Uncertainty Quantification for Causal Data Fusion}},
  author = {Chau, Siu Lun and Ton, Jean-Francois and Gonzalez, Javier and Teh, Yee Whye and Sejdinovic, Dino},
  year = {2021},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}
}
```
R. Hu, G. K. Nicholls, D. Sejdinovic, Large Scale Tensor Regression using Kernels and Variational Inference, Machine Learning, 2021.
We outline an inherent weakness of tensor factorization models when latent factors are expressed as a function of side information and propose a novel method to mitigate this weakness. We coin our method Kernel Fried Tensor (KFT) and present it as a large scale forecasting tool for high dimensional data. Our results show superior performance against LightGBM and Field Aware Factorization Machines (FFM), two algorithms with proven track records widely used in industrial forecasting. We also develop a variational inference framework for KFT and associate our forecasts with calibrated uncertainty estimates on three large scale datasets. Furthermore, KFT is empirically shown to be robust against uninformative side information in terms of constants and Gaussian noise.
```
@article{HuNicSej2021,
  title = {{{Large Scale Tensor Regression using Kernels and Variational Inference}}},
  author = {Hu, R. and Nicholls, G. K. and Sejdinovic, D.},
  journal = {Machine Learning},
  year = {2021}
}
```
T. Fernandez, A. Gretton, D. Rindt, D. Sejdinovic, A Kernel Log-Rank Test of Independence for Right-Censored Data, Journal of the American Statistical Association, 2021.
With the incorporation of new data gathering methods in clinical research, it becomes fundamental for survival analysis techniques to deal with high-dimensional or/and non-standard covariates. In this paper we introduce a general non-parametric independence test between right-censored survival times and covariates taking values on a general (not necessarily Euclidean) space X. We show that our test statistic has a dual interpretation, first in terms of the supremum of a potentially infinite collection of weight-indexed log-rank tests, with weight functions belonging to a reproducing kernel Hilbert space (RKHS) of functions; and second, as the norm of the difference of embeddings of certain finite measures into the RKHS, similar to the Hilbert-Schmidt Independence Criterion (HSIC) test-statistic. We study the asymptotic properties of the test, finding sufficient conditions to ensure that our test is omnibus. The test statistic can be computed straightforwardly, and the rejection threshold is obtained via an asymptotically consistent Wild-Bootstrap procedure. We perform extensive simulations demonstrating that our testing procedure generally performs better than competing approaches in detecting complex nonlinear dependence.
```
@article{FerGreRinSej2021,
  author = {Fernandez, Tamara and Gretton, Arthur and Rindt, David and Sejdinovic, Dino},
  title = {{{A Kernel Log-Rank Test of Independence for Right-Censored Data}}},
  journal = {Journal of the American Statistical Association},
  year = {2021},
  doi = {10.1080/01621459.2021.1961784}
}
```
V. Nguyen, S. B. Orbell, D. T. Lennon, H. Moon, F. Vigneau, L. C. Camenzind, L. Yu, D. M. Zumbühl, G. A. D. Briggs, M. A. Osborne, D. Sejdinovic, N. Ares, Deep Reinforcement Learning for Efficient Measurement of Quantum Devices, npj Quantum Information, vol. 7, no. 100, 2021.
Deep reinforcement learning is an emerging machine-learning approach that can teach a computer to learn from their actions and rewards similar to the way humans learn from experience. It offers many advantages in automating decision processes to navigate large parameter spaces. This paper proposes an approach to the efficient measurement of quantum devices based on deep reinforcement learning. We focus on double quantum dot devices, demonstrating the fully automatic identification of specific transport features called bias triangles. Measurements targeting these features are difficult to automate, since bias triangles are found in otherwise featureless regions of the parameter space. Our algorithm identifies bias triangles in a mean time of <30 min, and sometimes as little as 1 min. This approach, based on dueling deep Q-networks, can be adapted to a broad range of devices and target transport features. This is a crucial demonstration of the utility of deep reinforcement learning for decision making in the measurement and operation of quantum devices.
```
@article{Nguyen2021,
  title = {Deep Reinforcement Learning for Efficient Measurement of Quantum Devices},
  author = {Nguyen, V. and Orbell, S. B. and Lennon, D. T. and Moon, H. and Vigneau, F. and Camenzind, L. C. and Yu, L. and Zumbühl, D. M. and Briggs, G. A. D. and Osborne, M. A. and Sejdinovic, D. and Ares, N.},
  year = {2021},
  volume = {7},
  number = {100},
  journal = {{npj Quantum Information}},
  doi = {10.1038/s41534-021-00434-x}
}
```
A. Caterini, R. Cornish, D. Sejdinovic, A. Doucet, Variational Inference with Continuously-Indexed Normalizing Flows, in Uncertainty in Artificial Intelligence (UAI), 2021.
Continuously-indexed flows (CIFs) have recently achieved improvements over baseline normalizing flows on a variety of density estimation tasks. CIFs do not possess a closed-form marginal density, and so, unlike standard flows, cannot be plugged in directly to a variational inference (VI) scheme in order to produce a more expressive family of approximate posteriors. However, we show here how CIFs can be used as part of an auxiliary VI scheme to formulate and train expressive posterior approximations in a natural way. We exploit the conditional independence structure of multi-layer CIFs to build the required auxiliary inference models, which we show empirically yield low-variance estimators of the model evidence. We then demonstrate the advantages of CIFs over baseline flows in VI problems when the posterior distribution of interest possesses a complicated topology, obtaining improved results in both the Bayesian inference and maximum likelihood settings.
```
@inproceedings{CatCorSejDou2021,
  author = {Caterini, Anthony and Cornish, Rob and Sejdinovic, Dino and Doucet, Arnaud},
  title = {{{Variational Inference with Continuously-Indexed Normalizing Flows}}},
  booktitle = {Uncertainty in Artificial Intelligence (UAI)},
  year = {2021}
}
```
Z. Li, J. Ton, D. Oglic, D. Sejdinovic, Towards A Unified Analysis of Random Fourier Features, Journal of Machine Learning Research (JMLR), vol. 22, no. 108, 1–51, 2021.
Random Fourier features is a widely used, simple, and effective technique for scaling up kernel methods. The existing theoretical analysis of the approach, however, remains focused on specific learning tasks and typically gives pessimistic bounds which are at odds with the empirical results. We tackle these problems and provide the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions. In our bounds, the trade-off between the computational cost and the learning risk convergence rate is problem specific and expressed in terms of the regularization parameter and the number of effective degrees of freedom. We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to ridge leverage scores and further reduces the required number of features. As ridge leverage scores are expensive to compute, we devise a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency. Our empirical results illustrate the effectiveness of the proposed scheme relative to the standard random Fourier features method.
```
@article{LiTonOglSej2021,
  author = {Li, Zhu and Ton, Jean-Francois and Oglic, Dino and Sejdinovic, Dino},
  title = {{{Towards A Unified Analysis of Random Fourier Features}}},
  journal = {Journal of Machine Learning Research (JMLR)},
  volume = {22},
  number = {108},
  year = {2021},
  pages = {1--51}
}
```
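As a rough illustration of the standard random Fourier features construction that the paper above analyses, the sketch below approximates a Gaussian kernel with random cosine features. It is not the authors' code and does not implement the leverage-score sampling variant; the function names, the choice of the Gaussian kernel, and the pure-Python implementation are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale=1.0):
    """Exact Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def sample_rff_params(dim, num_features, lengthscale=1.0, seed=0):
    """Draw random frequencies and phases for the Gaussian kernel."""
    rng = random.Random(seed)
    # The spectral density of the Gaussian kernel is N(0, lengthscale^-2 I).
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
              for _ in range(num_features)]
    # Phases are uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_features(x, omegas, phases):
    """Map x to the feature vector sqrt(2/D) * cos(omega_k . x + b_k)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]


def rff_kernel(x, y, omegas, phases):
    """Approximate k(x, y) by the inner product of the two feature vectors."""
    zx = rff_features(x, omegas, phases)
    zy = rff_features(y, omegas, phases)
    return sum(a * b for a, b in zip(zx, zy))
```

With a few thousand features, `rff_kernel(x, y, *sample_rff_params(dim, D))` should be close to `gaussian_kernel(x, y)`; the approximation error shrinks at the usual Monte Carlo rate as the number of features grows.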
X. Pu, S. L. Chau, X. Dong, D. Sejdinovic, Kernel-based Graph Learning from Smooth Signals: A Functional Viewpoint, IEEE Transactions on Signal and Information Processing over Networks, vol. 7, 192–207, 2021.
The problem of graph learning concerns the construction of an explicit topological structure revealing the relationship between nodes representing data entities, which plays an increasingly important role in the success of many graph-based representations and algorithms in the field of machine learning and graph signal processing. In this paper, we propose a novel graph learning framework that incorporates the node-side and observation-side information, and in particular the covariates that help to explain the dependency structures in graph signals. To this end, we consider graph signals as functions in the reproducing kernel Hilbert space associated with a Kronecker product kernel, and integrate functional learning with smoothness-promoting graph learning to learn a graph representing the relationship between nodes. The functional learning increases the robustness of graph learning against missing and incomplete information in the graph signals. In addition, we develop a novel graph-based regularisation method which, when combined with the Kronecker product kernel, enables our model to capture both the dependency explained by the graph and the dependency due to graph signals observed under different but related circumstances, e.g. different points in time. The latter means the graph signals are free from the i.i.d. assumptions required by the classical graph learning models. Experiments on both synthetic and real-world data show that our methods outperform the state-of-the-art models in learning a meaningful graph topology from graph signals, in particular under heavy noise, missing values, and multiple dependency.
```
@article{PuChaDonSej2021,
  author = {Pu, Xingyue and Chau, Siu Lun and Dong, Xiaowen and Sejdinovic, Dino},
  title = {{{Kernel-based Graph Learning from Smooth Signals: A Functional Viewpoint}}},
  journal = {IEEE Transactions on Signal and Information Processing over Networks},
  doi = {10.1109/TSIPN.2021.3059995},
  volume = {7},
  pages = {192--207},
  year = {2021}
}
```
D. Rindt, D. Sejdinovic, D. Steinsaltz, Consistency of permutation tests of independence using distance covariance, HSIC and dHSIC, Stat, vol. 10, no. 1, e364, 2021.
The Hilbert–Schmidt independence criterion (HSIC) and its d-variable extension dHSIC are measures of (joint) dependence between random variables. While combining these statistics with a permutation test has become a popular method of testing the null hypothesis of (joint) independence, it had thus far not been proved that this results in a consistent test. In this work, we provide a simple proof that the permutation test with the test statistic HSIC or dHSIC is indeed consistent when using characteristic kernels. That is, we prove that under each alternative hypothesis, the power of these permutation tests indeed converges to 1 as the sample size converges to infinity. Since the test is consistent for each number of permutations, we further give a brief discussion of how the number of permutations relates to the power of the test and how the number of permutations may be selected in practice.
```
@article{RinSejSte2021,
  author = {Rindt, David and Sejdinovic, Dino and Steinsaltz, David},
  title = {{{Consistency of permutation tests of independence using distance covariance, HSIC and dHSIC}}},
  journal = {Stat},
  doi = {10.1002/sta4.364},
  volume = {10},
  number = {1},
  pages = {e364},
  year = {2021}
}
```
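The permutation test discussed in the paper above can be sketched as follows, assuming scalar observations, Gaussian kernels on both variables, and the biased HSIC estimate. This is a minimal pure-Python illustration rather than the authors' implementation; the function names, the lengthscale and the default number of permutations are hypothetical choices.

```python
import math
import random


def gram_matrix(values, lengthscale):
    """Gaussian-kernel Gram matrix for a list of scalar observations."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2)) for b in values]
            for a in values]


def center(k):
    """Double-center a symmetric Gram matrix, i.e. compute H K H."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    grand_mean = sum(row_means) / n
    return [[k[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def hsic_statistic(kc, lmat, perm):
    """Biased HSIC estimate (1/n^2) * sum_ij Kc[i][j] * L[perm[i]][perm[j]]."""
    n = len(kc)
    total = 0.0
    for i in range(n):
        kc_i, l_i = kc[i], lmat[perm[i]]
        for j in range(n):
            total += kc_i[j] * l_i[perm[j]]
    return total / (n * n)


def hsic_permutation_test(xs, ys, num_permutations=200, lengthscale=0.5, seed=0):
    """Return the observed HSIC and a permutation p-value for independence."""
    rng = random.Random(seed)
    n = len(xs)
    kc = center(gram_matrix(xs, lengthscale))
    lmat = gram_matrix(ys, lengthscale)
    identity = list(range(n))
    observed = hsic_statistic(kc, lmat, identity)
    exceed = 0
    for _ in range(num_permutations):
        # Permuting the y-sample breaks any dependence on x.
        perm = identity[:]
        rng.shuffle(perm)
        if hsic_statistic(kc, lmat, perm) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value away from zero.
    p_value = (1 + exceed) / (1 + num_permutations)
    return observed, p_value
```

Rejecting independence when the returned p-value falls below the chosen level gives a permutation test in the spirit of the abstract; each permutation costs O(n²) here, so the sketch is only practical for small samples.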
J. Ton, L. Chan, Y. W. Teh, D. Sejdinovic, Noise Contrastive Meta Learning for Conditional Density Estimation using Kernel Mean Embeddings, in Artificial Intelligence and Statistics (AISTATS), 2021, PMLR 130:1099–1107.
Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. estimating conditional expectations in regression. In many applications, however, the conditional distributions cannot be meaningfully summarized solely by expectation (due to e.g. multimodality). We introduce a novel technique for meta-learning conditional densities, which combines neural representation and noise contrastive estimation together with well-established literature in conditional mean embeddings into reproducing kernel Hilbert spaces. The method shows significant improvements over standard density estimation methods on synthetic and real-world data, by leveraging shared representations across multiple conditional density estimation tasks.
```
@inproceedings{TonChaTehSej2021,
  author = {Ton, Jean-Francois and Chan, Leung and Teh, Yee Whye and Sejdinovic, Dino},
  title = {{{Noise Contrastive Meta Learning for Conditional Density Estimation using Kernel Mean Embeddings}}},
  series = {Proceedings of Machine Learning Research},
  volume = {130},
  pages = {1099--1107},
  year = {2021},
  booktitle = {Artificial Intelligence and Statistics (AISTATS)}
}
```
J. Ton, D. Sejdinovic, K. Fukumizu, Meta Learning for Causal Direction, in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, vol. 35, no. 11, 9897–9905.
The inaccessibility of controlled randomized trials due to inherent constraints in many fields of science has been a fundamental issue in causal inference. In this paper, we focus on distinguishing the cause from effect in the bivariate setting under limited observational data. Based on recent developments in meta learning as well as in causal inference, we introduce a novel generative model that allows distinguishing cause and effect in the small data setting. Using a learnt task variable that contains distributional information of each dataset, we propose an end-to-end algorithm that makes use of similar training datasets at test time. We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.
```
@inproceedings{TonSejFuk2021,
  author = {Ton, Jean-Francois and Sejdinovic, Dino and Fukumizu, Kenji},
  title = {{{Meta Learning for Causal Direction}}},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {35},
  number = {11},
  pages = {9897--9905},
  year = {2021}
}
```
R. Hu, D. Sejdinovic, Robust Deep Interpretable Features for Binary Image Classification, in Proceedings of the Northern Lights Deep Learning Workshop, 2021, vol. 2.
The problem of interpretability for binary image classification is considered through the lens of kernel two-sample tests and generative modeling. A feature extraction framework coined Deep Interpretable Features is developed, which is used in combination with IntroVAE, a generative model capable of high-resolution image synthesis. Experimental results on a variety of datasets, including COVID-19 chest x-rays demonstrate the benefits of combining deep generative models with the ideas from kernel-based hypothesis testing in moving towards more robust interpretable deep generative models.
```
@inproceedings{HuSej2021,
  author = {Hu, Robert and Sejdinovic, Dino},
  title = {{{Robust Deep Interpretable Features for Binary Image Classification}}},
  booktitle = {Proceedings of the Northern Lights Deep Learning Workshop},
  year = {2021},
  volume = {2},
  doi = {10.7557/18.5708}
}
```
G. S. Blair, R. Bassett, L. Bastin, L. Beevers, M. I. Borrajo, M. Brown, S. L. Dance, A. Dionescu, L. Edwards, M. A. Ferrario, R. Fraser, H. Fraser, S. Gardner, P. Henrys, T. Hey, S. Homann, C. Huijbers, J. Hutchison, P. Jonathan, R. Lamb, S. Laurie, A. Leeson, D. Leslie, M. McMillan, V. Nundloll, O. Oyebamiji, J. Phillipson, V. Pope, R. Prudden, S. Reis, M. Salama, F. Samreen, D. Sejdinovic, W. Simm, R. Street, L. Thornton, R. Towe, J. V. Hey, M. Vieno, J. Waller, J. Watkins, The Role of Digital Technologies in Responding to the Grand Challenges of the Natural Environment: The Windermere Accord, Patterns, vol. 2, no. 1, 100156, 2021.
Digital technology is having a major impact on many areas of society, and there is equal opportunity for impact on science. This is particularly true in the environmental sciences as we seek to understand the complexities of the natural environment under climate change. This perspective presents the outcomes of a summit in this area, a unique cross-disciplinary gathering bringing together environmental scientists, data scientists, computer scientists, social scientists, and representatives of the creative arts. The key output of this workshop is an agreed vision in the form of a framework and associated roadmap, captured in the Windermere Accord. This accord envisions a new kind of environmental science underpinned by unprecedented amounts of data, with technological advances leading to breakthroughs in taming uncertainty and complexity, and also supporting openness, transparency, and reproducibility in science. The perspective also includes a call to build an international community working in this important area.
```
@article{Blairetal2021,
  title = {The Role of Digital Technologies in Responding to the Grand Challenges of the Natural Environment: The {W}indermere Accord},
  journal = {Patterns},
  volume = {2},
  number = {1},
  pages = {100156},
  year = {2021},
  issn = {2666-3899},
  doi = {10.1016/j.patter.2020.100156},
  author = {Blair, Gordon S. and Bassett, Richard and Bastin, Lucy and Beevers, Lindsay and Borrajo, Maribel Isabel and Brown, Mike and Dance, Sarah L. and Dionescu, Ada and Edwards, Liz and Ferrario, Maria Angela and Fraser, Rob and Fraser, Harriet and Gardner, Simon and Henrys, Peter and Hey, Tony and Homann, Stuart and Huijbers, Chantal and Hutchison, James and Jonathan, Phil and Lamb, Rob and Laurie, Sophie and Leeson, Amber and Leslie, David and McMillan, Malcolm and Nundloll, Vatsala and Oyebamiji, Oluwole and Phillipson, Jordan and Pope, Vicky and Prudden, Rachel and Reis, Stefan and Salama, Maria and Samreen, Faiza and Sejdinovic, Dino and Simm, Will and Street, Roger and Thornton, Lauren and Towe, Ross and Hey, Joshua Vande and Vieno, Massimo and Waller, Joanne and Watkins, John},
  keywords = {digital technologies, digital environment, data science, environmental science}
}
```

2020

D. Rindt, D. Sejdinovic, D. Steinsaltz, A kernel and optimal transport based test of independence between covariates and right-censored lifetimes, International Journal of Biostatistics, 2020.
We propose a nonparametric test of independence, termed optHSIC, between a covariate and a right-censored lifetime. Because the presence of censoring creates a challenge in applying the standard permutation-based testing approaches, we use optimal transport to transform the censored dataset into an uncensored one, while preserving the relevant dependencies. We then apply a permutation test using the kernel-based dependence measure as a statistic to the transformed dataset. The type 1 error is proven to be correct in the case where censoring is independent of the covariate. Experiments indicate that optHSIC has power against a much wider class of alternatives than Cox proportional hazards regression and that it has the correct type 1 control even in the challenging cases where censoring strongly depends on the covariate.
```
@article{RinSejSte2020,
  author = {Rindt, David and Sejdinovic, Dino and Steinsaltz, David},
  title = {A kernel and optimal transport based test of independence between covariates and right-censored lifetimes},
  journal = {International Journal of Biostatistics},
  year = {2020},
  doi = {10.1515/ijb-2020-0022}
}
```
N. M. van Esbroeck, D. T. Lennon, H. Moon, V. Nguyen, F. Vigneau, L. C. Camenzind, L. Yu, D. Zumbuehl, G. A. D. Briggs, D. Sejdinovic, N. Ares, Quantum device fine-tuning using unsupervised embedding learning, New Journal of Physics, vol. 22, no. 9, 095003, 2020.
Quantum devices with a large number of gate electrodes allow for precise control of device parameters. This capability is hard to fully exploit due to the complex dependence of these parameters on applied gate voltages. We experimentally demonstrate an algorithm capable of fine-tuning several device parameters at once. The algorithm acquires a measurement and assigns it a score using a variational auto-encoder. Gate voltage settings are set to optimise this score in real-time in an unsupervised fashion. We report fine-tuning times of a double quantum dot device within approximately 40 min.
```
@article{Esbroecketal2020,
  author = {van Esbroeck, Nina M. and Lennon, Dominic T. and Moon, Hyungil and Nguyen, Vu and Vigneau, Florian and Camenzind, Leon C. and Yu, Liuqi and Zumbuehl, Dominik and Briggs, G. Andrew D. and Sejdinovic, Dino and Ares, Natalia},
  title = {Quantum device fine-tuning using unsupervised embedding learning},
  journal = {New Journal of Physics},
  year = {2020},
  doi = {10.1088/1367-2630/abb64c},
  volume = {22},
  number = {9},
  pages = {095003}
}
```
H. Moon, D. T. Lennon, J. Kirkpatrick, N. M. van Esbroeck, L. C. Camenzind, L. Yu, F. Vigneau, D. M. Zumbühl, G. A. D. Briggs, M. A. Osborne, D. Sejdinovic, E. A. Laird, N. Ares, Machine learning enables completely automatic tuning of a quantum device faster than human experts, Nature Communications, vol. 11, no. 4161, 2020.
Variability is a problem for the scalability of semiconductor quantum devices. The parameter space is large, and the operating range is small. Our statistical tuning algorithm searches for specific electron transport features in gate-defined quantum dot devices with a gate voltage space of up to eight dimensions. Starting from the full range of each gate voltage, our machine learning algorithm can tune each device to optimal performance in a median time of under 70 minutes. This performance surpassed our best human benchmark (although both human and machine performance can be improved). The algorithm is approximately 180 times faster than an automated random search of the parameter space, and is suitable for different material systems and device architectures. Our results yield a quantitative measurement of device variability, from one device to another and after thermal cycling. Our machine learning algorithm can be extended to higher dimensions and other technologies.
```
@article{Moonetal2020,
  title = {Machine learning enables completely automatic tuning of a quantum device faster than human experts},
  author = {Moon, H. and Lennon, D. T. and Kirkpatrick, J. and van Esbroeck, N. M. and Camenzind, L. C. and Yu, Liuqi and Vigneau, F. and Zumb{\"u}hl, D. M. and Briggs, G. A. D. and Osborne, M. A. and Sejdinovic, D. and Laird, E. A. and Ares, N.},
  journal = {Nature Communications},
  volume = {11},
  number = {4161},
  year = {2020},
  doi = {10.1038/s41467-020-17835-9}
}
```
T. Rudner, D. Sejdinovic, Y. Gal, Inter-domain Deep Gaussian Processes, in International Conference on Machine Learning (ICML), 2020, PMLR 119:8286–8294.
Inter-domain Gaussian processes (GPs) allow for high flexibility and low computational cost when performing approximate inference in GP models. They are particularly suitable for modeling data exhibiting global function behavior but are limited to stationary covariance functions and thus fail to model non-stationary data effectively. We propose Inter-domain Deep Gaussian Processes with RKHS Fourier Features, an extension of shallow inter-domain GPs that combines the advantages of inter-domain and deep Gaussian processes (DGPs) and demonstrate how to leverage existing approximate inference approaches to perform simple and scalable approximate inference on Inter-domain Deep Gaussian Processes. We assess the performance of our method on a wide range of prediction problems and demonstrate that it outperforms inter-domain GPs and DGPs on challenging large-scale and high-dimensional real-world datasets exhibiting both global behavior as well as a high-degree of non-stationarity.
```
@inproceedings{RudSejGal2020,
  author = {Rudner, T.G.J. and Sejdinovic, D. and Gal, Y.},
  title = {{{Inter-domain Deep Gaussian Processes}}},
  booktitle = {International Conference on Machine Learning (ICML)},
  series = {Proceedings of Machine Learning Research},
  volume = {119},
  pages = {8286--8294},
  year = {2020}
}
```
D. Sejdinovic, Discussion of ‘Functional models for time-varying random objects’ by Dubey and Müller, Journal of the Royal Statistical Society: Series B, vol. 82, no. 2, 312–313, 2020.
The discussion focuses on metric covariance, a new association measure between paired random objects in a metric space, developed by Dubey and Müller, and on its relationship with other similar concepts which have previously appeared in the literature, including distance covariance by Székely et al, as well as its generalisations which rely on the formalism of reproducing kernel Hilbert spaces (RKHS).
```
@article{Sej2020-discussion,
  author = {Sejdinovic, Dino},
  title = {{{Discussion of `Functional models for time-varying random objects' by Dubey and M\"uller}}},
  journal = {Journal of the Royal Statistical Society: Series B},
  volume = {82},
  number = {2},
  pages = {312--313},
  year = {2020}
}
```

2019

H. Law, P. Zhao, L. Chan, J. Huang, D. Sejdinovic, Hyperparameter Learning via Distributional Transfer, in Advances in Neural Information Processing Systems (NeurIPS), 2019, to appear.
Project: tencent-lsml

Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial ‘exploration’ even in cases where potentially similar prior tasks have been solved. We propose to transfer information across tasks using kernel embeddings of distributions of training datasets used in those tasks. The resulting method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective.
```
@inproceedings{LawZhaHuaSej2018,
  author = {Law, H.C.L. and Zhao, P. and Chan, L. and Huang, J. and Sejdinovic, D.},
  title = {{{Hyperparameter Learning via Distributional Transfer}}},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  note = {to appear},
  year = {2019}
}
```
A. Raj, H. Law, D. Sejdinovic, M. Park, A Differentially Private Kernel Two-Sample Test, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2019, to appear.
Project: bigbayes

Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite dimensional approximations to those representations. As a result, a simple chi-squared test is obtained, where a test statistic depends on a mean and covariance of empirical differences between the samples, which we perturb for a privacy guarantee. We investigate the utility of our framework in two realistic settings and conclude that our method requires only a relatively modest increase in sample size to achieve a similar level of power to the non-private tests in both settings.
```
@inproceedings{RajLawSejPar2019,
  author = {Raj, A. and Law, H.C.L. and Sejdinovic, D. and Park, M.},
  title = {{{A Differentially Private Kernel Two-Sample Test}}},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
  pages = {to appear},
  year = {2019}
}
```
D. Watson-Parris , S. Sutherland , M. Christensen , A. Caterini , D. Sejdinovic , P. Stier , Detecting Anthropogenic Cloud Perturbations with Deep Learning, in ICML 2019 Workshop on Climate Change: How Can AI Help?, 2019.
One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth’s energy balance. Aerosols provide the ‘seeds’ on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large it is not currently known if it is negligible, or provides a large enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and assess their properties and prevalence, providing valuable insights into their climatic effects.
```
@inproceedings{Watetal2019,
  author = {Watson-Parris, Duncan and Sutherland, Sam and Christensen, Matthew and Caterini, Anthony and Sejdinovic, Dino and Stier, Philip},
  title = {{Detecting Anthropogenic Cloud Perturbations with Deep Learning}},
  booktitle = {ICML 2019 Workshop on Climate Change: How Can AI Help?},
  year = {2019}
}
```
J. Runge , P. Nowack , M. Kretschmer , S. Flaxman , D. Sejdinovic , Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets, Science Advances, vol. 5, no. 11, 2019.
Identifying causal relationships and quantifying their strength from observational time series data are key problems in disciplines dealing with complex dynamical systems such as the Earth system or the human body. Data-driven causal inference in such systems is challenging since datasets are often high dimensional and nonlinear with limited sample sizes. Here, we introduce a novel method that flexibly combines linear or nonlinear conditional independence tests with a causal discovery algorithm to estimate causal networks from large-scale time series datasets. We validate the method on time series of well-understood physical mechanisms in the climate system and the human heart and using large-scale synthetic datasets mimicking the typical properties of real-world data. The experiments demonstrate that our method outperforms state-of-the-art techniques in detection power, which opens up entirely new possibilities to discover and quantify causal networks from time series across a range of research fields.
```
@article{RunNowKreFlaSej2019,
  author = {Runge, Jakob and Nowack, Peer and Kretschmer, Marlene and Flaxman, Seth and Sejdinovic, Dino},
  title = {{{Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets}}},
  journal = {Science Advances},
  volume = {5},
  number = {11},
  doi = {10.1126/sciadv.aau4996},
  year = {2019}
}
```
Z. Li , A. Perez-Suay , G. Camps-Valls , D. Sejdinovic , Kernel Dependence Regularizers and Gaussian Processes with Applications to Algorithmic Fairness, ArXiv e-prints:1911.04322, 2019.
Current adoption of machine learning in industrial, societal and economical activities has raised concerns about the fairness, equity and ethics of automated decisions. Predictive models are often developed using biased datasets and thus retain or even exacerbate biases in their decisions and recommendations. Removing the sensitive covariates, such as gender or race, is insufficient to remedy this issue since the biases may be retained due to other related covariates. We present a regularization approach to this problem that trades off predictive accuracy of the learned models (with respect to biased labels) for the fairness in terms of statistical parity, i.e. independence of the decisions from the sensitive covariates. In particular, we consider a general framework of regularized empirical risk minimization over reproducing kernel Hilbert spaces and impose an additional regularizer of dependence between predictors and sensitive covariates using kernel-based measures of dependence, namely the Hilbert-Schmidt Independence Criterion (HSIC) and its normalized version. This approach leads to a closed-form solution in the case of squared loss, i.e. ridge regression. Moreover, we show that the dependence regularizer has an interpretation as modifying the corresponding Gaussian process (GP) prior. As a consequence, a GP model with a prior that encourages fairness to sensitive variables can be derived, allowing principled hyperparameter selection and studying of the relative relevance of covariates under fairness constraints. Experimental results in synthetic examples and in real problems of income and crime prediction illustrate the potential of the approach to improve fairness of automated decisions.
```
@unpublished{LiPerCamSej2019,
  author = {Li, Zhu and Perez-Suay, Adrian and Camps-Valls, Gustau and Sejdinovic, Dino},
  title = {{{Kernel Dependence Regularizers and Gaussian Processes with Applications to Algorithmic Fairness}}},
  journal = {ArXiv e-prints:1911.04322},
  year = {2019}
}
```
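The Hilbert-Schmidt Independence Criterion named in the abstract above can be estimated from paired samples with a few lines of linear algebra. The following is a minimal sketch of the standard biased estimator, tr(KHLH)/n², with Gaussian kernels. It is not the regularized learning procedure of the paper; the function names, the lengthscale and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

def hsic_biased(X, Y, lengthscale=1.0):
    """Biased HSIC estimate tr(K H L H) / n^2 from paired samples (X, Y)."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, lengthscale)
    L = gaussian_kernel(Y, Y, lengthscale)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(0)
s = rng.normal(size=(200, 1))                # hypothetical sensitive covariate
f_dep = s + 0.1 * rng.normal(size=(200, 1))  # predictions that depend on s
f_indep = rng.normal(size=(200, 1))          # predictions drawn independently of s

hsic_dep = hsic_biased(f_dep, s)
hsic_indep = hsic_biased(f_indep, s)
```

In this toy setting the estimate for the dependent predictions should be clearly larger than the one for the independent predictions, which is the behaviour a dependence penalty relies on.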
D. Rindt , D. Sejdinovic , D. Steinsaltz , Nonparametric Independence Testing for Right-Censored Data using Optimal Transport, ArXiv e-prints:1906.03866, 2019.
We propose a nonparametric test of independence, termed OPT-HSIC, between a covariate and a right-censored lifetime. Because the presence of censoring creates a challenge in applying the standard permutation-based testing approaches, we use optimal transport to transform the censored dataset into an uncensored one, while preserving the relevant dependencies. We then apply a permutation test using the kernel-based dependence measure as a statistic to the transformed dataset. The type 1 error is proven to be correct in the case where censoring is independent of the covariate. Experiments indicate that OPT-HSIC has power against a much wider class of alternatives than Cox proportional hazards regression and that it has the correct type 1 control even in the challenging cases where censoring strongly depends on the covariate.
```
@unpublished{RinSejSte2019,
  author = {Rindt, David and Sejdinovic, Dino and Steinsaltz, David},
  title = {{{Nonparametric Independence Testing for Right-Censored Data using Optimal Transport}}},
  journal = {ArXiv e-prints:1906.03866},
  year = {2019}
}
```
J. Ton , L. Chan , Y. W. Teh , D. Sejdinovic , Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings, ArXiv e-prints:1906.02236, 2019.
Project: bigbayes tencent-lsml

Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. on estimating conditional expectations in regression. In many applications, however, we are faced with conditional distributions which cannot be meaningfully summarized using expectation only (due to e.g. multimodality). Hence, we consider the problem of conditional density estimation in the meta-learning setting. We introduce a novel technique for meta-learning which combines neural representation and noise-contrastive estimation with the established literature of conditional mean embeddings into reproducing kernel Hilbert spaces. The method is validated on synthetic and real-world problems, demonstrating the utility of sharing learned representations across multiple conditional density estimation tasks.
```
@unpublished{TonChaTehSej2019,
  author = {Ton, Jean-Francois and Chan, Lucian and Teh, Yee Whye and Sejdinovic, Dino},
  title = {{{Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings}}},
  journal = {ArXiv e-prints:1906.02236},
  year = {2019}
}
```

G. Camps-Valls , D. Sejdinovic , J. Runge , M. Reichstein , A Perspective on Gaussian Processes for Earth Observation, National Science Review, 2019.

```
@article{CamSejRunRei2019,
  author = {Camps-Valls, G. and Sejdinovic, D. and Runge, J. and Reichstein, M.},
  title = {{{A Perspective on Gaussian Processes for Earth Observation}}},
  journal = {National Science Review},
  doi = {10.1093/nsr/nwz028},
  year = {2019}
}
```

Z. Li , J. Ton , D. Oglic , D. Sejdinovic , Towards A Unified Analysis of Random Fourier Features, in International Conference on Machine Learning (ICML), 2019, PMLR 97:3905–3914.
Random Fourier features is a widely used, simple, and effective technique for scaling up kernel methods. The existing theoretical analysis of the approach, however, remains focused on specific learning tasks and typically gives pessimistic bounds which are at odds with the empirical results. We tackle these problems and provide the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions. In our bounds, the trade-off between the computational cost and the expected risk convergence rate is problem specific and expressed in terms of the regularization parameter and the number of effective degrees of freedom. We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to ridge leverage scores and further reduces the required number of features. As ridge leverage scores are expensive to compute, we devise a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.
```
@inproceedings{LiTonOglSej2019,
  author = {Li, Z. and Ton, J.-F. and Oglic, D. and Sejdinovic, D.},
  title = {{{Towards A Unified Analysis of Random Fourier Features}}},
  booktitle = {International Conference on Machine Learning (ICML)},
  pages = {PMLR 97:3905--3914},
  year = {2019}
}
```
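As a rough illustration of the standard random Fourier features method mentioned in the abstract above, the sketch below approximates a Gaussian kernel matrix with an explicit random feature map. It covers only the plain, data-independent sampling scheme, not the leverage-score variant; the function names, the lengthscale and the number of features are assumptions chosen for the example.

```python
import numpy as np

def rff_features(X, num_features, lengthscale, rng):
    """Random Fourier feature map whose inner products approximate a Gaussian kernel."""
    d = X.shape[1]
    # Frequencies are sampled from the spectral density of the Gaussian kernel,
    # which is itself Gaussian with standard deviation 1 / lengthscale.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def gaussian_kernel(X, Y, lengthscale=1.0):
    """Exact Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

Z = rff_features(X, num_features=5000, lengthscale=1.0, rng=rng)
K_approx = Z @ Z.T                      # approximate kernel matrix from features
K_exact = gaussian_kernel(X, X, 1.0)    # exact kernel matrix for comparison
max_err = np.max(np.abs(K_approx - K_exact))
```

Downstream methods such as ridge regression can then work with the 50 x 5000 feature matrix `Z` instead of the kernel matrix, which is where the computational saving comes from when the number of observations is much larger than the number of features.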
F. Briol , C. Oates , M. Girolami , M. Osborne , D. Sejdinovic , Probabilistic Integration: A Role in Statistical Computation? (with Discussion and Rejoinder), Statistical Science, vol. 34, no. 1, 1–22; rejoinder: 38–42, 2019.
A research frontier has emerged in scientific computation, wherein discretisation error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow, in order to assess the impact of discretisation error on the computer output. This paper examines the case for probabilistic numerical methods in routine statistical computation. Our focus is on numerical integration, where a probabilistic integrator is equipped with a full distribution over its output that reflects the fact that the integrand has been discretised. Our main technical contribution is to establish, for the first time, rates of posterior contraction for one such method. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and a computer model for an oil reservoir.
```
@article{BriOatGirOsbSej2019,
  author = {Briol, F.-X. and Oates, C.J. and Girolami, M. and Osborne, M.A. and Sejdinovic, D.},
  title = {{{Probabilistic Integration: A Role in Statistical Computation? (with Discussion and Rejoinder)}}},
  journal = {Statistical Science},
  volume = {34},
  number = {1},
  pages = {1--22; rejoinder: 38--42},
  year = {2019},
  doi = {10.1214/18-STS660}
}
```

2018

J. Mitrovic , D. Sejdinovic , Y. Teh , Causal Inference via Kernel Deviance Measures, in Advances in Neural Information Processing Systems (NeurIPS), 2018.
Project: bigbayes

Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, we propose Kernel Conditional Deviance for Causal Inference (KCDC), a fully nonparametric causal discovery method based on purely observational data. From a novel interpretation of the notion of asymmetry between cause and effect, we derive a corresponding asymmetry measure using the framework of reproducing kernel Hilbert spaces. Based on this, we propose three decision rules for causal discovery. We demonstrate the wide applicability of our method across a range of diverse synthetic datasets. Furthermore, we test our method on real-world time series data and the real-world benchmark dataset Tübingen Cause-Effect Pairs where we outperform existing state-of-the-art methods.
```
@inproceedings{MitSejTeh2018,
  author = {Mitrovic, J. and Sejdinovic, D. and Teh, Y.W.},
  title = {{{Causal Inference via Kernel Deviance Measures}}},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2018},
  month = dec
}
```
Q. Zhang , S. Filippi , A. Gretton , D. Sejdinovic , Large-Scale Kernel Methods for Independence Testing, Statistics and Computing, vol. 28, no. 1, 113–130, Jan. 2018.
Project: bigbayes

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large-scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.
```
@article{ZhaFilGreSej2018,
  author = {Zhang, Q. and Filippi, S. and Gretton, A. and Sejdinovic, D.},
  journal = {Statistics and Computing},
  year = {2018},
  month = jan,
  volume = {28},
  number = {1},
  pages = {113--130},
  title = {{Large-Scale Kernel Methods for Independence Testing}}
}
```
A. Caterini , A. Doucet , D. Sejdinovic , Hamiltonian Variational Auto-Encoder, in Advances in Neural Information Processing Systems (NeurIPS), 2018, to appear.
Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochastic variational inference, this provides a methodology scaling to large datasets. However, for this methodology to be practically efficient, it is necessary to obtain low-variance unbiased estimators of the ELBO and its gradients with respect to the parameters of interest. While the use of Markov chain Monte Carlo (MCMC) techniques such as Hamiltonian Monte Carlo (HMC) has been previously suggested to achieve this, the proposed methods require specifying reverse kernels which have a large impact on performance. Additionally, the resulting unbiased estimator of the ELBO for most MCMC kernels is typically not amenable to the reparameterization trick. We show here how to optimally select reverse kernels in this setting and, by building upon Hamiltonian Importance Sampling (HIS), we obtain a scheme that provides low-variance unbiased estimators of the ELBO and its gradients using the reparameterization trick. This allows us to develop a Hamiltonian Variational Auto-Encoder (HVAE). This method can be reinterpreted as a target-informed normalizing flow which, within our context, only requires a few evaluations of the gradient of the sampled likelihood and trivial Jacobian calculations at each iteration.
```
@inproceedings{CatDouSej2018,
  author = {Caterini, A.L. and Doucet, A. and Sejdinovic, D.},
  title = {{{Hamiltonian Variational Auto-Encoder}}},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  pages = {to appear},
  year = {2018}
}
```
H. Law , D. Sejdinovic , E. Cameron , T. Lucas , S. Flaxman , K. Battle , K. Fukumizu , Variational Learning on Aggregate Outputs with Gaussian Processes, in Advances in Neural Information Processing Systems (NeurIPS), 2018, to appear.
Project: bigbayes

While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.
```
@inproceedings{LawSejCamLucFlaBatFuk2018,
  author = {Law, H.C.L. and Sejdinovic, D. and Cameron, E. and Lucas, T.C.D. and Flaxman, S. and Battle, K. and Fukumizu, K.},
  title = {{{Variational Learning on Aggregate Outputs with Gaussian Processes}}},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  pages = {to appear},
  year = {2018}
}
```
H. C. L. Law , P. Zhao , J. Huang , D. Sejdinovic , Hyperparameter Learning via Distributional Transfer, ArXiv e-prints:1810.06305, 2018.
Project: tencent-lsml

Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial ‘exploration’ even in cases where potentially similar prior tasks have been solved. We propose to transfer information across tasks using kernel embeddings of distributions of training datasets used in those tasks. The resulting method converges faster than existing baselines, in some cases requiring only a few evaluations of the target objective.
```
@unpublished{LawZhaHuaSej2019,
  author = {Law, Ho Chung Leon and Zhao, Peilin and Huang, Junzhou and Sejdinovic, Dino},
  title = {{{Hyperparameter Learning via Distributional Transfer}}},
  journal = {ArXiv e-prints:1810.06305},
  year = {2018}
}
```
M. Kanagawa , P. Hennig , D. Sejdinovic , B. Sriperumbudur , Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences, ArXiv e-prints:1807.02582, 2018.
This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.
```
@unpublished{KanHenSejSri2018,
  author = {Kanagawa, M. and Hennig, P. and Sejdinovic, D. and Sriperumbudur, B.K.},
  title = {{{Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences}}},
  journal = {ArXiv e-prints:1807.02582},
  year = {2018}
}
```
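The equivalence quoted in the abstract above, between the kernel ridge regression estimator and the Gaussian process posterior mean, can be checked numerically. The sketch below assumes the kernel ridge objective is the average squared error plus `lam` times the squared RKHS norm, in which case the two predictors coincide when the GP noise variance equals `n * lam`. The data, the kernel lengthscale and all variable names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

rng = np.random.default_rng(0)
n = 30
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
X_test = np.linspace(-3.0, 3.0, 7)[:, None]

lam = 0.01           # kernel ridge regularization parameter
noise_var = n * lam  # matching GP observation noise variance

K = gaussian_kernel(X, X)
K_star = gaussian_kernel(X_test, X)

# Kernel ridge regression: representer coefficients (K + n*lam*I)^{-1} y.
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
krr_pred = K_star @ alpha

# GP regression with a zero-mean prior and covariance k, using the usual
# Cholesky-based computation of the posterior mean K_star (K + sigma^2 I)^{-1} y.
chol = np.linalg.cholesky(K + noise_var * np.eye(n))
weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
gp_mean = K_star @ weights
```

The two prediction vectors agree up to floating-point error. The GP additionally provides a posterior covariance, which has no direct counterpart in this plain kernel ridge computation.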
J. Ton , S. Flaxman , D. Sejdinovic , S. Bhatt , Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features, Spatial Statistics, vol. 28, 59–78, 2018.
Project: bigbayes

The use of covariance kernels is ubiquitous in the field of spatial statistics. Kernels allow data to be mapped into high-dimensional feature spaces and can thus extend simple linear additive methods to nonlinear methods with higher order interactions. However, until recently, there has been a strong reliance on a limited class of stationary kernels such as the Matérn or squared exponential, limiting the expressiveness of these modelling approaches. Recent machine learning research has focused on spectral representations to model arbitrary stationary kernels and introduced more general representations that include classes of nonstationary kernels. In this paper, we exploit the connections between Fourier feature representations, Gaussian processes and neural networks to generalise previous approaches and develop a simple and efficient framework to learn arbitrarily complex nonstationary kernel functions directly from the data, while taking care to avoid overfitting using state-of-the-art methods from deep learning. We highlight the very broad array of kernel classes that could be created within this framework. We apply this to a time series dataset and a remote sensing problem involving land surface temperature in Eastern Africa. We show that without increasing the computational or storage complexity, nonstationary kernels can be used to improve generalisation performance and provide more interpretable results.
```
@article{TonFlaSejBha2018,
  author = {Ton, J.-F. and Flaxman, S. and Sejdinovic, D. and Bhatt, S.},
  title = {{Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features}},
  journal = {Spatial Statistics},
  year = {2018},
  volume = {28},
  pages = {59--78}
}
```
H. Law , D. Sutherland , D. Sejdinovic , S. Flaxman , Bayesian Approaches to Distribution Regression, in Artificial Intelligence and Statistics (AISTATS), 2018.
Project: bigbayes

Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.
```
@inproceedings{LawSutSejFla2018,
  author = {Law, H.C.L. and Sutherland, D.J. and Sejdinovic, D. and Flaxman, S.},
  title = {{Bayesian Approaches to Distribution Regression}},
  booktitle = {Artificial Intelligence and Statistics (AISTATS)},
  year = {2018}
}
```

2017

T. G. J. Rudner , D. Sejdinovic , Inter-domain Deep Gaussian Processes, NeurIPS 2017 Workshop on Bayesian Deep Learning, 2017.

```
@article{Rudner:Sejdinovic:2017,
  author = {Rudner, Tim G. J. and Sejdinovic, Dino},
  title = {{I}nter-domain {D}eep {G}aussian {P}rocesses},
  journal = {NeurIPS 2017 Workshop on Bayesian Deep Learning},
  year = {2017}
}
```

S. Flaxman , Y. Teh , D. Sejdinovic , Poisson Intensity Estimation with Reproducing Kernels, Electronic Journal of Statistics, vol. 11, no. 2, 5081–5104, 2017.
Project: bigbayes

Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially when observed points lie in a high-dimensional space. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. Whereas RKHS models used in supervised learning rely on the so-called representer theorem, the form of the inhomogeneous Poisson process likelihood means that the representer theorem does not apply. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.
```
@article{FlaTehSej2017ejs,
  author = {Flaxman, S. and Teh, Y.W. and Sejdinovic, D.},
  title = {{{Poisson Intensity Estimation with Reproducing Kernels}}},
  journal = {Electronic Journal of Statistics},
  year = {2017},
  volume = {11},
  number = {2},
  pages = {5081--5104}
}
```
Q. Zhang , S. Filippi , S. Flaxman , D. Sejdinovic , Feature-to-Feature Regression for a Two-Step Conditional Independence Test, in Uncertainty in Artificial Intelligence (UAI), 2017.
Project: bigbayes

The algorithms for causal discovery and more broadly for learning the structure of graphical models require well calibrated and consistent conditional independence (CI) tests. We revisit the CI tests which are based on two-step procedures and involve regression with subsequent (unconditional) independence test (RESIT) on regression residuals and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
```
@inproceedings{ZhaFilFlaSej2017,
  author = {Zhang, Q. and Filippi, S. and Flaxman, S. and Sejdinovic, D.},
  title = {{Feature-to-Feature Regression for a Two-Step Conditional Independence Test}},
  booktitle = {Uncertainty in Artificial Intelligence (UAI)},
  year = {2017}
}
```
J. Mitrovic , D. Sejdinovic , Y. W. Teh , Deep Kernel Machines via the Kernel Reparametrization Trick, in International Conference on Learning Representations (ICLR) Workshop Track, 2017.
Project: bigbayes

While deep neural networks have achieved state-of-the-art performance on many tasks across varied domains, they still remain black boxes whose inner workings are hard to interpret and understand. In this paper, we develop a novel method for efficiently capturing the behaviour of deep neural networks using kernels. In particular, we construct a hierarchy of increasingly complex kernels that encode individual hidden layers of the network. Furthermore, we discuss how our framework motivates a novel supervised weight initialization method that discovers highly discriminative features already at initialization.
```
@inproceedings{MitSejTeh2017,
  author = {Mitrovic, J. and Sejdinovic, D. and Teh, Y. W.},
  booktitle = {International Conference on Learning Representations (ICLR) Workshop Track},
  title = {{Deep Kernel Machines via the Kernel Reparametrization Trick}},
  year = {2017},
  bdsk-url-1 = {https://openreview.net/forum?id=Bkiqt3Ntg&noteId=Bkiqt3Ntg}
}
```
H. Law , C. Yau , D. Sejdinovic , Testing and Learning on Distributions with Symmetric Noise Invariance, in Advances in Neural Information Processing Systems (NeurIPS), 2017, 1343–1353.
Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rare that all possible differences between samples are of interest – discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise. Such features lend themselves to a straightforward neural network implementation and can thus also be learned given a supervised signal.
```
@inproceedings{LawYauSej2017,
  author = {Law, H.C.L. and Yau, C. and Sejdinovic, D.},
  title = {{{Testing and Learning on Distributions with Symmetric Noise Invariance}}},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2017},
  pages = {1343--1353}
}
```
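For readers unfamiliar with the Maximum Mean Discrepancy referred to in the abstract above, the following is a minimal sketch of the standard unbiased estimator of the squared MMD with a Gaussian kernel. It illustrates the ordinary MMD only, not the noise-invariant distances proposed in the paper; the function names, the lengthscale and the synthetic samples are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))

def mmd2_unbiased(X, Y, lengthscale=1.0):
    """Unbiased estimate of the squared MMD between samples X and Y."""
    m, n = X.shape[0], Y.shape[0]
    Kxx = gaussian_kernel(X, X, lengthscale)
    Kyy = gaussian_kernel(Y, Y, lengthscale)
    Kxy = gaussian_kernel(X, Y, lengthscale)
    # Within-sample averages exclude the diagonal terms k(x_i, x_i).
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y_same = rng.normal(size=(200, 2))             # same distribution as X
Y_shift = rng.normal(loc=1.0, size=(200, 2))   # mean-shifted distribution

mmd_same = mmd2_unbiased(X, Y_same)
mmd_shift = mmd2_unbiased(X, Y_shift)
```

The estimate is close to zero when both samples come from the same distribution and clearly positive under the mean shift. In a two-sample test, the statistic would be compared against a null distribution, for example one obtained by permuting the pooled samples.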
J. Runge , P. Nowack , M. Kretschmer , S. Flaxman , D. Sejdinovic , Detecting Causal Associations in Large Nonlinear Time Series Datasets, ArXiv e-prints:1702.07007, 2017.
Detecting causal associations in time series datasets is a key challenge for novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in high-dimensional dynamical systems often involve time-delays, nonlinearity, and strong autocorrelations. These present major challenges for causal discovery techniques such as Granger causality leading to low detection power, biases, and unreliable hypothesis tests. Here we introduce a reliable and fast method that outperforms current approaches in detection power and scales up to high-dimensional datasets. It overcomes detection biases, especially when strong autocorrelations are present, and allows ranking associations in large-scale analyses by their causal strength. We provide mathematical proofs, evaluate our method in extensive numerical experiments, and illustrate its capabilities in a large-scale analysis of the global surface-pressure system where we unravel spurious associations and find several potentially causal links that are difficult to detect with standard methods. The broadly applicable method promises to discover novel causal insights also in many other fields of science.
```
@unpublished{RunSejFla2017,
  author = {Runge, J. and Nowack, P. and Kretschmer, M. and Flaxman, S. and Sejdinovic, D.},
  title = {{Detecting Causal Associations in Large Nonlinear Time Series Datasets}},
  note = {ArXiv e-prints:1702.07007},
  year = {2017}
}
```
I. Schuster , H. Strathmann , B. Paige , D. Sejdinovic , Kernel Sequential Monte Carlo, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2017.
Bayesian posterior inference with Monte Carlo methods has a fundamental role in statistics and probabilistic machine learning. Target posterior distributions arising in increasingly complex models often exhibit high degrees of nonlinearity and multimodality and pose substantial challenges to traditional samplers. We propose the Kernel Sequential Monte Carlo (KSMC) framework for building emulator models of the current particle system in a Reproducing Kernel Hilbert Space and use the emulator’s geometry to inform local proposals. KSMC is applicable when gradients are unknown or prohibitively expensive and inherits the superior performance of SMC on multi-modal targets and its ability to estimate model evidence. Strengths of the proposed methodology are demonstrated on a series of challenging synthetic and real-world examples.
```
@inproceedings{SchStrPaiSej2017,
  author = {Schuster, I. and Strathmann, H. and Paige, B. and Sejdinovic, D.},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
  title = {{Kernel Sequential Monte Carlo}},
  year = {2017}
}
```
S. Flaxman , Y. W. Teh , D. Sejdinovic , Poisson Intensity Estimation with Reproducing Kernels, in Artificial Intelligence and Statistics (AISTATS), 2017.
Project: bigbayes

Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially in high dimensional settings. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. The modeling challenge is that the usual representer theorem arguments no longer apply due to the form of the inhomogeneous Poisson process likelihood. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.
```
@inproceedings{FlaTehSej2017,
  author = {Flaxman, S. and Teh, Y. W. and Sejdinovic, D.},
  booktitle = {Artificial Intelligence and Statistics (AISTATS)},
  title = {{Poisson Intensity Estimation with Reproducing Kernels}},
  year = {2017}
}
```

2016

D. Vukobratovic , D. Jakovetic , V. Skachek , D. Bajovic , D. Sejdinovic , G. Karabulut Kurt , C. Hollanti , I. Fischer , CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT, IEEE Access, vol. 4, 3360–3378, 2016.
In forthcoming years, the Internet of Things (IoT) will connect billions of smart devices generating and uploading a deluge of data to the cloud. If successfully extracted, the knowledge buried in the data can significantly improve the quality of life and foster economic growth. However, a critical bottleneck for realising the efficient IoT is the pressure it puts on the existing communication infrastructures, requiring transfer of enormous data volumes. Aiming at addressing this problem, we propose a novel architecture dubbed Condense (reconfigurable knowledge acquisition systems), which integrates the IoT-communication infrastructure into data analysis. This is achieved via the generic concept of network function computation: Instead of merely transferring data from the IoT sources to the cloud, the communication infrastructure should actively participate in the data analysis by carefully designed en-route processing. We define the Condense architecture, its basic layers, and the interactions among its constituent modules. Further, from the implementation side, we describe how Condense can be integrated into the 3rd Generation Partnership Project (3GPP) Machine Type Communications (MTC) architecture, as well as the prospects of making it a practically viable technology in a short time frame, relying on Network Function Virtualization (NFV) and Software Defined Networking (SDN). Finally, from the theoretical side, we survey the relevant literature on computing “atomic” functions in both analog and digital domains, as well as on function decomposition over networks, highlighting challenges, insights, and future directions for exploiting these techniques within practical 3GPP MTC architecture.
```
@article{VukJaketal2016a,
  author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D. and Karabulut Kurt, G. and Hollanti, C. and Fischer, I.},
  doi = {10.1109/ACCESS.2016.2585468},
  journal = {{IEEE Access}},
  pages = {3360--3378},
  title = {{CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT}},
  volume = {4},
  year = {2016},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ACCESS.2016.2585468}
}
```
D. Vukobratovic , D. Jakovetic , V. Skachek , D. Bajovic , D. Sejdinovic , Network Function Computation as a Service in Future 5G Machine Type Communications, in International Symposium on Turbo Codes & Iterative Information Processing (ISTC), 2016, 365–369.
The 3GPP machine type communications (MTC) service is expected to contribute a dominant share of the IoT traffic via the upcoming fifth generation (5G) mobile cellular systems. MTC has the ambition to connect billions of devices to communicate their data to MTC applications for further processing and data analysis. However, for the majority of applications, collecting all the MTC-generated data is inefficient as the data is typically fed into application-dependent functions whose outputs determine the application actions. In this paper, we present a novel MTC architecture that, instead of collecting raw large-volume MTC data, offers the network function computation (NFC) as a service. For a given application demand (function to be computed), different modules (atomic nodes) of the communication infrastructure are orchestrated into a (reconfigurable) directed network topology, and each module is assigned an appropriately defined (reconfigurable) atomic function over the input data, such that the desired global network function is evaluated over the MTC data and a requested MTC-NFC service is delivered. We detail the practical viability of incorporating MTC-NFC within the existing 3GPP architecture relying on emerging concepts of Network Function Virtualization and Software Defined Networking. Finally, throughout the paper, we point to the theoretical foundations that inspired the presented architecture, highlighting challenges and future directions for designing a 3GPP MTC-NFC service.
```
@inproceedings{VukJaketal2016b,
  author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D.},
  booktitle = {International Symposium on Turbo Codes \& Iterative Information Processing (ISTC)},
  pages = {365--369},
  title = {{Network Function Computation as a Service in Future 5G Machine Type Communications}},
  year = {2016},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ISTC.2016.7593138}
}
```
J. Mitrovic , D. Sejdinovic , Y. W. Teh , DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression, in International Conference on Machine Learning (ICML), 2016, 1482–1491.
Project: bigbayes

Performing exact posterior inference in complex generative models is often difficult or impossible due to an expensive-to-evaluate or intractable likelihood function. Approximate Bayesian computation (ABC) is an inference framework that constructs an approximation to the true likelihood based on the similarity between the observed and simulated data as measured by a predefined set of summary statistics. Although the choice of appropriate problem-specific summary statistics crucially influences the quality of the likelihood approximation and hence also the quality of the posterior sample in ABC, there are only a few principled general-purpose approaches to the selection or construction of such summary statistics. In this paper, we develop a novel framework for this task using kernel-based distribution regression. We model the functional relationship between data distributions and the optimal choice (with respect to a loss function) of summary statistics using kernel-based distribution regression. We show that our approach can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning. In addition, our framework shows superior performance when compared to related methods on toy and real-world problems.
```
@inproceedings{MitSejTeh2016,
  author = {Mitrovic, J. and Sejdinovic, D. and Teh, Y. W.},
  booktitle = {International Conference on Machine Learning (ICML)},
  pages = {1482--1491},
  title = {{DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression}},
  year = {2016},
  bdsk-url-1 = {http://jmlr.org/proceedings/papers/v48/mitrovic16.html}
}
```
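The random Fourier features approximation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DR-ABC implementation: it only shows the generic feature map for a Gaussian kernel, and the function names, the lengthscale of 1.5 and the number of features are assumptions chosen for illustration.

```python
import numpy as np

def rff_features(X, num_features, lengthscale, rng):
    """Map X (n, d) to random Fourier features for a Gaussian kernel."""
    d = X.shape[1]
    # Frequencies follow the spectral density of
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def gaussian_kernel(X, Y, lengthscale):
    """Exact Gaussian kernel matrix, used here only as a reference."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # illustrative data, not from the paper
Phi = rff_features(X, num_features=5000, lengthscale=1.5, rng=rng)

# Inner products of the features approximate the kernel matrix.
K_approx = Phi @ Phi.T
K_exact = gaussian_kernel(X, X, lengthscale=1.5)
max_abs_error = np.max(np.abs(K_approx - K_exact))
```

Working with the explicit feature matrix `Phi` instead of the full kernel matrix is what makes kernel methods of this kind scale linearly in the number of data points.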
G. Franchi , J. Angulo , D. Sejdinovic , Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings, in IEEE International Conference on Image Processing (ICIP), 2016, 1898–1902.
We propose a novel approach for pixel classification in hyperspectral images, leveraging both the spatial and spectral information in the data. The introduced method relies on a recently proposed framework for learning on distributions – by representing them with mean elements in reproducing kernel Hilbert spaces (RKHS) and formulating a classification algorithm therein. In particular, we associate each pixel with an empirical distribution of its neighbouring pixels, a judicious representation of which in an RKHS, in conjunction with the spectral information contained in the pixel itself, gives a new explicit set of features that can be fed into a suite of standard classification techniques – we opt for a well-established framework of support vector machines (SVM). Furthermore, the computational complexity is reduced via the random Fourier features formalism. We study the consistency and the convergence rates of the proposed method and the experiments demonstrate strong performance on hyperspectral data with gains in comparison to the state-of-the-art results.
```
@inproceedings{FraAngSej2016,
  author = {Franchi, G. and Angulo, J. and Sejdinovic, D.},
  booktitle = {IEEE International Conference on Image Processing (ICIP)},
  doi = {10.1109/ICIP.2016.7532688},
  pages = {1898--1902},
  title = {{Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings}},
  year = {2016},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ICIP.2016.7532688}
}
```
B. Paige , D. Sejdinovic , F. Wood , Super-Sampling with a Reservoir, in Uncertainty in Artificial Intelligence (UAI), 2016, 567–576.
We introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fixed-size subsample from streaming data in a single pass. Rather than draw a random sample, our approach performs an online optimization which aims to select the subset which provides the best overall approximation to the full data set, as judged using a kernel two-sample test. This produces subsets which minimize the worst-case relative error when computing expectations of functions in a specified function class, using just the samples from the subset. Kernel functions are approximated using random Fourier features, and the subset of samples itself is stored in a random projection tree, allowing for an algorithm which runs in a single pass through the whole data set, with only a logarithmic time complexity in the size of the subset at each iteration. These “super-samples” subsampled from the full data provide a concise summary, as demonstrated empirically on mixture models and the MNIST dataset.
```
@inproceedings{PaiSejWoo2016,
  author = {Paige, B. and Sejdinovic, D. and Wood, F.},
  booktitle = {Uncertainty in Artificial Intelligence (UAI)},
  pages = {567--576},
  title = {{Super-Sampling with a Reservoir}},
  year = {2016},
  bdsk-url-1 = {http://www.auai.org/uai2016/proceedings/papers/293.pdf}
}
```
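For context, the classic reservoir sampling baseline that the abstract above contrasts with can be sketched as follows. This is the standard single-pass uniform subsampling scheme, not the super-sampling method of the paper; the function name and the `seed` parameter are illustrative assumptions.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Draw a uniform random subsample of size k from a stream in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(10_000), k=5, seed=0)
```

The paper replaces the random replacement step with an online optimisation of a kernel two-sample criterion; that step is not reproduced here.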
S. Flaxman , D. Sejdinovic , J. Cunningham , S. Filippi , Bayesian Learning of Kernel Embeddings, in Uncertainty in Artificial Intelligence (UAI), 2016, 182–191.
Project: bigbayes

Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines its probability measure, so it can be used to design a powerful statistical testing framework, which includes nonparametric two-sample and independence tests. In practice, however, the performance of these tests can be very sensitive to the choice of kernel and its lengthscale parameters. To address this central issue, we propose a new probabilistic model for kernel mean embeddings, the Bayesian Kernel Embedding model, combining a Gaussian process prior over the Reproducing Kernel Hilbert Space containing the mean embedding with a conjugate likelihood function, thus yielding a closed form posterior over the mean embedding. The posterior mean of our model is closely related to recently proposed shrinkage estimators for kernel mean embeddings, while the posterior uncertainty is a new, interesting feature with various possible applications. Critically for the purposes of kernel learning, our model gives a simple, closed form marginal pseudolikelihood of the observed data given the kernel hyperparameters. This marginal pseudolikelihood can either be optimized to inform the hyperparameter choice or fully Bayesian inference can be used.
```
@inproceedings{FlaSejCunFil2016,
  author = {Flaxman, S. and Sejdinovic, D. and Cunningham, J.P. and Filippi, S.},
  booktitle = {Uncertainty in Artificial Intelligence (UAI)},
  pages = {182--191},
  title = {{Bayesian Learning of Kernel Embeddings}},
  year = {2016},
  bdsk-url-1 = {http://www.auai.org/uai2016/proceedings/papers/145.pdf},
  bdsk-url-2 = {http://www.auai.org/uai2016/proceedings/supp/145_supp.pdf}
}
```
M. Park , W. Jitkrittum , D. Sejdinovic , K2-ABC: Approximate Bayesian Computation with Kernel Embeddings, in Artificial Intelligence and Statistics (AISTATS), 2016, 398–407.
Complicated generative models often result in a situation where computing the likelihood of observed data is intractable, while simulating from the conditional density given a parameter value is relatively easy. Approximate Bayesian Computation (ABC) is a paradigm that enables simulation-based posterior inference in such cases by measuring the similarity between simulated and observed data in terms of a chosen set of summary statistics. However, there is no general rule to construct sufficient summary statistics for complex models. Insufficient summary statistics will “leak” information, which leads to ABC algorithms yielding samples from an incorrect (partial) posterior. In this paper, we propose a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics. Our approach, K2-ABC, uses maximum mean discrepancy (MMD) to construct a dissimilarity measure between the observed and simulated data. The embedding of an empirical distribution of the data into a reproducing kernel Hilbert space plays the role of the summary statistic and is sufficient whenever the corresponding kernels are characteristic. Experiments on a simulated scenario and a real-world biological problem illustrate the effectiveness of the proposed algorithm.
```
@inproceedings{ParJitSej2016,
  author = {Park, M. and Jitkrittum, W. and Sejdinovic, D.},
  booktitle = {Artificial Intelligence and Statistics (AISTATS)},
  pages = {398--407},
  title = {{K2-ABC: Approximate Bayesian Computation with Kernel Embeddings}},
  year = {2016},
  bdsk-url-1 = {http://jmlr.org/proceedings/papers/v51/park16.html},
  bdsk-url-2 = {https://github.com/wittawatj/k2abc}
}
```
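The idea of using the MMD as the dissimilarity between observed and simulated data can be sketched as below. This is a minimal illustration rather than the reference implementation linked in the entry above: it assumes a Gaussian kernel, a biased (V-statistic) MMD estimate, and a soft weighting of prior draws by `exp(-MMD^2 / epsilon)`; the toy model, `epsilon`, and all function names are assumptions made for the example.

```python
import numpy as np

def gaussian_gram(A, B, lengthscale):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lengthscale**2))

def mmd2(X, Y, lengthscale):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (gaussian_gram(X, X, lengthscale).mean()
            + gaussian_gram(Y, Y, lengthscale).mean()
            - 2.0 * gaussian_gram(X, Y, lengthscale).mean())

def mmd_abc_weights(observed, sample_prior, simulate, num_draws,
                    epsilon, lengthscale, rng):
    """Weight prior draws by how close their simulated data are to the observed data."""
    thetas = np.array([sample_prior(rng) for _ in range(num_draws)])
    d2 = np.array([mmd2(simulate(t, rng), observed, lengthscale) for t in thetas])
    w = np.exp(-d2 / epsilon)  # assumed soft-ABC weighting
    return thetas, w / w.sum()

# Toy model (hypothetical): data are N(theta, 1), prior is Uniform(-5, 5).
rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, size=(200, 1))
thetas, weights = mmd_abc_weights(
    observed,
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda t, r: r.normal(loc=t, size=(200, 1)),
    num_draws=500, epsilon=0.01, lengthscale=1.0, rng=rng,
)
posterior_mean = float(np.sum(weights * thetas))
```

No hand-picked summary statistics appear anywhere in the sketch: the kernel embedding of each simulated sample takes their place.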

2015

H. Strathmann , D. Sejdinovic , M. Girolami , Unbiased Bayes for Big Data: Paths of Partial Posteriors, ArXiv e-prints:1501.03326, 2015.
Key quantities of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so-called Big Data as all data needs to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations – without explicit simulation from the full posterior. The scheme’s variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset and its free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.
```
@unpublished{StrSejGir2015,
  author = {Strathmann, H. and Sejdinovic, D. and Girolami, M.},
  note = {ArXiv e-prints:1501.03326},
  title = {{Unbiased Bayes for Big Data: Paths of Partial Posteriors}},
  year = {2015}
}
```
H. Strathmann , D. Sejdinovic , S. Livingstone , Z. Szabo , A. Gretton , Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families, in Advances in Neural Information Processing Systems (NeurIPS), vol. 28, 2015, 955–963.
We propose Kamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities where HMC is unavailable due to intractable gradients, KMC adaptively learns the target’s gradient structure by fitting an exponential family model in a Reproducing Kernel Hilbert Space. Computational costs are reduced by two novel efficient approximations to this gradient. While being asymptotically exact, KMC mimics HMC in terms of sampling efficiency and offers substantial mixing improvements over state-of-the-art gradient-free samplers. We support our claims with experimental studies on both toy and real-world applications, including Approximate Bayesian Computation and exact-approximate MCMC.
```
@incollection{StrSejLivSzaGre2015,
  author = {Strathmann, H. and Sejdinovic, D. and Livingstone, S. and Szabo, Z. and Gretton, A.},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  pages = {955--963},
  title = {{Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families}},
  volume = {28},
  year = {2015},
  bdsk-url-1 = {http://papers.nips.cc/paper/5890-gradient-free-hamiltonian-monte-carlo-with-efficient-kernel-exponential-families}
}
```
K. Chwialkowski , A. Ramdas , D. Sejdinovic , A. Gretton , Fast Two-Sample Testing with Analytic Representations of Probability Measures, in Advances in Neural Information Processing Systems (NeurIPS), vol. 28, 2015, 1981–1989.
We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions, the second uses distribution embeddings in a reproducing kernel Hilbert space. Analyticity implies that differences in the distributions may be detected almost surely at a finite number of randomly chosen locations/frequencies. The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate that our tests give a better power/time tradeoff than competing approaches, and in some cases, better outright power than even the most expensive quadratic-time tests. This performance advantage is retained even in high dimensions, and in cases where the difference in distributions is not observable with low order statistics.
```
@incollection{ChwRamSejGre2015,
  author = {Chwialkowski, K. and Ramdas, A. and Sejdinovic, D. and Gretton, A.},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  pages = {1981--1989},
  title = {{Fast Two-Sample Testing with Analytic Representations of Probability Measures}},
  volume = {28},
  year = {2015},
  bdsk-url-1 = {http://papers.nips.cc/paper/5685-fast-two-sample-testing-with-analytic-representations-of-probability-measures}
}
```
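The second test described in the abstract above, which compares kernel mean embeddings at a small number of random locations, can be sketched as follows. This is a hedged illustration and not the authors' code: it assumes equal sample sizes, a Gaussian kernel, a Hotelling-type normalisation of the mean difference, and a small `ridge` term added for numerical stability; all names and parameter values are illustrative.

```python
import numpy as np

def gaussian_features(X, locations, lengthscale):
    """Evaluate a Gaussian kernel between each sample point and each test location."""
    sq = np.sum((X[:, None, :] - locations[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lengthscale**2))

def mean_embedding_statistic(X, Y, locations, lengthscale, ridge=1e-8):
    """Normalised squared difference of empirical mean embeddings at J locations.

    Cost is linear in the sample size n. Under the null hypothesis the
    statistic is compared against a chi-squared distribution with J degrees
    of freedom (assumed asymptotic calibration).
    """
    Z = (gaussian_features(X, locations, lengthscale)
         - gaussian_features(Y, locations, lengthscale))
    n, J = Z.shape
    z_bar = Z.mean(axis=0)
    S = np.atleast_2d(np.cov(Z, rowvar=False)) + ridge * np.eye(J)
    return float(n * z_bar @ np.linalg.solve(S, z_bar))

rng = np.random.default_rng(0)
n, d, J = 500, 2, 5
locations = rng.normal(size=(J, d))  # randomly chosen test locations
X = rng.normal(size=(n, d))
Y_same = rng.normal(size=(n, d))            # same distribution as X
Y_shift = rng.normal(loc=1.0, size=(n, d))  # mean-shifted alternative

stat_same = mean_embedding_statistic(X, Y_same, locations, lengthscale=1.0)
stat_shift = mean_embedding_statistic(X, Y_shift, locations, lengthscale=1.0)
```

Only `J` kernel evaluations per data point are needed, which is where the linear cost in the sample size comes from.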
D. Vukobratovic , D. Sejdinovic , A. Pizurica , Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective, in IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2015.
Compressed Sensing (CS) methods using sparse binary measurement matrices and iterative message-passing recovery procedures have been recently investigated due to their low computational complexity and excellent performance. Drawing much inspiration from sparse-graph codes such as Low-Density Parity-Check (LDPC) codes, these studies use analytical tools from modern coding theory to analyze CS solutions. In this paper, we consider and systematically analyze the CS setup inspired by a class of efficient, popular and flexible sparse-graph codes called rateless codes. The proposed rateless CS setup is asymptotically analyzed using tools such as Density Evolution and EXIT charts and fine-tuned using degree distribution optimization techniques.
```
@inproceedings{VukSejPiz2015,
  author = {Vukobratovic, Dejan and Sejdinovic, Dino and Pizurica, Aleksandra},
  booktitle = {IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)},
  doi = {10.1109/SPAWC.2015.7227005},
  title = {Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective},
  year = {2015},
  bdsk-url-1 = {http://dx.doi.org/10.1109/SPAWC.2015.7227005}
}
```
Z. Kurth-Nelson , G. Barnes , D. Sejdinovic , R. Dolan , P. Dayan , Temporal structure in associative retrieval, eLife, vol. 4, no. e04919, 2015.
Electrophysiological data disclose rich dynamics in patterns of neural activity evoked by sensory objects. Retrieving objects from memory reinstates components of this activity. In humans, the temporal structure of this retrieved activity remains largely unexplored, and here we address this gap using the spatiotemporal precision of magnetoencephalography (MEG). In a sensory preconditioning paradigm, ‘indirect’ objects were paired with ‘direct’ objects to form associative links, and the latter were then paired with rewards. Using multivariate analysis methods we examined the short-time evolution of neural representations of indirect objects retrieved during reward-learning about direct objects. We found two components of the evoked representation of the indirect stimulus, 200 ms apart. The strength of retrieval of one, but not the other, representational component correlated with generalization of reward learning from direct to indirect stimuli. We suggest the temporal structure within retrieved neural representations may be key to their function.
```
@article{KNeBarSejDolDay2015,
  author = {Kurth-Nelson, Zeb and Barnes, Gareth and Sejdinovic, Dino and Dolan, Ray and Dayan, Peter},
  doi = {10.7554/eLife.04919},
  journal = {eLife},
  number = {e04919},
  publisher = {eLife Sciences Publications Limited},
  title = {Temporal structure in associative retrieval},
  volume = {4},
  year = {2015},
  bdsk-url-1 = {http://dx.doi.org/10.7554/eLife.04919}
}
```
W. Jitkrittum , A. Gretton , N. Heess , S. M. A. Eslami , B. Lakshminarayanan , D. Sejdinovic , Z. Szabó , Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages, in Uncertainty in Artificial Intelligence (UAI), 2015.
We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages, and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates, and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where the ability to accurately assess uncertainty and to efficiently and robustly update the message operator are essential.
```
@inproceedings{JitGreHeeEslLakSejSza2015,
  author = {Jitkrittum, Wittawat and Gretton, Arthur and Heess, Nicolas and Eslami, S. M. Ali and Lakshminarayanan, Balaji and Sejdinovic, Dino and Szab\'{o}, Zolt\'{a}n},
  booktitle = {Uncertainty in Artificial Intelligence (UAI)},
  title = {{Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages}},
  year = {2015},
  bdsk-url-1 = {http://auai.org/uai2015/proceedings/papers/235.pdf},
  bdsk-url-2 = {http://auai.org/uai2015/proceedings/supp/239_supp.pdf},
  bdsk-url-3 = {https://github.com/wittawatj/kernel-ep}
}
```

2014

K. Chwialkowski , D. Sejdinovic , A. Gretton , A Wild Bootstrap for Degenerate Kernel Tests, in Advances in Neural Information Processing Systems (NeurIPS), vol. 27, 2014, 3608–3616.
A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes, for which the naive permutation-based bootstrap fails. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis, and non-degenerate elsewhere. To illustrate this approach, we construct a two-sample test, an instantaneous independence test and a multiple lag independence test for time series. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.
```
@incollection{ChwSejGre2014,
  author = {Chwialkowski, Kacper and Sejdinovic, Dino and Gretton, Arthur},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  pages = {3608--3616},
  title = {A Wild Bootstrap for Degenerate Kernel Tests},
  volume = {27},
  year = {2014},
  bdsk-url-1 = {http://papers.nips.cc/paper/5452-a-wild-bootstrap-for-degenerate-kernel-tests.pdf},
  bdsk-url-2 = {https://github.com/kacperChwialkowski/wildBootstrap},
  bdsk-url-3 = {http://research.microsoft.com/apps/video/?id=240378}
}
```
D. Sejdinovic , H. Strathmann , M. Lomeli , C. Andrieu , A. Gretton , Kernel Adaptive Metropolis-Hastings, in International Conference on Machine Learning (ICML), 2014, 1665–1673.
A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.
```
@inproceedings{SejStrGarAndGre14,
  author = {Sejdinovic, D. and Strathmann, H. and Lomeli, M.G. and Andrieu, C. and Gretton, A.},
  booktitle = {International Conference on Machine Learning (ICML)},
  code = {https://github.com/karlnapf/kameleon-mcmc},
  pages = {1665--1673},
  title = {{Kernel Adaptive Metropolis-Hastings}},
  year = {2014},
  bdsk-url-1 = {http://jmlr.org/proceedings/papers/v32/sejdinovic14.pdf},
  bdsk-url-2 = {http://jmlr.org/proceedings/papers/v32/sejdinovic14-supp.zip}
}
```
O. Johnson , D. Sejdinovic , J. Cruise , R. Piechocki , A. Ganesh , Non-Parametric Change-Point Estimation using String Matching Algorithms, Methodology and Computing in Applied Probability, vol. 16, no. 4, 987–1008, 2014.
Given the output of a data source taking values in a finite alphabet, we wish to estimate change-points, that is times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately estimate the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, establishing a fluid limit and using martingale arguments.
```
@article{JohSejCruPieGan2014,
  author = {Johnson, Oliver and Sejdinovic, Dino and Cruise, James and Piechocki, Robert and Ganesh, Ayalvadi},
  doi = {10.1007/s11009-013-9359-2},
  issn = {1387-5841},
  journal = {Methodology and Computing in Applied Probability},
  number = {4},
  pages = {987--1008},
  publisher = {Springer US},
  title = {{Non-Parametric Change-Point Estimation using String Matching Algorithms}},
  volume = {16},
  year = {2014},
  bdsk-url-1 = {http://dx.doi.org/10.1007/s11009-013-9359-2}
}
```

2013

D. Sejdinovic , B. Sriperumbudur , A. Gretton , K. Fukumizu , Equivalence of distance-based and RKHS-based statistics in hypothesis testing, Annals of Statistics, vol. 41, no. 5, 2263–2291, Oct. 2013.
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
```
@article{SejSriGreFuk2013,
  author = {Sejdinovic, Dino and Sriperumbudur, Bharath and Gretton, Arthur and Fukumizu, Kenji},
  doi = {10.1214/13-AOS1140},
  journal = {Annals of Statistics},
  month = oct,
  number = {5},
  pages = {2263--2291},
  title = {{Equivalence of distance-based and RKHS-based statistics in hypothesis testing}},
  volume = {41},
  year = {2013},
  bdsk-url-1 = {http://dx.doi.org/10.1214/13-AOS1140}
}
```
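The equivalence stated in the abstract above can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the Euclidean distance as the semimetric, an arbitrary centre point `z0`, and biased (V-statistic) estimators. All function names are illustrative. With the distance-induced kernel k(a, b) = 0.5 (|a - z0| + |b - z0| - |a - b|), the empirical energy distance equals twice the empirical squared MMD.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def energy_distance(x, y):
    """V-statistic estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (2.0 * pairwise_dist(x, y).mean()
            - pairwise_dist(x, x).mean()
            - pairwise_dist(y, y).mean())

def distance_kernel(a, b, z0):
    """Distance-induced kernel centred at z0 (z0 is an arbitrary choice)."""
    da = pairwise_dist(a, z0[None, :])      # shape (n, 1)
    db = pairwise_dist(b, z0[None, :]).T    # shape (1, m)
    return 0.5 * (da + db - pairwise_dist(a, b))

def mmd_squared(x, y, z0):
    """Biased (V-statistic) estimate of the squared MMD."""
    return (distance_kernel(x, x, z0).mean()
            + distance_kernel(y, y, z0).mean()
            - 2.0 * distance_kernel(x, y, z0).mean())

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
y = rng.normal(loc=0.5, size=(60, 3))
z0 = np.zeros(3)

energy = energy_distance(x, y)
mmd2 = mmd_squared(x, y, z0)
print(np.isclose(energy, 2.0 * mmd2))
```

The terms involving `z0` cancel in the MMD, so the identity does not depend on which centre point is chosen.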
D. Sejdinovic , A. Gretton , W. Bergsma , A Kernel Test for Three-Variable Interactions, in Advances in Neural Information Processing Systems (NeurIPS), vol. 26, 2013, 1124–1132.
We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful three-variable interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
```
@incollection{SejGreBer2013,
  author = {Sejdinovic, Dino and Gretton, Arthur and Bergsma, Wicher},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  pages = {1124--1132},
  title = {A Kernel Test for Three-Variable Interactions},
  volume = {26},
  year = {2013},
  bdsk-url-1 = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions.pdf},
  bdsk-url-2 = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions-supplemental.zip},
  bdsk-url-3 = {http://www.gatsby.ucl.ac.uk/%7Egretton/interact/threeWayInteract.htm},
  bdsk-url-4 = {http://research.microsoft.com/apps/video/default.aspx?id=206943}
}
```

2012

A. Gretton , B. K. Sriperumbudur , D. Sejdinovic , H. Strathmann , S. Balakrishnan , M. Pontil , K. Fukumizu , Optimal Kernel Choice for Large-Scale Two-Sample Tests, in Advances in Neural Information Processing Systems (NeurIPS), vol. 25, 2012, 1205–1213.
Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p=q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is thus critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
```
@incollection{GreSriSejStrBalPonFuk2012,
  author = {Gretton, Arthur and Sriperumbudur, Bharath K. and Sejdinovic, Dino and Strathmann, Heiko and Balakrishnan, Sivaraman and Pontil, Massimiliano and Fukumizu, Kenji},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  code = {http://www.gatsby.ucl.ac.uk/%7Egretton/adaptMMD/adaptMMD.htm},
  pages = {1205--1213},
  title = {Optimal Kernel Choice for Large-Scale Two-Sample Tests},
  volume = {25},
  year = {2012},
  bdsk-url-1 = {http://papers.nips.cc/paper/4727-optimal-kernel-choice-for-large-scale-two-sample-tests.pdf}
}
```
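A linear-cost MMD statistic of the kind the abstract above refers to can be sketched as follows. This is an illustrative simplification rather than the paper's procedure: it assumes a Gaussian kernel, selects a single bandwidth from a finite candidate grid by the ratio of the estimate to the standard deviation of its summands, and does not reproduce the paper's optimization over kernel parameters. The names `linear_time_mmd`, `select_bandwidth` and the stabilising constant `eps` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel evaluated on paired rows a[i], b[i]."""
    sq = ((a - b) ** 2).sum(axis=1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def linear_time_mmd(x, y, bandwidth):
    """Average over disjoint sample pairs; cost is linear in sample size."""
    m = (min(len(x), len(y)) // 2) * 2
    x1, x2 = x[0:m:2], x[1:m:2]
    y1, y2 = y[0:m:2], y[1:m:2]
    h = (gaussian_kernel(x1, x2, bandwidth)
         + gaussian_kernel(y1, y2, bandwidth)
         - gaussian_kernel(x1, y2, bandwidth)
         - gaussian_kernel(x2, y1, bandwidth))
    return h.mean(), h.std(ddof=1)

def select_bandwidth(x, y, candidates, eps=1e-8):
    """Pick the candidate with the largest estimate / (std + eps) ratio."""
    ratios = []
    for bw in candidates:
        est, std = linear_time_mmd(x, y, bw)
        ratios.append(est / (std + eps))
    return candidates[int(np.argmax(ratios))], ratios

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 2))
y = rng.normal(loc=0.3, size=(2000, 2))
candidates = [0.1, 0.5, 1.0, 2.0, 5.0]
best, ratios = select_bandwidth(x, y, candidates)
print(best)
```

In practice the kernel would be selected on one part of the data and the test run on a held-out part, so that the selection step does not affect the test level.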
D. Sejdinovic , A. Gretton , B. K. Sriperumbudur , K. Fukumizu , Hypothesis testing using pairwise distances and associated kernels, in International Conference on Machine Learning (ICML), 2012, 1111–1118.
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
```
@inproceedings{SejGreSriFuk12,
  author = {Sejdinovic, D. and Gretton, Arthur and Sriperumbudur, Bharath K. and Fukumizu, Kenji},
  booktitle = {International Conference on Machine Learning (ICML)},
  pages = {1111--1118},
  title = {{Hypothesis testing using pairwise distances and associated kernels}},
  year = {2012},
  bdsk-url-1 = {http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sejdinovic_575.pdf},
  bdsk-url-2 = {http://techtalks.tv/talks/57320/}
}
```
R. Piechocki , D. Sejdinovic , Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks, in IEEE International Conference on Communications (ICC), 2012.
In this paper we introduce a novel modulation and multiplexing method which facilitates highly efficient and simultaneous communication between multiple terminals in wireless ad-hoc networks. We term this method Combinatorial Channel Signature Modulation (CCSM). The CCSM method is particularly efficient in situations where communicating nodes operate in highly time dispersive environments. This is all achieved with a minimal MAC layer overhead, since all users are allowed to transmit and receive at the same time/frequency (full simultaneous duplex). The CCSM method has its roots in sparse modelling and the receiver is based on compressive sampling techniques. Towards this end, we develop a new low complexity algorithm termed Group Subspace Pursuit. Our analysis suggests that CCSM at least doubles the throughput when compared to the state of the art.
```
@inproceedings{PieSej2012,
  author = {Piechocki, R. and Sejdinovic, D.},
  booktitle = {IEEE International Conference on Communications (ICC)},
  doi = {10.1109/ICC.2012.6363956},
  file = {pdf/2012ICC.pdf},
  title = {{Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks}},
  year = {2012},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ICC.2012.6363956}
}
```
A. Muller , D. Sejdinovic , R. Piechocki , Approximate Message Passing under Finite Alphabet Constraints, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012.
In this paper we consider Basis Pursuit De-Noising (BPDN) problems in which the sparse original signal is drawn from a finite alphabet. To solve this problem we propose an iterative message passing algorithm, which capitalises not only on the sparsity but by means of a prior distribution also on the discrete nature of the original signal. In our numerical experiments we test this algorithm in combination with a Rademacher measurement matrix and a measurement matrix derived from the random demodulator, which enables compressive sampling of analogue signals. Our results show in both cases significant performance gains over a linear programming based approach to the considered BPDN problem. We also compare the proposed algorithm to a similar message passing based algorithm without prior knowledge and observe an even larger performance improvement.
```
@inproceedings{MulSejPie2012,
  author = {Muller, A. and Sejdinovic, D. and Piechocki, R.},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  doi = {10.1109/ICASSP.2012.6288590},
  file = {pdf/2012ICASSP.pdf},
  title = {{Approximate Message Passing under Finite Alphabet Constraints}},
  year = {2012},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ICASSP.2012.6288590}
}
```

2011

W. Dai , D. Sejdinovic , O. Milenkovic , Gaussian Dynamic Compressive Sensing, in International Conference on Sampling Theory and Applications (SampTA), 2011.
We consider the problem of estimating a discrete-time sequence of sparse signals with Gaussian innovations. Instances of such problems arise in networking and imaging, in particular, dynamic and interventional MRI imaging. Our approach combines Kalman filtering and compressive sensing (CS) techniques by introducing a sparse MAP estimator for Gaussian signals, and then developing a CS-type algorithm for solving the sparse MAP problem. Despite the underlying assumption that the sequence of sparse signals is Gaussian, our approach also allows for efficient tracking of sparse non-Gaussian signals obtained via non-linear mappings, using only one sample/observation per time instance.
```
@inproceedings{DaiSejMil2011,
  author = {Dai, W. and Sejdinovic, D. and Milenkovic, O.},
  booktitle = {International Conference on Sampling Theory and Applications (SampTA)},
  title = {{Gaussian Dynamic Compressive Sensing}},
  year = {2011},
  bdsk-url-1 = {http://sampta2011.ntu.edu.sg/SampTA2011Proceedings/papers/Mo5S02.2-P0186.pdf}
}
```

2010

D. Sejdinovic , O. Johnson , Note on noisy group testing: asymptotic bounds and belief propagation reconstruction, in 48th Annual Allerton Conference on Communication, Control, and Computing, 2010, 998–1003.
An information theoretic perspective on group testing problems has recently been proposed by Atia and Saligrama, in order to characterise the optimal number of tests. Their results hold in three cases: the noiseless case, the case where only false positives occur, and the case where only false negatives occur. We extend their results to a model containing both false positives and false negatives, developing simple information theoretic bounds on the number of tests required. Based on these bounds, we obtain an improved order of convergence in the case of false negatives only. Since these results are based on (computationally infeasible) joint typicality decoding, we propose a belief propagation algorithm for the detection of defective items and compare its actual performance to the theoretical bounds.
```
@inproceedings{SejJoh2010,
  author = {Sejdinovic, D. and Johnson, O.},
  booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing},
  doi = {10.1109/ALLERTON.2010.5707018},
  pages = {998--1003},
  title = {Note on noisy group testing: asymptotic bounds and belief propagation reconstruction},
  year = {2010},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ALLERTON.2010.5707018}
}
```
D. Sejdinovic , R. Piechocki , A. Doufexi , M. Ismail , Decentralised distributed fountain coding: asymptotic analysis and design, IEEE Communications Letters, vol. 14, no. 1, 42–44, 2010.
A class of generic decentralised distributed fountain coding schemes is introduced and the tools of analysis of the performance of such schemes are presented. It is demonstrated that the developed approach can be used to formulate a robust code design methodology in a number of instances. We show that two non-standard applications of fountain codes, fountain codes for distributed source coding and fountain codes for unequal error protection lie within this decentralised distributed fountain coding framework.
```
@article{SejPieDouIsm2010,
  author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.},
  doi = {10.1109/LCOMM.2010.01.091541},
  file = {pdf/2010CommLetter.pdf},
  journal = {IEEE Communications Letters},
  number = {1},
  pages = {42--44},
  title = {Decentralised distributed fountain coding: asymptotic analysis and design},
  volume = {14},
  year = {2010},
  bdsk-url-1 = {http://dx.doi.org/10.1109/LCOMM.2010.01.091541}
}
```
D. Sejdinovic , C. Andrieu , R. Piechocki , Bayesian sequential compressed sensing in sparse dynamical systems, in 48th Annual Allerton Conference on Communication, Control, and Computing, 2010, 1730–1736.
While the theory of compressed sensing provides means to reliably and efficiently acquire a sparse high-dimensional signal from a small number of its linear projections, sensing of dynamically changing sparse signals is still not well understood. We pursue a Bayesian approach to the problem of sequential compressed sensing and develop methods to recursively estimate the full posterior distribution of the signal.
```
@inproceedings{SejAndPie2010,
  author = {Sejdinovic, D. and Andrieu, C. and Piechocki, R.},
  booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing},
  doi = {10.1109/ALLERTON.2010.5707125},
  pages = {1730--1736},
  title = {Bayesian sequential compressed sensing in sparse dynamical systems},
  year = {2010},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ALLERTON.2010.5707125}
}
```

2009

D. Sejdinovic , D. Vukobratovic , A. Doufexi , V. Senk , R. Piechocki , Expanding window fountain codes for unequal error protection, IEEE Transactions on Communications, vol. 57, no. 9, 2510–2516, 2009.
A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.
```
@article{SejVukDouSenPie2009,
  author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.},
  doi = {10.1109/TCOMM.2009.09.070616},
  file = {pdf/2009TransComm.pdf},
  journal = {IEEE Transactions on Communications},
  number = {9},
  pages = {2510--2516},
  title = {Expanding window fountain codes for unequal error protection},
  volume = {57},
  year = {2009},
  bdsk-url-1 = {http://dx.doi.org/10.1109/TCOMM.2009.09.070616}
}
```
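The windowing technique described in the abstract above can be sketched for a single output symbol. This is a minimal illustration under stated assumptions, not the authors' implementation: windows are taken to be nested prefixes of the source block, each with its own degree distribution, and source symbols are small integers combined by XOR. The function name, parameter names and all numerical values are hypothetical.

```python
import random

def ewf_encode_symbol(source, window_sizes, window_probs, degree_dists, rng):
    """Generate one output symbol; returns (neighbour indices, XOR value)."""
    # Choose a window; window j covers the first window_sizes[j] symbols.
    j = rng.choices(range(len(window_sizes)), weights=window_probs)[0]
    degrees, probs = degree_dists[j]
    # Sample a degree for that window, capped at the window size.
    d = min(rng.choices(degrees, weights=probs)[0], window_sizes[j])
    # Pick d distinct source symbols uniformly from within the window.
    neighbours = rng.sample(range(window_sizes[j]), d)
    value = 0
    for idx in neighbours:
        value ^= source[idx]
    return sorted(neighbours), value

rng = random.Random(0)
source = [rng.getrandbits(8) for _ in range(10)]
window_sizes = [3, 10]          # the first 3 symbols form the important class
window_probs = [0.2, 0.8]       # illustrative window selection probabilities
degree_dists = [
    ([1, 2, 3], [0.3, 0.5, 0.2]),
    ([1, 2, 3, 4], [0.1, 0.5, 0.3, 0.1]),
]
encoded = [
    ewf_encode_symbol(source, window_sizes, window_probs, degree_dists, rng)
    for _ in range(200)
]
```

Because the first window is contained in every larger window, its symbols take part in more output symbols on average, which is the source of the unequal protection.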
D. Vukobratovic , V. Stankovic , D. Sejdinovic , L. Stankovic , Z. Xiong , Scalable video multicast using expanding window fountain codes, IEEE Transactions on Multimedia, vol. 11, no. 6, 1094–1104, 2009.
Fountain codes were introduced as an efficient and universal forward error correction (FEC) solution for data multicast over lossy packet networks. They have recently been proposed for large scale multimedia content delivery in practical multimedia distribution systems. However, standard fountain codes, such as LT or Raptor codes, are not designed to meet unequal error protection (UEP) requirements typical in real-time scalable video multicast applications. In this paper, we propose recently introduced UEP expanding window fountain (EWF) codes as a flexible and efficient solution for real-time scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be tuned at the server side to meet the different reception criteria of heterogeneous receivers. The performance analysis using both analytical results and simulation experiments of H.264 scalable video coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.
```
@article{VukStaSejStaXio2009,
  author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
  doi = {10.1109/TMM.2009.2026087},
  file = {pdf/2009TransMultimedia.pdf},
  journal = {IEEE Transactions on Multimedia},
  number = {6},
  pages = {1094--1104},
  title = {Scalable video multicast using expanding window fountain codes},
  volume = {11},
  year = {2009},
  bdsk-url-1 = {http://dx.doi.org/10.1109/TMM.2009.2026087}
}
```
D. Sejdinovic , R. Piechocki , A. Doufexi , M. Ismail , Fountain code design for data multicast with side information, IEEE Transactions on Wireless Communications, vol. 8, no. 10, 5155–5165, 2009.
Fountain codes are a robust solution for data multicasting to a large number of receivers which experience variable channel conditions and different packet loss rates. However, the standard fountain code design becomes inefficient if all receivers have access to some side information correlated with the source information. We focus our attention on the cases where the correlation of the source and side information can be modelled by a binary erasure channel (BEC) or by a binary input additive white Gaussian noise channel (BIAWGNC). We analyse the performance of fountain codes in data multicasting with side information for these cases, derive bounds on their performance and provide a fast and robust linear programming optimization framework for code parameters. We demonstrate that systematic Raptor code design can be employed as a possible solution to the problem at the cost of higher encoding/decoding complexity, as it reduces the side information scenario to a channel coding problem. However, our results also indicate that a simpler solution, non-systematic LT and Raptor codes, can be designed to perform close to the information theoretic bounds.
```
@article{SejPieDouIsm2009,
  author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.},
  doi = {10.1109/TWC.2009.081076},
  file = {pdf/2009TransWirelessComm.pdf},
  journal = {IEEE Transactions on Wireless Communications},
  number = {10},
  pages = {5155--5165},
  publisher = {IEEE},
  title = {Fountain code design for data multicast with side information},
  volume = {8},
  year = {2009},
  bdsk-url-1 = {http://dx.doi.org/10.1109/TWC.2009.081076}
}
```
D. Sejdinovic , R. Piechocki , A. Doufexi , AND-OR tree analysis of distributed LT codes, in IEEE Information Theory Workshop (ITW), 2009, 261–265.
In this contribution, we consider the design of distributed LT codes, i.e., independent rateless encodings of multiple sources which communicate to a common relay, where the relay is able to combine incoming packets from the sources and forward them to receivers. We provide density evolution formulae for distributed LT codes, which allow us to formulate the distributed LT code design problem and prove the equivalence of performance of distributed LT codes and LT codes with related parameters in the asymptotic regime. Furthermore, we demonstrate that allowing LT coding apparatus at both the sources and the relay may prove advantageous over coding only at the sources and coding only at the relay.
```
@inproceedings{SejPieDou2009ITW,
  author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.},
  booktitle = {IEEE Information Theory Workshop (ITW)},
  doi = {10.1109/ITWNIT.2009.5158583},
  file = {pdf/2009ITW.pdf},
  pages = {261--265},
  title = {{AND-OR tree analysis of distributed LT codes}},
  year = {2009},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ITWNIT.2009.5158583}
}
```
D. Vukobratovic , V. Stankovic , L. Stankovic , D. Sejdinovic , Precoded EWF codes for unequal error protection of scalable video, in International ICST Mobile Multimedia Communications Conference (MOBIMEDIA), 2009.
Rateless codes are forward error correcting (FEC) codes of linear encoding-decoding complexity and asymptotically capacity-approaching performance over erasure channels with any erasure statistics. They have been recently recognized as a simple and efficient solution for packetized video transmission over networks with packet erasures. However, to adapt the error correcting capabilities of rateless codes to the unequal importance of scalable video, unequal error protection (UEP) rateless codes are proposed as an alternative to standard rateless codes. In this paper, we extend our recent work on UEP rateless codes called Expanding Window Fountain (EWF) codes in order to improve their UEP performance. We investigate the design of precoded EWF codes, where precoding is done using high-rate Low-Density Parity-Check (LDPC) codes, following reasoning similar to that applied in the design of Raptor codes. The obtained results are presented in the context of UEP error correcting performance of EWF codes and applied to scalable video coded (SVC) transmission over erasure networks.
```
@inproceedings{VukStaStaSej2009,
  author = {Vukobratovic, D. and Stankovic, V. and Stankovic, L. and Sejdinovic, D.},
  booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)},
  file = {pdf/2009MobimediaB.pdf},
  title = {{Precoded EWF codes for unequal error protection of scalable video}},
  year = {2009},
  bdsk-url-1 = {http://portal.acm.org/citation.cfm?id=1653559}
}
```
D. Sejdinovic , R. Piechocki , A. Doufexi , Rateless distributed source code design, in International ICST Mobile Multimedia Communications Conference (MOBIMEDIA), 2009.
Over the past decade, rateless codes, i.e., digital fountain codes, have emerged as an efficient and robust solution for reliable data transmission over packet erasure networks and a particularly suitable one for multicasting and broadcasting applications where users may experience variable channel conditions and packet loss rates, such as mobile environments. Luby Transform (LT) and Raptor codes are practical fountain codes with capacity-approaching performance and a low computational cost. In addition to their channel coding applications, the use of fountain codes for various kinds of distributed source compression and distributed joint-source channel coding has been extensively studied lately, and with promising results. However, a systematic treatise of the code design and optimization considerations for such non-standard applications of fountain codes is still absent. In this contribution, we overview the main results concerned with rateless codes for distributed source coding and outline several examples of data dissemination protocols where carefully designed fountain codes can provide strikingly simple, yet robust solutions yielding both distributed source coding and channel coding gains.
```
@inproceedings{SejPieDou2009,
  author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.},
  booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)},
  file = {pdf/2009MobimediaA.pdf},
  title = {Rateless distributed source code design},
  year = {2009},
  bdsk-url-1 = {http://portal.acm.org/citation.cfm?id=1653578}
}
```
D. Sejdinovic , Topics in Fountain Coding, PhD thesis, University of Bristol, 2009.
The invention of the sparse graph codes, error correction codes with low complexity and rates close to capacity, has had an unrivaled impact on digital communication systems. A recent advance in the sparse graph codes, fountain coding, due to its natural rate adaptivity, is becoming an error correction coding scheme of choice for many multicasting and broadcasting systems. This thesis studies the use of fountain codes for several non-standard coding problems commonly occurring in communications. Generic decentralised distributed fountain coding schemes for networked communications are developed, discussed and analysed, where many non-cooperating source nodes communicate possibly correlated data to a large number of receivers. Several results concerning the generalised asymptotic analysis of the fountain decoder in this decentralised and distributed coding setting are presented. The problem of fountain codes with unequal error protection property is explored, where a novel class of fountain codes, Expanding Window Fountain (EWF) codes, is proposed, analysed and shown to offer competitive performance applicable to scalable video multicasting. Further, asymptotic analysis, code design and optimisation are derived for both symmetric and asymmetric Slepian-Wolf coding with fountain codes. It is shown how one can obtain both channel coding and distributed source coding gains with the same fountain coding scheme, by a judicious choice of the code parameters. The developed methods of asymptotic analysis are extended to the problem of independent fountain encodings at multiple source nodes which communicate to a common relay. It is shown that the re-encoding of the multiple fountain encoded bitstreams at the relay node with another fountain code may reduce the number of required transmissions, and the overall code optimisation methods of such schemes are derived. Finally, dual fountain codes are introduced and equipped with a low complexity quantisation algorithm for a lossy source coding problem dual to binary erasure channel coding.
```
@phdthesis{Sej2009,
  author = {Sejdinovic, D.},
  file = {pdf/PhD_TopicsInFountainCoding.pdf},
  school = {University of Bristol},
  title = {Topics in Fountain Coding},
  year = {2009}
}
```

2008

D. Vukobratovic , V. Stankovic , D. Sejdinovic , L. Stankovic , Z. Xiong , Expanding window fountain codes for scalable video multicast, in IEEE International Conference on Multimedia and Expo (ICME), 2008, 77–80.
Digital Fountain (DF) codes have recently been suggested as an efficient forward error correction (FEC) solution for video multicast to heterogeneous receiver classes over lossy packet networks. However, to adapt DF codes to low-delay constraints and varying importance of scalable multimedia content, unequal error protection (UEP) DF schemes are needed. Thus, in this paper, Expanding Window Fountain (EWF) codes are proposed as a FEC solution for scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be “tuned” at the server side to meet the different reception conditions of heterogeneous receivers. Performance analysis of H.264 Scalable Video Coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.
```
@inproceedings{VukStaSejStaXio2008,
  author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
  booktitle = {IEEE International Conference on Multimedia and Expo (ICME)},
  doi = {10.1109/ICME.2008.4607375},
  pages = {77--80},
  title = {Expanding window fountain codes for scalable video multicast},
  year = {2008},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ICME.2008.4607375}
}
```
D. Sejdinovic , R. Piechocki , A. Doufexi , M. Ismail , Fountain coding with decoder side information, in IEEE International Conference on Communications (ICC), 2008, 4477–4482.
In this contribution, we consider the application of digital fountain (DF) codes to the problem of data transmission when side information is available at the decoder. The side information is modelled as a "virtual" channel output when the original information sequence is the input. For two cases of the system model, which model both the virtual and the actual transmission channel either as a binary erasure channel or as a binary input additive white Gaussian noise (BIAWGN) channel, we propose methods of enhancing the design of standard non-systematic DF codes by optimizing their output degree distribution based on the side information assumption. In addition, a systematic Raptor design has been employed as a possible solution to the problem.
```
@inproceedings{SejPieDouIsm2008ICC,
  author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.},
  booktitle = {IEEE International Conference on Communications (ICC)},
  doi = {10.1109/ICC.2008.840},
  pages = {4477--4482},
  title = {Fountain coding with decoder side information},
  year = {2008},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ICC.2008.840}
}
```
D. Sejdinovic , V. Ponnampalam , R. Piechocki , A. Doufexi , The throughput analysis of different IR-HARQ schemes based on fountain codes, in IEEE Wireless Communications and Networking Conference (WCNC), 2008, 267–272.
In this contribution, we construct two novel IR-HARQ (automatic repeat request) schemes based on fountain codes, which combine the punctured and rateless IR-HARQ schemes, in order to attain the advantageous properties of both: nearly optimal performance of the former at the high signal-to-noise ratio (SNR) region and ratelessness of the latter. The preliminary simulation results indicate that these schemes are particularly suitable for scenarios where the transmission is originally assumed to occur at the very high SNR region, but resilience to severe deterioration of channel conditions is required.
```
@inproceedings{SejPonPieDou2008,
  author = {Sejdinovic, D. and Ponnampalam, V. and Piechocki, R.J. and Doufexi, A.},
  booktitle = {IEEE Wireless Communications and Networking Conference (WCNC)},
  doi = {10.1109/WCNC.2008.52},
  file = {pdf/2008WCNC.pdf},
  pages = {267--272},
  title = {The throughput analysis of different IR-HARQ schemes based on fountain codes},
  year = {2008},
  bdsk-url-1 = {http://dx.doi.org/10.1109/WCNC.2008.52}
}
```
D. Sejdinovic , R. Piechocki , A. Doufexi , M. Ismail , Rate adaptive binary erasure quantization with dual fountain codes, in IEEE Global Telecommunications Conference (GLOBECOM), 2008.
In this contribution, duals of fountain codes are introduced and their use for lossy source compression is investigated. It is shown both theoretically and experimentally that the source coding dual of the binary erasure channel coding problem, binary erasure quantization, is solved at a nearly optimal rate with application of duals of LT and Raptor codes by a belief propagation-like algorithm which amounts to a graph pruning procedure. Furthermore, this quantizing scheme is rate adaptive, i.e., its rate can be modified on-the-fly in order to adapt to the source distribution, very much like LT and Raptor codes are able to adapt their rate to the erasure probability of a channel.
```
@inproceedings{SejPieDouIsm2008,
  author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.},
  booktitle = {IEEE Global Telecommunications Conference (GLOBECOM)},
  doi = {10.1109/GLOCOM.2008.ECP.238},
  file = {pdf/2008Globecom.pdf},
  title = {Rate adaptive binary erasure quantization with dual fountain codes},
  year = {2008},
  bdsk-url-1 = {http://dx.doi.org/10.1109/GLOCOM.2008.ECP.238}
}
```

2007

D. Vukobratovic , V. Stankovic , D. Sejdinovic , L. Stankovic , Z. Xiong , Scalable data multicast using expanding window fountain codes, in 45th Annual Allerton Conference on Communication, Control, and Computing, 2007.
Digital Fountain (DF) codes were introduced as an efficient and universal Forward Error Correction (FEC) solution for data multicast over lossy packet networks. However, in real-time applications, the DF encoder cannot make use of the “rateless” property as it was proposed in the DF framework, due to its delay constraints. In this scenario, many receivers might not be able to collect enough encoded symbols (packets) to perform successful decoding of the source data block (e.g., they are connected as low bit-rate receivers to a high bit-rate source stream, or they are affected by severe channel conditions). This paper proposes an application of recently introduced Expanding Window Fountain (EWF) codes as a scalable and efficient solution for real-time multicast data transmission. We show that, by carefully optimizing EWF code design parameters, it is possible to design a flexible DF solution that is capable of satisfying multicast data receivers over a wide range of data rates and/or erasure channel conditions.
```
@inproceedings{VukStaSejStaXio2007,
  author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
  booktitle = {45th Annual Allerton Conference on Communication, Control, and Computing},
  file = {pdf/2007Allerton.pdf},
  title = {Scalable data multicast using expanding window fountain codes},
  year = {2007}
}
```
D. Sejdinovic , D. Vukobratovic , A. Doufexi , V. Senk , R. Piechocki , Expanding window fountain codes for unequal error protection, in Asilomar Conference on Signals, Systems and Computers, 2007, 1020–1024.
A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.
```
@inproceedings{SejVukDouSenPie2007,
  author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.},
  booktitle = {Asilomar Conference on Signals, Systems and Computers},
  doi = {10.1109/ACSSC.2007.4487375},
  file = {pdf/2007Asilomar.pdf},
  pages = {1020--1024},
  title = {Expanding window fountain codes for unequal error protection},
  year = {2007},
  bdsk-url-1 = {http://dx.doi.org/10.1109/ACSSC.2007.4487375}
}
```

Software

2017

S. Flaxman , Y. W. Teh , D. Sejdinovic , Kernel Poisson. 2017.

Project: bigbayes

@software{FlaTehSej2017b,
  author = {Flaxman, Seth and Teh, Yee Whye and Sejdinovic, Dino},
  title = {Kernel {P}oisson},
  year = {2017}
}