I am a DPhil student at the University of Oxford, supervised by Yee Whye Teh and Patrick Rebeschini. My research focuses on the statistical properties of models that are implicitly regularised by gradient descent. More recently, I have been studying the statistical performance of the least-norm solution in high-dimensional least squares regression.
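As a quick illustration of this object, here is a minimal NumPy sketch (a toy example of my own, not taken from any of the papers below) of the least-norm, or ridgeless, least squares solution in the overparameterised regime, computed via the pseudoinverse and compared with ridge regression as the penalty vanishes:

import numpy as np

# Toy example: minimum-norm least squares interpolator when d > n.
rng = np.random.default_rng(0)
n, d = 50, 200                                  # fewer samples than features
X = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

# Least-norm solution: beta = X^+ y, the interpolator that gradient descent
# initialised at zero converges to.
beta_min_norm = np.linalg.pinv(X) @ y

# Ridge regression with a vanishing penalty approaches the same solution.
lam = 1e-8
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(np.allclose(X @ beta_min_norm, y))            # interpolates the training data
print(np.linalg.norm(beta_min_norm - beta_ridge))   # tiny gap to the ridgeless limit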
Publications
2021
D. Richards, J. Mourtada, L. Rosasco, Asymptotics of Ridge(less) Regression under General Source Condition, in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, 2021, vol. 130, 3889–3897.
@inproceedings{richards2021asymptotics,
title = {Asymptotics of Ridge(less) Regression under General Source Condition},
author = {Richards, Dominic and Mourtada, Jaouad and Rosasco, Lorenzo},
booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
pages = {3889--3897},
year = {2021},
volume = {130},
series = {Proceedings of Machine Learning Research},
month = {13--15 Apr},
publisher = {PMLR}
}
D. Richards, M. Rabbat, Learning with Gradient Descent and Weakly Convex Losses, in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, 2021, vol. 130, 1990–1998.
We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, the smallest negative eigenvalue of the empirical risk’s Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, generalisation error bounds are proven that hold under a wider range of step sizes compared to previous work. Out of sample guarantees are then achieved by decomposing the test error into generalisation, optimisation and approximation errors, each of which can be bounded and traded off with respect to algorithmic parameters, sample size and magnitude of this eigenvalue. In the case of a two layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity, specifically, the Hessian’s smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., network scaling. This allows test error guarantees to then be achieved when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, insights are gained into the implicit bias of neural network scaling, which are further supported by experimental findings.
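As a rough illustration of the weak convexity notion in this abstract (a toy finite-difference check of my own, not the paper's experiments), the sketch below estimates the empirical-risk Hessian of a small scaled two-layer network and prints its smallest eigenvalue: as the scaling c grows, all eigenvalues, and in particular the most negative one, shrink towards zero.

import numpy as np

# Toy check: empirical risk of f(x) = (1/c) * sum_j a_j * tanh(w_j * x).
rng = np.random.default_rng(1)
n, m = 30, 10                                   # samples, hidden units
x = rng.standard_normal(n)
y = np.sin(x)

def risk(params, c):
    w, a = params[:m], params[m:]
    preds = (a * np.tanh(np.outer(x, w))).sum(axis=1) / c
    return 0.5 * np.mean((preds - y) ** 2)

def hessian(f, p, eps=1e-4):
    # Central finite-difference estimate of the Hessian of f at p.
    dim = p.size
    H = np.zeros((dim, dim))
    for i in range(dim):
        for j in range(dim):
            ei, ej = np.eye(dim)[i], np.eye(dim)[j]
            H[i, j] = (f(p + eps * ei + eps * ej) - f(p + eps * ei - eps * ej)
                       - f(p - eps * ei + eps * ej) + f(p - eps * ei - eps * ej)) / (4 * eps ** 2)
    return H

params = rng.standard_normal(2 * m)
for c in [1.0, 4.0, 16.0]:
    H = hessian(lambda p: risk(p, c), params)
    # Larger scaling c pushes the smallest Hessian eigenvalue towards zero,
    # i.e. the empirical risk becomes closer to convex at this point.
    print(c, np.linalg.eigvalsh(H).min())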
@inproceedings{richards2021learning,
title = {Learning with Gradient Descent and Weakly Convex Losses},
author = {Richards, Dominic and Rabbat, Mike},
booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
pages = {1990--1998},
year = {2021},
volume = {130},
series = {Proceedings of Machine Learning Research},
month = {13--15 Apr},
publisher = {PMLR}
}
2020
D. Richards, P. Rebeschini, Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent, Journal of Machine Learning Research, vol. 21, no. 34, 1–44, 2020.
@article{JMLR:v21:18-638,
author = {Richards, Dominic and Rebeschini, Patrick},
title = {Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent},
journal = {Journal of Machine Learning Research},
year = {2020},
volume = {21},
number = {34},
pages = {1--44}
}
D. Richards, P. Rebeschini, L. Rosasco, Decentralised Learning with Random Features and Distributed Gradient Descent, in Proceedings of the 37th International Conference on Machine Learning, 2020, vol. 119, 8105–8115.
@inproceedings{richards2020decentralised,
title = {Decentralised Learning with Random Features and Distributed Gradient Descent},
author = {Richards, Dominic and Rebeschini, Patrick and Rosasco, Lorenzo},
booktitle = {Proceedings of the 37th International Conference on Machine Learning},
pages = {8105--8115},
year = {2020},
volume = {119},
series = {Proceedings of Machine Learning Research},
month = {13--18 Jul},
publisher = {PMLR}
}
2019
D. Richards, P. Rebeschini, Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up, in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc., 2019, 1216–1227.
@incollection{NIPS2019_8405,
title = {Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up},
author = {Richards, Dominic and Rebeschini, Patrick},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {Wallach, H. and Larochelle, H. and Beygelzimer, A. and d\textquotesingle Alch\'{e}-Buc, F. and Fox, E. and Garnett, R.},
pages = {1216--1227},
year = {2019},
publisher = {Curran Associates, Inc.}
}