This webpage summarises ongoing work in the area of probabilistic integration.
Probabilistic integration is one of the main sub-branches of probabilistic numerics. In essence, the idea is that any numerical analysis problem (e.g. solving an integral, a differential equation or a linear system) can be interpreted as a statistical inference problem. For instance, in numerical integration a common approach is to evaluate the function to be integrated at several points of the domain and to approximate the integral by a weighted combination of these values. This can be seen as a statistical problem by treating the function evaluations as data points, which are used to infer the quantity of interest, in this case the value of the integral.
The aim of probabilistic numerics is to obtain a better understanding and handling of numerical error, by accounting for it with tools from probability and statistics. One setting where this is particularly useful is when several numerical solvers are used one after the other (possibly thousands of times) within a statistical procedure. In such cases, even if the numerical error of each individual solver is small, the errors can accumulate and completely invalidate the statistical procedure being carried out. Here, probabilistic numerics can propagate a measure of numerical uncertainty through the solvers, and help diagnose when such situations occur.
The particular method of numerical integration I have been working on is called Bayesian Quadrature. The method consists of approximating the integrand of interest with a Gaussian process, and then integrating this approximation analytically (this is possible since integration is a linear operation, so the integral of a Gaussian process is itself a Gaussian random variable; see the sketch on the right-hand side). The output of the method is a univariate Gaussian posterior distribution: its mean is taken as the approximation of the integral, and its variance represents our uncertainty over the solution of the numerical procedure. As the number of evaluations of the integrand increases, the posterior mass concentrates on the true value of the integral. This posterior variance is also an example of a measure of uncertainty which can be propagated through subsequent computation.
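As a concrete sketch of the idea, here is a minimal Bayesian Quadrature implementation in Python for integration against a standard Gaussian measure, where a Gaussian covariance kernel makes the required kernel integrals available in closed form. This is an illustrative toy (not the R code linked below); the function name and parameter choices are mine.

```python
import numpy as np

def bayesian_quadrature(x, f_vals, ell=1.0, sigma=1.0, jitter=1e-8):
    """Posterior mean and variance of I = ∫ f(x) N(x; 0, sigma^2) dx under a
    zero-mean GP prior with kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2)).

    mean = z^T K^{-1} f,   var = C - z^T K^{-1} z,
    where z_i = ∫ k(x, x_i) dπ(x) and C = ∫∫ k(x, x') dπ(x) dπ(x'),
    both closed-form for this kernel/measure pair.
    """
    x = np.asarray(x, dtype=float)
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)
    K += jitter * np.eye(len(x))                        # stabilise the Gram matrix
    s2 = ell**2 + sigma**2
    z = ell / np.sqrt(s2) * np.exp(-0.5 * x**2 / s2)    # kernel mean at each x_i
    C = ell / np.sqrt(ell**2 + 2 * sigma**2)            # prior variance of I
    Kinv_z = np.linalg.solve(K, z)
    mean = Kinv_z @ f_vals      # a weighted combination of the evaluations
    var = C - z @ Kinv_z        # shrinks as the points fill out the domain
    return mean, var

# Toy example: f(x) = x^2 against N(0, 1), whose true integral is 1.
x = np.linspace(-4.0, 4.0, 20)
mean, var = bayesian_quadrature(x, x**2)
```

On this toy the posterior mean lands close to the true value 1, with a small posterior variance that could be propagated through downstream computation.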
The work discussed in the rest of this webpage focuses on providing efficient algorithms for probabilistic integration, together with strong theoretical guarantees in both a statistical and a numerical analysis sense. See the following blog post I wrote on recent advances in this area, and feel free to try out the method using the R code available here.
Paper #1: Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees.
About: This paper provides a method for efficiently choosing design points for Bayesian Quadrature, based on the Frank-Wolfe convex optimisation algorithm. The algorithm efficiently spreads the points over the regions of the integration domain where most of the measure is located. It is also the first probabilistic integration algorithm with theoretical guarantees, in the form of convergence rates and contraction of the Bayesian Quadrature posterior distribution.
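The actual Frank-Wolfe construction is in the paper; as a loose illustration of the same goal (placing points where the measure has mass so as to shrink the posterior variance), here is a simpler greedy scheme that picks each new point from a candidate grid by directly minimising the Bayesian Quadrature posterior variance. All names and settings are mine, and this greedy stand-in carries none of the paper's guarantees.

```python
import numpy as np

def greedy_bq_design(candidates, n, ell=1.0, sigma=1.0, jitter=1e-8):
    """Pick n design points for BQ against N(0, sigma^2) by greedily
    minimising the posterior variance C - z^T K^{-1} z at each step.
    (A simple greedy stand-in for the Frank-Wolfe scheme of the paper;
    note the variance depends only on point locations, not function values.)"""
    s2 = ell**2 + sigma**2
    C = ell / np.sqrt(ell**2 + 2 * sigma**2)

    def post_var(pts):
        X = np.asarray(pts, dtype=float)
        K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / ell**2)
        K += jitter * np.eye(len(X))
        z = ell / np.sqrt(s2) * np.exp(-0.5 * X**2 / s2)
        return C - z @ np.linalg.solve(K, z)

    chosen = []
    for _ in range(n):
        # evaluate the posterior variance obtained by adding each candidate
        variances = [post_var(chosen + [c]) for c in candidates]
        chosen.append(candidates[int(np.argmin(variances))])
    return np.array(chosen)

pts = greedy_bq_design(np.linspace(-3.0, 3.0, 61), 5)
```

The first point lands at the mode of the Gaussian measure, and subsequent points spread out to where mass remains uncovered, mirroring the qualitative behaviour described above.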
- Sep 2015: The paper has been accepted to NIPS 2015 with a spotlight presentation!
- Oct 2015: We would also like to thank Ingmar Schuster for a nice blog review here.
Reference: Briol, F-X., Oates, C. J., Girolami, M., & Osborne, M. A. (2015). Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees. Advances In Neural Information Processing Systems (NIPS), pages 1162-1170. [paper][arXiv][conference]
Paper #2: Probabilistic Integration: A Role for Statisticians in Numerical Analysis?
About: This paper discusses extensively the usefulness of a probabilistic numerical approach to integration. It focuses mostly on Bayesian Monte Carlo and Bayesian Quasi-Monte Carlo, the Bayesian Quadrature methods based on Monte Carlo and Quasi-Monte Carlo point sets. The paper shows that, under regularity assumptions on the function to be integrated (e.g. assumptions on its smoothness), such approaches can provide significant improvements over standard Monte Carlo methods. More precisely, the convergence rate (the asymptotic rate at which the error decreases with the number of function evaluations) of these methods can significantly outperform that of their non-probabilistic counterparts. The paper also demonstrates some of the potential advantages of having a probability distribution to summarise numerical uncertainty, and shows numerically on several test functions that good calibration of this distribution is possible. Finally, an honest discussion of the advantages and disadvantages of the method is provided, illustrated on applications ranging from computer graphics to petroleum engineering.
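To give a rough sense of the type of comparison involved (these are generic, textbook-style rates; see the paper for the precise statements, conditions and exponents):

```latex
% Standard Monte Carlo: root-mean-square error decays at a
% dimension-independent rate, regardless of the smoothness of f
\mathbb{E}\big[(\hat{I}_{\mathrm{MC}} - I)^2\big]^{1/2} = O(n^{-1/2})
% Quadrature rules exploiting smoothness (f in a Sobolev space H^s of
% order s > d/2 on a d-dimensional domain) can attain rates such as
|\hat{I} - I| = O(n^{-s/d})
% which improves on n^{-1/2} whenever s > d/2, i.e. for smooth enough f
```

The price paid is that the faster rates degrade as the dimension d grows relative to the smoothness s, which is part of the honest discussion of advantages and disadvantages mentioned above.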
- Dec 2015: Nice blog posts by Andrew Gelman available here and by Christian Robert here.
- Jan 2016: The paper was awarded a Best Student Paper 2016 award by the Section for Bayesian Statistical Science of the American Statistical Association!
- March 2016: An updated version of the paper was uploaded to arXiv, which now includes a more extensive discussion of how to calibrate the posterior distribution, as well as additional results on convergence rates. Some R code to reproduce the results is also available here.
Reference: Briol, F-X., Oates, C. J., Girolami, M., Osborne, M. A. & Sejdinovic, D. (2015). Probabilistic Integration: A Role for Statisticians in Numerical Analysis? [arXiv]
Paper #3: Probabilistic Integration and Intractable Distributions.
About: This paper extends some of our previous work on Bayesian Quadrature to cases where the probability distribution with respect to which we are integrating is not available in closed form. In particular, we assume we have access to the distribution only through samples, which is often the case in areas such as Bayesian calibration of computer models. We propose a probabilistic numerics approach to this problem which combines a model for the integrand with a second model for the kernel mean (also called the kernel embedding). The approach is illustrated on a Bayesian forecasting problem for the Goodwin oscillator, a well-known model of biochemical oscillations consisting of a system of ODEs.
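As a rough sketch of the "samples only" setting (not the paper's actual approach, which places a second probabilistic model on the kernel mean itself): a simple baseline is to plug an empirical estimate of the kernel mean, computed from the available samples, into the usual Bayesian Quadrature weights. Everything below, including the Gaussian target used for checking, is an assumption of this illustration.

```python
import numpy as np

def bq_sampled_measure(x, f_vals, samples, ell=1.0, jitter=1e-8):
    """BQ point estimate of ∫ f dπ when π is only seen through samples y_j:
    the exact kernel mean z_i = ∫ k(x, x_i) dπ(x) is replaced by the
    empirical estimate ẑ_i = (1/m) Σ_j k(y_j, x_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(samples, dtype=float)
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)
    K += jitter * np.eye(len(x))
    z_hat = np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell**2).mean(axis=1)
    return z_hat @ np.linalg.solve(K, f_vals)

# Sanity check against the closed-form kernel mean, available when π = N(0, 1).
x = np.linspace(-2.5, 2.5, 6)
rng = np.random.default_rng(0)
samples = rng.standard_normal(50_000)   # stand-in for e.g. MCMC output
est = bq_sampled_measure(x, x**2, samples, ell=0.8)
s2 = 0.8**2 + 1.0
z_exact = 0.8 / np.sqrt(s2) * np.exp(-0.5 * x**2 / s2)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.8**2) + 1e-8 * np.eye(6)
est_exact = z_exact @ np.linalg.solve(K, x**2)
```

With enough samples the empirical estimate tracks the closed-form one closely; the drawback of this plug-in baseline, and a motivation for the paper's approach, is that it reports no uncertainty about the kernel mean estimate itself.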
- June 2016: Preprint available on arXiv!
- May 2017: Update of the paper, with application to functional cardiac models.
Reference: Oates, C. J., Niederer, S., Lee, A., Briol, F-X. & Girolami, M. (2016). Probabilistic Models for Integration Error in the Assessment of Functional Cardiac Models. [arXiv]
Collaborators on this project:
- Chris J. Oates (Newcastle University & The Alan Turing Institute for Data Science)
- Mark Girolami (Imperial College London, Department of Mathematics & The Alan Turing Institute for Data Science)
- Michael A. Osborne (University of Oxford, Department of Engineering Science & The Oxford-Man Institute of Quantitative Finance)
- Dino Sejdinovic (University of Oxford, Department of Statistics)
- Jon Cockayne (University of Warwick, Department of Statistics)