Algorithms & Computationally Intensive Inference seminars
Terms 13, C1.06, 12:1514:00 (12:1512:45 is an informal sandwich lunch). On 3rd February 2016 (Week 4, Term 2), we will be in OC1.06.
Reminder emails are not sent to participants unless there is a change to the scheduled programme at short notice. If you would like to speak, or you want to be included in any emails, please contact one of the organisers.
Current Organisers: Paul Jenkins, Adam Johansen, Krzysztof Łatuszyński, Anthony Lee, Murray Pollock, Gareth Roberts. Term 1: Cyril Chimisov, Lewis Rendell. Term 2: Valerio Perrone, Nick Tawn.
Website URL: www.warwick.ac.uk/compstat
Mailing List SignUp: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar
Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB  only approved members can post)
2016/17 Term 1:
 Week 2  14th October  Short Talks
 Talk 1  Cyril Chimisov (Warwick)  "Adaptive MCMC"
 Talk 2  Lassi Roininen (Warwick)  "Cauchy priors, MetropoliswithinGibbs and applications in tomography"
 Abstract: We present a method for making edgepreserving Bayesian statistical inversion with Cauchy priors. For drawing estimators, we use MetropoliswithinGibbs algorithm. We apply the developed methodology to computerised tomography.
 Talk 3  Hanne Kekkonen (Warwick)  "Convergence rates of MCMC sampler for linear models in Wasserstein metric."
 Week 3  21st October  Short Talks
 Talk 1  Ryan Christ (Oxford)
 Talk 2  Anthony Lee (Warwick)
 Talk 3  Divakar Kumar (Warwick)
 Week 4  28th October  Short Talks
 Talk 1  Matt Moores (Warwick)  "Accelerating pseudomarginal methods using Gaussian processes"
 Joint work with Chris Drovandi (QUT) and Richard Boys (Newcastle). http://eprints.qut.edu.au/90973/
 Talk 2  Murray Pollock (Warwick)  "Dividing and (Exactly) Conquering"
 Talk 1  Matt Moores (Warwick)  "Accelerating pseudomarginal methods using Gaussian processes"
 Week 5  4th November  Short Talks
 Talk 1  Krys Łatuszyński (Warwick)
 Talk 2  Nick Tawn (Warwick)
 Talk 3  Sebastian Vollmer (Warwick)
 Week 6  11th November  Matthew Pearce (Cambridge)  "MCMC for Bayesian nonparametric matrix factorisation in high dimensions"
 Abstract: The fMRI modality of neuroimaging can generally be described as high dimensional and low sample size. Within fMRI analysis there is a family of methods utilising Independent Components Analysis as a matrix factorisation tool. An appropriate rank for the factorisation is typically unknown and must be estimated. We extend a Bayesian nonparametric model capable of integrating dimensionality estimation with other aspects of model fitting. Due to scale of the data involved, MCMC based inference requires both exploitation of the graphical structure of the collapsed Indian Buffet Process and use of a GPU cluster computer. We will touch on the advances in numerical computing found in the Julia language which facilitate the development of statistical software for parallel architectures.
 Week 7  18th November  Axel Finke (Cambridge)  "Approximate Smoothing and Parameter Estimation in HighDimensional StateSpace Models"
 Abstract: We present approximate algorithms for performing smoothing in a class of highdimensional statespace models via sequential Monte Carlo methods ("particle filters"). In high dimensions, a prohibitively large number of Monte Carlo samples ("particles")  growing exponentially in the dimension of the state space  is usually required to obtain a useful smoother. Using blocking strategies as in Rebeschini and Van Handel (2015) (and earlier pioneering work on blocking), we exploit the spatial ergodicity properties of the model to circumvent this curse of dimensionality. We thus obtain approximate smoothers that can be computed recursively in time and in parallel in space. First, we show that the bias of our blocked smoother is bounded uniformly in the time horizon and in the model dimension. We then approximate the blocked smoother with particles and derive the asymptotic variance of idealised versions of our blocked particle smoother to show that variance is no longer adversely effected by the dimension of the model. Finally, we employ our method to successfully perform maximumlikelihood estimation via stochastic gradientascent and stochastic expectationmaximisation algorithms in a 100dimensional statespace model. This is joint work with Sumeetpal S. Singh and the paper is available at https://arxiv.org/abs/1606.08650
 Week 8  25th November  Adam Johansen (Warwick)  "Divide and Conquer Sequential Monte Carlo"
 Abstract: We propose a novel class of Sequential Monte Carlo (SMC) algorithms, appropriate for inference in probabilistic graphical models. This class of algorithms adopts a divideandconquer approach based upon an auxiliary treestructured decomposition of the model of interest, turning the overall inferential task into a collection of recursively solved subproblems. The proposed method is applicable to a broad class of probabilistic graphical models, including models with loops. Unlike a standard SMC sampler, the proposed DivideandConquer SMC employs multiple independent populations of weighted particles, which are resampled, merged, and propagated as the method progresses. We provide an initial theoretical characterisation of the class of algorithms and illustrate empirically that this approach can outperform standard methods in terms of the accuracy of the posterior expectation and marginal likelihood approximations.
 Week 9  2nd December  Philip Maybank (Reading)  "Scalable and accurate inference for differential equations
with a stable steady state using model reparametrisation and spectral analysis"  Week 10  9th December  Kasia Taylor (Warwick)  "Construction of approximate likelihood for inference in rough paths setting"
2016/17 Term 2:
 Week 1 13th January  Cancelled and talks start in week 2
 Week 2  20th January  Gareth Roberts (Warwick)
 Week 3  27th January  Valerio Perrone (Warwick)  "Relativistic Monte Carlo"
 Abstract: Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a MetropolisHastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduce a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zerotemperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.
 Week 4  3rd February  Cancelled
 NB  We will be in OC1.06, not C1.06, on 3rd February 2016
 Week 5  10th February  Tigran Nagapetyan (Oxford)
 Week 6  17th February  Mathieu Gerber (Bristol)  Convergence of Advanced Resampling Schemes
 Abstract: A resampling scheme is an algorithm for transforming $N$ weighted variables into $N$ unweighted variables. A good resampling scheme should be such that the resampling error is small; i.e. the empirical measures of the initial variables and of the resampled variables should be close in some sense. Resampling schemes play an import role in particular in particle filtering. We develop results for advanced resampling schemes. We first study resampling schemes based on Hilbert ordering. We establish their consistency (the resampling error goes to zero as $N\rightarrow +\infty$) and show that the variance of the resulting Hilbertordered version of stratified resampling is $\smallo(N^{(1+1/d)})$ under mild conditions. By contrast, the resampling error of basic resampling schemes is $\bigO(N^{1})$. We then provide a general consistency result for unordered resampling schemes satisfying a negative association condition. As a corollary, we establish the consistency of stratified resampling and propose a 'valid' version of systematic resampling that we refer to as SSP resampling. We finally derive a central limit theorem for particle filters based on Hilbertordered version of stratified resampling; the soobtained asymptotic variance are smaller than for a particle filter based on multinomial resampling.Joint work with Nicolas Chopin and Nick Whiteley.
 Week 7  24th February  Peter Neal (Lancaster)  Optimal scaling of the independence sampler
 Abstract: The independence sampler is one of the most commonly used MCMC algorithms usually as a component of a MetropoliswithinGibbs algorithm. The common focus for the independence sampler is on the choice of proposal distribution to obtain an as high as possible acceptance rate. In this talk we have a somewhat different focus concentrating on the use of the independence sampler for updating augmented data in a Bayesian framework where a natural proposal distribution for the independence sampler exists. Thus we concentrate on the proportion of the augmented data to update to optimise the independence sampler. Generic guidelines for optimising the independence sampler are obtained for independent and identically distributed product densities mirroring findings for the random walk Metropolis algorithm. The generic guidelines are shown to be informative beyond the narrow confines of idealised product densities using an SIR epidemic example.
 Week 8  3rd March  Philip Jonathan and David Randell (Shell) Extreme ocean environments
 Abstract: Characterisation of extreme ocean storm seas is complicated by temporal, spatial and covariate dependence, and presents interesting inferential challenges. We present a model for peaksoverthreshold, nonstationary with respect to multidimensional covariates, estimated using carefullydesigned computationallyefficient Bayesian inference incorporating Gibbs and Metropolis (mMALA) within Gibbs sampling procedures. Model parameters are functions of covariates with penalised Bspline representations. Slick GLAM linear algebra greatly reduces computational burden.
Return values corresponding to the 10,000year return period are key regulatory requirements for the design and reliability (re)assessment of marine and coastal structures. We quantify the joint directional and seasonal variation of extreme ocean storm severity in terms of posterior predictive return value distributions estimated efficiently by numerical integration.
The methodology (and  as time permits  current research ideas) are illustrated using real design studies from ocean basins worldwide.
 Week 9  10th March  Paul Fearnhead (Lancaster)  Asymptotics of Approximate Bayesian Computation
 Abstract: Many statistical applications involve models for which it is difficult to evaluate the likelihood, but relatively easy to sample from. Approximate Bayesian computation (ABC) is a likelihoodfree method for implementing Bayesian inference in such cases. It avoids evaluating the likelihood function and generates samples from an approximate posterior distribution by jointly simulating the parameter and the data, and accepting parameter values that give simulated data close to the observed data.
I will present some recent results on the asymptotics for ABC. These results will look at (i) what is the asymptotic distribution of the ABC posterior mean; (ii) what does the ABC posterior converge to; (iii) how does the Monte Carlo accuracy of ABC algorithms change as we have more data. In particular I will show that for a certain implementation of ABC, the ABC posterior will converge to the true posterior (given our choice of summaries for the data) and that there are Monte Carlo algorithms for simulating from the posterior that have an acceptance rate that tends to 1.
 Abstract: Many statistical applications involve models for which it is difficult to evaluate the likelihood, but relatively easy to sample from. Approximate Bayesian computation (ABC) is a likelihoodfree method for implementing Bayesian inference in such cases. It avoids evaluating the likelihood function and generates samples from an approximate posterior distribution by jointly simulating the parameter and the data, and accepting parameter values that give simulated data close to the observed data.
 Week 10  17th March  Geoff Nicholls (Oxford)  Simultaneous estimation of spatial frequency fields and measurement locations, with an application to spatial location of Late Middle English texts
 Abstract: Our data are indirect measurements of spatial frequency fields. In our application
there are several hundred distinct multivariate frequency fields over each point in space, and we
have approximately one thousand data vectors, each recording Bernoulli observations of each
frequency field at each point. If we knew the locations at which the observations had been taken
we could interpolate the frequency fields (using for example, logistic regression for a spatial field).In fact the observation locations are known only for a minority of the data vectors (we call these
data vectors "anchors"). We give models and joint estimation procedures for the locations of the
nonanchor data vectors and the unobserved spatial frequency fields. The parameter space is huge
and the data sparse, as we must reconstruct hundreds of spatial fields, data vector locations and
observation model parameters.We apply our methods to locate dialect samples on a map of England using other dialect samples
with known location. The data are feature vectors extracted from written dialect samples (old texts).
Just a fraction of the texts ("anchors") have an associated spatial location. The data set
is large, but sparse, since a given word has a large number of variant spellings which may appear
in just a few documents.Interestingly, straightforward (computationally intensive) Bayesian inference doesn’t seem to work.
Related analyses of Genetic data used cut models in which the feedback from nonanchors into the
frequency fields is removed, and this is effective in our case also. We discuss why straightforward
methods do not work, and what can be done to fix the problem and use all the information in the data.
 Abstract: Our data are indirect measurements of spatial frequency fields. In our application
2016/17 Term 3:
 Week 1 28th April  Simon Wood (Bristol)  Generalized additive models for gigadata: modelling black smoke in the UK
 Week 2  5th May  Simon Spencer (Warwick)  Talk Title / Abstract TBC
 Week 3  12th May  Richard Wilkinson (Sheffield)  Gaussian process accelerated ABC

Approximate Bayesian computation (ABC) is a class of algorithms used for doing Bayesian inference when you do not have access to the likelihood function. Instead, all simulation is done using realisations from the simulator. One of the major challenges for ABC methods is dealing with the computational cost that arises from needing to repeatedly run the simulator.
In this talk, I will discuss surrogate modelling approaches to speeding up ABC, in which we seek to approximate the behaviour of the simulator, before inverting it to learn the parameters. The challenges for these types of approaches include: what should our target of approximation be? How should we build the surrogate? What design is best used for the simulator evaluation? And how should we evaluate the resulting approximation.

 Week 4  19th May  Sebastian Vollmer (Warwick)  Does subsampling help in Stochastic Gradient Langevin Dynamics?
 Bayesian statistics offers a principled way to reason about uncertainty and incorporate prior information. It naturally helps to prevent overfitting. Unfortunately these advantages come at a cost  exact sampling from the posterior is typically impossible and approximate methods are needed. Markov Chain Monte Carlo (MCMC) methods are an appealing class of methods for posterior sampling since they produce asymptotically exact results. As datasets become ever larger and models become ever more complicated there is a demand for more scalable sampling techniques. One recently introduced class of methods is stochastic gradient MCMC (\cite{welling2011bayesian}). These methods generally use a discretisation of a stochastic differential equation (SDE) with the correct invariant distribution. For scalability, in every iteration stochastic gradients based on a subset of the data are used. \cite{Ma2015} provide a general framework for such samplers. It can be shown that these methods are asymptotically exact for decreasing stepsize schemes. However, in practice these methods are used with a constant stepsize, incurring a bias. Stochastic gradient MCMC methods have been applied across a large range of machine learning application such as matrix factorization models (\cite{Chen2014,Ding2014,Ahn2015}), topic models (\cite{Ding2014,Gan2015}) and neural networks (\cite{Li2016,Chen2014}). While SGMCMC produces stateoftheart results in many applications there has been little work attempting to quantify the bias introduced by the stochastic gradients. In this paper we consider the computational cost of reaching a certain accuracy as the size of the dataset increases. Note that the required accuracy depends both on the functional we are interested in and the width of the posterior. For example if we are interested in estimating the posterior mean of a parameter then the natural scale is the standard deviation which will scale with the size of the dataset (typically as $N^{\frac{1}{2}}$). We show that in a simple Gaussian toy model the computational cost of reaching a certain error relative to the width of the posterior with stochastic gradient methods scale in the same way as methods using full gradients. Therefore SGMCMC is at most better by a constant factor. However we argue that in a bigdata setting the dependence on the size of the data set dominates. This observation raises important question about the use of stochastic gradient methods in more complicated models  does the good performance of stochastic gradient methods come from averaging different models (similar to ensemble methods) rather than faithful posterior simulation.
 Week 5  26th May  Luke Kelly (Oxford)  Talk Title / Abstract TBC
 Week 6  2nd June  Chris Nemeth (Lancaster)  Talk Title / Abstract TBC
 Week 7  9th June  OxWaSP speakers  Talk Titles / Abstracts TBC
 Week 8  16th June  George Deligiannidis (Kings)  Talk Title / Abstract TBC
 Week 9  23rd June  Dootika Vats (Warwick)  Talk Title / Abstract TBC
 Week 10  30th June  Theo Damoulas (Warwick) and Matt Kusner (Warwick / ATI)  Titles TBC [Provisional]
Previous Years:
some key phrases:
 sampling and inference for diffusions
 exact algorithms
 intractable likelihood
 pseudomarginal algorithms
 particle filters
 importance sampling
 MCMC
 adaptive MCMC
 perfect simulation
 Markov chains...
 other random structures...
 and randomised algorithms...