# Algorithms & Computationally Intensive Inference seminars

**Terms 1-3, C1.06, 12:15-14:00 (12:15-12:45 is an informal sandwich lunch). On 3rd February 2016 (Week 4, Term 2), we will be in OC1.06.**

*Reminder emails are not sent to participants unless there is a change to the scheduled programme at short notice. If you would like to speak, or you want to be included in any emails, please contact one of the organisers.*

**Current Organisers**: Paul Jenkins, Adam Johansen, Krzysztof Łatuszyński, Anthony Lee, Murray Pollock, Gareth Roberts. **Term 1:** Cyril Chimisov, Lewis Rendell. **Term 2:** Valerio Perrone, Nick Tawn.

**Website URL:** www.warwick.ac.uk/compstat

**Mailing List Sign-Up:** http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar

**Mailing List:** algorithmseminar@listserv.csv.warwick.ac.uk **(NB - only approved members can post)**

**2016/17 Term 1:**

- Week 2 - 14th October - Short Talks
- Talk 1 - Cyril Chimisov (Warwick) -
*"Adaptive MCMC"* - Talk 2 - Lassi Roininen (Warwick) -
*"Cauchy priors, Metropolis-within-Gibbs and applications in tomography"***Abstract:**We present a method for making edge-preserving Bayesian statistical inversion with Cauchy priors. For drawing estimators, we use Metropolis-within-Gibbs algorithm. We apply the developed methodology to computerised tomography.

- Talk 3 - Hanne Kekkonen (Warwick) -
*"Convergence rates of MCMC sampler for linear models in Wasserstein metric."*

- Talk 1 - Cyril Chimisov (Warwick) -
- Week 3 - 21st October - Short Talks
- Talk 1 - Ryan Christ (Oxford)
- Talk 2 - Anthony Lee (Warwick)
- Talk 3 - Divakar Kumar (Warwick)

- Week 4 - 28th October - Short Talks
- Talk 1 - Matt Moores (Warwick) -
*"Accelerating pseudo-marginal methods using Gaussian processes"*- Joint work with Chris Drovandi (QUT) and Richard Boys (Newcastle). http://eprints.qut.edu.au/90973/

- Talk 2 - Murray Pollock (Warwick) -
*"Dividing and (Exactly) Conquering"*

- Talk 1 - Matt Moores (Warwick) -
- Week 5 - 4th November - Short Talks
- Talk 1 - Krys Łatuszyński (Warwick)
- Talk 2 - Nick Tawn (Warwick)
- Talk 3 - Sebastian Vollmer (Warwick)

- Week 6 - 11th November - Matthew Pearce (Cambridge) -
*"MCMC for Bayesian nonparametric matrix factorisation in high dimensions"***Abstract:**The fMRI modality of neuroimaging can generally be described as high dimensional and low sample size. Within fMRI analysis there is a family of methods utilising Independent Components Analysis as a matrix factorisation tool. An appropriate rank for the factorisation is typically unknown and must be estimated. We extend a Bayesian nonparametric model capable of integrating dimensionality estimation with other aspects of model fitting. Due to scale of the data involved, MCMC based inference requires both exploitation of the graphical structure of the collapsed Indian Buffet Process and use of a GPU cluster computer. We will touch on the advances in numerical computing found in the Julia language which facilitate the development of statistical software for parallel architectures.

- Week 7 - 18th November - Axel Finke (Cambridge) -
*"Approximate Smoothing and Parameter Estimation in High-Dimensional State-Space Models"***Abstract:**We present approximate algorithms for performing smoothing in a class of high-dimensional state-space models via sequential Monte Carlo methods ("particle filters"). In high dimensions, a prohibitively large number of Monte Carlo samples ("particles") -- growing exponentially in the dimension of the state space -- is usually required to obtain a useful smoother. Using blocking strategies as in Rebeschini and Van Handel (2015) (and earlier pioneering work on blocking), we exploit the spatial ergodicity properties of the model to circumvent this curse of dimensionality. We thus obtain approximate smoothers that can be computed recursively in time and in parallel in space. First, we show that the bias of our blocked smoother is bounded uniformly in the time horizon and in the model dimension. We then approximate the blocked smoother with particles and derive the asymptotic variance of idealised versions of our blocked particle smoother to show that variance is no longer adversely effected by the dimension of the model. Finally, we employ our method to successfully perform maximum-likelihood estimation via stochastic gradient-ascent and stochastic expectation--maximisation algorithms in a 100-dimensional state-space model. This is joint work with Sumeetpal S. Singh and the paper is available at https://arxiv.org/abs/1606.08650

- Week 8 - 25th November - Adam Johansen (Warwick) -
*"Divide and Conquer Sequential Monte Carlo"*

**Abstract:**We propose a novel class of Sequential Monte Carlo (SMC) algorithms, appropriate for inference in probabilistic graphical models. This class of algorithms adopts a divide-and-conquer approach based upon an auxiliary tree-structured decomposition of the model of interest, turning the overall inferential task into a collection of recursively solved sub-problems. The proposed method is applicable to a broad class of probabilistic graphical models, including models with loops. Unlike a standard SMC sampler, the proposed Divide-and-Conquer SMC employs multiple independent populations of weighted particles, which are resampled, merged, and propagated as the method progresses. We provide an initial theoretical characterisation of the class of algorithms and illustrate empirically that this approach can outperform standard methods in terms of the accuracy of the posterior expectation and marginal likelihood approximations.

- Week 9 - 2nd December - Philip Maybank (Reading) -
*"Scalable and accurate inference for differential equations*

*with a stable steady state using model reparametrisation and spectral analysis"* - Week 10 - 9th December - Kasia Taylor (Warwick) -
*"Construction of approximate likelihood for inference in rough paths setting"*

**2016/17 Term 2:**

- Week 1 -13th January - Cancelled and talks start in week 2
- Week 2 - 20th January - Gareth Roberts (Warwick)
- Week 3 - 27th January - Valerio Perrone (Warwick) - "
*Relativistic Monte Carlo*"**Abstract**: Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduce a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.

- Week 4 - 3rd February -
**Cancelled****NB - We will be in OC1.06, not C1.06, on 3rd February 2016**

- Week 5 - 10th February - Tigran Nagapetyan (Oxford)
- Week 6 - 17th February - Mathieu Gerber (Bristol) - Convergence of Advanced Resampling Schemes
**Abstract:**A resampling scheme is an algorithm for transforming $N$ weighted variables into $N$ unweighted variables. A good resampling scheme should be such that the resampling error is small; i.e. the empirical measures of the initial variables and of the resampled variables should be close in some sense. Resampling schemes play an import role in particular in particle filtering. We develop results for advanced resampling schemes. We first study resampling schemes based on Hilbert ordering. We establish their consistency (the resampling error goes to zero as $N\rightarrow +\infty$) and show that the variance of the resulting Hilbert-ordered version of stratified resampling is $\smallo(N^{-(1+1/d)})$ under mild conditions. By contrast, the resampling error of basic resampling schemes is $\bigO(N^{-1})$. We then provide a general consistency result for unordered resampling schemes satisfying a negative association condition. As a corollary, we establish the consistency of stratified resampling and propose a 'valid' version of systematic resampling that we refer to as SSP resampling. We finally derive a central limit theorem for particle filters based on Hilbert-ordered version of stratified resampling; the so-obtained asymptotic variance are smaller than for a particle filter based on multinomial resampling.Joint work with Nicolas Chopin and Nick Whiteley.

- Week 7 - 24th February - Peter Neal (Lancaster) - Optimal scaling of the independence sampler
**Abstract:**The independence sampler is one of the most commonly used MCMC algorithms usually as a component of a Metropolis-within-Gibbs algorithm. The common focus for the independence sampler is on the choice of proposal distribution to obtain an as high as possible acceptance rate. In this talk we have a somewhat different focus concentrating on the use of the independence sampler for updating augmented data in a Bayesian framework where a natural proposal distribution for the independence sampler exists. Thus we concentrate on the proportion of the augmented data to update to optimise the independence sampler. Generic guidelines for optimising the independence sampler are obtained for independent and identically distributed product densities mirroring findings for the random walk Metropolis algorithm. The generic guidelines are shown to be informative beyond the narrow confines of idealised product densities using an SIR epidemic example.

- Week 8 - 3rd March - Philip Jonathan and David Randell (Shell)- Extreme ocean environments
**Abstract:**Characterisation of extreme ocean storm seas is complicated by temporal, spatial and covariate dependence, and presents interesting inferential challenges. We present a model for peaks-over-threshold, non-stationary with respect to multidimensional covariates, estimated using carefully-designed computationally-efficient Bayesian inference incorporating Gibbs and Metropolis (mMALA) within Gibbs sampling procedures. Model parameters are functions of covariates with penalised B-spline representations. Slick GLAM linear algebra greatly reduces computational burden.

Return values corresponding to the 10,000-year return period are key regulatory requirements for the design and reliability (re-)assessment of marine and coastal structures. We quantify the joint directional and seasonal variation of extreme ocean storm severity in terms of posterior predictive return value distributions estimated efficiently by numerical integration.

The methodology (and - as time permits - current research ideas) are illustrated using real design studies from ocean basins worldwide.

- Week 9 - 10th March - Paul Fearnhead (Lancaster) - Asymptotics of Approximate Bayesian Computation
**Abstract:**Many statistical applications involve models for which it is difficult to evaluate the likelihood, but relatively easy to sample from. Approximate Bayesian computation (ABC) is a likelihood-free method for implementing Bayesian inference in such cases. It avoids evaluating the likelihood function and generates samples from an approximate posterior distribution by jointly simulating the parameter and the data, and accepting parameter values that give simulated data close to the observed data.

I will present some recent results on the asymptotics for ABC. These results will look at (i) what is the asymptotic distribution of the ABC posterior mean; (ii) what does the ABC posterior converge to; (iii) how does the Monte Carlo accuracy of ABC algorithms change as we have more data. In particular I will show that for a certain implementation of ABC, the ABC posterior will converge to the true posterior (given our choice of summaries for the data) and that there are Monte Carlo algorithms for simulating from the posterior that have an acceptance rate that tends to 1.

- Week 10 - 17th March - Geoff Nicholls (Oxford) - Simultaneous estimation of spatial frequency fields and measurement locations, with an application to spatial location of Late Middle English texts
**Abstract:**Our data are indirect measurements of spatial frequency fields. In our application

there are several hundred distinct multivariate frequency fields over each point in space, and we

have approximately one thousand data vectors, each recording Bernoulli observations of each

frequency field at each point. If we knew the locations at which the observations had been taken

we could interpolate the frequency fields (using for example, logistic regression for a spatial field).In fact the observation locations are known only for a minority of the data vectors (we call these

data vectors "anchors"). We give models and joint estimation procedures for the locations of the

non-anchor data vectors and the unobserved spatial frequency fields. The parameter space is huge

and the data sparse, as we must reconstruct hundreds of spatial fields, data vector locations and

observation model parameters.We apply our methods to locate dialect samples on a map of England using other dialect samples

with known location. The data are feature vectors extracted from written dialect samples (old texts).

Just a fraction of the texts ("anchors") have an associated spatial location. The data set

is large, but sparse, since a given word has a large number of variant spellings which may appear

in just a few documents.Interestingly, straightforward (computationally intensive) Bayesian inference doesn’t seem to work.

Related analyses of Genetic data used cut models in which the feedback from non-anchors into the

frequency fields is removed, and this is effective in our case also. We discuss why straightforward

methods do not work, and what can be done to fix the problem and use all the information in the data.

**2016/17 Term 3:**

- Week 1 -28th April - Simon Wood (Bristol) - Generalized additive models for gigadata: modelling black smoke in the UK
- Week 2 - 5th May - Simon Spencer (Warwick)

- Week 3 - 12th May - Richard Wilkinson (Sheffield) - Gaussian process accelerated ABC

Approximate Bayesian computation (ABC) is a class of algorithms used for doing Bayesian inference when you do not have access to the likelihood function. Instead, all simulation is done using realisations from the simulator. One of the major challenges for ABC methods is dealing with the computational cost that arises from needing to repeatedly run the simulator.

In this talk, I will discuss surrogate modelling approaches to speeding up ABC, in which we seek to approximate the behaviour of the simulator, before inverting it to learn the parameters. The challenges for these types of approaches include: what should our target of approximation be? How should we build the surrogate? What design is best used for the simulator evaluation? And how should we evaluate the resulting approximation.

- Week 4 - 19th May - Sebastian Vollmer (Warwick) - Does subsampling help in Stochastic Gradient Langevin Dynamics?

- Bayesian statistics offers a principled way to reason about uncertainty and incorporate prior information. It naturally helps to prevent overfitting. Unfortunately these advantages come at a cost - exact sampling from the posterior is typically impossible and approximate methods are needed. Markov Chain Monte Carlo (MCMC) methods are an appealing class of methods for posterior sampling since they produce asymptotically exact results. As datasets become ever larger and models become ever more complicated there is a demand for more scalable sampling techniques. One recently introduced class of methods is stochastic gradient MCMC (\cite{welling2011bayesian}). These methods generally use a discretisation of a stochastic differential equation (SDE) with the correct invariant distribution. For scalability, in every iteration stochastic gradients based on a subset of the data are used. \cite{Ma2015} provide a general framework for such samplers. It can be shown that these methods are asymptotically exact for decreasing stepsize schemes. However, in practice these methods are used with a constant stepsize, incurring a bias. Stochastic gradient MCMC methods have been applied across a large range of machine learning application such as matrix factorization models (\cite{Chen2014,Ding2014,Ahn2015}), topic models (\cite{Ding2014,Gan2015}) and neural networks (\cite{Li2016,Chen2014}). While SGMCMC produces state-of-the-art results in many applications there has been little work attempting to quantify the bias introduced by the stochastic gradients. In this paper we consider the computational cost of reaching a certain accuracy as the size of the dataset increases. Note that the required accuracy depends both on the functional we are interested in and the width of the posterior. For example if we are interested in estimating the posterior mean of a parameter then the natural scale is the standard deviation which will scale with the size of the dataset (typically as $N^{-\frac{1}{2}}$). We show that in a simple Gaussian toy model the computational cost of reaching a certain error relative to the width of the posterior with stochastic gradient methods scale in the same way as methods using full gradients. Therefore SGMCMC is at most better by a constant factor. However we argue that in a big-data setting the dependence on the size of the data set dominates. This observation raises important question about the use of stochastic gradient methods in more complicated models - does the good performance of stochastic gradient methods come from averaging different models (similar to ensemble methods) rather than faithful posterior simulation.
- Week 5 - 26th May - Luke Kelly (Oxford) - Phylogenetic trees and lateral trait transfer.

Abstract:Lateral transfer, a process whereby evolving species exchange evolutionary traits outside of ancestral relationships, is a frequent source of model misspecification in phylogenetic inference. We propose a novel model of species diversification which explicitly controls for the effect of lateral transfer. Our likelihood parameters are the solution of a sequence of large systems of differential equations representing the expected evolution of traits along the tree. To address this issue, we exploit symmetries in the differential systems to construct an accurate approximation scheme. Finally, we illustrate our method on a data set of lexical traits in Eastern Polynesian languages and obtain an improved fit over the corresponding model without lateral transfer.- Week 6 - 2nd June - Chris Nemeth (Lancaster) - Tempered pseudo-marginal MCMC.

- MCMC algorithms are a class of exact methods used for sampling from target distributions. If the target is multimodal, MCMC algorithms often struggle to explore all of the modes of the target within a reasonable number of iterations. This issue can become even more pronounced when using efficient gradient-based samplers, such as HMC, which tend to tend to become trapped local modes.
In this talk, I'll outline how a pseudo-marginal approach to this problem can improve the mixing of the HMC sampler by tempering the target distribution.

This is joint work with Fredrick Lindsten, Maurizio Filippone and James Hensman.

- Week 7 - 9th June - OxWaSP speakers - Talk Titles / Abstracts TBC
- Week 8 - 16th June - George Deligiannidis (Kings) - Talk Title / Abstract TBC
- Week 9 - 23rd June - Sergios Agapiou- Talk Title / Abstract TBC
- Week 10 - 30th June - Dootika Vats (Warwick) - Talk Title / Abstract TBC

**Previous Years:**

**some key phrases**:

- sampling and inference for diffusions

- exact algorithms

- intractable likelihood

- pseudo-marginal algorithms

- particle filters

- importance sampling

- MCMC

- adaptive MCMC

- perfect simulation

- Markov chains...

- other random structures...

- and randomised algorithms...