APTS module: Computer Intensive Statistics
Module leader: B D Ripley
Please see the full Module Specifications document for background information relating to all of the APTS modules, including how to interpret the information below.
Aims: This module will introduce various computationally-intensive methods and their background theory, including material on simulation-based approaches such as Markov-chain Monte Carlo (MCMC) and the bootstrap, and on strategies for handling large datasets. The different methods will be illustrated by applications.
Learning outcomes: After taking this module, students will have a working appreciation of MCMC, the bootstrap and other simulation-based methods and of their limitations, and have some experience of implementing them for simple examples. Students will also have gained an appreciation of the difficulties of handling very large datasets and of some approaches to overcoming them.
Prerequisites: Preparation for this module should include a review of:
- relevant basic material on statistical modelling (for which the earlier APTS module 'Statistical Modelling' would be advantageous);
- basic Markov chains (as for the 'Applied Stochastic Processes' module).
Basic knowledge of programming in a high-level language such as R will be assumed, and R will be used for case studies and exercises.
Topics:
- Overview of simulation-based inference; Monte Carlo testing.
- Basic theory of bootstrap methods; practical considerations; limitations.
- Basic theory of MCMC; types of MCMC samplers; assessment of convergence/mixing; other practical considerations; case studies.
- Strategies for dealing with large datasets: use of database management systems, multipass algorithms, subsampling, distributed computing. A case study, e.g. generalized linear models.
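As a rough illustration of two of the topics above (the nonparametric bootstrap and a random-walk Metropolis sampler), a minimal sketch follows. It is written in Python using only the standard library, although the module's own case studies use R. The function names `bootstrap_se` and `metropolis`, the default tuning values and the usage shown afterwards are illustrative assumptions, not module material.

```python
import math
import random
import statistics


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Nonparametric bootstrap estimate of the standard error of `statistic`.

    Hypothetical helper for illustration; `n_resamples` and `seed` are
    arbitrary defaults, not recommendations.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicates estimates the sampling variability.
    return statistics.stdev(replicates)


def metropolis(log_density, start, n_steps=20000, step_sd=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    `log_density` is the log of the target density up to an additive
    constant. Hypothetical helper; no burn-in or thinning is applied.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    chain = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred at the current state.
        proposal = x + rng.gauss(0.0, step_sd)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        chain.append(x)
    return chain
```

For example, `bootstrap_se(sample, statistics.mean)` estimates the standard error of a sample mean, and `metropolis(lambda x: -0.5 * x * x, start=0.0)` produces a chain targeting the standard normal distribution. In practice the chain would still need the convergence and mixing checks listed under the MCMC topic.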
Assessment: Analysis of a dataset using computationally-intensive methods. This could be a dataset related to the student's research project, or an example specified by the module leader.
