Bill Browne

Classification of mass spectroscopy data using principal components analysis, Bayesian MCMC modelling and a deterministic peak finding algorithm

 

In this talk we consider three approaches to classifying SELDI and MALDI mass spectroscopy datasets. An individual mass spectroscopy scan consists of a trace of ~14,000 values (intensities) recorded at different mass-to-charge ratios. Each scan belongs to one of m groups, where the groups may represent, for example, different breast cancer lines, and we wish to assign new scans to these groups. Because each scan has so many variables (one per mass-to-charge ratio), we need data reduction techniques that yield a small set of derived variables that can be used for classification.


We consider three techniques. The first is principal components analysis (PCA) of the full scans, which produces a smaller set of derived variables. The other two take into account the fact that a scan is a sequentially ordered set of variables and attempt to fit mixtures of scaled Gaussian distribution functions to it: a deterministic algorithm and a model-based MCMC method, each of which reduces every scan to a series of (scaled) Gaussian peaks at locations that are common to all scans. The resulting heights of these peaks are then used in the classification.
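
As a rough illustration of the two kinds of data reduction described above, the following Python sketch computes PCA scores and Gaussian peak heights for a small synthetic dataset. It is a sketch under stated assumptions, not the method from the talk: the peak locations and a single shared peak width are assumed to be known in advance, and the heights are obtained by ordinary least squares, which stands in for both the deterministic peak finding algorithm and the MCMC model. All function names, the synthetic data and the parameter values are hypothetical.

```python
import numpy as np


def pca_scores(scans, n_components):
    """Project mean-centred scans (one per row) onto the leading principal components."""
    centred = scans - scans.mean(axis=0)
    # SVD of the centred data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def gaussian_basis(mz, centres, width):
    """Unit-height Gaussian bumps, one column per (assumed known) peak location."""
    return np.exp(-0.5 * ((mz[:, None] - centres[None, :]) / width) ** 2)


def peak_heights(scans, mz, centres, width):
    """Least-squares height of each Gaussian peak, for every scan.

    Assumption: locations and width are fixed and shared by all scans;
    only the heights (scalings) are estimated.
    """
    basis = gaussian_basis(mz, centres, width)
    heights, *_ = np.linalg.lstsq(basis, scans.T, rcond=None)
    return heights.T


# Synthetic example: 10 scans of 500 intensities with 3 common peaks.
rng = np.random.default_rng(0)
mz = np.linspace(0.0, 100.0, 500)
centres = np.array([20.0, 50.0, 80.0])
true_heights = rng.uniform(1.0, 5.0, size=(10, 3))
scans = true_heights @ gaussian_basis(mz, centres, 2.0).T
scans += rng.normal(scale=0.01, size=scans.shape)

scores = pca_scores(scans, n_components=2)            # shape (10, 2)
heights = peak_heights(scans, mz, centres, width=2.0)  # shape (10, 3)
```

Either `scores` or `heights` can then serve as the derived variables passed to a classifier. The peak heights have a direct interpretation (the intensity at a given mass-to-charge location), whereas each PCA score mixes contributions from the whole scan.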

All three methods will be compared via cross-validation on two example datasets, one with 6 groups and one with 2 groups. This is joint work with Ian Dryden and Kelly Handley.
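
The cross-validation comparison can be sketched in a few lines of Python. The abstract does not say which classifier or which cross-validation scheme is used, so the nearest-centroid rule, the five-fold split and the synthetic two-group data below are all assumptions made purely for illustration, and every name is hypothetical.

```python
import random


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test vector to the group whose mean vector is closest."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda y: sq_dist(x, centroids[y])) for x in test_x]


def cross_validated_accuracy(features, labels, n_folds=5, seed=0):
    """Proportion of scans classified correctly when each fold is held out in turn."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        preds = nearest_centroid_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in held_out],
        )
        correct += sum(p == labels[i] for p, i in zip(preds, held_out))
    return correct / len(features)


# Synthetic derived variables for two well-separated groups of 20 scans each.
rng = random.Random(1)
features = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(20)]
features += [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(20)]
labels = [0] * 20 + [1] * 20

accuracy = cross_validated_accuracy(features, labels)
```

Running the same function on the derived variables from each reduction method would give one accuracy figure per method, which is the kind of comparison the abstract describes.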