Igor Pruenster

Expressed sequence tag (EST) analyses are an important tool for gene identification in organisms. Given a preliminary EST survey from a given cDNA library, various features of a possible additional sample have to be predicted. For instance, interest may lie in estimating the number of new genes to be detected, the gene discovery rate at each additional read, and the probability of not re-observing certain specific genes present in the initial sample. We propose a Bayesian nonparametric approach to prediction in EST analysis, based on nonparametric priors inducing Gibbs-type exchangeable random partitions, and derive estimators for the relevant quantities. Several EST datasets are analysed by resorting to the two-parameter Poisson-Dirichlet process, which represents the most remarkable Gibbs-type prior. Our proposal has appealing properties compared with frequentist nonparametric methods, which become unstable when prediction is required for large future samples.