Peter Mueller

Modeling Dependent Gene Expression

 

We consider statistical inference for high throughput gene expression data. Most traditional statistical methods implicitly assume independent sampling (conditional on some hyperparameters). Recognizing the limitations of independent modeling, we develop a model that includes a simple dependence structure across genes. The important features of the proposed model are the ease of representing typical prior information on the nature of dependencies, a model-based parsimonious representation of the signal as an ordinal outcome, and the use of a coherent probability model over both the structure and the strength of the conjectured dependencies. As part of the inference we reduce the recorded data to a trinary response representing underexpression, average expression and overexpression. We use a dependent extension of the popular POE model (Parmigiani et al. 2002 JRSSB) to achieve this. Inference in the described model is implemented through a straightforward Markov chain Monte Carlo (MCMC) simulation, including posterior simulation over conditional dependence and independence. The latter involves a variable-dimension parameter space, for which we use a reversible jump MCMC scheme.
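
To make the trinary reduction concrete, the following is a minimal sketch of how one measurement could be assigned to underexpression, average expression or overexpression under a three-component mixture (uniform below, normal around the baseline, uniform above), loosely in the spirit of a POE-type model. It is not the dependent model described here: all parameters are treated as known and fixed, genes are handled independently, and no MCMC is involved. The function names and every default value (mu, sigma, kappa_minus, kappa_plus, pi_minus, pi_plus) are hypothetical choices made for illustration only.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, lo, hi):
    """Density of a uniform distribution on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def class_probabilities(x, mu=0.0, sigma=1.0, kappa_minus=6.0, kappa_plus=6.0,
                        pi_minus=0.1, pi_plus=0.1):
    """Return (P(under), P(average), P(over)) for one measurement x.

    Assumed (hypothetical) mixture: uniform on [mu - kappa_minus, mu] for
    underexpression, normal(mu, sigma) for average expression, and uniform
    on [mu, mu + kappa_plus] for overexpression.
    """
    w_under = pi_minus * uniform_pdf(x, mu - kappa_minus, mu)
    w_avg = (1.0 - pi_minus - pi_plus) * normal_pdf(x, mu, sigma)
    w_over = pi_plus * uniform_pdf(x, mu, mu + kappa_plus)
    total = w_under + w_avg + w_over
    return (w_under / total, w_avg / total, w_over / total)


def trinary_call(x, **params):
    """Map x to -1 (under), 0 (average) or 1 (over) by the most probable class."""
    probs = class_probabilities(x, **params)
    return (-1, 0, 1)[probs.index(max(probs))]


if __name__ == "__main__":
    for value in (-4.0, 0.2, 4.0):
        print(value, trinary_call(value), class_probabilities(value))
```

In a full analysis the class probabilities would be posterior quantities averaged over the MCMC output, and the prior class probabilities would depend on the states of related genes rather than being fixed constants as in this sketch.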

The motivating example uses data from ovarian cancer patients. We use the proposed dependent probability model to derive inference about differentially expressed genes. We compare results under the dependent model with those under a corresponding independent model. We show how explicit modeling of known dependencies reduces the sample size required to achieve the desired inference. In the example, a well-known molecular pathway serves as an informative prior probability model for the dependence structure.

Joint work with: Giovanni Parmigiani (Johns Hopkins Univ, MD)