envisage {Envisage}    R Documentation

Enables Numerous Variables In Significant Analysis of Gene Expression

Description

Linear model based analysis of significantly changing gene expression across multiple experiment variables.

Usage

  envisage(expData, widget=TRUE, MTC=NULL, pCutoff=NULL, useParams=NULL, 
    param4INT=NULL, paramType=NULL, fileResults=getwd(), MEorINT="INT", 
    startModel=NULL) 

Arguments

expData An object of class ExpressionSet.
widget Logical value stating whether the interactive widget should be used. Set to TRUE by default.
MTC A character string that defines which multiple testing correction method to use (if any). Available methods are None, BH, BY, Bonferroni, Holm, Hochberg, SidakSS, SidakSD. More information on these can be found in the package vignette. BH is selected by default.
pCutoff A number defining the p-value cutoff to be used in significance analysis. Must be between 0 and 1 (default 0.05).
useParams A character vector that defines which of the experiment variables to use in the model calculation.
param4INT A character vector that defines the model variables for which interactions should be calculated.
paramType A character vector specifying the classes (numeric or categoric) of the variables in the model.
fileResults The folder in which to save the results of the modelling procedure. Defaults to the current working directory.
MEorINT Character string defining whether to include first-order interactions ("INT") or only main-effect terms ("ME") in the model.
startModel Starting model for the analysis. Must be of the form "x ~ ...". Care must be taken to ensure that the model terms match the names of the variables specified in useParams.
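The following sketch shows one way to build a startModel-style string from useParams and param4INT. It assumes that startModel is parsed as an ordinary R formula, in which non-syntactic names such as "Time point" or "Myc ON/OFF" must be backquoted; the helper build_model_string() is hypothetical and not part of Envisage. The (a + b + c)^2 idiom expands to the main effects plus all first-order interactions, matching the "INT" setting of MEorINT.

```r
## Hypothetical helper (not part of Envisage): build a "x ~ ..." model string,
## backquoting every variable name so that names with spaces or "/" parse.
build_model_string <- function(useParams, param4INT = NULL, MEorINT = "INT") {
  rhs <- paste(sprintf("`%s`", useParams), collapse = " + ")
  if (MEorINT == "INT" && length(param4INT) > 1) {
    ## (a + b + c)^2 expands to the main effects plus all first-order interactions
    int <- paste(sprintf("`%s`", param4INT), collapse = " + ")
    rhs <- paste0(rhs, " + (", int, ")^2")
  }
  paste("x ~", rhs)
}

useParams <- c("RIN", "Tissue", "Time point", "Myc ON/OFF")
param4INT <- c("Tissue", "Time point", "Myc ON/OFF")
model_string <- build_model_string(useParams, param4INT)
f <- as.formula(model_string)
attr(terms(f), "term.labels")
```

With MEorINT = "ME" the helper returns the main effects only.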

Details

The Envisage package contains methods that use linear models (LMs) to identify significantly changing genes in experiments with several sources of variation, whether experimentally controlled variables such as drug treatment or time, or uncontrolled sources of confounding variation such as phenotypic or environmental differences. This allows all sources of variation to be considered when analysing for significant differential expression, helping to ensure that the resulting genes are biologically relevant to the experimental question.

The function envisage is the main function in the package and is run for an object expData of class ExpressionSet, which may represent 1-colour microarray data (log-intensity values) or 2-colour microarray data (as log-ratio values). This object contains gene expression values and 'phenodata' (experiment variable information) for each sample, along with annotation for the features contained in the array of interest. For further information on loading 1-colour microarray data into the ExpressionSet format see package affy, particularly read.affybatch. For information on loading 2-colour microarray data, see package marray.

Arguments can be specified using a Tcl/Tk widget-based GUI. The widget helps to avoid errors in variable entry, in particular in the specification of variable classes, where mistakes can have a dramatic effect on the model outcome. For more information, see the Envisage vignette.

Any labelling variables (i.e. any variable for which every sample has a unique value) are removed before the GUI is displayed: they are irrelevant to the model and would always result in a perfect fit. In addition, highly correlated variables that essentially contain identical information are flagged for the user. Only one variable from each such group should be chosen, to avoid overfitting the data.
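A minimal sketch of the two screening checks above, written in base R and not taken from Envisage. Two assumptions are made: the labelling check is restricted to non-numeric variables (a continuous covariate such as RIN will usually also have a distinct value per sample), and variables are flagged as highly correlated when the absolute Pearson correlation exceeds a hypothetical threshold of 0.95. The data frame pheno is toy data.

```r
## Labelling variables: non-numeric variables with a distinct value per sample
## (the non-numeric restriction is an assumption of this sketch)
find_label_vars <- function(pheno) {
  is_label <- vapply(pheno, function(v) {
    !is.numeric(v) && length(unique(v)) == length(v)
  }, logical(1))
  names(pheno)[is_label]
}

## Highly correlated numeric variables; the 0.95 threshold is hypothetical
find_correlated_pairs <- function(pheno, threshold = 0.95) {
  num <- pheno[vapply(pheno, is.numeric, logical(1))]
  if (ncol(num) < 2) return(character(0))
  r <- abs(cor(num))
  ## upper.tri() skips the diagonal and reports each pair only once
  idx <- which(upper.tri(r) & r > threshold, arr.ind = TRUE)
  sprintf("%s <-> %s", colnames(r)[idx[, 1]], colnames(r)[idx[, 2]])
}

## Toy phenodata: one row per sample
pheno <- data.frame(
  SampleID   = paste0("S", 1:6),
  RIN        = c(7.1, 8.0, 6.5, 9.2, 7.7, 8.4),
  RIN_scaled = c(7.1, 8.0, 6.5, 9.2, 7.7, 8.4) * 10,
  Tissue     = rep(c("Skin", "Pancreas"), each = 3)
)
find_label_vars(pheno)
find_correlated_pairs(pheno)
```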

Model selection is performed for each gene individually. A backward stepwise procedure selects a model that balances parsimony against explanatory power: it starts from a saturated model containing all specified variables and interactions and removes terms based on the Akaike information criterion (AIC). A primary model is first fit using main-effect terms only, and a second model is then fit with first-order interaction terms for those variables found to be significant in the main-effect model. Because models are calculated for many variables and a very large number of genes, this two-stage approach greatly reduces run time. Further information on the modelling process can be found in the package vignette.
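The two-stage selection can be sketched for a single gene with base R's step(), which performs backward elimination by AIC. This is an illustration on simulated data, not Envisage's implementation; in particular, it carries forward every main effect retained by AIC, whereas the text above refers to variables found to be significant.

```r
## Simulated phenodata and expression values for one gene (toy data)
set.seed(1)
n <- 24
pheno <- data.frame(
  RIN    = runif(n, 6, 9),
  Tissue = factor(rep(c("Skin", "Pancreas"), each = n / 2)),
  Myc    = factor(rep(c("ON", "OFF"), times = n / 2))
)
## True signal: a Tissue effect plus noise
pheno$x <- 5 + 2 * (pheno$Tissue == "Skin") + rnorm(n, sd = 0.5)

## Stage 1: start from all main effects and drop terms by AIC
me_fit <- step(lm(x ~ RIN + Tissue + Myc, data = pheno),
               direction = "backward", trace = 0)
kept <- attr(terms(me_fit), "term.labels")

## Stage 2: add first-order interactions among the retained variables only
if (length(kept) > 1) {
  int_formula <- as.formula(paste("x ~ (", paste(kept, collapse = " + "), ")^2"))
  final_fit <- step(lm(int_formula, data = pheno),
                    direction = "backward", trace = 0)
} else {
  final_fit <- me_fit
}
formula(final_fit)
```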

The significance of each term in the selected model for each gene is calculated using a Type II ANOVA F-test, comparing the full selected model with a model with that term removed. This produces a set of unadjusted p-values, one for each model term. Because one test is performed per gene, a multiple testing correction, such as the Benjamini & Hochberg false discovery rate method (BH), is recommended to account for the large number of statistical tests. A gene is considered to change significantly for a particular model term if the resulting p-value is less than pCutoff.
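A sketch of per-term significance testing across genes, on simulated data. drop1(test = "F") compares the fitted model with the model obtained by dropping each term in turn; for a main-effects model such as this one these tests coincide with Type II tests (for models with interactions, car::Anova(type = "II") is a common alternative). The raw p-values for each term are then adjusted across genes with p.adjust(method = "BH"). Envisage's own code may differ.

```r
## Toy data: rows of expr are genes, columns are samples
set.seed(2)
n_samples <- 12
n_genes <- 200
pheno <- data.frame(
  Tissue = factor(rep(c("Skin", "Pancreas"), each = n_samples / 2)),
  Time   = factor(rep(c("T1", "T2", "T3"), times = n_samples / 3))
)
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
## The first 20 genes get a Tissue effect
skin <- pheno$Tissue == "Skin"
expr[1:20, skin] <- expr[1:20, skin] + 3

## Unadjusted F-test p-value for each term, one row per gene
raw_p <- t(apply(expr, 1, function(x) {
  fit <- lm(x ~ Tissue + Time, data = pheno)
  tab <- drop1(fit, test = "F")
  tab[c("Tissue", "Time"), "Pr(>F)"]
}))
colnames(raw_p) <- c("Tissue", "Time")

## Adjust each term's p-values across genes (one test per gene)
adj_p <- apply(raw_p, 2, p.adjust, method = "BH")
pCutoff <- 0.05
sig_tissue <- which(adj_p[, "Tissue"] < pCutoff)
length(sig_tissue)
```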

After model fitting has been completed, a second GUI window shows which variables and interactions have a significant effect on the genes. Variables and interactions can be chosen for the output gene lists by selecting the relevant check boxes.

Lists of significantly changing genes for each experiment variable or interaction of interest, together with their associated p-values, are written as tab-delimited text files to a newly created directory, ‘.../EnvisageResults’, inside the currently specified directory. These files can be read into analysis packages such as GeneSpring GX for further analysis.
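The output step can be sketched with write.table(). The gene names, p-values, file-naming scheme and column layout below are hypothetical, and the sketch writes to a temporary directory rather than to the results directory used by envisage.

```r
## Toy results: one named vector of p-values per term (invented values)
results <- list(
  Tissue       = c(gene_001 = 0.0012, gene_017 = 0.0340),
  `Time point` = c(gene_005 = 0.0200)
)
out_dir <- file.path(tempdir(), "EnvisageResults")
dir.create(out_dir, showWarnings = FALSE)

for (term in names(results)) {
  tab <- data.frame(Gene = names(results[[term]]),
                    p.value = unname(results[[term]]))
  ## Replace runs of characters such as spaces or "/" that are awkward in file names
  fname <- paste0(gsub("[^A-Za-z0-9]+", "_", term), ".txt")
  write.table(tab, file.path(out_dir, fname), sep = "\t",
              quote = FALSE, row.names = FALSE)
}
list.files(out_dir)
```

Each file can be read back with read.delim().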

Two further report files may also be created: ‘errorGenes.txt’ and ‘AliasReport.txt’. These files give information on aliasing in the ANOVA calculations, which may occur when highly correlated variables are modelled. More information on these files can be found in the package vignette.
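Aliasing of this kind can be reproduced in base R: when one covariate is an exact multiple of another, lm() cannot estimate both coefficients, and stats::alias() reports the dependency. The data are simulated, and the sketch illustrates the general phenomenon rather than the format of ‘AliasReport.txt’.

```r
## Simulated data with two perfectly correlated covariates
set.seed(3)
d <- data.frame(RIN = runif(10, 6, 9))
d$RIN_x10 <- d$RIN * 10            # carries exactly the same information as RIN
d$x <- 1 + 0.5 * d$RIN + rnorm(10, sd = 0.1)

fit <- lm(x ~ RIN + RIN_x10, data = d)
## The RIN_x10 coefficient cannot be estimated and is reported as NA
coef(fit)
## The complete-alias matrix expresses RIN_x10 in terms of the other terms
alias(fit)$Complete
```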

Value

A list, results, is returned with one element for each of the variables or interactions specified in the results GUI. Each element contains a vector of genes and their associated ANOVA p-values for the genes that change significantly (i.e. have a p-value less than pCutoff) with respect to that variable or interaction. If no genes are significant for a particular variable or interaction, that element is removed to improve readability.
If any genes fail the modelling procedure, the final elements of the list contain information on these genes, which is then transferred to the files ‘errorGenes.txt’ and ‘AliasReport.txt’.
These lists are saved in the directory ‘.../EnvisageResults’ contained in the current specified directory as tab-delimited text files for further analysis.
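The thresholding and removal of empty entries described in this section can be sketched as follows. The term names and p-values are invented toy values, and the code is an illustration rather than Envisage's implementation.

```r
pCutoff <- 0.05
## Toy adjusted p-values, one named vector per term
adj_p <- list(
  Tissue       = c(gene_001 = 0.001, gene_002 = 0.300),
  `Myc ON/OFF` = c(gene_001 = 0.450, gene_002 = 0.700)
)
## Keep only genes below the cutoff for each term
results <- lapply(adj_p, function(p) p[p < pCutoff])
## Drop terms with no significant genes (length 0 is treated as FALSE)
results <- Filter(length, results)
names(results)
```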

Author(s)

Sam Robson S.C.Robson@warwick.ac.uk

References

Robson, S. C., Hunter, E., Bird, H., Turner, H. (2008) Envisage: model-based significance analysis of microarray gene expression data, manuscript in preparation

See Also

For information on loading 1-colour microarray data, see read.affybatch. For information on loading 2-colour microarray data, see marray. For more information on object classes, see ExpressionSet.

Examples

  ## Load dataset
  library(Envisage)
  data(SkinvsPancreas)

  ## Select which variables to use in analysis and their classes
  useParams <- c("RIN", "Tissue", "Time point", "Myc ON/OFF")
  paramType <- c("numeric", rep("categoric", 3))

  ## We only want to look at the interactions for the 3 main variables
  param4INT <- c("Tissue", "Time point", "Myc ON/OFF")

  ## Run envisage
  results <- envisage(SkinvsPancreas, widget=FALSE, MTC="BH", pCutoff=0.05,
    useParams=useParams, param4INT=param4INT, paramType=paramType,
    fileResults=getwd(), MEorINT="INT")

  ## Output results
  results

[Package Envisage version 1.0-2 Index]