envisage {Envisage}    R Documentation
Linear-model-based analysis of significantly changing gene expression across multiple experiment variables.
envisage(expData, widget=TRUE, MTC=NULL, pCutoff=NULL, useParams=NULL, param4INT=NULL, paramType=NULL, fileResults=getwd(), MEorINT="INT", startModel=NULL)
expData: An object of class ExpressionSet.
widget: Logical value stating whether the interactive widget should be used. Defaults to TRUE.
MTC: A character string defining which multiple testing correction method to use, if any. Available methods are "None", "BH", "BY", "Bonferroni", "Holm", "Hochberg", "SidakSS" and "SidakSD". More information on these can be found in the package vignette. "BH" is selected by default.
pCutoff: A number defining the p-value cutoff used in the significance analysis. Must be between 0 and 1 (default 0.05).
useParams: A character vector defining which of the experiment variables to use in the model calculation.
param4INT: A character vector defining the model variables for which interactions should be calculated.
paramType: A character vector specifying the class ("numeric" or "categoric") of each variable in the model.
fileResults: The folder in which to save the results of the modelling procedure. Defaults to the current working directory.
MEorINT: A character string defining whether to consider first-order interactions ("INT") or only main-effect terms ("ME") in the model.
startModel: Starting model for the analysis. Must be of the form "x ~ ...". Care must be taken to ensure that the model terms match the names of the variables specified in useParams.
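Because a startModel string has to use exactly the variable names given in useParams, it can help to build it programmatically. The sketch below is an illustration, not code from Envisage: the helper build_start_model is hypothetical, and while backtick quoting of names containing spaces or slashes (such as "Time point" or "Myc ON/OFF") is standard R formula syntax, whether Envisage accepts backquoted names is an assumption. The ^2 operator expands to all main effects plus every first-order (two-way) interaction, which mirrors the MEorINT = "INT" setting.

```r
# Hypothetical helper (not part of Envisage): build a "x ~ ..." model string.
# Backticks let R formulas refer to names such as "Time point" or "Myc ON/OFF".
build_start_model <- function(useParams, param4INT = NULL, MEorINT = "ME") {
  quote_name <- function(v) paste0("`", v, "`")
  main <- paste(quote_name(useParams), collapse = " + ")
  if (MEorINT == "INT" && length(param4INT) > 1) {
    # (a + b + c)^2 expands to a, b, c and every two-way interaction among them
    int <- paste0("(", paste(quote_name(param4INT), collapse = " + "), ")^2")
    return(paste("x ~", main, "+", int))
  }
  paste("x ~", main)
}

useParams <- c("RIN", "Tissue", "Time point", "Myc ON/OFF")
param4INT <- c("Tissue", "Time point", "Myc ON/OFF")
me_model  <- build_start_model(useParams)
int_model <- build_start_model(useParams, param4INT, MEorINT = "INT")

# Parsing the string shows which terms the model contains
labs <- attr(terms(as.formula(int_model)), "term.labels")
print(labs)
```

Duplicated main effects (Tissue appears both in the main list and inside the ^2 group) are merged by R's formula machinery, so the interaction model has four main effects and three two-way interactions.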
The Envisage package contains methods for using linear models (LMs) to identify significantly changing genes in experiments with multiple sources of variation, be they experimentally controlled variables such as drug treatment or time, or uncontrolled sources of confounding variation such as phenotypic or environmental differences. This allows all sources of variation to be taken into account when testing for differential expression, helping to ensure that the resulting genes are biologically relevant to the experimental question.
The function envisage is the main function in the package and is run on an object expData of class ExpressionSet, which may represent 1-colour microarray data (log-intensity values) or 2-colour microarray data (log-ratio values). This object contains gene expression values and 'phenodata' (experiment variable information) for each sample, along with annotation for the features on the array of interest. For further information on loading 1-colour microarray data into the ExpressionSet format, see package affy, particularly read.affybatch. For information on loading 2-colour microarray data, see package marray.
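The structure described above (an expression matrix with one column per sample, plus phenodata with one row per sample) can be assembled by hand. This is a minimal sketch using made-up toy values; it assumes the Bioconductor package Biobase, which defines ExpressionSet and AnnotatedDataFrame, and only builds the object if Biobase is installed. The key requirement it checks is that sample names line up between the two parts.

```r
# Toy data (made up): 5 genes x 4 samples of log-scale expression values
set.seed(1)
expr <- matrix(rnorm(20, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Phenodata: one row per sample, one column per experiment variable
pheno <- data.frame(Tissue = factor(c("Skin", "Skin", "Pancreas", "Pancreas")),
                    RIN    = c(8.1, 7.4, 9.0, 8.6),
                    row.names = colnames(expr))

# Sample names must match between the expression matrix and the phenodata
stopifnot(identical(colnames(expr), rownames(pheno)))

# Wrap both parts in an ExpressionSet only if Biobase is available
if (requireNamespace("Biobase", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(assayData = expr,
                                 phenoData = Biobase::AnnotatedDataFrame(pheno))
  print(dim(eset))
}
```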
Arguments can be specified using a Tcl/Tk widget-based GUI. The widget is intended to prevent errors in variable entry, particularly in specifying variable classes, since a wrongly specified class can have a dramatic effect on the model outcome. For more information, see the Envisage vignette.
Any labelling variables (i.e. variables for which every sample has a unique value) are removed before the GUI is displayed. These should not be included in the model, since they are irrelevant and will always produce a perfect fit. In addition, any highly correlated variables that essentially contain the same information are flagged for the user. Only one variable from each such group should be chosen, to avoid overfitting the data.
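Both checks can be expressed in a few lines of base R. This is a sketch of the idea, not Envisage's implementation: the helpers find_labelling_vars and find_correlated_pairs are hypothetical, the labelling check is restricted to non-numeric columns (an assumption, since a continuous covariate such as RIN would otherwise almost always be unique per sample), and the 0.95 correlation threshold is an arbitrary choice that only covers numeric variables.

```r
# Hypothetical helper: non-numeric columns in which every sample has a
# different value (assumption: numeric covariates are never treated as labels)
find_labelling_vars <- function(pheno) {
  is_label <- vapply(pheno, function(v) {
    !is.numeric(v) && length(unique(v)) == nrow(pheno)
  }, logical(1))
  names(pheno)[is_label]
}

# Hypothetical helper: pairs of numeric variables whose absolute Pearson
# correlation is at or above an assumed threshold
find_correlated_pairs <- function(pheno, threshold = 0.95) {
  num <- pheno[vapply(pheno, is.numeric, logical(1))]
  if (ncol(num) < 2) return(character(0))
  cm  <- abs(cor(num))
  idx <- which(upper.tri(cm) & cm >= threshold, arr.ind = TRUE)
  paste(colnames(cm)[idx[, "row"]], colnames(cm)[idx[, "col"]], sep = " & ")
}

# Made-up phenodata
pheno <- data.frame(
  SampleID = paste0("s", 1:6),                     # unique per sample: labelling
  Tissue   = factor(rep(c("Skin", "Pancreas"), each = 3)),
  RIN      = c(8.1, 7.4, 9.0, 8.6, 7.9, 8.3),
  RIN_x10  = c(81, 74, 90, 86, 79, 83)             # same information as RIN
)

print(find_labelling_vars(pheno))
print(find_correlated_pairs(pheno))
```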
Model selection is performed for each gene individually. A backward stepwise procedure is used to select a model that balances parsimony against explanatory power: it starts from a saturated model containing all specified terms and removes terms one at a time based on the Akaike information criterion (AIC). A primary model containing only main-effect terms is fitted first, and a second model including first-order interaction terms is then fitted for those variables found to be significant in the main-effect model. This two-stage approach greatly reduces run time, which matters given the large number of variables and the very large number of genes for which models are calculated. Further information on the modelling process can be found in the package vignette.
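The two-stage procedure for a single gene can be sketched with base R's lm() and step(). This is an illustration on simulated data, not Envisage's code: the helper select_gene_model is hypothetical, and as a simplifying assumption a variable counts as "significant" in stage 1 if backward AIC elimination keeps it.

```r
# Simulated data for one gene (made up): Tissue and Time matter, RIN is noise
set.seed(42)
n <- 24
d <- data.frame(
  Tissue = factor(rep(c("Skin", "Pancreas"), each = 12)),
  Time   = factor(rep(c("T0", "T1", "T2"), times = 8)),
  RIN    = runif(n, 6, 9)
)
d$x <- 8 + 1.5 * (d$Tissue == "Skin") + 0.8 * (d$Time == "T2") +
  1.2 * (d$Tissue == "Skin") * (d$Time == "T2") + rnorm(n, sd = 0.3)

# Hypothetical helper: two-stage backward AIC selection for one gene
select_gene_model <- function(d, vars) {
  # Stage 1: saturated main-effects model, then backward elimination by AIC
  full_me <- lm(reformulate(vars, response = "x"), data = d)
  me_fit  <- step(full_me, direction = "backward", trace = 0)
  kept    <- attr(terms(me_fit), "term.labels")
  if (length(kept) < 2) return(me_fit)   # no interactions possible
  # Stage 2: retained main effects plus all their first-order interactions
  int_form <- as.formula(paste("x ~ (", paste(kept, collapse = " + "), ")^2"))
  full_int <- lm(int_form, data = d)
  step(full_int, direction = "backward", trace = 0)
}

fit <- select_gene_model(d, c("Tissue", "Time", "RIN"))
print(formula(fit))
```

step() respects marginality when dropping terms, so a main effect is never removed while an interaction involving it remains in the model.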
The significance of each term in the selected model for each gene is calculated using a Type II ANOVA F-test, comparing the full selected model with a model in which that term is removed. This produces a series of unadjusted p-values defining the significance of each model term. It is recommended to apply a multiple testing correction, such as the Benjamini & Hochberg false discovery rate method, to correct for the large number of statistical tests performed (one per gene for each model term). A gene is considered to change significantly for a particular model term if the resulting p-value is less than pCutoff.
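The per-term testing and correction steps can be sketched in base R on simulated data. This is not Envisage's code: it uses drop1(test = "F"), which for a main-effects-only model gives the same tests as a Type II ANOVA (for models with interactions, car::Anova(type = 2) is a common route), and stats::p.adjust with method "BH". The p.adjust methods "BY", "bonferroni", "holm" and "hochberg" correspond to other MTC options; the Sidak procedures are not available in p.adjust.

```r
set.seed(7)
n_genes <- 200
d <- data.frame(Tissue = factor(rep(c("Skin", "Pancreas"), each = 6)),
                RIN    = runif(12, 6, 9))
expr <- matrix(rnorm(n_genes * 12), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Simulated effect: the first 20 genes are higher in Skin
expr[1:20, d$Tissue == "Skin"] <- expr[1:20, d$Tissue == "Skin"] + 3

# For each gene, F-test every term against the model with that term removed
term_pvals <- t(apply(expr, 1, function(x) {
  fit <- lm(x ~ Tissue + RIN, data = d)
  tab <- drop1(fit, test = "F")
  setNames(tab[["Pr(>F)"]][-1], rownames(tab)[-1])   # skip the "<none>" row
}))

# Benjamini & Hochberg adjustment across genes, separately for each term
adj <- apply(term_pvals, 2, p.adjust, method = "BH")

# Genes changing significantly with Tissue, smallest adjusted p-value first
pCutoff <- 0.05
sig_tissue <- sort(adj[adj[, "Tissue"] < pCutoff, "Tissue"])
print(head(sig_tissue))
```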
After model fitting has been completed, a second GUI window opens, showing which variables and interactions have a significant effect on gene expression. Variables and interactions can be selected for the output gene lists by ticking the relevant check boxes.
Lists of significantly changing genes for each experiment variable or interaction of interest, together with their associated p-values, are written in tab-delimited text format to the newly created directory ‘.../EnvisageResults’ within the currently specified directory. These files can be read into analysis packages such as GeneSpring GX for further analysis.
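Writing such gene lists needs only base R. In this sketch the input list, the two-column layout (Gene, p.value) and the file-naming scheme are all assumptions made for illustration, not Envisage's actual output format; tempdir() stands in for the user's fileResults folder.

```r
# Assumed input (made up): named vectors of p-values, one per selected term
sig_lists <- list(
  Tissue        = c(gene3 = 0.0012, gene7 = 0.0300),
  `Tissue:Time` = c(gene7 = 0.0150)
)

fileResults <- tempdir()          # stands in for the user's results folder
out_dir <- file.path(fileResults, "EnvisageResults")
dir.create(out_dir, showWarnings = FALSE)

for (term in names(sig_lists)) {
  p   <- sig_lists[[term]]
  tab <- data.frame(Gene = names(p), p.value = unname(p))
  # Hypothetical naming: ":" is replaced because it is not portable in file names
  fname <- paste0(gsub(":", "_x_", term, fixed = TRUE), ".txt")
  write.table(tab, file.path(out_dir, fname), sep = "\t",
              quote = FALSE, row.names = FALSE)
}

print(list.files(out_dir))
```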
Two further report files may also be created: ‘errorGenes.txt’ and ‘AliasReport.txt’. These files give information on aliasing in the ANOVA calculations, which may occur when highly correlated variables are modelled. More information on these files can be found in the package vignette.
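Aliasing of the kind these reports describe can be reproduced with base R's lm() and alias(). The sketch below uses made-up data in which a hypothetical Batch variable coincides exactly with Tissue; it only illustrates what aliasing looks like and does not show how Envisage builds its report files.

```r
# Made-up data: Batch carries exactly the same information as Tissue
d <- data.frame(
  Tissue = factor(rep(c("Skin", "Pancreas"), each = 4)),
  Batch  = factor(rep(c("B1", "B2"), each = 4)),
  x      = c(5.1, 5.3, 4.9, 5.2, 7.8, 8.1, 7.9, 8.0)
)
fit <- lm(x ~ Tissue + Batch, data = d)

print(coef(fit))                          # the aliased coefficient is NA
aliased <- names(which(is.na(coef(fit))))  # which coefficients were aliased
print(aliased)
print(alias(fit)$Complete)                # aliased term in terms of the others
```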
A list, results, is returned with a slot for each of the variables or interactions specified in the results GUI. Each slot contains a vector of genes and their associated ANOVA p-values for genes that change significantly (i.e. have a p-value less than pCutoff) based on that variable or interaction. If no genes are significant for a particular variable or interaction, that slot is removed to improve readability.
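A list with this shape can be assembled from a genes-by-terms matrix of p-values. This sketch is illustrative only: the input matrix and its values are made up, and storing each slot as a named numeric vector (gene names as names, p-values as values) is an assumption about the exact format.

```r
# Assumed input (made up): adjusted p-values, genes in rows, terms in columns
adj <- cbind(
  Tissue        = c(gene1 = 0.001, gene2 = 0.200, gene3 = 0.030),
  Time          = c(gene1 = 0.600, gene2 = 0.700, gene3 = 0.900),
  `Tissue:Time` = c(gene1 = 0.040, gene2 = 0.800, gene3 = 0.300)
)
pCutoff <- 0.05

# One slot per term: significant genes and their p-values, smallest first
results <- lapply(colnames(adj), function(term) {
  p <- adj[, term]
  sort(p[p < pCutoff])
})
names(results) <- colnames(adj)

# Drop terms with no significant genes
results <- Filter(length, results)
str(results)
```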
If any genes fail the modelling procedure, the final slots in the list contain information on these genes, which is then transferred to the files ‘errorGenes.txt’ and ‘AliasReport.txt’. These lists are saved as tab-delimited text files in the directory ‘.../EnvisageResults’ within the currently specified directory, for further analysis.
Sam Robson <S.C.Robson@warwick.ac.uk>
Robson, S. C., Hunter, E., Bird, H., Turner, H. (2008) Envisage: model-based significance analysis of microarray gene expression data, manuscript in preparation
For information on loading 1-colour microarray data, see read.affybatch. For information on loading 2-colour microarray data, see marray. For more information on object classes, see ExpressionSet.
## Load dataset
library(Envisage)
data(SkinvsPancreas)

## Select which variables to use in the analysis and their classes
useParams <- c("RIN", "Tissue", "Time point", "Myc ON/OFF")
paramType <- c("numeric", rep("categoric", 3))

## We only want to look at the interactions between the 3 main variables
param4INT <- c("Tissue", "Time point", "Myc ON/OFF")

## Run envisage without the widget, passing every argument by name
results <- envisage(SkinvsPancreas, widget = FALSE, MTC = "BH", pCutoff = 0.05,
                    useParams = useParams, param4INT = param4INT,
                    paramType = paramType, fileResults = getwd(),
                    MEorINT = "INT")

## Output results
results