Improved modelling of RNA-seq and ChIP-seq bias using multiple alternative nucleotide distributions
Software and Datasets to accompany paper
The following software and sample files have been packaged to allow the analysis described in the paper to be reproduced.
These will also allow other data to be analysed using the techniques described in the paper.
The software installation and usage are described here.
The analysis described in the paper was performed using a slightly customised version of the cisGenome software, which provides additional features that are needed to display some of the results presented in the paper. The changes largely relate to GUI usability and data visualisation.
PSSM Motif display: Improved quality graphics, Additional titles, x-axis text embedded in PSSM file, RNA motif handling
Region display: It is now possible to display the contents of cod files that indicate regions of the genome
Tag display: Improved algorithm for producing plots of running averages
Usability: Considerable improvements in the usability of the GUI
This software runs on Windows (XP/Vista/7) and an installer for the customised version can be downloaded from here. This should be downloaded to a local directory and then run to install the software.
The homepage for the standard version of cisGenome is here.
An analysis requires the sequence and gene locations to be available in cisGenome format. These are available from here. The analysis described in the paper is based on the hg18 release of the human genome.
Bias analysis and modelling software
The Windows exe files and excel spreadsheets used for the analysis of the data are available from here
These should be unzipped into a single directory (e.g. 'C:\ModellingTools').
The instructions for using these applications are available from here.
An analysis requires the sequence and gene locations to be available in cisGenome format. These are available from here. The analysis described in the paper is based on the hg18 release of the human genome. The files should be unzipped into a single directory (e.g. 'C:\ModellingTools\hg18').
ChIP-seq related files
It is suggested that these are placed in a single directory (e.g. 'C:\ModellingTools\chipSeq data')
RNA-seq related files
It is suggested that these are placed in a single directory (e.g. 'C:\ModellingTools\rnaSeq data')
- compressed alignment data for the Arabidopsis 24 hr Rep 2 dataset
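The suggested directory layout can be created in one step. The following is a minimal Python sketch and is not part of the distributed software: the function name make_layout is hypothetical, and the sub-directory names are simply the examples given above.

```python
from pathlib import Path

# Sub-directories suggested in the text, relative to the tools directory.
SUBDIRS = ["hg18", "chipSeq data", "rnaSeq data"]


def make_layout(root):
    """Create the tools directory and its suggested sub-directories.

    Existing directories are left untouched. Returns the list of paths.
    """
    root = Path(root)
    created = []
    for name in SUBDIRS:
        path = root / name
        # parents=True also creates the root directory if it is missing.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, make_layout(r"C:\ModellingTools") would create the three example directories under the example tools directory; any other root can be passed instead.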
The source code for the command line executables is available from here. This should be unzipped into a suitable directory.
Multiplatform support is achieved by building with the boost build system, which allows a common build definition file to be used for all platforms, including Microsoft Windows. Instructions for installing and using boost build can be found here. Once it is installed, the code can be built using 'bjam -q release' from within the biasTools directory.
The build definition has been tested on Windows XP, Linux with gcc version 4.1.2 and Darwin OS X 10.4 with gcc version 4.3.3.
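The build step can also be scripted. The following is a minimal Python sketch rather than part of the distribution: it assumes bjam is on the PATH and that the biasTools directory is wherever the source archive was unzipped. The function names build_command and build are hypothetical; the only command used is the 'bjam -q release' invocation quoted above.

```python
import shutil
import subprocess
from pathlib import Path


def build_command():
    """Return the build command quoted in the text: bjam -q release."""
    return ["bjam", "-q", "release"]


def build(source_dir):
    """Run the build from within the biasTools directory.

    source_dir is the unzipped biasTools directory (its location is an
    assumption; pass whatever path was used when unzipping the source).
    """
    source_dir = Path(source_dir)
    if not source_dir.is_dir():
        raise FileNotFoundError(f"source directory not found: {source_dir}")
    if shutil.which("bjam") is None:
        raise RuntimeError("bjam was not found on PATH; install boost build first")
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(build_command(), cwd=source_dir, check=True)
```

Calling build("biasTools") from the directory that contains the unzipped source is then equivalent to running the command by hand.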
Building on Microsoft Windows
Building on Windows requires the installation of the Microsoft C++ development environment. The code was built using Visual C++ 2005 Express, but it should also build with later versions, such as the current Visual C++ 2012 Express, although this has not been tested.
The software can be built using the Microsoft development environment by opening the biasTools\cisGenomeTools.sln solution file and building the four core executables: analyseBreaks, makeDistribution, optimiseBreaks and optimiseStartSeqs. All four can be built together using the '_buildAll' project.
This does not require the use of the boost build system. If boost build is installed, a project file is also provided that builds the executables with boost build from within the development environment.