Improved modelling of RNA-seq and ChIP-seq bias using multiple alternative nucleotide distributions

Software and Datasets to accompany paper

The following software and sample files have been packaged to allow the analysis described in the paper to be reproduced.

They will also allow other data to be analysed using the techniques described in the paper.

The software installation and usage are described here.

cisGenome software

The analysis described in the paper was performed using a slightly customised version of the cisGenome software, which provides additional features that are needed to display some of the results presented in the paper. The changes largely relate to GUI usability and data visualisation.

Changes include:

PSSM motif display: improved graphics quality, additional titles, x-axis text embedded in the PSSM file, RNA motif handling

Region display: It is now possible to display the contents of cod files that indicate regions of the genome

Tag display: Improved algorithm for producing plots of running averages

Usability: Considerable improvements in the usability of the GUI

This software runs on Windows (XP/Vista/7) and an installer for the customised version can be downloaded from here. This should be downloaded to a local directory and then run to install the software.

The homepage for the standard version of cisGenome is here

Bias analysis and modelling software

The Windows exe files and Excel spreadsheets used for the analysis of the data are available from here.

These should be unzipped into a single directory (e.g. 'C:\ModellingTools').

The instructions for using these applications are available from here. 

Reference Genome

An analysis requires the sequence and gene locations to be available in cisGenome format. These are available from here. The analysis described in the paper is based on the hg18 release of the human genome. These should be unzipped into a single directory (e.g. 'C:\ModellingTools\hg18').

Sample files

ChIP-seq related files

It is suggested that these are placed in a single directory (e.g. 'C:\ModellingTools\chipSeq data')

RNA-seq related files

It is suggested that these are placed in a single directory (e.g. 'C:\ModellingTools\rnaSeq data')
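The directory layout suggested above can be set up in one step. The following is a minimal sketch, not part of the distributed tools: the function name `make_layout` and the `SUBDIRS` list are hypothetical, and the directory names are simply the examples given on this page.

```python
from pathlib import Path

# Subdirectory names taken from the examples on this page
# (reference genome, ChIP-seq sample files, RNA-seq sample files).
SUBDIRS = ["hg18", "chipSeq data", "rnaSeq data"]


def make_layout(root):
    """Create the root directory and the suggested subdirectories.

    Existing directories are left untouched. Returns the list of
    subdirectory paths in the order given by SUBDIRS.
    """
    root = Path(root)
    created = []
    for name in SUBDIRS:
        path = root / name
        # parents=True also creates the root directory if it is missing.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, `make_layout(r"C:\ModellingTools")` would create the three example directories, into which the downloaded archives can then be unzipped.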

Source Code

The source code for the command line executables is available from here. This should be unzipped into a suitable directory.

Multiplatform support is achieved by building with the boost build system, which allows a common build definition file to be used for all platforms, including Microsoft Windows. Instructions for installing and using boost build can be found here. Once it is installed, the code can be built by running 'bjam -q release' from within the biasTools directory.
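Where the build needs to be scripted, the bjam command above can be wrapped as follows. This is only a sketch under stated assumptions: the helper name `build` is hypothetical, it assumes bjam is on the PATH, and the only command it runs is the 'bjam -q release' invocation quoted on this page.

```python
import shutil
import subprocess
from pathlib import Path

# The build command quoted on this page.
BUILD_COMMAND = ["bjam", "-q", "release"]


def build(source_dir):
    """Run the build command inside source_dir and return its exit code."""
    source_dir = Path(source_dir)
    if not source_dir.is_dir():
        raise FileNotFoundError(f"source directory not found: {source_dir}")

    # Locate bjam on the PATH so a missing install gives a clear error.
    bjam = shutil.which(BUILD_COMMAND[0])
    if bjam is None:
        raise RuntimeError("bjam was not found on the PATH; install boost build first")

    # Run from within the source directory, as the instructions require.
    result = subprocess.run([bjam] + BUILD_COMMAND[1:], cwd=source_dir)
    return result.returncode
```

Calling `build("biasTools")` from the directory into which the source was unzipped should then be equivalent to running the command by hand; a return value of 0 indicates that bjam reported success.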

The build definition has been tested on Windows XP, Linux with gcc version 4.1.2 and Darwin OS X 10.4 with gcc version 4.3.3.

Building on Microsoft Windows

Building on Windows requires the installation of the Microsoft C++ development environment. The code was built using Visual C++ 2005 Express, but it should also build with later versions, such as the current Visual C++ 2012 Express, although this has not been tested.

The software can be built in the Microsoft development environment by opening the biasTools\cisGenomeTools.sln solution file. The four core executables (analyseBreaks, makeDistribution, optimiseBreaks and optimiseStartSeqs) can be built individually, or all together using the '_buildAll' project.

Building from the solution file does not require the boost build system. If boost build is installed, a project file is also provided that allows the executables to be built by boost build from within the development environment.