
LiBiNorm


LiBiNorm has been designed to perform gene expression analysis using RNA-seq data, as described in the paper "LiBiNorm: an htseq-count analogue with improved normalisation of Smart-seq2 data and library preparation diagnostics".

It is command-line and output compatible with htseq-count and also includes RNA-seq bias compensation tailored to Smart-seq2 library preparation, as described in the paper "Modeling enzyme processivity reveals that RNA-Seq libraries are biased in characteristic and correctable ways".

The RNA-seq data should be aligned to a reference genome using an aligner that can process intron-spanning reads, such as HISAT2. The alignments should then be converted to bam format, e.g. with samtools, because, unlike htseq-count, LiBiNorm only supports reads in bam format and not sam format. However, also unlike htseq-count, LiBiNorm can process unsorted bam files of any size, which removes the need to sort large bam files first.
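As a rough illustration of the preparation steps described above, the following Python sketch builds the two command lines: a HISAT2 alignment of paired-end reads, followed by a samtools conversion from sam to bam. The function name, file names and index prefix are hypothetical placeholders; only the commonly used `hisat2` options (`-x`, `-1`, `-2`, `-S`) and `samtools view` options (`-b`, `-o`) are assumed. No sorting step is included, since LiBiNorm accepts unsorted bam files.

```python
import shlex
from typing import List


def build_alignment_commands(
    index_prefix: str, reads_1: str, reads_2: str, out_prefix: str
) -> List[List[str]]:
    """Return the HISAT2 and samtools command lines as argument lists.

    All paths are hypothetical placeholders supplied by the caller.
    """
    sam_path = f"{out_prefix}.sam"
    bam_path = f"{out_prefix}.bam"
    # Spliced alignment of paired-end reads; HISAT2 writes sam output to -S.
    align = ["hisat2", "-x", index_prefix, "-1", reads_1, "-2", reads_2, "-S", sam_path]
    # Convert sam to bam (-b) and write it to the file given by -o.
    to_bam = ["samtools", "view", "-b", "-o", bam_path, sam_path]
    return [align, to_bam]


if __name__ == "__main__":
    commands = build_alignment_commands(
        "grch38/genome", "sample_1.fastq.gz", "sample_2.fastq.gz", "sample"
    )
    for command in commands:
        # Print a shell-ready line; pass `command` to subprocess.run to execute it.
        print(shlex.join(command))
```

The commands are returned as argument lists rather than executed, so they can be inspected first and then passed to `subprocess.run(command, check=True)` on a machine where both tools are installed.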

LiBiNorm can work with feature definition files in both gtf and gff3 format. It is sometimes the case that the chromosome identifiers in the gtf/gff file do not match those used in the alignment data. In this situation, LiBiNorm will match the chromosomes based on the chromosome lengths, which are recorded in both the bam files and the gtf/gff files.
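The idea of matching chromosomes by length can be sketched as follows. This is not LiBiNorm's actual implementation, only a minimal illustration that assumes the lengths have already been extracted into two dictionaries; the function name and the example identifiers are hypothetical. Lengths that occur more than once on either side are treated as ambiguous and left unmatched.

```python
from collections import Counter, defaultdict
from typing import Dict


def match_chromosomes_by_length(
    bam_lengths: Dict[str, int], annotation_lengths: Dict[str, int]
) -> Dict[str, str]:
    """Map annotation chromosome names to bam chromosome names of equal length."""
    # Group bam chromosome names by their length.
    bam_by_length = defaultdict(list)
    for name, length in bam_lengths.items():
        bam_by_length[length].append(name)

    # Count how often each length occurs in the annotation.
    annotation_counts = Counter(annotation_lengths.values())

    mapping = {}
    for name, length in annotation_lengths.items():
        candidates = bam_by_length.get(length, [])
        # Accept a match only when the length is unique on both sides.
        if len(candidates) == 1 and annotation_counts[length] == 1:
            mapping[name] = candidates[0]
    return mapping


if __name__ == "__main__":
    # Illustrative values only: "chr"-prefixed names in the bam file,
    # bare names in the annotation.
    bam = {"chr1": 248956422, "chr2": 242193529, "chrM": 16569}
    annotation = {"1": 248956422, "2": 242193529, "MT": 16569}
    print(match_chromosomes_by_length(bam, annotation))
```

Refusing ambiguous lengths is a design choice of this sketch: a wrong pairing would silently assign reads to the wrong features, so leaving a chromosome unmatched is the safer failure mode.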

LiBiNorm count produces a file containing the normalised expression of each gene as TPM values. The RNA length used in the TPM calculation is adjusted according to the bias identified for RNA of that length. LiBiNorm count can also produce a count file that reproduces the output of htseq-count and does not include bias normalisation.
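To make the TPM calculation concrete, here is a minimal sketch of the standard TPM formula applied with a per-gene adjusted length in place of the annotated length. How LiBiNorm derives the adjusted length from its bias model is not shown here; the function name and the `adjusted_lengths` input are hypothetical and are simply taken as given.

```python
from typing import Dict


def tpm_from_counts(
    counts: Dict[str, float], adjusted_lengths: Dict[str, float]
) -> Dict[str, float]:
    """Compute TPM values from read counts and (bias-adjusted) RNA lengths.

    Every gene in `counts` needs a positive entry in `adjusted_lengths`.
    """
    # Reads per unit of adjusted length for each gene.
    rates = {gene: counts[gene] / adjusted_lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads were counted at all, so every gene gets zero.
        return {gene: 0.0 for gene in counts}
    # Scale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical counts and adjusted lengths for two genes.
    counts = {"geneA": 100, "geneB": 200}
    adjusted_lengths = {"geneA": 1000.0, "geneB": 2000.0}
    print(tpm_from_counts(counts, adjusted_lengths))
```

In this example both genes have the same number of reads per unit of length, so they receive equal TPM values even though geneB has twice the raw count. This is why the choice of length matters for the normalised result.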