Introduction

piquant is a pipeline to help assess the accuracy of the quantification of transcripts from RNA-sequencing data.

RNA-sequencing has become an important technique in cellular biology for characterising and quantifying the transcriptome, and many bioinformatics methods have been developed to reconstruct transcripts from RNA-seq data and then estimate their abundances. Gene expression estimates calculated by these methods have been shown to be relatively robust. At the level of individual transcripts, however, problems arising from the ambiguous origin of short RNA-seq reads and from bias in their sequence composition become more severe, so estimates of isoform abundance may be less accurate. It is therefore useful to be able to assess the conditions under which different transcriptome quantification tools perform well or poorly, and how the many optional parameter choices available for each tool may affect their performance.

piquant is a pipeline of Python scripts to help assess the accuracy of transcriptome quantification tools. In its first stage, RNA-seq reads are simulated from a starting set of transcripts with known abundances, under specified combinations of sequencing parameters: for example, different read lengths and sequencing depths, single- and paired-end reads, reads with or without sequencing errors, and reads with or without sequence bias. In the second stage, a number of transcriptome quantification tools (or the same tool with different optional parameter choices) estimate isoform abundances for each set of simulated reads. Finally, the isoform expression estimates calculated by each tool for each data set are compared to the known transcript abundances used to generate the reads. A range of automatically generated statistics and graphs then allows the relative accuracy of each tool's estimates to be assessed as sequencing parameters vary, or across groups of transcripts defined by particular transcript classification measures.
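The parameter combinations of the first stage and the comparison of the final stage can be illustrated with a small, self-contained sketch. It assumes that, for one simulated data set, the true and estimated abundances are available as dictionaries keyed by transcript ID, and it uses Spearman rank correlation and signed median percent error as example accuracy statistics. The keys in `PARAMETER_GRID`, the function names and the chosen statistics are all hypothetical illustrations; they are not piquant's actual options, API or output.

```python
import itertools
import statistics
from collections import defaultdict

# Hypothetical sequencing-parameter grid (illustrative names and values only).
PARAMETER_GRID = {
    "read_length": [50, 100],
    "read_depth": [10, 30],
    "paired_end": [False, True],
    "errors": [False, True],
    "bias": [False, True],
}


def parameter_combinations(grid):
    """Yield one dict per combination of sequencing parameters (Cartesian product)."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def _ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks.

    Needs at least two distinct values in each list, otherwise the
    denominator is zero.
    """
    rx, ry = _ranks(xs), _ranks(ys)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def median_percent_error(true, estimated):
    """Median signed percent error over transcripts with non-zero true abundance."""
    errors = [100 * (e - t) / t for t, e in zip(true, estimated) if t > 0]
    return statistics.median(errors)


def accuracy_by_group(true, estimated, groups):
    """Compute the example statistics separately for each transcript class.

    `groups` maps transcript ID -> class label (for instance, the number of
    isoforms of the transcript's gene). Transcripts missing from `estimated`
    are treated as having an estimated abundance of zero.
    """
    by_class = defaultdict(list)
    for tid, label in groups.items():
        by_class[label].append(tid)
    results = {}
    for label, tids in by_class.items():
        t = [true[tid] for tid in tids]
        e = [estimated.get(tid, 0.0) for tid in tids]
        results[label] = {
            "spearman": spearman(t, e),
            "median_percent_error": median_percent_error(t, e),
        }
    return results
```

For example, `accuracy_by_group(true, estimated, groups)` with `groups = {"t1": "single", "t4": "multi", ...}` returns one dictionary of statistics per class, which could then be tabulated or plotted for each parameter combination yielded by `parameter_combinations(PARAMETER_GRID)`. Rank correlation is shown here because it measures how well a tool orders transcripts by abundance independently of scale, while the percent error captures systematic over- or under-estimation.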