Quantification tools

By default, the piquant pipeline can run the transcript quantification tools described below. The pipeline can, however, easily be extended to run additional quantification tools by editing the quantifiers.py Python module, as described in Adding a new quantifier.

Attention

It is important to clarify that piquant does not test the performance of quantification tools in isolation. Rather, it tests the accuracy of transcript quantification achieved by mapping plus quantification tool pipelines (at least for quantification tools that require reads to be mapped before quantification). For example, difficulties encountered when mapping reads to the genome may adversely affect quantification performance through factors beyond a quantification tool's control.

RSEM

Note

piquant has been tested with RSEM [RSEM] version 1.2.19 and Bowtie version 1.0.0.

In preparation for quantifying transcripts with RSEM, the rsem-prepare-reference tool from the RSEM package is used to construct sequences in FASTA format for the input set of transcripts (see the RSEM documentation for more details on the rsem-prepare-reference tool).

Then, when quantifying transcripts with RSEM for a set of simulated RNA-seq reads, the tool rsem-calculate-expression is executed, with the --strand-specific command line option added when reads have been simulated for a stranded protocol. See the RSEM documentation for more details on the rsem-calculate-expression tool.
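The two RSEM steps can be sketched as command builders in Python. This is a minimal sketch rather than piquant's own code: the function names are hypothetical, and it assumes the transcripts are defined by a GTF file passed through rsem-prepare-reference's --gtf option, together with genome FASTA files given as a comma-separated list. Check the argument order against the RSEM manual for the version you use.

```python
from typing import List


def rsem_prepare_reference_cmd(
    gtf_file: str, genome_fastas: List[str], reference_name: str
) -> List[str]:
    """Hypothetical helper: build an rsem-prepare-reference command."""
    # Assumption: transcripts come from a GTF; genome FASTA files are comma-separated.
    return ["rsem-prepare-reference", "--gtf", gtf_file,
            ",".join(genome_fastas), reference_name]


def rsem_calculate_expression_cmd(
    reads: List[str], reference_name: str, sample_name: str, stranded: bool
) -> List[str]:
    """Hypothetical helper: build an rsem-calculate-expression command."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one (single-end) or two (paired-end) read files")
    cmd = ["rsem-calculate-expression"]
    if stranded:
        # Only added when reads were simulated for a stranded protocol.
        cmd.append("--strand-specific")
    if len(reads) == 2:
        cmd.append("--paired-end")
    # Positional arguments: read file(s), reference name, sample name.
    return cmd + reads + [reference_name, sample_name]
```

The returned lists can be passed directly to subprocess.run, which avoids shell quoting issues.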

After transcript abundance estimation has completed, of the files output by RSEM, only <sample_name>.isoforms.results is retained (unless the --nocleanup option was specified when the run_quantification.sh script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).
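Extracting TPM values from a tab-separated results file can be sketched with Python's csv module. The helpers below are hypothetical, and the default column names (transcript_id and TPM) assume the header of RSEM's isoforms.results file. They are parameters so that the same code could, for instance, read Kallisto's abundance.tsv (columns target_id and tpm).

```python
import csv
from typing import Dict, Iterable


def parse_tpms(
    lines: Iterable[str], id_column: str = "transcript_id", tpm_column: str = "TPM"
) -> Dict[str, float]:
    """Map transcript IDs to TPM values from tab-separated lines with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[id_column]: float(row[tpm_column]) for row in reader}


def read_tpms(
    path: str, id_column: str = "transcript_id", tpm_column: str = "TPM"
) -> Dict[str, float]:
    """Hypothetical convenience wrapper that reads the results file from disk."""
    with open(path, newline="") as handle:
        return parse_tpms(handle, id_column, tpm_column)
```

Since TPM values are normalised to sum to one million over all transcripts, summing the returned values (allowing for rounding) is a quick sanity check on a parsed file.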

eXpress

Note

piquant has been tested with eXpress [eXpress] version 1.5.1 and Bowtie [Bowtie] version 1.0.0.

In preparation for quantifying transcripts with eXpress, the rsem-prepare-reference tool from the RSEM package is used to construct transcript sequences, as described above.

When quantifying transcripts with eXpress for a set of simulated RNA-seq reads, reads are first mapped to the transcript sequences using Bowtie, with the following command line options, which have in general been chosen to reproduce the alignment behaviour of the RSEM pipeline (see the Bowtie manual for further details on these options):

  • -e 99999999: The maximum permitted total of quality values at all mismatched read positions throughout the entire alignment.
  • -l 25: A seed length for alignments of 25 base pairs.
  • -I 1: A minimum insert size of 1 base pair for valid paired-end alignments.
  • -X 1000: A maximum insert size of 1000 base pairs for valid paired-end alignments.
  • -a: All valid alignments are reported per read or read pair.
  • -m 200: All alignments are suppressed for a particular read or read pair if more than 200 alignments exist for it.
  • -S: Alignments are printed in SAM [SAM] format.
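  • --norc: Only specified if stranded reads are being quantified. For single-end reads, this option prevents alignment against the reverse-complement strand of the reference; for paired-end reads, it causes only read-pair configurations corresponding to fragments from the forward strand to be considered.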
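The option list above can be assembled as follows. This is a sketch with a hypothetical function name; it assumes FASTQ input (Bowtie needs its -f option for FASTA reads) and uses Bowtie 1's positional syntax of an index prefix followed by either a single read file or -1/-2 mate files.

```python
from typing import List


def bowtie_command(index_prefix: str, reads: List[str], stranded: bool) -> List[str]:
    """Hypothetical helper: build a Bowtie 1 command with the options listed above."""
    cmd = [
        "bowtie",
        "-e", "99999999",  # max total quality at mismatched positions
        "-l", "25",        # seed length
        "-I", "1",         # minimum insert size (paired-end)
        "-X", "1000",      # maximum insert size (paired-end)
        "-a",              # report all valid alignments
        "-m", "200",       # suppress reads with more than 200 alignments
        "-S",              # SAM output
    ]
    if stranded:
        cmd.append("--norc")  # forward-strand fragments only
    cmd.append(index_prefix)
    if len(reads) == 2:
        cmd += ["-1", reads[0], "-2", reads[1]]
    elif len(reads) == 1:
        cmd.append(reads[0])
    else:
        raise ValueError("expected one or two read files")
    return cmd
```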

The alignments produced by Bowtie are piped to the view command of the SAMtools package, which converts them to BAM format for subsequent input to eXpress. When reads have been simulated for a stranded protocol, eXpress is executed with the --f-stranded (single-end reads) or --fr-stranded (paired-end reads) command line option. See the eXpress manual for further details on the options available.
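The conversion and quantification steps might be composed into a single shell command as sketched below. The helper names, the intermediate BAM file and the use of samtools view -bS (the -S flag marks SAM input for older SAMtools releases) are assumptions for illustration; eXpress is given the transcript FASTA followed by the alignment file.

```python
import shlex
from typing import List, Optional


def express_strand_option(paired_end: bool, stranded: bool) -> Optional[str]:
    """Return the eXpress strandedness flag, or None for unstranded reads."""
    if not stranded:
        return None
    return "--fr-stranded" if paired_end else "--f-stranded"


def express_pipeline(
    bowtie_cmd: List[str], transcripts_fasta: str, bam_path: str,
    paired_end: bool, stranded: bool,
) -> str:
    """Hypothetical helper: Bowtie SAM -> samtools view (BAM file) -> eXpress."""
    sam_to_bam = ["samtools", "view", "-bS", "-"]  # read SAM from stdin, write BAM
    express = ["express"]
    flag = express_strand_option(paired_end, stranded)
    if flag is not None:
        express.append(flag)
    express += [transcripts_fasta, bam_path]

    def quote(args: List[str]) -> str:
        return " ".join(shlex.quote(a) for a in args)

    return (f"{quote(bowtie_cmd)} | {quote(sam_to_bam)} > {shlex.quote(bam_path)}"
            f" && {quote(express)}")
```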

After transcript abundance estimation has completed, of the files output by eXpress, only results.xprs is retained (unless the --nocleanup option was specified when the run_quantification.sh script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).

Sailfish

Note

piquant has been tested with Sailfish [Sailfish] version 0.8.0.

In preparation for quantifying transcripts with Sailfish, the Sailfish index command is executed to create a k-mer index for the input transcript set (for more information on Sailfish commands, see the Sailfish manual).

Then, when quantifying transcripts with Sailfish for a set of simulated RNA-seq reads, the Sailfish quant command is executed with the following settings for the library type (-l) option, depending on whether single- or paired-end, and stranded or unstranded reads are being quantified:

  • U for single-end reads of unknown strandedness
  • SF for single-end stranded reads
  • IU for paired-end reads of unknown strandedness
  • ISF for paired-end stranded reads.
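The choice of -l value above is a simple decision on two properties of the simulated reads. The sketch below also builds the index and quant commands; the function names are hypothetical, and the -t/-o (index) and -i/-r/-1/-2/-o (quant) options are assumptions to check against sailfish --help for version 0.8.0.

```python
from typing import List


def library_type(paired_end: bool, stranded: bool) -> str:
    """The -l value for each of the four read configurations listed above."""
    if paired_end:
        return "ISF" if stranded else "IU"
    return "SF" if stranded else "U"


def sailfish_index_command(transcripts_fasta: str, index_dir: str) -> List[str]:
    """Hypothetical helper: build a sailfish index command."""
    return ["sailfish", "index", "-t", transcripts_fasta, "-o", index_dir]


def sailfish_quant_command(
    index_dir: str, reads: List[str], stranded: bool, out_dir: str
) -> List[str]:
    """Hypothetical helper: build a sailfish quant command."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one or two read files")
    paired_end = len(reads) == 2
    cmd = ["sailfish", "quant", "-i", index_dir, "-l", library_type(paired_end, stranded)]
    if paired_end:
        cmd += ["-1", reads[0], "-2", reads[1]]  # mate files
    else:
        cmd += ["-r", reads[0]]  # unpaired reads
    return cmd + ["-o", out_dir]
```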

After transcript abundance estimation has completed, of the files output by Sailfish, only quant.sf is retained (unless the --nocleanup option was specified when the run_quantification.sh script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).

Salmon

Note

piquant has been tested with Salmon [Salmon] version 0.5.1.

In preparation for quantifying transcripts with Salmon, the Salmon index command is executed to create a Salmon index for the input transcript set. The --type argument is set to quasi to use Salmon’s quasi-mapping method of lightweight alignment (for more information on Salmon commands, see the Salmon manual).

Then, when quantifying transcripts with Salmon for a set of simulated RNA-seq reads, the Salmon quant command is executed with the same settings for the library type (-l) option as shown for Sailfish above.
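A sketch of the Salmon commands follows, with the library types held in a lookup table keyed on (paired-end, stranded). The function names are hypothetical, and the -t/-i options for indexing and -i/-r/-1/-2/-o for quantification are assumptions to verify against the Salmon manual for version 0.5.1.

```python
from typing import Dict, List, Tuple

# (paired_end, stranded) -> value for the -l option, as listed for Sailfish.
LIBRARY_TYPES: Dict[Tuple[bool, bool], str] = {
    (False, False): "U",
    (False, True): "SF",
    (True, False): "IU",
    (True, True): "ISF",
}


def salmon_index_command(transcripts_fasta: str, index_dir: str) -> List[str]:
    """Hypothetical helper: salmon index using quasi-mapping (--type quasi)."""
    return ["salmon", "index", "-t", transcripts_fasta, "-i", index_dir, "--type", "quasi"]


def salmon_quant_command(
    index_dir: str, reads: List[str], stranded: bool, out_dir: str
) -> List[str]:
    """Hypothetical helper: salmon quant for one or two read files."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one or two read files")
    paired_end = len(reads) == 2
    cmd = ["salmon", "quant", "-i", index_dir, "-l", LIBRARY_TYPES[(paired_end, stranded)]]
    if paired_end:
        cmd += ["-1", reads[0], "-2", reads[1]]
    else:
        cmd += ["-r", reads[0]]
    return cmd + ["-o", out_dir]
```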

After transcript abundance estimation has completed, of the files output by Salmon, only quant.sf is retained (unless the --nocleanup option was specified when the run_quantification.sh script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).

Kallisto

Note

piquant has been tested with Kallisto [Kallisto] version 0.42.4.

In preparation for quantifying transcripts with Kallisto, the Kallisto index command is executed to create a Kallisto index for the input transcript set (for more information on Kallisto commands, see the Kallisto manual).

Then, when quantifying transcripts with Kallisto for a set of simulated RNA-seq reads, the Kallisto quant command is executed with the --bias option in all cases, indicating that Kallisto should perform sequence-based bias correction. When single-end reads are being quantified, the --single option is also specified, together with a value of 200 for the --fragment-length option (the estimated average fragment length).
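The Kallisto commands can be sketched similarly. The function names and the index and output paths are hypothetical; the --single, --fragment-length and --bias options are those described above, while the -i and -o options and the trailing read files follow the usual kallisto quant syntax. No strand-specific option is added, since none is described above.

```python
from typing import List


def kallisto_index_command(transcripts_fasta: str, index_path: str) -> List[str]:
    """Hypothetical helper: build a kallisto index command."""
    return ["kallisto", "index", "-i", index_path, transcripts_fasta]


def kallisto_quant_command(
    index_path: str, reads: List[str], out_dir: str, fragment_length: int = 200
) -> List[str]:
    """Hypothetical helper: build a kallisto quant command."""
    cmd = ["kallisto", "quant", "-i", index_path, "-o", out_dir, "--bias"]
    if len(reads) == 1:
        # Single-end reads need an estimated average fragment length.
        cmd += ["--single", "--fragment-length", str(fragment_length)]
    elif len(reads) != 2:
        raise ValueError("expected one or two read files")
    return cmd + reads
```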

After transcript abundance estimation has completed, of the files output by Kallisto, only abundance.tsv is retained (unless the --nocleanup option was specified when the run_quantification.sh script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).