Quantification tools¶
By default, the piquant pipeline has the ability to run the following six transcript quantification tools. The pipeline can, however, be easily extended to run additional quantification tools by editing the quantifiers.py
Python module, as described in Adding a new quantifier.
Attention
It is important to clarify that rather than testing the performance of quantification tools alone, piquant is actually testing the performance, as regards the accuracy of transcript quantification, of mapping plus quantification tool pipelines (at least in the case of quantification tools which require mapping of reads prior to quantification). It can easily be understood, for example, how difficulties encountered when mapping reads to the genome might adversely affect quantification performance, through factors beyond a quantification tool’s control.
Cufflinks¶
Note
piquant has been tested with Cufflinks [Cufflinks] version 2.2.1 and TopHat [TopHat] version 2.0.10.
In preparation for quantifying transcripts with Cufflinks, the following prequantification tasks are executed (these steps are a necessary preliminary for mapping simulated reads to the genome with TopHat):
- A Bowtie [Bowtie] index is built for the genome using the
bowtie-build
command. - A FASTA file for the genome, corresponding to the Bowtie index, is constructed using the
bowtie-inspect
command.
When quantifying transcripts with Cufflinks for a set of simulated RNA-seq reads, reads are first mapped to the genome using the splice-aware mapper TopHat, with the following command line options (see the TopHat manual for further details on these options):
--library-type <type>
: The library type is set tofr-secondstrand
if reads from a stranded protocol are being quantified, and tofr-unstranded
for unstranded reads.--no-coverage-search
: Coverage-based search for junctions is disabled.
Cufflinks is then run to estimate transcript abundances with the following command line options (see the Cufflinks manual for further details on these options):
--library-type <type>
: The library type is set tofr-unstranded
orfr-secondstrand
as for TopHat above.-u
: Reads mapping to multiple locations in the genome are more accurately weighted.-b <genome FASTA file>
: Cufflinks‘ bias detection and correction algorithm is run.
After transcript abundance estimation has completed, of the files output by Cufflinks, only isoforms.fpkm_tracking
is retained (unless the --nocleanup
option was specified when the run_quantification.sh
script was created). Relative transcript abundances are extracted from this file in units of FPKM (fragments per kilobase of exon per million reads mapped) and then converted to relative abundances measured in TPM (transcripts per million).
(Note: for an excellent discussion of RNA-seq expression units, see this blog post).
RSEM¶
Note
piquant has been tested with RSEM [RSEM] version 1.2.19 and Bowtie version 1.0.0.
In preparation for quantifying transcripts with RSEM, the rsem-prepare-reference
tool from the RSEM package is used to construct sequences in FASTA format for the input set of transcripts (see here for more details on the rsem-prepare-reference
tool).
Then, when quantifying transcripts with RSEM for a set of simulated RNA-seq reads, the tool rsem-calculate-expression
is executed with the --strand-specific
command line option in the case that reads have been simulated for a stranded protocol. See here for more details on the rsem-calculate-expression
tool.
After transcript abundance estimation has completed, of the files output by RSEM, only <sample_name>.isoforms.results
is retained (unless the --nocleanup
option was specified when the run_quantification.sh
script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).
eXpress¶
Note
piquant has been tested with eXpress [eXpress] version 1.5.1 and Bowtie [Bowtie] version 1.0.0.
In preparation for quantifying transcripts with eXpress, the rsem-prepare-reference
tool from the RSEM package is used to construct transcript sequences, as described above.
When quantifying transcripts with eXpress for a set of simulated RNA-seq reads, reads are first mapped to the transcript sequences using Bowtie, with the following command line options, which have, in general, been chosen to provide similar alignment behaviour as is implemented within the RSEM pipeline (see the Bowtie manual for further details on these options):
-e 99999999
: The maximum permitted total of quality values at all mismatched read positions throughout the entire alignment.-l 25
: A seed length for alignments of 25 base pairs.-I 1
: A minimum insert size of 1 base pair for valid paired-end alignments.-X 1000
: A maximum insert size of 1000 base pairs for valid paired-end alignments.-a
: All valid alignments are reported per read or read pair.-m 200
: All alignments are suppressed for a particular read or read pair if more than 200 alignments exist for it.-S
: Alignments are printed in SAM [SAM] format.--norc
: Only specified if stranded reads are being quantified, this option causes only paired-end read configurations corresponding to fragments from the forward strand to be considered.
The alignments produced by Bowtie are piped to the view
command of the SAMtools package to convert them to BAM format, for subsequent input to eXpress. eXpress is executed with the --f-stranded
(for single-end reads) or --fr-stranded
(for paired-end reads) command line options in the case that reads have been simulated for a stranded protocol. See the eXpress manual for further details on the options available.
After transcript abundance estimation has completed, of the files output by eXpress, only results.xprs
is retained (unless the --nocleanup
option was specified when the run_quantification.sh
script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).
Sailfish¶
Note
piquant has been tested with Sailfish [Sailfish] version 0.8.0.
In preparation for quantifying transcripts with Sailfish, the Sailfish index
command is executed to create a kmer index for the input transcript set (for more information on Sailfish commands, see the Sailfish manual).
Then, when quantifying transcripts with Sailfish for a set of simulated RNA-seq reads, the Sailfish quant
command is executed with the following settings for the library type (-l
) option, depending on whether single- or paired-end, and stranded or unstranded reads are being quantified:
U
for single-end reads of unknown strandednessSF
for single-end stranded readsIU
for paired-end reads of unknown strandednessISF
for paired-end stranded reads.
After transcript abundance estimation has completed, of the files output by Sailfish, only quant.sf
is retained (unless the --nocleanup
option was specified when the run_quantification.sh
script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).
Salmon¶
Note
piquant has been tested with Salmon [Salmon] version 0.5.1.
In preparation for quantifying transcripts with Salmon, the Salmon index
command is executed to create a Salmon index for the input transcript set. The --type
argument is set to quasi
to use Salmon’s quasi-mapping method of lightweight alignment (for more information on Salmon commands, see the Salmon manual).
Then, when quantifying transcripts with Salmon for a set of simulated RNA-seq reads, the Salmon quant
command is executed with the same settings for the library type (-l
) option as shown for Sailfish above.
After transcript abundance estimation has completed, of the files output by Salmon only quant.sf
is retained (unless the --nocleanup
option was specified when the run_quantification.sh
script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).
Kallisto¶
Note
piquant has been tested with Kallisto [Kallisto] version 0.42.4.
In preparation for quantifying transcripts with Kallisto, the Kallisto index
command is executed to create a Kallisto index for the input transcript set (for more information on Kallisto commands, see the Kallisto manual).
Then, when quantifying transcripts with Kallisto for a set of simulated RNA-seq reads, the Kallisto quant
command is executed, with the --single
option specified and a value of 200 for the --fragment-length
option (estimated average fragment length) when single-end reads are being quantified. The --bias
option is specified in all cases, indicating that Kallisto should performed sequence-based bias correction.
After transcriptome abundance estimation has completed, of the files output by Kallisto only abundance.tsv
is retained (unless the --nocleanup
option was specified when the run_quantification.sh
script was created). Relative transcript abundances are extracted from this file in units of TPM (transcripts per million).