For a poster overview of piquant, see
The piquant pipeline consists of three main stages:
- Simulate RNA-seq reads under specified combinations of sequencing parameters.
- Run a number of transcriptome quantification tools (or the same tool with different optional parameter choices) on each set of simulated reads to estimate isoform abundances.
- Generate statistics and graphs to assess and compare the performance of each quantification tool.
All three stages of the pipeline can be run via different commands of the piquant script. They are described in more detail below.
Simulation of RNA-seq reads proceeds in two steps. In the first, run via the prepare_read_dirs command, directories are prepared in which reads will be simulated; each directory corresponds to a particular combination of sequencing parameters:
- depth of sequencing
- length of reads
- single- or paired-end reads
- reads with or without errors
- reads with or without sequence bias
- strand-specific or unstranded reads
- presence or absence of background read “noise”
In the second step, RNA-seq reads are simulated. Each directory created in the first step contains a script which, when run, uses the Flux Simulator RNA-seq experiment simulator [FluxSimulator] to generate an expression profile for transcripts, then simulates reads for those transcripts according to the specified combination of sequencing parameters. This script can be run directly; however, the create_reads command allows reads for several combinations of sequencing parameters to be simulated at once as a batch.
The check_reads command provides an easy way to check that the read simulation processes completed correctly for the specified combinations of sequencing parameters.
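As a rough illustration of how one directory per parameter combination arises, the sketch below enumerates the Cartesian product of the seven sequencing parameters listed above. The parameter values and the directory naming scheme are hypothetical and chosen only for the example; piquant's own names and defaults may differ.

```python
from itertools import product

# Hypothetical values for the seven sequencing parameters; in practice the
# user chooses which values to simulate.
PARAMS = {
    "depth": [10, 30],
    "read_length": [50, 100],
    "paired_end": [False, True],
    "errors": [False, True],
    "bias": [False, True],
    "stranded": [False, True],
    "noise": [False, True],
}


def parameter_combinations(params):
    """Yield one dict per combination of the given parameter values."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


def directory_name(combo):
    """Build an illustrative directory name (not piquant's real scheme)."""
    return "_".join("{}-{}".format(key, value) for key, value in combo.items())


combos = list(parameter_combinations(PARAMS))
print(len(combos))  # 2**7 = 128 combinations, hence 128 directories
print(directory_name(combos[0]))
```

Because the number of directories grows multiplicatively with each extra parameter value, it is usually worth restricting the values to those actually of interest.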
Quantification of transcripts proceeds in three steps. In the first, run via the prepare_quant_dirs command, directories are prepared in which quantification results will be produced. For each combination of sequencing parameters for which reads were simulated, there will be one such directory for each quantification tool (or, alternatively, for each combination of quantification tool parameters being assessed).
In the second step, the prequantify command runs, for each quantification tool, those commands that only need to be executed once, regardless of how many different sets of simulated reads are being used for quantification. For example, such commands might include creating a Bowtie [Bowtie] index for the genome to which reads will be mapped, or deriving FASTA sequences for the transcripts whose abundance is being measured.
Finally, transcript abundances are estimated using the specified transcriptome quantification tools. Each directory created by the prepare_quant_dirs command contains a script which, when run, uses a particular tool to estimate isoform expression for a particular set of simulated reads. As with read creation, this script can be run directly if necessary; however, the quantify command allows a number of such scripts to be run simultaneously as a batch.
The check_quant command provides an easy way to check that the transcript quantification processes completed correctly for the specified combinations of tools and read sequencing parameters.
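The idea of running many per-directory scripts as a batch, and then checking that each completed, can be sketched as follows. This is not piquant's implementation: the script name run_quantification.sh, the one-level directory layout and the use of exit codes as the success check are all assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SCRIPT_NAME = "run_quantification.sh"  # assumed name, for illustration only


def find_scripts(root, script_name=SCRIPT_NAME):
    """Return every per-directory script found one level below root, sorted."""
    return sorted(Path(root).glob("*/" + script_name))


def run_script(script):
    """Run one script from inside its own directory; return its exit code."""
    result = subprocess.run(["sh", script.name], cwd=str(script.parent))
    return result.returncode


def run_batch(root, max_workers=4):
    """Run all scripts under root concurrently.

    Returns a dict mapping each directory name to its script's exit code.
    """
    scripts = find_scripts(root)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(run_script, scripts))
    return {script.parent.name: code for script, code in zip(scripts, codes)}


def failed_runs(exit_codes):
    """List the directories whose script exited with a non-zero code."""
    return sorted(name for name, code in exit_codes.items() if code != 0)
```

Calling failed_runs(run_batch("quant_output")) on a hypothetical output directory would list the runs that need attention, which is roughly the role a completion check plays after a batch.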
Assess quantification accuracy
In the final stage of the pipeline, run via the analyse_runs command, data describing quantification accuracy for the specified combinations of sequencing parameters and quantification tools are assembled, and statistics and graphs are generated by which comparative performance can be assessed. In addition, by default, graphs are produced comparing the time and memory usage of the different quantification tools during the prequantification and quantification steps.
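This overview does not say which accuracy statistics are calculated. As one illustrative possibility, the sketch below computes Spearman's rank correlation between the true (simulated) and estimated transcript abundances, using only the standard library. The function names and the example abundances are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(true_abundances, estimated_abundances):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(true_abundances),
                   average_ranks(estimated_abundances))


# Hypothetical abundances for four transcripts.
true_tpm = [1.0, 2.0, 3.0, 4.0]
estimated_tpm = [10.0, 20.0, 30.0, 40.0]
print(spearman(true_tpm, estimated_tpm))
```

A rank-based statistic only measures whether a tool orders transcripts correctly, so it is normally read alongside measures of absolute error.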
The piquant pipeline is implemented as a set of Python scripts and modules; it has currently been tested against Python versions 2.7.6 and 3.4.0 running under Ubuntu 12.04.4 LTS and 14.04.1 LTS.
In order to simulate reads, the Flux Simulator RNA-seq experiment simulator must be installed, and the flux-simulator executable added to the executable path (e.g. via the Unix PATH variable). piquant has been tested with Flux Simulator version 1.2.2.
By default, piquant has the ability to run six different quantification tools:
- Cufflinks: [Cufflinks]
- RSEM: [RSEM]
- eXpress: [eXpress]
- Sailfish: [Sailfish]
- Salmon: [Salmon]
- Kallisto: [Kallisto]
These tools must be installed, and their relevant executables added to the executable path, if they are to be used within piquant. The pipeline has been tested with the following versions of these quantification tools:
- Cufflinks: version 2.2.1
- RSEM: version 1.2.19
- eXpress: version 1.5.1
- Sailfish: version 0.8.0
- Salmon: version 0.5.1
- Kallisto: version 0.42.4
In addition, the use of each quantification tool within the piquant pipeline has additional dependencies, which are enumerated below:
- Cufflinks: Bowtie [Bowtie] and TopHat [TopHat] are required to map simulated reads to the genome.
- RSEM: Bowtie is required by RSEM to map simulated reads to the transcriptome.
- eXpress: Bowtie is required to map simulated reads to the transcriptome. In this case, piquant creates transcriptome sequences for mapping using a tool from the RSEM package (rsem-prepare-reference).
- Sailfish, Salmon and Kallisto: piquant again uses rsem-prepare-reference from the RSEM package to create reference transcriptome sequences.
piquant has been tested with Bowtie version 1.0.0 and TopHat version 2.0.10.
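Before running the pipeline it can be useful to confirm that the required executables are on the executable path. The sketch below does this with shutil.which. Only flux-simulator and rsem-prepare-reference are named in the text above; the remaining executable names are assumptions about a typical installation and may need adjusting.

```python
import shutil

# flux-simulator and rsem-prepare-reference are named in the text; the other
# executable names are assumptions about a typical installation.
REQUIRED = {
    "Flux Simulator": ["flux-simulator"],
    "Cufflinks": ["cufflinks", "tophat", "bowtie"],
    "RSEM": ["rsem-prepare-reference", "bowtie"],
    "eXpress": ["express", "rsem-prepare-reference", "bowtie"],
    "Sailfish": ["sailfish", "rsem-prepare-reference"],
    "Salmon": ["salmon", "rsem-prepare-reference"],
    "Kallisto": ["kallisto", "rsem-prepare-reference"],
}


def missing_executables(required, which=shutil.which):
    """Map each tool to the executables that `which` cannot find on the PATH.

    Tools whose executables are all found are omitted from the result.
    """
    missing = {}
    for tool, executables in required.items():
        absent = [exe for exe in executables if which(exe) is None]
        if absent:
            missing[tool] = absent
    return missing


if __name__ == "__main__":
    for tool, absent in sorted(missing_executables(REQUIRED).items()):
        print("{}: missing {}".format(tool, ", ".join(absent)))
```

The check only confirms that an executable can be found; it does not verify that the installed version matches one of the tested versions listed above.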
TopHat does not currently execute under Python 3. Hence, if piquant is being run in a virtual environment in which the command python invokes Python 3, the main TopHat script must be altered so as to invoke Python 2. This can be done by altering the first line of the TopHat script to read
Finally, the recording of time and memory usage by quantification tools requires that the GNU time command is available at /usr/bin/time. Resource usage recording can be turned off by specifying the --nousage option to the analyse_runs piquant commands.
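To show what recording resource usage with GNU time might look like, the sketch below checks that /usr/bin/time exists and builds a command line that wraps another command with it. How piquant itself invokes GNU time is not described here, so the format string, the output file name and the record_usage switch are assumptions made for illustration.

```python
import os

GNU_TIME = "/usr/bin/time"  # location stated in the text above


def time_available(path=GNU_TIME):
    """Return True if an executable file exists at the given path."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def with_usage_recording(command, usage_file, record_usage=True):
    """Return the command, optionally prefixed with a GNU time invocation.

    With record_usage=False the command is returned unchanged, mirroring
    the effect of switching usage recording off.
    """
    if not record_usage:
        return list(command)
    # GNU time format specifiers: %e elapsed real seconds, %U user CPU
    # seconds, %S system CPU seconds, %M maximum resident set size in KB.
    return [GNU_TIME, "-f", "%e %U %S %M", "-o", usage_file] + list(command)


if __name__ == "__main__":
    # "sleep 1" stands in for a real quantification command.
    print(time_available())
    print(with_usage_recording(["sleep", "1"], "usage.txt"))
```

Note that the shell built-in time is not a substitute, since it does not accept the -f and -o options used here.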