Introduction to RNA-Seq


This is an SOP on getting started with RNA-seq. Here you can find information on resources, aligners, DESeq2, and pathway analysis tools.


Resources


This is a list of links to resources that have essential information on RNA-seq and the recommended R packages for RNA-seq analysis.

About Aligners


Contributed by Dr. Lara Ianov, UAB

When RNA-seq alignment and mapping are discussed, there are generally two primary methods: standard splice-aware alignment to the genome (e.g., STAR, HISAT2) and quasi-mapping/pseudoalignment approaches, which use k-mer based counting methods to map fragments to the transcriptome (e.g., Salmon, kallisto). STAR (Dobin et al., 2013) is usually my choice for a splice-aware aligner and Salmon (Patro, Duggal, Love, Irizarry, & Kingsford, 2017) for quasi-mapping. The choice between the two generally depends on the specific long-term goal of a study. It is worth noting that there are significant differences in computational efficiency among these methods, but this goes beyond the scope of this synopsis.
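To make the k-mer idea above more concrete, here is a toy sketch in Python. It is not how Salmon or kallisto are implemented; it only illustrates the general principle of looking up a read's k-mers in a transcriptome index and keeping the transcripts compatible with all of them. The sequences, the transcript names (tx1, tx2), the function names, and the choice of k = 5 are all made up for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every k-mer of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of all k-mers in the read."""
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km, set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            return set()  # some k-mer rules out every transcript
    return compatible or set()  # reads shorter than k match nothing


# Hypothetical isoforms: both start with the same exon, then diverge.
K = 5
TRANSCRIPTS = {
    "tx1": "ATGGCGTACC" + "TTGACCGGAT",
    "tx2": "ATGGCGTACC" + "CAGGTTAACG",
}
INDEX = build_index(TRANSCRIPTS, K)

shared_read = compatible_transcripts("GGCGTAC", INDEX, K)     # inside the shared exon
junction_read = compatible_transcripts("ACCTTGAC", INDEX, K)  # spans the tx1-only junction
print(sorted(shared_read), sorted(junction_read))
```

A read from the shared exon stays compatible with both isoforms, while a read spanning the isoform-specific junction is compatible with only one. Real tools resolve the ambiguous reads statistically; that step is not shown here.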

While some benchmarking studies and community-based benchmarking efforts have pointed to increased performance from quasi-mapping/pseudoalignment approaches, others have indicated that all modern methods perform similarly for mRNAs and highly abundant genes (Smith, 2016; Wu, Yao, Ho, Lambowitz, & Wilke, 2018). Thus, one may conclude that if the goal of the study is to perform standard mRNA-seq, any differences among the methods will result in little, if any, loss of information.

However, beyond standard gene-level mRNA-seq quantification, here is an overview of some of the key points where choosing one approach over another may have a large effect on the interpretation of the results:

  • For long non-coding RNA quantification, quasi-mapping/pseudoalignment approaches have consistently outperformed standard splice-aware alignment methods (Zheng, Brennan, Hernaez, & Gevaert, 2019).
  • On the other hand, it has been reported that standard splice-aware RNA-seq aligners are more sensitive towards small RNAs (including miRNAs) and low-abundance genes (Wu et al., 2018).
  • Gene-level vs transcript-level quantification: it is important to keep in mind that in most cases standard splice-aware aligners and downstream quantification methods are meant to target gene-level analysis, as multi-mapping reads make transcript variant quantification challenging. Transcript-level analysis is a much-debated topic, but quasi-mapping coupled with novel downstream quantification tools (Sarkar, Srivastava, Bravo, Love, & Patro, 2020) has been reported to be more sensitive for transcript-level analysis (Smith, 2016).
    • However, if the goal of the analysis is to investigate splicing events (e.g., classifying what type of splicing event occurred) or fusion events rather than to quantify transcript abundance, most of the currently available tools depend on outputs from standard splice-aware RNA-seq aligners (although this may change in the near future as tools such as Terminus, cited above, finish development or are updated).
  • If variant analysis is to be performed on RNA-seq data, then splice-aware alignment methods must be used. It could also be argued that if RNA-seq data will be closely compared to a whole-genome sequencing dataset, then it may be best to use splice-aware alignment methods as well (especially if genome browsers will be used heavily).
  • For diagnostic goals (e.g., rare disease cases), most available tools depend on standard alignment approaches as well.
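
As a small illustration of the gene-level vs transcript-level point in the list above, the sketch below sums transcript-level estimates into gene-level totals using a transcript-to-gene table. This is a simplified Python illustration only; the function name, the identifiers (tx1, geneA, and so on), and the numbers are hypothetical. In R, packages such as tximport perform a more careful version of this summarization.

```python
from collections import defaultdict


def summarize_to_gene(transcript_counts, tx2gene):
    """Sum transcript-level estimates into one total per gene."""
    gene_counts = defaultdict(float)
    for tx, count in transcript_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue  # assumption: transcripts missing from the annotation are skipped
        gene_counts[gene] += count
    return dict(gene_counts)


# Hypothetical transcript-level estimates for a single sample.
transcript_counts = {"tx1": 120.5, "tx2": 30.0, "tx3": 8.25, "tx_unknown": 4.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = summarize_to_gene(transcript_counts, tx2gene)
print(gene_counts)
```

Note that the gene-level total hides how expression is split between tx1 and tx2, which is exactly the information a transcript-level analysis tries to keep.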

DESeq2


DESeq2 is an R package used for differential analysis on count data from high-throughput sequencing assays. It allows for quantitative analysis focused on strength rather than the mere presence of differential expression.

The required input for DESeq2 is gene or transcript counts.
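
The sketch below illustrates the shape of that input: a table of counts with one row per gene (or transcript) and one column per sample. It is written in Python purely to show the layout and is not part of DESeq2, which is an R package. To our understanding of the DESeq2 documentation, the counts should be un-normalized, non-negative integers, so the sketch checks for that. The function name, sample names, gene names, and the choice to fill missing genes with 0 are assumptions made for illustration.

```python
def build_count_matrix(per_sample_counts):
    """Arrange per-sample counts as a genes x samples matrix of raw integers."""
    sample_ids = sorted(per_sample_counts)
    gene_ids = sorted({g for counts in per_sample_counts.values() for g in counts})
    matrix = []
    for gene in gene_ids:
        row = []
        for sample in sample_ids:
            # Assumption: a gene absent from a sample's table has a count of 0.
            value = per_sample_counts[sample].get(gene, 0)
            if not isinstance(value, int) or value < 0:
                raise ValueError(
                    f"{gene} in {sample}: expected a raw non-negative integer, got {value!r}"
                )
            row.append(value)
        matrix.append(row)
    return gene_ids, sample_ids, matrix


# Hypothetical raw counts for two samples.
per_sample = {
    "ctrl_1": {"geneA": 10, "geneB": 0},
    "treat_1": {"geneA": 25, "geneC": 3},
}
gene_ids, sample_ids, matrix = build_count_matrix(per_sample)
for gene, row in zip(gene_ids, matrix):
    print(gene, row)
```

Passing normalized values such as 12.7 raises an error in this sketch, which mirrors the expectation that normalization is left to the differential expression tool itself.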

Here is the paper that will give an overall introduction to DESeq2. It explains how DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates.

Instructions on how to install DESeq2 in R (for use from RStudio) can be found here.

After installing DESeq2, you can begin the vignette. The link to the vignette can be found here.

Here is Mike Love’s guide for DESeq2. This is an additional resource that explains how to use the package and demonstrates useful workflows.

Please be aware that there are many other approaches for estimating differential expression that may be appropriate, or more appropriate (e.g., time course analysis), for your project. We recommend discussing your project with a knowledgeable data scientist if you are unsure whether DESeq2 is a good option for your experimental design.

Pathway Analysis Tools for R


Below are links to vignettes/tutorials for R packages that are helpful for pathway analysis. There are many other tools and analytical approaches and we intend to add to this section in the future.

gprofiler2

The gprofiler2 package provides a tool set for functional enrichment analysis and visualization. It is primarily used to visualize gene lists, convert gene/protein/SNP identifiers to numerous namespaces, and map orthologous genes across species. Here is the paper that will give an overall introduction to pathway analysis using the gprofiler2 package.

Here is the link to the vignette, which will give you clear instructions on how to use the package.
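
For readers new to functional enrichment, the sketch below shows the kind of over-representation test that underlies many enrichment tools: given a background of genes, a pathway gene set, and a query gene list, it asks how likely an overlap at least as large as the observed one would be by chance (a one-sided hypergeometric test). This is a generic Python illustration, not gprofiler2's code or API, and it omits the multiple-testing correction that real tools apply across many gene sets. The function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_query, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap) by chance."""
    max_overlap = min(n_in_set, n_query)
    total = comb(n_background, n_query)  # all ways to draw the query list
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_query - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20 background genes, 5 of them in the pathway,
# a query list of 4 genes, 2 of which fall in the pathway.
p = enrichment_p_value(n_background=20, n_in_set=5, n_query=4, n_overlap=2)
print(round(p, 4))
```

A small p-value suggests the pathway is over-represented in the query list; in this toy example the overlap is unremarkable.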

GAGE

Generally Applicable Gene-set Enrichment (GAGE) is a method for gene set or pathway analysis. The gage package can be used on microarray or RNA-seq data for routine and advanced gene set analyses. Here is the link to the Bioconductor webpage, which provides background information on the package and instructions on how to install it in R.

Here is the link to the vignette.

Version 1.1: Added RNA-Seq alignment information from Dr. Ianov