Bioinformatics Pipeline Roadmap

Transcriptomics analysis of RNA-seq data from embryonic development of Montipora capitata rice corals exposed to PVC leachate

code
coral
Author

Sarah Tanja

Published

October 7, 2024

Modified

September 1, 2025

1 Sequence files background

We’ve extracted total RNA and shipped it to Azenta for sequencing… so what happens next? How did we go from working with microcentrifuge tubes in the lab to large files on the computer?

First, Azenta runs the total RNA through quality assessment and quality control (QAQC) steps to make sure it is sufficient in quantity and quality to sequence. Here is the QAQC report from Azenta:

The total RNA is then subjected to library prep, where the RNA is turned into cDNA.

The cDNA is what is actually sequenced: libraries are prepared with poly-A selection and run on an Illumina sequencer at a depth of 20 million reads.

The raw FASTQ files come back demultiplexed.

2 Coding resources

This roadmap was built off the following resources and references from:

3 Pipeline bird’s-eye view

3.1 1. Receive raw FASTQ files

  • files are already demultiplexed

  • files are gzip-compressed (.fastq.gz)

  • files must be checked to make sure there were no errors in the transfer process (this is done with md5sum)
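The checksum comparison can also be sketched in Python. This is a minimal illustration, assuming the sequencing provider supplies an md5sum-style manifest (one `<digest>  <filename>` pair per line); the function names and the manifest layout are assumptions for illustration, not part of the original workflow.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_manifest(text):
    """Parse md5sum-style lines ('<digest>  <filename>') into {filename: digest}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # md5sum prefixes the filename with '*' for binary-mode entries
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_transfer(manifest_text, data_dir):
    """Return {filename: bool}; False means the file is missing or its digest differs."""
    results = {}
    for name, digest in parse_md5_manifest(manifest_text).items():
        path = Path(data_dir) / name
        results[name] = path.is_file() and md5_of_file(path) == digest
    return results
```

Any file reported as `False` should be transferred again before moving on to quality control.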

3.2 2. QAQC FASTQ files

Some great examples of previous QAQC scripts generated for Montipora capitata RNA-seq data come from E. Chille (QAQC Script), Sam White (Notebook post), and A. Huffmyer (QAQC Script).

  • Quality check raw sequences with FastQC, then synthesize a report with MultiQC

  • Clean up sequences with fastp

    • Trim sequence lengths

    • Filter out bad quality reads

    • Remove adapters & polyA tails

  • Check cleaned sequences with FastQC, synthesize a report with MultiQC

  • Repeat cleaning steps if needed
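The check, clean, and re-check loop above can be sketched as a set of command builders. This is a hedged illustration, not the scripts linked above: it assumes paired-end reads, that `fastqc`, `multiqc`, and `fastp` are installed and on the PATH, and that the thresholds (`min_quality=20`, `min_length=50`) are placeholders to be tuned after looking at the first FastQC report. All file names and function names are hypothetical.

```python
import subprocess


def fastqc_cmd(read_files, out_dir, threads=4):
    """Build a FastQC call; create out_dir before running it."""
    return ["fastqc", "--outdir", str(out_dir), "--threads", str(threads),
            *[str(f) for f in read_files]]


def multiqc_cmd(in_dir, out_dir):
    """Build a MultiQC call that scans in_dir for reports and summarizes them in out_dir."""
    return ["multiqc", str(in_dir), "--outdir", str(out_dir)]


def fastp_cmd(r1_in, r2_in, r1_out, r2_out, report_prefix,
              min_quality=20, min_length=50, threads=4):
    """Build a paired-end fastp call (thresholds are assumed placeholder values)."""
    return [
        "fastp",
        "--in1", str(r1_in), "--in2", str(r2_in),
        "--out1", str(r1_out), "--out2", str(r2_out),
        "--detect_adapter_for_pe",                      # adapter detection for paired-end data
        "--trim_poly_x",                                # trim poly-X (including poly-A) tails
        "--qualified_quality_phred", str(min_quality),  # Phred score a base needs to count as qualified
        "--length_required", str(min_length),           # drop reads shorter than this after trimming
        "--thread", str(threads),
        "--html", f"{report_prefix}.html",
        "--json", f"{report_prefix}.json",
    ]


def run(cmd):
    """Run one command and raise CalledProcessError if the tool exits non-zero."""
    subprocess.run(cmd, check=True)
```

Building the commands as lists keeps them easy to log and inspect before anything is executed; calling `run(fastp_cmd(...))` for each sample and then `run(multiqc_cmd(...))` once per round mirrors the loop in the list.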

3.3 3. Align to reference genome & assemble

A great example of a previous HISAT2 RNA-seq alignment is in Sam White’s notebook here, and Steven Roberts’ examples are here.
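As a rough sketch of what the alignment step involves, the builders below assemble HISAT2 and samtools commands for one sample. This assumes paired-end reads and that `hisat2`, `hisat2-build`, and `samtools` are on the PATH. The `--dta` flag is included on the assumption that alignments will feed a downstream transcript assembler; drop it if they will not. Sample names, paths, and function names are hypothetical.

```python
from pathlib import Path


def hisat2_build_cmd(genome_fasta, index_base, threads=4):
    """Build the one-time genome indexing command."""
    return ["hisat2-build", "-p", str(threads), str(genome_fasta), str(index_base)]


def hisat2_align_cmd(index_base, r1, r2, sam_out, threads=4):
    """Build a paired-end alignment command that writes SAM output."""
    return ["hisat2", "-p", str(threads), "--dta",
            "-x", str(index_base),
            "-1", str(r1), "-2", str(r2),
            "-S", str(sam_out)]


def samtools_sort_cmd(sam_in, bam_out, threads=4):
    """Build a command that coordinate-sorts the SAM file into a BAM file."""
    return ["samtools", "sort", "-@", str(threads), "-o", str(bam_out), str(sam_in)]


def alignment_plan(sample, index_base, r1, r2, out_dir, threads=4):
    """Return the ordered commands (align, then sort) for one sample."""
    out_dir = Path(out_dir)
    sam = out_dir / f"{sample}.sam"
    bam = out_dir / f"{sample}.sorted.bam"
    return [
        hisat2_align_cmd(index_base, r1, r2, sam, threads),
        samtools_sort_cmd(sam, bam, threads),
    ]
```

Each command in the returned plan can be passed to `subprocess.run(cmd, check=True)` in order, once the index has been built.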

3.4 4. Create gene expression count matrix
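A gene expression count matrix has one row per gene and one column per sample. The sketch below shows only the final assembly step, assuming per-sample counts have already been produced by a quantification tool (the source does not name one) and loaded as dictionaries. Gene and sample identifiers here are hypothetical.

```python
import csv


def build_count_matrix(per_sample_counts):
    """Combine {sample: {gene_id: count}} into (genes, samples, rows).

    Rows follow sorted gene order and columns follow sorted sample order;
    a gene absent from a sample is filled with 0.
    """
    samples = sorted(per_sample_counts)
    genes = sorted({g for counts in per_sample_counts.values() for g in counts})
    rows = [[per_sample_counts[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


def write_matrix_csv(genes, samples, rows, handle):
    """Write the matrix as CSV with a 'gene_id' column followed by one column per sample."""
    writer = csv.writer(handle)
    writer.writerow(["gene_id", *samples])
    for gene, row in zip(genes, rows):
        writer.writerow([gene, *row])
```

Keeping raw integer counts in this matrix matters, because DESeq2 expects unnormalized counts as input.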

3.5 5. Data exploration using R & DESeq2

  • DESeq2 vignette found here
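Before samples can be compared, counts need to be adjusted for differences in sequencing depth. DESeq2 does this with median-of-ratios size factors, and the sketch below illustrates that idea in plain Python for intuition only. It is not a substitute for running DESeq2 in R, and the function name is an assumption.

```python
import math
from statistics import median


def size_factors(rows):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts.

    For each gene with no zero counts, compute the log ratio of each sample's
    count to the gene's geometric mean; a sample's size factor is the
    exponentiated median of its log ratios.
    """
    n_samples = len(rows[0])
    log_ratios = [[] for _ in range(n_samples)]
    for counts in rows:
        if any(c <= 0 for c in counts):
            continue  # the geometric mean is zero, so the ratios are undefined
        log_geo_mean = sum(math.log(c) for c in counts) / n_samples
        for j, c in enumerate(counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(rows, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / f for c, f in zip(counts, factors)] for counts in rows]
```

A sample sequenced twice as deeply as another ends up with a size factor twice as large, so dividing by the factors puts the samples on a comparable scale for plots such as PCA or heatmaps.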

3.6 6. Identify differentially expressed genes (DEGs)
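Calling a gene differentially expressed usually means applying a cutoff to a multiple-testing-adjusted p-value, often together with a minimum fold change. The sketch below illustrates the Benjamini-Hochberg adjustment (the default adjustment method in DESeq2) and a simple filter. The cutoffs `alpha=0.05` and `min_abs_lfc=1.0` are assumed placeholder values, and the sketch omits DESeq2's independent filtering step.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, alpha=0.05, min_abs_lfc=1.0):
    """Filter (gene_id, log2_fold_change, pvalue) tuples down to DEGs.

    A gene is kept when its adjusted p-value is below alpha and its absolute
    log2 fold change is at least min_abs_lfc (both thresholds are assumptions).
    """
    padj = benjamini_hochberg([p for _, _, p in results])
    return [
        (gene, lfc, q)
        for (gene, lfc, _), q in zip(results, padj)
        if q < alpha and abs(lfc) >= min_abs_lfc
    ]
```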