[ technique review / RNA-seq. data analysis / Bulk RNA-seq / advanced ] nf-core/rnaseq

I have been troubleshooting the installation of nf-core/rnaseq with Nextflow on a cluster.

 


 

Summary of how nf-core/rnaseq works (taken from the website https://nf-co.re/rnaseq/3.1)

  1. Download FastQ files via SRA, ENA or GEO ids and auto-create input samplesheet (ENA FTP; if required)
  2. Merge re-sequenced FastQ files (cat)
  3. Read QC (FastQC)
  4. UMI extraction (UMI-tools)
  5. Adapter and quality trimming (Trim Galore!)
  6. Removal of ribosomal RNA (SortMeRNA)
  7. Choice of multiple alignment and quantification routes:
    1. STAR -> Salmon
    2. STAR -> RSEM
  8. Sort and index alignments (SAMtools)
  9. UMI-based deduplication (UMI-tools)
  10. Duplicate read marking (picard MarkDuplicates)
  11. Transcript assembly and quantification (StringTie)

 


To install with conda

Type 1.
conda create --name nf-core python=3.7 nf-core nextflow

conda activate nf-core
nf-core download
# when prompted for the pipeline, type: rnaseq
# for the container option, choose: none or singularity

# to execute
conda activate nf-core
(nf-core)$ nextflow

Type 2. (my case)
# fallback, in case the setup above fails to run "nextflow"
curl -s https://get.nextflow.io | bash
chmod +x nextflow

# to execute
conda activate nf-core
(nf-core)$ Path_to_save_nextflow/nextflow

# install all required tools in the env "nf-core"
# the list of tools is at the link below
# https://github.com/nf-core/rnaseq/tree/master/modules/nf-core/software
(nf-core)$ conda install ....

# install these R packages from within R, in the env "nf-core":
# "tximport", "SummarizedExperiment", "readr", "TxDb.Hsapiens.UCSC.hg19.knownGene"
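Before running `conda install`, it can help to see which tools are still missing from the activated env. The helper below is a minimal sketch, not part of nf-core: `missing_tools` is a hypothetical name, and it only checks whether each command is on PATH.

```shell
# missing_tools NAME...: print every command name that is not found on PATH.
# Hypothetical helper for illustration; it does not install anything.
missing_tools() {
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            printf '%s\n' "$mt_tool"
        fi
    done
}
```

For example, `missing_tools fastqc trim_galore STAR salmon samtools stringtie` would list the tools that still need installing. These executable names are assumptions based on the pipeline summary, so adjust them to your installation.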


To prepare the required files (for paired-end data)

# samplesheet.csv
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,reverse
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,unstranded
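A malformed samplesheet is an easy mistake to make, so a quick sanity check before launching may save a failed run. The sketch below assumes the four-column layout shown above and the three strandedness values used there. `check_samplesheet` is a hypothetical helper, not the pipeline's own validator.

```shell
# check_samplesheet FILE: report rows that do not match the expected layout.
# Returns non-zero if any problem is found. Hypothetical helper for illustration.
check_samplesheet() {
    awk -F',' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            if ($0 != "sample,fastq_1,fastq_2,strandedness") {
                print "line 1: unexpected header: " $0; bad = 1
            }
            next
        }
        /^[[:space:]]*$/ { next }               # skip blank lines
        NF != 4 {
            print "line " NR ": expected 4 fields, found " NF; bad = 1; next
        }
        $4 != "forward" && $4 != "reverse" && $4 != "unstranded" {
            print "line " NR ": unknown strandedness: " $4; bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running `check_samplesheet samplesheet.csv` prints one message per problem row and nothing when the file looks consistent. It does not check that the FastQ files exist.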

 

To run (for human data, when Docker or Singularity cannot be used)

## in a shell script

export NXF_OPTS='-Xms1g -Xmx4g' ### exported so that the nextflow process can see it
cd "$(pwd)" ### output will be saved in the current directory
source /your_anaconda_path/anaconda3/bin/activate nf-core

## "your_path" is the directory where you saved
## nextflow, mydata.csv (the samplesheet), the genome index you built, the DNA FASTA file, the GTF file, ...
## The paths do not need to be the same; they depend on where you saved each file.

/your_path/nextflow run nf-core/rnaseq --input /your_path/mydata.csv --gencode --star_index '/your_path/genecode_db_index' --fasta '/your_path/GRCh38.primary_assembly.genome.fa' --gtf '/your_path/gencode.v38.annotation.gtf' --aligner star_salmon
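Since the command takes several paths, a small pre-flight check can catch a mistyped one before the job is queued. This is a minimal sketch: `require_paths` is a hypothetical helper, and it only tests that each path exists.

```shell
# require_paths PATH...: print each path that does not exist to stderr.
# Returns non-zero if at least one path is missing. Hypothetical helper.
require_paths() {
    rp_missing=0
    for rp_path in "$@"; do
        if [ ! -e "$rp_path" ]; then
            printf 'missing: %s\n' "$rp_path" >&2
            rp_missing=1
        fi
    done
    return "$rp_missing"
}

# Possible use in the job script (paths are the placeholders from above):
# require_paths /your_path/mydata.csv /your_path/genecode_db_index \
#     /your_path/GRCh38.primary_assembly.genome.fa \
#     /your_path/gencode.v38.annotation.gtf || exit 1
```

Placing the check before the `nextflow run` line makes the script stop early with a list of the missing inputs.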

 

To resume the previous run (after an error)

/your_path/nextflow log
/your_path/nextflow run nf-core/rnaseq -resume your_session_id --input /your_path/mydata.csv --gencode --star_index '/your_path/genecode_db_index' --fasta '/your_path/GRCh38.primary_assembly.genome.fa' --gtf '/your_path/gencode.v38.annotation.gtf' --aligner star_salmon
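Copying the session ID by hand from `nextflow log` is error-prone, so the lookup can be scripted. The sketch below assumes the session ID appears in the log table as a lowercase UUID and that the most recent run is the last row. `last_session_id` is a hypothetical helper that reads the log text from standard input.

```shell
# last_session_id: read `nextflow log` output on stdin and print the session ID
# of the last listed run (the first UUID-shaped token on the last row that has one).
# Hypothetical helper; the UUID pattern is an assumption about the log format.
last_session_id() {
    lsi_uuid='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
    grep -E "$lsi_uuid" | tail -n 1 | grep -oE "$lsi_uuid" | head -n 1
}
```

It could be used as `/your_path/nextflow log | last_session_id`, with the result passed to `-resume`. Note that `-resume` without an argument resumes the most recent run, so an explicit session ID is only needed to go back to an earlier one.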

 

References

rnaseq » nf-core (nf-co.re)
Nextflow handles job submissions on SLURM or other environments, and supervises running the jobs. Thus the Nextflow process must run until the pipeline is finished.

nf-core/rnaseq (github.com)
RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.

Tools » nf-core (nf-co.re)
The nf-core tools package is written in Python and can be imported and used within other packages.

https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html