[ technique review / RNA-seq. data analysis / Bulk RNA-seq / advanced ] nf-core/rnaseq

I have been troubleshooting the installation of nf-core/rnaseq with Nextflow on a cluster.

A brief summary of how nf-core/rnaseq works (taken from the website https://nf-co.re/rnaseq/3.1):

Download FastQ files via SRA, ENA or GEO ids and auto-create input samplesheet (ENA FTP; if required)
Merge re-sequenced FastQ files (cat)
Read QC (FastQC)
UMI extraction (UMI-tools)
Adapter and quality trimming (Trim Galore!)
Removal of ribosomal RNA (SortMeRNA)
Choice of multiple alignment and quantification routes:
1. STAR -> Salmon
2. STAR -> RSEM
Sort and index alignments (SAMtools)
UMI-based deduplication (UMI-tools)
Duplicate read marking (picard MarkDuplicates)
Transcript assembly and quantification (StringTie)

To install with conda

Type 1.
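conda create --name nf-core -c bioconda -c conda-forge python=3.7 nf-core nextflow
conda activate nf-core
nf-core download
# at the prompts, enter the pipeline name: rnaseq
# and choose which container images to download: none or singularity
# execute
conda activate nf-core
(nf-core)$ nextflow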

Type 2. (my case)
# only if the conda setup above fails to give a working "nextflow":
# download the standalone nextflow launcher into the current directory
curl -s https://get.nextflow.io | bash
chmod +x nextflow
# execute
conda activate nf-core
(nf-core)$ Path_to_save_nextflow/nextflow
# without docker/singularity, every tool the pipeline uses must be installed in the env "nf-core"
# the list of tools is at the link below
# https://github.com/nf-core/rnaseq/tree/master/modules/nf-core/software
(nf-core)$ conda install ....
# then, in R inside the env "nf-core", install the R packages
# "tximport", "SummarizedExperiment", "readr", "TxDb.Hsapiens.UCSC.hg19.knownGene"
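Because this route uses no Docker or Singularity, every executable has to be on the PATH of the "nf-core" env before launching. A minimal pre-flight check, as a POSIX shell sketch: `check_tools` and `RNASEQ_TOOLS` are hypothetical names, and the tool list is a partial assumption based on the steps listed at the top (the modules link above remains the authoritative list). It does not check the R packages.

```shell
# Hypothetical pre-flight check (POSIX sh): print each command that is not
# found on PATH in the active env, one per line; return 1 if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executables for the steps listed above (partial list, an assumption;
# adjust it to the modules used by your pipeline version).
RNASEQ_TOOLS="fastqc umi_tools trim_galore sortmerna STAR salmon rsem-calculate-expression samtools picard stringtie Rscript"
```

Usage, inside the activated env: `check_tools $RNASEQ_TOOLS && echo "all tools found"`. The variable is left unquoted on purpose so that each tool name is passed as a separate argument.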

To prepare the required files (for paired-end reads)

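# samplesheet.csv
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,forward
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,forward

strandedness is one of forward, reverse or unstranded. Rows that share a sample name are lanes or re-runs of the same library and are merged by the pipeline (the cat step above), so they should carry the same strandedness value.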
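Writing the samplesheet by hand gets tedious with many lanes. A sketch of a hypothetical `make_samplesheet` helper that prints one row per read pair in a directory. It assumes Illumina-style names `<prefix>_R1_001.fastq.gz` / `<prefix>_R2_001.fastq.gz` and takes the sample name to be the prefix before `_S<n>_L<nnn>`, so all lanes of one library share a sample name; rename the sample column afterwards if you want names like CONTROL_REP1.

```shell
# make_samplesheet DIR STRANDEDNESS   (hypothetical helper, POSIX sh)
# Prints an nf-core/rnaseq-style samplesheet for paired-end files in DIR.
make_samplesheet() {
    local dir="$1" strand="$2" r1 r2 base sample
    case "$strand" in
        forward|reverse|unstranded) ;;
        *) echo "invalid strandedness: $strand" >&2; return 1 ;;
    esac
    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "no mate for $r1" >&2
            return 1
        fi
        base=$(basename "$r1" _R1_001.fastq.gz)
        # strip the trailing Illumina "_S<n>_L<nnn>" (sample number and lane)
        sample=$(printf '%s\n' "$base" | sed -E 's/_S[0-9]+_L[0-9]+$//')
        echo "$sample,$r1,$r2,$strand"
    done
}
```

Usage: `make_samplesheet /your_path/fastq forward > /your_path/mydata.csv`. Passing an absolute directory gives absolute FASTQ paths in the sheet, which avoids surprises about where relative paths are resolved.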

To run (for human data, when Docker or Singularity cannot be used)

## in shell script

export NXF_OPTS='-Xms1g -Xmx4g'   ## JVM memory for the Nextflow process; must be exported so nextflow sees it
cd $(pwd) ### output will be saved in current directory
source /your_anaconda_path/anaconda3/bin/activate nf-core

## "your_path" is the directory where you saved nextflow, mydata.csv,
## the STAR genome index you built, the genome (DNA) FASTA file and the GTF file.
## The paths do not need to be the same; use wherever you saved each file.

/your_path/nextflow run nf-core/rnaseq --input /your_path/mydata.csv --gencode --star_index '/your_path/genecode_db_index' --fasta '/your_path/GRCh38.primary_assembly.genome.fa' --gtf '/your_path/gencode.v38.annotation.gtf' --aligner star_salmon
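As the nf-core note in the References says, the Nextflow process has to keep running until the pipeline finishes. A sketch of the run step above as a function that first checks that the input paths exist and then starts the same command in the background with nohup (an alternative to screen). `run_rnaseq`, the `NEXTFLOW` variable and the log name `rnaseq_run.log` are hypothetical choices, not part of nf-core.

```shell
# Hypothetical launcher (POSIX sh) for the run command above.
export NXF_OPTS='-Xms1g -Xmx4g'   # JVM memory for the Nextflow head process

# run_rnaseq INPUT_CSV STAR_INDEX FASTA GTF
run_rnaseq() {
    local input="$1" star_index="$2" fasta="$3" gtf="$4" p
    local nf="${NEXTFLOW:-nextflow}"   # path to the nextflow executable
    for p in "$input" "$star_index" "$fasta" "$gtf"; do
        if [ ! -e "$p" ]; then
            echo "not found: $p" >&2
            return 1
        fi
    done
    # results and .nextflow.log go to the current (launch) directory;
    # nohup keeps the process alive after logout, output goes to the log
    nohup "$nf" run nf-core/rnaseq \
        --input "$input" --gencode \
        --star_index "$star_index" --fasta "$fasta" --gtf "$gtf" \
        --aligner star_salmon > rnaseq_run.log 2>&1 &
    echo "started PID $!"
}
```

Usage: `NEXTFLOW=/your_path/nextflow run_rnaseq /your_path/mydata.csv /your_path/genecode_db_index /your_path/GRCh38.primary_assembly.genome.fa /your_path/gencode.v38.annotation.gtf`, then follow progress with `tail -f rnaseq_run.log`.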

To resume the previous run (after an error)

/your_path/nextflow log   # run in the launch directory; lists previous runs with their session IDs
/your_path/nextflow run nf-core/rnaseq -resume your_session_id --input /your_path/mydata.csv --gencode --star_index '/your_path/genecode_db_index' --fasta '/your_path/GRCh38.primary_assembly.genome.fa' --gtf '/your_path/gencode.v38.annotation.gtf' --aligner star_salmon
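`-resume` only finds cached results when launched from the original launch directory, where Nextflow keeps its cache (`.nextflow/`) and, by default, the task directories (`work/`). A sketch of a hypothetical `resume_rnaseq` wrapper that checks for both before resuming. It assumes the default work directory (no `-w`), and it uses `-resume` without an ID, which picks up the most recent run; to choose an older session from `nextflow log`, pass `-resume your_session_id` directly as in the command above.

```shell
# resume_rnaseq PARAMS...   (hypothetical wrapper, POSIX sh)
# PARAMS are the same pipeline parameters as the original run.
resume_rnaseq() {
    local nf="${NEXTFLOW:-nextflow}"   # path to the nextflow executable
    if [ ! -d .nextflow ] || [ ! -d work ]; then
        echo "run this from the directory where the pipeline was first launched" >&2
        return 1
    fi
    # -resume placed last with no session ID: resume the latest run here
    "$nf" run nf-core/rnaseq "$@" -resume
}
```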

References

rnaseq » nf-core (nf-co.re): Nextflow handles job submissions on SLURM or other environments and supervises the running jobs, so the Nextflow process must keep running until the pipeline is finished. The page recommends running the process in the background, e.g. through screen.

nf-core/rnaseq (github.com): RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.

Tools » nf-core (nf-co.re): documentation of the nf-core tools package, which is written in Python; includes installation instructions (e.g. via Bioconda).

https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html

NoteHaus