I've been troubleshooting the installation of nf-core/rnaseq with Nextflow on a cluster.
A quick summary of how nf-core/rnaseq works (taken from the website https://nf-co.re/rnaseq/3.1):
- Download FastQ files via SRA, ENA or GEO ids and auto-create input samplesheet (ENA FTP; if required)
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- UMI extraction (UMI-tools)
- Adapter and quality trimming (Trim Galore!)
- Removal of ribosomal RNA (SortMeRNA)
- Choice of multiple alignment and quantification routes (STAR, RSEM, HISAT2 or Salmon)
- Sort and index alignments (SAMtools)
- UMI-based deduplication (UMI-tools)
- Duplicate read marking (picard MarkDuplicates)
- Transcript assembly and quantification (StringTie)
To install with conda
Type 1.
conda create --name nf-core python=3.7 nf-core nextflow
conda activate nf-core
nf-core download
# when prompted for the pipeline, type: rnaseq
# when prompted for the container software, choose: none or singularity
# then run nextflow from the env
conda activate nf-core
(nf-core)$ nextflow
Type 2. (my case)
# use this if "nextflow" fails to run after the setup above
curl -s https://get.nextflow.io | bash
chmod +x nextflow
# then run nextflow from where you saved it
conda activate nf-core
(nf-core)$ Path_to_save_nextflow/nextflow
# install all required tools in the env "nf-core"
# the list of tools is at the link below
# https://github.com/nf-core/rnaseq/tree/master/modules/nf-core/software
(nf-core)$ conda install ....
# install these R packages from within R in the env "nf-core":
"tximport", "SummarizedExperiment", "readr", "TxDb.Hsapiens.UCSC.hg19.knownGene"
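The two install types differ only in where the nextflow launcher ends up: on the PATH of the conda env (Type 1) or in the directory where the curl installer saved it (Type 2). Here is a minimal sketch of a helper that picks whichever is available. The function name find_nextflow and its directory argument are my own invention for illustration, not part of nextflow or nf-core.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a usable nextflow launcher.
# $1 = directory where the curl installer saved "nextflow" (Type 2 fallback).
find_nextflow() {
    local fallback_dir="${1:-.}"
    if command -v nextflow >/dev/null 2>&1; then
        # Type 1: the launcher is already on PATH (e.g. installed by conda).
        command -v nextflow
    elif [ -x "${fallback_dir}/nextflow" ]; then
        # Type 2: use the self-installed launcher.
        printf '%s\n' "${fallback_dir}/nextflow"
    else
        echo "nextflow not found on PATH or in ${fallback_dir}" >&2
        return 1
    fi
}

# Usage (Path_to_save_nextflow is the same placeholder as above):
# NF=$(find_nextflow Path_to_save_nextflow) || exit 1
# "$NF" -version
```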
To prepare the required files (for paired-end data)
# samplesheet.csv
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,reverse
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,unstranded
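Each sample row has to sit on its own line with exactly four comma-separated fields, so it can be worth checking the file before launching anything. This is a rough sketch, assuming the four-column paired-end header shown above and assuming strandedness is limited to the three values used in the example (forward, reverse, unstranded). The function name check_samplesheet is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a paired-end samplesheet.
# $1 = path to the CSV. Returns 0 if every row looks valid, 1 otherwise.
check_samplesheet() {
    awk -F',' '
        NR == 1 {
            if ($0 != "sample,fastq_1,fastq_2,strandedness") {
                print "line 1: unexpected header: " $0 > "/dev/stderr"
                bad = 1
            }
            next
        }
        /^[[:space:]]*$/ { next }   # ignore blank lines
        NF != 4 {
            print "line " NR ": expected 4 columns, got " NF > "/dev/stderr"
            bad = 1
            next
        }
        $4 != "forward" && $4 != "reverse" && $4 != "unstranded" {
            print "line " NR ": unknown strandedness: " $4 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage:
# check_samplesheet /your_path/mydata.csv || exit 1
```

Several rows pasted onto a single line, for example, would be reported as a wrong column count.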
To run (for human data, when Docker or Singularity cannot be used)
## in a shell script
export NXF_OPTS='-Xms1g -Xmx4g'
cd $(pwd) ### output will be saved in current directory
source /your_anaconda_path/anaconda3/bin/activate nf-core
## "your_path" is the directory where you saved
## nextflow, mydata.csv, the genome index you built, the DNA fasta file, the gtf file, ...
## the paths don't need to be the same; it depends on where you saved each file.
/your_path/nextflow run nf-core/rnaseq --input /your_path/mydata.csv --gencode --star_index '/your_path/genecode_db_index' --fasta '/your_path/GRCh38.primary_assembly.genome.fa' --gtf '/your_path/gencode.v38.annotation.gtf' --aligner star_salmon
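Because the run command depends on several files that may live in different places, the job script can check that they all exist before calling nextflow. This is a minimal sketch. require_paths is a hypothetical helper, and the paths in the usage comment are the same placeholders as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early if any required file or directory is missing.
# Arguments: one or more paths. Returns 0 if all exist, 1 otherwise.
require_paths() {
    local missing=0
    local p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the job script (placeholder paths):
# require_paths /your_path/mydata.csv \
#               /your_path/genecode_db_index \
#               /your_path/GRCh38.primary_assembly.genome.fa \
#               /your_path/gencode.v38.annotation.gtf || exit 1
```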
To resume the previous run (after an error)
/your_path/nextflow log
/your_path/nextflow run nf-core/rnaseq -resume your_session_id --input /your_path/mydata.csv --gencode --star_index '/your_path/genecode_db_index' --fasta '/your_path/GRCh38.primary_assembly.genome.fa' --gtf '/your_path/gencode.v38.annotation.gtf' --aligner star_salmon
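Copying the session ID out of the log by hand gets tedious, so the lookup can be scripted. This is a sketch only: it assumes the `nextflow log` output is tab-separated with a header row containing a field named "SESSION ID", which is worth checking against your own nextflow version. The function name last_session_id is hypothetical. It reads the log text from stdin, so it can be tried on saved output without nextflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nextflow log` output on stdin and print the
# session ID of the last listed run.
# Assumes tab-separated columns and a header field named "SESSION ID".
last_session_id() {
    awk -F'\t' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NR == 1 {
            # find the column by its header name instead of a fixed position
            for (i = 1; i <= NF; i++)
                if (trim($i) == "SESSION ID") col = i
            next
        }
        col && NF >= col { id = trim($col) }
        END {
            if (id == "") exit 1
            print id
        }
    '
}

# Usage (not run here):
# session=$(/your_path/nextflow log | last_session_id) || exit 1
# then pass "$session" in place of your_session_id in the command above
```

If you only want to continue the most recent run, -resume without a session ID should do the same thing.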
References
rnaseq » nf-core (nf-co.re)
nf-core/rnaseq: RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control (github.com)
Tools » nf-core (nf-co.re)
https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html