nf-celescope

To do a good job, you must first sharpen your tools.

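nf-celescope can be seen as an upgraded version of CeleScope. It is built on the Nextflow framework, allocates compute resources more efficiently, and uses the faster STARsolo for the reference-based gene quantification step. The new quantification tooling has a low learning cost for environment setup, runs fast, and handles capture data from different omics types.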

GitHub: https://github.com/singleron-RD/nf-celescope

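Currently available pipelines

scrna: pipeline for single-cell (single-nucleus) transcriptome sequencing (GEXSCOPE®) data
wf-single-cell: pipeline for Nanopore single-cell sequencing (GEXSCOPE® Nanopore) data
scsnp: pipeline for sequencing data from libraries built with the single-cell targeted kit (FocuSCOPE®)
scatac: pipeline for single-cell ATAC-seq data
sccite: pipeline for single-cell CITE-Seq data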

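How to set up the analysis environment quickly

Nextflow environment

Nextflow runs on any POSIX-compatible system (Linux, macOS, etc.) and on Windows through WSL. It requires Bash 3.2 (or later) and Java 11 (or later, up to 22). First check whether the Java in the server's current environment meets this requirement. If it does not, use conda to create an environment with a suitable Java. Try not to touch the Java version of the default environment.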
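Before changing anything, it can help to check which Java the shell currently sees. The snippet below is only a minimal sketch and not part of Nextflow or nf-celescope: `java_major` and `java_ok` are hypothetical helper names, and the 11 to 22 range is simply the requirement quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Nextflow or nf-celescope).

# Extract the major version from a Java version string.
# Handles both the legacy "1.8.0_292" scheme and the modern "17.0.2" scheme.
java_major() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    echo "${v%%[!0-9]*}"      # keep the leading digits only
}

# Succeed if the major version lies within the range quoted above (11 to 22).
java_ok() {
    local major
    major="$(java_major "$1")"
    [ -n "$major" ] && [ "$major" -ge 11 ] && [ "$major" -le 22 ]
}

# Usage: read the version that `java -version` prints (it writes to stderr).
if command -v java >/dev/null 2>&1; then
    current="$(java -version 2>&1 | head -n 1 | cut -d '"' -f 2)"
    if java_ok "$current"; then
        echo "Java $current is within the supported range"
    else
        echo "Java $current is outside 11-22; consider a dedicated conda environment"
    fi
else
    echo "java not found on PATH"
fi
```

If the check fails, the conda environment created in the next step gives you a separate Java without touching the default one.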


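## First create a new environment and install nextflow
## (append -c conda-forge -c bioconda if these channels are not configured yet)
mamba create -n nf_celescope nextflow
mamba activate nf_celescope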

pip install nf-core
pip install sccore

Nextflow installed successfully

Download the required pipeline


wget -c https://github.com/singleron-RD/scrna/archive/refs/tags/1.2.1.tar.gz
tar -xf 1.2.1.tar.gz

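How to use it

Once the environment is set up, usage is fairly simple. It basically comes down to:

Prepare the input CSV file
Edit the run command and submit the job

Each pipeline comes with its own usage instructions, see: https://github.com/singleron-RD/nf-celescope . Here we again use single-cell transcriptome quantification as the example.

Basic usage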


nextflow run singleron-RD/scrna \
 --input ./samplesheet.csv \
 --outdir ./results \
 --star_genome path_to_star_genome_index \
 -profile docker

--input ## Input samplesheet.
--outdir ## Output directory where results are saved
--star_genome ## Path to the STAR reference genome index directory
--max_cpus ## Maximum number of CPUs to use. Default: 16
--max_memory ## Maximum memory to use. Default: 128.GB
-profile ## Configuration profile. Options: [docker, singularity, podman, shifter, charliecloud, conda]. If not specified, locally installed software is used

More parameters: https://github.com/singleron-RD/scrna/blob/master/docs/parameters.md

Input file: samplesheet

samplesheet.csv is a comma-separated CSV file with three columns:

A sample name of your choice
Absolute path to the fastq_1 file
Absolute path to the fastq_2 file
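For a handful of samples, the three columns can be written out with a few lines of shell. This is a sketch under assumptions: the header names `sample,fastq_1,fastq_2` follow the column descriptions above but should be checked against the pipeline's usage docs, `samplesheet_row` is a hypothetical helper, and all sample names and paths are made up.

```shell
#!/usr/bin/env bash
# Assumption: header is sample,fastq_1,fastq_2; every name and path below is made up.

# Print one samplesheet row, refusing relative FASTQ paths.
samplesheet_row() {
    local sample="$1" fq1="$2" fq2="$3"
    case "$fq1" in
        /*) ;;
        *) echo "fastq_1 must be an absolute path: $fq1" >&2; return 1 ;;
    esac
    case "$fq2" in
        /*) ;;
        *) echo "fastq_2 must be an absolute path: $fq2" >&2; return 1 ;;
    esac
    printf '%s,%s,%s\n' "$sample" "$fq1" "$fq2"
}

# Print a two-sample sheet to stdout.
echo "sample,fastq_1,fastq_2"
samplesheet_row sampleA /data/fastq/sampleA_R1.fastq.gz /data/fastq/sampleA_R2.fastq.gz
samplesheet_row sampleB /data/fastq/sampleB_R1.fastq.gz /data/fastq/sampleB_R2.fastq.gz
```

Redirect the output into samplesheet.csv once the rows look right. The absolute-path check is there because a relative path is an easy mistake to make when the sheet is typed by hand.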

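Reference genome index files

On first use, you can provide fasta, gtf and genome_name to build the index. The index files are saved under {outdir}/star_genome/{genome_name}/ . On later runs, pass that directory via star_genome to reuse it directly.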


fasta: "https://raw.githubusercontent.com/singleron-RD/test_genome/master/human.GRCh38.99.MT/human.GRCh38.99.MT.fasta"
gtf: "https://raw.githubusercontent.com/singleron-RD/test_genome/master/human.GRCh38.99.MT/human.GRCh38.99.MT.gtf"
genome_name: "human.GRCh38.99.MT"
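The post does not show how these three keys reach the pipeline. One plausible way, sketched here as an assumption, is to save them in a YAML file and pass it with Nextflow's standard `-params-file` option. The file name genome_params.yaml and the helper `write_genome_params` are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: the three keys above are stored in a YAML file (arbitrary name)
# and handed to Nextflow through its standard -params-file option.

# Write fasta, gtf and genome_name into a params file.
write_genome_params() {
    local out="$1" fasta="$2" gtf="$3" name="$4"
    cat > "$out" <<EOF
fasta: "$fasta"
gtf: "$gtf"
genome_name: "$name"
EOF
}

base="https://raw.githubusercontent.com/singleron-RD/test_genome/master/human.GRCh38.99.MT"
write_genome_params genome_params.yaml \
    "$base/human.GRCh38.99.MT.fasta" \
    "$base/human.GRCh38.99.MT.gtf" \
    "human.GRCh38.99.MT"

# Print the command instead of running it, so it can be reviewed first.
echo nextflow run singleron-RD/scrna \
    --input ./samplesheet.csv \
    --outdir ./results \
    -params-file genome_params.yaml \
    -profile docker
```

Keeping the genome settings in a file makes the first, index-building run easy to repeat. Later runs can drop the file and point star_genome at the saved index instead.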

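Worked example

The data again come from CRA008674. See: CeleScope: Singleron's single-cell multi-omics analysis toolkit

Creating the input file

With only a few samples, you can simply create the file by hand, following the input format described above.

With many samples, creating it by hand is error-prone. In that case you can generate it automatically with the Python script the project provides. (A shell command also works for putting it together quickly.)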


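# pip install sccore    # provides the manifest command
# Build the manifest from run_info.csv (column 3 = sample, column 2 = prefix); head -n 4 keeps the header plus the first three lines
# awk -F "," 'BEGIN{print "sample,prefix"}{print $3","$2}' run_info.csv | head -n 4 > manifest.csv

manifest -m manifest.csv -f ~/scRNA/CRA008674/st1_data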
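To see what the awk step produces, it can be run on a tiny made-up run_info.csv. This is only an illustration: the run IDs and sample names are invented, the column layout (column 2 = FASTQ prefix, column 3 = sample name) is inferred from the command above, and `make_manifest` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Made-up input: column 2 = FASTQ file prefix, column 3 = sample name.

# Read CSV on stdin; print a header, then "sample,prefix" from columns 3 and 2.
make_manifest() {
    awk -F "," 'BEGIN { print "sample,prefix" } { print $3 "," $2 }'
}

# Four input lines; head -n 4 keeps the header plus the first three of them.
printf '%s\n' \
    "1,RUN0001,sampleA" \
    "2,RUN0002,sampleB" \
    "3,RUN0003,sampleC" \
    "4,RUN0004,sampleD" |
    make_manifest | head -n 4
```

This prints the `sample,prefix` header followed by `sampleA,RUN0001`, `sampleB,RUN0002` and `sampleC,RUN0003`. The fourth sample is dropped because the header counts as one of the four lines that head keeps.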

Running the pipeline


nextflow run /home/data/t020559/biosoft/scrna-1.2.1 \
 --input /home/data/t020559/scRNA/CRA008674/text/samplesheet.csv \
 --outdir /home/data/t020559/scRNA/CRA008674/text/out_results \
 --star_genome /home/data/t020559/ref/mouse/refdata_celescope \
 --max_cpus 8 --max_memory 100.GB \
 -profile singularity

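We are on a shared server where, for security reasons, Docker is not set up, so here we use Singularity instead.

Run log (excerpt)

Possible errors

If this is your first time running a Nextflow pipeline, getting the environment right may take a bit more effort. For example, you may hit this error: ERROR ~ Plugin with id nf-validation not found in any repository

This error is clearly a network problem: the required plugin cannot be downloaded from GitHub.

A common workaround is to download it from GitHub on a local computer and then upload it to the server.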


wget -c https://github.com/nextflow-io/nf-validation/releases/download/1.1.3/nf-validation-1.1.3.zip

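Unfortunately, this nf-validation-1.1.3.zip archive appears to be faulty and could not be unpacked for direct use.

A more roundabout approach is to copy the plugin over from another machine. For example, set up the Nextflow environment described above on your own computer and run a test command: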


nextflow run ~/biosoft/scrna-1.2.1 -profile test,singularity --outdir ./results

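This downloads the nf-validation-1.1.3 plugin successfully (the local computer needs unrestricted access to GitHub). Then upload it to the server and it is ready to use.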
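The copy step could look roughly like this. It is a sketch under assumptions: Nextflow keeps downloaded plugins in `$NXF_PLUGINS_DIR`, falling back to `$NXF_HOME/plugins` and then `~/.nextflow/plugins`. The helpers `plugins_dir` and `pack_plugin` and the `user@server` address are made up, so check the plugin location on your own machines first.

```shell
#!/usr/bin/env bash
# Assumptions: plugins live in $NXF_PLUGINS_DIR, else $NXF_HOME/plugins, else
# ~/.nextflow/plugins; the helper names and the server address are made up.

# Resolve the plugin directory from the environment.
plugins_dir() {
    echo "${NXF_PLUGINS_DIR:-${NXF_HOME:-$HOME/.nextflow}/plugins}"
}

# Pack one plugin directory (e.g. nf-validation-1.1.3) into a tarball.
pack_plugin() {
    local plugin="$1" out="$2" dir
    dir="$(plugins_dir)"
    [ -d "$dir/$plugin" ] || { echo "plugin not found: $dir/$plugin" >&2; return 1; }
    tar -czf "$out" -C "$dir" "$plugin"
}

# On the local machine, after the test run has downloaded the plugin:
#   pack_plugin nf-validation-1.1.3 nf-validation-1.1.3.tar.gz
#   scp nf-validation-1.1.3.tar.gz user@server:~/
# On the server:
#   mkdir -p "$(plugins_dir)" && tar -xzf ~/nf-validation-1.1.3.tar.gz -C "$(plugins_dir)"
#   export NXF_OFFLINE=true   # optional: stop Nextflow from trying to download
```

Packing the whole plugin directory, rather than re-downloading the release zip, avoids the faulty archive described above.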

References:

https://nf-co.re/docs/usage/installation
https://github.com/singleron-RD/scrna/blob/master/docs/usage.md
