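First, install conda on your own server. The installation commands are as follows:
# Download the installer first; at 20 MB/s this takes only a few seconds
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Next, run the downloaded file with bash, answering yes at every prompt
bash Miniconda3-latest-Linux-x86_64.sh
# After a successful install, reload the shell configuration so the updated environment variables take effect
source ~/.bashrc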
After installing conda, configure the mirror channels.
conda config --add channels r
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels https://mirrors.bfsu.edu.cn/anaconda/cloud/bioconda/
conda config --add channels https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.bfsu.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.bfsu.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
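As we have stressed many times, the Tsinghua University mirror we used to recommend may be overloaded, so use your own judgment when picking a mirror.
Create a new rmats environment with conda
Remember: create a new rmats environment first, then install the rmats software inside that environment. The code is as follows:
conda create -n rmats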
conda activate rmats
conda search rmats -c bioconda
conda install -c bioconda rmats=4.1.1
conda clean --all
conda install -c bioconda rmats=4.1.1
Take a close look at everything conda has to do to install this one piece of software, rmats:
The following NEW packages will be INSTALLED:
_libgcc_mutex anaconda/cloud/conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex anaconda/cloud/conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
ca-certificates anaconda/cloud/conda-forge/linux-64::ca-certificates-2020.12.5-ha878542_0
certifi anaconda/cloud/conda-forge/linux-64::certifi-2020.12.5-py38h578d9bd_1
gsl anaconda/cloud/conda-forge/linux-64::gsl-2.6-he838d99_2
ld_impl_linux-64 anaconda/cloud/conda-forge/linux-64::ld_impl_linux-64-2.35.1-hea4e1c9_2
libblas anaconda/cloud/conda-forge/linux-64::libblas-3.9.0-8_openblas
libcblas anaconda/cloud/conda-forge/linux-64::libcblas-3.9.0-8_openblas
libffi anaconda/cloud/conda-forge/linux-64::libffi-3.3-h58526e2_2
libgcc-ng anaconda/cloud/conda-forge/linux-64::libgcc-ng-9.3.0-h2828fa1_18
libgfortran-ng anaconda/cloud/conda-forge/linux-64::libgfortran-ng-7.5.0-h14aa051_18
libgfortran4 anaconda/cloud/conda-forge/linux-64::libgfortran4-7.5.0-h14aa051_18
libgomp anaconda/cloud/conda-forge/linux-64::libgomp-9.3.0-h2828fa1_18
liblapack anaconda/cloud/conda-forge/linux-64::liblapack-3.9.0-8_openblas
libopenblas anaconda/cloud/conda-forge/linux-64::libopenblas-0.3.12-pthreads_hb3c22a3_1
libstdcxx-ng anaconda/cloud/conda-forge/linux-64::libstdcxx-ng-9.3.0-h6de172a_18
ncurses anaconda/cloud/conda-forge/linux-64::ncurses-6.2-h58526e2_4
numpy anaconda/cloud/conda-forge/linux-64::numpy-1.20.1-py38h18fd61f_0
openssl anaconda/cloud/conda-forge/linux-64::openssl-1.1.1j-h7f98852_0
pip anaconda/cloud/conda-forge/noarch::pip-21.0.1-pyhd8ed1ab_0
python anaconda/cloud/conda-forge/linux-64::python-3.8.8-hffdb5ce_0_cpython
python_abi anaconda/cloud/conda-forge/linux-64::python_abi-3.8-1_cp38
readline anaconda/cloud/conda-forge/linux-64::readline-8.0-he28a2e2_2
rmats bioconda/linux-64::rmats-4.1.1-py38h566bde1_0
setuptools anaconda/cloud/conda-forge/linux-64::setuptools-49.6.0-py38h578d9bd_3
sqlite anaconda/cloud/conda-forge/linux-64::sqlite-3.34.0-h74cdb3f_0
star bioconda/linux-64::star-2.7.8a-0
tk anaconda/cloud/conda-forge/linux-64::tk-8.6.10-h21135ba_1
wheel anaconda/cloud/conda-forge/noarch::wheel-0.36.2-pyhd3deb0d_0
xz anaconda/cloud/conda-forge/linux-64::xz-5.2.5-h516909a_1
zlib anaconda/cloud/conda-forge/linux-64::zlib-1.2.11-h516909a_1010
After a successful installation, check the installed software:
$ STAR --version
2.7.8a
$ rmats.py --version
v4.1.1
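Before moving on, it can help to confirm that both executables are actually on the PATH of the active environment. The following is a minimal sketch, assuming a POSIX shell; `require_tools` is a hypothetical helper name used only for illustration and is not part of conda or rMATS.

```shell
# require_tools: return non-zero if any named command is not on PATH.
# (hypothetical helper, for illustration only)
require_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example, to be run inside the activated rmats environment:
# require_tools STAR rmats.py
```

If the function reports a missing tool, the most likely cause is that the rmats environment has not been activated in the current shell.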
Alternative splicing analysis on the BAM files produced by STAR
Example sizes of the BAM files from a successful STAR run:
$ cat *txt|xargs ls -lh |cut -d" " -f 5-
3.7G 3月 10 18:25 SRR8518122.bam
3.9G 3月 10 19:25 SRR8518123.bam
3.6G 3月 10 18:21 SRR8518124.bam
7.9G 3月 12 12:12 SRR8518436.bam
3.2G 3月 12 12:59 SRR8518442.bam
7.2G 3月 12 15:05 SRR8518448.bam
The full paths of the BAM files need to be written into two text files, one per group, as shown below:
jmzeng 21:30:42 ~/tnbc/test_rmats
$ cat g1.txt
SRR8518122.bam,SRR8518123.bam,SRR8518124.bam
$ cat g2.txt
SRR8518436.bam,SRR8518442.bam,SRR8518448.bam
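The two group files can be typed by hand, but they are easy to generate. The sketch below, assuming a POSIX shell, joins file names with commas into a single line, which is the layout shown above. `join_bams` is a hypothetical helper name; the BAM names are the ones from this example.

```shell
# join_bams: print all arguments as one comma-separated line.
# The body runs in a subshell so the IFS change does not leak out.
join_bams() (
  IFS=,
  printf '%s\n' "$*"
)

# Group 1 and group 2, using the sample names from this tutorial
join_bams SRR8518122.bam SRR8518123.bam SRR8518124.bam > g1.txt
join_bams SRR8518436.bam SRR8518442.bam SRR8518448.bam > g2.txt
```

If the BAM files live in another directory, pass their full paths to `join_bams` instead of bare file names.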
When running rmats, pass these two files with --b1 and --b2.
gtf=$HOME/rna/SUPPA2/gtf/gencode.v37.annotation.gtf
rmats.py --b1 g1.txt --b2 g2.txt \
  --gtf "$gtf" \
  -t paired --readLength 147 --nthread 4 \
  --od results --tmp tmp_output
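A wrong path in g1.txt or g2.txt is an easy mistake to make, so a quick pre-flight check before launching the run may save time. This is a minimal sketch, assuming a POSIX shell and the single-line, comma-separated layout shown earlier; `check_bam_list` is a hypothetical helper, not an rMATS feature.

```shell
# check_bam_list: verify that every path in a comma-separated list file exists.
# Returns non-zero if the list file or any listed BAM is missing.
check_bam_list() {
  if [ ! -f "$1" ]; then
    echo "no such list file: $1" >&2
    return 1
  fi
  # Turn commas into newlines, then test each path in a subshell
  tr ',' '\n' < "$1" | (
    rc=0
    while IFS= read -r bam || [ -n "$bam" ]; do
      [ -n "$bam" ] || continue
      if [ ! -f "$bam" ]; then
        echo "missing BAM: $bam" >&2
        rc=1
      fi
    done
    exit "$rc"
  )
}

# Example: check both groups before calling rmats.py
# check_bam_list g1.txt && check_bam_list g2.txt
```

Relative paths in the list files are resolved against the current directory, so the check should be run from the same directory as rmats.py.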
The log of a successful run looks like this:
gtf: 26.418766975402832
There are 60651 distinct gene ID in the gtf file
There are 234485 distinct transcript ID in the gtf file
There are 36780 one-transcript genes in the gtf file
There are 1460986 exons in the gtf file
There are 25134 one-exon transcripts in the gtf file
There are 22496 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.866136
Average number of exons per transcript is 6.230616
Average number of exons per transcript excluding one-exon tx is 6.858587
Average number of gene per geneGroup is 8.495835
statistic: 0.04167461395263672
It usually runs quite fast:
==========
Done processing each gene from dictionary to compile AS events
Found 55759 exon skipping events
Found 4089 exon MX events
Found 18752 alt SS events
There are 11349 alt 3 SS events and 7403 alt 5 SS events.
Found 8037 RI events
==========
ase: 3.8115618228912354
count: 5.383385896682739
Processing count files.
Done processing count files.
The run produces a large number of result files:
382K 3月 13 21:44 A3SS.MATS.JCEC.txt
368K 3月 13 21:44 A3SS.MATS.JC.txt
257K 3月 13 21:44 A5SS.MATS.JCEC.txt
241K 3月 13 21:44 A5SS.MATS.JC.txt
1.1M 3月 13 21:44 fromGTF.A3SS.txt
703K 3月 13 21:44 fromGTF.A5SS.txt
464K 3月 13 21:44 fromGTF.MXE.txt
16K 3月 13 21:44 fromGTF.novelJunction.A3SS.txt
11K 3月 13 21:44 fromGTF.novelJunction.A5SS.txt
30K 3月 13 21:44 fromGTF.novelJunction.MXE.txt
2.1K 3月 13 21:44 fromGTF.novelJunction.RI.txt
356K 3月 13 21:44 fromGTF.novelJunction.SE.txt
102 3月 13 21:44 fromGTF.novelSpliceSite.A3SS.txt
102 3月 13 21:44 fromGTF.novelSpliceSite.A5SS.txt
140 3月 13 21:44 fromGTF.novelSpliceSite.MXE.txt
108 3月 13 21:44 fromGTF.novelSpliceSite.RI.txt
104 3月 13 21:44 fromGTF.novelSpliceSite.SE.txt
758K 3月 13 21:44 fromGTF.RI.txt
5.3M 3月 13 21:44 fromGTF.SE.txt
83K 3月 13 21:44 JCEC.raw.input.A3SS.txt
56K 3月 13 21:44 JCEC.raw.input.A5SS.txt
38K 3月 13 21:44 JCEC.raw.input.MXE.txt
123K 3月 13 21:44 JCEC.raw.input.RI.txt
356K 3月 13 21:44 JCEC.raw.input.SE.txt
80K 3月 13 21:44 JC.raw.input.A3SS.txt
52K 3月 13 21:44 JC.raw.input.A5SS.txt
34K 3月 13 21:44 JC.raw.input.MXE.txt
112K 3月 13 21:44 JC.raw.input.RI.txt
329K 3月 13 21:44 JC.raw.input.SE.txt
199K 3月 13 21:44 MXE.MATS.JCEC.txt
177K 3月 13 21:44 MXE.MATS.JC.txt
579K 3月 13 21:44 RI.MATS.JCEC.txt
523K 3月 13 21:44 RI.MATS.JC.txt
1.6M 3月 13 21:44 SE.MATS.JCEC.txt
1.5M 3月 13 21:44 SE.MATS.JC.txt
377 3月 13 21:44 summary.txt
Interpreting the results in detail is time-consuming; you need to work through the documentation bit by bit.
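As a first look before reading the documentation, one can count how many events in a result table pass a significance cutoff. The sketch below assumes the table is tab-separated with a header row containing a column named `FDR`; it finds that column by name rather than by position. `count_significant` and the default cutoff of 0.05 are illustrative choices, not something prescribed by rMATS.

```shell
# count_significant FILE [CUTOFF]: count rows whose FDR is below CUTOFF.
# Assumes a tab-separated table whose header has a column named "FDR".
count_significant() {
  awk -F'\t' -v cutoff="${2:-0.05}" '
    NR == 1 {
      # Locate the FDR column from the header line
      for (i = 1; i <= NF; i++) if ($i == "FDR") col = i
      next
    }
    col && $col != "NA" && $col + 0 < cutoff { n++ }
    END {
      if (!col) { print "no FDR column found" > "/dev/stderr"; exit 1 }
      print n + 0
    }
  ' "$1"
}

# Example: skipped-exon events from the results directory
# count_significant results/SE.MATS.JC.txt 0.05
```

The same function can be pointed at any of the `*.MATS.JC.txt` or `*.MATS.JCEC.txt` files to compare event types.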
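If you truly find that my tutorial has helped your research project, that it cleared things up for you, or that your project makes heavy use of the skills I have shared, please add a brief acknowledgment when you publish your results, as shown below:
We thank Dr. Jianming Zeng (University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.
Ten years from now, when I travel to universities and research institutes around the world (mainland China included, of course), I will give priority to meeting you if we share this kind of connection.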