5 一步法找基因变异流程

2019-06-03 08:53:03 浏览数 (4)

1 先查看sam文件

随机选择3个

代码语言:javascript复制
$ samtools mpileup SRR8517854.bam |head -95|tail -3
代码语言:javascript复制
[mpileup] 1 samples in 1 input files
chr1    10105   N       8       AAAAcAAA        kuuu>6uK
chr1    10106   N       10      CCCCcCCCC^$c    uuuukAuuAK
chr1    10107   N       10      CCCCcCCCCc      upukaFfuKF

再看另外一个:

代码语言:javascript复制
$ samtools mpileup SRR8517854.bam |head -168944|tail -3
代码语言:javascript复制
[mpileup] 1 samples in 1 input files
chr1    909515  N       1       c       K
chr1    909516  N       1       c       K
chr1    909517  N       1       g       K

关于samtools mpileup的具体用法和结果解释请看

2 最简单的找变异流程

align文件夹

代码语言:javascript复制
ref=/mnt/f/kelly/bioTree/server/wesproject/hg38/Homo_sapiens_assembly38.fasta
time samtools mpileup -ugf $ref *.bam|bcftools call -vmO z -o wxs_out.vcf.gz 
代码语言:javascript复制
less -S wxs_out.vcf.gz

3载入IGV查看

3.1先构建索引

批量构建

代码语言:javascript复制
ls *.bam|xargs -i samtools index {}

如果是服务器需要下载到本地查看

4 去除PCR重复

需要samtools的4个命令

align文件夹

代码语言:javascript复制
samtools markdup -r SRR7696207.bam SRR7696207.rm.bam
[markdup] error: no ms score tag. Please run samtools fixmate on file first.
[markdup] error: no ms score tag. Please run samtools fixmate on file first.

提示先fixmate

代码语言:javascript复制
$ samtools fixmate SRR7696207.bam SRR7696207.fixmate.bam
[bam_mating_core] ERROR: Coordinate sorted, require grouped/sorted by queryname.

提示先要sort,并且要以queryname进行sort

代码语言:javascript复制
$ samtools sort -n -o SRR7696207.sort.bam SRR7696207.bam
代码语言:javascript复制
$ samtools sort -n -o SRR7696207.namesort.bam SRR7696207.bam
[bam_sort_core] merging from 25 files and 1 in-memory blocks...

接下来就可以逆回去了,也就是是正确的顺序(本篇最下方有说明)

代码语言:javascript复制
$ samtools sort -n -o SRR7696207.namesort.bam SRR7696207.bam
$ samtools fixmate -m SRR7696207.namesort.bam SRR7696207.fixmate.bam
$ samtools sort -o SRR7696207.positionsort.fixmat.bam SRR7696207.fixmate.bam
$ samtools markdup -r SRR7696207.positionsort.fixmat.bam SRR7696207.rm.bam

看下以上的文件结构和大小

代码语言:javascript复制
├── [ 136]  1
├── [ 136]  2
├── [   0]  config
├── [ 42K]  markdup.bam
├── [ 22G]  SRR7696207.bam
├── [5.4G]  SRR7696207.fixmate.bam
├── [5.2G]  SRR7696207.namesort.bam
├── [4.0G]  SRR7696207.positionsort.fixmat.bam
├── [3.8G]  SRR7696207.rm.bam
├── [2.6G]  SRR8517853.bam
├── [3.4G]  SRR8517854.bam
└── [6.9G]  SRR8517856.bam

最后查看下rm.bam

代码语言:javascript复制
samtools view SRR7696207.rm.bam |less -S
代码语言:javascript复制
SRR7696207.27107225     2193    chr1    10024   0       86M64H  chr5    10324   0       CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
SRR7696207.27728977     99      chr1    10025   17      91M18S  =       10025   91      TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
SRR7696207.27728977     147     chr1    10025   17      91M18S  =       10025   -91     TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
SRR7696207.4339191      163     chr1    10027   0       69M     =       10039   123     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
SRR7696207.12487651     163     chr1    10028   0       86M     =       10028   81      CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
SRR7696207.9370306      83      chr1    10028   0       116M    =       10016   -128    CCCTAACCCTAACCCTAAACCTAACCCTAACCCTAACC
SRR7696207.12487651     83      chr1    10028   0       69S81M  =       10028   -81     TTTCTAGCAGTGGACTGCATACGTGTTCGCATACTGTG
SRR7696207.18904916     99      chr1    10030   0       81M69S  =       10030   79      CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
SRR7696207.18904916     147     chr1    10030   0       79M     =       10030   -79     CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
SRR7696207.4339191      83      chr1    10039   0       111M7S  =       10027   -123    ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
SRR7696207.20389847     99      chr1    10040   0       74M     =       10040   74      CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
SRR7696207.20389847     147     chr1    10040   0       76S74M  =       10040   -74     CATATTTTTGTTTTTTTTAATGTTACGGCGACCACCGA

看rm后的是否有差异

代码语言:javascript复制
samtools flagstat SRR7696207.bam
samtools flagstat SRR7696207.rm.bam

结果如下

代码语言:javascript复制
$ samtools flagstat SRR7696207.bam
55398860   0 in total (QC-passed reads   QC-failed reads)
0   0 secondary
372636   0 supplementary
0   0 duplicates
55374129   0 mapped (99.96% : N/A)
55026224   0 paired in sequencing
27513112   0 read1
27513112   0 read2
54512924   0 properly paired (99.07% : N/A)
54978908   0 with itself and mate mapped
22585   0 singletons (0.04% : N/A)
330146   0 with mate mapped to a different chr
252082   0 with mate mapped to a different chr (mapQ>=5)
代码语言:javascript复制
$ samtools flagstat SRR7696207.rm.bam
52114377   0 in total (QC-passed reads   QC-failed reads)
0   0 secondary
372636   0 supplementary
0   0 duplicates
52089646   0 mapped (99.95% : N/A)
51741741   0 paired in sequencing
25868532   0 read1
25873209   0 read2
51246880   0 properly paired (99.04% : N/A)
51699203   0 with itself and mate mapped
17807   0 singletons (0.03% : N/A)
319884   0 with mate mapped to a different chr
242709   0 with mate mapped to a different chr (mapQ>=5)

可以再试试-S参数

代码语言:javascript复制
samtools markdup -S SRR7696207.sam SRR7696207.mk.bam

补充:samtoolsmarkdup操作的正确顺序

  1. The first sort can be omitted if the file is already name ordered
代码语言:javascript复制
samtools sort -n -o namesort.bam example.bam
  1. Add ms and MC tags for markdup to use later
代码语言:javascript复制
samtools fixmate -m namesort.bam fixmate.bam
  1. Markdup needs position order
代码语言:javascript复制
samtools sort -o positionsort.bam fixmate.bam
  1. Finally mark duplicates
代码语言:javascript复制
samtools markdup positionsort.bam markdup.bam

以上是简单的找变异流程,并不完善。

1 人点赞