Still using GATK for variant calling? DeepVariant is both faster and more accurate

DeepVariant (A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36, 983–987 (2018)) is Google's open-source, machine-learning-based variant caller. A paper published in Scientific Reports early this year (https://www.nature.com/articles/s41598-022-05833-4) compares GATK and DeepVariant in detail; interested readers can go through it themselves.

The paper's conclusion: Compared to GATK, DeepVariant had a shorter execution time and higher accuracy for clinical samples.

The DeepVariant production model uses six core channels (read base, base quality, mapping quality, strand of alignment, read supports variant, base differs from ref) as its basic model input (https://google.github.io/deepvariant/posts/2022-06-09-adding-custom-channels/). Version 1.4 added an insert_size channel, which improved accuracy further. From the release notes: "For Illumina WGS and WES, we add an additional feature of read insert size (insert_size). This reduces errors by 4-10% for Illumina WGS and WES model." (https://github.com/google/deepvariant/releases)

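If you are interested, you can also run your own benchmark on the NA12878 gold-standard data with tools such as RTG Tools or hap.py. My own impression is that DeepVariant produces noticeably fewer false positives than GATK, and fewer false negatives as well. Two examples follow.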
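When comparing two callers against a truth set, the false-positive and false-negative counts are usually summarized as precision, recall and F1 (hap.py reports these metrics in its summary output). The following is only a minimal sketch of that arithmetic; `prf` is a hypothetical helper name, not part of hap.py or RTG Tools, and the counts passed to it are whatever your own benchmark produced.

```shell
#!/bin/sh
# Hypothetical helper (not part of hap.py or RTG Tools): derive precision,
# recall and F1 from true-positive, false-positive and false-negative counts.
# Usage: prf TP FP FN
prf() {
  awk -v tpos="$1" -v fpos="$2" -v fneg="$3" 'BEGIN {
    # Both denominators must be non-zero for the metrics to be defined.
    if (tpos + fpos == 0 || tpos + fneg == 0) { print "undefined"; exit 1 }
    p = tpos / (tpos + fpos)          # precision: fewer false positives -> higher
    r = tpos / (tpos + fneg)          # recall: fewer false negatives -> higher
    f1 = (p + r > 0) ? 2 * p * r / (p + r) : 0
    printf "precision=%.4f recall=%.4f f1=%.4f\n", p, r, f1
  }'
}
```

Running `prf` once per caller with the counts from the same truth set and confident regions gives a like-for-like comparison of false positives (precision) and false negatives (recall).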

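The first is a variant at the edge of a region of non-unique mappability. GATK HaplotypeCaller failed to call the variant in the proband (a false negative) and called it only in the mother, whereas DeepVariant called it correctly in both.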

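The second lies next to a homopolymer (poly-A) run in the reference genome. GATK reported a low-VAF indel there, but DeepVariant classified the site as RefCall, that is, not a variant.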

DeepVariant is best installed and run through Docker. A demo command line:

docker run --privileged --rm --user "$(id -u):$(id -g)" \
  -v "/sg2/8.xuxiong/WES_Clinical/workstation_V6.2.0_WES_20220916A_T7/b.cram":"/input" \
  -v "/bi/8.xuxiong":"/output" \
  -v "/sg2/8.xuxiong/TargetSeqV6/genome":"/reference" \
  -v "/bi/8.xuxiong/database":"/database" \
  google/deepvariant:"latest" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WES \
  --ref=/reference/ucsc.hg19.fasta \
  --reads=/input/PES22090081-HE.deduped.cram \
  --regions chr1:215913883-215915883 \
  --output_vcf=/output/PES22090081.dv.vcf.gz \
  --output_gvcf=/output/PES22090081.raw.g.vcf.gz \
  --intermediate_results_dir /output/PES22090081_tmp_dir \
  --num_shards=8
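DeepVariant records its decision for each candidate site in the FILTER column of the output VCF, for example PASS for a called variant and RefCall for a candidate judged to match the reference. A quick way to see how a run turned out is to tally that column. This is only a rough sketch: `count_filters` is a hypothetical helper, and it assumes a standard tab-separated, uncompressed VCF on standard input.

```shell
#!/bin/sh
# Hypothetical helper: tally the FILTER column (column 7) of an uncompressed
# VCF read from standard input, skipping header lines that start with '#'.
# Prints one "FILTER<TAB>count" line per distinct value, sorted by name.
count_filters() {
  awk -F '\t' '!/^#/ { n[$7]++ } END { for (k in n) print k "\t" n[k] }' | sort
}

# Example usage (host path corresponding to /output in the demo command;
# gzip -dc can decompress the bgzip-compressed VCF):
# gzip -dc /bi/8.xuxiong/PES22090081.dv.vcf.gz | count_filters
```

A site such as the poly-A example described earlier would be counted under RefCall here rather than PASS.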
