bcftools 高级用法

2023-03-06 14:56:47 浏览数 (1)

查看bcftools安装路径

which bcftools

导出插件所需的环境编辑

export BCFTOOLS_PLUGINS=/bi/software/bcftools-1.16/plugins;

查看插件环境变量

echo ${BCFTOOLS_PLUGINS}

反向过滤 ( https://samtools.github.io/bcftools/bcftools.html#expressions )

bcftools view -e ‘CHROM~”M” || CHROM~”_” || QUAL<30 || MAX(FMT/GQ)<20 || MAX(FMT/DP)<8 || AVG(FMT/DP)<5 || (COUNT(FMT/GT="het")>10 && MAX(FMT/AD[*:1]/FMT/DP[*])<0.2) || MAX(FMT/AD[*:1]/FMT/DP[*]) < 0.1’ /bi/8.xuxiong/EQA2022/20221019/All.raw.vcf.gz | less -S `

正向过滤

bcftools view -i ‘N_ALT=1 & AVG(FMT/DP)>8 & MIN(FMT/DP)>5 & MIN(FMT/GQ)>15 & QUAL > 30 & MAX(FORMAT/AD[*:1]/FORMAT/DP[*]) > 0.1 ‘ /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf.gz | less -S

取交集

bcftools isec /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf.gz /bi/8.xuxiong/work/WES/CYP21A2_CYP21A1P_SMN1_SMN2_HBA1_HBA2_diff.vcf.gz -n =2 -w 1 -Oz -o /bi/8.xuxiong/EQA2022/20221019/All.sp.homologous.vcf.gz ; tabix -fp vcf /bi/8.xuxiong/EQA2022/20221019/All.sp.homologous.vcf.gz ;

取差集

bcftools isec /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf.gz /bi/8.xuxiong/work/WES/CYP21A2_CYP21A1P_SMN1_SMN2_HBA1_HBA2_diff.vcf.gz -n ~10 -w 1 -Oz -o /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz; tabix -fp vcf /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz;

bcftools isec -n~10 -c none -w 1 /bi/8.xuxiong/EQA2022/20221019/All.vt.vcf.gz /bi/8.xuxiong/database/Variants.with.local.blacklist.V20211010.vcf.gz -Oz -o All.blacklist.QCflt.vcf.gz; tabix -fp vcf All.blacklist.QCflt.vcf.gz;

正向过滤 选当批次人群频率高频位点

bcftools view -Ov -i ‘(AF>=0.5 && AN>=12) || (AF>=0.2 && AN>=30) || (AF>=0.1 && AN>=100)’ /bi/8.xuxiong/EQA2022/20221019/All.blacklist.QCflt.vcf.gz -Oz -o All.blacklist.QCflt.Batch_common.vcf.gz; tabix -fp vcf All.blacklist.QCflt.Batch_common.vcf.gz;

挑选IMPACT synonymous_variant及以上的位点,需先经VEP注释,IMPACT 顺序为/bi/database/VEPseverity.txt ,运行插件前需导出环境变量( export BCFTOOLS_PLUGINS=/bi/software/bcftools-1.6/plugins; )也可加入~/.bashrc

bcftools split-vep -S /bi/database/VEPseverity.txt -s all:synonymous_variant -x /bi/8.xuxiong/EQA2022/20221019/All.vep.vcf -Oz -o All.MoreSeverity.vcf.gz; tabix -fp vcf All.MoreSeverity.vcf.gz;

挑选IMPACT coding_sequence_variant及以下的位点,需先经VEP注释

bcftools split-vep -S /bi/database/VEPseverity.txt -s all:coding_sequence_variant- -x /bi/8.xuxiong/EQA2022/20221019/All.vep.vcf -Oz -o All.modifier.vcf.gz; tabix -fp vcf All.modifier.vcf.gz;

取交集(需要取交集的文件需要预先排序并建索引)

bcftools isec /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz /bi/8.xuxiong/database/ClinVar.HGMD.Local.SpliceAI.whitelist.V20211010.vcf.gz -n =2 -w 1 -Ov | less -S

取并集,需两个vcf均有索引,且排序(需要合并的文件需要预先排序并建索引)

bcftools concat -a /bi/8.xuxiong/EQA2022/20221019/All.sp.homologous.vcf.gz /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz -Ov|less -S

按家系拆分joint calling的vcf文件

bcftools split /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf -Ov -o . -G pedGroupsVEP.tsv;

按样本拆分出joint calling的vcf文件所有样本vcf文件

bcftools query -l /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf | bcftools split -S /dev/stdin -Oz -o . /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf

根据/bi/database/gene_region.bed.gz(需先 tabix -fp bed /bi/database/gene_region.bed.gz 建立索引)注释vcf文件,添加到INFO/GENE

bcftools annotate -a /bi/database/gene_region.bed.gz -c CHROM,FROM,TO,GENE -H ‘##INFO=<id=gene,number=1,type=string,description="gene less=""></id=gene,number=1,type=string,description="gene>

将注释上基因后的vcf按-f格式转成tsv格式

bcftools annotate -a /bi/database/gene_region.bed.gz -c CHROM,FROM,TO,GENE -H ‘##INFO=<id=gene,number=1,type=string,description="gene bcftools="" query="" -f="" less=""></id=gene,number=1,type=string,description="gene>

计算每个样本每个位点的VAF,并转成tsv格式

bcftools fill-tags All.raw.QCflt.vcf — -t FORMAT/VAF |bcftools query -H -f ‘%CHROMt%POSt%REFt%ALT[t%VAF]n’|less -S

(bcftools fill-tags /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.flt.vcf.gz — -t FORMAT/VAF |bcftools query -H -f ‘[t%VAF]n’|head -n 1|perl -ne ‘s/^#s*|[d ]|:VAF//g;print “chrtstarttendtallelet$_”;’; bcftools fill-tags /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.flt.vcf.gz — -t FORMAT/VAF |bcftools query -f ‘%CHROMt%POSt%ENDt%REF/%ALT[t%VAF]n’) | less -S

仅挑选指定WES22070248.bam的vcf

bcftools view -s WES22070248.bam /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf|less -S

仅挑选指定WES22070248.bam的vcf,且 过滤掉纯合野生型和no call的位点,仅挑选INFO中的CSQ字段,left-align indel、去重、uniq

bcftools view -s WES22070248.bam /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf|bcftools view -e ‘FMT/GT[0]==”RR” || FMT/GT[0]==”mis”‘ | bcftools annotate -x ^INFO/CSQ |bcftools norm -d none|less -S

列出所有样本

bcftools query -l /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf

按指定格式设置位点的ID,在前面增加’ ’,将保留原有的ID

bcftools annotate —set-id ‘%CHROM-%POS-%REF-%FIRST_ALT’ /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf|less -S bcftools annotate —set-id ’%CHROM-%POS-%REF-%FIRST_ALT’ /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf|less -S

仅挑选 INFO/AF、INFO/AC、INFO/AN字段输出vcf

bcftools annotate -x ^INFO/AF,^INFO/AC,^INFO/AN /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf|less -S

解压 vcf.gz(zcat) ,向vcf文件header增加contig信息(awk),将染色体按指定的映射关系重命名(bcftools annotate —rename-chrs )且仅挑选INFO/CLASS字段,按指定格式输出(bcftools query -f ),统计最后一列排序后输出频数

zcat ~/data/database/2021Q4/HGMD_Pro_2021.4_hg19.vcf.gz | awk ‘/^##contig/ {next} /^#CHROM/ {printf(“##contig= bcftools annotate —rename-chrs /bi/database/chr-names.txt -x ^INFO/CLASS | bcftools query -f “%CHROMt%POSt%IDt%REFt%ALTt%QUALt%FILTERt%INFO/CLASSn”| awk -F”t” ‘{print $NF}’|sort|uniq -c

0 人点赞