查看bcftools安装路径
which bcftools
导出插件所需的环境编辑
export BCFTOOLS_PLUGINS=/bi/software/bcftools-1.16/plugins;
查看插件环境变量
echo ${BCFTOOLS_PLUGINS}
反向过滤 ( https://samtools.github.io/bcftools/bcftools.html#expressions )
bcftools view -e ‘CHROM~”M” || CHROM~”_” || QUAL<30 || MAX(FMT/GQ)<20 || MAX(FMT/DP)<8 || AVG(FMT/DP)<5 || (COUNT(FMT/GT="het")>10 && MAX(FMT/AD[*:1]/FMT/DP[*])<0.2) || MAX(FMT/AD[*:1]/FMT/DP[*]) < 0.1’ /bi/8.xuxiong/EQA2022/20221019/All.raw.vcf.gz | less -S `
正向过滤
bcftools view -i ‘N_ALT=1 & AVG(FMT/DP)>8 & MIN(FMT/DP)>5 & MIN(FMT/GQ)>15 & QUAL > 30 & MAX(FORMAT/AD[*:1]/FORMAT/DP[*]) > 0.1 ‘ /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf.gz | less -S
取交集
bcftools isec /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf.gz /bi/8.xuxiong/work/WES/CYP21A2_CYP21A1P_SMN1_SMN2_HBA1_HBA2_diff.vcf.gz -n =2 -w 1 -Oz -o /bi/8.xuxiong/EQA2022/20221019/All.sp.homologous.vcf.gz ; tabix -fp vcf /bi/8.xuxiong/EQA2022/20221019/All.sp.homologous.vcf.gz ;
取差集
bcftools isec /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf.gz /bi/8.xuxiong/work/WES/CYP21A2_CYP21A1P_SMN1_SMN2_HBA1_HBA2_diff.vcf.gz -n ~10 -w 1 -Oz -o /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz; tabix -fp vcf /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz;
bcftools isec -n~10 -c none -w 1 /bi/8.xuxiong/EQA2022/20221019/All.vt.vcf.gz /bi/8.xuxiong/database/Variants.with.local.blacklist.V20211010.vcf.gz -Oz -o All.blacklist.QCflt.vcf.gz; tabix -fp vcf All.blacklist.QCflt.vcf.gz;
正向过滤 选当批次人群频率高频位点
bcftools view -Ov -i ‘(AF>=0.5 && AN>=12) || (AF>=0.2 && AN>=30) || (AF>=0.1 && AN>=100)’ /bi/8.xuxiong/EQA2022/20221019/All.blacklist.QCflt.vcf.gz -Oz -o All.blacklist.QCflt.Batch_common.vcf.gz; tabix -fp vcf All.blacklist.QCflt.Batch_common.vcf.gz;
挑选IMPACT synonymous_variant及以上的位点,需先经VEP注释,IMPACT 顺序为/bi/database/VEPseverity.txt ,运行插件前需导出环境变量( export BCFTOOLS_PLUGINS=/bi/software/bcftools-1.6/plugins; )也可加入~/.bashrc
bcftools split-vep -S /bi/database/VEPseverity.txt -s all:synonymous_variant -x /bi/8.xuxiong/EQA2022/20221019/All.vep.vcf -Oz -o All.MoreSeverity.vcf.gz; tabix -fp vcf All.MoreSeverity.vcf.gz;
挑选IMPACT coding_sequence_variant及以下的位点,需先经VEP注释
bcftools split-vep -S /bi/database/VEPseverity.txt -s all:coding_sequence_variant- -x /bi/8.xuxiong/EQA2022/20221019/All.vep.vcf -Oz -o All.modifier.vcf.gz; tabix -fp vcf All.modifier.vcf.gz;
取交集(需要取交集的文件需要预先排序并建索引)
bcftools isec /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz /bi/8.xuxiong/database/ClinVar.HGMD.Local.SpliceAI.whitelist.V20211010.vcf.gz -n =2 -w 1 -Ov | less -S
取并集,需两个vcf均有索引,且排序(需要合并的文件需要预先排序并建索引)
bcftools concat -a /bi/8.xuxiong/EQA2022/20221019/All.sp.homologous.vcf.gz /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.vcf.gz -Ov|less -S
按家系拆分joint calling的vcf文件
bcftools split /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf -Ov -o . -G pedGroupsVEP.tsv;
按样本拆分出joint calling的vcf文件所有样本vcf文件
bcftools query -l /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf | bcftools split -S /dev/stdin -Oz -o . /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf
根据/bi/database/gene_region.bed.gz(需先 tabix -fp bed /bi/database/gene_region.bed.gz 建立索引)注释vcf文件,添加到INFO/GENE
bcftools annotate -a /bi/database/gene_region.bed.gz -c CHROM,FROM,TO,GENE -H ‘##INFO=<id=gene,number=1,type=string,description="gene less=""></id=gene,number=1,type=string,description="gene>
将注释上基因后的vcf按-f格式转成tsv格式
bcftools annotate -a /bi/database/gene_region.bed.gz -c CHROM,FROM,TO,GENE -H ‘##INFO=<id=gene,number=1,type=string,description="gene bcftools="" query="" -f="" less=""></id=gene,number=1,type=string,description="gene>
计算每个样本每个位点的VAF,并转成tsv格式
bcftools fill-tags All.raw.QCflt.vcf — -t FORMAT/VAF |bcftools query -H -f ‘%CHROMt%POSt%REFt%ALT[t%VAF]n’|less -S
(bcftools fill-tags /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.flt.vcf.gz — -t FORMAT/VAF |bcftools query -H -f ‘[t%VAF]n’|head -n 1|perl -ne ‘s/^#s*|[d ]|:VAF//g;print “chrtstarttendtallelet$_”;’; bcftools fill-tags /bi/8.xuxiong/EQA2022/20221019/All.sp.nonHomo.flt.vcf.gz — -t FORMAT/VAF |bcftools query -f ‘%CHROMt%POSt%ENDt%REF/%ALT[t%VAF]n’) | less -S
仅挑选指定WES22070248.bam的vcf
bcftools view -s WES22070248.bam /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf|less -S
仅挑选指定WES22070248.bam的vcf,且 过滤掉纯合野生型和no call的位点,仅挑选INFO中的CSQ字段,left-align indel、去重、uniq
bcftools view -s WES22070248.bam /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf|bcftools view -e ‘FMT/GT[0]==”RR” || FMT/GT[0]==”mis”‘ | bcftools annotate -x ^INFO/CSQ |bcftools norm -d none|less -S
列出所有样本
bcftools query -l /bi/8.xuxiong/EQA2022/20221019/All.vep.flt.vcf
按指定格式设置位点的ID,在前面增加’ ’,将保留原有的ID
bcftools annotate —set-id ‘%CHROM-%POS-%REF-%FIRST_ALT’ /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf|less -S bcftools annotate —set-id ’%CHROM-%POS-%REF-%FIRST_ALT’ /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf|less -S
仅挑选 INFO/AF、INFO/AC、INFO/AN字段输出vcf
bcftools annotate -x ^INFO/AF,^INFO/AC,^INFO/AN /bi/8.xuxiong/EQA2022/20221019/All.raw.QCflt.vcf|less -S
解压 vcf.gz(zcat) ,向vcf文件header增加contig信息(awk),将染色体按指定的映射关系重命名(bcftools annotate —rename-chrs )且仅挑选INFO/CLASS字段,按指定格式输出(bcftools query -f ),统计最后一列排序后输出频数
zcat ~/data/database/2021Q4/HGMD_Pro_2021.4_hg19.vcf.gz | awk ‘/^##contig/ {next} /^#CHROM/ {printf(“##contig= bcftools annotate —rename-chrs /bi/database/chr-names.txt -x ^INFO/CLASS | bcftools query -f “%CHROMt%POSt%IDt%REFt%ALTt%QUALt%FILTERt%INFO/CLASSn”| awk -F”t” ‘{print $NF}’|sort|uniq -c