This post is part of a series of tutorial notes covering 100 software tools used in lncRNA assembly pipelines.
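BEDTools is a toolkit for comparing, manipulating, and annotating genomic features. Genomic features are usually stored in Browser Extensible Data (BED) or General Feature Format (GFF) files and can be visualized and compared in the UCSC Genome Browser. bedtools provides some twenty to thirty tools (subcommands) for processing genomic data, including: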
intersect Find overlapping intervals in various ways.
window Find overlapping intervals within a window around an interval.
closest Find the closest, potentially non-overlapping interval.
coverage Compute the coverage over defined intervals.
map Apply a function to a column for each overlapping interval.
genomecov Compute the coverage over an entire genome.
merge Combine overlapping/nearby intervals into a single interval.
cluster Cluster (but don't merge) overlapping/nearby intervals.
complement Extract intervals _not_ represented by an interval file.
shift Adjust the position of intervals.
subtract Remove intervals based on overlaps b/w two files.
slop Adjust the size of intervals.
flank Create new intervals from the flanks of existing intervals.
sort Order the intervals in a file.
random Generate random intervals in a genome.
shuffle Randomly redistribute intervals in a genome.
sample Sample random records from file using reservoir sampling.
spacing Report the gap lengths between intervals in a file.
annotate Annotate coverage of features from multiple files.
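Typical, frequently used functions include:
Format conversion: BAM to BED (bamToBed), and BED to other formats (bedToBam, bedToIgv);
Logical operations on genomic coordinates, including intersection (intersectBed, windowBed), "nearest neighbor" (closestBed), complement (complementBed), union (mergeBed), and difference (subtractBed);
Coverage calculation (coverageBed, genomeCoverageBed).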
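The set-like operations listed above can be illustrated with a minimal Python sketch. This is not how bedtools is implemented; it only shows the idea on half-open (start, end) intervals from a single chromosome, and the function names are hypothetical.

```python
# Intervals are (start, end) pairs on one chromosome, half-open as in BED.

def intersect(a, b):
    """Return the overlap of two intervals, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def merge(intervals):
    """Combine overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def subtract(a, b):
    """Return the parts of interval a that are not covered by interval b."""
    pieces = []
    if b[0] > a[0]:
        pieces.append((a[0], min(a[1], b[0])))
    if b[1] < a[1]:
        pieces.append((max(a[0], b[1]), a[1]))
    # Drop empty pieces that arise when b covers one end of a.
    return [p for p in pieces if p[0] < p[1]]
```

For real data the corresponding bedtools subcommands also handle multiple chromosomes, strands, and file formats, which this sketch ignores.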
I. Installation
Install with conda (bedtools is distributed through the bioconda channel):
conda install -c bioconda bedtools
II. Using bedtools window
After installation, run bedtools window -h to view the built-in help.
1. Usage:
2. Common parameters:
[Figure: screenshot of the bedtools window help output]
III. Input files
BED/GFF/VCF files
IV. Running the command
Like bedtools intersect, window searches for overlapping features between A and B.
However, window adds a specified number (1000, by default) of base pairs upstream and downstream of each feature in A. In effect, this allows features in B that are “near” features in A to be detected.
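The windowing idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the bedtools implementation: features are plain (chrom, start, end) tuples, strand is ignored, and the function name window_hits is hypothetical.

```python
def window_hits(a_features, b_features, w=1000):
    """Pair each A feature with every B feature that overlaps A extended by w bp on both sides."""
    hits = []
    for a_chrom, a_start, a_end in a_features:
        # Extend the A feature; clamp at 0 so the window never starts before the chromosome.
        win_start = max(0, a_start - w)
        win_end = a_end + w
        for b_chrom, b_start, b_end in b_features:
            if b_chrom == a_chrom and b_start < win_end and b_end > win_start:
                hits.append(((a_chrom, a_start, a_end), (b_chrom, b_start, b_end)))
    return hits

a = [("chr1", 5000, 6000)]
b = [("chr1", 6500, 7000), ("chr1", 8000, 9000), ("chr2", 5500, 5600)]
print(len(window_hits(a, b)))          # default 1000 bp window
print(len(window_hits(a, b, w=2500)))  # wider window reaches the second chr1 feature
```

With w=0 the function reduces to a plain overlap test, which is why the first chr1 feature in B is only reported once a window is added.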
bedtools window -a DEL.gtf \
 -b protein_coding_gene.gtf \
 -l 10000 -r 10000 > test.txt
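Parameter description:
-a DEL.gtf -b protein_coding_gene.gtf # for each feature in DEL.gtf (A), search protein_coding_gene.gtf (B) for overlaps
-l 10000 # search range: extend each A feature by 10000 bp on the left (upstream in genome coordinates; add -sw to define it by strand)
-r 10000 # search range: extend each A feature by 10000 bp on the right (downstream in genome coordinates; add -sw to define it by strand)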
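When the same search has to be repeated with different window sizes, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags used in this post (-a, -b, -l, -r, plus the symmetric -w); the helper name build_window_command is hypothetical, and actually running the command assumes bedtools is on the PATH.

```python
import shlex

def build_window_command(a_path, b_path, left=None, right=None, width=None):
    """Build the argument list for a bedtools window call (the command is not executed here)."""
    cmd = ["bedtools", "window", "-a", a_path, "-b", b_path]
    if width is not None:
        cmd += ["-w", str(width)]   # symmetric window on both sides
    if left is not None:
        cmd += ["-l", str(left)]    # extension on the left side of each A feature
    if right is not None:
        cmd += ["-r", str(right)]   # extension on the right side of each A feature
    return cmd

cmd = build_window_command("DEL.gtf", "protein_coding_gene.gtf", left=10000, right=10000)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, stdout=out_file, check=True) to write the hits to a file such as test.txt.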
V. Output
chr1 8416627 8422722 transcript_id "MSTRG.299.44" chr1 8352397 8848921 - gene_name "RERE"
chr1 16142499 16142858 transcript_id "MSTRG.518.1" chr1 16124337 16156069 - gene_name "EPHA2"
chr1 20981406 20984251 transcript_id "MSTRG.624.1" chr1 20806292 21176888 - gene_name "EIF4G3"
chr1 39634613 39639494 transcript_id "MSTRG.1249.4" chr1 39623435 39639643 gene_name "HEYL"
chr1 44423896 44512709 transcript_id "MSTRG.1392.8" chr1 44405194 44651724 gene_name "RNF220"
chr1 53720323 53734052 transcript_id "MSTRG.1665.7" chr1 53506237 53738106 - gene_name "GLIS1"
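To turn output like this into a list of transcript/gene pairs, the transcript_id and gene_name attributes can be pulled out with a regular expression. This is a minimal sketch that assumes each record sits on a single line and carries both attributes in the quoted form shown above; the function name transcript_gene_pairs is hypothetical.

```python
import re

# Capture the quoted values of transcript_id (A side) and gene_name (B side).
PAIR_RE = re.compile(r'transcript_id "([^"]+)".*?gene_name "([^"]+)"')

def transcript_gene_pairs(lines):
    """Return (transcript_id, gene_name) tuples for every line that contains both attributes."""
    pairs = []
    for line in lines:
        match = PAIR_RE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

example = [
    'chr1 20981406 20984251 transcript_id "MSTRG.624.1" chr1 20806292 21176888 - gene_name "EIF4G3"',
    'chr1 39634613 39639494 transcript_id "MSTRG.1249.4" chr1 39623435 39639643 gene_name "HEYL"',
]
print(transcript_gene_pairs(example))
```

To process a whole result file, pass an open file handle, for example transcript_gene_pairs(open("test.txt")).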