Software in the lncRNA Assembly Workflow: bedtools

2021-07-06 15:52:41

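The Bilibili channel of our 生信技能树 (Bioinformatics Skill Tree) team hosts a hands-on lncRNA data analysis course, but it had no accompanying notes. So we planned a series that shares 100 literature case studies on lncRNA assembly, plus hands-on tutorial notes for the 100 software tools this workflow uses.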

Below is the tutorial note for one of the software tools in the lncRNA assembly workflow.

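BEDTools is a toolkit for comparing, manipulating, and annotating genomic features. Genomic features are usually stored in Browser Extensible Data (BED) or General Feature Format (GFF) files and can be visualized and compared in the UCSC Genome Browser. bedtools provides a few dozen tools/subcommands for processing genomic data. The main ones are: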
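The example later in this note feeds GTF files to bedtools, so it helps to keep the two coordinate conventions apart. BED uses 0-based, half-open intervals, while GFF/GTF uses 1-based, fully closed intervals. bedtools reads each format natively, so the commands here need no conversion. The pure-Python sketch below only illustrates what a manual GTF-to-BED conversion involves. The function name `gtf_line_to_bed` and the use of the feature type as the BED name column are illustrative assumptions.

```python
def gtf_line_to_bed(line: str) -> str:
    """Convert one tab-separated GTF/GFF record to a BED6 line.

    GTF/GFF: 1-based, closed [start, end]; BED: 0-based, half-open [start, end).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, _source, feature, start, end, score, strand = fields[:7]
    bed_start = int(start) - 1  # 1-based start -> 0-based start
    bed_end = int(end)          # a closed 1-based end equals a half-open 0-based end
    # Hypothetical choice: use the feature type (column 3) as the BED name.
    return "\t".join([chrom, str(bed_start), str(bed_end), feature, score, strand])


if __name__ == "__main__":
    gtf = 'chr1\tStringTie\ttranscript\t100\t200\t.\t+\t.\ttranscript_id "t1";'
    print(gtf_line_to_bed(gtf))  # -> chr1<TAB>99<TAB>200<TAB>transcript<TAB>.<TAB>+
```

Both records describe the same 101 bp. In GTF terms the length is 200 - 100 + 1, and in BED terms it is 200 - 99.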

    intersect     Find overlapping intervals in various ways.
    window        Find overlapping intervals within a window around an interval.
    closest       Find the closest, potentially non-overlapping interval.
    coverage      Compute the coverage over defined intervals.
    map           Apply a function to a column for each overlapping interval.
    genomecov     Compute the coverage over an entire genome.
    merge         Combine overlapping/nearby intervals into a single interval.
    cluster       Cluster (but don't merge) overlapping/nearby intervals.
    complement    Extract intervals _not_ represented by an interval file.
    shift         Adjust the position of intervals.
    subtract      Remove intervals based on overlaps b/w two files.
    slop          Adjust the size of intervals.
    flank         Create new intervals from the flanks of existing intervals.
    sort          Order the intervals in a file.
    random        Generate random intervals in a genome.
    shuffle       Randomly redistribute intervals in a genome.
    sample        Sample random records from file using reservoir sampling.
    spacing       Report the gap lengths between intervals in a file.
    annotate      Annotate coverage of features from multiple files.
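The `merge` entry, and the `sort` step it relies on, can be made concrete with a pure-Python sketch. The sketch sorts intervals by chromosome and start, then fuses any interval that overlaps or touches the previous one. The `max_distance` parameter plays a role similar to `bedtools merge -d`, whose default is 0, so book-ended intervals are merged as well. The function name is hypothetical, and chromosome names are simply sorted as strings.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def merge_intervals(intervals: List[Interval], max_distance: int = 0) -> List[Interval]:
    """Sort by chromosome and start, then merge intervals that overlap, touch
    (book-ended), or lie within max_distance bp of the previous merged interval."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + max_distance:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))  # extend the run
        else:
            merged.append((chrom, start, end))  # start a new merged interval
    return merged


if __name__ == "__main__":
    ivs = [("chr1", 150, 300), ("chr1", 100, 200), ("chr1", 300, 400),
           ("chr1", 500, 600), ("chr2", 0, 50)]
    print(merge_intervals(ivs))  # [('chr1', 100, 400), ('chr1', 500, 600), ('chr2', 0, 50)]
```

Sorting first is what lets a single linear pass suffice. This is also why tools that scan their input in order generally expect sorted input.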

Typical, commonly used functions include:

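Format conversion: BAM to BED (bamToBed), and BED to other formats (bedToBam, bedToIgv);

Logical operations on genomic coordinates, including intersection (intersectBed, windowBed), "nearest neighbor" (closestBed), complement (complementBed), union (mergeBed), and difference (subtractBed);

Coverage calculation (coverageBed, genomeCoverageBed).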
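The logical operations listed above come down to simple interval arithmetic. The following pure-Python sketch works on a single chromosome with 0-based half-open coordinates. It shows the core of an intersection, meaning the overlapping part of each A/B pair, which is what `bedtools intersect` reports by default. It also shows a difference, meaning the parts of A not covered by B, which is what `bedtools subtract` reports by default. For clarity it uses a naive all-pairs loop rather than an efficient search, and the function names are illustrative.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) on one chromosome, 0-based half-open


def intersect(a: List[Span], b: List[Span]) -> List[Span]:
    """Overlapping portion of every A/B pair."""
    out: List[Span] = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # half-open: book-ended intervals do not overlap
                out.append((start, end))
    return out


def subtract(a: List[Span], b: List[Span]) -> List[Span]:
    """Parts of each A interval not covered by any B interval."""
    out: List[Span] = []
    for a_start, a_end in a:
        pieces = [(a_start, a_end)]
        for b_start, b_end in b:
            next_pieces: List[Span] = []
            for start, end in pieces:
                if b_end <= start or b_start >= end:  # no overlap: keep the piece
                    next_pieces.append((start, end))
                    continue
                if start < b_start:
                    next_pieces.append((start, b_start))  # remainder left of B
                if b_end < end:
                    next_pieces.append((b_end, end))      # remainder right of B
            pieces = next_pieces
        out.extend(pieces)
    return out


if __name__ == "__main__":
    a, b = [(100, 500)], [(200, 300), (400, 450)]
    print(intersect(a, b))  # [(200, 300), (400, 450)]
    print(subtract(a, b))   # [(100, 200), (300, 400), (450, 500)]
```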

I. Software installation

Install with conda (bedtools is distributed through the bioconda channel):

conda install -c bioconda bedtools

II. Usage of bedtools window

After installation, run bedtools window -h to view the help documentation.

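1. Usage:

bedtools window [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>

2. Common parameters:

-w    Base pairs added both upstream and downstream of each A feature when searching for overlaps in B (symmetric window; default 1000 bp)
-l    Base pairs added upstream (to the left) of each A feature (asymmetric window; default 1000 bp)
-r    Base pairs added downstream (to the right) of each A feature (asymmetric window; default 1000 bp)
-sw   Define -l and -r relative to each feature's strand
-sm   Only report B hits on the same strand as A
-Sm   Only report B hits on the opposite strand to A
-u    Report each A feature once if it has any hit in B
-c    Report the number of B hits for each A feature (0 if none)
-v    Only report A features with no hits in B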

III. Input files

BED, GFF (including GTF), or VCF files; BAM is also accepted

IV. Running the command

Like bedtools intersect, window searches for overlapping features between A and B.

However, window adds a specified number (1000, by default) of base pairs upstream and downstream of each feature in A. In effect, this allows features in B that are “near” features in A to be detected.
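The windowing idea in the paragraph above can be sketched in pure Python. Each A feature is widened by `w` bp on both sides, which is the `-w` behavior with a default of 1000. Every B feature on the same chromosome that overlaps the widened span is then reported as an (A, B) pair. This is an illustrative model, not bedtools' implementation. Clamping the window at position 0 is an assumption of the sketch, and it does no chromosome-end check because no genome file is given.

```python
from typing import List, Tuple

Feature = Tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open


def window(a: List[Feature], b: List[Feature], w: int = 1000) -> List[Tuple[Feature, Feature]]:
    """Report every (A, B) pair where B overlaps A extended by w bp on both sides."""
    hits: List[Tuple[Feature, Feature]] = []
    for a_feat in a:
        chrom, start, end, _ = a_feat
        win_start = max(0, start - w)  # sketch assumption: clamp at chromosome start
        win_end = end + w
        for b_feat in b:
            b_chrom, b_start, b_end, _ = b_feat
            if b_chrom == chrom and b_start < win_end and b_end > win_start:
                hits.append((a_feat, b_feat))
    return hits


if __name__ == "__main__":
    a = [("chr1", 5000, 6000, "lnc1")]
    b = [("chr1", 6500, 7000, "geneX"), ("chr1", 8000, 9000, "geneY"),
         ("chr2", 5000, 6000, "geneZ")]
    print(window(a, b))       # geneX is 500 bp away, inside the default 1000 bp window
    print(window(a, b, w=0))  # [] because with no window nothing overlaps lnc1
```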

bedtools window -a DEL.gtf \
    -b protein_coding_gene.gtf \
    -l 10000 -r 10000 > test.txt

Parameter notes:

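-a DEL.gtf -b protein_coding_gene.gtf  # for each feature in DEL.gtf, find features in protein_coding_gene.gtf that overlap its extended window
-l 10000  # extend each A feature 10,000 bp to the left (upstream on the + strand; add -sw for strand-aware windows)
-r 10000  # extend each A feature 10,000 bp to the right (downstream on the + strand)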
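With `-l` and `-r`, the two sides of the window can differ. By default, `-l` extends toward lower coordinates and `-r` toward higher ones, regardless of strand. With `-sw`, both are interpreted relative to each feature's strand, so on a minus-strand feature `-l` extends toward higher coordinates. The hypothetical helper below sketches that rule for one A feature, with the window clamped at 0 as an assumption of the sketch.

```python
from typing import Tuple


def window_bounds(start: int, end: int, strand: str, left: int, right: int,
                  strand_aware: bool = False) -> Tuple[int, int]:
    """Compute the search window for one A feature (0-based half-open).

    Without strand awareness, `left` is added below start and `right` above end.
    With strand_aware=True (modeling -sw), the two are swapped on the minus
    strand, so `left` always means upstream relative to transcription.
    """
    if strand_aware and strand == "-":
        left, right = right, left
    return max(0, start - left), end + right  # clamp at 0 (sketch assumption)


if __name__ == "__main__":
    print(window_bounds(20000, 21000, "-", 10000, 2000))                     # (10000, 23000)
    print(window_bounds(20000, 21000, "-", 10000, 2000, strand_aware=True))  # (18000, 31000)
```

In the command above, `-l` and `-r` are both 10000. The window is therefore symmetric, equivalent to `-w 10000`, and adding `-sw` would not change the result.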

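V. Output

Each line pairs a transcript from DEL.gtf (the first columns) with a protein-coding gene from protein_coding_gene.gtf (the following columns) that overlaps the transcript's ±10 kb window. Only selected columns of the full GTF records are shown.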

chr1 8416627 8422722 transcript_id "MSTRG.299.44" chr1 8352397 8848921 - gene_name "RERE"
chr1 16142499 16142858 transcript_id "MSTRG.518.1" chr1 16124337 16156069 - gene_name "EPHA2"
chr1 20981406 20984251 transcript_id "MSTRG.624.1" chr1 20806292 21176888 - gene_name "EIF4G3"
chr1 39634613 39639494 transcript_id "MSTRG.1249.4" chr1 39623435 39639643 gene_name "HEYL"
chr1 44423896 44512709 transcript_id "MSTRG.1392.8" chr1 44405194 44651724 gene_name "RNF220"
chr1 53720323 53734052 transcript_id "MSTRG.1665.7" chr1 53506237 53738106 - gene_name "GLIS1"
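A small parser is enough to turn output like the lines above into a transcript-to-gene table. The pure-Python sketch below takes the first `transcript_id` on each line, which comes from the A file because its columns are printed first. It also takes every `gene_name` value and groups the genes found near each transcript. The regular expressions assume the attribute formatting shown above. The negative lookbehind keeps prefixed attributes such as `ref_gene_name` from being counted as `gene_name`.

```python
import re
from typing import Dict, List

# (?<!\w) stops prefixed attributes such as "ref_gene_name" from matching.
TRANSCRIPT_RE = re.compile(r'(?<!\w)transcript_id "([^"]+)"')
GENE_RE = re.compile(r'(?<!\w)gene_name "([^"]+)"')


def nearby_genes(lines: List[str]) -> Dict[str, List[str]]:
    """Map each A transcript_id to the gene_name values reported on its lines.

    Note: a gene_name in the A file's own attributes would be collected as well.
    """
    result: Dict[str, List[str]] = {}
    for line in lines:
        tx = TRANSCRIPT_RE.search(line)  # first match = the A feature's ID
        if tx is None:
            continue  # skip lines without a transcript_id attribute
        result.setdefault(tx.group(1), []).extend(GENE_RE.findall(line))
    return result


if __name__ == "__main__":
    sample = [
        'chr1 8416627 8422722 transcript_id "MSTRG.299.44" chr1 8352397 8848921 - gene_name "RERE"',
        'chr1 16142499 16142858 transcript_id "MSTRG.518.1" chr1 16124337 16156069 - gene_name "EPHA2"',
    ]
    print(nearby_genes(sample))  # {'MSTRG.299.44': ['RERE'], 'MSTRG.518.1': ['EPHA2']}
```

To process the real output, pass the lines of the result file to `nearby_genes` instead of the sample list.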
