对于DNA数据和蛋白质数据的分析和可视化一般大家都不会考虑R语言,但是还是有学者开发了在R语言的DNA和蛋白质数据的分析和可视化。那就是R包seqinr。这个包包含的函数数量也是我见过的最多的了,当然啦,人外有人,天外有天,更多的我还没见过。今天我们就来介绍下这个庞大的R包。
首先我们看下其函数构成:
经过深思熟虑,我觉得还是不贴出来,有点多,我们还是直接讲讲怎么用吧。
其实它里面分为两块:蛋白质分析和DNA分析。我们就不去挨个讲解每个函数的功能了,我们今天主要看下其中的可视化部分。
- 蛋白质中氨基酸的一个物理化学分类可视化图的绘制
函数AAstat()主要是对氨基酸的统计,统计主要是通过其理化性质的分类进行分类。并且可以将其分类信息进行可视化展示:
seqAA <- read.fasta(file =system.file("sequences/seqAA.fasta", package = "seqinr"),seqtype = "AA")#seqtype也可以是DNA序列。
AAstat(seqAA[[1]])
运行结果如下:
2. 在我们蛋白质电泳分析过程中,RFU需要一个基线。那么,我们下面这个函数就是用来评估基准值的函数:
baselineabif(rfu, maxrfu = 1000)
通过baseline()我们可以确定基准值,接下来就是实现对数据的一个可视化,我们就以R包自带数据进行实例化:
data(JLO)
rfu <- JLO$Data$DATA.1
bl <- baselineabif(rfu)
plot(1:length(rfu), rfu, type ="l", xlab = "Time [datapoint units]",ylab = "Signal[RFU]", main = "Example of baseline estimates")
abline(h = bl, col="red", lty =2)
legend("topright", inset = 0.02,"Baseline estimate", lty = 2, col = "red")
运行结果如下:
3. circle()绘制图形,其中主要参数是theta指的是起始结束角度。样例程序如下
par(mfrow = c(2, 2), mar = c(0,0,2,0))
setup <- function(){
plot.new()
plot.window(xlim = c(-1,1),ylim = c(-1,1), asp = 1)
}
setup()
circle(col = "lightblue")
title(main = "theta = c(0, 360)")
setup()
circle(col = "lightblue", theta = c(0, 270))
title(main = "theta = c(0, 270)")
setup()
circle(col = "lightblue", theta = c(-90, 180))
title(main = "theta = c(-90, 180)")
setup()
n <- 20
for(i in seq(0, 360, length = n)){
circle(col ="lightblue", theta = c(i, i 360/(2*n)))
}
title(main = "many thetas")
运行结果:
4. dotPlot()主要是以两个序列中对应位置为坐标点进行绘制点图
dotPlot(letters, rev(letters), main = "Inversion")
5. draw.recstat()绘制CA计算的值还有起始密码子位置进行标定.
样例程序如下:
ff <- system.file("sequences/ECOUNC.fsa", package ="seqinr")
seq <- read.fasta(ff)
rec <- recstat(seq[[1]], seqname = getName(seq))
draw.recstat(rec)
6. plotladder()等位基因分型标记物可视化展示,主要目的是利用分型标记物去对未知样本进行基因型的确定.
在这里我们利用样例进行展示:
data(ECH)
#
# Extract from internal sizestandard chanel number 5 the location
# of 14 peaks:
#
ECH.maxis <- peakabif(ECH, 5, npeak = 14, tmin = 2.7, thres =0.1, fig = FALSE)$maxis
#
# Load data about theexpected size of peaks in bp for calibration:
#
data(gs500liz)
lizbp <- gs500liz$liz # All peaks size in bp
lizbp[!gs500liz$mask1 | !gs500liz$mask2] <- NA # Mark uselesspeaks
lizbp <- lizbp[-c(1,2)] # The first two peaks are not extractedfrom ECH
ECH.calibr <- splinefun(ECH.maxis[!is.na(lizbp)],lizbp[!is.na(lizbp)])#构建拟合函数
#
# Show the allelic ladderfor the 4 dyes:
#
plotladder(ECH, 1, ECH.calibr, tmin = 3.1, thres = 0.3, fig = FALSE)
7. plotabif()对ABIF数据的电泳图谱展示,样例如下:
plotabif(ECH,chanel = 1, tmin = 3.2, tmax = 6.1)
8. plotPanels()展示STR试剂盒中各位点的扩增情况.样例程序如下:path1 <- system.file("abif/AmpFLSTR_Panels_v1.txt", package = "seqinr")res1 <- readPanels(path1)
par(mfrow = c(2,1))
plotPanels("Identifiler_v1", res1)plotPanels("SEfiler_v1", res1)
9. consensus()利用很多方法进行序列对比的函数,其中可以是DNA序列也可以是蛋白质序列.
phylip <- read.alignment(file = system.file("sequences/test.phylip",
package ="seqinr"), format = "phylip")
res <- consensus(phylip, method = "profile")
bxc <- barplot(res, col =c("green", "blue", "orange", "white","red"), border = NA,
space = 0, las = 2, ylab ="Base count",
main = "Profile of aDNA sequence alignment",
xlab = "sequenceposition", xaxs = "i")
text(x = bxc, y =par("usr")[4],lab = res.thr, pos = 3, xpd = NA)
text(x = bxc, y =par("usr")[1],lab = res.iup, pos = 1, xpd = NA)