在运行RSeQC软件对转录组的比对好的bam文件进行质控的时候,我发现了一个很无语的现象,就是它模仿fastqc的4个质控图里面,GC含量分布,ATCG碱基比例没有问题,但是画碱基质量图的时候,所有样本都是空的,如下:
我打开了软件出图的R代码才发现,真的好原始啊!!!
代码语言:javascript复制pdf('1_read_quality.qual.boxplot.pdf')
p0<-rep(c(2,12,22,27,32,37,41),times=c(502,1062718,8073,1306425,35457919,542565,272330)/1)
p1<-rep(c(2,12,22,27,32,37,41),times=c(2857,1133839,10183,1348200,34597142,1162560,395751)/1)
p2<-rep(c(2,12,22,27,32,37,41),times=c(1531,1055310,16489,1310029,8109840,27627266,530067)/1)
p3<-rep(c(2,12,22,27,32,37,41),times=c(13859,960145,121042,1091350,3551363,32128720,784053)/1)
p4<-rep(c(2,12,22,27,32,37,41),times=c(1,831342,222717,811763,2444363,32816629,1523717)/1)
p5<-rep(c(12,22,27,32,37,41),times=c(756725,269357,693807,2111482,5112150,29707011)/1)
p6<-rep(c(2,12,22,27,32,37,41),times=c(383,714875,267381,623622,1782376,4452520,30809375)/1)
p7<-rep(c(2,12,22,27,32,37,41),times=c(3,699425,352083,576246,1676829,4164567,31181379)/1)
p8<-rep(c(2,12,22,27,32,37,41),times=c(33,719293,393174,559110,1597976,4062411,31318535)/1)
p9<-rep(c(2,12,22,27,32,37,41),times=c(33,671366,400498,557291,1556259,3951524,31513561)/1)
# 此处删除60行
boxplot(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,xlab="Position of Read(5'->3')",ylab="Phred Quality Score",outline=F)
dev.off()
pdf('1_read_quality.qual.heatmap.pdf')
上面的67个循环,代码就构建了67个长度为2千万的向量,对这两千万的向量画boxplot,一个向量内存约200多M,R语言本身如此低效,怪不得我都没有出图,肯定是内存溢出,挂掉了。
后记
粉丝们,对于这个绘图代码,考考你们,可以提出哪些优化建议!
可以使用我们一直讲解的airway转录组数据集来作为例子!https://space.bilibili.com/338686099/#/