单细胞转录组相关文章大家看到很多了,最近三五年基本上是CNS井喷状态,但是这个时间段单细胞基因组测序却一直少有人问津。
我猜测主要是因为单细胞基因组测序的分析手段有点少,看拷贝数变异就足够了,比如发表于2022年6月份的nature的文章:《cGAS–STING drives the IL-6-dependent survival of chromosomally instable cancers》,里面就有 shallow single-cell whole-genome sequencing. 数据在 https://www.ebi.ac.uk/ena/browser/view/PRJEB49800 ,很容易下载里面的测序fastq数据:
代码语言:javascript复制BB160120_IV_001_ATAGAT_L002_R1_001.fastq.gz
BB160120_IV_002_TATAGA_L002_R1_001.fastq.gz
BB160120_IV_003_ATATGA_L002_R1_001.fastq.gz
BB160120_IV_004_TAGATA_L002_R1_001.fastq.gz
BB160120_IV_005_AGTATA_L002_R1_001.fastq.gz
BB160120_IV_006_ATAGTA_L002_R1_001.fastq.gz
BB160120_IV_007_TCACAT_L002_R1_001.fastq.gz
BB160120_IV_008_CTACAT_L002_R1_001.fastq.gz
BB160120_IV_009_CATCAT_L002_R1_001.fastq.gz
因为是单细胞基因组测序,所以目前主流的分析是看拷贝数变异就足够了,而且分辨率非常之低,基本上以染色体为单位的拷贝数增加和减少,如下所示:
主流的分析是看拷贝数变异就足够了
让我们看看单细胞基因组测序数据分析方法:
数据分析方法
作者提供了github代码:https://github.com/mschubert/cgas_ko ,蛮有意思的,看文章里面的描述他们的测序数据不需要走Linux里面的比对软件了,文章直接就走了这个 AneuFinder
包:
Copy-number detection in single-cell whole genome sequencing (scWGS) and Strand-seq data using a Hidden Markov Model or binary bisection method.
它的bioconductor官网是:https://bioconductor.org/packages/release/bioc/html/AneuFinder.html
我看了看用法,其实是从bam文件开始:
代码语言:javascript复制library(AneuFinder)
## Load human genome
library(BSgenome.Hsapiens.UCSC.hg19)
## Get a BAM file for the estimation of quality scores (adjust this to your experiment)
bamfile <- system.file("extdata", "BB150803_IV_074.bam", package="AneuFinderData")
也就是说,它所基于的bam文件仍然是需要走上游的Linux环境里面的比对工具,使用起来非常简单:
代码语言:javascript复制library(AneuFinder)
## First, get some data and reference files (adjust this to your experiment)
var.width.ref <- system.file("extdata", "hg19_diploid.bam.bed.gz", package="AneuFinderData")
blacklist <- system.file("extdata", "blacklist-hg19.bed.gz", package="AneuFinderData")
datafolder <- system.file("extdata", "B-ALL-B", package = "AneuFinderData")
list.files(datafolder) # only 3 cells for demonstration purposes
## Library for GC correction
library(BSgenome.Hsapiens.UCSC.hg19)
## Produce output files
outputfolder <- tempdir()
Aneufinder(inputfolder = datafolder, outputfolder = outputfolder, assembly = 'hg19',
numCPU = 3, binsizes = c(5e5, 1e6), variable.width.reference = var.width.ref,
chromosomes = c(1:22,'X','Y'), blacklist = blacklist, correction.method = 'GC',
GC.BSgenome = BSgenome.Hsapiens.UCSC.hg19, refine.breakpoints=FALSE,
method = 'edivisive')
学徒作业
从前面的PRJEB49800 挑选10个样品,比对到参考基因组拿到bam文件后,走这个Aneufinder流程,试试看绘制文章里面的拷贝数变异全景图(热图)