回顾
Monocle2 踩坑教程(1)
差异分析
差异基因表达分析是RNA-Seq实验中的一项常见任务。Monocle可以帮助你找到不同细胞群间差异表达的基因,并评估这些变化的统计显著性。这些比较要求您有一种方法将细胞收集到两个或更多组中。这些组由每个CellDataSet的表现型数据表中的列定义。Monocle将评估不同细胞群中每个基因表达水平的显著性。
代码语言:javascript复制marker_genes <- row.names(subset(fData(HSMM),
gene_short_name %in% c("MEF2C", "MEF2D", "MYF5",
"ANPEP", "PDGFRA","MYOG",
"TPM1", "TPM2", "MYH2",
"MYH3", "NCAM1", "TNNT1",
"TNNT2", "TNNC1", "CDK1",
"CDK2", "CCNB1", "CCNB2",
"CCND1", "CCNA1", "ID1")))
diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
fullModelFormulaStr = "~percent.mt")
> head(diff_test_res)
status family pval qval gene_short_name num_cells_expressed use_for_ordering
MEF2D OK negbinomial.size 0.10544931 0.4590520 MEF2D 1 FALSE
CCNB1 OK negbinomial.size 0.45277970 0.7043240 CCNB1 1 FALSE
MEF2C OK negbinomial.size 0.28733226 0.5028315 MEF2C 2 FALSE
TPM2 OK negbinomial.size 0.94348541 0.9696949 TPM2 0 FALSE
CDK1 OK negbinomial.size 0.12768261 0.4590520 CDK1 0 FALSE
CCND1 OK negbinomial.size 0.04021331 0.4590520 CCND1 0 FALSE
MYOG_ID1 <- HSMM[row.names(subset(fData(HSMM),
gene_short_name %in% c("CCND1", "CCNB2"))),]
plot_genes_jitter(MYOG_ID1, grouping = "seurat_clusters", ncol= 2)
选择对区分细胞类型有意义的基因(Finding Genes that Distinguish Cell Type or State )
代码语言:javascript复制#Finding Genes that Distinguish Cell Type or State
to_be_tested <- row.names(subset(fData(HSMM),
gene_short_name %in% c("CDC20", "NCAM1", "ANPEP")))
cds_subset <- HSMM[to_be_tested,]
diff_test_res <- differentialGeneTest(cds_subset,
fullModelFormulaStr = "~CellType")
diff_test_res[,c("gene_short_name", "pval", "qval")]
plot_genes_jitter(cds_subset,
grouping = "CellType",
color_by = "CellType",
nrow= 1,
ncol = NULL,
plot_trend = TRUE)
代码语言:javascript复制full_model_fits <-
fitModel(cds_subset, modelFormulaStr = "~CellType")
reduced_model_fits <- fitModel(cds_subset, modelFormulaStr = "~1")
diff_test_res <- compareModels(full_model_fits, reduced_model_fits)
diff_test_res
status family pval qval
CDC20 OK negbinomial.size 0.007822883 0.02346865
NCAM1 OK negbinomial.size 0.791131484 0.88906005
ANPEP OK negbinomial.size 0.889060052 0.88906005
Monocle中的差异分析过程非常灵活:您在测试中使用的模型公式可以包含pData表中作为列存在的任何项,包括Monocle在其他分析步骤中添加的列。例如,如果您使用集群细胞,您可以使用集群作为模型公式来测试cluster之间不同的基因。
Finding Genes that Change as a Function of Pseudotime
Monocle的主要工作是将细胞按照生物过程(如细胞分化)的顺序排列,而不事先知道要查看哪些基因。一旦这样做了,你就可以分析细胞,找到随着细胞进步而变化的基因。例如,你可以发现当细胞“成熟”时,基因显著上调。
代码语言:javascript复制to_be_tested <- row.names(subset(fData(HSMM),
gene_short_name %in% c("MYH3", "MEF2C", "CCNB2", "TNNT1")))
cds_subset <- HSMM[to_be_tested,]
diff_test_res <- differentialGeneTest(cds_subset,
fullModelFormulaStr = "~sm.ns(Pseudotime)")
diff_test_res[,c("gene_short_name", "pval", "qval")]
gene_short_name pval qval
MEF2C MEF2C 2.996841e-02 3.995789e-02
CCNB2 CCNB2 5.328721e-03 1.065744e-02
MYH3 MYH3 6.371156e-01 6.371156e-01
TNNT1 TNNT1 1.634704e-12 6.538818e-12
plot_genes_in_pseudotime(cds_subset, color_by = "seurat_clusters")
Clustering Genes by Pseudotemporal Expression Pattern
在研究时间序列基因表达研究时,一个常见的问题是:“哪些基因遵循相似的动力学趋势?”Monocle可以通过将具有相似趋势的基因分组来帮助你回答这个问题,这样你就可以分析这些基因组,看看它们有什么共同之处。Monocle提供了一种方便的方法来可视化所有伪时间相关的基因。函数plot_pseudotime_heatmap接受一个CellDataSet对象(通常只包含重要基因的子集),并生成与plot_genes_in_pseudotime类似的平滑表达曲线.然后,它将这些基因聚类并使用pheatmap包绘制它们。这允许您可视化基因模块,这些模块在伪时间内共同变化。
注意下面的num_clusters 指的是基因可以聚成几个类,而不是细胞。
代码语言:javascript复制diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
fullModelFormulaStr = "~sm.ns(Pseudotime)")
sig_gene_names <- row.names(subset(diff_test_res, qval < 0.1))
plot_pseudotime_heatmap(HSMM[sig_gene_names,],
num_clusters = 6,
cores = 1,
show_rownames = T)
Multi-Factorial Differential Expression Analysis
Monocle可以在多个因素存在的情况下进行差异分析,这可以帮助你减去一些因素来看到其他因素的影响。在下面的简单例子中,Monocle测试了三个基因在成肌细胞和成纤维细胞之间的差异表达,同时减去percent.mt的影响。为此,我们必须同时指定完整模型和简化模型。完整的模型同时捕捉细胞类型和percent.mt的影响。
代码语言:javascript复制to_be_tested <-
row.names(subset(fData(HSMM),
gene_short_name %in% c("TPM1", "MYH3", "CCNB2", "GAPDH")))
cds_subset <- HSMM[to_be_tested,]
diff_test_res <- differentialGeneTest(cds_subset,
fullModelFormulaStr = "~CellType percent.mt",
reducedModelFormulaStr = "~percent.mt")
diff_test_res[,c("gene_short_name", "pval", "qval")]
gene_short_name pval qval
GAPDH GAPDH 0.07990737 0.1598147
CCNB2 CCNB2 0.04148172 0.1598147
TPM1 TPM1 0.90861287 0.9086129
MYH3 MYH3 0.77085745 0.9086129
plot_genes_jitter(cds_subset,
grouping = "seurat_clusters", color_by = "CellType", plot_trend = TRUE)
facet_wrap( ~ feature_label, scales= "free_y")
Analyzing Branches in Single-Cell Trajectories
通常,单细胞的轨迹包括分支。分支的产生是因为细胞执行替代基因表达程序。在发育过程中,当细胞做出命运的选择时,分支就会出现在轨迹中:一个发育谱系沿着一条路径前进,而另一个谱系产生第二条路径。Monocle包含用于分析这些分支事件的广泛功能。
芭芭拉·特雷特琳和她的同事们在史蒂夫·奎克的实验室里进行了一项实验,他们从正在发育的老鼠肺中获取细胞。他们在细胞发育的早期捕获细胞,之后当肺包含两种主要类型的上皮细胞(AT1和AT2),以及即将决定成为AT1或AT2的细胞时。Monocle可以将这个过程重构为一个分支轨迹,让你可以非常详细地分析决策点。下图显示了使用Monocle的一些数据重建的一般方法。有一个单独的分支,标记为“1”。当细胞从树的左上方通过树枝的早期发育阶段通过时,哪些基因发生了变化?哪些基因在分支间有差异表达?为了回答这个问题,Monocle为您提供了一个特殊的统计测试:分支表达式分析建模,或BEAM。
代码语言:javascript复制lung <- load_lung()
plot_cell_trajectory(lung, color_by = "Time")
# 图如下
BEAM_res <- BEAM(lung, branch_point = 1, cores = 1)
BEAM_res <- BEAM_res[order(BEAM_res$qval),]
BEAM_res <- BEAM_res[,c("gene_short_name", "pval", "qval")]
使用一种特殊类型的热图,您可以可视化所有明显依赖于分支的基因的变化。这张热图同时显示了两种血统的变化。它还要求您选择要检查的分支点。列是伪时间中的点,行是基因,伪时间的开始在热图的中间。当你从热图的中间读到右边的时候,你正在跟随一个伪时间谱系。当你读到左边时,另一个。这些基因是分层聚类的,因此您可以可视化具有类似的依赖于序列的基因模块.
代码语言:javascript复制plot_genes_branched_heatmap(lung[row.names(subset(BEAM_res,
qval < 1e-4)),],
branch_point = 1,
num_clusters = 4,
cores = 1,
use_gene_short_name = T,
show_rownames = T)
代码语言:javascript复制lung_genes <- row.names(subset(fData(lung),
gene_short_name %in% c("Ccnd2", "Sftpb", "Pdpn")))
plot_genes_branched_pseudotime(lung[lung_genes,],
branch_point = 1,
color_by = "Time",
ncol = 1)
代码语言:javascript复制> str(HSMM)
Formal class 'CellDataSet' [package "monocle"] with 19 slots
..@ reducedDimS : num [1:2, 1:2638] -2.613 -0.652 -1.784 0.745 -2.376 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2638] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" ...
..@ reducedDimW : num [1:281, 1:2] -0.0384 0.0812 -0.0465 0.0681 0.0655 ...
..@ reducedDimA : num [1:2, 1:2638] 13.57 29.4 -2.74 -25.16 -9.3 ...
..@ reducedDimK : num [1:2, 1:126] -2.305 -0.692 -1.711 0.538 -1.935 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
..@ minSpanningTree :List of 10
.. ..$ :List of 1
.. .. ..$ Y_1: 'igraph.vs' Named int [1:2] 47 101
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_47" "Y_101"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_2: 'igraph.vs' Named int [1:2] 62 90
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_62" "Y_90"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_3: 'igraph.vs' Named int [1:2] 90 126
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_90" "Y_126"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_4: 'igraph.vs' Named int 106
.. .. .. ..- attr(*, "names")= chr "Y_106"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_5: 'igraph.vs' Named int [1:2] 36 65
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_36" "Y_65"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_6: 'igraph.vs' Named int [1:2] 39 53
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_39" "Y_53"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_7: 'igraph.vs' Named int [1:2] 34 98
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_34" "Y_98"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_8: 'igraph.vs' Named int [1:2] 49 126
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_49" "Y_126"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_9: 'igraph.vs' Named int [1:2] 42 93
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_42" "Y_93"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..$ :List of 1
.. .. ..$ Y_10: 'igraph.vs' Named int [1:2] 11 25
.. .. .. ..- attr(*, "names")= chr [1:2] "Y_11" "Y_25"
.. .. .. ..- attr(*, "env")=<weakref>
.. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
.. ..- attr(*, "class")= chr "igraph"
..@ cellPairwiseDistances: num [1:126, 1:126] 0 1.37 1.41 10.62 5.52 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
.. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
..@ expressionFamily :Formal class 'vglmff' [package "VGAM"] with 22 slots
.. .. ..@ blurb : chr [1:6] "Negative-binomial distribution with size knownnn" "Links: " "loglink(mu)" "n" ...
.. .. ..@ constraints : expression({ constraints <- cm.zero.VGAM(constraints, x = x, NULL, M = M, predictors.names = predictors.names, M1 = 2) })
.. .. ..@ deviance :function ()
.. .. ..@ fini : expression({ })
.. .. ..@ first : expression({ })
.. .. ..@ infos :function (...)
.. .. ..@ initialize : expression({ M1 <- 1 temp5 <- w.y.check(w = w, y = y, Is.nonnegative.y = TRUE, Is.integer.y = TRUE, ncol.w.max | __truncated__
.. .. ..@ last : expression({ misc$link <- rep_len("loglink", NOS) names(misc$link) <- mynames1 misc$earg <- vector("list", M) | __truncated__
.. .. ..@ linkfun :function ()
.. .. ..@ linkinv :function (eta, extra = NULL)
.. .. ..@ loglikelihood :function (mu, y, w, residuals = FALSE, eta, extra = NULL, summation = TRUE)
.. .. ..@ middle : expression({ })
.. .. ..@ middle2 : expression({ })
.. .. ..@ summary.dispersion: logi(0)
.. .. ..@ vfamily : chr "negbinomial.size"
.. .. ..@ validparams :function (eta, y, extra = NULL)
.. .. ..@ validfitted :function ()
.. .. ..@ simslot :function (object, nsim)
.. .. ..@ hadof :function ()
.. .. ..@ charfun :function ()
.. .. ..@ deriv : expression({ eta <- cbind(eta) M <- ncol(eta) kmat <- matrix(Inf, n, M, byrow = TRUE) newemu <- list(theta = | __truncated__
.. .. ..@ weight : expression({ ned2l.dmunb2 <- 1/mu - 1/(mu kmat) wz <- ned2l.dmunb2 * dmu.deta^2 c(w) * wz })
..@ lowerDetectionLimit : num 1
..@ dispFitInfo :<environment: 0x000000023e5e8110>
..@ dim_reduce_type : chr "DDRTree"
..@ auxOrderingData :<environment: 0x000000002c5323c8>
..@ auxClusteringData :<environment: 0x00000000240fc8a0>
..@ experimentData :Formal class 'MIAME' [package "Biobase"] with 13 slots
.. .. ..@ name : chr ""
.. .. ..@ lab : chr ""
.. .. ..@ contact : chr ""
.. .. ..@ title : chr ""
.. .. ..@ abstract : chr ""
.. .. ..@ url : chr ""
.. .. ..@ pubMedIds : chr ""
.. .. ..@ samples : list()
.. .. ..@ hybridizations : list()
.. .. ..@ normControls : list()
.. .. ..@ preprocessing : list()
.. .. ..@ other : list()
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 2
.. .. .. .. .. ..$ : int [1:3] 1 0 0
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ assayData :<environment: 0x000000005575c7a0>
..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 17 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr [1:17] NA NA NA NA ...
.. .. ..@ data :'data.frame': 2638 obs. of 17 variables:
.. .. .. ..$ orig.ident : Factor w/ 1 level "pbmc3k": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ nCount_RNA : num [1:2638] 2419 4903 3147 2639 980 ...
.. .. .. ..$ nFeature_RNA : int [1:2638] 779 1352 1129 960 521 781 782 790 532 550 ...
.. .. .. ..$ percent.mt : num [1:2638] 3.02 3.79 0.89 1.74 1.22 ...
.. .. .. ..$ RNA_snn_res.0.5 : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
.. .. .. ..$ seurat_clusters : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
.. .. .. ..$ Size_Factor : num [1:2638] 1.108 2.245 1.441 1.208 0.449 ...
.. .. .. ..$ num_genes_expressed : int [1:2638] 126 181 149 153 34 103 104 112 64 51 ...
.. .. .. ..$ Cluster : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ peaks : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. .. ..$ halo : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. .. ..$ delta : num [1:2638] 0.401 0.442 1.031 0.205 0.812 ...
.. .. .. ..$ rho : num [1:2638] 36 33.3 44.8 47.7 31.7 ...
.. .. .. ..$ nearest_higher_density_neighbor: logi [1:2638] NA NA NA NA NA NA ...
.. .. .. ..$ Pseudotime : num [1:2638] 12.277 11.115 13.023 0.723 16.159 ...
.. .. .. ..$ State : Factor w/ 5 levels "1","2","3","4",..: 5 2 5 1 5 4 5 5 5 1 ...
.. .. .. ..$ CellType : Factor w/ 3 levels "CDC20","GABPB2",..: 3 3 3 3 3 3 3 3 3 3 ...
.. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 1
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ featureData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 3 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr [1:3] NA NA NA
.. .. ..@ data :'data.frame': 13714 obs. of 3 variables:
.. .. .. ..$ gene_short_name : Factor w/ 13714 levels "7SK.2","A1BG",..: 527 715 9394 9395 5971 7345 5750 8158 9747 4917 ...
.. .. .. ..$ num_cells_expressed: int [1:13714] 0 0 0 0 0 1 0 0 0 6 ...
.. .. .. ..$ use_for_ordering : logi [1:13714] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..@ dimLabels : chr [1:2] "featureNames" "featureColumns"
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 1
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ annotation : chr(0)
..@ protocolData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr(0)
.. .. ..@ data :'data.frame': 2638 obs. of 0 variables
.. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
.. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. .. .. ..@ .Data:List of 1
.. .. .. .. .. ..$ : int [1:3] 1 1 0
..@ .__classVersion__ :Formal class 'Versions' [package "Biobase"] with 1 slot
.. .. ..@ .Data:List of 5
.. .. .. ..$ : int [1:3] 3 5 2
.. .. .. ..$ : int [1:3] 2 42 0
.. .. .. ..$ : int [1:3] 1 3 0
.. .. .. ..$ : int [1:3] 1 0 0
.. .. .. ..$ : int [1:3] 1 2 0