Monocle2 踩坑教程(2)

2020-03-30 10:37:20 浏览数 (1)

回顾

Monocle2 踩坑教程(1)

差异分析

差异基因表达分析是RNA-Seq实验中的一项常见任务。Monocle可以帮助你找到不同细胞群间差异表达的基因,并评估这些变化的统计显著性。这些比较要求您有一种方法将细胞收集到两个或更多组中。这些组由每个CellDataSet的表现型数据表中的列定义。Monocle将评估不同细胞群中每个基因表达水平的显著性。

代码语言:javascript复制
marker_genes <- row.names(subset(fData(HSMM),
                                 gene_short_name %in% c("MEF2C", "MEF2D", "MYF5",
                                                        "ANPEP", "PDGFRA","MYOG",
                                                        "TPM1",  "TPM2",  "MYH2",
                                                        "MYH3",  "NCAM1", "TNNT1",
                                                        "TNNT2", "TNNC1", "CDK1",
                                                        "CDK2",  "CCNB1", "CCNB2",
                                                        "CCND1", "CCNA1", "ID1")))

diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
                                      fullModelFormulaStr = "~percent.mt")
> head(diff_test_res)
      status           family       pval      qval gene_short_name num_cells_expressed use_for_ordering
MEF2D     OK negbinomial.size 0.10544931 0.4590520           MEF2D                   1            FALSE
CCNB1     OK negbinomial.size 0.45277970 0.7043240           CCNB1                   1            FALSE
MEF2C     OK negbinomial.size 0.28733226 0.5028315           MEF2C                   2            FALSE
TPM2      OK negbinomial.size 0.94348541 0.9696949            TPM2                   0            FALSE
CDK1      OK negbinomial.size 0.12768261 0.4590520            CDK1                   0            FALSE
CCND1     OK negbinomial.size 0.04021331 0.4590520           CCND1                   0            FALSE

MYOG_ID1 <- HSMM[row.names(subset(fData(HSMM),
                                      gene_short_name %in% c("CCND1", "CCNB2"))),]
plot_genes_jitter(MYOG_ID1, grouping = "seurat_clusters", ncol= 2)

选择对区分细胞类型有意义的基因(Finding Genes that Distinguish Cell Type or State )

代码语言:javascript复制
#Finding Genes that Distinguish Cell Type or State 
to_be_tested <- row.names(subset(fData(HSMM),
                                 gene_short_name %in% c("CDC20", "NCAM1", "ANPEP")))
cds_subset <- HSMM[to_be_tested,]

diff_test_res <- differentialGeneTest(cds_subset,
                                      fullModelFormulaStr = "~CellType")
diff_test_res[,c("gene_short_name", "pval", "qval")]

plot_genes_jitter(cds_subset,
                  grouping = "CellType",
                  color_by = "CellType",
                  nrow= 1,
                  ncol = NULL,
                  plot_trend = TRUE)
代码语言:javascript复制
full_model_fits <-
  fitModel(cds_subset,  modelFormulaStr = "~CellType")
reduced_model_fits <- fitModel(cds_subset, modelFormulaStr = "~1")
diff_test_res <- compareModels(full_model_fits, reduced_model_fits)
diff_test_res

      status           family        pval       qval
CDC20     OK negbinomial.size 0.007822883 0.02346865
NCAM1     OK negbinomial.size 0.791131484 0.88906005
ANPEP     OK negbinomial.size 0.889060052 0.88906005

Monocle中的差异分析过程非常灵活:您在测试中使用的模型公式可以包含pData表中作为列存在的任何项,包括Monocle在其他分析步骤中添加的列。例如,如果您使用集群细胞,您可以使用集群作为模型公式来测试cluster之间不同的基因。

Finding Genes that Change as a Function of Pseudotime

Monocle的主要工作是将细胞按照生物过程(如细胞分化)的顺序排列,而不事先知道要查看哪些基因。一旦这样做了,你就可以分析细胞,找到随着细胞进步而变化的基因。例如,你可以发现当细胞“成熟”时,基因显著上调。

代码语言:javascript复制
to_be_tested <- row.names(subset(fData(HSMM),
                                 gene_short_name %in% c("MYH3", "MEF2C", "CCNB2", "TNNT1")))
cds_subset <- HSMM[to_be_tested,]

diff_test_res <- differentialGeneTest(cds_subset,
                                      fullModelFormulaStr = "~sm.ns(Pseudotime)")

diff_test_res[,c("gene_short_name", "pval", "qval")]

      gene_short_name         pval         qval
MEF2C           MEF2C 2.996841e-02 3.995789e-02
CCNB2           CCNB2 5.328721e-03 1.065744e-02
MYH3             MYH3 6.371156e-01 6.371156e-01
TNNT1           TNNT1 1.634704e-12 6.538818e-12

plot_genes_in_pseudotime(cds_subset, color_by = "seurat_clusters")

Clustering Genes by Pseudotemporal Expression Pattern

在研究时间序列基因表达研究时,一个常见的问题是:“哪些基因遵循相似的动力学趋势?”Monocle可以通过将具有相似趋势的基因分组来帮助你回答这个问题,这样你就可以分析这些基因组,看看它们有什么共同之处。Monocle提供了一种方便的方法来可视化所有伪时间相关的基因。函数plot_pseudotime_heatmap接受一个CellDataSet对象(通常只包含重要基因的子集),并生成与plot_genes_in_pseudotime类似的平滑表达曲线.然后,它将这些基因聚类并使用pheatmap包绘制它们。这允许您可视化基因模块,这些模块在伪时间内共同变化。

注意下面的num_clusters 指的是基因可以聚成几个类,而不是细胞。

代码语言:javascript复制
diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
                                      fullModelFormulaStr = "~sm.ns(Pseudotime)")
sig_gene_names <- row.names(subset(diff_test_res, qval < 0.1))
plot_pseudotime_heatmap(HSMM[sig_gene_names,],
                        num_clusters = 6,
                        cores = 1,
                        show_rownames = T)

Multi-Factorial Differential Expression Analysis

Monocle可以在多个因素存在的情况下进行差异分析,这可以帮助你减去一些因素来看到其他因素的影响。在下面的简单例子中,Monocle测试了三个基因在成肌细胞和成纤维细胞之间的差异表达,同时减去percent.mt的影响。为此,我们必须同时指定完整模型和简化模型。完整的模型同时捕捉细胞类型和percent.mt的影响。

代码语言:javascript复制
to_be_tested <-
  row.names(subset(fData(HSMM),
                   gene_short_name %in% c("TPM1", "MYH3", "CCNB2", "GAPDH")))

cds_subset <- HSMM[to_be_tested,]

diff_test_res <- differentialGeneTest(cds_subset,
                                      fullModelFormulaStr = "~CellType   percent.mt",
                                      reducedModelFormulaStr = "~percent.mt")
diff_test_res[,c("gene_short_name", "pval", "qval")]

      gene_short_name       pval      qval
GAPDH           GAPDH 0.07990737 0.1598147
CCNB2           CCNB2 0.04148172 0.1598147
TPM1             TPM1 0.90861287 0.9086129
MYH3             MYH3 0.77085745 0.9086129

plot_genes_jitter(cds_subset,
                  grouping = "seurat_clusters", color_by = "CellType", plot_trend = TRUE)  
  facet_wrap( ~ feature_label, scales= "free_y")

Analyzing Branches in Single-Cell Trajectories

通常,单细胞的轨迹包括分支。分支的产生是因为细胞执行替代基因表达程序。在发育过程中,当细胞做出命运的选择时,分支就会出现在轨迹中:一个发育谱系沿着一条路径前进,而另一个谱系产生第二条路径。Monocle包含用于分析这些分支事件的广泛功能。

芭芭拉·特雷特琳和她的同事们在史蒂夫·奎克的实验室里进行了一项实验,他们从正在发育的老鼠肺中获取细胞。他们在细胞发育的早期捕获细胞,之后当肺包含两种主要类型的上皮细胞(AT1和AT2),以及即将决定成为AT1或AT2的细胞时。Monocle可以将这个过程重构为一个分支轨迹,让你可以非常详细地分析决策点。下图显示了使用Monocle的一些数据重建的一般方法。有一个单独的分支,标记为“1”。当细胞从树的左上方通过树枝的早期发育阶段通过时,哪些基因发生了变化?哪些基因在分支间有差异表达?为了回答这个问题,Monocle为您提供了一个特殊的统计测试:分支表达式分析建模,或BEAM。

代码语言:javascript复制
lung <- load_lung()
plot_cell_trajectory(lung, color_by = "Time")

# 图如下

BEAM_res <- BEAM(lung, branch_point = 1, cores = 1)
BEAM_res <- BEAM_res[order(BEAM_res$qval),]
BEAM_res <- BEAM_res[,c("gene_short_name", "pval", "qval")]

使用一种特殊类型的热图,您可以可视化所有明显依赖于分支的基因的变化。这张热图同时显示了两种血统的变化。它还要求您选择要检查的分支点。列是伪时间中的点,行是基因,伪时间的开始在热图的中间。当你从热图的中间读到右边的时候,你正在跟随一个伪时间谱系。当你读到左边时,另一个。这些基因是分层聚类的,因此您可以可视化具有类似的依赖于序列的基因模块.

代码语言:javascript复制
plot_genes_branched_heatmap(lung[row.names(subset(BEAM_res,
                                                  qval < 1e-4)),],
                            branch_point = 1,
                            num_clusters = 4,
                            cores = 1,
                            use_gene_short_name = T,
                            show_rownames = T)

代码语言:javascript复制
lung_genes <- row.names(subset(fData(lung),
                               gene_short_name %in% c("Ccnd2", "Sftpb", "Pdpn")))
plot_genes_branched_pseudotime(lung[lung_genes,],
                               branch_point = 1,
                               color_by = "Time",
                               ncol = 1)

代码语言:javascript复制
> str(HSMM)
Formal class 'CellDataSet' [package "monocle"] with 19 slots
  ..@ reducedDimS          : num [1:2, 1:2638] -2.613 -0.652 -1.784 0.745 -2.376 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2638] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" ...
  ..@ reducedDimW          : num [1:281, 1:2] -0.0384 0.0812 -0.0465 0.0681 0.0655 ...
  ..@ reducedDimA          : num [1:2, 1:2638] 13.57 29.4 -2.74 -25.16 -9.3 ...
  ..@ reducedDimK          : num [1:2, 1:126] -2.305 -0.692 -1.711 0.538 -1.935 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
  ..@ minSpanningTree      :List of 10
  .. ..$ :List of 1
  .. .. ..$ Y_1: 'igraph.vs' Named int [1:2] 47 101
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_47" "Y_101"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_2: 'igraph.vs' Named int [1:2] 62 90
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_62" "Y_90"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_3: 'igraph.vs' Named int [1:2] 90 126
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_90" "Y_126"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_4: 'igraph.vs' Named int 106
  .. .. .. ..- attr(*, "names")= chr "Y_106"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_5: 'igraph.vs' Named int [1:2] 36 65
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_36" "Y_65"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_6: 'igraph.vs' Named int [1:2] 39 53
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_39" "Y_53"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_7: 'igraph.vs' Named int [1:2] 34 98
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_34" "Y_98"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_8: 'igraph.vs' Named int [1:2] 49 126
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_49" "Y_126"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_9: 'igraph.vs' Named int [1:2] 42 93
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_42" "Y_93"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..$ :List of 1
  .. .. ..$ Y_10: 'igraph.vs' Named int [1:2] 11 25
  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_11" "Y_25"
  .. .. .. ..- attr(*, "env")=<weakref> 
  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
  .. ..- attr(*, "class")= chr "igraph"
  ..@ cellPairwiseDistances: num [1:126, 1:126] 0 1.37 1.41 10.62 5.52 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
  .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
  ..@ expressionFamily     :Formal class 'vglmff' [package "VGAM"] with 22 slots
  .. .. ..@ blurb             : chr [1:6] "Negative-binomial distribution with size knownnn" "Links:    " "loglink(mu)" "n" ...
  .. .. ..@ constraints       :  expression({  constraints <- cm.zero.VGAM(constraints, x = x, NULL, M = M, predictors.names = predictors.names, M1 = 2) })
  .. .. ..@ deviance          :function ()  
  .. .. ..@ fini              :  expression({ })
  .. .. ..@ first             :  expression({ })
  .. .. ..@ infos             :function (...)  
  .. .. ..@ initialize        :  expression({  M1 <- 1  temp5 <- w.y.check(w = w, y = y, Is.nonnegative.y = TRUE, Is.integer.y = TRUE, ncol.w.max | __truncated__
  .. .. ..@ last              :  expression({  misc$link <- rep_len("loglink", NOS)  names(misc$link) <- mynames1  misc$earg <- vector("list", M) | __truncated__
  .. .. ..@ linkfun           :function ()  
  .. .. ..@ linkinv           :function (eta, extra = NULL)  
  .. .. ..@ loglikelihood     :function (mu, y, w, residuals = FALSE, eta, extra = NULL, summation = TRUE)  
  .. .. ..@ middle            :  expression({ })
  .. .. ..@ middle2           :  expression({ })
  .. .. ..@ summary.dispersion: logi(0) 
  .. .. ..@ vfamily           : chr "negbinomial.size"
  .. .. ..@ validparams       :function (eta, y, extra = NULL)  
  .. .. ..@ validfitted       :function ()  
  .. .. ..@ simslot           :function (object, nsim)  
  .. .. ..@ hadof             :function ()  
  .. .. ..@ charfun           :function ()  
  .. .. ..@ deriv             :  expression({  eta <- cbind(eta)  M <- ncol(eta)  kmat <- matrix(Inf, n, M, byrow = TRUE)  newemu <- list(theta = | __truncated__
  .. .. ..@ weight            :  expression({  ned2l.dmunb2 <- 1/mu - 1/(mu   kmat)  wz <- ned2l.dmunb2 * dmu.deta^2  c(w) * wz })
  ..@ lowerDetectionLimit  : num 1
  ..@ dispFitInfo          :<environment: 0x000000023e5e8110> 
  ..@ dim_reduce_type      : chr "DDRTree"
  ..@ auxOrderingData      :<environment: 0x000000002c5323c8> 
  ..@ auxClusteringData    :<environment: 0x00000000240fc8a0> 
  ..@ experimentData       :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name             : chr ""
  .. .. ..@ lab              : chr ""
  .. .. ..@ contact          : chr ""
  .. .. ..@ title            : chr ""
  .. .. ..@ abstract         : chr ""
  .. .. ..@ url              : chr ""
  .. .. ..@ pubMedIds        : chr ""
  .. .. ..@ samples          : list()
  .. .. ..@ hybridizations   : list()
  .. .. ..@ normControls     : list()
  .. .. ..@ preprocessing    : list()
  .. .. ..@ other            : list()
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData            :<environment: 0x000000005575c7a0> 
  ..@ phenoData            :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 17 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr [1:17] NA NA NA NA ...
  .. .. ..@ data             :'data.frame': 2638 obs. of  17 variables:
  .. .. .. ..$ orig.ident                     : Factor w/ 1 level "pbmc3k": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ nCount_RNA                     : num [1:2638] 2419 4903 3147 2639 980 ...
  .. .. .. ..$ nFeature_RNA                   : int [1:2638] 779 1352 1129 960 521 781 782 790 532 550 ...
  .. .. .. ..$ percent.mt                     : num [1:2638] 3.02 3.79 0.89 1.74 1.22 ...
  .. .. .. ..$ RNA_snn_res.0.5                : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
  .. .. .. ..$ seurat_clusters                : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
  .. .. .. ..$ Size_Factor                    : num [1:2638] 1.108 2.245 1.441 1.208 0.449 ...
  .. .. .. ..$ num_genes_expressed            : int [1:2638] 126 181 149 153 34 103 104 112 64 51 ...
  .. .. .. ..$ Cluster                        : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ peaks                          : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. .. .. ..$ halo                           : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. .. .. ..$ delta                          : num [1:2638] 0.401 0.442 1.031 0.205 0.812 ...
  .. .. .. ..$ rho                            : num [1:2638] 36 33.3 44.8 47.7 31.7 ...
  .. .. .. ..$ nearest_higher_density_neighbor: logi [1:2638] NA NA NA NA NA NA ...
  .. .. .. ..$ Pseudotime                     : num [1:2638] 12.277 11.115 13.023 0.723 16.159 ...
  .. .. .. ..$ State                          : Factor w/ 5 levels "1","2","3","4",..: 5 2 5 1 5 4 5 5 5 1 ...
  .. .. .. ..$ CellType                       : Factor w/ 3 levels "CDC20","GABPB2",..: 3 3 3 3 3 3 3 3 3 3 ...
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData          :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 3 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr [1:3] NA NA NA
  .. .. ..@ data             :'data.frame': 13714 obs. of  3 variables:
  .. .. .. ..$ gene_short_name    : Factor w/ 13714 levels "7SK.2","A1BG",..: 527 715 9394 9395 5971 7345 5750 8158 9747 4917 ...
  .. .. .. ..$ num_cells_expressed: int [1:13714] 0 0 0 0 0 1 0 0 0 6 ...
  .. .. .. ..$ use_for_ordering   : logi [1:13714] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation           : chr(0) 
  ..@ protocolData         :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 2638 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__    :Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. ..@ .Data:List of 5
  .. .. .. ..$ : int [1:3] 3 5 2
  .. .. .. ..$ : int [1:3] 2 42 0
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. ..$ : int [1:3] 1 2 0

0 人点赞