Single-Cell Analysis 1: Overview of Monocle 3 Analysis

2022-03-14 16:34:36

Resources

  • Official site: https://cole-trapnell-lab.github.io/monocle3/docs/starting/

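Monocle 3 has been re-engineered to analyze large, complex single-cell datasets. Its core algorithms are highly scalable and can handle datasets with millions of cells. Monocle 3 adds several powerful new features: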

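  • A better-structured workflow for learning developmental trajectories
  • Support for the UMAP algorithm to initialize trajectory inference
  • Support for trajectories with multiple roots
  • Ways to learn trajectories that contain loops or points of convergence
  • Algorithms that automatically partition cells to learn disjoint or parallel trajectories, using ideas from "approximate graph abstraction" [1]
  • A new analysis method for genes with trajectory-dependent expression, which replaces the differentialGeneTest() function and BEAM() from Monocle 2
  • A 3D interface for visualizing trajectories and gene expression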

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# Bioconductor 3.10 pairs with R 3.6; choose the release that matches your R version
BiocManager::install(version = "3.10")

# Install the dependencies first
BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment', 'batchelor', 'Matrix.utils'))

# Install monocle3
install.packages("devtools")
devtools::install_github('cole-trapnell-lab/leidenbase')
devtools::install_github('cole-trapnell-lab/monocle3')

# Load the package to check that the installation succeeded
library(monocle3)

The Monocle 3 workflow:

The steps in code

### ======== Store data in a cell_data_set object
cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)

## Step 1: Normalize and pre-process the data
cds <- preprocess_cds(cds, num_dim = 100)


### ======== Remove batch effects (optional)
## Step 2: Remove batch effects with cell alignment 
cds <- align_cds(cds, alignment_group = "batch")


### ======== Cluster your cells
## Step 3: Reduce the dimensions using UMAP
cds <- reduce_dimension(cds)
## Step 4: Cluster the cells
cds <- cluster_cells(cds)


### ======== Order cells in pseudotime along a trajectory (optional)
## Step 5: Learn a graph
cds <- learn_graph(cds)
## Step 6: Order cells
cds <- order_cells(cds)
plot_cells(cds)


### ======== Perform differential expression analysis (optional)
# With regression (the %>% pipeline below needs dplyr):
library(dplyr)
gene_fits <- fit_models(cds, model_formula_str = "~embryo.time")
fit_coefs <- coefficient_table(gene_fits)
emb_time_terms <- fit_coefs %>% filter(term == "embryo.time")
emb_time_terms <- emb_time_terms %>% mutate(q_value = p.adjust(p_value))
sig_genes <- emb_time_terms %>% filter(q_value < 0.05) %>% pull(gene_short_name)

# With graph autocorrelation:
pr_test_res <- graph_test(cds,  neighbor_graph="principal_graph", cores=4)
pr_deg_ids <- row.names(subset(pr_test_res, q_value < 0.05))
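
The regression branch above calls p.adjust() with its default arguments, and in base R that default is the Holm method rather than a false-discovery-rate correction. The sketch below uses made-up p-values (the gene names and numbers are purely illustrative, not from any dataset) to show how the default compares with method = "BH". Which correction suits a given analysis is left to the reader.

```r
# Toy p-values, invented for illustration only.
p_values <- c(geneA = 0.001, geneB = 0.012, geneC = 0.03, geneD = 0.04)

# p.adjust() with no method argument uses "holm".
q_holm <- p.adjust(p_values)
# Benjamini-Hochberg controls the false discovery rate instead.
q_bh <- p.adjust(p_values, method = "BH")

print(data.frame(p_value = p_values, q_holm = q_holm, q_bh = q_bh))

# Genes that pass a 0.05 cutoff under each correction.
sig_holm <- names(p_values)[q_holm < 0.05]
sig_bh <- names(p_values)[q_bh < 0.05]
```

Holm is the more conservative of the two, so it can return a shorter gene list at the same cutoff.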

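Supported data types

Monocle 3 takes a gene expression matrix as input:

  • Monocle 3 is designed for absolute transcript counts (e.g. from UMI experiments)
  • Monocle 3 works "out of the box" with the transcript count matrices produced by Cell Ranger
  • Monocle 3 also works well with data from other RNA-Seq workflows, such as sci-RNA-Seq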

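The Monocle 3 object type

Monocle stores single-cell expression data in objects of the cell_data_set class. The class is derived from the Bioconductor SingleCellExperiment class, which provides a common interface. It requires three inputs:

  • expression_matrix: the expression matrix, where rows are genes and columns are cells
  • cell_metadata: a data frame where rows are cells and columns are cell attributes (such as cell type, culture condition, day captured, etc.)
  • gene_metadata: a data frame where rows are features (e.g. genes) and columns are gene attributes, such as biotype, GC content, etc.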

Creating the object:

# Load the data
expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))

# Make the CDS object
cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)
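
As a minimal sketch of how the three inputs fit together, the example below builds a tiny made-up dataset and checks that the row names of cell_metadata match the column names of the expression matrix, that the row names of gene_metadata match its row names, and that gene_metadata has a gene_short_name column (the column the workflow above pulls when listing significant genes). The helper inputs_are_aligned() is hypothetical and not part of monocle3. The final call only runs if monocle3 is installed.

```r
library(Matrix)

n_genes <- 5
n_cells <- 4

# Toy counts (made up): rows are genes, columns are cells.
expression_matrix <- Matrix(
  matrix(seq_len(n_genes * n_cells) %% 3, nrow = n_genes),
  sparse = TRUE
)
rownames(expression_matrix) <- paste0("gene", seq_len(n_genes))
colnames(expression_matrix) <- paste0("cell", seq_len(n_cells))

# One row per cell, named after the matrix columns.
cell_metadata <- data.frame(
  batch = c("a", "a", "b", "b"),
  row.names = colnames(expression_matrix)
)

# One row per gene, named after the matrix rows.
gene_metadata <- data.frame(
  gene_short_name = rownames(expression_matrix),
  row.names = rownames(expression_matrix)
)

# Hypothetical helper: TRUE when the three inputs line up.
inputs_are_aligned <- function(expr, cells, genes) {
  identical(rownames(cells), colnames(expr)) &&
    identical(rownames(genes), rownames(expr)) &&
    "gene_short_name" %in% colnames(genes)
}

stopifnot(inputs_are_aligned(expression_matrix, cell_metadata, gene_metadata))

# Build the object only when monocle3 is available.
if (requireNamespace("monocle3", quietly = TRUE)) {
  toy_cds <- monocle3::new_cell_data_set(expression_matrix,
                                         cell_metadata = cell_metadata,
                                         gene_metadata = gene_metadata)
}
```

Running a check like this before construction makes a mismatch between the metadata and the matrix easier to spot.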

For 10X Genomics Cell Ranger output:

Output structure: 10x_data/outs/filtered_feature_bc_matrix/

  • features.tsv.gz
  • barcodes.tsv.gz
  • matrix.mtx.gz

# Provide the path to the Cell Ranger output.
cds <- load_cellranger_data("~/Downloads/10x_data")

# or
cds <- load_mm_data(mat_path = "~/Downloads/matrix.mtx", 
                    feature_anno_path = "~/Downloads/features.tsv", 
                    cell_anno_path = "~/Downloads/barcodes.tsv")
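
Before pointing a loader at a Cell Ranger run, it can help to confirm that the directory has the layout listed above. The following is a rough sketch using base R only. The function cellranger_files_present() is a hypothetical helper, not part of monocle3, and the demonstration directory is a throwaway folder created under tempdir().

```r
# Hypothetical helper (not a monocle3 function): reports which of the three
# expected files exist under <run_dir>/outs/filtered_feature_bc_matrix/.
cellranger_files_present <- function(run_dir) {
  matrix_dir <- file.path(run_dir, "outs", "filtered_feature_bc_matrix")
  expected <- c("features.tsv.gz", "barcodes.tsv.gz", "matrix.mtx.gz")
  present <- file.exists(file.path(matrix_dir, expected))
  names(present) <- expected
  present
}

# Demonstration on a temporary directory with one file deliberately missing.
demo_dir <- file.path(tempdir(), "10x_data_demo")
demo_matrix_dir <- file.path(demo_dir, "outs", "filtered_feature_bc_matrix")
dir.create(demo_matrix_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_matrix_dir, c("features.tsv.gz", "barcodes.tsv.gz")))

layout_check <- cellranger_files_present(demo_dir)
print(layout_check)
```

A named logical vector makes it obvious which file is missing when the check fails.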

Working with large datasets

Note: there is no need to convert a sparse matrix into a dense matrix object.

cds <- new_cell_data_set(as(umi_matrix, "sparseMatrix"),
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_metadata)
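
To illustrate why keeping counts sparse matters, here is a small sketch that compares the memory footprint of the same simulated count matrix in dense and sparse form. The matrix dimensions and the 5% fill rate are arbitrary assumptions chosen for the demonstration, not properties of any real dataset.

```r
library(Matrix)

set.seed(42)
n_genes <- 2000
n_cells <- 500

# Simulated counts (made up): about 5% of entries are nonzero.
dense_counts <- matrix(0, nrow = n_genes, ncol = n_cells)
nonzero_idx <- sample(length(dense_counts), size = 0.05 * length(dense_counts))
dense_counts[nonzero_idx] <- rpois(length(nonzero_idx), lambda = 2) + 1

# Same values stored in compressed sparse form.
sparse_counts <- Matrix(dense_counts, sparse = TRUE)

print(object.size(dense_counts), units = "MB")
print(object.size(sparse_counts), units = "MB")
```

The sparse object stores only the nonzero entries and their positions, so the gap widens as matrices get larger and emptier.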

Combining multiple CDS objects

# make a fake second cds object for demonstration
cds2 <- cds[1:100,]

big_cds <- combine_cds(list(cds, cds2))

References

[1] "approximate graph abstraction": https://www.biorxiv.org/content/early/2017/10/25/208819
