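Resources:
- Official site: https://cole-trapnell-lab.github.io/monocle3/docs/starting/
Monocle 3 has been re-engineered to analyze large, complex single-cell datasets. Its core algorithms are highly scalable and can handle millions of cells. Monocle 3 adds several powerful new features:
- A better-structured workflow for learning developmental trajectories
- Support for the UMAP algorithm to initialize trajectory inference
- Support for trajectories with multiple roots
- Ways to learn trajectories that contain loops or points of convergence
- Algorithms that automatically partition cells to learn disjoint or parallel trajectories, using ideas from "approximate graph abstraction" [1]
- A new analysis method for genes with trajectory-dependent expression, which replaces the differentialGeneTest() function and BEAM() from Monocle 2
- A 3D interface for visualizing trajectories and gene expression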
Installation
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Bioconductor 3.10 pairs with R 3.6; choose the release that matches your R version
BiocManager::install(version = "3.10")
# Install the dependencies first
BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment', 'batchelor', 'Matrix.utils'))
# Install monocle3
install.packages("devtools")
devtools::install_github('cole-trapnell-lab/leidenbase')
devtools::install_github('cole-trapnell-lab/monocle3')
# Load the package to check that the installation succeeded
library(monocle3)
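Before installing monocle3 itself, it can be useful to see which of the dependencies listed above are still absent. The following is only a sketch using base R; the variable names `deps` and `missing_deps` are illustrative and not part of monocle3.

```r
# Dependencies named in the install step above.
deps <- c("BiocGenerics", "DelayedArray", "DelayedMatrixStats",
          "limma", "S4Vectors", "SingleCellExperiment",
          "SummarizedExperiment", "batchelor", "Matrix.utils")

# installed.packages() lists every package found in the current library paths.
is_installed <- deps %in% rownames(installed.packages())
missing_deps <- deps[!is_installed]

if (length(missing_deps) > 0) {
  message("Still missing: ", paste(missing_deps, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

Anything reported as missing can then be passed to BiocManager::install() before retrying the monocle3 installation.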
The Monocle 3 workflow:
The steps in code
### ======== Store data in a cell_data_set object
cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)
## Step 1: Normalize and pre-process the data
cds <- preprocess_cds(cds, num_dim = 100)
### ======== Remove batch effects (optional)
## Step 2: Remove batch effects with cell alignment
cds <- align_cds(cds, alignment_group = "batch")
### ======== Cluster your cells
## Step 3: Reduce the dimensions using UMAP
cds <- reduce_dimension(cds)
## Step 4: Cluster the cells
cds <- cluster_cells(cds)
### ======== Order cells in pseudotime along a trajectory (optional)
## Step 5: Learn a graph
cds <- learn_graph(cds)
## Step 6: Order cells
cds <- order_cells(cds)
plot_cells(cds)
### ======== Perform differential expression analysis (optional)
# With regression:
library(dplyr)  # provides %>%, filter(), mutate() and pull()
gene_fits <- fit_models(cds, model_formula_str = "~embryo.time")
fit_coefs <- coefficient_table(gene_fits)
emb_time_terms <- fit_coefs %>% filter(term == "embryo.time")
# p.adjust() without a method argument applies the Holm correction
emb_time_terms <- emb_time_terms %>% mutate(q_value = p.adjust(p_value))
sig_genes <- emb_time_terms %>% filter(q_value < 0.05) %>% pull(gene_short_name)
# With graph autocorrelation:
pr_test_res <- graph_test(cds, neighbor_graph="principal_graph", cores=4)
pr_deg_ids <- row.names(subset(pr_test_res, q_value < 0.05))
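The regression example above calls p.adjust(p_value) without a method argument, which in base R means the Holm correction. The sketch below shows, on made-up p-values for four hypothetical genes, how that choice compares with the Benjamini-Hochberg method when filtering at q_value < 0.05. The gene names and numbers are invented for illustration and are not monocle3 output.

```r
# Toy p-values for four hypothetical genes (made-up numbers, not real results).
p_values <- c(geneA = 0.0001, geneB = 0.02, geneC = 0.04, geneD = 0.5)

q_holm <- p.adjust(p_values)               # default method is "holm"
q_bh <- p.adjust(p_values, method = "BH")  # Benjamini-Hochberg FDR

# Keep the genes that pass the same 0.05 cutoff used in the workflow above.
sig_holm <- names(q_holm)[q_holm < 0.05]
sig_bh <- names(q_bh)[q_bh < 0.05]

print(sig_holm)
print(sig_bh)
```

On these toy values Holm keeps only geneA, while Benjamini-Hochberg also keeps geneB, so the size of the significant gene list depends on the correction method passed to p.adjust().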
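Supported data types
Monocle 3 takes a gene expression matrix as input:
- Monocle 3 is designed specifically for absolute transcript counts (e.g. from UMI experiments)
- Monocle 3 works "out of the box" with the transcript count matrices produced by Cell Ranger
- Monocle 3 also works well with data from other RNA-Seq workflows, such as sci-RNA-Seq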
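The Monocle 3 object type
Monocle stores single-cell expression data in objects of the cell_data_set class. This class derives from the Bioconductor SingleCellExperiment class, which provides a common interface. It requires three inputs:
- expression_matrix: the expression matrix, where rows are genes and columns are cells
- cell_metadata: a data frame where rows are cells and columns are cell attributes (such as cell type, culture condition, day captured, etc.)
- gene_metadata: a data frame where rows are features (e.g. genes) and columns are gene attributes, such as biotype, GC content, etc.
Creating the object: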
# Load the data
expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/cao_l2_rowData.rds"))
# Make the CDS object
cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)
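The relationship between the three inputs can be checked on a toy example before working with real data. The sketch below builds a tiny made-up matrix and matching data frames, then verifies that genes line up with rows and cells with columns. It assumes, as the official documentation asks, that gene_metadata carries a gene_short_name column; all names and values are invented, and the new_cell_data_set() call only runs if monocle3 is installed.

```r
set.seed(42)
n_genes <- 5
n_cells <- 4
gene_ids <- paste0("gene", seq_len(n_genes))
cell_ids <- paste0("cell", seq_len(n_cells))

# Rows are genes, columns are cells (toy Poisson counts stored as doubles).
expression_matrix <- matrix(as.numeric(rpois(n_genes * n_cells, lambda = 2)),
                            nrow = n_genes,
                            dimnames = list(gene_ids, cell_ids))

# One row per cell; "batch" is a hypothetical cell attribute.
cell_metadata <- data.frame(batch = c("b1", "b1", "b2", "b2"),
                            row.names = cell_ids)

# One row per gene, with the gene_short_name column.
gene_annotation <- data.frame(gene_short_name = gene_ids,
                              row.names = gene_ids)

# The row and column names of the matrix must line up with the metadata.
stopifnot(
  identical(rownames(expression_matrix), rownames(gene_annotation)),
  identical(colnames(expression_matrix), rownames(cell_metadata))
)

# Build the object only when monocle3 is available.
if (requireNamespace("monocle3", quietly = TRUE)) {
  toy_cds <- monocle3::new_cell_data_set(expression_matrix,
                                         cell_metadata = cell_metadata,
                                         gene_metadata = gene_annotation)
  print(dim(toy_cds))
}
```

If the stopifnot() check fails on real data, the usual cause is a transposed matrix or metadata rows sorted differently from the matrix.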
For 10X Genomics Cell Ranger output:
Output structure: 10x_data/outs/filtered_feature_bc_matrix/
- features.tsv.gz
- barcodes.tsv.gz
- matrix.mtx.gz
# Provide the path to the Cell Ranger output.
cds <- load_cellranger_data("~/Downloads/10x_data")
# or
cds <- load_mm_data(mat_path = "~/Downloads/matrix.mtx",
                    feature_anno_path = "~/Downloads/features.tsv",
                    cell_anno_path = "~/Downloads/barcodes.tsv")
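A missing file in the output directory is a common reason for a load to fail, so it may help to confirm the layout shown above first. The sketch below uses only base R; `missing_cellranger_files()` is a hypothetical helper written for this illustration, not a monocle3 function, and the demo directory holds empty placeholder files rather than real data.

```r
# Hypothetical helper: report which expected Cell Ranger files are absent.
missing_cellranger_files <- function(matrix_dir) {
  expected <- c("features.tsv.gz", "barcodes.tsv.gz", "matrix.mtx.gz")
  expected[!file.exists(file.path(matrix_dir, expected))]
}

# Demonstration on a throwaway directory with two empty placeholder files.
demo_dir <- file.path(tempdir(), "10x_data", "outs", "filtered_feature_bc_matrix")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("features.tsv.gz", "barcodes.tsv.gz")))

# Only the matrix file was left out of the demo directory.
print(missing_cellranger_files(demo_dir))
```

For a real run, the helper would be pointed at the filtered_feature_bc_matrix directory under the path given to load_cellranger_data().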
Working with large datasets
Note: a sparse matrix can be passed directly; it does not need to be converted to a dense matrix object first.
cds <- new_cell_data_set(as(umi_matrix, "sparseMatrix"),
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_metadata)
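To illustrate why the sparse representation matters for large datasets, the sketch below compares the memory footprint of a mostly-zero toy count matrix in dense and sparse form using the Matrix package. The matrix dimensions and the 2% fill rate are arbitrary assumptions chosen for the demonstration.

```r
library(Matrix)

set.seed(1)
n_genes <- 2000
n_cells <- 500

# Dense toy count matrix in which roughly 2% of the entries are non-zero.
dense_counts <- matrix(0, nrow = n_genes, ncol = n_cells)
nonzero <- sample(length(dense_counts), size = 0.02 * length(dense_counts))
dense_counts[nonzero] <- rpois(length(nonzero), lambda = 3) + 1

# Same coercion as in the new_cell_data_set() call above.
sparse_counts <- as(dense_counts, "sparseMatrix")

print(class(sparse_counts))
print(object.size(dense_counts), units = "Kb")
print(object.size(sparse_counts), units = "Kb")
```

The sparse object stores only the non-zero entries and their positions, so the saving grows as the proportion of zeros increases.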
Combining multiple CDS objects
# make a fake second cds object for demonstration
# (rows are genes, so [1:100, ] keeps the first 100 genes)
cds2 <- cds[1:100,]
big_cds <- combine_cds(list(cds, cds2))
References
[1] "approximate graph abstraction": https://www.biorxiv.org/content/early/2017/10/25/208819