把Seurat对象里面表达量矩阵和细胞表型信息输出给CellPhoneDB做细胞通讯

CellPhoneDB如此难用，但是因为它是最经典的的单细胞高级分析里面的细胞通讯工具被广为人知，仍然是很多小伙伴在后台咨询它的用法。可以看到其最新版引用还挺好的，而且发在Nat Protoc. 2020 Apr;

最新版引用还挺好的

如果你查看它示例数据里面的表达矩阵文件test_counts.txt 可以看到 ensemble数据库的ID，所以推测CellPhoneDB works only with human ensembl ids，假如你的表达矩阵去其它物种，或者其它ID形式先需要转换一下哈，可以使用同源基因列表对应关系进行转换。

参考：https://pypi.org/project/CellPhoneDB/

下面我们以经典的pbmc3k单细胞数据集为例，说明如何制作CellPhoneDB做细胞通讯所要求的两个文件：

代码语言：javascript复制

library(SeuratData) #加载seurat数据集  
getOption('timeout')
options(timeout=10000)
#InstallData("sce")  
data("pbmc3k")  
sce <- pbmc3k.final   
library(Seurat)
DimPlot(sce,label = T,repel = T) 
 
exp  <- as.data.frame( sce@assays$RNA@data)
dim(exp)
exp[1:4,1:4]
library(AnnoProbe) 
ids=annoGene( rownames(exp),'SYMBOL','human')
head(ids) 
# 为了代码简单，这里直接粗暴的删除ID之间的混乱关系
length(unique(ids$ENSEMBL)) 
ids=ids[!duplicated(ids$ENSEMBL),]
length(unique(ids$SYMBOL)) 
ids=ids[!duplicated(ids$SYMBOL),]

test_counts=exp[ids$SYMBOL,]
test_counts[1:4,1:4]
table(sce$seurat_annotations ) 
test_meta <-  data.frame(Cell = rownames(sce@meta.data), 
                         cell_type = sce$seurat_annotations )  
head(test_meta)
test_meta$cell_type=gsub(' ','_',test_meta$cell_type)
test_meta$cell_type=gsub('\ ','',test_meta$cell_type) 
table(test_meta$cell_type)
length(unique(test_meta$Cell))

identical(colnames(test_counts),test_meta$Cell)
rownames(test_counts)=ids$ENSEMBL

test_counts=cbind(rownames(test_counts),test_counts)
colnames(test_counts)[1]='Gene'
test_counts[1:4,1:4]

write.table(test_counts, "test_counts.txt",  row.names=F, sep='t',quote = F)
write.table(test_meta, "test_meta.txt", row.names=F, sep='t',quote = F)

我们制作的表达量矩阵文件如下所示：

代码语言：javascript复制


> test_counts[1:4,1:4]
                           Gene AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC
ENSG00000238009 ENSG00000238009              0              0              0
ENSG00000286448 ENSG00000286448              0              0              0
ENSG00000225880 ENSG00000225880              0              0              0
ENSG00000188976 ENSG00000188976              0              0              0

我们的单细胞表型信息文件如下所示：

代码语言：javascript复制

head test_meta.txt 
Cell    cell_type
AAACATACAACCAC  Memory_CD4_T
AAACATTGAGCTAC  B
AAACATTGATCAGC  Memory_CD4_T
AAACCGTGCTTCCG  CD14_Mono
AAACCGTGTATGCG  NK
AAACGCACTGGTAC  Memory_CD4_T
AAACGCTGACCAGT  CD8_T
AAACGCTGGTTCTT  CD8_T
AAACGCTGTAGCCA  Naive_CD4_T
# 接近3千个细胞，归类如下；
cut -f 2 test_meta.txt |sort |uniq -c 
 344 B
 480 CD14_Mono
 271 CD8_T
  32 DC
 162 FCGR3A_Mono
 483 Memory_CD4_T
 155 NK
 697 Naive_CD4_T
  14 Platelet

有了这两个文件，运行CellPhoneDB做细胞通讯就是一句话代码的事情。

代码语言：javascript复制

source $HOME/cpdb-venv/bin/activate
cellphonedb --help
cellphonedb method statistical_analysis test_meta.txt test_counts.txt

难点可能是在如何解读它的输出结果，甚至对很多小伙伴来说，可能是如何安装这个CellPhoneDB

nat txt

0 人点赞