單細胞分析實錄(15): 基於monocle2的擬時序分析

本文轉載自查看原文 2021-06-07 00:12 1330

關於什么是“擬時序分析”，可以參考本期推送的另一篇推文。這一篇直接演示代碼

monocle2這個軟件用得太多了，很多文章都是monocle2的圖。因為只使用表達矩陣作為輸入，相比於其他軟件，已經很方便了。不過話說回來，我總感覺這種軌跡推斷像玄學，大半的結果是調整出來的/事先已知的，比如擬時序里面的起始狀態經常需要自己指定。

在做擬時序之前，最好先跑完seurat標准流程，並注釋好細胞類型。我這里使用的數據都已經注釋好細胞類型，並且事先已經知道哪一個細胞類型可能是初始狀態。

1. 導入數據，創建對象，預處理

library(monocle)
library(tidyverse)

expr_matrix=read.table("count_mat.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F)
#10X的數據使用UMI count矩陣
cell_anno=read.table("cell_anno.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F)
gene_anno=read.table("gene_anno.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F)
#“gene_short_name”為列名
pd=new("AnnotatedDataFrame",data = cell_anno)
fd=new("AnnotatedDataFrame",data = gene_anno)
test=newCellDataSet(as(as.matrix(expr_matrix),'sparseMatrix'),phenoData = pd,featureData = fd)
#大數據集使用稀疏矩陣，節省內存，加快運算
test <- estimateSizeFactors(test) 
test <- estimateDispersions(test)
test=detectGenes(test,min_expr = 0.1) #計算每個基因在多少細胞中表達

2. 選擇基因

選擇研究的生物學過程涉及到的基因集，這一步對於軌跡形狀的影響很大。
可以選擇數據集中的高變基因，或者是在seurat中分析得到的marker基因列表。如果是時間序列數據，可以直接比較時間點選差異基因。總之選擇的基因要含有盡可能多的信息。
我這里直接用的各種亞群差異基因的集合

marker_gene=read.table("seurat_marker_gene.txt",header=T,sep="\t",stringsAsFactors=F)
test_ordering_genes=unique(marker_gene$gene)
test=setOrderingFilter(test,ordering_genes = test_ordering_genes) 
#指明哪些基因用於后續的聚類/排序

3. 推斷軌跡，並按照擬時序給細胞排序

test=reduceDimension(test,reduction_method = "DDRTree",max_components = 2, norm_method = 'log',residualModelFormulaStr = "~num_genes_expressed") 
#residualModelFormulaStr減少其他因素的影響，比如不同樣本、不同批次
test=orderCells(test)

4. 繪圖

plot_cell_trajectory(test,color_by = "celltype")
ggsave("celltype.pdf",device = "pdf",width = 8,height = 9,units = c("cm"))
plot_cell_trajectory(test,color_by = "State")
ggsave("State.pdf",device = "pdf",width = 8,height = 9,units = c("cm"))
plot_cell_trajectory(test,color_by = "Pseudotime")
ggsave("Pseudotime.pdf",device = "pdf",width = 8,height = 9,units = c("cm"))
plot_cell_trajectory(test,color_by = "celltype")+facet_wrap(~celltype,nrow=1)
ggsave("celltypeb.pdf",device = "pdf",width = 21,height = 9,units = c("cm"))

有時候（大多數時候），擬時序的方向或是根節點弄錯了，還需要手動更改

test=orderCells(test,root_state = 3) 
plot_cell_trajectory(test,color_by = "Pseudotime")
ggsave("Pseudotime.pdf",device = "pdf",width = 8,height = 9,units = c("cm"))

5. 找差異基因

這里是指找隨擬時序變化的差異基因，以及不同state之間的差異基因。這兩個都是monocle里面的概念，與seurat里面找的差異基因不同。
如果在做monocle2的時候，想展示這種差異基因，就需要做這一步。

expressed_genes=row.names(subset(fData(test),num_cells_expressed>=10)) #在部分基因里面找
pseudotime_de <- differentialGeneTest(test[expressed_genes,],
                                      fullModelFormulaStr = "~sm.ns(Pseudotime)")
pseudotime_de <- pseudotime_de[order(pseudotime_de$qval), ]
states_de <- differentialGeneTest(test[expressed_genes,],
                                  fullModelFormulaStr = "~State")
states_de <- states_de[order(states_de$qval), ]

saveRDS(test, file = "test_monocle.rds")
write.table(pseudotime_de, file = "pseudotime_de.rds", quote = FALSE, sep = '\t', row.names = FALSE, col.names = TRUE)
write.table(states_de, file = "states_de.rds", quote = FALSE, sep = '\t', row.names = FALSE, col.names = TRUE)

6. 分支點的分析

解決的問題：沿着擬時序的方向，經過不同的branch，發生了哪些基因的變化？
經常在文獻里面看到的monocle2熱圖就是這種分析。

BEAM_res=BEAM(test,branch_point = 1,cores = 1)
#會返回每個基因的顯著性，顯著的基因就是那些隨不同branch變化的基因
#這一步很慢
BEAM_res=BEAM_res[,c("gene_short_name","pval","qval")]
saveRDS(BEAM_res, file = "BEAM_res.rds")

畫圖

tmp1=plot_genes_branched_heatmap(test[row.names(subset(BEAM_res,qval<1e-4)),],
                                 branch_point = 1,
                                 num_clusters = 4, #這些基因被分成幾個group
                                 cores = 1,
                                 branch_labels = c("Cell fate 1", "Cell fate 2"),
                                 #hmcols = NULL, #默認值
                                 hmcols = colorRampPalette(rev(brewer.pal(9, "PRGn")))(62),
                                 branch_colors = c("#979797", "#F05662", "#7990C8"), #pre-branch, Cell fate 1, Cell fate 2分別用什么顏色
                                 use_gene_short_name = T,
                                 show_rownames = F,
                                 return_heatmap = T #是否返回一些重要信息
)

pdf("branched_heatmap.pdf",width = 5,height = 6)
tmp1$ph_res
dev.off()

monocle2的熱圖怎么看

從你想研究的節點（第4步圖中的1節點）到根（一般是人為指定，圖中是Ucell對應的狀態）都是pre-branch。
然后沿着擬時序的方向，State 1對應的枝是fate 1（對應圖中的Gcell），State 2對應的枝是fate 2（對應Ccell）。軟件作者是這么說的：Cell fate 1 corresponds to the state with small id while cell fate 2 corresponds to state with bigger id
熱圖中的基因是BEAM函數找到的branch特異的基因。從熱圖中間向兩邊看，能看到不同的fate/branch基因的動態變化。熱圖中列是擬時序，行是基因，熱圖中間是擬時序的開始。

返回的列表tmp1包含熱圖對象，熱圖的數值矩陣，熱圖主題顏色，每一行的注釋（基因屬於哪一個group）等信息。
根據行注釋提取出基因group，可以去做富集分析，后期加到熱圖的旁邊。像這樣：

gene_group=tmp1$annotation_row
gene_group$gene=rownames(gene_group)

library(clusterProfiler)
library(org.Hs.eg.db)
allcluster_go=data.frame()
for (i in unique(gene_group$Cluster)) {
  small_gene_group=filter(gene_group,gene_group$Cluster==i)
  df_name=bitr(small_gene_group$gene, fromType="SYMBOL", toType=c("ENTREZID"), OrgDb="org.Hs.eg.db")
  go <- enrichGO(gene         = unique(df_name$ENTREZID),
                 OrgDb         = org.Hs.eg.db,
                 keyType       = 'ENTREZID',
                 ont           = "BP",
                 pAdjustMethod = "BH",
                 pvalueCutoff  = 0.05,
                 qvalueCutoff  = 0.2,
                 readable      = TRUE)
  go_res=go@result
  if (dim(go_res)[1] != 0) {
    go_res$cluster=i
    allcluster_go=rbind(allcluster_go,go_res)
  }
}
head(allcluster_go[,c("ID","Description","qvalue","cluster")])

也可以換一種方式來展示具體的基因，這些基因可以是：

隨擬時序、state而改變的基因（第5步）
細胞亞群的marker基因（seurat得到的差異基因）
分支點分析得到的分支特異的基因（第6步BEAM函數得到的基因）

test_genes=c("TFF3","GUCA2B")
pdf("genes_branched_pseudotime.pdf",width = 9,height = 4)
plot_genes_branched_pseudotime(test[test_genes,],
                               branch_point = 1,
                               color_by = "celltype",
                               cell_size=2,
                               ncol = 2)
dev.off()

因水平有限，有錯誤的地方，歡迎批評指正！
在公眾號可以獲取本文的測試數據和全部代碼。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。