今天的內容講講單細胞文章中經常出現的展示細胞marker的圖:tsne/umap圖、熱圖、堆疊小提琴圖、氣泡圖,每個圖我都會用兩種方法繪制。
使用的數據來自文獻:Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma. 去年7月發表在Cell Research上的關於鼻咽癌的文章,數據下載:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE150430
髓系細胞數量相對少一些,為了方便演示,選它作為例子。
library(Seurat)
library(tidyverse)
library(harmony)
Myeloid=read.table("Myeloid_mat.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F)
Myeloid_anno=read.table("Myeloid_anno.txt",header = T,sep = "\t",stringsAsFactors = F)
導入數據的時候需要注意一個地方:從cell ranger得到的矩陣,每一列的列名會在CB后面加上"-1"這個字符串,在R里面導入數據時,會自動轉化為".1",在做匹配的時候需要注意一下。我已經提前轉換為"_1"
> summary(as.numeric(Myeloid["CD14",]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.689 1.080 2.111 4.500
> summary(as.numeric(Myeloid["PTPRC",]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.200 1.681 1.607 2.124 3.520
考慮到下載的表達矩陣,表達值都不是整數,且大於0,推測該矩陣已經經過了標准化,因此下面的流程會跳過這一步
0. Seurat流程
我們直接把注釋結果賦值到mye.seu@meta.data矩陣中,后面省去聚類這一步
mye.seu=CreateSeuratObject(Myeloid)
mye.seu@meta.data$CB=rownames(mye.seu@meta.data)
mye.seu@meta.data=merge(mye.seu@meta.data,Myeloid_anno,by="CB")
rownames(mye.seu@meta.data)=mye.seu@meta.data$CB
#替代LogNormalize這一步
mye.seu[["RNA"]]@data=mye.seu[["RNA"]]@counts
mye.seu <- FindVariableFeatures(mye.seu, selection.method = "vst", nfeatures = 2000)
mye.seu <- ScaleData(mye.seu, features = rownames(mye.seu))
mye.seu <- RunPCA(mye.seu, npcs = 50, verbose = FALSE)
mye.seu=mye.seu %>% RunHarmony("sample", plot_convergence = TRUE)
mye.seu <- RunUMAP(mye.seu, reduction = "harmony", dims = 1:20)
mye.seu <- RunTSNE(mye.seu, reduction = "harmony", dims = 1:20)
#少了聚類
DimPlot(mye.seu, reduction = "tsne", group.by = "celltype", pt.size=1)+theme(
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
ggsave("tsne1.pdf",device = "pdf",width = 17,height = 14,units = "cm")
基本符合原圖,三個亞群分得開,三個亞群分不開。
接下來,我用4種方式展示marker基因,這些基因可以在文獻的補充材料里面找到。
1. tsne展示marker基因
FeaturePlot(mye.seu,features = "CCR7",reduction = "tsne",pt.size = 1)+
scale_x_continuous("")+scale_y_continuous("")+
theme_bw()+ #改變ggplot2的主題
theme( #進一步修改主題
panel.grid.major = element_blank(),panel.grid.minor = element_blank(), #去掉背景線
axis.ticks = element_blank(),axis.text = element_blank(), #去掉坐標軸刻度和數字
legend.position = "none", #去掉圖例
plot.title = element_text(hjust = 0.5,size=14) #改變標題位置和字體大小
)
ggsave("CCR7.pdf",device = "pdf",width = 10,height = 10.5,units = "cm")
另一種方法就是把tsne的坐標和基因的表達值提取出來,用ggplot2畫,其實不是很必要,因為FeaturePlot也是基於ggplot2的,我還是演示一下
mat1=as.data.frame(mye.seu[["RNA"]]@data["CCR7",])
colnames(mat1)="exp"
mat2=Embeddings(mye.seu,"tsne")
mat3=merge(mat2,mat1,by="row.names")
#數據格式如下:
> head(mat3)
Row.names tSNE_1 tSNE_2 exp
1 N01_AAACGGGCATTTCAGG_1 5.098727 32.748145 0.000
2 N01_AAAGATGCAATGTAAG_1 -24.394040 26.176422 0.000
3 N01_AACTCAGGTAATAGCA_1 11.856730 8.086553 0.000
4 N01_AACTCAGGTCTTCGTC_1 10.421878 12.660407 0.000
5 N01_AACTTTCAGGCCATAG_1 33.555756 -10.437406 1.606
6 N01_AAGACCTTCGAATGGG_1 -23.976967 11.897753 0.738
mat3%>%ggplot(aes(tSNE_1,tSNE_2))+geom_point(aes(color=exp))+
scale_color_gradient(low = "grey",high = "purple")+theme_bw()
ggsave("CCR7.2.pdf",device = "pdf",width = 13.5,height = 12,units = "cm")
用ggplot2的好處就是圖形修改很方便,畢竟ggplot2大家都很熟悉
2. 熱圖展示marker基因
畫圖前,需要給每個細胞一個身份,因為我們跳過了聚類這一步,此處需要手動賦值
Idents(mye.seu)="celltype"
library(xlsx)
markerdf1=read.xlsx("ref_marker.xlsx",sheetIndex = 1)
markerdf1$gene=as.character(markerdf1$gene)
# 這個表格整理自原文的附表,選了53個基因
#數據格式
# > head(markerdf1)
# gene celltype
# 1 S100B DC2(CD1C+)
# 2 HLA-DQB2 DC2(CD1C+)
# 3 FCER1A DC2(CD1C+)
# 4 CD1A DC2(CD1C+)
# 5 PKIB DC2(CD1C+)
# 6 NDRG2 DC2(CD1C+)
DoHeatmap(mye.seu,features = markerdf1$gene,label = F,slot = "scale.data")
ggsave("heatmap.pdf",device = "pdf",width = 23,height = 16,units = "cm")
label = F不在熱圖的上方標注細胞類型,
slot = "scale.data"使用scale之后的矩陣畫圖,默認就是這個
接下來用pheatmap畫,在布局上可以自由發揮
library(pheatmap)
colanno=mye.seu@meta.data[,c("CB","celltype")]
colanno=colanno%>%arrange(celltype)
rownames(colanno)=colanno$CB
colanno$CB=NULL
colanno$celltype=factor(colanno$celltype,levels = unique(colanno$celltype))
先對細胞進行排序,按照celltype的順序,然后對基因排序
rowanno=markerdf1
rowanno=rowanno%>%arrange(celltype)
提取scale矩陣的行列時,按照上面的順序
mat4=mye.seu[["RNA"]]@scale.data[rowanno$gene,rownames(colanno)]
mat4[mat4>=2.5]=2.5
mat4[mat4 < (-1.5)]= -1.5 #小於負數時,加括號!
下面就是繪圖代碼了,我加了分界線,使其看上去更有區分度
pheatmap(mat4,cluster_rows = F,cluster_cols = F,
show_colnames = F,
annotation_col = colanno,
gaps_row=as.numeric(cumsum(table(rowanno$celltype))[-6]),
gaps_col=as.numeric(cumsum(table(colanno$celltype))[-6]),
filename="heatmap.2.pdf",width=11,height = 7
)
先寫到這兒吧(原本以為能寫完的),剩下的氣泡圖、堆疊小提琴圖改天再補上。
因水平有限,有錯誤的地方,歡迎批評指正!