單細胞分析實錄(8): 展示marker基因的4種圖形(一)


今天的內容講講單細胞文章中經常出現的展示細胞marker的圖:tsne/umap圖、熱圖、堆疊小提琴圖、氣泡圖,每個圖我都會用兩種方法繪制。

使用的數據來自文獻:Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma. 去年7月發表在Cell Research上的關於鼻咽癌的文章,數據下載:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE150430

髓系細胞數量相對少一些,為了方便演示,選它作為例子。

library(Seurat)
library(tidyverse)
library(harmony)
Myeloid=read.table("Myeloid_mat.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F)
Myeloid_anno=read.table("Myeloid_anno.txt",header = T,sep = "\t",stringsAsFactors = F)

導入數據的時候需要注意一個地方:從cell ranger得到的矩陣,每一列的列名會在CB后面加上"-1"這個字符串,在R里面導入數據時,會自動轉化為".1",在做匹配的時候需要注意一下。我已經提前轉換為"_1"

> summary(as.numeric(Myeloid["CD14",]))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.000   0.000   0.689   1.080   2.111   4.500
> summary(as.numeric(Myeloid["PTPRC",]))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.000   1.200   1.681   1.607   2.124   3.520

考慮到下載的表達矩陣,表達值都不是整數,且大於0,推測該矩陣已經經過了標准化,因此下面的流程會跳過這一步

0. Seurat流程

我們直接把注釋結果賦值到mye.seu@meta.data矩陣中,后面省去聚類這一步

mye.seu=CreateSeuratObject(Myeloid)
mye.seu@meta.data$CB=rownames(mye.seu@meta.data) 
mye.seu@meta.data=merge(mye.seu@meta.data,Myeloid_anno,by="CB")
rownames(mye.seu@meta.data)=mye.seu@meta.data$CB

#替代LogNormalize這一步
mye.seu[["RNA"]]@data=mye.seu[["RNA"]]@counts
mye.seu <- FindVariableFeatures(mye.seu, selection.method = "vst", nfeatures = 2000)
mye.seu <- ScaleData(mye.seu, features = rownames(mye.seu))

mye.seu <- RunPCA(mye.seu, npcs = 50, verbose = FALSE)
mye.seu=mye.seu %>% RunHarmony("sample", plot_convergence = TRUE)
mye.seu <- RunUMAP(mye.seu, reduction = "harmony", dims = 1:20)
mye.seu <- RunTSNE(mye.seu, reduction = "harmony", dims = 1:20)
#少了聚類

DimPlot(mye.seu, reduction = "tsne", group.by = "celltype", pt.size=1)+theme(
  axis.line = element_blank(),
  axis.ticks = element_blank(),axis.text = element_blank()
)
ggsave("tsne1.pdf",device = "pdf",width = 17,height = 14,units = "cm")

基本符合原圖,三個亞群分得開,三個亞群分不開。


接下來,我用4種方式展示marker基因,這些基因可以在文獻的補充材料里面找到。

1. tsne展示marker基因

FeaturePlot(mye.seu,features = "CCR7",reduction = "tsne",pt.size = 1)+
  scale_x_continuous("")+scale_y_continuous("")+
  theme_bw()+ #改變ggplot2的主題
  theme( #進一步修改主題
    panel.grid.major = element_blank(),panel.grid.minor = element_blank(), #去掉背景線
    axis.ticks = element_blank(),axis.text = element_blank(), #去掉坐標軸刻度和數字
    legend.position = "none", #去掉圖例
    plot.title = element_text(hjust = 0.5,size=14) #改變標題位置和字體大小
  )
ggsave("CCR7.pdf",device = "pdf",width = 10,height = 10.5,units = "cm")  

另一種方法就是把tsne的坐標和基因的表達值提取出來,用ggplot2畫,其實不是很必要,因為FeaturePlot也是基於ggplot2的,我還是演示一下

mat1=as.data.frame(mye.seu[["RNA"]]@data["CCR7",])
colnames(mat1)="exp"
mat2=Embeddings(mye.seu,"tsne")
mat3=merge(mat2,mat1,by="row.names")

#數據格式如下:
> head(mat3)
Row.names     tSNE_1     tSNE_2   exp
1 N01_AAACGGGCATTTCAGG_1   5.098727  32.748145 0.000
2 N01_AAAGATGCAATGTAAG_1 -24.394040  26.176422 0.000
3 N01_AACTCAGGTAATAGCA_1  11.856730   8.086553 0.000
4 N01_AACTCAGGTCTTCGTC_1  10.421878  12.660407 0.000
5 N01_AACTTTCAGGCCATAG_1  33.555756 -10.437406 1.606
6 N01_AAGACCTTCGAATGGG_1 -23.976967  11.897753 0.738

mat3%>%ggplot(aes(tSNE_1,tSNE_2))+geom_point(aes(color=exp))+
  scale_color_gradient(low = "grey",high = "purple")+theme_bw()
ggsave("CCR7.2.pdf",device = "pdf",width = 13.5,height = 12,units = "cm")

用ggplot2的好處就是圖形修改很方便,畢竟ggplot2大家都很熟悉

2. 熱圖展示marker基因

畫圖前,需要給每個細胞一個身份,因為我們跳過了聚類這一步,此處需要手動賦值

Idents(mye.seu)="celltype"

library(xlsx)
markerdf1=read.xlsx("ref_marker.xlsx",sheetIndex = 1)
markerdf1$gene=as.character(markerdf1$gene)
# 這個表格整理自原文的附表,選了53個基因

#數據格式
# > head(markerdf1)
# gene   celltype
# 1    S100B DC2(CD1C+)
# 2 HLA-DQB2 DC2(CD1C+)
# 3   FCER1A DC2(CD1C+)
# 4     CD1A DC2(CD1C+)
# 5     PKIB DC2(CD1C+)
# 6    NDRG2 DC2(CD1C+)

DoHeatmap(mye.seu,features = markerdf1$gene,label = F,slot = "scale.data")
ggsave("heatmap.pdf",device = "pdf",width = 23,height = 16,units = "cm")

label = F不在熱圖的上方標注細胞類型,
slot = "scale.data"使用scale之后的矩陣畫圖,默認就是這個

接下來用pheatmap畫,在布局上可以自由發揮

library(pheatmap)
colanno=mye.seu@meta.data[,c("CB","celltype")]
colanno=colanno%>%arrange(celltype)
rownames(colanno)=colanno$CB
colanno$CB=NULL
colanno$celltype=factor(colanno$celltype,levels = unique(colanno$celltype))

先對細胞進行排序,按照celltype的順序,然后對基因排序

rowanno=markerdf1
rowanno=rowanno%>%arrange(celltype)

提取scale矩陣的行列時,按照上面的順序

mat4=mye.seu[["RNA"]]@scale.data[rowanno$gene,rownames(colanno)]
mat4[mat4>=2.5]=2.5
mat4[mat4 < (-1.5)]= -1.5 #小於負數時,加括號!

下面就是繪圖代碼了,我加了分界線,使其看上去更有區分度

pheatmap(mat4,cluster_rows = F,cluster_cols = F,
         show_colnames = F,
         annotation_col = colanno,
         gaps_row=as.numeric(cumsum(table(rowanno$celltype))[-6]),
         gaps_col=as.numeric(cumsum(table(colanno$celltype))[-6]),
         filename="heatmap.2.pdf",width=11,height = 7
         )


先寫到這兒吧(原本以為能寫完的),剩下的氣泡圖、堆疊小提琴圖改天再補上。

因水平有限,有錯誤的地方,歡迎批評指正!


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM