ggplot 繪制熱圖

本文轉載自查看原文 2018-01-23 21:14 1205 R語言

生成測試數據

繪圖首先需要數據。通過生成一堆的向量，轉換為矩陣，得到想要的數據。

 
                  data <- c(1:6, 6:1, 6:1, 1:6, (6:1) 
                  /10 
                  , (1:6) 
                  /10 
                  , (1:6) 
                  /10 
                  , (6:1) 
                  /10 
                  , 1:6, 6:1, 6:1, 1:6, 6:1, 1:6, 1:6, 6:1)

[1] 1.0 2.0 3.0 4.0 5.0 6.0 6.0 5.0 4.0 3.0 2.0 1.0 6.0 5.0
[15] 4.0 3.0 2.0 1.0 1.0 2.0 3.0 4.0 5.0 6.0 0.6 0.5 0.4 0.3
[29] 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.1 0.2 0.3 0.4 0.5 0.6
[43] 0.6 0.5 0.4 0.3 0.2 0.1 1.0 2.0 3.0 4.0 5.0 6.0 6.0 5.0
[57] 4.0 3.0 2.0 1.0 6.0 5.0 4.0 3.0 2.0 1.0 1.0 2.0 3.0 4.0
[71] 5.0 6.0 6.0 5.0 4.0 3.0 2.0 1.0 1.0 2.0 3.0 4.0 5.0 6.0
[85] 1.0 2.0 3.0 4.0 5.0 6.0 6.0 5.0 4.0 3.0 2.0 1.0

注意：運算符的優先級

 
                  > 1:3+4 
                 
                  [1] 5 6 7 
                 
                  > 1:(3+4) 
                 
                  [1] 1 2 3 4 5 6 7

Vector轉為矩陣 (matrix)，再轉為數據框 (data.frame)

 
                  # ncol 指定列數 
                 
                  # byrow 先按行填充數據 
                 
                  # ?matrix 可查看函數的使用方法 
                 
                  # as.data.frame的as系列是轉換用的 
                 
                  data <-  
                  as.data.frame 
                  ( 
                  matrix 
                  (data, ncol=12, byrow=T))

 
             
              
                
                
                  # 增加列的名字 
                 
 
                  colnames 
                  (data) <-  
                  c 
                  ( 
                  "Zygote" 
                  , 
                  "2_cell" 
                  , 
                  "4_cell" 
                  , 
                  "8_cell" 
                  , 
                  "Morula" 
                  , 
                  "ICM" 
                  , 
                  "ESC" 
                  , 
                  "4 week PGC" 
                  , 
                  "7 week PGC" 
                  , 
                  "10 week PGC" 
                  , 
                  "17 week PGC" 
                  , 
                  "OOcyte" 
                  ) 
                 
 
                  # 增加行的名字 
                 
 
                  rownames 
                  (data) <-  
                  paste 
                  ( 
                  "Gene" 
                  , 1:8, sep= 
                  "_" 
                  ) 
                 
 
                  # 只顯示前6行和前4列 
                 
 
                  head 
                  (data)[,1:4] 
                 
 
              
 
             
           

雖然方法比較繁瑣，但一個數值矩陣已經獲得了。

還有另外2種獲取數值矩陣的方式。

讀入字符串

 
                  # 使用字符串的好處是不需要額外提供文件 
                 
                  # 簡單測試時可使用，寫起來不繁瑣，又方便重復 
                 
                  # 尤其適用於在線提問時作為測試案例 
                 
                  > txt <- "ID;Zygote;2_cell;4_cell;8_cell 
                 
                  + Gene_1;1;2;3;4 
                 
                  + Gene_2;6;5;4;5 
                 
                  + Gene_3;0.6;0.5;0.4;0.4" 
                 
                  # 習慣設置quote為空，避免部分基因名字或注釋中存在引號，導致讀入文件錯誤。 
                 
                  > data2 <-  
                  read.table 
                  (text=txt, sep= 
                  ";" 
                  , header=T, row.names=1, quote= 
                  "" 
                  ) 
                 
                  >  
                  head 
                  (data2) 
                 
                  Zygote X2_cell X4_cell X8_cell 
                 
                  Gene_1    1.0     2.0     3.0     4.0 
                 
                  Gene_2    6.0     5.0     4.0     5.0 
                 
                  Gene_3    0.6     0.5     0.4     0.4

可以看到列名字中以數字開頭的列都加了X。一般要盡量避免行或列名字以數字開頭，會給后續分析帶去一些困難；另外名字中出現的非字母、數字、下划線、點的字符都會被轉為點，也需要注意，盡量只用字母、下划線和數字。

 
                  # 讀入時，增加一個參數`check.names=F`也可以解決問題。 
                 
                  # 這次數字前沒有再加 X 了 
                 
                  > data2 <-  
                  read.table 
                  (text=txt, sep= 
                  ";" 
                  , header=T, row.names=1, quote= 
                  "" 
                  , check.names = F) 
                 
                  >  
                  head 
                  (data2) 
                 
                  Zygote 2_cell 4_cell 8_cell 
                 
                  Gene_1    1.0    2.0    3.0    4.0 
                 
                  Gene_2    6.0    5.0    4.0    5.0 
                 
                  Gene_3    0.6    0.5    0.4    0.4

讀入文件

與上一步類似，只是改為文件名，不再贅述。

 
                  > data2 <-  
                  read.table 
                  ( 
                  "filename" 
                  , sep= 
                  ";" 
                  , header=T, row.names=1, quote= 
                  "" 
                  )

轉換數據格式

數據讀入后，還需要一步格式轉換。在使用ggplot2作圖時，有一種長表格模式是最為常用的，尤其是數據不規則時，更應該使用。

 
                  # 如果包沒有安裝，運行下面一句，安裝包 
                 
                  #install.packages(c("reshape2","ggplot2","magrittr")) 
                 
                  library 
                  (reshape2) 
                 
                  library 
                  (ggplot2) 
                 
                  # 轉換前，先增加一列ID列，保存行名字 
                 
                  data$ID <-  
                  rownames 
                  (data) 
                 
                  # melt：把正常矩陣轉換為長表格模式的函數。工作原理是把全部的非id列的數值列轉為1列，命名為value；所有字符列轉為variable列。 
                 
                  # id.vars 列用於指定哪些列為id列；這些列不會被merge，會保留為完整一列。 
                 
                  data_m <-  
                  melt 
                  (data, id.vars= 
                  c 
                  ( 
                  "ID" 
                  )) 
                 
                  head 
                  (data_m)

ID variable value

1 Gene_1 Zygote 1.0

2 Gene_2 Zygote 6.0

3 Gene_3 Zygote 0.6

4 Gene_4 Zygote 0.1

5 Gene_5 Zygote 1.0

6 Gene_6 Zygote 6.0

7 Gene_7 Zygote 6.0

8 Gene_8 Zygote 1.0

9 Gene_1 2_cell 2.0

10 Gene_2 2_cell 5.0

11 Gene_3 2_cell 0.5

12 Gene_4 2_cell 0.2

13 Gene_5 2_cell 2.0

14 Gene_6 2_cell 5.0

15 Gene_7 2_cell 5.0

16 Gene_8 2_cell 2.0

分解繪圖

數據轉換后就可以畫圖了，分解命令如下：

 
                  # data_m: 是前面費了九牛二虎之力得到的數據表 
                 
                  # aes: aesthetic的縮寫，一般指定整體的X軸、Y軸、顏色、形狀、大小等 
                 
                  # 在最開始讀入數據時，一般只指定x和y，其它后續指定 
                 
                  p <- ggplot(data_m, aes(x=variable,y=ID))  
                 
                  # 熱圖就是一堆方塊根據其值賦予不同的顏色，所以這里使用fill=value, 用數值做填充色。 
                 
                  p <- p + geom_tile(aes(fill=value)) 
                 
                  # ggplot2為圖層繪制，一層層添加，存儲在p中，在輸出p的內容時才會出圖。 
                 
                  p 
                 
                  ## 如果你沒有使用Rstudio或其它R圖形版工具，而是在遠程登錄的服務器上運行的交互式R，需要輸入下面的語句，獲得輸出圖形（圖形存儲於R的工作目錄下的Rplots.pdf文件中）

熱圖出來了，但有點不對勁，橫軸重疊一起了。一個辦法是調整圖像的寬度，另一個是旋轉橫軸標記

 
                  # theme: 是處理圖美觀的一個函數，可以調整橫縱軸label的選擇、圖例的位置等 
                 
                  # 這里選擇X軸標簽45度。 
                 
                  # hjust和vjust調整標簽的相對位置，具體見  
                 
                  # 簡單說，hjust是水平的對齊方式，0為左，1為右，0.5居中，0-1之間可以取任意值。vjust是垂直對齊方式，0底對齊，1為頂對齊，0.5居中，0-1之間可以取任意值 
                 
                  p <- p + theme(axis.text.x=element_text(angle=45, hjust=1, vjust=1)) 
                 
                  p

設置想要的顏色

 
                   # 連續的數字，指定最小數值代表的顏色和最大數值賦予的顏色 
                  
                   # 注意fill和color的區別，fill是填充，color只針對邊緣 
                  
                   p <- p + scale_fill_gradient(low =  
                   "white" 
                   , high =  
                   "red" 
                   ) 
                  
                   p

調整legend的位置

 
                   # postion可以接受的值有 top, bottom, left, right, 和一個坐標 c(0.05,0.8) (左上角，坐標是相對於圖的左下角計算的） 
                  
                   p <- p + theme(legend.position= 
                   "top" 
                   )

調整背景和背景格線以及X軸、Y軸的標題(注意灰色的背景沒了)

 
                    p <- p + xlab( 
                    "samples" 
                    ) + theme_bw() + theme(panel.grid.major = element_blank()) + theme(legend.key=element_blank()) 
                   
                    p

合並以上命令，就得到了下面這個看似復雜的繪圖命令

 
                   p <- ggplot(data_m, aes(x=variable,y=ID)) + xlab( 
                   "samples" 
                   ) + theme_bw() + theme(panel.grid.major = element_blank()) + theme(legend.key=element_blank())  + theme(axis.text.x=element_text(angle=45,hjust=1, vjust=1)) + theme(legend.position= 
                   "top" 
                   ) +  geom_tile(aes(fill=value)) + scale_fill_gradient(low =  
                   "white" 
                   , high =  
                   "red" 
                   )

圖形存儲

圖形出來了，就得考慮存儲了

 
                   # 可以跟輸出文件不同的后綴，以獲得不同的輸出格式 
                  
                   # colormode支持srgb （屏幕）和cmyk （打印，部分雜志需要，看上去有點褪色的感覺）格式 
                  
                   ggsave(p, filename= 
                   "heatmap.pdf" 
                   , width=10, height=15,  
                   units 
                   =c( 
                   "cm" 
                   ),colormodel= 
                   "srgb" 
                   )

至此，完成了簡單的heatmap的繪圖。但實際繪制時，經常會碰到由於數值變化很大，導致顏色過於集中，使得圖的可讀性下降很多。因此需要對數據進行一些處理，具體的下次再說。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 Python 繪制熱圖用ggplot繪制熱力圖#R Python 繪制熱度圖（seaborn） ggplot2繪制概率密度圖 D3.js數據可視化（一）——繪制熱圖（heat map） ggplot2繪制GO富集分析柱狀圖一張圖繪制多組散點圖和折線圖(R:ggplot2) ggplot2 PCA散點圖繪制 ggplot2做qqplot圖 ggplot2繪制KEGG富集散點圖