基於ggpubr包為ggplot添加p值和顯著性標記

本文轉載自查看原文 2021-09-21 14:54 261 ggplot2

基於ggpubr包為ggplot添加p值和顯著性標記

這篇文章我們將講述

如何簡單比較兩組或多組的平均值
如何自動化為ggplot添加p值和顯著性標記，包括箱線圖、點圖、條形圖、線圖等等

准備

安裝和導入所需要的R包

需要R包ggpubr，版本>0.1.3，該包提供了基於ggplot2包的論文發表級繪圖。

從CRAN安裝：

install.packages("ggpubr")

或者從GitHub上下載最新的開發版本：

if(!require(devtools)) install.packages("devtools") devtools::install_github("kassambara/ggpubr")

載入ggpubr：

library(ggpubr) #> Loading required package: ggplot2

ggpubr的官方文檔在http://www.sthda.com/english/rpkgs/ggpubr ## 樣例數據集

數據：ToothGrowth數據集

data("ToothGrowth") head(ToothGrowth) #> len supp dose #> 1 4.2 VC 0.5 #> 2 11.5 VC 0.5 #> 3 7.3 VC 0.5 #> 4 5.8 VC 0.5 #> 5 6.4 VC 0.5 #> 6 10.0 VC 0.5

比較均值的方法

http://www.sthda.com/english/wiki/comparing-means-in-r包含了均值方法比較的詳細描述，這里我們匯總常見的均值比較方法：

方法	R 函數	描述
T檢驗	t.test()	比較兩組 (參數)
Wilcoxon檢驗	wilcox.test()	比較兩組 (非參數)
ANOVA	aov() or anova()	比較多組 (參數)
Kruskal-Wallis	kruskal.test()	比較多組 (非參數)

添加p值的函數

這里我們展示ggpubr包中可以使用的用於添加p值的R函數：

compare_means()
stat_compare_means()

compare_mean()

下一部分我們將實際學習使用，這里先具體介紹一下這個函數：

compare_means(formula, data, method="wilcox.test", paired=FALSE, group.by=NULL, ref.group = NULL, ...)

formula: x~group形式的公式，x是一個數值向量，group是有1個或者多個組別的因子。比如formula = TP53 ~ cancer_group。也可以使用多個響應變量，比如formula = c(TP53, PTEN) ~ cancer_group。

data: 包含變量的數據框

method: 檢驗的類型。默認是wilcox.test。允許的值包括：

t.test和wilcox.test。

anova和kruskal.test。

paired: 邏輯值，是否執行配對檢驗。僅能用於t.test和wilcox.test。

group.by: 在執行檢驗前對數據集進行分組的變量。指定后，會根據該變量分不同子集進行檢驗。

ref.group: 一個字符串，指定參考組。指定后，對於一個給定分組變量，每個分組水平都會和參考組進行比較。ref.group可以使用.all，這會對所有組別基於一個全局的均值進行兩兩比較。 ## stat_compare_means()

這個函數擴展了ggplot2，可以對指定ggplot圖形添加均值比較的p值。

簡單形式如下：

stat_compare_means(mapping = NULL, comparisons = NULL, hide.ns = FALSE, label = NULL, label.x = NULL, label.y = NULL)

mapping: 由aes()創建的映射集合

comparisons: 一個長度為2的向量列表。向量中元素都是x軸的兩個名字或者2個對於感興趣，要進行比較的整數索引

hide.ns: 邏輯值，如果TRUE，隱藏不顯著標記ns

label: 指定標簽類型的字符串。允許值包括p.signif（顯示顯著性水平），p.format（顯示p值）

label.x, label.y: 數值。用於擺放標簽位置的坐標，如果太短，會循環重復。

…: 其他傳入compare_means()的參數，例如method,paired,ref.group

比較兩個獨立組別

執行檢驗：

compare_means(len ~ supp, data = ToothGrowth)
#> # A tibble: 1 x 8 #> .y. group1 group2 p p.adj p.format p.signif method #> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 len OJ VC 0.0645 0.064 0.064 ns Wilcoxon

默認執行method="wilcox.test"，你可以指定method = "t.test"進行t檢驗。返回一個包含下面列的數據框：

.y: 用於檢驗的y變量
p: p值
p.adj: 矯正p值，默認矯正方法p.adjust.method="holm"
p.format: 格式化p值
p.signif: 顯著性水平
方法：用於比較組別的統計檢驗

創建一個帶p值的箱線圖：

p <- ggboxplot(ToothGrowth, x="supp", y = "len", color = "supp", palette = "jco", add = "jitter") # 添加p值 p + stat_compare_means()

# 改變方法 p + stat_compare_means(method = "t.test")

注意，p值標簽位置可以使用label.x, label.y, hjust, vjust參數進行矯正。

默認p值標簽顯示是將compare_means()返回數據框的method和p列粘連到一起。你可以使用aes()函數進行組合。

比如，

aes(label = ..p.format..) 或者 aes(label = paste0("p = ", ..p.format..)): 顯示p值但不顯示方法
aes(label = ..p.signif..): 僅顯示顯著性水平
aes(label = paste0(..method.., "\n", "p =", ..p.format..)) 使用在方法和p值中插入換行符

實際操作看看：

p + stat_compare_means(aes(label = ..p.signif..),
                       label.x = 1.5, label.y = 40)

也可以使用字符串指定標簽：

p + stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 40)

比較兩組配對樣本

執行檢驗：

compare_means(len ~ supp, data = ToothGrowth,
              paired = TRUE) #> # A tibble: 1 x 8 #> .y. group1 group2 p p.adj p.format p.signif method #> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 len OJ VC 0.00431 0.0043 0.0043 ** Wilcoxon

使用ggpaired()函數可視化：

ggpaired(ToothGrowth, x="supp", y="len", color="supp", line.color="gray", line.size=0.4, palette = "jco") + stat_compare_means(paired = TRUE)

比較多組樣本

全局檢驗

compare_means(len ~ dose, data = ToothGrowth,
              method = "anova") #> # A tibble: 1 x 6 #> .y. p p.adj p.format p.signif method #> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 len 9.53e-16 9.50e-16 9.5e-16 **** Anova

用全局p值畫圖：

# 默認方法為 method = "kruskal.test" ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose", palette = "jco") + stat_compare_means()

# 將方法改為anova ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose", palette = "jco") + stat_compare_means(method = "anova")

成對比較，如果分組變量包含超過兩個水平，配對檢驗會自動執行。

# 執行成對比較 compare_means(len ~ dose, data = ToothGrowth) #> # A tibble: 3 x 8 #> .y. group1 group2 p p.adj p.format p.signif method #> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 len 0.5 1 0.00000702 0.000014 7.0e-06 **** Wilcoxon #> 2 len 0.5 2 0.0000000841 0.00000025 8.4e-08 **** Wilcoxon #> 3 len 1 2 0.000177 0.00018 0.00018 *** Wilcoxon

# 可視化： 指定你想要比較的組別 my_comparisons <- list(c("0.5","1"), c("1", "2"), c("0.5", "2")) ggboxplot(ToothGrowth, x="dose", y="len", color="dose", palette = "jco") + stat_compare_means(comparisons = my_comparisons) + #添加成對p值 stat_compare_means(label.y = 50) # 添加全局p值 #> Warning in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, : #> cannot compute exact p-value with ties #> Warning in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, : #> cannot compute exact p-value with ties #> Warning in wilcox.test.default(c(16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, : #> cannot compute exact p-value with ties

如果你想要指定精確的y軸位置，使用label.y參數：

ggboxplot(ToothGrowth, x="dose", y="len", color="dose", palette = "jco") + stat_compare_means(comparisons = my_comparisons, label.y=c(29, 35, 40)) + stat_compare_means(label.y = 45) #> Warning in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, : #> cannot compute exact p-value with ties #> Warning in wilcox.test.default(c(4.2, 11.5, 7.3, 5.8, 6.4, 10, 11.2, 11.2, : #> cannot compute exact p-value with ties #> Warning in wilcox.test.default(c(16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, : #> cannot compute exact p-value with ties

基於參考組的多重成對比較

# 基於參考組 compare_means(len ~ dose, data = ToothGrowth, ref.group = "0.5", method = "t.test") #> # A tibble: 2 x 8 #> .y. group1 group2 p p.adj p.format p.signif method #> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 len 0.5 1 1.27e- 7 1.30e- 7 1.3e-07 **** T-test #> 2 len 0.5 2 4.40e-14 8.80e-14 4.4e-14 **** T-test

# 可視化 ggboxplot(ToothGrowth, x="dose", y="len", color="dose", palette = "jco") + stat_compare_means(method="anova", label.y=40) + stat_compare_means(label="p.signif", method="t.test", ref.group = "0.5")

以所有值為基准(base-mean)進行多個成對比較

如果出現很多組別，兩兩比較過於復雜，通過將所有數據匯總創建一個虛擬的樣本，以它為基准進行比較。

# Comparison of each group against base-mean compare_means(len ~ dose, data = ToothGrowth, ref.group = ".all.", method = "t.test") #> # A tibble: 3 x 8 #> .y. group1 group2 p p.adj p.format p.signif method #> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 len .all. 0.5 0.000000290 0.00000087 2.9e-07 **** T-test #> 2 len .all. 1 0.512 0.51 0.51 ns T-test #> 3 len .all. 2 0.000000425 0.00000087 4.3e-07 **** T-test

可視化

ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova", label.y = 40)+ # Add global p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.") # Pairwise comparison against all

這個方法有時候會非常有用，比如下面這個例子中，我們可以通過每個樣本均值與所有樣本的均值進行比較，判斷基因水平是過表達還是下調了。

# Load myeloma data from GitHub myeloma <- read.delim("https://raw.githubusercontent.com/kassambara/data/master/myeloma.txt") # 執行檢驗 compare_means(DEPDC1 ~ molecular_group, data = myeloma, ref.group = ".all.", method = "t.test") #> # A tibble: 7 x 8 #> .y. group1 group2 p p.adj p.format p.signif method #> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 DEPDC1 .all. Cyclin D-1 0.288 1.00e+0 0.29 ns T-test #> 2 DEPDC1 .all. Cyclin D-2 0.424 1.00e+0 0.42 ns T-test #> 3 DEPDC1 .all. MMSET 0.578 1.00e+0 0.58 ns T-test #> 4 DEPDC1 .all. MAF 0.254 1.00e+0 0.25 ns T-test #> 5 DEPDC1 .all. Hyperdiploid 0.0000000273 1.90e-7 2.7e-08 **** T-test #> 6 DEPDC1 .all. Proliferation 0.0000239 1.20e-4 2.4e-05 **** T-test #> 7 DEPDC1 .all. Low bone disease 0.00000526 3.20e-5 5.3e-06 **** T-test

可視化表達譜

ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color="molecular_group", add="jitter", legend="none") + rotate_x_text(angle = 45) + geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2) + # 添加base mean的水平線 stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.") # Pairwise comparison against all

多個分組變量

根據某個變量分組后兩個獨立樣本的比較

執行檢驗：

compare_means(len ~ supp, data = ToothGrowth,
              group.by = "dose") #> # A tibble: 3 x 9 #> dose .y. group1 group2 p p.adj p.format p.signif method #> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 0.5 len OJ VC 0.0232 0.046 0.023 * Wilcoxon #> 2 1 len OJ VC 0.00403 0.012 0.004 ** Wilcoxon #> 3 2 len OJ VC 1 1 1.000 ns Wilcoxon

因為生成了不同的子圖，根據變量分面

# 根據 "dose" 變量分面繪制箱線圖 p <- ggboxplot(ToothGrowth, x = "supp", y = "len", color = "supp", palette = "jco", add = "jitter", facet.by = "dose", short.panel.labs = FALSE) # Use only p.format as label. Remove method name. p + stat_compare_means(label = "p.format")

# Or use significance symbol as label p + stat_compare_means(label = "p.signif", label.x = 1.5)

將這幾個圖繪制在單個面板內：

p <- ggboxplot(ToothGrowth, x = "dose", y = "len", color = "supp", palette = "jco", add = "jitter") p + stat_compare_means(aes(group = supp))

# 僅顯示p值 p + stat_compare_means(aes(group = supp), label = "p.format")

# 使用顯著性標記 p + stat_compare_means(aes(group = supp), label = "p.signif")

分組后配對樣本比較

compare_means(len ~ supp, data = ToothGrowth,
              group.by = "dose", paired = TRUE) #> # A tibble: 3 x 9 #> dose .y. group1 group2 p p.adj p.format p.signif method #> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> #> 1 0.5 len OJ VC 0.0330 0.066 0.033 * Wilcoxon #> 2 1 len OJ VC 0.0137 0.041 0.014 * Wilcoxon #> 3 2 len OJ VC 1 1 1.000 ns Wilcoxon

可視化，按分組變量dose分面創建一個多面板箱線圖：

p <- ggpaired(ToothGrowth, x="supp", y="len", color="supp", palette = "jco", line.color = "grey", line.size =0.4, facet.by = "dose", short.panel.labs = FALSE) # Use only p.format as label. Remove method name. p + stat_compare_means(label = "p.format", paired = TRUE)

其他圖形

條形圖和線圖（一組變量）

# 條形圖加均值標准誤 ggbarplot(ToothGrowth, x = "dose", y = "len", add = "mean_se")+ stat_compare_means() + # Global p-value stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29)) # compare to ref.group

# 線圖加均值標准誤 ggline(ToothGrowth, x = "dose", y = "len", add = "mean_se")+ stat_compare_means() + # Global p-value stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))

條形圖和線圖（兩組變量）

ggbarplot(ToothGrowth, x = "dose", y = "len", add = "mean_se", color = "supp", palette = "jco", position = position_dodge(0.8))+ stat_compare_means(aes(group = supp), label = "p.signif", label.y = 29)

ggline(ToothGrowth, x = "dose", y = "len", add = "mean_se", color = "supp", palette = "jco")+ stat_compare_means(aes(group = supp), label = "p.signif", label.y = c(16, 25, 29))

本文轉載自：https://shixiangwang.github.io/home/cn/post/ggpubr-add-pvalue-and-siglevels/

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 R語言添加p-value和顯著性標記 ggplot的boxplot添加顯著性 | Add P-values and Significance Levels to ggplots | 方差分析 P值與significant（顯著性）的理解 R語言可視化學習筆記之添加p-value和顯著性標記 ggplot2+ggsignif繪制箱圖及添加顯著性統計學中的P值與顯著性的意義 P值(P-value)，“差異具有顯著性”和“具有顯著差異” 顯著性檢驗統計4：顯著性檢驗關於圖像的顯著性檢測