畫曼哈頓圖和QQ plot 首推R包“qqman”,簡約方便。下面具體介紹以下。
一、畫曼哈頓圖
install.packages("qqman") library(qqman)
1、准備包含SNP, CHR, BP, P的文件gwasResults(如果沒有zscore可以不用管),如下所示:
2、上代碼,如下所示:
manhattan(gwasResults)
如果覺得不夠美觀,考慮添加一下參數:
manhattan(gwasResults, main = "Manhattan Plot", ylim = c(0, 10), cex = 0.6, cex.axis = 0.9, col = c("blue4", "orange3"), suggestiveline = F, genomewideline = 6, chrlabs = c(1:20, "P", "Q"))
二、畫 QQ plot 圖
直接上代碼:
qq(gwasResults$P)
同樣的,還可以修改參數,美觀一下:
qq(gwasResults$P, main = "Q-Q plot of GWAS p-values", xlim = c(0, 7), ylim = c(0, 12), pch = 18, col = "blue4", cex = 1.5, las = 1)
三、計算膨脹系數(Calculating Genomic Inflation Factor)
如果數據類型為Z值,則膨脹系數為:
z = gwasResults$zscore lambda = round(median(z^2) / 0.454, 3)
如果數據類型是P值,則膨脹系數為:
p_value=gwasResults$P z = qnorm(p_value/ 2) lambda = round(median(z^2, na.rm = TRUE) / 0.454, 3)
如果數據類型是CHISQ值,則膨脹系數為:
z = gwasResults$CHISQ lambda = round(median(z^2, na.rm = TRUE) / 0.454, 3)
#關於0.454的由來:
#qchisq(0.5, 1)
#[1] 0.4549364
膨脹系數lambda的解讀:
基因組膨脹因子λ定義為經驗觀察到的檢驗統計分布與預期中位數的中值之比,從而量化了因大量膨脹而造成結果的假陽性率。換句話說,λ定義為得到的卡方檢驗統計量的中值除以卡方分布的預期中值。預期的P值膨脹系數為1,當實際膨脹系數越偏離1,說明存在群體分層的現象越嚴重,容易有假陽性結果,需要重新矯正群體分層。
參考鏈接:
https://www.biostars.org/p/43328/
https://cran.r-project.org/web/packages/qqman/vignettes/qqman.html
https://en.wikipedia.org/wiki/Population_stratification
chrome-extension://cdonnmffkdaoajfknoeeecmchibpmkmg/static/pdf/web/viewer.html?file=https%3A%2F%2Fpersonal.broadinstitute.org%2Fchrisnc%2Fothers%2FGWAS_Smith_MethMolBiol09.pdf
chrome-extension://cdonnmffkdaoajfknoeeecmchibpmkmg/static/pdf/web/viewer.html?file=http%3A%2F%2Fwww3.stat.sinica.edu.tw%2Fsisg2015%2Fdownload%2Fhandout%2FS1-notes.pdf