Filtering SNPs by LD information with PLINK

Reposted article. 2019-12-11 21:21. Tags: plink, bioinformatics

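I recently needed to thin out the SNPs obtained from WGS sequencing. The problem was that few individuals had been sequenced, so even after filtering on call rate, MAF, HWE and similar criteria, the number of SNPs was still in the tens of millions. I therefore used PLINK to remove a large share of the SNP markers based on LD information.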

Tool version: PLINK v1.90b4.6 64-bit (15 Aug 2017)

1. Format conversion

First, convert the prepared VCF file to PLINK's map and ped format:

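  # Convert the VCF to PLINK text format (.ped/.map); --chr-set 18 declares 18 autosome pairs
  plink --allow-extra-chr --recode --chr-set 18 --vcf test.gz --out s_vcf
  # Rewrite column 2 of the .map file as chr_pos so every SNP has a unique ID
  awk '{print $1"\t"$1"_"$4"\t"$3"\t"$4}' s_vcf.map >s1_vcf.map
  mv s_vcf.ped s1_vcf.ped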

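The second column of the map file must hold a unique identifier; otherwise there is no way to tell later which SNPs were removed. The awk command here replaces the second column with a name of the form chr_pos, which serves as the SNP ID (the original post showed the resulting file in a figure at this point).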
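Because everything downstream depends on those IDs being unique, it can help to check for duplicates before pruning. The sketch below applies the same awk renaming to a small made-up map file and lists any repeated ID. The file names `toy.map` and `toy_renamed.map` and their contents are assumptions for illustration only; on real data you would point the last two commands at s1_vcf.map.

```shell
# Toy 4-column .map file (chr, variant ID, genetic distance, bp position).
# Made-up data standing in for s_vcf.map.
printf '1\t.\t0\t1001\n1\t.\t0\t2050\n2\t.\t0\t1001\n' > toy.map

# Same renaming as in the post: column 2 becomes chr_pos.
awk '{print $1"\t"$1"_"$4"\t"$3"\t"$4}' toy.map > toy_renamed.map

# Print every ID that occurs more than once; empty output means all IDs are unique.
dup_ids=$(cut -f2 toy_renamed.map | sort | uniq -d)
echo "duplicate IDs: ${dup_ids:-none}"
```

A chr_pos name is only unique when the VCF has a single record per position. If the same position appears in more than one record, the duplicates show up in this check and need a different naming scheme.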

2. LD filtering

Here we mainly use the --indep-pairwise option. Running it together with --help prints its usage:

  plink --indep-pairwise --help
  PLINK v1.90b4.6 64-bit (15 Aug 2017)           www.cog-genomics.org/plink/1.9/
  (C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
  --help present, ignoring other flags.

  --indep [window size]<kb> [step size (variant ct)] [VIF threshold]
  --indep-pairwise [window size]<kb> [step size (variant ct)] [r^2 threshold]
  --indep-pairphase [window size]<kb> [step size (variant ct)] [r^2 threshold]
    Generate a list of markers in approximate linkage equilibrium.  With the
    'kb' modifier, the window size is in kilobase instead of variant count
    units.  (Pre-'kb' space is optional, i.e. '--indep-pairwise 500 kb 5 0.5'
    and '--indep-pairwise 500kb 5 0.5' have the same effect.)
    Note that you need to rerun PLINK using --extract or --exclude on the
    .prune.in/.prune.out file to apply the list to another computation.

  --ld-xchr [code]   : Set Xchr model for --indep{-pairwise}, --r/--r2,
                       --flip-scan, and --show-tags.
                       1 (default) = males coded 0/1, females 0/1/2 (A1 dosage)
                       2 = males coded 0/2
                       3 = males coded 0/2, but females given double weighting

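There are three main parameters: the sliding-window size, the step size, and the r² threshold. The smaller the r² threshold, the more sites are removed. The command is: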

  plink --file s1_vcf --indep-pairwise 1000kb 1 0.5 --out ld
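To get a feel for what the 0.5 threshold means, here is a rough sketch that computes the squared correlation between the allele counts (0/1/2) of two SNPs with awk. The genotype values and the file name `toy_dosage.txt` are made up, and this is only the textbook Pearson formula on complete data, not a reproduction of PLINK's internal calculation (which also has to deal with missing genotypes, among other things).

```shell
# Toy allele counts (0/1/2) for two SNPs across six samples; one sample per row.
# Made-up values for illustration.
printf '0 0\n1 1\n2 2\n0 0\n1 2\n2 1\n' > toy_dosage.txt

# r^2 = cov(x, y)^2 / (var(x) * var(y)), accumulated in a single pass.
r2=$(awk '{ n++; sx += $1; sy += $2; sxx += $1*$1; syy += $2*$2; sxy += $1*$2 }
     END { mx = sx/n; my = sy/n
           cov = sxy/n - mx*my
           vx = sxx/n - mx*mx
           vy = syy/n - my*my
           printf "%.2f", cov*cov/(vx*vy) }' toy_dosage.txt)
echo "r2 = $r2"
```

For these toy values the result is 0.56, which is above the 0.5 threshold, so a pair like this inside the same window would lose one of its two SNPs.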

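When the run finishes, it produces two files, ld.prune.in and ld.prune.out. ld.prune.in lists the SNPs that passed the filter, i.e. the ones we want to keep. Each line is a SNP name (the unique identifier) taken from the second column of the map file.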
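As the help text notes, the pruning run only writes the lists; the data are unchanged until PLINK is rerun with --extract or --exclude. The sketch below uses small made-up stand-ins for the two lists and the map file (all `toy*` names and contents are assumptions) to count how many SNPs were kept and removed, and to pull the kept rows out of the map file with awk. The commented PLINK line shows how the list would typically be applied to the real dataset; the output prefix `s1_vcf_pruned` is a hypothetical name.

```shell
# Toy stand-ins for ld.prune.in, ld.prune.out and s1_vcf.map (made-up data).
printf '1_1001\n2_1001\n' > toy.prune.in
printf '1_2050\n' > toy.prune.out
printf '1\t1_1001\t0\t1001\n1\t1_2050\t0\t2050\n2\t2_1001\t0\t1001\n' > toy_pruned_src.map

# Count kept and removed SNPs; together they should equal the number of map rows.
n_in=$(awk 'END{print NR}' toy.prune.in)
n_out=$(awk 'END{print NR}' toy.prune.out)
n_map=$(awk 'END{print NR}' toy_pruned_src.map)
echo "kept=$n_in removed=$n_out total=$n_map"

# Keep only the map rows whose ID (column 2) appears in the prune.in list.
awk 'NR==FNR {keep[$1]; next} $2 in keep' toy.prune.in toy_pruned_src.map > toy_kept.map

# On the real data the list would be applied with PLINK itself, for example:
#   plink --file s1_vcf --extract ld.prune.in --recode --out s1_vcf_pruned
```

Filtering the map file alone is only useful for inspection, since the ped file must be subset consistently; that is what the --extract rerun takes care of.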

To extract data by SNP position, see another post: https://www.cnblogs.com/mmtinfo/p/11945592.html
