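Conservation is an evolutionary concept: genome sequences are the product of continual evolution. By building a multiple alignment of the genomes of 45 vertebrates against the human genome, we can measure how conserved each position of the human genome is, and highly conserved regions lend themselves to many interesting downstream analyses.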
This directory contains compressed phastCons scores for multiple alignments of 45 vertebrate genomes to the human genome, plus an alternate set of scores for the primates subset of species in the alignments, and an alternate set of scores for the placental mammal subset of species in the alignments.
Download location (primates subset): http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons46way/primates/
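Beginning of chr1.phastCons46way.primates.wigFix:
fixedStep chrom=chr1 start=10918 step=1
0.254
0.253
0.251
0.249
0.247
0.244
0.242
0.239
0.236
0.233
0.230
0.226
0.223
0.219
0.215
0.210
The header line says that the scores start at base 10,918 of chr1 (wiggle coordinates are 1-based) and advance one base per value (step=1); each following line is the phastCons score of one base. This format is awkward to work with directly, so it is worth converting it to a BED-style format such as bedGraph. Reference: Sequence conservation in vertebrates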
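If all you need is a BED-style table, the fixedStep layout is simple enough to convert directly. The function below is a minimal sketch of that idea, not the route this post takes (which goes through the UCSC tools): it reads a fixedStep file, plain or gzipped, and prints one bedGraph line per value, shifting the 1-based wiggle start to BED's 0-based, half-open coordinates. The name `wigfix_to_bedgraph` is hypothetical, and the sketch assumes the file contains only fixedStep sections (no variableStep or track lines).

```shell
# Convert a UCSC fixedStep wiggle file (plain or .gz) to 4-column bedGraph:
# chrom, 0-based start, end, score. Sketch: assumes fixedStep sections only.
wigfix_to_bedgraph() {
    local input="$1"
    # Decompress .gz input on the fly; read anything else as plain text.
    case "$input" in
        *.gz) gzip -dc -- "$input" ;;
        *)    cat -- "$input" ;;
    esac | awk '
        BEGIN { OFS = "\t" }
        # Header line, e.g. "fixedStep chrom=chr1 start=10918 step=1".
        /^fixedStep/ {
            step = 1; span = 1                      # defaults if absent
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "chrom")      chrom = kv[2]
                else if (kv[1] == "start") pos = kv[2] - 1   # 1-based to 0-based
                else if (kv[1] == "step")  step = kv[2] + 0
                else if (kv[1] == "span")  span = kv[2] + 0
            }
            next
        }
        # Value line: one score covering [pos, pos + span).
        NF == 1 {
            print chrom, pos, pos + span, $1
            pos += step
        }
    '
}
```

For example, `wigfix_to_bedgraph chr1.phastCons46way.primates.wigFix.gz > chr1.phastCons46way.primates.bedGraph`. It writes one line per scored base, so expect large files for the big chromosomes.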
Batch-download the files for chr1 to chr22:
for i in $(seq 1 22); do
    echo "$i"
    wget http://hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons46way/primates/chr${i}.phastCons46way.primates.wigFix.gz
done
Supporting file: the hg19 chromosome sizes, which wigToBigWig needs:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
Format conversion
# convert to bigWig
for file in *.gz; do
    base=$(basename "$file" .wigFix.gz)
    echo "$file"
    ./wigToBigWig "$file" hg19.chrom.sizes "${base}.bw"
done
# convert to bedGraph
for file in *.bw; do
    base=$(basename "$file" .bw)
    echo "$file"
    ./bigWigToBedGraph "$file" "${base}.bedGraph"
done
# rm *.bw *.wigFix.gz
Then use bedtools for the downstream operations.
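bedtools commands such as `map` expect their inputs sorted by chromosome and then by start position. The conversion loop leaves one bedGraph per chromosome; here is a minimal sketch for combining them into one sorted genome-wide file. The name `merge_bedgraphs` is hypothetical. Sort your own region BED file with the same `LC_ALL=C sort -k1,1 -k2,2n` so that both files use the same chromosome order.

```shell
# Concatenate per-chromosome bedGraph files and sort them by chromosome,
# then numeric start, into a single file.
# Usage: merge_bedgraphs OUTPUT INPUT...
merge_bedgraphs() {
    local output="$1"
    shift
    # LC_ALL=C: byte-wise, locale-independent ordering of chromosome names.
    # -k1,1: sort on chromosome; -k2,2n: then on start position, numerically.
    cat -- "$@" | LC_ALL=C sort -k1,1 -k2,2n > "$output"
}
```

For example, `merge_bedgraphs hg19.phastCons46way.primates.bedGraph chr*.phastCons46way.primates.bedGraph`. With C-locale ordering, chr10 sorts before chr2; that is fine as long as every file you pass to bedtools is sorted the same way.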
Getting the score of a given region
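With bedtools, the usual one-liner is `bedtools map -a regions.bed -b chr1.phastCons46way.primates.bedGraph -c 4 -o mean` (both files sorted), where `regions.bed` is a hypothetical file holding your regions of interest. Note that `-o mean` gives every overlapping bedGraph record the same weight, however many bases it covers. The awk sketch below, a hypothetical helper named `region_mean_score`, instead computes a base-weighted mean per region and ignores bases that have no score. It keeps the regions in memory and streams the bedGraph, checking every line against every region, so it suits a modest number of regions rather than genome-wide tiling.

```shell
# Base-weighted mean score of each BED region, from a 4-column bedGraph.
# Usage: region_mean_score REGIONS.bed SCORES.bedGraph
# Prints: chrom, start, end, mean ("NA" if no scored base overlaps).
region_mean_score() {
    local regions="$1" scores="$2"
    awk '
        BEGIN { OFS = "\t" }
        # First file (regions): keep every region in memory.
        FILENAME == ARGV[1] {
            n++; rc[n] = $1; rs[n] = $2 + 0; re[n] = $3 + 0
            next
        }
        # Second file (bedGraph): skip lines with fewer than four columns.
        NF < 4 { next }
        # Add score * overlapping bases to every region this interval hits.
        {
            s = $2 + 0; e = $3 + 0
            for (i = 1; i <= n; i++) {
                if (rc[i] != $1) continue
                lo = (s > rs[i]) ? s : rs[i]
                hi = (e < re[i]) ? e : re[i]
                if (hi > lo) { sum[i] += $4 * (hi - lo); bases[i] += hi - lo }
            }
        }
        END {
            for (i = 1; i <= n; i++)
                print rc[i], rs[i], re[i], (bases[i] > 0 ? sum[i] / bases[i] : "NA")
        }
    ' "$regions" "$scores"
}
```

For example, `region_mean_score regions.bed chr1.phastCons46way.primates.bedGraph` prints one line per region, in the order of `regions.bed`.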
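Appendix:
Tool downloads (the wigToBigWig and bigWigToBedGraph binaries called as ./wigToBigWig and ./bigWigToBedGraph in the conversion script; make them executable with chmod +x after downloading):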
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
To be continued