1. MCSCAN
References: http://chibba.pgml.uga.edu/mcscan2/MCScanX.zip http://chibba.pgml.uga.edu/mcscan2/#tm
Install: unzip MCScanX.zip && cd MCScanX && make
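Installation error:
Fix:
The error occurs because MCScanX does not support 64-bit systems as shipped. To build and run it on a 64-bit system, modify the source code: add #include <unistd.h> at the top of these three files: msa.h, dissect_multiple_alignment.h, and detect_collinear_tandem_arrays.h.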
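The header patch described above can be scripted. The following is a minimal sketch, assuming the three headers sit directly in the MCScanX source directory; the function name add_unistd_include is made up for illustration and is not part of MCScanX.

```python
from pathlib import Path

INCLUDE_LINE = "#include <unistd.h>"
# The three headers named in the note above.
HEADERS = [
    "msa.h",
    "dissect_multiple_alignment.h",
    "detect_collinear_tandem_arrays.h",
]


def add_unistd_include(source_dir):
    """Prepend INCLUDE_LINE to each header that does not contain it yet.

    Returns the names of the files that were changed, so running the
    function twice is harmless (the second run changes nothing).
    """
    patched = []
    for name in HEADERS:
        path = Path(source_dir) / name
        text = path.read_text()
        if INCLUDE_LINE in text:
            continue  # already patched
        path.write_text(INCLUDE_LINE + "\n" + text)
        patched.append(name)
    return patched

# Example call (assumes the current directory is the unpacked MCScanX folder):
# add_unistd_include(".")
```

After patching, run make again in the same directory.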
1.1 Prepare *.blast
/export/software/python-2.7.13/bin/python /home/fanjp/bin/gffStat.py -g A.hypogaea.Chrom.gene.gff3 ## extract the longest transcript of each gene
getGene.pl gffStat.out/A.hypogaea.Chrom.gene.gff3.longest.gff3 A.hypogaea.genome.fasta >A.hypogaea.genome.fasta.cds
/export/personal/zoum/bin/cds2aa.pl A.hypogaea.genome.fasta.cds >A.hypogaea.genome.fasta.pep
makeblastdb -dbtype prot -parse_seqids -in A.hypogaea.genome.fasta.pep -out A.hypogaea.genome.fasta.pep ## build the BLAST database (for between-species collinearity)
blastp -query A.hypogaea.genome.fasta.pep -db A.hypogaea.genome.fasta.pep -out Dr_An.blast -evalue 1e-5 -num_threads 16 -outfmt 6 -num_alignments 5 ## produces Dr_An.blast
1.2 Prepare *.gff
perl -lane 'if($F[2]=~/mRNA/){/ID=(.*?)\;/;print join("\t",$F[0],$1,$F[3],$F[4])}' ../A.nigrocauda/A.nigrocauda.final.gff.longest.new.gff3 >Dr_An.gff ## extract the gff from the longest transcripts to get Dr_An.gff
perl -lane 'if($F[2]=~/mRNA/){/ID=(.*?)\;/;print join("\t",$F[0],$1,$F[3],$F[4])}' ../D.rerio/D.rerio.gff >>Dr_An.gff
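For readers who prefer not to use the Perl one-liner, the same extraction can be sketched in Python. This is an illustrative equivalent, not the script used in these notes: the function name gff3_to_mcscanx is made up, the feature type is matched exactly (the one-liner uses a substring match), and an ID at the very end of the attribute column is accepted even without a trailing semicolon.

```python
import re

# Captures the value of the ID attribute in GFF3 column 9.
ID_PATTERN = re.compile(r"ID=([^;]+)")


def gff3_to_mcscanx(lines, feature="mRNA"):
    """Return (chromosome, transcript ID, start, end) for each mRNA row.

    These are the four tab-separated columns written to Dr_An.gff above.
    """
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip GFF3 comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = ID_PATTERN.search(fields[8])
        if match is None:
            continue  # no ID attribute on this row
        rows.append((fields[0], match.group(1), fields[3], fields[4]))
    return rows

# Example call (file names as in the commands above):
# with open("../D.rerio/D.rerio.gff") as handle, open("Dr_An.gff", "a") as out:
#     for row in gff3_to_mcscanx(handle):
#         out.write("\t".join(row) + "\n")
```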
1.3 Collinearity analysis
sort -nk1 A.hypogaea.genome.fasta.fai|perl -lane 'BEGIN{$a=0}{print join("\t","chr","-",$F[0],$F[0],"0",$F[1],"chr".$a);$a++}' >chr.txt
/export/personal1/mengmh/1.software/MCScanX/MCScanX/MCScanX Dr_An ## outputs Dr_An.collinearity and Dr_An.html
perl /share/erapool/personal/renpp/biosoft/circos/script/convert_McScanX_to_links.pl -i1 Dr_An.gff -i2 Dr_An.collinearity >links.txt ## produces links.txt
perl -lane 'print join("\t",@F,"color=".lc($F[3]))' links.txt ## append a seventh column with the color
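The conversion script convert_McScanX_to_links.pl is not shown in these notes, so the following is only a sketch of what such a conversion might look like, not a description of that script. It assumes the usual MCScanX .collinearity layout (header lines start with "#", and each pair line looks like "  0-  0:<TAB>geneA<TAB>geneB<TAB>e-value") and emits one six-column link per gene pair; the real script may instead emit one link per collinear block. Both function names are hypothetical.

```python
def read_gene_positions(gff_lines):
    """Map gene ID -> (chromosome, start, end) from the 4-column MCScanX gff."""
    positions = {}
    for line in gff_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        chrom, gene, start, end = fields[:4]
        positions[gene] = (chrom, int(start), int(end))
    return positions


def collinearity_to_links(collinearity_lines, positions):
    """Return (chr1, start1, end1, chr2, start2, end2) for each collinear pair."""
    links = []
    for line in collinearity_lines:
        if line.startswith("#") or ":" not in line:
            continue  # parameter and alignment header lines
        pair = line.split(":", 1)[1].split()
        if len(pair) < 2:
            continue
        gene_a, gene_b = pair[0], pair[1]
        if gene_a not in positions or gene_b not in positions:
            continue  # gene missing from the gff, so no coordinates
        links.append(positions[gene_a] + positions[gene_b])
    return links
```

Each tuple can be written out tab-separated to give the six-column links format used in the links section of these notes.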
2. circos
References: http://circos.ca/software/download/ http://circos.ca/documentation/tutorials/ https://www.jianshu.com/p/17117766573a http://blog.sina.com.cn/s/blog_485b444b0102whp4.html https://www.jianshu.com/p/e7ebb8f0100c
Install: tar xf circos-0.69-9.tgz -C ./ && ./circos-0.69-9/bin/circos -h && circos -modules
/share/erapool/personal/renpp/biosoft/circos/circos-0.69-9/bin/circos -h
Usage:
source /share/erapool/personal/renpp/.bashrc
circos -conf circos.conf
2.1 circos workflow and main configuration (1. configuration files; 2. input files)
2.2 Basic circos configuration file: variables
karyotype = data/karyotype/karyotype.human.txt
<ideogram>
<spacing>
default = 0.005r
</spacing>
radius = 0.90r
thickness = 20p
fill = yes
stroke_color = dgrey
stroke_thickness = 2p
</ideogram>
<image>
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>> ###colors.ucsc.conf 1500p
<<include etc/housekeeping.conf>>
2.3 karyotype: seven columns: chr - ID LABEL START END COLOR
perl -lane '{$a+=1;print join("\t","chr","-",$F[0],$F[0],"0",$F[1],"chr$a")}' ../0HWJHB.final_Chr.fasta.fai|head -24 >chr.txt
perl -lane '{$a+=1;print join("\t","chr","-",$F[0],$a,"0",$F[1],"chr$a")}' ../3D.rerio.fna.fai |head -25 >>chr.txt
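The seven-column karyotype rows can also be built from a .fai index in Python. This is a minimal sketch under stated assumptions: the function name fai_to_karyotype and its parameters are made up, the label is set to the sequence name, and colors are numbered chr1, chr2, ... in file order as in the one-liners above.

```python
def fai_to_karyotype(fai_lines, max_chromosomes=None, first_color=1):
    """Build karyotype rows: chr - ID LABEL START END COLOR.

    fai_lines are lines of a samtools .fai index (name, length, ...).
    first_color lets a second species continue the color numbering
    instead of starting again at chr1.
    """
    rows = []
    for line in fai_lines:
        if not line.strip():
            continue
        if max_chromosomes is not None and len(rows) >= max_chromosomes:
            break  # same effect as piping through head
        fields = line.rstrip("\n").split("\t")
        name, length = fields[0], int(fields[1])
        color = "chr%d" % (first_color + len(rows))
        rows.append("\t".join(["chr", "-", name, name, "0", str(length), color]))
    return rows

# Example call (file name as in the commands above):
# with open("../0HWJHB.final_Chr.fasta.fai") as handle:
#     print("\n".join(fai_to_karyotype(handle, max_chromosomes=24)))
```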
karyotype = ./chr.txt
chromosomes_units = 2000000
chromosomes_display_default = yes
chromosomes = -ContigUN;
#chromosomes = hs1;hs2;hs3;-hs4;hs5:1-100;-hs6:50-);/hs[7-9]$/
#chromosomes_reverse = hs2;hs3;/hs[234]/
#chromosomes_order = hs1;hs3;hs2;^,hs5;hs4,$
#chromosomes_colors = hs1=red,hs2=orange,hs3=green,hs4=blue
#chromosomes_radius = hs1:0.5r;hs2:0.55r;hs3:0.6r
#chromosomes_scale = /hs[234]/=0.5rn
2.4 ideogram
<ideogram>
<spacing>
default = 0.005r
<pairwise Chr01;Chr02>
spacing = 4r
</pairwise>
</spacing>
#position configuration
radius = 0.80r
thickness = 20p
fill = yes
#fill_color = black
stroke_thickness = 3
stroke_color = dgrey
#label configuration
show_label = yes
label_font = default
label_radius = dims(ideogram,radius) + 0.065r
label_size = 30
label_parallel = yes
#band configuration
# show_bands = yes
# fill_bands = yes
# band_stroke_thickness = 2
# band_stroke_color = white
# band_transparency = 0
</ideogram>
2.5 ticks
show_ticks = yes
show_tick_labels = yes
<ticks>
radius = dims(ideogram,radius_outer)
color = black
thickness = 2p
multiplier = 1e-6
format = %d
<tick>
spacing = 1u
size = 10p
color = lgrey
show_label = no
</tick>
<tick>
spacing = 5u
size = 15p
show_label = yes
label_size = 20p
label_offset = 10p
format = %d
</tick>
</ticks>
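How spacing, chromosomes_units, multiplier and format combine into tick labels can be checked with a little arithmetic. The sketch below assumes, as the Circos tick documentation describes, that a spacing of Nu means N times chromosomes_units base pairs and that the label is the position times multiplier, printed with format. The function name tick_labels is made up for illustration.

```python
def tick_labels(chrom_length, chromosomes_units, spacing_units, multiplier, fmt="%d"):
    """Return (position in bp, label text) for every tick on one chromosome."""
    step = spacing_units * chromosomes_units  # e.g. 5u with units of 2 Mb = 10 Mb
    return [
        (position, fmt % (position * multiplier))
        for position in range(0, chrom_length + 1, step)
    ]

# With the settings in these notes (chromosomes_units = 2000000, spacing = 5u,
# multiplier = 1e-6) a 25 Mb chromosome gets labeled ticks every 10 Mb:
# tick_labels(25_000_000, 2_000_000, 5, 1e-6)
```

With these values the labeled ticks fall at 0, 10 and 20 Mb, and the unlabeled 1u ticks fall every 2 Mb.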
2.6 plots
<plots>
<plot>
type = histogram
file = ./chr1.txt.11
r0 = 0.70r
r1 = 0.75r
max = 1
min = 0
orientation = out
fill_color = blue ## other plot types: line, heatmap, histogram
</plot>
<plot>
type = heatmap
file = ./chr1.txt.12
r0 = 0.60r
r1 = 0.65r
max = 1
min = 0
color = yellow
</plot>
<plot>
type = scatter ## alternatives: scatter, line, heatmap
file = ./chr1.txt.13
r0 = 0.50r
r1 = 0.55r
max = 1
min = 0
fill_color = black
stroke_color = black
</plot>
</plots>
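The data files behind these plots (chr1.txt.11 and so on) are not shown in the notes. Circos 2D data tracks take rows of the form "chr start end value", so one way such a file might be produced is a windowed gene density scaled to the 0 to 1 range used by min and max above. This is a hypothetical sketch: the function name window_density and the choice of gene density as the plotted value are assumptions, not something stated in the notes.

```python
from collections import Counter


def window_density(genes, window=1_000_000):
    """Count genes per fixed-size window and scale the counts to 0..1.

    genes is an iterable of (chromosome, start, end); each gene is
    assigned to the window containing its midpoint. Returns rows of
    (chromosome, window_start, window_end, value) sorted by position.
    """
    counts = Counter()
    for chrom, start, end in genes:
        midpoint = (start + end) // 2
        counts[(chrom, midpoint // window)] += 1
    if not counts:
        return []
    peak = max(counts.values())
    rows = []
    for (chrom, index), count in sorted(counts.items()):
        # Note: the last window may run past the chromosome end; clip it
        # to the karyotype length before writing the file for Circos.
        rows.append((chrom, index * window, (index + 1) * window - 1, count / peak))
    return rows
```

Writing each row tab-separated gives a file that can be referenced by the file parameter of a plot block.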
2.7 links.conf
Example 1: hs1 400 550 hs3 500 750 color=red ## 1. seven columns; the seventh column sets the color of links between different chromosomes
<links>
<link>
file = ref/MCSCAN/Dr_An_links.txt
radius = 0.90r
bezier_radius = 0r
color = black_a4
thickness = 2
</link>
</links>
Example 2: hs1 400 550 hs3 500 750 ## six columns
<rules>
<rule>
condition = var(intrachr) ## do not show links within a single chromosome
show = no
</rule>
<rule>
condition = 1
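color = eval(var(chr2)) # 2. every link takes the color named after its second chromosome, i.e. the chromosome at the end of the link
# Writing color = chr2 instead would give every link the color of chromosome 2.
# Prerequisite: the color configuration (e.g. etc/colors.ucsc.conf) must define these names as color aliases (chr1, chr2, ...), and at least one chromosome in the karyotype file must use such an alias.
# 3. Colors can also be taken from a list: https://www.jianshu.com/p/3fd9175abad0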
flow = continue
</rule>
<rule>
condition = between(hs1,hs2) ### color of the links between these two chromosomes
color = green
z = 10
flow = continue
</rule>
<rule>
condition = between(hs2,hs3)
color = blue
thickness = 4
z = 15
</rule>
</rules>
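Functions supported in condition:
1. var: returns the value of the named field; the return value is one of three types: string, number, or boolean
CHRn var(chr1), var(chr2) # string
STARTn var(start1), var(start2)
SIZEn var(size1), var(size2) # number
INTERCHR var(interchr) returns 1 if the two ends of a link lie on two different chromosomes # boolean
INTRACHR var(intrachr) returns 1 if the two ends of a link lie on the same chromosome
2. between
condition = between(hs1, hs2)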
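To make the three conditions concrete, here is a small Python sketch of what they test on a six-column link. It only mirrors the meaning of the Circos conditions described above and is not Circos code; the function names simply reuse the condition names, and a link is represented as a tuple (chr1, start1, end1, chr2, start2, end2).

```python
def intrachr(link):
    """True when both ends of the link lie on the same chromosome."""
    return link[0] == link[3]


def interchr(link):
    """True when the two ends lie on different chromosomes."""
    return link[0] != link[3]


def between(link, chr_a, chr_b):
    """True when the link joins chr_a and chr_b, in either orientation."""
    return {link[0], link[3]} == {chr_a, chr_b}

# Example: the link from the notes, hs1 400 550 hs3 500 750
# link = ("hs1", 400, 550, "hs3", 500, 750)
# interchr(link) is True, and between(link, "hs3", "hs1") is True as well.
```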
http://circos.ca/documentation/tutorials/links/rules2/ https://www.jianshu.com/p/3fd9175abad0 ##links