Calculating selection pressure with PAML


  •  Introduction

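       PAML (Phylogenetic Analysis by Maximum Likelihood) is a software package developed by Professor Ziheng Yang (楊子恆) of University College London for phylogenetic analysis of protein and nucleic acid sequences by maximum likelihood. It is free for academic use. Professor Yang maintains and distributes the ANSI C source code for UNIX/Linux/Mac OS X and precompiled executables for MS Windows.

PAML can build phylogenetic trees, estimate ancestral sequences, simulate sequence evolution, and calculate Ka/Ks (dN/dS), among other things. Branch-specific and site-specific Ka/Ks estimation is the package's signature feature.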

After downloading the package, build and install it as follows:

cd paml4.5/
rm bin/*.exe        # remove the bundled Windows executables
cd src
make -f Makefile    # compile the programs
ls -l
rm *.o              # clean up the object files
mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin

 

  •  Example

1. codeml

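Two input files are required.

1) A FASTA alignment of the CDS sequences (the length must be a multiple of 3, and stop codons must be removed);

*The file must already be aligned. I aligned with ClustalW and exported FASTA. PAML is said to read PHYLIP format as well, but it did not accept my PHYLIP file, and I am not sure why.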
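Before converting anything, it can help to check the two requirements above automatically. The following is only a minimal sketch: the function names `read_fasta` and `check_codon_alignment` are made up for this post, and the stop codon set assumes the standard genetic code.

```python
# Stop codons under the standard genetic code (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text):
    """Parse FASTA text into a dict of name -> upper-case sequence."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def check_codon_alignment(seqs):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len({len(s) for s in seqs.values()}) > 1:
        problems.append("sequences differ in length (not aligned?)")
    for name, seq in seqs.items():
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
        # Walk the complete codons only; a trailing partial codon is ignored here.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3].replace("U", "T")
            if codon in STOP_CODONS:
                problems.append(f"{name}: stop codon {codon} at codon {i // 3 + 1}")
    return problems
```

Read the alignment with `read_fasta(open(path).read())` and fix whatever `check_codon_alignment` reports before moving on.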

Once you have the FASTA file, EasyCodeML can convert it to PAML format (replacing U with T). Put the FASTA file in inPath and run the following command.

java -cp EasyCodeML.jar SeqFormatConvert.seqFactory.SeqConverter -i inPath/ -iF fasta -o outPath/ -oF PAML
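If EasyCodeML is not at hand, the same conversion can be approximated in a few lines. This is a rough sketch, not what EasyCodeML does internally: `fasta_to_paml` is a hypothetical helper that mimics the layout of the alignment sample shown later in this post (a header line with the number of sequences and the alignment length, then each name on its own line followed by its sequence).

```python
def fasta_to_paml(fasta_text):
    """Convert aligned FASTA text to a sequential PAML-style layout."""
    names, chunks = [], []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:].split()[0])
            chunks.append([])
        elif chunks:
            # Upper-case the residues and replace U with T, as described above.
            chunks[-1].append(line.upper().replace("U", "T"))
    seqs = ["".join(parts) for parts in chunks]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    out = [f"{len(seqs)}    {len(seqs[0])}"]
    for name, seq in zip(names, seqs):
        out.append(name)
        out.append(seq)
    return "\n".join(out) + "\n"
```

Write the returned text to the file named by `seqfile` in the control file.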

2) The tree file

Build the tree file with PHYLIP.

Put both files in the PAML bin directory.

 

Step 1:

Set up the control file

Open codeml.ctl; its contents are as follows:

      seqfile = dm_cds.pml   * sequence data filename (the alignment file)
     treefile = outtree      * tree structure file name (the tree file)
      outfile = dm2.mlc      * main result file name (the output file)

        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise
                   * choose 0 to use your own tree

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
                   * dN/dS is estimated from codon data, so use 1
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

*        ndata = 10
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
       aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
   aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                   * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

        model = 0
                   * models for codons:
                       * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                   * models for AAs or codon-translated AAs:
                       * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                       * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)
                   * 0 assumes one dN/dS for every branch of the tree;
                   * 2 allows two or more different dN/dS ratios among branches

The alignment file looks like this:

5    4609
RPP4
-----ATGGCTTCTTCTTCTTCTTCTCCTAGTAGCCGGAGATACGACGTTTTCCCAAGCTTCAGTGGGGTAGATGTTCGCAAAACGTTCCTCAGCCATCTAATCGAGGCGCTCGACCGCAGATCAATCAATACA-TTCATGGATCACGGCATCG---TGAGAAGCTGCATAATCGCCGATGCGCTTATAACGGCCATTAGAGAAGCGAGGATCTCAATAGTCATCTTCTCTGAGA-ACTATGCTTCTTCAACGTGG------TGCTTGAATGAATTGGTGGAGATCCACAAG-TGTTACAAGAAAGGGGAACAAATGGTGATTCCGGTTTTCTACGGCGTTGATCCTTCTC-------ATGTTAGAAAACAGATCGGTGGCTTTGGCGATGTCTTTAAAAAGACATGCGAGGACA------------AACCAGAGGATCAGAAACAAAGA--TGGGTTAAAGCTCTCACAGATATATCAAATTTAGCCGGGGAGGATCTTCGGAACGGGCCTACTGAAGCG---TTTATGGTTAAAAAGATAGCCAATGATGTTTCGAATAAACTTT--TTCCTCTGCCAAAGGGTTTTGGTGACTTCGTCGGAATTGAAGATCATATAAAGGCAATAAAATC-AATACTTTGCTTGGAATC--CAAGGAAGCTAGAATAATGGTCGGGATTTGGGGACAGTCAGGGATTGGTAAGAGTACCATAGGAAG----AGCTCTTTTCAGTCAACTCTCTAGCCAGTTCCACCATCGCGCTTTCATAACTTATAAAAGCACCAGTGGTAGTGACGTCTCTGGCATGAAGTTGAGTTGGGAAAAAGAACTT----CTCTCGGAAATCTTAGGTCAAAAGGACATAAAGATAGATCATTTTGGTGTGG-TGGAGCAAAGGTTAAAGCACAA--GAAAGTTCTTATCCTTCTTGATGATGTGGATAATCTAGAGTTT---CTTAAGACCTTGGTGGGAAAAGCTGAATGGT---TTGGTTCTGGAA-GCAGAATAATTGTGATCACTCA----AGATAAGCAACTTCTCAAGGCTCATGAGATTGACCTTGTATATGAGGTG-GAGCTGCCATCTCAAGGTCTTGCTCTTAAGATGATATCCCAATATGCTTTTGGGAAAGACTCTCCACCTGATGATTTTAAGGAACTAGCATTTGAAGTTGCCGAGCTTGTCGGTAGTCTTCCTTTGGGTCTCAGTGT-----CT---TGGGTTCATCTTTA-AAAGGAAGG----GACAAAGATGAGTGGGTGAAGATGATGCCTAGGCTTCGAAATGATTCAGATGATAAAATTGAGGAAACACTAAGAGTCGGCTACG-ATAGGTTAAATAAAAAAAA--TAGAG--AGTTATTTAAGTGCATTGCATGTTTTTTCAATGGTTTTAAAGTC--------AGTAACGTCAAAGAATTACTTGAAGATGATGTTGGGCTTACAATGTTGGCTGATAAGTCCCTCATACGTATTACACCGGATGGAGATATAGAGATGCACAATTTGC---TAGAGAAATTGGGTAGAGAAATTGATCGTGCAAAGTCCAAGGGTAATCCTGCAAAACGTCAATTTCTGACGAATTTTGAGGATATTCAAGAAGTAGTGACCGAGAAAACTGGGACAGAAACTGTTCTTGGAATACGTGTGCCACCCACGGTATTATTTTCGACAAGG-CCGTTATTAGTAATAAACGAAGAATCGTTCAAAGGCATGCG---TAATCTCCAATATCTAGAAATTGGTCATTGGTCAGAAATTGGTCTTTGGTCAGAAATTGGTCTTTGGTCAAAAATAGATCTACCT--CAGGGCCTCGTTTATTTGCCCCTTAAACTCAAATTGCTA-AAATGGAATTATTGTCCATTGAAGTC--TTTGCCATCT--ACTTTTAAGGCGGAATATCTAG-TTAACCTCATAATGA-A-GTATAGTAAGCTTGAGAA--ACTGT--GGGAAGGAACTCTGCCC
CTTGGAAG------TCTCAAGAAGATGGA--TTTGGGGTGTTCCAACAATTTGAAA-------GAAATTCCAGA---TCTTTCTTTAGCCAT-AAACCTCGAGGAATTAAATCTTTCTAAATGCGA-ATCTT---TGGTGACACT-TCCTTCCTCGAT-TCAGAATGCCATTAAACTGAGGACGTTATATTGTTCGGGGGTGCTATTAAT-AGATTTAAAATCATTAGAAGGCATGTGTAATCTCGAAT---ATCTATCAGTTGATTGGT---CAAGTATGGAAGGCACTCAAGGCCTCA--TTTACTTGCCACGTAAACTCAAAAGGCTATGGT---GGGATTATTGTC-----------CAGTGAAGCGTTTGCCTTCTAATTT------------TAAGGCTGAGTATCTAGTTGAACTCAGAATGGAGAATAGTGACCTTGAGAAGCTGTGGGATGGAACTCAGCCACTTGGAAGCCTCAAGGAGATGTA-------------TCTGCATGGTTCCAAATAT--TTGAAAGAAAT------TCCAGATCTTTCTTTAGCCATAAACCTGGAGAGACTATAT-CTTTTTGGATGCGAATCTTTGGTGACACTTCCTTCCTCGATTCAGAATGCCACTAAATTGATCAATTTAGA-TATGAGAGATTGCAAAAAGCTAGAGAGTTTTCCAACCGATCTCAACTTGGAATCTCTCGAGTACCTCAATCTCACTGGATGCCCGAATTTGAGAAATTTCCCAGCAATCAAAATG--GGATGTTCATACTTTGAAATTCTGCAAGATAGAAATGAGATCGAGGTAGAAGATTGT-TTCTGGAACAAG-AATCTCCCTGCTGGACTAGATTATCTCGACT-----GCCTTATGAGGTGTATGCCTTGTGAATTTCGCCCAGA----ATATCTCACTTTTCTCGATGTGAGCGGCTGCAA--GCATG--AGAAGCTATGGGAAGGCATCCAGTCGCTTGGAAG----------TCTCAAGAGGATGGATCTGTCAGAATCTGA--AAACCTGACAGAAATTCCAGATCTTTCGAAGGCCACCAATCTGAAGCGTTTATATCTCAACGGGTGCAAAAGTTTGGTGAC-ACTTCCTTCTACAATTGGGAATCTTCATAGATTGGTGAGGTTGGAAATGAAAGAATGCACAGGGCTGGAGCTTCTTCCAACCGATGTCAACTTGTCATCTCT------------------------------------------------------------TATCA------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TCCTCGATCTCAGTGGTTGCTCAAGTCTGAGAACTTTTCCTCTGATTTCAACTAGAATCGAATGTCTCTATCTAGAAAACACCGCCATTGAAGAAGTTCCCTGCTGCATTGAGGATTTAACGAGGCTCAGTGTACTACTGATGTATTGTTGCCAGAGGTTGAAAAACATCTCCCCAAACATTTTCAGACTGACAAGTCTAATGGTCGCCGACTTTACAGACTGTAGAGGTGTCATCAAGGCGTTGAGTGATGCAACTGTGGTAGCGACAATGGAAGATCATGTTTCTTGTGTACCATTATCTGAAAACATTGAATATACATGTGAACGTTTCTGGGATGAGTTGTATGAAAGAAAT--TCCAGATCTA--TCTTTAGCTATAAAGATGAGGATGGCGACGTATATTGGGTAAA------TTGGGACTTA----ATGATGATGCTG-ATGTTGATA---------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RPP5
-----ATGGCGGCTTCTTCTTCTT---CTGGCAGACGGAGATACGACGTTTTTCCAAGCTTCAGTGGGGTTGATGTTCGCAAGACGTTCCTCAGCCATCTAATCGAGGCGCTCGACGGCAAATCAATCAATACA-TTCATCGATCATGGAATCG---AGAGAAGCCGCACAATCGCCCCTGAGCTTATATCGGCGATTAGAGAAGCTAGGATCTCAAT

 

The tree file looks like this:

(RPP5:0.05837,(RPP27:0.64437,(RPP8:0.51359,RPP13:0.22395):0.18814):0.42283,RPP4:0.04920);

 

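Step 2:

Hypothesis testing

Null hypothesis: all branches of the tree share the same dN/dS.

Alternative hypothesis: a branch you designate has a different dN/dS (mark it in the tree file by placing $1 or #1 at the appropriate position; see the Bilibili video for details).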
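As an illustration of the labelling, here is a small sketch that marks one tip branch as the foreground branch. The helper `label_foreground_tip` is hypothetical, and picking RPP4 as the foreground is only an example, not a recommendation. Branch lengths are dropped to keep the label placement unambiguous, on the assumption that codeml estimates them again for a user tree; the regular expressions also assume a Newick string without spaces or internal node names.

```python
import re


def label_foreground_tip(newick, tip, label="#1"):
    """Strip branch lengths and append a branch label after one tip name."""
    # Remove ":<number>" branch lengths, e.g. ":0.05837".
    topology = re.sub(r":[0-9.eE+-]+", "", newick)
    # Match the tip only when it is a whole name between "(" or "," and "," or ")".
    pattern = r"(?<=[(,])" + re.escape(tip) + r"(?=[,)])"
    labelled, count = re.subn(pattern, lambda m: f"{tip} {label}", topology)
    if count != 1:
        raise ValueError(f"expected exactly one tip named {tip!r}, found {count}")
    return labelled


tree = ("(RPP5:0.05837,(RPP27:0.64437,(RPP8:0.51359,RPP13:0.22395)"
        ":0.18814):0.42283,RPP4:0.04920);")
print(label_foreground_tip(tree, "RPP4"))
# (RPP5,(RPP27,(RPP8,RPP13)),RPP4 #1);
```

The labelled tree is what the model = 2 run should read; the model = 0 run can use the unlabelled tree.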

 

Run PAML (codeml) twice, once with model = 0 and once with model = 2, then compute the p-value from the lnL values in the two result files.
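The two runs differ only in a couple of control-file settings, so the second control file can be derived from the first. A minimal sketch, where `set_ctl_option` and the output name `dm2_model2.mlc` are invented for illustration:

```python
import re


def set_ctl_option(ctl_text, key, value):
    """Return ctl_text with the first active 'key = value' line changed.

    Lines that start with '*' are comments in a PAML control file and are
    never matched, because the key must be the first token on the line.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*)(\S+)", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value), ctl_text, count=1
    )
    if count == 0:
        raise KeyError(f"option {key!r} not found in control file")
    return new_text


# Tiny stand-in for the real codeml.ctl contents.
base_ctl = (
    "      outfile = dm2.mlc   * main result file name\n"
    "        model = 0\n"
)
alt_ctl = set_ctl_option(base_ctl, "model", 2)
alt_ctl = set_ctl_option(alt_ctl, "outfile", "dm2_model2.mlc")
```

Giving each run its own `outfile` keeps the second run from overwriting the first result.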

For example: model = 0 gives lnL = lnr1

  model = 2 gives lnL = lnr2
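Both numbers can be pulled out of the result files with a short script instead of by eye. The sketch below assumes the result file contains a line of the form `lnL(ntime: ...  np: ...):  <value>`, which is how codeml output usually looks; treat that layout as an assumption and check it against your own .mlc file. The function name `read_lnl_and_np` is made up.

```python
import re

# Assumed layout of the summary line in a codeml result file.
LNL_LINE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+(?:\.\d+)?)")


def read_lnl_and_np(mlc_text):
    """Return (lnL, np) from the first matching line of a codeml result file."""
    match = LNL_LINE.search(mlc_text)
    if match is None:
        raise ValueError("no lnL line found; check the output format")
    return float(match.group(3)), int(match.group(2))
```

Call it once per result file, for example `read_lnl_and_np(open("dm2.mlc").read())`, to get lnr1 and np1, then again on the model = 2 output for lnr2 and np2.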

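The test statistic is q = 2 × abs(lnr1 - lnr2), with degrees of freedom df = np2 - np1, where np1 and np2 are the numbers of parameters in the model = 0 and model = 2 runs.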

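Use the chi2 program that ships with PAML to get the p-value:

./chi2 df q

If the p-value is below 0.05, reject the null hypothesis and accept the alternative, i.e. the branch you designated does have a different dN/dS.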
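The whole likelihood ratio test can also be checked without the chi2 binary. This is a self-contained sketch using only the standard library; `chi2_sf` and `lrt` are names invented here, and the closed-form tail formulas only cover integer degrees of freedom, which is all this test needs.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability P(X >= q) for chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if q <= 0:
        return 1.0
    half = q / 2.0
    if df % 2 == 0:
        # Even df: exp(-q/2) * sum_{k < df/2} (q/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(q/2)) plus a finite correction series.
    term = math.sqrt(2.0 * q / math.pi) * math.exp(-half)
    total = 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= q / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(half)) + total


def lrt(lnl_null, np_null, lnl_alt, np_alt):
    """Return (q, df, p) for a likelihood ratio test between nested models."""
    q = 2.0 * abs(lnl_alt - lnl_null)
    df = np_alt - np_null
    return q, df, chi2_sf(q, df)
```

Calling `lrt(lnr1, np1, lnr2, np2)` returns the same q and df you would pass to `./chi2`, together with the p-value.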

 

 

 


 


