Finding and Predicting ORFs with TransDecoder

Reposted article. 2018-02-01 14:58. Category: Transcriptome

1. Download and installation

Download: https://github.com/TransDecoder/TransDecoder/releases

Run make in the installation directory to compile.

$ make

2. Usage

Predicting coding regions from a FASTA file:

Step 1: Extract the long open reading frames

$ TransDecoder.LongOrfs -t target_transcripts.fasta

Step 2 (optional): BlastP and Pfam searches

BlastP search: search a protein database, either Swissprot (fast) or Uniref90 (slow but more comprehensive)

$ blastp -query transdecoder_dir/longest_orfs.pep -db uniprot_sprot.fasta -max_target_seqs 1 -outfmt 6 -evalue 1e-5 -num_threads 10 > blastp.outfmt6
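
Before moving on, it can help to check how many ORFs received at least one hit. The following is a minimal sketch, assuming the standard 12-column tabular layout of -outfmt 6 (query ID in column 1). The ORF IDs and hit rows are made-up sample data standing in for a real blastp.outfmt6.

```shell
# Hypothetical sample standing in for blastp.outfmt6 (tab-separated, 12 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
tmp=$(mktemp -d)
printf 'orf1.p1\tsp|P12345|AAA_HUMAN\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-60\t210\n'  >  "$tmp/blastp.outfmt6"
printf 'orf2.p1\tsp|Q67890|BBB_MOUSE\t40.2\t98\t55\t2\t3\t100\t10\t105\t3e-08\t55.1\n' >> "$tmp/blastp.outfmt6"
printf 'orf2.p1\tsp|Q67890|BBB_MOUSE\t38.0\t50\t30\t1\t120\t169\t200\t249\t2e-06\t48.0\n' >> "$tmp/blastp.outfmt6"

# Count distinct query IDs; one query may appear on several rows (multiple alignments).
n_orfs_with_hits=$(awk -F'\t' '!seen[$1]++ { n++ } END { print n+0 }' "$tmp/blastp.outfmt6")
echo "ORFs with at least one BlastP hit: $n_orfs_with_hits"
```

On real data, replace "$tmp/blastp.outfmt6" with the file written by the blastp command above.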

Pfam search: peptide or protein domain prediction; requires hmmer3 and the Pfam database to be installed

$ hmmscan --cpu 8 --domtblout pfam.domtblout /path/to/Pfam-A.hmm transdecoder_dir/longest_orfs.pep
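
A similar check works for the Pfam results. This is a rough sketch, assuming the usual hmmscan --domtblout layout: whitespace-separated columns, comment lines starting with #, the Pfam family in column 1 and the query (ORF) name in column 4. The rows below are illustrative sample data, not real search results.

```shell
# Hypothetical sample standing in for pfam.domtblout.
tmp=$(mktemp -d)
cat > "$tmp/pfam.domtblout" <<'EOF'
# target name  accession  tlen  query name  accession  qlen  E-value  score  bias  #  of  c-Evalue  i-Evalue  score  bias  hmm-from  hmm-to  ali-from  ali-to  env-from  env-to  acc  description
Pkinase  PF00069.28  264  orf1.p1  -  350  1.2e-50  170.3  0.0  1  1  3.0e-54  1.2e-50  170.3  0.0  1  264  20  280  20  280  0.95  Protein kinase domain
PH       PF00169.32  105  orf1.p1  -  350  4.0e-12   46.1  0.1  1  1  1.0e-15  4.0e-12   46.1  0.1  2  104  290 345  289 346  0.90  PH domain
zf-C2H2  PF00096.29   23  orf3.p1  -  120  2.5e-06   27.0  1.2  1  1  6.0e-10  2.5e-06   27.0  1.2  1   23  40   62  40   62  0.97  Zinc finger, C2H2 type
EOF

# Skip comment lines, then count distinct query names (column 4).
n_orfs_with_domains=$(awk '!/^#/ && !seen[$4]++ { n++ } END { print n+0 }' "$tmp/pfam.domtblout")
echo "ORFs with at least one Pfam domain: $n_orfs_with_domains"
```

Here orf1.p1 carries two domains but is counted once, since the interest is in ORFs with any domain support.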

Step 3: Integrate the Blast and Pfam search results into coding region selection

$ TransDecoder.Predict -t target_transcripts.fasta --retain_pfam_hits pfam.domtblout --retain_blastp_hits blastp.outfmt6
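
The three steps can be chained in a small wrapper. The sketch below is one possible arrangement, not part of TransDecoder: the run helper and the DRY_RUN switch are hypothetical names introduced here, and by default the script only prints the commands so they can be reviewed first. It uses blastp's -out option in place of the shell redirect shown in Step 2.

```shell
transcripts=target_transcripts.fasta   # input file name used in the steps above
DRY_RUN=${DRY_RUN:-1}                  # hypothetical switch: 1 = print only, 0 = execute

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step 1: extract long ORFs
run TransDecoder.LongOrfs -t "$transcripts"

# Step 2 (optional): homology searches against Swissprot and Pfam
run blastp -query transdecoder_dir/longest_orfs.pep -db uniprot_sprot.fasta \
    -max_target_seqs 1 -outfmt 6 -evalue 1e-5 -num_threads 10 -out blastp.outfmt6
run hmmscan --cpu 8 --domtblout pfam.domtblout /path/to/Pfam-A.hmm transdecoder_dir/longest_orfs.pep

# Step 3: final prediction, retaining ORFs with homology support
run TransDecoder.Predict -t "$transcripts" \
    --retain_pfam_hits pfam.domtblout --retain_blastp_hits blastp.outfmt6
```

Setting DRY_RUN=0 in the environment would execute each command in turn; paths such as /path/to/Pfam-A.hmm are placeholders and need adjusting first.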

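3. Output files

longest_orfs.pep : all ORFs meeting the minimum length criteria, regardless of coding potential

longest_orfs.gff3 : positions of all ORFs found in the target transcripts

longest_orfs.cds : nucleotide coding sequences of all detected ORFs

longest_orfs.cds.top_500_longest : the 500 longest ORFs, used to train a Markov model of coding sequences

hexamer.scores : log-likelihood score for each k-mer (coding/random)

longest_orfs.cds.scores : sum of the log-likelihood scores for each ORF across each of the 6 reading frames

longest_orfs.cds.scores.selected : the ORFs selected according to the scoring criteria

longest_orfs.cds.best_candidates.gff3 : positions of the selected ORFs in the transcripts

transcripts.fasta.transdecoder.pep : peptide sequences of the final candidate ORFs; shorter candidates contained within longer ORFs have been removed.

transcripts.fasta.transdecoder.cds : nucleotide sequences of the coding regions of the final candidate ORFs.

transcripts.fasta.transdecoder.gff3 : positions of the final selected ORFs within the target transcripts

transcripts.fasta.transdecoder.bed : BED-format file describing the ORF positions, best viewed with GenomeView or IGV.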
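
A quick way to summarize the final predictions is to count the records in the .pep file. The sketch below assumes that each FASTA header carries a type: field (complete, 5prime_partial, 3prime_partial or internal), as in recent TransDecoder releases; check the headers of your own file before relying on it. The sequences and IDs are made-up sample data.

```shell
# Hypothetical sample standing in for transcripts.fasta.transdecoder.pep.
tmp=$(mktemp -d)
cat > "$tmp/sample.transdecoder.pep" <<'EOF'
>tx1.p1 GENE.tx1~~tx1.p1  ORF type:complete len:12 (+),score=55.20 tx1:10-45(+)
MKTAYIAKQRQ*
>tx2.p1 GENE.tx2~~tx2.p1  ORF type:5prime_partial len:10 (+),score=31.05 tx2:1-30(+)
GSHMLEDPVR*
>tx3.p1 GENE.tx3~~tx3.p1  ORF type:complete len:9 (-),score=40.10 tx3:5-31(-)
MSTNPKPQ*
EOF

pep="$tmp/sample.transdecoder.pep"
n_total=$(grep -c '^>' "$pep")                      # one header line per predicted ORF
n_complete=$(grep -c '^>.*type:complete' "$pep")    # ORFs with both start and stop codon
echo "Predicted ORFs: $n_total (complete: $n_complete)"

# Breakdown by ORF type
grep '^>' "$pep" | grep -o 'type:[A-Za-z0-9_]*' | sort | uniq -c
```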

4. Viewing the results (GenomeView or IGV)

View the ORF predictions on the target transcripts (GenomeView shown here):

$ java -jar $GENOMEVIEW/genomeview.jar transcripts.fasta transcripts.fasta.transdecoder.bed

View the ORFs in the context of the genome:

$ java -jar $GENOMEVIEW/genomeview.jar test.genome.fasta transcripts.bed transcripts.fasta.transdecoder.genome.bed
