miRDeep2: Installation and Tutorial Notes


1. Installing miRDeep2
 
Download and unpack
wget http://mdc.helmholtz.de/38350089/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/mirdeep2_0_0_5.zip
unzip mirdeep2_0_0_5.zip
 
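If you install with the install.pl script bundled with miRDeep2, some of the files it tries to download (bowtie, for example) may no longer exist.
In that case you have to install the dependencies yourself. The README in the unpacked directory explains in detail how to install miRDeep2 manually, although a few details need adjusting.
First download the required packages. Here everything is downloaded to /home/disk6/src and unpacked in that same directory.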
 
(PS: the download URLs for all the dependencies are listed in the README in the downloaded mirdeep2 directory.)
 
bowtie                  #version 0.12.7
ViennaRNA-1.8.5.tar.gz
squid-1.9g.tar.gz
randfold-2.0.tar.gz
PDF-API2-0.73.tar.gz
perl                   # my version is 5.10.1
 
~~~~~~~~~~Install bowtie
unzip bowtie-0.12.7-linux-x86_64.zip
The archive contains prebuilt executables, so nothing needs to be compiled.
Add the bowtie directory to your PATH.
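
One way to do this is sketched below. It assumes the zip unpacked into /home/disk6/src/bowtie-0.12.7; that directory name is a guess based on the archive name, so adjust it to whatever unzip actually created. The helper name path_add is hypothetical. It appends a directory only when it is not on PATH yet, so sourcing the profile twice does not create duplicate entries.

```shell
# Hypothetical helper: append a directory to PATH only if it is not there yet.
path_add() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Assumed unpack location; adjust to the directory unzip actually created.
path_add /home/disk6/src/bowtie-0.12.7
```

Putting these lines in ~/.bash_profile makes the change permanent. The same helper can be reused for the randfold and miRDeep2 directories later on.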
 
~~~~~~~~~Install ViennaRNA
tar -zxf ViennaRNA-1.8.5.tar.gz
cd ViennaRNA-1.8.5
./configure --prefix=/home/disk6/tools/ViennaRNA  # /home/disk6/tools/ is where I install software: commonly used tools are installed there (or symlinked in with ln -s) and then added to PATH one by one
make
make install
 
~~~~~~~~~Install squid-1.9g.tar.gz and randfold-2.0.tar.gz

cd /home/disk6/src    # back to the download directory
tar -zxf squid-1.9g.tar.gz
cd squid-1.9g
./configure --prefix=/home/disk6/tools/squid    # squid.h only exists after configure has run; randfold2.0 below needs it
make
make install
 
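cd /home/disk6/src    # back to the download directory
tar -zxf randfold-2.0.tar.gz
cd randfold2.0
Edit the Makefile: replace the line that starts with INCLUDE=-I with INCLUDE=-I. -I/home/disk6/src/squid-1.9g/ -L/home/disk6/src/squid-1.9g/
make
Add the randfold directory to your PATH.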
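
The Makefile edit can also be scripted instead of done by hand. This is a minimal sketch: the function name patch_randfold_makefile and the SQUID_DIR variable are made up for illustration, and the pattern assumes the Makefile has exactly one line beginning with INCLUDE=-I, as described above.

```shell
# Location of the unpacked and configured squid source tree (from the steps above).
SQUID_DIR=/home/disk6/src/squid-1.9g

# Hypothetical helper: rewrite the INCLUDE line of a randfold Makefile.
# A copy of the original file is kept next to it as <file>.bak.
patch_randfold_makefile() {
    # $1 = path to the randfold Makefile
    sed -i.bak "s|^INCLUDE=-I.*|INCLUDE=-I. -I$SQUID_DIR/ -L$SQUID_DIR/|" "$1"
}

# Usage, from inside the randfold2.0 directory and before running make:
#   patch_randfold_makefile Makefile
```

Running grep '^INCLUDE=' Makefile afterwards shows whether the line was rewritten as intended.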
 
 
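~~~~~~~~~~~~Install PDF-API2-0.73.tar.gz

cd /home/disk6/src    # back to the download directory
tar -zxf PDF-API2-0.73.tar.gz
cd PDF-API2-0.73
mkdir -p ../mirdeep2/lib/  # don't skip this: mirdeep2 was unpacked at the very beginning; create a lib directory inside it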
perl Makefile.PL PREFIX=/home/disk6/src/mirdeep2 LIB=/home/disk6/src/mirdeep2/lib
make
make test
make install   # /home/disk6/src/mirdeep2/lib now contains two directories: PDF and x86_64-linux-thread-multi
 
~~~~~~~~~~~~Set PERL5LIB for mirdeep2 so that Perl can find the PDF module just installed
Add the following to ~/.bash_profile:
export PERL5LIB=$PERL5LIB:/home/disk6/src/mirdeep2/lib
 
~~~~~~~~~Test that everything installed works
To check that everything is installed properly, run each of the following:
1) bowtie
2) RNAfold -h
3) randfold
4) make_html.pl
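
Before running the four commands one by one, a quick lookup can show whether they are on PATH at all. The sketch below uses a hypothetical helper name, check_tools. It only checks that each command can be found, not that it runs correctly (make_html.pl, for instance, also depends on PERL5LIB being set).

```shell
# Hypothetical helper: report which of the given commands can be found on PATH.
# Prints one line per command and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools bowtie RNAfold randfold make_html.pl || echo "fix PATH before continuing"
```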
 
~~~~~~~~~~Finally, add the miRDeep2 directory to your PATH
 
 
 
2. Introduction to miRDeep2
 

The miRDeep2 folder ships with its own tutorial; working through that example is a good way to learn miRDeep2.

The tutorial_dir folder contains the following files (.fa means FASTA format).

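cel_cluster.fa:                     # genome file of the species under study

mature_ref_this_species.fa:         # mature miRNAs of the species under study; downloadable from miRBase

mature_ref_other_species.fa:        # mature miRNAs of related species; downloadable from miRBase

precursors_ref_this_species.fa:     # miRNA precursors of the species under study; downloadable from miRBase

reads.fa:                           # deep sequencing reads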

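~~~~~~~~~~Step 1~~~~~~~~~

# Build an index of the genome file with bowtie-build

bowtie-build cel_cluster.fa cel_cluster      # cel_cluster.fa is the genome file; cel_cluster is the prefix of the index files.
                                             # The prefix can be any string and need not match the genome file name.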
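
A quick check that the index was written can save a confusing failure in the mapping step. The sketch below assumes that bowtie 0.12.7 writes six files named prefix.1.ebwt through prefix.4.ebwt plus prefix.rev.1.ebwt and prefix.rev.2.ebwt; other bowtie versions may name their index files differently. The function name index_complete is hypothetical.

```shell
# Hypothetical helper: check that all six files of a bowtie (version 1) index exist
# and are non-empty. $1 = index prefix as given to bowtie-build.
index_complete() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "$1.$suffix.ebwt" ]; then
            echo "missing: $1.$suffix.ebwt"
            return 1
        fi
    done
    echo "index $1 looks complete"
}

index_complete cel_cluster || echo "run bowtie-build cel_cluster.fa cel_cluster first"
```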

~~~~~~~~~~Step 2~~~~~~~~~

# Process the reads file and map it to the genome

perl mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v

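Parameters
-c the input file is in FASTA format; related options are -a (seq.txt format), -b (qseq.txt format), -e (fastq format) and -d (config file)
-j discard reads containing non-canonical letters (anything other than a,c,g,t,u,n,A,C,G,T,U,N)
-k clip the 3' adapter; followed by the adapter sequence, TCGTATGCCGTCTTCTGCTTGT in the example
-l discard reads shorter than the given length; the example discards reads shorter than 18 nt
-m collapse the reads
-p map the processed reads to a previously built genome index, cel_cluster in the example
-s write the processed reads to the given file, reads_collapsed.fa in the example
-t write the mapping results to the given file, reads_collapsed_vs_genome.arf in the example
-v print progress to the screen. Note 1 shows the difference: with -v the screen shows the mapper's actions (discarding, clipping, collapsing, trimming) as well as the summary; without -v only the summary is shown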

Parameters not used in the example
 Processing/mapping parameters
-g prefix for the reads; the default is seq. Read IDs in the -s and -t output files then start with these three letters.
-h parse to FASTA format
-i convert RNA to DNA alphabet (to map against the genome)
-q map with one mismatch in the seed (mapping takes longer)
-r maximum number of genome positions a read may map to; the default is 5
-u do not remove the directory with temporary files
-n overwrite existing files
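
To make the collapsing step (-m) and the read prefix (-g) more concrete, here is an illustrative re-implementation in awk. It is not mapper.pl's own code. It assumes each sequence sits on a single line, and it writes IDs in the prefix_index_xcount layout that miRDeep2 uses for collapsed reads; the ordering and the starting index are choices made for this sketch only.

```shell
# Illustrative sketch of read collapsing (not taken from mapper.pl).
# $1 = FASTA file with one sequence line per record, $2 = read prefix (e.g. seq)
collapse_fasta() {
    # 1) tally identical sequences, 2) sort by count (descending), then sequence,
    # 3) emit one FASTA record per distinct sequence with the count in the ID
    awk '!/^>/ && NF { count[$0]++ }
         END { for (s in count) print count[s], s }' "$1" |
    LC_ALL=C sort -k1,1nr -k2,2 |
    awk -v prefix="$2" '{ printf(">%s_%d_x%d\n%s\n", prefix, NR - 1, $1, $2) }'
}

# Usage: collapse_fasta reads_clipped.fa seq > reads_collapsed_sketch.fa
# (both file names are placeholders)
```

Collapsing keeps the mapping step cheap, since each distinct sequence is aligned once and its copy number travels along in the ID.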

 

~~~~~~~~~~Step 3~~~~~~~~~
# Fast quantitation of reads mapping to known miRBase precursors.
# (This step is not required for identifying known and novel miRNAs in the deep sequencing data with miRDeep2.pl.)

quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa -t cel -y 16_19

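Parameters
-p miRNA precursor file; downloadable from miRBase
-m mature miRNA sequence file; downloadable from miRBase
-r reads file
-t species; if given, only data for that species is considered in the analysis; if omitted, everything is analysed
-y [time] optional timestamp used in the output names (16_19 in the example); if omitted, a new one is generated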

 

 


Screen output
getting samples and corresponding read numbers

seq     374333 reads


Converting input files
building bowtie index
mapping mature sequences against index
# reads processed: 174
# reads with at least one reported alignment: 6 (3.45%)
# reads that failed to align: 168 (96.55%)
Reported 6 alignments to 1 output stream(s)
mapping read sequences against index
# reads processed: 1505
# reads with at least one reported alignment: 1088 (72.29%)
# reads that failed to align: 417 (27.71%)
Reported 1099 alignments to 1 output stream(s)
analyzing data

6 mature mappings to precursors

Expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_not_expressed.csv

Creating miRBase.mrd file

after READS READ IN thing

make_html2.pl -q expression_analyses/expression_analyses_16_19/miRBase.mrd -k mature_ref_this_species.fa -z -t C.elegans -y 16_19  -o -i expression_analyses/expression_analyses_16_19/mature_ref_this_species_mapped.arf  -l -m cel

miRNAs_expressed_all_samples_16_19.csv
miRNAs_expressed_all_samples_16_19.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
creating pdf for cel-mir-39 finished
creating pdf for cel-mir-40 finished
creating pdf for cel-mir-37 finished
creating pdf for cel-mir-36 finished
creating pdf for cel-mir-38 finished
creating pdf for cel-mir-41 finished


# This produces several outputs: expression_16_19.html, the expression_analyses folder (which contains many files), miRNAs_expressed_all_samples_16_19.csv, and the pdfs_16_19 folder.
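
The "seq     374333 reads" line near the top of the quantifier.pl output appears to be a per-sample read total, where the sample is the three-letter prefix of the read IDs. As a sanity check, a similar total can be computed directly from the collapsed reads file. This is a sketch that assumes IDs of the form prefix_index_xcount; the function name count_reads_by_prefix is made up for illustration.

```shell
# Hypothetical helper: sum the _x<count> suffixes of collapsed read IDs per prefix.
# $1 = collapsed FASTA file with IDs like >seq_0_x3
count_reads_by_prefix() {
    awk '/^>/ {
             n = split(substr($1, 2), part, "_")   # e.g. seq, 0, x3
             c = part[n]
             sub(/^x/, "", c)                      # strip the leading x
             total[part[1]] += c
         }
         END { for (p in total) print p, total[p] }' "$1" | LC_ALL=C sort
}

# Usage: count_reads_by_prefix reads_collapsed.fa
```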

 

~~~~~~~~~~Step 4~~~~~~~~~

# Identify known and novel miRNAs in the deep sequencing data

miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans 2> report.log


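# reads_collapsed.fa: the reads processed by mapper.pl
# cel_cluster.fa: the genome file
# reads_collapsed_vs_genome.arf: the mapping results
# mature_ref_this_species.fa: mature miRNAs of the species under study; downloadable from miRBase
# mature_ref_other_species.fa: mature miRNAs of related species; downloadable from miRBase
# precursors_ref_this_species.fa: miRNA precursors of the species under study; downloadable from miRBase
# If you only have the reads, the arf file and the genome file, write none in place of each missing file (miRNAs_ref/none miRNAs_other/none precursors/none): no mature miRNAs for this species, none for related species, and no precursors.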
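
The none placeholders can be filled in automatically. In the sketch below, the helper ref_or_none is hypothetical: it returns the file name when the file exists and is non-empty, and the literal word none otherwise. The -t option is left out for brevity, and the miRDeep2.pl call only runs when the script is found on PATH.

```shell
# Hypothetical helper: print the file name if the file exists and is non-empty,
# otherwise print the placeholder "none" that miRDeep2.pl accepts.
ref_or_none() {
    if [ -s "$1" ]; then printf '%s\n' "$1"; else printf 'none\n'; fi
}

mature_this=$(ref_or_none mature_ref_this_species.fa)
mature_other=$(ref_or_none mature_ref_other_species.fa)
precursors=$(ref_or_none precursors_ref_this_species.fa)

if command -v miRDeep2.pl >/dev/null 2>&1; then
    miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf \
        "$mature_this" "$mature_other" "$precursors" 2> report.log
else
    echo "miRDeep2.pl not on PATH; references would be: $mature_this $mature_other $precursors"
fi
```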

Parameters
-t species
2> report.log redirects standard error to report.log, which records all the steps that were run

# Screen output

#####################################
#                                   #
# miRDeep2                          #
#                                   #
# last change: 07/07/2011           #
#                                   #
#####################################

miRDeep2 started at 19:44:43


#Starting miRDeep2
#testing input files
#Quantitation of known miRNAs in data
#parsing genome mappings
#excising precursors
#preparing signature
#folding precursors
#computing randfold p-values
#running miRDeep core algorithm
#running permuted controls
#doing survey of accuracy
#producing graphic results


miRDeep runtime:

started: 19:44:43
ended: 19:46:15
total:0h:1m:32s

 


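~~~~~~~~~~Step 5~~~~~~~~~

# Browse the results

Open the .html file in a browser.
Note that cel-miR-37 is predicted twice, because two potential precursors at this locus can fold into hairpin structures. However, the annotated hairpin scores far higher than the unannotated one (miRDeep2 score 6.1e+4 vs. -0.2).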


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

######With -v, the screen output is as follows####

discarding sequences with non-canonical letters
clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends


######Without -v, the screen output is as follows####

# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)

~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~~

 

 

Original posts: http://blog.sina.com.cn/s/blog_7cffd1400101m3i3.html    http://blog.sina.com.cn/s/blog_7cffd1400100twvb.html

