PASA Installation


PASA installation, configuration, and main program parameters
1. Introduction to PASA

PASA, an acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments.

Note: Combine genome and Trinity de novo RNA-Seq assemblies to generate a comprehensive transcript database.

2. Preparation before using PASA

2.1 Preparing the MySQL database

Create one user with read-only privileges and one user with all privileges.

mysql> GRANT SELECT ON *.* TO 'pasa'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL ON *.* TO 'chenlianfu'@'%' IDENTIFIED BY '123456';
mysql> FLUSH PRIVILEGES;
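
On MySQL 8.0 and later, GRANT no longer accepts an IDENTIFIED BY clause, so each user has to be created with CREATE USER before privileges are granted. The following is a minimal sketch for that case, assuming the same user names and placeholder password as the statements above. The helper name pasa_user_sql is hypothetical, and it only prints SQL for review rather than running it.

```shell
#!/bin/sh
# Print the SQL needed to create one MySQL user and grant it privileges.
# Hypothetical helper: it only prints statements, it does not run them.
# usage: pasa_user_sql <user> <password> <privileges>
pasa_user_sql() {
    user=$1; pass=$2; privs=$3
    printf "CREATE USER '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$pass"
    printf "GRANT %s ON *.* TO '%s'@'%%';\n" "$privs" "$user"
}

# Read-only user and all-privileges user, as in the statements above.
pasa_user_sql pasa 123456 SELECT
pasa_user_sql chenlianfu 123456 ALL
echo "FLUSH PRIVILEGES;"
```

The printed statements can be pasted into the mysql prompt or piped to the mysql client. Replace the placeholder password before using them on a real server.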

2.2 Installing Perl modules

# cpan
cpan[1]> install DBD::mysql
cpan[1]> install GD
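
After the cpan session it can be useful to confirm that both modules actually load. This is a small sketch using perl's -M switch; the function name check_perl_module is an assumption made for illustration.

```shell
#!/bin/sh
# Report whether a Perl module can be loaded by the perl found on PATH.
# usage: check_perl_module <Module::Name>
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# The two modules installed through cpan above.
for mod in DBD::mysql GD; do
    check_perl_module "$mod" || true
done
```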

2.3 Installing GMAP

$ wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2013-03-31.v5.tar.gz
$ tar zxvf gmap-gsnap-2013-03-31.v5.tar.gz
$ cd gmap-2013-03-31
$ ./configure --prefix=$PWD
$ make -j 8
$ make install
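
Because the prefix is set to the build directory, make install places the GMAP programs in its bin subdirectory, which is not on PATH by default. A minimal sketch for adding it, assuming the archive was unpacked under $HOME (adjust the path otherwise); path_prepend is a hypothetical helper name.

```shell
#!/bin/sh
# Prepend a directory to PATH unless it is already listed.
# usage: path_prepend <dir>
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Assumed location: the GMAP build directory used above, under $HOME.
path_prepend "$HOME/gmap-2013-03-31/bin"
```

Putting the same lines in a shell startup file keeps the setting across sessions.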

2.4 Installing BLAT

$ wget http://hgwdev.cse.ucsc.edu/~kent/src/blatSrc35.zip
$ unzip blatSrc35.zip
$ cd blatSrc
$ MACHTYPE=x86_64
$ export MACHTYPE
$ mkdir -p ~/bin/x86_64
$ make -j 8

2.5 Installing FASTA

$ wget http://faculty.virginia.edu/wrpearson/fasta/fasta3/CURRENT.tar.gz
$ tar zxvf CURRENT.tar.gz
$ cd fasta-35.4.12
$ cd src
$ make -f ../make/Makefile.linux_sse2 all
$ cd ..
$ ln -s $PWD/bin/fasta35 ~/bin/fasta
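
With the three aligners from sections 2.3 to 2.5 built, a quick check that each program is reachable can save a failed pipeline run later. The sketch below assumes the program names gmap, blat, and fasta (the last being the symlink created above); require_tools is a hypothetical helper.

```shell
#!/bin/sh
# Check that every named program is found on PATH and list any that are not.
# usage: require_tools <program>...
require_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "not found: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Program names as installed in sections 2.3 to 2.5.
require_tools gmap blat fasta || echo "add the install directories to PATH"
```

If blat is reported as not found, the likely cause is that ~/bin/x86_64, the directory created before the BLAT build, is not on PATH.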

2.6 Installing PASA

$ wget http://kaz.dl.sourceforge.net/project/pasa/PASA2-r20130425beta.tgz
$ tar zxvf PASA2-r20130425beta.tgz
$ cd PASA2-r20130425beta/
$ make -j 8
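
The later sections refer to the install directory as $PASAHOME, which has to be set by hand. A minimal sketch, assuming the archive was unpacked under $HOME; set_pasahome is a hypothetical helper that only exports the variable when the main pipeline script is present.

```shell
#!/bin/sh
# Verify that a directory looks like a PASA installation and export PASAHOME.
# usage: set_pasahome <dir>
set_pasahome() {
    if [ -f "$1/scripts/Launch_PASA_pipeline.pl" ]; then
        PASAHOME=$1
        export PASAHOME
        echo "PASAHOME=$PASAHOME"
    else
        echo "no scripts/Launch_PASA_pipeline.pl under $1" >&2
        return 1
    fi
}

# Assumed location: the directory unpacked above, under $HOME.
set_pasahome "$HOME/PASA2-r20130425beta" || true
```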

2.7 Installing GD

GD depends on libgd, so install libgd first.

$ wget https://bitbucket.org/libgd/gd-libgd/get/93368566388c.zip
$ unzip 93368566388c.zip
$ cd libgd-gd-libgd-93368566388c
$ ./bootstrap.sh
$ ./configure
$ make -j 8
$ sudo make install
$ gdlib-config

Then install GD

$ wget http://search.cpan.org/CPAN/authors/id/L/LD/LDS/GD-2.49.tar.gz
$ tar zxvf GD-2.49.tar.gz
$ cd GD-2.49
$ perl Makefile.PL
$ make -j 8
$ sudo make install

GD is installed so that PASA results can be viewed through a web page.

2.8 Configuring PASA

2.8.1. Edit the PASA configuration file $PASAHOME/pasa_conf/conf.txt

$ cp $PASAHOME/pasa_conf/pasa.CONFIG.template $PASAHOME/pasa_conf/conf.txt
$ vim $PASAHOME/pasa_conf/conf.txt

2.8.2. Settings in this file that need to be changed:

PASA_ADMIN_EMAIL=(your email address)
MYSQLSERVER=(your mysql server name)   An IP address cannot be used here.
MYSQL_RO_USER=(mysql read-only username)
MYSQL_RO_PASSWORD=(mysql read-only password)
MYSQL_RW_USER=(mysql all privileges username)
MYSQL_RW_PASSWORD=(mysql all privileges password)
BASE_PASA_URL=http://server_name/pasa/cgi-bin/
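
These settings can also be filled in without opening an editor. The sketch below assumes conf.txt keeps the plain KEY=VALUE layout shown above; set_conf_key is a hypothetical helper, and the values in the commented example are placeholders.

```shell
#!/bin/sh
# Set KEY=VALUE in a KEY=VALUE style config file: replace the existing KEY=
# line, or append one if the key is absent. Other lines are left unchanged.
# usage: set_conf_key <file> <key> <value>
set_conf_key() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    # awk compares the text before the first "=" with the key exactly,
    # so the value never has to be escaped for a regular expression.
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; done = 0 }
        $1 == k { print k "=" v; done = 1; next }
        { print }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (placeholder values):
#   set_conf_key "$PASAHOME/pasa_conf/conf.txt" MYSQLSERVER dbhost.example.org
#   set_conf_key "$PASAHOME/pasa_conf/conf.txt" MYSQL_RO_USER pasa
```

Writing the result back with cat rather than mv keeps the original file's permissions.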

2.8.3. Edit the httpd configuration file, then restart httpd

# vim /etc/httpd/conf/httpd.conf
# /etc/init.d/httpd restart

Add the following lines to /etc/httpd/conf/httpd.conf:

ScriptAlias /pasa "$PASAHOME"
<Directory "$PASAHOME">
        Options MultiViews ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>
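
Apache does not expand shell variables inside httpd.conf, so "$PASAHOME" in the lines above stands for the real install path. A small sketch that prints the block with the path filled in; pasa_httpd_conf is a hypothetical helper and /opt/PASA2-r20130425beta is only a fallback placeholder. On Apache 2.4 the Order and Allow lines are normally replaced by "Require all granted".

```shell
#!/bin/sh
# Print the httpd.conf block for PASA with the real install path filled in.
# usage: pasa_httpd_conf <pasa_dir>
pasa_httpd_conf() {
    cat <<EOF
ScriptAlias /pasa "$1"
<Directory "$1">
        Options MultiViews ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>
EOF
}

# Placeholder fallback path; PASAHOME is used when it is already set.
pasa_httpd_conf "${PASAHOME:-/opt/PASA2-r20130425beta}"
```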

2.9 Cleaning the transcript sequences [optional, requires seqclean to be installed]

Download the two contaminant databases, which are FASTA files.

$ cd $PASAHOME/seqclean
$ tar zxf seqclean.tar.gz
$ cd seqclean
$ wget ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec -O UniVec.fasta
$ wget ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec_Core -O UniVec_Core.fasta

UniVec_Core includes only oligonucleotides and vectors consisting of bacterial, phage, viral, yeast or synthetic sequences. Vectors that include sequences of mammalian origin are excluded.
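
Once the databases are in place, seqclean is run on the transcript FASTA with the vector files passed through -v as a comma-separated list. The sketch below only prints the command. The helper name seqclean_cmd and the file name transcripts.fasta are assumptions for illustration.

```shell
#!/bin/sh
# Print the seqclean command for a transcript FASTA file and one or more
# vector databases (joined with commas for the -v option).
# usage: seqclean_cmd <transcripts.fasta> <vecdb>...
seqclean_cmd() {
    seqs=$1; shift
    dbs=$(IFS=,; echo "$*")      # "$*" joins the remaining arguments with ","
    echo "seqclean $seqs -v $dbs"
}

# File names are placeholders; run the printed command where the data lives.
seqclean_cmd transcripts.fasta UniVec.fasta UniVec_Core.fasta
```

seqclean writes its results next to the input, including the .cln report that the -u option of the main program expects.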

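3. Using the PASA main program

The main PASA program is $PASAHOME/scripts/Launch_PASA_pipeline.pl. Its parameters are as follows:

* marks a required parameter
 
-c <filename> *
Alignment configuration file. You can copy $PASAHOME/pasa_conf/pasa.alignAssembly.Template.txt and change only MYSQLDB in it to the name of the MySQL database to use.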
 
####################
 
spliced alignment settings:
--ALIGNERS <string>
Alignment software. The available aligners are gmap and blat; both can be used together by giving 'gmap,blat'
 
-N <int> default: 1
max number of top scoring alignments
 
--MAX_INTRON_LENGTH | -I <int>  default: 100000
max intron length parameter passed to GMAP or BLAT
 
--IMPORT_CUSTOM_ALIGNMENTS_GFF3 <filename>
only using the alignments supplied in the corresponding GFF3 file.
 
--cufflinks_gtf <filename>
incorporate cufflinks-generated transcripts
 
####################
 
actions
-C
    flag, create MYSQL database
-R
    flag, run alignment/assembly pipeline.
-A
    compare to annotated genes.
--ALT_SPLICE
    flag, run alternative splicing analysis
 
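-R aligns the transcripts; -A compares against and updates an existing GFF3 annotation file. These two flags cannot be used together, and each one needs its own configuration file passed with -c.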
 
####################
 
input files
 
-g <filename> *
    genome sequence FASTA file
 
-t <filename> *
    transcript db
 
-f <filename>
    file containing a list of fl-cdna accessions.
 
--TDN <filename>
    file containing a list of accessions corresponding to Trinity (full) de novo assemblies (not genome-guided)
 
####################
 
polyAdenylation site identification  ** highly recommended **
-T
    flag, transcript db was trimmed using the TGI seqclean tool.
-u <filename>
    value, transcript db containing untrimmed sequences (input to seqclean). A file with a .cln extension, generated by seqclean, should also exist.
 
####################
 
Jump-starting or prematurely terminating
-x
    flag, print cmds only, don't process anything. (useful to get indices for -s or -e opts below)
-s <int>
    pipeline index to start running at (avoid rerunning searches).
-e <int>
    pipeline index where to stop running, and do not execute this entry.
 
####################
 
Misc:
--TRANSDECODER
    flag, run transdecoder to identify candidate full-length coding transcripts
--CPU <int> default: 2
    multithreading
-d  flag, Debug 
-h  flag, print this option menu and quit
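
Putting the options together, a first alignment/assembly run creates the database (-C), runs the pipeline (-R), and names the config, genome, and transcript files. The sketch below only prints such a command line. The helper name pasa_align_cmd and the file names are assumptions, and $PASAHOME is printed literally so the shell expands it when the command is run.

```shell
#!/bin/sh
# Print a Launch_PASA_pipeline.pl command line for an alignment/assembly run.
# usage: pasa_align_cmd <config> <genome.fasta> <transcripts.fasta> [cpus]
pasa_align_cmd() {
    config=$1; genome=$2; transcripts=$3; cpus=${4:-2}
    # -C creates the MySQL database, -R runs the alignment/assembly pipeline.
    printf '%s -c %s -C -R -g %s -t %s --ALIGNERS gmap,blat --CPU %s\n' \
        '$PASAHOME/scripts/Launch_PASA_pipeline.pl' \
        "$config" "$genome" "$transcripts" "$cpus"
}

# Placeholder file names for illustration.
pasa_align_cmd alignAssembly.config genome.fasta transcripts.fasta 8
```

Appending -x to the printed command lists the pipeline steps with their indices without running them, and -s with one of those indices resumes an interrupted run from that step.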



Reposted from: http://sihua.us/pasa.htm

