Installing Augustus and its usage parameters

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.

1. Installing Augustus

Augustus download: http://bioinf.uni-greifswald.de/augustus/binaries/

$ wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus.2.7.tar.gz  
$ tar zxf augustus.2.7.tar.gz  
$ cd augustus.2.7  
$ cd src  
$ make -j 8  
$ export AUGUSTUS_CONFIG_PATH=$PWD/../config/  # can also be added to ~/.bashrc (use the absolute path there instead of $PWD)
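
To make the setting survive new shell sessions, the export line can be written to a startup file. The following is a minimal sketch, not part of AUGUSTUS: the helper name add_augustus_config is made up for illustration, and ~/.bashrc as the default target is an assumption about your shell setup. It appends the line only if it is not already present, so it is safe to run more than once.

```shell
#!/bin/sh
# add_augustus_config is a hypothetical helper name, not part of AUGUSTUS.
# Usage: add_augustus_config CONFIG_DIR [RC_FILE]
# The body runs in a subshell "( ... )" so its variables stay local.
add_augustus_config() (
    config_dir="$1"                  # absolute path to the AUGUSTUS config directory
    rc_file="${2:-$HOME/.bashrc}"    # assumed default: the bash startup file
    line="export AUGUSTUS_CONFIG_PATH=\"$config_dir\""
    # Append only if this exact line is missing (-F fixed string, -x whole line).
    if ! grep -Fxq -- "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
)
```

From the src directory, something like add_augustus_config "$(cd ../config && pwd)/" would record the absolute path rather than a value that depends on $PWD.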

2. Using Augustus

2.1 Gene prediction examples

$ augustus --strand=both --genemodel=partial --singlestrand=false --hintsfile=hints.gff --extrinsicCfgFile=extrinsic.cfg --protein=on --introns=on --start=on --stop=on --cds=on --codingseq=on --alternatives-from-evidence=true --gff3=on --UTR=on --outfile=out.gff --species=human genome.fa  
$ augustus --noprediction=true --species=SPECIES sequences.gb
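
The --noprediction run is useful for spotting GenBank records that carry no CDS annotation before an accuracy evaluation. As an independent cross-check that does not call AUGUSTUS at all, the records can also be scanned directly. This is only a sketch: list_records_without_cds is a made-up helper name, and it assumes a standard GenBank flat file in which each record starts with a LOCUS line, ends with "//", and lists feature keys such as CDS after exactly five spaces.

```shell
#!/bin/sh
# list_records_without_cds is a hypothetical helper, not an AUGUSTUS tool.
# It prints the LOCUS name of every GenBank record that has no CDS feature.
list_records_without_cds() {
    awk '
        /^LOCUS/     { id = $2; has_cds = 0 }    # a new record starts
        /^     CDS / { has_cds = 1 }             # CDS key in the feature table
        /^\/\//      { if (id != "" && !has_cds) print id; id = "" }  # record ends
    ' "$1"
}
```

Running list_records_without_cds sequences.gb > no_cds_ids.txt would give a list of record names to remove by hand before training or evaluation.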

2.2 Augustus usage parameters

Usage:

augustus [parameters] --species=SPECIES queryfilename

Important parameters:

--strand=both, --strand=forward or --strand=backward
  Report predicted genes on both strands, just the forward or just the backward strand. Default is 'both'.
--genemodel=partial, --genemodel=intronless, --genemodel=complete, --genemodel=atleastone or --genemodel=exactlyone
  partial     : allow prediction of incomplete genes at the sequence boundaries (default)
  intronless  : only predict single-exon genes like in prokaryotes and some eukaryotes
  complete    : only predict complete genes
  atleastone  : predict at least one complete gene
  exactlyone  : predict exactly one complete gene
--singlestrand=true
  Predict genes independently on each strand, allow overlapping genes on opposite strands. This option is turned off by default.
--hintsfile=hintsfilename
  When this option is used, the prediction considering hints (extrinsic information) is turned on. hintsfilename contains the hints in gff format.
--extrinsicCfgFile=cfgfilename
  Optional. This file contains the list of used sources for the hints and their boni and mali. If not specified, the file "extrinsic.cfg" in the config directory $AUGUSTUS_CONFIG_PATH is used.
--maxDNAPieceSize=n
  This value specifies the maximal length of the pieces that the sequence is cut into for the core algorithm (Viterbi) to be run. Default is --maxDNAPieceSize=200000. AUGUSTUS tries to place the boundaries of these pieces in the intergenic region, which is inferred by a preliminary prediction. GC-content dependent parameters are chosen for each piece of DNA if /Constant/decomp_num_steps > 1 for that species. This is why this value should not be set very large, even if you have plenty of memory.
--protein=on/off
--introns=on/off
--start=on/off
--stop=on/off
--cds=on/off
--codingseq=on/off
  Output options. Output predicted protein sequence, introns, start codons, stop codons. Or use 'cds' in addition to 'initial', 'internal', 'terminal' and 'single' exon. The CDS excludes the stop codon (unless stopCodonExcludedFromCDS=false), whereas the terminal and single exon include the stop codon.
--AUGUSTUS_CONFIG_PATH=path
  Path to config directory (if not specified as environment variable).
--alternatives-from-evidence=true/false
  Report alternative transcripts when they are suggested by hints.
--alternatives-from-sampling=true/false
  Report alternative transcripts generated through probabilistic sampling.
--sample=n
--minexonintronprob=p
--minmeanexonintronprob=p
--maxtracks=n
--proteinprofile=filename
  Read a protein profile from file filename. See section 7 of the AUGUSTUS README.
--predictionStart=A, --predictionEnd=B
  A and B define the range of the sequence for which predictions should be found. Quicker if you need predictions only for a small part.
--gff3=on/off
  Output in gff3 format.
--UTR=on/off
  Predict the untranslated regions in addition to the coding sequence. This currently works only for human, galdieria, toxoplasma and caenorhabditis.
--outfile=filename
  Print output to filename instead of standard output. This is useful for computing environments, e.g. parasol jobs, which do not allow shell redirection.
--noInFrameStop=true/false
  Don't report transcripts with in-frame stop codons. Otherwise, intron-spanning stop codons could occur. Default: false
--noprediction=true/false
  If true and input is in genbank format, no prediction is made. Useful for getting the annotated protein sequences.
  Augustus can also take a GenBank-format file as input, run gene prediction on it, and compare the predictions with the GenBank annotation to produce accuracy statistics. Because some sequences in a GenBank file have no CDS annotation, this parameter can be used as a check: it reveals the IDs of the sequences without a CDS, which can then be removed by hand before the prediction accuracy is evaluated.
--contentmodels=on/off
  If 'off', the content models are disabled (all emissions uniformly 1/4). The content models are: coding region Markov chain (emiprobs), initial k-mers in coding region (Pls), intron and intergenic region Markov chain. This option is intended for special applications that require judging gene structures from the signal models only, e.g. for predicting the effect of SNPs or mutations on splicing. For all typical gene predictions, this should be 'on'. Default: on
--paramlist
  For a complete list of parameters, type "augustus --paramlist".
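
As a small illustration of --predictionStart and --predictionEnd, the sketch below assembles a command line that restricts prediction to one window of the input sequence. The helper name augustus_region_cmd is made up for this example, and the check that the start is smaller than the end is my own assumption about sensible input, not documented AUGUSTUS behaviour. The function only prints the command, so it can be inspected before anything is run.

```shell
#!/bin/sh
# augustus_region_cmd is a hypothetical helper, not part of AUGUSTUS.
# Usage: augustus_region_cmd SPECIES FASTA START END
# Prints an augustus command limited to the window START..END.
augustus_region_cmd() (
    species="$1"; fasta="$2"; start="$3"; end="$4"
    # Assumed sanity check: the window must not be empty or reversed.
    if [ "$start" -ge "$end" ]; then
        echo "error: START must be smaller than END" >&2
        exit 1    # leaves only this subshell, so the function returns 1
    fi
    printf 'augustus --species=%s --predictionStart=%s --predictionEnd=%s %s\n' \
        "$species" "$start" "$end" "$fasta"
)
```

For example, augustus_region_cmd human genome.fa 1000 5000 prints a command that can be pasted into the shell or piped to sh once it looks right.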

Reposted from: http://sihua.us/Augustus.htm
