從fasta中提取或者過濾掉多個序列


Google了一下,現成的工具不多。

自己寫代碼也可以,就是速度肯定不快,而且每次寫也很麻煩。

偶然看到QIIME的filter_fasta.py有這個功能,從name list中提取多個序列。

filter_fasta.py -f extract_no_N_200.fasta -o remain.fasta -s out.list

  

[REQUIRED]

-f, --input_fasta_fp
Path to the input fasta file
-o, --output_fasta_fp
The output fasta filepath
[OPTIONAL]

-m, --otu_map
An OTU map where sequences ids are those which should be retained.
-s, --seq_id_fp
A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained.
-b, --biom_fp
A biom file where otu identifiers should be retained.
-a, --subject_fasta_fp
A fasta file where the seq ids should be retained.
-p, --seq_id_prefix
Keep seqs where seq_id starts with this prefix.
--sample_id_fp
Keep seqs where seq_id starts with a sample id listed in this file. Must be newline delimited and may not contain a header.
-n, --negate
Discard passed seq ids rather than keep passed seq ids. [default: False]
--mapping_fp
Mapping file path (for use with –valid_states). [default: None]
--valid_states
Description of sample ids to retain (for use with –mapping_fp). [default: None]

 

60w條序列瞬間就處理完了。  


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM