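Requirement

A client reported that the complete genome file is too large to open and asked me to split it into one file per chromosome and scaffold. What is a quick way to do this?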

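Method 1

Use an existing tool. pyfaidx installs a faidx command, and its -x option writes each sequence to a separate file: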

$ pip install pyfaidx
$ faidx -x sequences.fa
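
Before splitting, it can help to know how many files to expect, since an assembly with many scaffolds produces one file per scaffold. The following is a minimal sketch using only the Python standard library; list_record_ids is a hypothetical helper name, not part of pyfaidx, and it assumes the record ID is the first word after ">" on each header line.

```python
def list_record_ids(fasta_path):
    """Return the ID (first word after '>') of every record in a FASTA file."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Header lines look like ">chr1 optional description"
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids
```

Calling len(list_record_ids("sequences.fa")) gives the number of per-sequence files the split should produce.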

Method 2

Write your own script, split.pl:

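#!/usr/bin/perl
use strict;
use warnings;

# Split a multi-FASTA file into one file per record (chromosome or scaffold).
my $f = shift @ARGV;    # input file name
die "Usage: perl split.pl sequences.fa\n" unless defined $f;

open(my $in, '<', $f)
    or die "Can't open $f: $!";

my $out;    # handle of the output file currently being written
while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^>(\S+)/) {    # header line: '>' at the start of the line
        close $out if $out;      # close the previous record's file, if any
        my $new_file = "$1.fa";  # name the file after the first word of the header
        open($out, '>', $new_file)
            or die "Can't open $new_file: $!";
    }
    next unless $out;            # skip anything before the first header
    print $out "$line\n";
}
close $out if $out;
close $in;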

Run: perl split.pl sequences.fa
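
For readers who prefer Python, the same idea can be sketched with the standard library alone. This is an illustration rather than a drop-in replacement for the script above: split_fasta is a hypothetical function name, the file name is taken from the first word of each header, and the replacement of unusual characters with "_" is an assumption made to keep file names safe. Records that share an ID would overwrite each other.

```python
import os
import re


def split_fasta(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to <out_dir>/<record ID>.fa.

    Returns the list of output paths in the order the records appear.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    if out is not None:
                        out.close()
                    # Record ID = first word of the header line.
                    fields = line[1:].split()
                    record_id = fields[0] if fields else "unnamed"
                    # Assumption: replace characters that are awkward in file names.
                    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
                    path = os.path.join(out_dir, safe_id + ".fa")
                    out = open(path, "w")
                    written.append(path)
                if out is not None:  # ignore anything before the first header
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

For example, split_fasta("sequences.fa", "split") writes the per-sequence files into a directory named split and returns their paths.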

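Put the resulting files in one directory, run gzip -r dir (this compresses every file in the directory individually), and send them to the client together.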
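
If a single file is easier to send than a directory of individually compressed files, the directory can instead be packed into one gzip-compressed tar archive. The following is a minimal sketch using Python's standard tarfile module; make_tarball and the paths shown are hypothetical names, and the archive should be written outside the directory being packed.

```python
import os
import tarfile


def make_tarball(src_dir, archive_path):
    """Pack src_dir (recursively) into one gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive,
        # not the full path on the local machine.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return archive_path
```

For example, make_tarball("dir", "dir.tar.gz") produces one archive that the client can unpack with tar -xzf dir.tar.gz.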

https://www.biostars.org/p/173723/
http://seqanswers.com/forums/archive/index.php/t-32162.html
