The task

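A client reported that the complete genome FASTA file was too large to open and asked me to split it into one file per chromosome and scaffold. How can this be done quickly?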

Method 1

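Use an existing tool, pyfaidx. Its faidx command writes each sequence to a separate FASTA file when run with -x (--split-files):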

$ pip install pyfaidx
$ faidx -x sequences.fa
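The name faidx refers to a FASTA index. Like samtools faidx, pyfaidx writes a .fai file next to the FASTA that records, for each sequence, its name, length, the byte offset of its first base, the number of bases per line and the number of bytes per line. With this index a single chromosome can be found by seeking instead of re-reading the whole genome. The function below is a minimal sketch of how those five columns can be computed in one pass. It assumes well-formed input (no blank lines, a constant line width within each sequence, a non-empty name on every header), and build_fai is a hypothetical name, not part of pyfaidx.

```python
from typing import List, Tuple

# One entry per sequence: (name, length, offset, linebases, linewidth)
FaiEntry = Tuple[str, int, int, int, int]


def build_fai(path: str) -> List[FaiEntry]:
    """Compute .fai-style index entries for a FASTA file in one pass.

    Hypothetical helper: assumes no blank lines and a constant line width
    within each sequence, which samtools/pyfaidx indexes also rely on.
    """
    entries: List[FaiEntry] = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte position of the start of the current line
    with open(path, "rb") as fh:  # binary mode, so offsets are byte counts
        for raw in fh:
            if raw.startswith(b">"):
                if name is not None:
                    entries.append((name, length, offset, linebases, linewidth))
                # sequence name = first whitespace-delimited word of the header
                name = raw[1:].split()[0].decode()
                length, linebases, linewidth = 0, 0, 0
                offset = pos + len(raw)  # first base starts right after the header
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if linebases == 0:  # first sequence line fixes the line geometry
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        entries.append((name, length, offset, linebases, linewidth))
    return entries
```

Given these columns, a tool can seek() to offset and read a sequence without scanning the rest of the file.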

Method 2

Write your own script, split.pl:

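#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl split.pl sequences.fa
my $f = $ARGV[0] or die "Usage: $0 <file.fa>\n";    # input FASTA file name

open(my $in, '<', $f)
    or die "Can't open: $f $!";

my $out;    # filehandle of the current output file
while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^>(\S+)/) {      # header line: ">ID optional description"
        close $out if $out;        # finish the previous sequence
        my $new_file = "$1.fa";    # name the file after the sequence ID (first word only)
        open($out, '>', $new_file)
            or die "Can't open: $new_file $!";
    }
    next unless $out;              # skip anything before the first header
    print {$out} "$line\n";
}
close $out if $out;
close $in;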

Run it with: perl split.pl sequences.fa (each sequence is written to <ID>.fa in the current directory).
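For readers who prefer Python, the same streaming approach can be sketched with the standard library alone. It reads the genome line by line, so memory use stays small regardless of genome size, and it opens a new output file at each header. Two choices here are assumptions rather than part of the original script: the file name comes from the first word of the header, and any character other than letters, digits, ".", "_" and "-" is replaced by "_" so that IDs containing "/" or "|" still give valid file names. split_fasta is a hypothetical name.

```python
import os
import re
import sys
from typing import List


def split_fasta(path: str, outdir: str = ".") -> List[str]:
    """Write each record of a FASTA file to <outdir>/<id>.fa; return the paths."""
    os.makedirs(outdir, exist_ok=True)
    written: List[str] = []
    out = None
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if out is not None:
                    out.close()  # finish the previous record
                fields = line[1:].split()
                seq_id = fields[0] if fields else "unnamed"
                # assumed policy: make the ID safe to use as a file name
                safe = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
                target = os.path.join(outdir, safe + ".fa")
                out = open(target, "w")
                written.append(target)
            if out is not None:  # lines before the first header are skipped
                out.write(line)
    if out is not None:
        out.close()
    return written


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_fasta.py sequences.fa [outdir]
    for p in split_fasta(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "."):
        print(p)
```

As with the Perl script, two records that share an ID map to the same file, and the later one overwrites the earlier.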

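Put the output files in one directory, compress them with gzip -r dir (this gzips each file inside dir individually, producing .fa.gz files), and send them to the client together.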
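If the client would rather download one file than a directory of compressed files, the per-sequence files can be bundled into a single .tar.gz archive. On the command line, tar czf does this. The sketch below does the same with Python's standard tarfile module. bundle_dir and the example file names are hypothetical.

```python
import os
import sys
import tarfile


def bundle_dir(src_dir: str, archive: str) -> str:
    """Pack src_dir (recursively) into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory's own name
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return archive


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python bundle.py split_dir genome_split.tar.gz
    bundle_dir(sys.argv[1], sys.argv[2])
```

The client can then unpack everything with tar xzf genome_split.tar.gz.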

https://www.biostars.org/p/173723/
http://seqanswers.com/forums/archive/index.php/t-32162.html
