Example .fai file:
Sc0000003	2774837	10024730	60	61
Sc0000004	2768176	12845826	60	61
Sc0000005	2756750	15660150	60	61
Sc0000006	2627294	18462857	60	61
Sc0000007	2472379	21133951	60	61
Sc0000008	2452568	23647548	60	61
The five columns, in order:
NAME       Name of this reference sequence
LENGTH     Total length of this reference sequence, in bases
OFFSET     Offset within the FASTA file of this sequence's first base
LINEBASES  The number of bases on each line
LINEWIDTH  The number of bytes in each line, including the newline
http://www.htslib.org/doc/faidx.html
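OFFSET is the least intuitive field. It is simply the byte position in the file, counted from zero, at which the sequence's first base appears. It is a file-level property, and you usually don't need to care about it.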
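To make OFFSET, LINEBASES, and LINEWIDTH concrete, here is a minimal bash sketch (an illustration, not samtools' own code) that turns a 0-based position within one sequence into a byte offset in the FASTA file. It assumes every line except the last holds exactly LINEBASES bases, which is the layout faidx expects anyway; `fai_byte_offset` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: locate a base inside a FASTA file using one .fai record.
# Positions are 0-based; every line except the last is assumed to
# hold exactly LINEBASES bases.

# fai_byte_offset OFFSET LINEBASES LINEWIDTH POS0   (hypothetical helper)
# Prints the 0-based byte offset in the FASTA file of base POS0.
fai_byte_offset() {
    local offset=$1 linebases=$2 linewidth=$3 pos=$4
    # Each full line before this base costs LINEWIDTH bytes
    # (bases plus newline); then step POS0 % LINEBASES into the line.
    echo $(( offset + (pos / linebases) * linewidth + pos % linebases ))
}

# Values from the Sc0000003 row above.
fai_byte_offset 10024730 60 61 0        # first base -> 10024730
fai_byte_offset 10024730 60 61 60       # first base of line 2 -> 10024791
fai_byte_offset 10024730 60 61 2774836  # last base -> 12845813
```

This also explains the Sc0000004 row: the newline after Sc0000003's last base takes byte 12845814, and the header line `>Sc0000004` (10 characters plus its newline, assuming no description text and Unix line endings) fills bytes 12845815 to 12845825, so the next sequence's first base sits at 12845826, exactly the OFFSET recorded for it.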
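Sometimes you need to convert a FASTA file to BED, which only requires each sequence's length. Getting the lengths from samtools faidx is extremely fast, and one more Linux command finishes the job:
samtools faidx file.fa
awk 'BEGIN{OFS="\t"} {print $1, 0, $2}' file.fa.fai > out.bed
(BED start coordinates are 0-based and end coordinates are exclusive, so a whole sequence runs from 0 to LENGTH; using 1 as the start would drop the first base.)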
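If samtools is not available, the same whole-sequence BED can be produced by counting bases with awk alone. This is a minimal sketch, assuming a well-formed FASTA with Unix line endings; it reads every base, so on a large genome it is much slower than reading the .fai. The file names demo.fa and demo.bed are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: FASTA -> whole-sequence BED without samtools, by summing the
# length of the sequence lines under each '>' header.
set -euo pipefail

# Tiny hypothetical input: chrA has 8 + 3 = 11 bases, chrB has 4.
printf '>chrA desc\nACGTACGT\nACG\n>chrB\nGGGG\n' > demo.fa

awk 'BEGIN { OFS = "\t" }
     /^>/ {
         # flush the previous record before starting a new one
         if (name != "") print name, 0, len
         name = substr($1, 2)   # drop ">" and keep only the first word
         len = 0
         next
     }
     { len += length($0) }
     END { if (name != "") print name, 0, len }' demo.fa > demo.bed

cat demo.bed
```

Keeping only the first word of the header matches how samtools faidx names sequences, so both approaches should give the same NAME column.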
One question: are positions in BAM, BED, and GTF all 1-based?
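For reference, they are not all the same. BED uses 0-based, end-exclusive intervals; GTF/GFF uses 1-based, end-inclusive ones; the POS column of SAM text is 1-based, although the BAM binary record stores the position 0-based. Converting a GTF feature to BED therefore only means subtracting 1 from the start. A minimal awk sketch, assuming a plain tab-separated GTF; the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: convert GTF features (1-based, end-inclusive) to BED6
# intervals (0-based, end-exclusive). Only the start changes: start - 1.
set -euo pipefail

# Hypothetical one-line GTF: an exon covering bases 100..200 (101 bases).
printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n' > demo.gtf

# GTF columns used: 1 seqname, 3 feature, 4 start, 5 end, 7 strand.
# Output BED6: chrom, start, end, name (feature type), score, strand.
awk 'BEGIN { FS = OFS = "\t" }
     !/^#/ { print $1, $4 - 1, $5, $3, ".", $7 }' demo.gtf > demo_features.bed

cat demo_features.bed
```

The BED interval 99 to 200 (end-exclusive) covers the same 101 bases as the GTF interval 100 to 200 (end-inclusive).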
