linux 系統中shell實現將fasta文件的鹼基轉換為一行及還原


1、測試數據

root@PC1:/home/test# cat a.fna  ##  實現將鹼基轉換為1行, 其他信息不變
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTCAGGTCAGATGAAGCCAGAGGGCTGCCAGAG GGCAAGCGAGCTGCGTTGCCTGGAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAGGATTAAAG TCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTGGTGCAACGAATTTAGAAATGAATGCACTTGCATTTGA >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGTCATCTGAGGAACCTGCTTCTGGCGTGGTTT
TGGTGTCAGCATCTTCTCACCCTCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAGGAGCATTT
TATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTCAAATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGTGTGTTCCTGTCACTGCCAACTCAAGAATAT
GAAGTTTAAAGAGTTTCACCATCAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGTGCAAGTGA
AGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAGCagacaattctttttaaaaactgcacaaATTAGCACA
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCCAGAGCCCGGAGGAGGCGCAAAGCTCACAAA
CAGATGCGGACCGCAGGAAGCCGGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCGCTCGATCC
TCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTTGACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC

 

2、 awk + sed實現

root@PC1:/home/test# awk '{if($0 ~ /^[a-zA-Z]/) {printf("%s", $0)} else {print $0}}' a.fna | sed '$ s/$/\n/' | sed '2,$ s/>/\n>/' 
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTCAGGTCAGATGAAGCCAGAGGGCTGCCAGAGGGCAAGCGAGCTGCGTTGCCTGGAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAGGATTAAAGTCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTGGTGCAACGAATTTAGAAATGAATGCACTTGCATTTGA
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGTCATCTGAGGAACCTGCTTCTGGCGTGGTTTTGGTGTCAGCATCTTCTCACCCTCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAGGAGCATTTTATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTCAAATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGTGTGTTCCTGTCACTGCCAACTCAAGAATATGAAGTTTAAAGAGTTTCACCATCAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGTGCAAGTGAAGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAGCagacaattctttttaaaaactgcacaaATTAGCACA
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCCAGAGCCCGGAGGAGGCGCAAAGCTCACAAACAGATGCGGACCGCAGGAAGCCGGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCGCTCGATCCTCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTTGACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC

 

3、利用正則表達式及sed預存儲還原

root@PC1:/home/test# ls
a.fna  b.fna
root@PC1:/home/test# cp b.fna b.fna_bak   ## 要在源文件進行修改,先進行備份
root@PC1:/home/test# ls
a.fna  b.fna  b.fna_bak
root@PC1:/home/test# cat b.fna
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTCAGGTCAGATGAAGCCAGAGGGCTGCCAGAGGGCAAGCGAGCTGCGTTGCCTGGAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAGGATTAAAGTCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTGGTGCAACGAATTTAGAAATGAATGCACTTGCATTTGA
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGTCATCTGAGGAACCTGCTTCTGGCGTGGTTTTGGTGTCAGCATCTTCTCACCCTCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAGGAGCATTTTATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTCAAATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGTGTGTTCCTGTCACTGCCAACTCAAGAATATGAAGTTTAAAGAGTTTCACCATCAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGTGCAAGTGAAGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAGCagacaattctttttaaaaactgcacaaATTAGCACA
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCCAGAGCCCGGAGGAGGCGCAAAGCTCACAAACAGATGCGGACCGCAGGAAGCCGGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCGCTCGATCCTCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTTGACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC
root@PC1:/home/test# for i in `seq 10`; do sed 's/\(^[a-zA-Z]\{50\}\)[^\n]/\1\n/' b.fna -i; done ## 50可修改為任意的鹼基數
root@PC1:/home/test# cat b.fna >NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
gataaaaaataaatagaaacaaaatcactgaagaaCCAGTGTGCCTGCTC
GGTCAGATGAAGCCAGAGGGCTGCCAGAGGGCAAGCGAGCTGCGTTGCCT
GAAAAAGTTAAACACACAGAGAGCATGGTGGCTCTGATACTTTCTAGAAG
ATTAAAGTCACTTTCCCAGTCTTTATGAGAATTGGGCCGAAGCTTAGCTG
TGCAACGAATTTAGAAATGAATGCACTTGCATTTGA
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
AGATGATGTGTCTTTGCCTTGAgctaaaaattttagaataatctgaACGT
ATCTGAGGAACCTGCTTCTGGCGTGGTTTTGGTGTCAGCATCTTCTCACC
TCTCTAGTAATTTTCAGTATGCATTTCTATTTTCGTGTAGTTATTTACAG
AGCATTTTATGGAAAACCGGCTCAAATCTTTTTGGGTGCAGGGGTAGTTC
AATGCACTGAGACCCTCAGTTTCACTTGCTAATCTC
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTCCAGAAACCCTGTTCTCCTCGAGTGACAAGGTCAGCAGGGCAGCACGT
TGTTCCTGTCACTGCCAACTCAAGAATATGAAGTTTAAAGAGTTTCACCA
CAAATGCAGTGTCGTGGACTGCCCCTGAACAGGTGTTTATAATCACGTGT
CAAGTGAAGCAAGCACAAATCCTCAGTGGAAAACGGGCAGAGGACACGAG
agacaattctttttaaaaactgcacaaATTAGCACA
>NC_019458.2 Ovis aries breed Texel chromosome 1, Oar_v4.0, [whole genome shotgun sequence]
CTAGGCACGGATGAGCGTGCCTACCGTGTTGCATGGAGGTAACAGATGCC
GAGCCCGGAGGAGGCGCAAAGCTCACAAACAGATGCGGACCGCAGGAAGC
GGGACGGCCTTCCTCCCCTGAAGCAGGAGGACGCGCCCTACAGAAAGCCG
TCGATCCTCCAGGCATTTGTTGTGAGCACTTAATCATCATTCGATCATTT
ACGTGTACTCACTAGTAAAAGGCAGGACTGTGTCCC

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM