Filtering low-quality Illumina reads with Trimmomatic


1. Download and installation

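Download the binary release directly from the official website. After unzipping, the jar file (trimmomatic-0.38.jar or trimmomatic-0.36.jar, depending on the version downloaded) is the program we need.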

Official website:

http://www.usadellab.org/cms/index.php?page=trimmomatic

wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip

unzip Trimmomatic-0.38.zip

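The run example in section 2 uses version 0.36, which is downloaded the same way:

wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.36.zip
unzip Trimmomatic-0.36.zip

Contents of the unzipped 0.38 directory:
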
[Trimmomatic-0.38]# tree
.
├── adapters
│   ├── NexteraPE-PE.fa
│   ├── TruSeq2-PE.fa
│   ├── TruSeq2-SE.fa
│   ├── TruSeq3-PE-2.fa
│   ├── TruSeq3-PE.fa
│   └── TruSeq3-SE.fa
├── LICENSE
└── trimmomatic-0.38.jar
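
Trimmomatic is a Java program, so a quick sanity check after unzipping is to confirm that the jar is present and that java is on the PATH. The helper below is a minimal sketch: the name check_trimmomatic is made up for illustration, and the jar path assumes the 0.38 archive was unpacked in the current directory.

```shell
# Illustrative helper (not part of Trimmomatic): report whether a jar looks usable.
check_trimmomatic() {
    jar="$1"
    if [ ! -f "$jar" ]; then
        echo "missing jar: $jar"
        return 1
    fi
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH"
        return 1
    fi
    echo "ok: $jar"
}

# Assumed location; adjust to wherever the archive was unzipped.
# "|| true" reports the problem without aborting a calling script.
check_trimmomatic Trimmomatic-0.38/trimmomatic-0.38.jar || true
```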
 

2. Running the software

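In general the commonly used settings are sufficient; for detailed usage see the official website http://www.usadellab.org/cms/?page=trimmomatic
Trimmomatic only applies the trimming steps listed on the command line, so "default" here means the usual settings shown in this command: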

sudo java -jar trimmomatic-0.36.jar PE \
  -phred33 ~/SRR733/SRR2854733_1.fastq ~/SRR733/SRR2854733_2.fastq \
   ~/SRR733/clsseq/SRR2854733_1_paired.fq ~/SRR733/clsseq/SRR2854733_1_unpaired.fq \
   ~/SRR733/clsseq/SRR2854733_2_paired.fq ~/SRR733/clsseq/SRR2854733_2_unpaired.fq \
   ILLUMINACLIP:/usr/local/src/Trimmomatic/Trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 \
   LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:8 MINLEN:36
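
The same call can be assembled from variables so that each path is written only once. This is a sketch under assumptions: the variable names and the build_pe_cmd helper are illustrative, the paths are the ones from the command above, and the function only prints the command so it can be inspected before anything is run.

```shell
# Illustrative variables; the values mirror the paths used in the command above.
JAR=trimmomatic-0.36.jar
ADAPTERS=/usr/local/src/Trimmomatic/Trimmomatic-0.36/adapters/TruSeq3-PE.fa
IN_DIR="$HOME/SRR733"
OUT_DIR="$HOME/SRR733/clsseq"
SAMPLE=SRR2854733

# Illustrative helper: print the paired-end command line.
# Argument order: R1 input, R2 input, then R1 paired/unpaired
# and R2 paired/unpaired outputs, followed by the trimming steps.
build_pe_cmd() {
    printf '%s ' java -jar "$JAR" PE -phred33 \
        "$IN_DIR/${SAMPLE}_1.fastq" "$IN_DIR/${SAMPLE}_2.fastq" \
        "$OUT_DIR/${SAMPLE}_1_paired.fq" "$OUT_DIR/${SAMPLE}_1_unpaired.fq" \
        "$OUT_DIR/${SAMPLE}_2_paired.fq" "$OUT_DIR/${SAMPLE}_2_unpaired.fq" \
        "ILLUMINACLIP:$ADAPTERS:2:30:10" \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:8 MINLEN:36
    echo
}

build_pe_cmd
```

After checking the printed command, create the output directory with mkdir -p "$OUT_DIR" so the four output files have somewhere to go, then run the command.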

 

Run results:

Input Read Pairs: 23396043 
Both Surviving: 20842668 (89.09%)
Forward Only Surviving: 2537100 (10.84%)
Reverse Only Surviving: 13969 (0.06%)
Dropped: 2306 (0.01%)
TrimmomaticPE: Completed successfully
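
The four outcome counts should add up to the number of input read pairs, which is an easy consistency check on a saved log. The sketch below assumes the summary text is available on stdin; summary_check is an illustrative helper, not a Trimmomatic feature.

```shell
# Illustrative helper: verify that Both + Forward Only + Reverse Only + Dropped
# equals Input Read Pairs, and recompute the "Both Surviving" percentage.
summary_check() {
    # One token per line, so the summary may span one line or several.
    tr ' ' '\n' | awk '
        /^[0-9]+$/ { n[++k] = $1 }     # keep bare integers, skip "(89.09%)"
        END {
            total = n[1]
            sum = n[2] + n[3] + n[4] + n[5]
            printf "input=%d sum=%d both=%.2f%%\n", total, sum, 100 * n[2] / total
            exit (total == sum ? 0 : 1)
        }'
}

summary_check <<'EOF'
Input Read Pairs: 23396043 Both Surviving: 20842668 (89.09%) Forward Only Surviving: 2537100 (10.84%) Reverse Only Surviving: 13969 (0.06%) Dropped: 2306 (0.01%)
EOF
```

For the run above this prints input=23396043 sum=23396043 both=89.09%, and the function returns a non-zero status if the counts do not add up.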

 

3. Common parameters

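PE/SE
    Selects paired-end or single-end mode; the input and output arguments differ slightly between the two.
-threads
    Number of threads to use.
-phred33
    Base-quality encoding of the input; the alternative is -phred64.
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10
    Removes adapter sequences. The colon-separated fields are: the FASTA file of adapter sequences, the maximum number of mismatches allowed in the seed match, the match-score threshold in palindrome mode, and the match-score threshold in simple mode.
LEADING:3
    Removes bases from the start of the read while their quality is below 3.
TRAILING:3
    Removes bases from the end of the read while their quality is below 3.
SLIDINGWINDOW:4:15
    Scans the read from the 5' end with a sliding window and cuts the read once the average base quality within the window falls below the threshold. Here the window is 4 bases wide and the read is cut when the window's average quality drops below 15.
MINLEN:50
    Minimum read length; reads shorter than this after trimming are dropped (the run in section 2 uses MINLEN:36).
CROP:<length>
    Keeps the read up to the specified length, removing bases from the end.
HEADCROP:<length>
    Removes the specified number of bases from the start of the read.
TOPHRED33
    Converts base qualities to phred33 encoding.
TOPHRED64
    Converts base qualities to phred64 encoding.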
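
To make -phred33 and SLIDINGWINDOW more concrete: in Phred+33 encoding each quality character stands for its ASCII code minus 33, and the window step averages those scores. The following is a simplified sketch of that idea only. It keeps the bases before the first window whose mean falls below the threshold, so Trimmomatic's own implementation may keep a slightly different number of bases. The helper name sliding_keep_len is made up for illustration.

```shell
# Illustrative helper (not part of Trimmomatic): number of bases kept by a
# simplified sliding-window cut on a Phred+33 quality string.
sliding_keep_len() {
    # $1 = quality string, $2 = window size, $3 = required mean quality
    printf '%s\n' "$1" | awk -v w="$2" -v t="$3" '{
        for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c        # ASCII lookup table
        n = length($0)
        for (i = 1; i <= n; i++) q[i] = ord[substr($0, i, 1)] - 33  # Phred+33 decode
        keep = n
        for (i = 1; i + w - 1 <= n; i++) {
            sum = 0
            for (j = i; j < i + w; j++) sum += q[j]
            if (sum / w < t) { keep = i - 1; break }   # first failing window starts at i
        }
        print keep
    }'
}

sliding_keep_len 'IIIIIIII####' 4 15
```

Here I decodes to 40 and # to 2, so the first window with a mean below 15 starts at base 8 (40, 2, 2, 2 averages 11.5) and the sketch reports 7 kept bases.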

Question: Which TruSeq adapter file should be used with Trimmomatic when removing TruSeq adapters?
It depends mostly on which TruSeq protocol was used (V2, which is old at this stage and usually means data from the GAII, or V3, which covers everything from the HiSeq or later machines), and whether the data is single-end or paired-end (SE or PE). The only exception is TruSeq3-PE, which has two sets: TruSeq3-PE.fa works fine for high-quality libraries, but TruSeq3-PE-2.fa contains some additional sequences which find partial adapters in unusual locations or orientations.
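
The rule in this answer can be written down as a small lookup. This is only a sketch: pick_adapter is a hypothetical helper, and the file names are the TruSeq files from the adapters directory listed in section 1.

```shell
# Illustrative helper: map TruSeq protocol version and library layout
# to one of the adapter files bundled with Trimmomatic.
pick_adapter() {
    # $1 = TruSeq version (2 or 3), $2 = layout (SE or PE)
    case "$1-$2" in
        2-SE) echo TruSeq2-SE.fa ;;
        2-PE) echo TruSeq2-PE.fa ;;
        3-SE) echo TruSeq3-SE.fa ;;
        3-PE) echo TruSeq3-PE.fa ;;   # TruSeq3-PE-2.fa is the variant with extra sequences
        *) echo "unknown combination: $1 $2" >&2; return 1 ;;
    esac
}

pick_adapter 3 PE
```

The result can then be placed in the ILLUMINACLIP step, for example ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10.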

ref:
https://www.jianshu.com/p/7b5591673255
https://www.biostars.org/p/323087/

 
 

