Official website: http://subread.sourceforge.net/
Subread package: high-performance read alignment, quantification and mutation discovery
The Subread package comprises a suite of software programs for processing next-gen sequencing read data including:
- Subread: a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants.
- Subjunc: a read aligner developed for aligning RNA-seq reads and for the detection of exon-exon junctions. Gene fusion events can be detected as well.
- featureCounts: a software program developed for counting reads to genomic features such as genes, exons, promoters and genomic bins.
- Sublong: a long-read aligner that is designed based on seed-and-vote.
- exactSNP: a SNP caller that discovers SNPs by testing signals against local background noises.
These programs were also implemented in Bioconductor R package Rsubread.
Download and installation:
https://sourceforge.net/projects/subread/files/
The binary release is ready to use once extracted; the executables are in the bin directory.
wget https://jaist.dl.sourceforge.net/project/subread/subread-2.0.2/subread-2.0.2-Linux-x86_64.tar.gz
tar -zxvf subread-2.0.2-Linux-x86_64.tar.gz
cd subread-2.0.2-Linux-x86_64
cd bin
ls
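To avoid typing the full path each time, the bin directory can be added to PATH. This is a minimal sketch: `SUBREAD_HOME` is a hypothetical variable name, and the location under `$HOME` is an assumption, so adjust it to wherever the archive was actually extracted.

```shell
# Assumed install location (hypothetical); change to your extraction directory.
SUBREAD_HOME="$HOME/subread-2.0.2-Linux-x86_64"

# Prepend the bin directory so featureCounts, subread-align, etc. resolve by name.
export PATH="$SUBREAD_HOME/bin:$PATH"

# Confirm the executable is reachable; -v prints the program version.
if command -v featureCounts >/dev/null 2>&1; then
    featureCounts -v
else
    echo "featureCounts not found on PATH; check SUBREAD_HOME"
fi
```

Adding the `export` line to `~/.bashrc` would make the change persistent across sessions.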
Usage:
Basic syntax
featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...
Options
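| Option | Description |
|---|---|
| input file | Input BAM/SAM file(s); multiple files can be given |
| -a < string > | Name of the annotation (GTF) file; gzipped files are supported |
| -F | Format of the annotation file, GTF or SAF; the C version defaults to GTF |
| -A | A comma-delimited, two-column alias file: the first column holds chromosome names as used in the GTF, the second the corresponding names as used in the reads. It is used to reconcile naming differences between annotation and alignments. The file must not contain a header row |
| -J | Count reads supporting each exon-exon junction |
| -G < string > | With -J, the reference genome FASTA file that was used for alignment; helps to identify junctions |
| -M | Count multi-mapping reads as well (they are ignored by default) |
| -o < string > | Name of the output file, which contains the read counts |
| -O | Allow multi-overlap: a read that overlaps more than one feature or meta-feature is assigned to all of them, so it is counted more than once |
| -T | Number of threads, 1 to 32 |
| Options for paired-end counting and feature/meta-feature selection | Description |
| -p | Only applicable to paired-end data; fragments are counted instead of reads. Note that from Subread 2.0.2 onward, -p only declares the input as paired-end, and --countReadPairs must also be given to count fragments |
| -B | With -p, count only fragments that have both ends aligned |
| -C | With -p, do not count chimeric fragments (fragments whose two ends align to different chromosomes) |
| -d < int > | Minimum fragment length; default 50 |
| -D < int > | Maximum fragment length; default 600 |
| -f | Count at the feature level (e.g. exon level); without -f, counting is done at the meta-feature level (e.g. gene level) |
| -g < string > | When a GTF annotation is used, the attribute used to group features into meta-features; default gene_id. Make sure the attribute you specify actually exists in your GTF |
| -t < string > | Feature type. The value must be a feature type present in the GTF, and only reads overlapping features of this type are counted; default "exon" |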
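The alias file expected by -A can be sketched as follows. The file name `chr_alias.csv` and the chromosome names are made up for illustration; the only points taken from the option description are the layout (annotation name first, read name second, comma-delimited) and the absence of a header row.

```shell
# Hypothetical alias file: GTF uses "1", "2", "MT"; the BAM files use "chr1", "chr2", "chrM".
# Column 1 = chromosome name in the annotation, column 2 = name in the reads. No header row.
cat > chr_alias.csv <<'EOF'
1,chr1
2,chr2
MT,chrM
EOF

# Sanity check: every line should have exactly two comma-separated fields.
awk -F, 'NF != 2 { print "bad line " NR ": " $0; bad = 1 } END { exit bad }' chr_alias.csv

# The file would then be passed with -A (placeholder file names), e.g.:
# featureCounts -A chr_alias.csv -a annotation.gtf -o counts.txt sample.bam
```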
$ /home/software/subread-2.0.2-Linux-x86_64/bin/featureCounts -T 5 -t exon -g gene_id -a /path-to-gtf/ERCC.gtf -o /path-to-output/all.id.txt *.bam 1>counts.id.log 2>&1
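The output file written by -o is tab-delimited. It starts with a `#` comment line recording the command, followed by a header row with Geneid, Chr, Start, End, Strand and Length, and then one count column per input BAM file. A rough sketch for reducing it to a plain count matrix is shown here; `extract_counts` is a hypothetical helper name, not part of Subread.

```shell
# Drop the leading '#' comment line, then keep Geneid (column 1)
# and the per-sample count columns (column 7 onward).
extract_counts() {
    grep -v '^#' "$1" | cut -f1,7-
}

# Example call, using the output path from the command above:
# extract_counts /path-to-output/all.id.txt > counts_matrix.tsv
```

The resulting matrix keeps the header row, so the sample columns are still labeled with the BAM file names.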
Link: https://www.jianshu.com/p/9cc4e8657d62
