【轉錄組】使用hisat2+stringtie組合進行轉錄組定量分析


轉載於:https://www.ivistang.com/articles/312

 

准備軟件和數據

安裝miniconda

wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

安裝相關軟件

###安裝aspera加速下載
conda install -c hcc aspera-cli -y
###安裝samtools
conda install -c bioconda samtools openssl=1.0 -y
###安裝sratoolkit hisat2 stringtie
conda install -c bioconda sra-tools hisat2 stringtie -y
###下載trimmomatic並解壓
wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip
unzip Trimmomatic-0.39.zip

注:如果調用samtools出現如下報錯:samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
問題出在openssl的版本,通過conda install openssl=1.0 -y 來確保安裝openssl的1.0.2版本

下載sra數據並轉換成fastq

使用prefetch命令進行下載,命令形如prefetch SRRXXXXXX SRRXXXXX。
下載完成后,可以用tree來查看下載結果。
使用fasterq-dump將sra文件轉換成fastq:

fasterq-dump -e 60 -S *.sra
 

轉錄組定量

所有sra數據進行trimmomatic修剪

雙端PE

for i in *.sra
do
java -jar /data/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 60 ${i}_1.fastq ${i}_2.fastq ${i}_clean_1.fastq ${i}_unpaired_1.fastq ${i}_clean_2.fastq ${i}_unpaired_2.fastq ILLUMINACLIP:/data/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36
done

單端SE

for i in *.sra
do
java -jar /data/software/Trimmomatic-0.39/trimmomatic-0.39.jar SE -threads 60 ${i}.fastq  ${i}_clean.fastq ILLUMINACLIP:/data/software/Trimmomatic-0.39/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done

Hisat2+stringtie代碼舉例

構建索引Index

hisat2-build Ghirsutum_527_v2.0.fa genome

雙端PE

ref=Ghirsutum_527_v2.0.fa
gff=Ghirsutum_527_v2.1.gene.gff3
index=genome
label=Ghir
out=SRR5178068
fq1="SRR5178068.sra_clean_1.fastq,SRR5186513.sra_clean_1.fastq"
fq2="SRR5178068.sra_clean_2.fastq,SRR5186513.sra_clean_2.fastq"
hisat2 --dta -p 60 -1 ${fq1} -2 ${fq2} -x ${index} -S ${out}.sam
samtools view -@ 60 -b -S ${out}.sam > ${out}.bam
samtools sort -@ 60 -m 4G -o ${out}.sorted.bam ${out}.bam
stringtie -p 60 -G ${gff} -B -e -l ${label} -o ${out}/${out}.gtf ${out}.sorted.bam

單端SE

ref=G.arboreum_BGI-A2_assembly.fa
gff=G.arboreum_BGI-A2_v1.0_transcripts.gff3
index=genome
label=Garb
out=SRR617065
fq="SRR617065.sra_clean.fastq,SRR617067.sra_clean.fastq"
hisat2 --dta -p 60 -U ${fq} -x ${index} -S ${out}.sam
samtools view -@ 60 -b -S ${out}.sam > ${out}.bam
samtools sort -@ 60 -m 4G -o ${out}.sorted.bam ${out}.bam
stringtie -p 60 -G ${gff} -B -e -l ${label} -o ${out}/${out}.gtf ${out}.sorted.bam注
 
注:最后輸出文件包含我想要的表達量gtf文件(包括FPKM,TPM)以及幾個.ctab中間文件。我用了60個線程,注意修改代碼中的線程數。


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM