轉載於:https://www.ivistang.com/articles/312
准備軟件和數據
安裝miniconda
wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh sh Miniconda3-latest-Linux-x86_64.sh source ~/.bashrc
安裝相關軟件
###安裝aspera加速下載 conda install -c hcc aspera-cli -y ###安裝samtools conda install -c bioconda samtools openssl=1.0 -y ###安裝sratoolkit hisat2 stringtie conda install -c bioconda sra-tools hisat2 stringtie -y ###下載trimmomatic並解壓 wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip unzip Trimmomatic-0.39.zip
注:如果調用samtools出現如下報錯:samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
問題出在openssl的版本,通過conda install openssl=1.0 -y 來確保安裝openssl的1.0.2版本
下載sra數據並轉換成fastq
使用prefetch命令進行下載,命令形如prefetch SRRXXXXXX SRRXXXXX。
下載完成后,可以用tree來查看下載結果。
使用fasterq-dump將sra文件轉換成fastq:
fasterq-dump -e 60 -S *.sra
轉錄組定量
所有sra數據進行trimmomatic修剪
雙端PE
for i in *.sra do java -jar /data/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 60 ${i}_1.fastq ${i}_2.fastq ${i}_clean_1.fastq ${i}_unpaired_1.fastq ${i}_clean_2.fastq ${i}_unpaired_2.fastq ILLUMINACLIP:/data/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36 done
單端SE
for i in *.sra do java -jar /data/software/Trimmomatic-0.39/trimmomatic-0.39.jar SE -threads 60 ${i}.fastq ${i}_clean.fastq ILLUMINACLIP:/data/software/Trimmomatic-0.39/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 done
Hisat2+stringtie代碼舉例
構建索引Index
hisat2-build Ghirsutum_527_v2.0.fa genome
雙端PE
ref=Ghirsutum_527_v2.0.fa gff=Ghirsutum_527_v2.1.gene.gff3 index=genome label=Ghir out=SRR5178068 fq1="SRR5178068.sra_clean_1.fastq,SRR5186513.sra_clean_1.fastq" fq2="SRR5178068.sra_clean_2.fastq,SRR5186513.sra_clean_2.fastq" hisat2 --dta -p 60 -1 ${fq1} -2 ${fq2} -x ${index} -S ${out}.sam samtools view -@ 60 -b -S ${out}.sam > ${out}.bam samtools sort -@ 60 -m 4G -o ${out}.sorted.bam ${out}.bam stringtie -p 60 -G ${gff} -B -e -l ${label} -o ${out}/${out}.gtf ${out}.sorted.bam
單端SE
ref=G.arboreum_BGI-A2_assembly.fa gff=G.arboreum_BGI-A2_v1.0_transcripts.gff3 index=genome label=Garb out=SRR617065 fq="SRR617065.sra_clean.fastq,SRR617067.sra_clean.fastq" hisat2 --dta -p 60 -U ${fq} -x ${index} -S ${out}.sam samtools view -@ 60 -b -S ${out}.sam > ${out}.bam samtools sort -@ 60 -m 4G -o ${out}.sorted.bam ${out}.bam stringtie -p 60 -G ${gff} -B -e -l ${label} -o ${out}/${out}.gtf ${out}.sorted.bam注
注:最后輸出文件包含我想要的表達量gtf文件(包括FPKM,TPM)以及幾個.ctab中間文件。我用了60個線程,注意修改代碼中的線程數。