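I have recently been learning metagenomic binning for microbial communities and installed metaWRAP by following the official documentation. I ran into a lot of pitfalls along the way, so here is a record of the errors and how I resolved them:
1. Installing metaWRAP
The author recommends installing with Conda/Mamba and advises against the bioconda and Docker routes, so I found a Docker image that ships with conda and took the first step of a long journey:
(1) Installing the software with conda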
conda create -y -n metawrap-env python=2.7
source activate metawrap-env
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky
conda install -y -c ursky metawrap-mg
conda install -y blas=2.5=mkl
Once installed, the image was roughly 5 GB, and I pushed it to Docker Hub:
docker push raser216/metawrap:v1.0.0
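I thought that was the end of it, but a series of errors followed...
(2) Installing the libtbb2 library
Only when I reached quant_bins did I discover that a dependency was missing, which made salmon fail when quantifying gene abundance: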
salmon: error while loading shared libraries: libtbb.so.2
Solution:
# install the libtbb2 library
apt-get install libtbb2
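A quick way to catch this kind of missing dependency before a pipeline step fails is to scan ldd output for entries marked "not found". The sketch below only parses text in that format. The sample output is made up for illustration (it was not captured from my image), and missing_libraries is a hypothetical helper name.

```python
def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "\tlibtbb.so.2 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


# Illustrative ldd output (assumed, not captured from the real salmon binary).
SAMPLE_LDD_OUTPUT = (
    "\tlinux-vdso.so.1 (0x00007ffd3a5f2000)\n"
    "\tlibtbb.so.2 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)\n"
)

print(missing_libraries(SAMPLE_LDD_OUTPUT))
```

In practice the input would come from something like the output of ldd run on the salmon binary inside the container.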
(3) Installing libGL for matplotlib
ImportError: Failed to import any qt binding
# Python 2.7: matplotlib is installed but cannot be imported
import matplotlib
import matplotlib.pyplot as plt
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Solution:
apt-get -y update
apt-get install -y libgl1-mesa-glx
# after installation, Python 2.7 can import the module without errors
import matplotlib.pyplot as plt
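(4) prokka installation fails with an error
prokka cannot be used because its installation failed:
Possible cause: the perl version installed by metaWRAP does not meet prokka's requirements (does metaWRAP not support perl 5.26?).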
prokka -h
Can't locate Bio/Root/Version.pm in @INC (you may need to install the Bio::Root::Version module) (@INC contains: /opt/conda/envs/metawrap-env/bin/../perl5 /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2//x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/ /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2 /opt/conda/envs/metawrap-env/lib/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/5.26.2 .) at /opt/conda/envs/metawrap-env/bin/prokka line 32.
BEGIN failed--compilation aborted at /opt/conda/envs/metawrap-env/bin/prokka line 32.
Solution: install prokka in a separate conda environment:
conda create -n prokka-test prokka=1.13 minced=0.3.0 parallel=20180522 blast=2.12.0
source activate prokka-test
2. conda errors
(1) Unable to enter the conda environment
/opt/conda/envs/metawrap-env/etc/conda/activate.d/activate-binutils_linux-64.sh: line 65: ADDR2LINE: unbound variable
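Solution: activate the conda environment through a Dockerfile, and add the directory containing the installed software to the PATH environment variable:
cat metawrap_v1.dockerfile
# contents of the Dockerfile
FROM raser216/metawrap:v1.0.0
RUN echo "source activate metawrap-env" > ~/.bashrc
ENV PATH /opt/conda/envs/metawrap-env/bin:$PATH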
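3. Database paths and versions
The databases used by the alignment tools that metaWRAP calls (kraken, BLAST, etc.) can live outside the installation, but their paths must be written into the config file: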
# path to the config file
which config-metawrap
/opt/conda/envs/metawrap-env/bin/config-metawrap
# use sed -i to replace the placeholders with the real database paths
kraken_database=/database/kraken_database/kraken_newdb2/axel_dowload
nt_database=/database/newdownload3
tax_database=/database/metawrap_database/ncbi_taxonomy
sed -i "s#~/KRAKEN_DB#$kraken_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_NT_DB#$nt_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_TAX_DB#$tax_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
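The config file must be writable, otherwise the bin_refinement step fails:
# error at the bin_refinement step
You do not seem to have permission to edit the checkm config file located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG
Solution: change the permissions of the config file; after that the error no longer appeared.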
chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap
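The placeholder substitution done with sed in this section can also be sketched in Python, which makes it easy to confirm that no "~/" placeholder is left behind. This is only an illustration: the sample config lines (in particular the variable names on the left-hand side) are my assumption rather than a copy of the real config-metawrap, and fill_placeholders is a hypothetical helper.

```python
def fill_placeholders(config_text, replacements):
    """Replace each placeholder path in the config text with its real path."""
    for placeholder, real_path in replacements.items():
        config_text = config_text.replace(placeholder, real_path)
    return config_text


# Illustrative config lines; the variable names on the left are assumptions.
SAMPLE_CONFIG = (
    "KRAKEN_DB=~/KRAKEN_DB\n"
    "BLASTDB=~/NCBI_NT_DB\n"
    "TAXDUMP=~/NCBI_TAX_DB\n"
)

# Placeholders and real paths as used in the sed commands of this section.
REPLACEMENTS = {
    "~/KRAKEN_DB": "/database/kraken_database/kraken_newdb2/axel_dowload",
    "~/NCBI_NT_DB": "/database/newdownload3",
    "~/NCBI_TAX_DB": "/database/metawrap_database/ncbi_taxonomy",
}

UPDATED_CONFIG = fill_placeholders(SAMPLE_CONFIG, REPLACEMENTS)
print(UPDATED_CONFIG)
```

Checking that "~/" no longer occurs in the result is a cheap sanity test before running any metaWRAP module.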
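4. kraken errors
kraken is a tool that assigns taxonomy directly to sequencing reads (fastq). There are currently two major versions: the first generation (kraken) is extremely memory-hungry (>100 GB), while the second generation (kraken2) is much improved (about 35 GB is enough).
(1) Error caused by inline comments
The kraken.sh script lives in /opt/conda/envs/metawrap-env/bin/metawrap-modules/. On lines 123-125 of that script the comments are written at the end of the line, after the backslash, so the backslash no longer continues the line and kraken.sh fails (I did not record the error message):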
123 awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | \ #combine paired end reads onto one line
124 shuf | head -n $depth | sed 's/\t\t/\n/g' | \ #shuffle reads, select top N reads, and then restore tabulation
125 awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}' #separate reads into F and R files
Solution: move all of the comments onto their own lines above the pipeline.
123 # combine paired end reads onto one line, then
124 # shuffle reads, select top N reads, and then restore tabulation, then
125 # separate reads into F and R files
126 awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | \
127 shuf | head -n $depth | sed 's/\t\t/\n/g' | \
128 awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}'
(2) Script permission error
The kraken.sh script must be executable, otherwise running it fails with:
/opt/conda/envs/metawrap-env/bin/metawrap: line 69: /opt/conda/envs/metawrap-env/bin/metawrap-modules/kraken.sh: Permission denied
Solution: change the script's permissions to 775; after that the error no longer appeared.
chmod 775 kraken.sh
ls -l kraken.sh
-rwxrwxr-x 1 root root 8.9K Sep 22 20:12 kraken.sh
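(3) Error in the Python annotation script
In the Python script kraken2_translate.py, the dictionary names_map is indexed with a key it does not contain, which raises a KeyError.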
Something went wrong with running kraken-translate... Exiting.
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 98, in translate_kraken2_annotations
    taxonomy = get_full_name(taxid, names_map, ranks_map)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 30, in get_full_name
    name = names_map[taxid]
KeyError: '1054037'
Solution: look the value up with dict.get() instead of indexing, and add a check for None.
vi /opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py
# modify get_full_name so that a key missing from names_map no longer raises an error:
for taxid in taxid_lineage:
    #name = names_map[taxid]
    name = names_map.get(taxid)
    if name is None:
        name = "unknown"
    names_lineage.append(name)
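The same idea can be written more compactly by passing the fallback directly to dict.get(). The following is a self-contained sketch rather than the script's actual code: lineage_names is a hypothetical helper, and the names map is a toy example in which taxid '1054037' is deliberately absent.

```python
def lineage_names(taxid_lineage, names_map, fallback="unknown"):
    """Translate taxids to names, substituting a fallback for unknown taxids."""
    # dict.get(key, default) returns the default instead of raising KeyError.
    return [names_map.get(taxid, fallback) for taxid in taxid_lineage]


# Toy names map (illustrative); taxid '1054037' is deliberately missing.
names_map = {"2": "Bacteria", "1224": "Proteobacteria"}

print(lineage_names(["2", "1224", "1054037"], names_map))
```

A missing taxid usually means the taxonomy files are older or newer than the kraken database, so the "unknown" entries are worth counting rather than silently ignoring.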
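(4) taxonomy database not found
The downloaded NCBI taxonomy database has to be placed inside the downloaded kraken database directory (in a taxonomy/ subdirectory, as the path in the error shows), otherwise the script fails: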
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 87, in translate_kraken2_annotations
    names_map, ranks_map = load_kraken_db_metadata(kraken2_db)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 50, in load_kraken_db_metadata
    with open(names_path) as input:
IOError: [Errno 2] No such file or directory: '/database/kraken_database/kraken_newdb2/axel_dowload/taxonomy/names.dmp'
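(5) kraken version does not match the database version
I had used kraken2 before, and the server already held the (huge) database it needs. I did not want to download another database for kraken (first generation), so I tested whether the kraken2 database would work with kraken. As expected, it did not: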
kraken: database ("/database/kraken_database/kraken_newdb2/axel_dowload") does not contain necessary file database.kdb
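So I looked into updating the kraken version bundled with metaWRAP, and found that the default metaWRAP installation does not support kraken2; it has to be updated to the latest version, 1.3.2.
Solution: update metaWRAP to 1.3.2.
conda install -y -c ursky metawrap-mg=1.3.2
# after the update, the config file's permissions and contents have to be set again
chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap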
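5. CheckM errors
(1) Line-continuation error in the .py file
CheckM is a tool for assessing the completeness of assembled genomes. bin_refinement uses it, and it failed immediately: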
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 36, in <module>
    from checkm import main
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 25, in <module>
    from checkm.defaultValues import DefaultValues
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 26, in <module>
    class DefaultValues():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 29, in DefaultValues
    __DBM = DBManager()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 114, in __init__
    if not self.setRoot():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 140, in setRoot
    path = self.confirmPath(path=path)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 162, in confirmPath
    path = raw_input("Where should CheckM store it's data?\n" \
EOFError: EOF when reading a line
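Solution: edit checkmData.py (the file named in the traceback), which is in this directory:
/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/
Cause, as I understood it at the time: the raw_input() call on line 162 uses "\" as a line continuation, and Python did not seem to handle it. (An EOFError from raw_input() normally means there was no standard input to read, so the unset data root described in (2) may be the real trigger.)
162 path = raw_input("Where should CheckM store it's data?\n" \
163 "Please specify a location or type 'abort' to stop trying: \n")
What I did: remove the line continuation.
162 path = raw_input("Where should CheckM store it's data?\nPlease specify a location or type 'abort' to stop trying: \n")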
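For reference, here is a small Python 3 sketch (CheckM here runs on Python 2.7, where the function is raw_input) of two behaviors that matter when reading this traceback. Adjacent string literals inside parentheses are joined into one string whether or not a backslash ends the line, and a prompt raises EOFError when standard input is empty, as happens in a non-interactive container run. ask_with_empty_stdin is a hypothetical helper that simulates the empty input.

```python
import io
import sys

# Adjacent string literals inside parentheses are concatenated, with or
# without a trailing backslash, so this is one valid prompt string.
PROMPT = ("Where should CheckM store it's data?\n" \
          "Please specify a location or type 'abort' to stop trying: \n")


def ask_with_empty_stdin(prompt):
    """Call input() while stdin is empty and report what happens."""
    saved_stdin = sys.stdin
    sys.stdin = io.StringIO("")  # simulate a run with nothing to read
    try:
        input(prompt)
        return "got an answer"
    except EOFError:
        return "EOFError"
    finally:
        sys.stdin = saved_stdin


print(ask_with_empty_stdin(PROMPT))
```

If this is what happened in my case, setting the data root once beforehand, so that CheckM never has to prompt, addresses the cause more directly than editing the source.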
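(2) Database not found
The first time CheckM runs it asks where its database is located, so it is best to run checkm data setRoot right after installation and set the database path up front: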
checkm data setRoot
*******************************************************************************
 [CheckM - data] Check for database updates. [setRoot]
*******************************************************************************
Where should CheckM store it's data?
Please specify a location or type 'abort' to stop trying:
/checkm_database
Path [/checkm_database] exists and you have permission to write to this folder.
Otherwise CheckM cannot find its database and prints the following:
It seems that the CheckM data folder has not been set yet or has been removed. Running: 'checkm data setRoot'.
You do not seem to have permission to edit the checkm config file
located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG
Please try again with updated privileges.
Unexpected error: <type 'exceptions.TypeError'>
*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************
  Identifying marker genes in 8 bins with 32 threads:
Process SyncManager-1:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 550, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 162, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 132, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 256, in __init__
    self._socket.bind(address)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: AF_UNIX path too long
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 708, in <module>
    checkmParser.parseOptions(args)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 1251, in parseOptions
    self.tree(options)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 133, in tree
    options.bCalledGenes)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/markerGeneFinder.py", line 67, in find
    binIdToModels = mp.Manager().dict()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/__init__.py", line 99, in Manager
    m.start()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 528, in start
    self._address = reader.recv()
EOFError
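Solution for the "AF_UNIX path too long" error: change the checkm --tmpdir set in binning.sh and the other scripts so that it points to a temporary directory with a short absolute path.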
# three scripts in this directory call CheckM, and the default --tmpdir has to be changed in each of them
cd /opt/conda/envs/metawrap-env/bin/metawrap-modules
grep checkm *sh|awk -F ":" '{print $1}'|sort|uniq
bin_refinement.sh
binning.sh
reassemble_bins.sh

# taking binning.sh as an example:
# add a line before the checkm command that creates a short tmp directory for CheckM's temporary files
mkdir -p /tmp/$(basename ${1}).tmp
# change checkm's --tmpdir
checkm lineage_wf -x fa ${1} ${1}.checkm -t $threads --tmpdir /tmp/$(basename ${1}).tmp --pplacer_threads $p_threads
if [[ ! -s ${1}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
# remove the tmp directory once the run has finished
rm -r /tmp/$(basename ${1}).tmp

# the checkm lines in the other two scripts need the same change
# changes in bin_refinement.sh
mkdir -p /tmp/$(basename ${bin_set}).tmp
if [ "$quick" == "true" ]; then
    comm "Note: running with --reduced_tree option"
    checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads --reduced_tree
else
    checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads
fi
if [[ ! -s ${bin_set}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${bin_set}.checkm/storage/bin_stats_ext.tsv $bin_set | (read -r; printf "%s\n" "$REPLY"; sort) > ${bin_set}.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r ${bin_set}.checkm; rm -r /tmp/$(basename ${bin_set}).tmp

mkdir -p /tmp/binsO.tmp
if [ "$quick" == "true" ]; then
    checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads --reduced_tree
else
    checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads
fi
if [[ ! -s binsO.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/binsO.tmp

# changes in reassemble_bins.sh
mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv | (read -r; printf "%s\n" "$REPLY"; sort) > ${out}/reassembled_bins.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r /tmp/$(basename ${out}).tmp

mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/$(basename ${out}).tmp
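As a rough check on why a short --tmpdir helps: on Linux the path of an AF_UNIX socket is limited to 108 bytes including the terminating NUL, and Python's multiprocessing places its listener socket in a pymp-... subdirectory of the temporary directory. The sketch below estimates whether a given tmpdir leaves enough room. The suffix length is my own worst-case estimate, not an exact figure, and socket_path_fits is a hypothetical helper.

```python
# Linux limits AF_UNIX socket paths to 108 bytes including the trailing NUL.
AF_UNIX_MAX_BYTES = 107

# Assumed worst-case suffix that multiprocessing appends below the temp dir
# (an estimate for illustration, not an exact figure).
ASSUMED_SUFFIX = "/pymp-XXXXXXXX/listener-XXXXXXXX"


def socket_path_fits(tmpdir):
    """Return True if a listener socket under tmpdir should fit the limit."""
    full_path = tmpdir.rstrip("/") + ASSUMED_SUFFIX
    return len(full_path.encode("utf-8")) <= AF_UNIX_MAX_BYTES


# A short directory directly under /tmp leaves plenty of room.
print(socket_path_fits("/tmp/binsO.tmp"))
# A deeply nested working directory (made-up path) does not.
print(socket_path_fits("/data/projects/" + "x" * 100 + "/binsO.tmp"))
```

This is why the changes above create the temporary directory directly under /tmp, using only the basename of the output directory.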
This error in turn makes bin_refinement fail (CheckM did not run correctly, so the corresponding statistics are missing):
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 266, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 264, in rmtree
    os.remove(fullname)
OSError: [Errno 16] Device or resource busy: 'binsO.tmp/pymp-REeR36/.nfs9061e516f4bd263400000b82'
mv: cannot stat 'binning_results.eps': No such file or directory
mv: cannot stat 'binning_results.eps': No such file or directory
6. BLAST errors
BLAST Database error: Error: Not a valid version 4 database.
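Solution: update BLAST to a newer version.
# download and unpack the newer BLAST release
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.12.0+-x64-linux.tar.gz
tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz
# replace the BLAST binaries in the conda environment, keeping the old ones in bak/
mkdir /opt/conda/envs/metawrap-env/bin/bak
# assumed: run the loop from the unpacked bin directory so that $(ls) lists the new BLAST binaries
cd ncbi-blast-2.12.0+/bin
for i in $(ls); do mv /opt/conda/envs/metawrap-env/bin/$i /opt/conda/envs/metawrap-env/bin/bak; cp $i /opt/conda/envs/metawrap-env/bin; done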
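7. prokka errors
(1) BLAST version not recognized
prokka, which annotates assembled genomes, is a Perl script. It requires blastp and makeblastdb at version 2.8 or later, but the version check seems flawed: it does not accept my BLAST 2.12.0 (it treats 2.12 as lower than 2.8...).
I do not know Perl well enough to fix the check properly, so I simply changed both MINVER values to 2.1: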
'blastp' => {
  GETVER  => "blastp -version",
  REGEXP  => qr/blastp:\s+($BIDEC)/,
  MINVER  => "2.1",
  NEEDED  => 1,
},
'makeblastdb' => {
  GETVER  => "makeblastdb -version",
  REGEXP  => qr/makeblastdb:\s+($BIDEC)/,
  MINVER  => "2.1",
  NEEDED  => 0,  # only if --proteins used
},
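The symptom is what you would see if the versions were compared as decimal numbers, although I have not verified how prokka actually performs the comparison. The Python sketch below (deliberately not Perl, and not prokka's code) contrasts a decimal comparison with a component-by-component one. version_tuple and meets_minimum are hypothetical helpers.

```python
def version_tuple(version):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(found, minimum):
    """Compare versions component by component rather than as decimals."""
    return version_tuple(found) >= version_tuple(minimum)


# As decimal numbers, 2.12 is smaller than 2.8 ...
print(float("2.12") >= float("2.8"))
# ... but as versions, 2.12 is newer than 2.8.
print(meets_minimum("2.12", "2.8"))
```

Lowering MINVER to 2.1 works around the check for BLAST 2.12, at the cost of no longer rejecting releases that really are older than 2.8.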