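qsub is the most stable, lowest-level way to submit jobs: it takes a script and sends it to a compute node of the cluster to run.
Note that on a typical setup only the login nodes are allowed to submit jobs. Compute nodes have no permission to submit; they can only execute. So never nest a qsub call inside a script that is itself submitted: it will fail with an error.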
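If one step has to wait for another, a common alternative to nesting is to submit both jobs from the login node and chain them with -hold_jid, which accepts a job name or job id. The sketch below is only an illustration: step1.sh and step2.sh are hypothetical job scripts, and the submit helper is my own dry-run wrapper (not part of Grid Engine) that prints the command unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run wrapper (hypothetical helper): prints the qsub command
# unless RUN=1 is set, in which case it really calls qsub.
submit() {
    if [ "${RUN:-0}" = "1" ]; then
        qsub "$@"
    else
        echo "qsub $*"
    fi
}

# Both submissions happen here, on the login node.
# step1.sh and step2.sh are placeholder script names.
submit -cwd -N step1 step1.sh

# -hold_jid keeps step2 waiting until the job named step1 has finished,
# so step1.sh never needs to call qsub itself.
submit -cwd -N step2 -hold_jid step1 step2.sh
```

Run it once as is to check the printed commands, then with RUN=1 on a login node to actually submit.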
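Here is the submit command I use most often (project_name, queue_name and script.sh are placeholders):
qsub -cwd -l vf=5g -P project_name -q queue_name script.sh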
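Let me go through the options one by one:
-cwd: short for current working directory. The job runs from the directory you submitted it from, so the log files are written there too. Without -cwd they go to your home directory by default. If you want a different directory, use the -wd option and the logs will be written to the directory you specify.
-l resource=value: declares the resources the job needs. Here it is followed by the estimated memory, vf=5g (vf is the shortcut for virtual_free). You usually don't need to specify the number of CPUs. Note that in practice this isn't worth much: few clusters strictly enforce per-user memory limits, so vf mostly affects how quickly your job gets scheduled. Some people exploit this and request as little memory as they can to get to the front of the queue sooner. This part is really just an honor system.
-P: large organizations are split into teams and projects, and each project has to specify its project name. That is mainly so compute usage can be tallied up later for billing. For the job itself this option does nothing useful.
-q: specifies the queue name, and this one really matters. A queue is a group of machines: each queue contains only certain compute nodes, and whichever queue you submit to, you can only use the compute resources assigned to that queue.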
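The same options can also live inside the job script as "#$" lines ("#$" is the default directive prefix, which the -C option can change), so the command line stays short. Here is a minimal sketch, assuming placeholder names my_project and my_queue and a file called job.sh; none of these come from a real cluster.

```shell
#!/bin/bash
# Write a job script whose "#$" lines carry the options discussed above.
# my_project and my_queue are placeholders; use your own names.
cat > job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -l vf=5g
#$ -P my_project
#$ -q my_queue
echo "running on $(uname -n) in $(pwd)"
EOF

# To bash the "#$" lines are plain comments, so the script can be
# sanity-checked locally before it is submitted.
bash job.sh

# On a login node the real submission would then be:
#   qsub job.sh
```

Options given on the qsub command line are still accepted alongside the embedded ones, which is handy for one-off changes such as a different queue.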
To be continued~
qsub -help
OGS/GE 2011.11p1
usage: qsub [options]
   [-a date_time]                       request a start time
   [-ac context_list]                   add context variable(s)
   [-ar ar_id]                          bind job to advance reservation
   [-A account_string]                  account string in accounting record
   [-b y[es]|n[o]]                      handle command as binary
   [-binding [env|pe|set] exp|lin|str]  binds job to processor cores
   [-c ckpt_selector]                   define type of checkpointing for job
   [-ckpt ckpt-name]                    request checkpoint method
   [-clear]                             skip previous definitions for job
   [-cwd]                               use current working directory
   [-C directive_prefix]                define command prefix for job script
   [-dc simple_context_list]            delete context variable(s)
   [-dl date_time]                      request a deadline initiation time
   [-e path_list]                       specify standard error stream path(s)
   [-h]                                 place user hold on job
   [-hard]                              consider following requests "hard"
   [-help]                              print this help
   [-hold_jid job_identifier_list]      define jobnet interdependencies
   [-hold_jid_ad job_identifier_list]   define jobnet array interdependencies
   [-i file_list]                       specify standard input stream file(s)
   [-j y[es]|n[o]]                      merge stdout and stderr stream of job
   [-js job_share]                      share tree or functional job share
   [-jsv jsv_url]                       job submission verification script to be used
   [-l resource_list]                   request the given resources
   [-m mail_options]                    define mail notification events
   [-masterq wc_queue_list]             bind master task to queue(s)
   [-notify]                            notify job before killing/suspending it
   [-now y[es]|n[o]]                    start job immediately or not at all
   [-M mail_list]                       notify these e-mail addresses
   [-N name]                            specify job name
   [-o path_list]                       specify standard output stream path(s)
   [-P project_name]                    set job's project
   [-p priority]                        define job's relative priority
   [-pe pe-name slot_range]             request slot range for parallel jobs
   [-q wc_queue_list]                   bind job to queue(s)
   [-R y[es]|n[o]]                      reservation desired
   [-r y[es]|n[o]]                      define job as (not) restartable
   [-sc context_list]                   set job context (replaces old context)
   [-shell y[es]|n[o]]                  start command with or without wrapping <loginshell> -c
   [-soft]                              consider following requests as soft
   [-sync y[es]|n[o]]                   wait for job to end and return exit code
   [-S path_list]                       command interpreter to be used
   [-t task_id_range]                   create a job-array with these tasks
   [-tc max_running_tasks]              throttle the number of concurrent tasks (experimental)
   [-terse]                             tersed output, print only the job-id
   [-v variable_list]                   export these environment variables
   [-verify]                            do not submit just verify
   [-V]                                 export all environment variables
   [-w e|w|n|v|p]                       verify mode (error|warning|none|just verify|poke) for jobs
   [-wd working_directory]              use working_directory
   [-@ file]                            read commandline input from file
   [{command|-} [command_args]]

account_string       account_name
complex_list         complex[,complex,...]
context_list         variable[=value][,variable[=value],...]
ckpt_selector        `n' `s' `m' `x' <interval>
date_time            [[CC]YY]MMDDhhmm[.SS]
job_identifier_list  {job_id|job_name|reg_exp}[,{job_id|job_name|reg_exp},...]
jsv_url              [script:][username@]path
mail_address         username[@host]
mail_list            mail_address[,mail_address,...]
mail_options         `e' `b' `a' `n' `s'
working_directory    path
path_list            [host:]path[,[host:]path,...]
file_list            [host:]file[,[host:]file,...]
priority             -1023 - 1024
resource_list        resource[=value][,resource[=value],...]
simple_context_list  variable[,variable,...]
slot_range           [n[-m]|[-]m] - n,m > 0
task_id_range        task_id['-'task_id[':'step]]
variable_list        variable[=value][,variable[=value],...]
wc_cqueue            wildcard expression matching a cluster queue
wc_host              wildcard expression matching a host
wc_hostgroup         wildcard expression matching a hostgroup
wc_qinstance         wc_cqueue@wc_host
wc_qdomain           wc_cqueue@wc_hostgroup
wc_queue             wc_cqueue|wc_qdomain|wc_qinstance
wc_queue_list        wc_queue[,wc_queue,...]
ar_id                advance reservation id
max_running_tasks    maximum number of simultaneously running tasks
exp                  explicit:<socket>,<core>[:...]
lin                  linear:<amount>[:<socket>,<core>]
str                  striding:<amount>:<stepsize>[:<socket>,<core>]