SGE: job submission and monitoring with qsub/qstat/qdel/qhost


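qsub is Grid Engine's basic, low-level job submission command, and in my experience the most reliable one: it sends a script to the cluster's compute nodes to run.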
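As a minimal sketch of that workflow, you write an ordinary shell script and hand it to qsub from a login node. The script name hello_job.sh is made up for illustration, and the block only calls qsub if it is actually on the PATH, so it can be tried anywhere:

```shell
#!/bin/bash
# Write a minimal job script (hello_job.sh is a hypothetical name).
cat > hello_job.sh <<'EOF'
#!/bin/bash
# This runs on whichever compute node the scheduler assigns.
echo "Hello from ${HOSTNAME:-unknown host}"
EOF
chmod +x hello_job.sh

# Submit it from a login node; if qsub is not installed, only show the command.
if command -v qsub >/dev/null 2>&1; then
  qsub -cwd hello_job.sh
else
  echo "qsub not found; would run: qsub -cwd hello_job.sh"
fi
```

On a real cluster qsub answers with a line like: Your job 12345 ("hello_job.sh") has been submitted. With -cwd the output then appears as hello_job.sh.o12345 in the current directory.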

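Note that only the login nodes are allowed to submit jobs. Compute nodes have no permission to submit; they can only execute. So never call qsub from inside a script you have submitted: the nested submission will fail with an error.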
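One way to avoid accidental nested submission is a small guard in your own wrapper scripts. This sketch assumes that Grid Engine exports JOB_ID into the environment of every running job (standard SGE setups do, but check your site); inside_sge_job and safe_qsub are hypothetical helper names, not part of SGE:

```shell
#!/bin/bash
# Returns success when this shell seems to be running inside an SGE job.
# Assumption: Grid Engine exports JOB_ID into every job's environment.
inside_sge_job() {
  [ -n "${JOB_ID:-}" ]
}

# Hypothetical wrapper: refuse to call qsub from inside a running job,
# where the compute node would not be allowed to submit anyway.
safe_qsub() {
  if inside_sge_job; then
    echo "error: already inside job ${JOB_ID}; submit from a login node" >&2
    return 1
  fi
  qsub "$@"
}
```

Calling safe_qsub instead of qsub in your pipeline scripts turns a confusing scheduler error into an explicit message.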

Here is the submission command I use most often:

qsub -cwd -l vf=5g -P <project_name> -q <queue_name> <script.sh>

Let me explain the options one by one:

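-cwd: short for "current working directory". The job runs from the directory you submitted it from, which also means its log files are written there; without -cwd, output goes to your home directory by default. If you want a different directory, use the -wd option instead: the job runs in, and writes its logs to, the directory you specify.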
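The sketch below contrasts -cwd, -wd, and the -o/-e output options, which are listed in the qsub -help output later in this post. job.sh and sge_logs/ are hypothetical names, and the submit helper is an illustration that just prints the command when qsub is unavailable:

```shell
#!/bin/bash
# job.sh and sge_logs/ are hypothetical names used for illustration.
printf '#!/bin/bash\necho "running in $(pwd)"\n' > job.sh
mkdir -p sge_logs

# Submit with qsub when it exists; otherwise just print the command.
submit() {
  if command -v qsub >/dev/null 2>&1; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# 1) -cwd: run here; logs land here as job.sh.o<ID> and job.sh.e<ID>.
submit qsub -cwd job.sh
# 2) -wd DIR: run in DIR; the default log files land in DIR as well.
submit qsub -wd "$PWD/sge_logs" job.sh
# 3) -o/-e: keep -cwd but send stdout/stderr to a chosen directory.
submit qsub -cwd -o "$PWD/sge_logs" -e "$PWD/sge_logs" job.sh
```

If you prefer a single log file, -j y merges stderr into stdout.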

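-l resource=value: the resources the job needs in order to run. Here we give an estimated memory requirement, vf=5g (vf is short for virtual_free); the number of CPUs usually doesn't need to be specified. Honestly, though, this option doesn't do much: in my experience few clusters strictly enforce users' memory usage, so vf mainly affects how quickly your job gets scheduled. Some people exploit this by requesting as little memory as possible so their jobs start sooner. In practice this part runs on the honor system.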

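-P: large organizations are divided into teams and projects, and each project has to specify its own project name, mainly so that resource consumption can later be tallied and billed. Beyond that, this option doesn't really do anything.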

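-q: specifies the queue name, and this one matters a lot. A queue here is a queue of compute hosts: each queue contains only a specific set of compute nodes, so whichever queue you submit to, you can only use the computing resources of the nodes in that queue.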
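Instead of retyping these options for every submission, they can be embedded in the job script as lines starting with #$, Grid Engine's default directive prefix (see -C in the help output). Here is a sketch of the command above in that form; my_project and my_queue.q are placeholders for your site's real project and queue names. As far as I know, options given on the qsub command line take precedence over embedded ones, so you can still override them per submission:

```shell
#!/bin/bash
# my_project and my_queue.q are placeholders; use your site's real names.
cat > my_job.sh <<'EOF'
#!/bin/bash
# Lines starting with "#$" are read by qsub; to bash they are plain comments.
#$ -cwd
#$ -l vf=5g
#$ -P my_project
#$ -q my_queue.q
#$ -N my_job
#$ -S /bin/bash
echo "job ${JOB_ID:-?} running in $(pwd)"
EOF

if command -v qsub >/dev/null 2>&1; then
  qsub my_job.sh   # no options needed on the command line now
else
  echo "qsub not found; embedded directives:"
  grep '^#\$' my_job.sh
fi
```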

To be continued. For reference, here is the full qsub -help output:

qsub -help
OGS/GE 2011.11p1
usage: qsub [options]
   [-a date_time]                           request a start time
   [-ac context_list]                       add context variable(s)
   [-ar ar_id]                              bind job to advance reservation
   [-A account_string]                      account string in accounting record
   [-b y[es]|n[o]]                          handle command as binary
   [-binding [env|pe|set] exp|lin|str]      binds job to processor cores
   [-c ckpt_selector]                       define type of checkpointing for job
   [-ckpt ckpt-name]                        request checkpoint method
   [-clear]                                 skip previous definitions for job
   [-cwd]                                   use current working directory
   [-C directive_prefix]                    define command prefix for job script
   [-dc simple_context_list]                delete context variable(s)
   [-dl date_time]                          request a deadline initiation time
   [-e path_list]                           specify standard error stream path(s)
   [-h]                                     place user hold on job
   [-hard]                                  consider following requests "hard"
   [-help]                                  print this help
   [-hold_jid job_identifier_list]          define jobnet interdependencies
   [-hold_jid_ad job_identifier_list]       define jobnet array interdependencies
   [-i file_list]                           specify standard input stream file(s)
   [-j y[es]|n[o]]                          merge stdout and stderr stream of job
   [-js job_share]                          share tree or functional job share
   [-jsv jsv_url]                           job submission verification script to be used
   [-l resource_list]                       request the given resources
   [-m mail_options]                        define mail notification events
   [-masterq wc_queue_list]                 bind master task to queue(s)
   [-notify]                                notify job before killing/suspending it
   [-now y[es]|n[o]]                        start job immediately or not at all
   [-M mail_list]                           notify these e-mail addresses
   [-N name]                                specify job name
   [-o path_list]                           specify standard output stream path(s)
   [-P project_name]                        set job's project
   [-p priority]                            define job's relative priority
   [-pe pe-name slot_range]                 request slot range for parallel jobs
   [-q wc_queue_list]                       bind job to queue(s)
   [-R y[es]|n[o]]                          reservation desired
   [-r y[es]|n[o]]                          define job as (not) restartable
   [-sc context_list]                       set job context (replaces old context)
   [-shell y[es]|n[o]]                      start command with or without wrapping <loginshell> -c
   [-soft]                                  consider following requests as soft
   [-sync y[es]|n[o]]                       wait for job to end and return exit code
   [-S path_list]                           command interpreter to be used
   [-t task_id_range]                       create a job-array with these tasks
   [-tc max_running_tasks]                  throttle the number of concurrent tasks (experimental)
   [-terse]                                 tersed output, print only the job-id
   [-v variable_list]                       export these environment variables
   [-verify]                                do not submit just verify
   [-V]                                     export all environment variables
   [-w e|w|n|v|p]                           verify mode (error|warning|none|just verify|poke) for jobs
   [-wd working_directory]                  use working_directory
   [-@ file]                                read commandline input from file
   [{command|-} [command_args]]

account_string          account_name
complex_list            complex[,complex,...]
context_list            variable[=value][,variable[=value],...]
ckpt_selector           `n' `s' `m' `x' <interval> 
date_time               [[CC]YY]MMDDhhmm[.SS]
job_identifier_list     {job_id|job_name|reg_exp}[,{job_id|job_name|reg_exp},...]
jsv_url                 [script:][username@]path
mail_address            username[@host]
mail_list               mail_address[,mail_address,...]
mail_options            `e' `b' `a' `n' `s'
working_directory       path
path_list               [host:]path[,[host:]path,...]
file_list               [host:]file[,[host:]file,...]
priority                -1023 - 1024
resource_list           resource[=value][,resource[=value],...]
simple_context_list     variable[,variable,...]
slot_range              [n[-m]|[-]m] - n,m > 0
task_id_range           task_id['-'task_id[':'step]]
variable_list           variable[=value][,variable[=value],...]
wc_cqueue               wildcard expression matching a cluster queue
wc_host                 wildcard expression matching a host
wc_hostgroup            wildcard expression matching a hostgroup
wc_qinstance            wc_cqueue@wc_host
wc_qdomain              wc_cqueue@wc_hostgroup
wc_queue                wc_cqueue|wc_qdomain|wc_qinstance
wc_queue_list           wc_queue[,wc_queue,...]
ar_id                   advance reservation id
max_running_tasks       maximum number of simultaneously running tasks
exp                     explicit:<socket>,<core>[:...]
lin                     linear:<amount>[:<socket>,<core>]
str                     striding:<amount>:<stepsize>[:<socket>,<core>]
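Two options from this list that deserve a quick example are -t (job arrays) and -hold_jid (job dependencies). The sketch below assumes that SGE sets SGE_TASK_ID for each array task and that qsub -terse prints an array job's id in a form like 123.1-10:1; the script and input file names are hypothetical:

```shell
#!/bin/bash
# Array job: 10 tasks, at most 4 running at once (-tc is marked experimental).
cat > array_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -t 1-10
#$ -tc 4
# SGE sets SGE_TASK_ID to this task's index (1..10 here).
input="inputs/sample_${SGE_TASK_ID}.txt"
echo "task ${SGE_TASK_ID} processing ${input}"
EOF

cat > merge_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
echo "all array tasks finished; merging results"
EOF

if command -v qsub >/dev/null 2>&1; then
  # -terse prints only the job id; strip the ".1-10:1" task range suffix.
  array_id=$(qsub -terse array_job.sh | cut -d. -f1)
  # -hold_jid keeps merge_job.sh waiting until the whole array has finished.
  qsub -hold_jid "$array_id" merge_job.sh
else
  echo "qsub not found; skipping submission"
fi
```

Outside the scheduler you can test one task by hand, for example SGE_TASK_ID=3 bash array_job.sh.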

