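CentOS 7 SGE Installation and Deployment


Installing SGE on CentOS 7
Prerequisites:
1. Passwordless SSH login must be configured between the nodes.
2. User accounts must be synchronized across the nodes with NIS.
3. Home directories must be shared across the nodes.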
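The first prerequisite can be checked before starting. The following is only a sketch and not part of the original procedure: the function name check_passwordless_ssh and the SSH_CMD override are hypothetical, introduced here so the check can also be run against a stub.

```shell
# Hypothetical helper: report whether key-based SSH login to each host works.
# SSH_CMD is an assumption of this sketch; it defaults to the real ssh client.
SSH_CMD="${SSH_CMD:-ssh}"

check_passwordless_ssh() {
    local host failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

For the two nodes used in this guide it could be called as `check_passwordless_ssh master compute01`, once the host names resolve.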
Server
1. Set the hostname and add the nodes to /etc/hosts
localhost# hostnamectl set-hostname master
master# vi /etc/hosts
192.168.56.101 master
192.168.56.102 compute01
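When several nodes have to be prepared, editing /etc/hosts by hand on each one is error-prone. A minimal sketch of an idempotent alternative follows; the function name add_host_entry is hypothetical, and the addresses are the ones from this guide.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"
    # Look for NAME as a whole field after the address column on non-comment lines.
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

Running `add_host_entry /etc/hosts 192.168.56.101 master` and `add_host_entry /etc/hosts 192.168.56.102 compute01` as root on each node would then produce the two entries above without duplicating them on a second run.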
2. Create the sgeadmin user
master# groupadd -g 490 sgeadmin
master# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
master# visudo
%sgeadmin ALL=(ALL) NOPASSWD: ALL
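The same UID and GID (495 and 490) are used for sgeadmin on the server and on the client, which matters because the installation directory is shared between them. A small sketch for verifying this on a node follows; check_ids is a hypothetical name, and the function only parses a passwd-format line read from standard input.

```shell
# Hypothetical helper: verify that a passwd(5)-format line on stdin
# carries the expected UID and GID.
check_ids() {
    local want_uid="$1" want_gid="$2" name pw uid gid rest
    # Fields are name:password:uid:gid:...; an empty input means the user is unknown.
    IFS=: read -r name pw uid gid rest || return 1
    if [ "$uid" = "$want_uid" ] && [ "$gid" = "$want_gid" ]; then
        echo "$name: uid/gid match ($uid/$gid)"
    else
        echo "$name: expected $want_uid/$want_gid, found $uid/$gid"
        return 1
    fi
}
```

A typical call would be `getent passwd sgeadmin | check_ids 495 490`, run on every node.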

3. Install the dependencies

master# yum -y install jemalloc-devel openssl-devel ncurses-devel pam-devel libXmu-devel hwloc-devel hwloc hwloc-libs java-devel javacc ant-junit libdb-devel motif-devel csh ksh xterm db4-utils perl-XML-Simple perl-Env xorg-x11-fonts-ISO8859-1-100dpi xorg-x11-fonts-ISO8859-1-75dpi
 
4. Download the source, then build and install it
master# mkdir -p /BiO/src
master# cd /BiO/src
master# wget http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-8.1.9.tar.gz
master# tar -zxvf sge-8.1.9.tar.gz
master# cd sge-8.1.9/source/
master# sh scripts/bootstrap.sh && ./aimk && ./aimk -man
master# export SGE_ROOT=/BiO/gridengine && mkdir $SGE_ROOT
master# echo Y | ./scripts/distinst -local -allall -libs -noexit
master# chown -R sgeadmin:sgeadmin /BiO/gridengine
master# cd $SGE_ROOT
master# ./install_qmaster (pressing Enter to accept the defaults is generally sufficient; the prompts are as follows)
press enter at the intro screen
press "y" and then specify sgeadmin as the user id
leave the install dir as /BiO/gridengine
You will now be asked about port configuration for the master; normally you would choose the default (2), which uses the /etc/services file
accept the sge_qmaster info
You will then be asked about port configuration for the execution daemon; normally you would again choose the default (2), which uses the /etc/services file
accept the sge_execd info
leave the cell name as "default"
Enter an appropriate cluster name when requested
leave the spool dir as is
press "n" for no windows hosts!
press "y" (permissions are set correctly)
press "y" for all hosts in one domain
If you have Java available on your Qmaster and wish to use SGE Inspect or SDM then enable the JMX MBean server and provide the requested information - probably answer "n" at this point!
press enter to accept the directory creation notification
enter "classic" for classic spooling (berkeleydb may be more appropriate for large clusters)
press enter to accept the next notice
enter "20000-20100" as the GID range (increase this range if you have execution nodes capable of running more than 100 concurrent jobs)
accept the default spool dir or specify a different folder (for example, if you wish to use a shared or local folder outside of SGE_ROOT)
enter an email address that will be sent problem reports
press "n" to refuse to change the parameters you have just configured
press enter to accept the next notice
press "y" to install the startup scripts
press enter twice to confirm the following messages
press "n" for a file with a list of hosts
enter the names of your hosts who will be able to administer and submit jobs (enter alone to finish adding hosts)
skip shadow hosts for now (press "n")
choose "1" for normal configuration and agree with "y"
press enter to accept the next message and "n" to refuse to see the previous screen again and then finally enter to exit the installer
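The port prompts above rely on sge_qmaster and sge_execd being defined in /etc/services. The sketch below adds them only if they are missing, using the conventional registered ports 6444/tcp and 6445/tcp; on many systems the entries already exist, in which case nothing is changed. The function name ensure_sge_services is hypothetical.

```shell
# Hypothetical helper: make sure the Grid Engine service names are defined.
# Defaults to /etc/services; another file can be passed for testing.
ensure_sge_services() {
    local services_file="${1:-/etc/services}"
    grep -q '^sge_qmaster[[:space:]]' "$services_file" ||
        printf 'sge_qmaster\t6444/tcp\n' >> "$services_file"
    grep -q '^sge_execd[[:space:]]' "$services_file" ||
        printf 'sge_execd\t6445/tcp\n' >> "$services_file"
}
```

It would be run as root before starting the installer, for example `ensure_sge_services`, and checked afterwards with `grep sge_ /etc/services`.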
master# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
5. Add the compute node to the SGE cluster
master# qconf -ah compute01
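With more than one compute node, the same registration can be repeated in a loop. This is a sketch under assumptions: register_hosts and the DRY_RUN switch are hypothetical, and the function also adds each node as a submit host with `qconf -as`, which this guide lists later but does not require at this step.

```shell
# Hypothetical helper: register hosts as administrative and submit hosts.
# With DRY_RUN=1 (an assumption of this sketch) the commands are only printed.
register_hosts() {
    local host
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "qconf -ah $host"
            echo "qconf -as $host"
        else
            qconf -ah "$host" && qconf -as "$host"
        fi
    done
}
```

For example, `DRY_RUN=1 register_hosts compute01 compute02` prints the four commands it would run; dropping DRY_RUN executes them on the master.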
6. Export the installation path over NFS
master# yum -y install nfs-utils
master# vi /etc/exports
/BiO 192.168.56.0/24(rw,no_root_squash)
master# systemctl start rpcbind nfs-server
master# systemctl enable rpcbind nfs-server

Client

1. Set the hostname and add the nodes to /etc/hosts
compute01# hostnamectl set-hostname compute01
compute01# vi /etc/hosts
192.168.56.101 master
192.168.56.102 compute01
2. Create the sgeadmin user
compute01# groupadd -g 490 sgeadmin
compute01# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
3. Install the dependencies
compute01# yum -y install hwloc-devel
4. Mount the server's installation directory
compute01# yum -y install nfs-utils
compute01# systemctl start rpcbind
compute01# systemctl enable rpcbind
compute01# mkdir /BiO
compute01# mount -t nfs 192.168.56.101:/BiO /BiO
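The mount command above does not survive a reboot. One common way to make it persistent is an /etc/fstab entry; the sketch below adds one if the mount point is not already listed. The function name ensure_nfs_fstab and the mount options chosen (defaults,_netdev) are assumptions, not part of the original procedure.

```shell
# Hypothetical helper: add an NFS mount to an fstab file unless the
# mount point is already listed there.
ensure_nfs_fstab() {
    local fstab="$1" src="$2" mountpoint="$3"
    # The second field of an fstab line is the mount point.
    if awk -v m="$mountpoint" '
            !/^[[:space:]]*#/ && $2 == m { found = 1 }
            END { exit !found }' "$fstab" 2>/dev/null; then
        return 0
    fi
    # _netdev marks the filesystem as requiring the network before it is mounted.
    printf '%s %s nfs defaults,_netdev 0 0\n' "$src" "$mountpoint" >> "$fstab"
}
```

On the compute node this could be invoked as `ensure_nfs_fstab /etc/fstab 192.168.56.101:/BiO /BiO`, followed by `mount -a` to confirm that the entry works.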
5. Install the execution daemon
compute01# export SGE_ROOT=/BiO/gridengine
compute01# export SGE_CELL=default
compute01# cd $SGE_ROOT
compute01# ./install_execd
compute01# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
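Files in /etc/profile.d are read at login, so the copied settings.sh takes effect in new login shells. A quick sanity check of the resulting environment might look like the sketch below; check_sge_env is a hypothetical name, and it only verifies that SGE_ROOT is set and that the cell's settings.sh exists.

```shell
# Hypothetical helper: confirm that the Grid Engine environment points
# at a cell directory containing settings.sh.
check_sge_env() {
    local root="${SGE_ROOT:-}" cell="${SGE_CELL:-default}"
    if [ -z "$root" ]; then
        echo "SGE_ROOT is not set"
        return 1
    fi
    if [ ! -f "$root/$cell/common/settings.sh" ]; then
        echo "missing $root/$cell/common/settings.sh"
        return 1
    fi
    echo "SGE_ROOT=$root SGE_CELL=$cell"
}
```

Running `check_sge_env` in a fresh login shell on either node should report /BiO/gridengine and the default cell if the steps above were followed.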

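6. Common commands

# qstat -u "*"        show jobs from all users
# qhost               show resource information for all hosts
# qhost -q            show whether nodes are offline; a queue state of "au" indicates a problem
# qdel job_id         kill a job
# qdel -u "user"      kill all jobs belonging to a user
# qconf -ah client1   add a host to the SGE cluster as an administrative host
# qconf -as server    add a host as a submit host
# qconf -sh           list the administrative hosts in the cluster
# qconf -sql          list all queues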
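Spotting problem nodes in `qhost -q` output by eye gets tedious on larger clusters. The sketch below filters that output for queue instances whose state contains a, u, or E. It assumes the usual layout in which host lines start in the first column and indented queue lines carry the state as an optional fourth field; the layout may differ between versions, and list_problem_queues is a hypothetical name.

```shell
# Hypothetical helper: read `qhost -q` output on stdin and print
# "queue@host state" for queue instances in an a, u, or E state.
list_problem_queues() {
    awk '
        # Lines starting in column 1 (including the header) name the current host.
        /^[^[:space:]]/ { host = $1; next }
        # Indented queue lines: name, type, slots, and an optional state field.
        NF >= 4 && $4 ~ /[auE]/ { print $1 "@" host, $4 }
    '
}
```

It would be used as `qhost -q | list_problem_queues`; no output means no queue instance matched.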
 
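Adding a queue
# qconf -ahgrp @allcu     add a host group with the following content:
group_name @allcu
hostlist hcu-0001 hcu-0002 hcu-0003 hcu-0004 hcu-0005
# qconf -shgrp @allcu     show the host group
# qconf -mhgrp @allcu     modify the host group
# qconf -aq cu            add the cu queue and edit the fields that need changing (highlighted in red in the original post)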
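The host group can also be created without the interactive editor by writing the definition to a file and loading it with `qconf -Ahgrp`. The following sketch generates such a file in the two-line format shown above; make_hostgroup_file is a hypothetical name and the output path in the usage note is arbitrary.

```shell
# Hypothetical helper: print a host group definition (group_name and
# hostlist lines) for the given group and hosts.
make_hostgroup_file() {
    local group="$1"
    shift
    printf 'group_name %s\n' "$group"
    # "$*" joins the remaining arguments with single spaces.
    printf 'hostlist %s\n' "$*"
}
```

For example, `make_hostgroup_file @allcu hcu-0001 hcu-0002 > /tmp/allcu.hgrp` followed by `qconf -Ahgrp /tmp/allcu.hgrp` on the master would add the group from the file.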
 
 

