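Installing SGE on CentOS 7
Notes:
1. Passwordless SSH login must be set up between the nodes
2. User accounts must be synchronized via NIS
3. Home directories are shared
Server (master node)
1. Set the hostname and add the hosts to /etc/hosts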
localhost# hostnamectl set-hostname master
master# vi /etc/hosts
192.168.56.101 master
192.168.56.102 compute01
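Editing /etc/hosts by hand on every node is easy to get wrong when the step is repeated. The following is a minimal sketch of an idempotent alternative; the function name add_host_entry and its optional third argument (the file to edit) are illustrative and not part of SGE or CentOS.

```shell
#!/bin/sh
# Append "IP NAME" to a hosts file unless NAME is already listed there.
# Hypothetical helper: the third argument defaults to /etc/hosts and can be
# pointed at a scratch file to try the function without touching the system.
add_host_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    # Look for NAME as a whole field after the address column, skipping comments.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example (run as root on each node):
#   add_host_entry 192.168.56.101 master
#   add_host_entry 192.168.56.102 compute01
```

Running the same call twice leaves a single entry, so the function can be used in a provisioning script that is executed more than once.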
2. Create the sgeadmin user
master# groupadd -g 490 sgeadmin
master# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
master# visudo
%sgeadmin ALL=(ALL) NOPASSWD: ALL
3. Install the dependencies
master# yum -y install jemalloc-devel openssl-devel ncurses-devel pam-devel libXmu-devel hwloc-devel hwloc hwloc-libs java-devel javacc ant-junit libdb-devel motif-devel csh ksh xterm db4-utils perl-XML-Simple perl-Env xorg-x11-fonts-ISO8859-1-100dpi xorg-x11-fonts-ISO8859-1-75dpi
4. Download the source, build, and install
master# mkdir -p /BiO/src
master# cd /BiO/src
master# wget http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-8.1.9.tar.gz
master# tar -zxvf sge-8.1.9.tar.gz
master# cd sge-8.1.9/source/
master# sh scripts/bootstrap.sh && ./aimk && ./aimk -man
master# export SGE_ROOT=/BiO/gridengine && mkdir $SGE_ROOT
master# echo Y | ./scripts/distinst -local -allall -libs -noexit
master# chown -R sgeadmin:sgeadmin /BiO/gridengine
master# cd $SGE_ROOT
master# ./install_qmaster
(Pressing Enter at every prompt is generally enough; the prompts are described below.)
Installer prompts, in order:
press enter at the intro screen
press "y" and then specify sgeadmin as the user id
leave the install dir as /BiO/gridengine
port configuration for the master: normally choose the default (2), which uses the /etc/services file, then accept the sge_qmaster info
port configuration for the execution daemon: normally choose the default (2), which uses the /etc/services file, then accept the sge_execd info
leave the cell name as "default"
enter an appropriate cluster name when requested
leave the spool dir as is
press "n" for no Windows hosts
press "y" (permissions are set correctly)
press "y" for all hosts in one domain
if you have Java available on your qmaster and wish to use SGE Inspect or SDM, enable the JMX MBean server and provide the requested information; otherwise answer "n" at this point
press enter to accept the directory creation notification
enter "classic" for classic spooling (berkeleydb may be more appropriate for large clusters)
press enter to accept the next notice
enter "20000-20100" as the GID range (increase this range if you have execution nodes capable of running more than 100 concurrent jobs)
accept the default spool dir or specify a different folder (for example, if you wish to use a shared or local folder outside of SGE_ROOT)
enter an email address to which problem reports will be sent
press "n" to refuse to change the parameters you have just configured
press enter to accept the next notice
press "y" to install the startup scripts
press enter twice to confirm the following messages
press "n" for a file with a list of hosts
enter the names of the hosts that will be able to administer and submit jobs (press enter alone to finish adding hosts)
skip shadow hosts for now (press "n")
choose "1" for normal configuration and agree with "y"
press enter to accept the next message, "n" to refuse to see the previous screen again, and finally enter to exit the installer
master# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
5. Add the compute node to the SGE cluster
master# qconf -ah compute01
6. Share the installation path over NFS
master# yum -y install nfs-utils
master# vi /etc/exports
/BiO 192.168.56.0/24(rw,no_root_squash)
master# systemctl start rpcbind nfs-server
master# systemctl enable rpcbind nfs-server
Client (compute node)
1. Set the hostname and add the hosts to /etc/hosts
compute01# hostnamectl set-hostname compute01
compute01# vi /etc/hosts
192.168.56.101 master
192.168.56.102 compute01
2. Create the sgeadmin user
compute01# groupadd -g 490 sgeadmin
compute01# useradd -u 495 -g 490 -r -m -d /home/sgeadmin -s /bin/bash -c "SGE Admin" sgeadmin
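Because the installation directory is shared over NFS, file ownership only lines up if sgeadmin has the same UID and GID on every node. The sketch below checks a passwd-format line against the expected values; the function name ids_match is hypothetical, and 495/490 are simply the IDs used in the commands above.

```shell
#!/bin/sh
# Return 0 if a passwd-format line carries the expected UID and GID.
# Hypothetical helper: $1 is a line such as the output of
# "getent passwd sgeadmin", $2 the expected UID, $3 the expected GID.
ids_match() {
    uid=$(printf '%s\n' "$1" | cut -d: -f3)
    gid=$(printf '%s\n' "$1" | cut -d: -f4)
    [ "$uid" = "$2" ] && [ "$gid" = "$3" ]
}

# Example (run on each node):
#   ids_match "$(getent passwd sgeadmin)" 495 490 || echo "sgeadmin IDs differ" >&2
```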
3. Install the dependencies
compute01# yum -y install hwloc-devel
4. Mount the server's installation directory
compute01# yum -y install nfs-utils
compute01# systemctl start rpcbind
compute01# systemctl enable rpcbind
compute01# mkdir /BiO
compute01# mount -t nfs 192.168.56.101:/BiO /BiO
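A mount made with the mount command does not survive a reboot. One common way to make it persistent is an /etc/fstab entry; the sketch below only prints such an entry so it can be reviewed before being added. The function name nfs_fstab_line and the choice of the defaults,_netdev options are assumptions, not something the original steps prescribe.

```shell
#!/bin/sh
# Print an /etc/fstab line for an NFS mount.
# Hypothetical helper: $1 = NFS server, $2 = exported path, $3 = local mount point.
# "_netdev" marks the filesystem as needing the network before it is mounted.
nfs_fstab_line() {
    server="$1"
    export_path="$2"
    mount_point="$3"
    printf '%s:%s %s nfs defaults,_netdev 0 0\n' "$server" "$export_path" "$mount_point"
}

# Example (review the output, then append it as root):
#   nfs_fstab_line 192.168.56.101 /BiO /BiO >> /etc/fstab
```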
5. Install the execution daemon
compute01# export SGE_ROOT=/BiO/gridengine
compute01# export SGE_CELL=default
compute01# cd $SGE_ROOT
compute01# ./install_execd
compute01# cp /BiO/gridengine/default/common/settings.sh /etc/profile.d/
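install_execd depends on the shared directory being mounted and on the cell already created by the master installation. The following is a small pre-flight sketch that could be run before the installer; the function name sge_execd_preflight is hypothetical, and it assumes the standard cell layout in which the master's name is recorded in $SGE_ROOT/$SGE_CELL/common/act_qmaster.

```shell
#!/bin/sh
# Check that the shared SGE tree looks usable before running install_execd.
# Hypothetical helper: $1 = SGE_ROOT, $2 = cell name (defaults to "default").
sge_execd_preflight() {
    root="$1"
    cell="${2:-default}"
    if [ ! -x "$root/install_execd" ]; then
        echo "missing installer: $root/install_execd" >&2
        return 1
    fi
    if [ ! -r "$root/$cell/common/act_qmaster" ]; then
        echo "missing cell file: $root/$cell/common/act_qmaster" >&2
        return 1
    fi
    return 0
}

# Example:
#   sge_execd_preflight /BiO/gridengine default && cd /BiO/gridengine && ./install_execd
```

A failure here usually points at the NFS mount or at an unfinished master installation rather than at the compute node itself.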
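6. Common commands
# qstat -u "*"        show the jobs of all users
# qhost               show resource information for all hosts
# qhost -q            show the queues on each node and whether a node is offline (state "au" indicates a problem)
# qdel <job_id>       kill a job
# qdel -u "user"      kill all jobs belonging to a user
# qconf -ah client1   add a host to the SGE cluster (as an administrative host)
# qconf -as server    add a host as a submit host
# qconf -sh           list the administrative hosts in the cluster
# qconf -sql          list all queues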
Adding a queue
# qconf -ahgrp @allcu   add a host group
group_name @allcu
hostlist hcu-0001 hcu-0002 hcu-0003 hcu-0004 hcu-0005
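qconf -ahgrp opens an editor, which is awkward in scripts. qconf also accepts a host group definition from a file via -Ahgrp, so the two-line definition above can be generated instead of typed. This is a minimal sketch; the function name make_hostgroup and the temporary file path in the example are illustrative.

```shell
#!/bin/sh
# Print a host group definition in the two-field form shown above.
# Hypothetical helper: $1 = group name (including the leading "@"),
# remaining arguments = member hosts.
make_hostgroup() {
    group="$1"
    shift
    printf 'group_name %s\n' "$group"
    # "$*" joins the remaining arguments with single spaces.
    printf 'hostlist %s\n' "$*"
}

# Example (on the master, as an SGE manager):
#   make_hostgroup @allcu hcu-0001 hcu-0002 hcu-0003 > /tmp/allcu.hgrp
#   qconf -Ahgrp /tmp/allcu.hgrp
```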

# qconf -shgrp @allcu   show the host group
# qconf -mhgrp @allcu   modify the host group
# qconf -aq cu          add the cu queue (the fields that need changing were marked in red in the original)
