最近開始接觸Greenplum,線上也在使用了,感覺還不錯,本次介紹一下集群的部署方法。那么Greenplum的架構如下:
(架構圖來源網絡)
簡單來說GPDB是一個分布式數據庫軟件,其可以管理和處理分布在多個不同主機上的海量數據。對於GPDB來說,一個DB實例實際上是由多個獨立的PostgreSQL實例組成的,它們分布在不同的物理主機上,協同工作,呈現給用戶的是一個DB的效果。Master是GPDB系統的訪問入口,其負責處理客戶端的連接及SQL 命令、協調系統中的其他Instance(Segment)工作,Segment負責管理和處理用戶數據。
環境准備:
操作系統:centos 6.3 64位
master 1台(架構圖中的主節點),Standby 1台(架構圖中的從節點),Segment 2台。共4台服務器。
ip 主機名 10.69.30.251 mdw 10.36.25.152 smdw 10.36.25.250 sdw1 221.5.32.152 sdw2
1. 修改/etc/hosts文件,添加下面內容(4台服務器相同的配置)
10.69.30.251 mdw 10.36.25.152 smdw 10.36.25.250 sdw1 10.37.237.152 sdw2
2. 服務器關閉selinux,防火牆4台服務器相互開放,測試環境可以直接先關閉防火牆。(4台服務器)
3. 操作系統參數設置
vim /etc/sysctl.conf (注:4台服務器)
kernel.shmmax = 500000000 kernel.shmmni = 4096 kernel.shmall = 4000000000 kernel.sem = 250 512000 100 2048 kernel.sysrq = 1 kernel.core_uses_pid = 1 kernel.msgmnb = 65536 kernel.msgmax = 65536 net.ipv4.tcp_syncookies = 1 net.ipv4.ip_forward = 0 net.ipv4.conf.default.accept_source_route = 0 net.ipv4.tcp_tw_recycle = 1 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.conf.all.arp_filter = 1 net.ipv4.conf.default.arp_filter = 1 net.core.netdev_max_backlog = 10000 vm.overcommit_memory = 2 kernel.msgmni = 2048 net.ipv4.ip_local_port_range = 1025 65535
vim /etc/security/limits.conf (注:4台服務器)
* soft nofile 65536 * hard nofile 65536 * soft nproc 131072 * hard nproc 131072
我的操作系統是centos 6.3,所以還有一個文件要修改:那就是/etc/security/limits.d/90-nproc.conf (注:4台服務器都要修改)
修改后的內容如下:
* soft nproc 131072 * hard nproc 131072 * soft nofile 1048576 * hard nofile 1048576
磁盤預讀參數及 deadline算法修改 (注:4台服務器)
blockdev --setra 65536 /dev/sda echo deadline > /sys/block/sda/queue/scheduler
軟件下載地址:https://network.pivotal.io/products/pivotal-gpdb
在Master主機上安裝GP二進制文件,也就是主機名是mdw的服務器。(注:在master上安裝即可,后面通過批量的方法安裝剩下的服務器)
我下載的文件是greenplum-db-4.3.7.3-build-2-RHEL5-x86_64.zip
文件解壓之是一個可執行文件,greenplum-db-4.3.7.3-build-2-RHEL5-x86_64.bin,直接執行安裝。
./greenplum-db-4.3.7.3-build-2-RHEL5-x86_64.bin
然后有些說明,一直回車,出現如下圖按照提示安裝即可完成。
在master上添加用戶
useradd gpadmin
確保所有的服務器安裝了ed軟件,否則后面初始化集群就會報錯。
yum install ed -y
准備文件,all_hosts ,seg_hosts放到/home/gpadmin。(用於批量安裝軟件以及后續集群的初始化)
[gpadmin@mdw ~]$ cat seg_hosts sdw1 sdw2 [gpadmin@mdw ~]$ cat all_hosts mdw smdw sdw1 sdw2 [gpadmin@mdw ~]$
建立節點服務器間的信任(我服務器已經打通了信任的,沒有打通的會要求輸入密碼)
[root@mdw gpadmin]# gpssh-exkeys -f all_hosts [STEP 1 of 5] create local ID and authorize on local host ... /root/.ssh/id_rsa file exists ... key generation skipped [STEP 2 of 5] keyscan all hosts and update known_hosts file [STEP 3 of 5] authorize current user on remote hosts ... send to smdw ... send to sdw1 ... send to sdw2 [STEP 4 of 5] determine common authentication file content [STEP 5 of 5] copy authentication files to all remote hosts ... finished key exchange with smdw ... finished key exchange with sdw1 ... finished key exchange with sdw2 [INFO] completed successfully
批量安裝軟件(GP)
cd /home/gpadmin/
source /usr/local/greenplum-db/greenplum_path.sh gpseginstall -f all_hosts -u gpadmin -p gpadmin
最后出現如下提示代表批量安裝成功。
20160827:13:58:47:020986 gpseginstall:greenplum_master:root-[INFO]:-SUCCESS -- Requested commands completed
檢查是否安裝完成
su - gpadmin source /usr/local/greenplum-db/greenplum_path.sh gpssh -f all_hosts -e ls -l $GPHOME
輸出如下代表成功:

[ mdw] ls -l /usr/local/greenplum-db/. [ mdw] total 276 [ mdw] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 bin [ mdw] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 demo [ mdw] drwxr-xr-x 5 gpadmin gpadmin 4096 Feb 25 2016 docs [ mdw] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 etc [ mdw] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 ext [ mdw] -rw-r--r-- 1 gpadmin gpadmin 43025 Feb 25 2016 GPDB-LICENSE.txt [ mdw] -rw-rw-r-- 1 gpadmin gpadmin 735 Aug 27 12:30 greenplum_path.sh [ mdw] drwxr-xr-x 6 gpadmin gpadmin 4096 Feb 25 2016 include [ mdw] drwxr-xr-x 9 gpadmin gpadmin 4096 Feb 25 2016 lib [ mdw] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25 2016 LICENSE.thirdparty [ mdw] drwxr-xr-x 2 gpadmin gpadmin 4096 Aug 27 15:16 sbin [ mdw] drwxr-xr-x 4 gpadmin gpadmin 4096 Feb 25 2016 share [sdw1] ls -l /usr/local/greenplum-db/. [sdw1] total 276 [sdw1] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 bin [sdw1] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 demo [sdw1] drwxr-xr-x 5 gpadmin gpadmin 4096 Feb 25 2016 docs [sdw1] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 etc [sdw1] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 ext [sdw1] -rw-r--r-- 1 gpadmin gpadmin 43025 Feb 25 2016 GPDB-LICENSE.txt [sdw1] -rw-rw-r-- 1 gpadmin gpadmin 735 Aug 27 12:30 greenplum_path.sh [sdw1] drwxr-xr-x 6 gpadmin gpadmin 4096 Feb 25 2016 include [sdw1] drwxr-xr-x 9 gpadmin gpadmin 4096 Feb 25 2016 lib [sdw1] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25 2016 LICENSE.thirdparty [sdw1] drwxr-xr-x 2 gpadmin gpadmin 4096 Aug 27 15:16 sbin [sdw1] drwxr-xr-x 4 gpadmin gpadmin 4096 Feb 25 2016 share [smdw] ls -l /usr/local/greenplum-db/. [smdw] total 276 [smdw] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 bin [smdw] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 demo [smdw] drwxr-xr-x 5 gpadmin gpadmin 4096 Feb 25 2016 docs [smdw] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 etc [smdw] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 ext [smdw] -rw-r--r-- 1 gpadmin gpadmin 43025 Feb 25 2016 GPDB-LICENSE.txt [smdw] -rw-rw-r-- 1 gpadmin gpadmin 735 Aug 27 12:30 greenplum_path.sh [smdw] drwxr-xr-x 6 gpadmin gpadmin 4096 Feb 25 2016 include [smdw] drwxr-xr-x 9 gpadmin gpadmin 4096 Feb 25 2016 lib [smdw] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25 2016 LICENSE.thirdparty [smdw] drwxr-xr-x 2 gpadmin gpadmin 4096 Aug 27 15:16 sbin [smdw] drwxr-xr-x 4 gpadmin gpadmin 4096 Feb 25 2016 share [sdw2] ls -l /usr/local/greenplum-db/. [sdw2] total 276 [sdw2] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 bin [sdw2] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 demo [sdw2] drwxr-xr-x 5 gpadmin gpadmin 4096 Feb 25 2016 docs [sdw2] drwxr-xr-x 2 gpadmin gpadmin 4096 Feb 25 2016 etc [sdw2] drwxr-xr-x 3 gpadmin gpadmin 4096 Feb 25 2016 ext [sdw2] -rw-r--r-- 1 gpadmin gpadmin 43025 Feb 25 2016 GPDB-LICENSE.txt [sdw2] -rw-rw-r-- 1 gpadmin gpadmin 735 Aug 27 12:30 greenplum_path.sh [sdw2] drwxr-xr-x 6 gpadmin gpadmin 4096 Feb 25 2016 include [sdw2] drwxr-xr-x 9 gpadmin gpadmin 4096 Feb 25 2016 lib [sdw2] -rw-r--r-- 1 gpadmin gpadmin 192912 Feb 25 2016 LICENSE.thirdparty [sdw2] drwxr-xr-x 2 gpadmin gpadmin 4096 Aug 27 15:16 sbin [sdw2] drwxr-xr-x 4 gpadmin gpadmin 4096 Feb 25 2016 share
創建存儲目錄
master,standby 操作命令:
mkdir -p /data/master chown -R gpadmin.gpadmin /data/master/
在master上用下面命令批量操作segment,建立目錄,改權限
gpssh -f seg_hosts -e 'mkdir -p /data/primary' gpssh -f seg_hosts -e 'mkdir -p /data/mirror' gpssh -f seg_hosts -e 'chown -R gpadmin.gpadmin /data/primary' gpssh -f seg_hosts -e 'chown -R gpadmin.gpadmin /data/mirror'
設置NTP同步
在每個Segment主機,以root登錄編輯/etc/ntp.conf文件。設置第一個server參數指向Master主機,第二個server參數指向Standby主機。如下面:
server mdw prefer
server smdw
在Standby主機,以root登錄編輯/etc/ntp.conf文件。設置第一個server參數指向Master主機,第二個參數指向數據中心的時間服務器。
server mdw prefer
在Master主機,使用NTP守護進程同步所有Segment主機的系統時鍾。例如,使用gpssh來完成:
gpssh -f all_hosts -v -e 'ntpd'
輸出如下代表成功:
[root@mdw gpadmin]# gpssh -f all_hosts -v -e 'ntpd' [WARN] Reference default values as $MASTER_DATA_DIRECTORY/gpssh.conf could not be found Using delaybeforesend 0.05 and prompt_validation_timeout 1.0 [Reset ...] [INFO] login mdw [INFO] login smdw [INFO] login sdw1 [INFO] login sdw2 [ mdw] ntpd [smdw] ntpd [sdw1] ntpd [sdw2] ntpd [INFO] completed successfully [Cleanup...]
要配置集群自動同步系統時鍾,應開啟ntpd服務,並設置為開機時自動運行,4台服務器都啟動:
/etc/init.d/ntpd start
檢查環境(我這里檢查一下操作系統參數設置是否ok,網絡,io,都可以檢查的)
[root@mdw gpadmin]# gpcheck -f all_hosts -m mdw 20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-dedupe hostnames 20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-Detected platform: Generic Linux Cluster 20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-generate data on servers 20160827:16:47:47:022142 gpcheck:mdw:root-[INFO]:-copy data files from servers 20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-delete remote tmp files 20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-Using gpcheck config file: /usr/local/greenplum-db/./etc/gpcheck.cnf 20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-GPCHECK_NORMAL 20160827:16:47:48:022142 gpcheck:mdw:root-[INFO]:-gpcheck completing...
在master上進行下面的操作:
創建Greenplum數據庫配置文件,以gpadmin用戶登錄
su - gpadmin cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config chmod 775 gpinitsystem_config
設置所有必須的參數
ARRAY_NAME="EMC Greenplum DW" SEG_PREFIX=gpseg PORT_BASE=40000 declare -a DATA_DIRECTORY=(/data/primary) MASTER_HOSTNAME=mdw MASTER_DIRECTORY=/data/master MASTER_PORT=5432 TRUSTED_SHELL=ssh CHECK_POINT_SEGMENTS=8 ENCODING=UNICODE
設置可選參數
MIRROR_PORT_BASE=50000 REPLICATION_PORT_BASE=41000 declare -a MIRROR_DATA_DIRECTORY=(/data/mirror)
運行初始化工具初始化數據庫
su - gpadmin source /usr/local/greenplum-db/greenplum_path.sh gpinitsystem -c gpinitsystem_config -h seg_hosts
正常情況下初始化完成以后最后的日志沒有任何報錯,而且gp進程也是啟動了的。
初始化日志:
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Review options for gpinitstandby 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:------------------------------------------------------- 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-The Master /data/master/gpseg-1/pg_hba.conf post gpinitsystem 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-has been configured to allow all hosts within this new 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-new array must be explicitly added to this file 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory 20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------
啟動進程:
[root@mdw gpadmin]# ps -ef | grep postgres gpadmin 18082 1 0 16:23 ? 00:00:00 /usr/local/greenplum-db-4.3.7.3/bin/postgres -D /data/master/gpseg-1 -p 5432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E gpadmin 18083 18082 0 16:23 ? 00:00:00 postgres: port 5432, master logger process gpadmin 18086 18082 0 16:23 ? 00:00:00 postgres: port 5432, stats collector process gpadmin 18087 18082 0 16:23 ? 00:00:00 postgres: port 5432, writer process gpadmin 18088 18082 0 16:23 ? 00:00:00 postgres: port 5432, checkpoint process gpadmin 18089 18082 0 16:23 ? 00:00:00 postgres: port 5432, seqserver process gpadmin 18090 18082 0 16:23 ? 00:00:00 postgres: port 5432, ftsprobe process gpadmin 18091 18082 0 16:23 ? 00:00:00 postgres: port 5432, sweeper process root 22716 20223 0 16:52 pts/0 00:00:00 grep postgres [root@mdw gpadmin]#
現在只有1個master,2個segment,沒有standby,那么接下來把standby加入集群。
在master服務器上執行
gpinitstandby -s smdw
輸出如下:

[gpadmin@mdw ~]$ gpinitstandby -s smdw 20160827:16:59:24:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization... 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking for filespace directory /data/master/gpseg-1 on smdw 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master initialization parameters 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master hostname = mdw 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master data directory = /data/master/gpseg-1 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master port = 5432 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master hostname = smdw 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master port = 5432 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master data directory = /data/master/gpseg-1 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum update system catalog = On 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:- Filespace locations 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------ 20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_system -> /data/master/gpseg-1 Do you want to continue with standby master initialization? Yy|Nn (default=N): > y 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-The packages on smdw are consistent. 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Adding standby master to catalog... 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Database catalog updated successfully. 20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating pg_hba.conf file... 20160827:16:59:37:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_hba.conf files updated successfully. 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating filespace flat files... 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Filespace flat file updated successfully. 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Starting standby master 20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw in directory: /data/master/gpseg-1 20160827:16:59:40:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files... 20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully. 20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Successfully created standby master on smdw
設置環境變量,master,standby都設置。
1. 以gpadmin用戶登錄:
su - gpadmin
2. 打開用戶啟動文件(如.bashrc)。比如:
vim ~/.bashrc
3. 添加新行用以加載greenplum_path.sh文件和設置MASTER_DATA_DIRECTORY環境變量。例如:
source /usr/local/greenplum-db/greenplum_path.sh export MASTER_DATA_DIRECTORY=/data/master/gpseg-1
啟動和停止數據庫測試是否能正常啟動和關閉,命令如下
gpstart
gpstop
到此 greenplum 就部署完成了。下面進行一些簡單的測試。
登錄數據庫:psql -d postgres
建表,插入,查詢
postgres=# create table student ( no int primary key,student_name varchar(40),age int); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "student_pkey" for table "student" CREATE TABLE postgres=# insert into student values(1,'yayun',18); INSERT 0 1 postgres=# select * from student; no | student_name | age ----+--------------+----- 1 | yayun | 18 (1 row)