In the previous post we briefly went over RabbitMQ installation and configuration and the relevant rabbitmqctl subcommands; for a refresher see https://www.cnblogs.com/qiuhom-1874/p/13561245.html. Today we'll talk about RabbitMQ clustering. The reason for clustering is that in a distributed application environment RabbitMQ is the glue between components, and if the RabbitMQ service goes down the whole production workload can be affected. To avoid that we have to make RabbitMQ highly available: each RabbitMQ node in the cluster synchronizes the messages it receives to the other nodes over the network, so every node holds all of the cluster's messages and the failure of any single RabbitMQ node does not lose messages. The main job of a RabbitMQ cluster, then, is inter-node message synchronization, which gives us data redundancy. Beyond redundancy, once there are multiple rabbitmq-server backends we also need to load-balance across them, so that each node carries a share of the traffic and clients get a single, unified access point. Clients then address the load balancer, which spreads their requests over the backend RabbitMQ nodes; if one of them goes down, the balancer's health checks stop scheduling requests to the failed rabbitmq-server, and that is what ultimately makes RabbitMQ highly available.
Before building the RabbitMQ cluster we need to prepare the following:
1. Make each node's hostname match the name in the hosts file; node hostnames must differ from one another and must be resolvable through the hosts file;
2. Synchronize the clocks; time synchronization is the most basic requirement for any cluster;
3. The Erlang cookie must be identical on every node;
Experimental environment
| Role          | Hostname | IP address   |
| ------------- | -------- | ------------ |
| node01        | node01   | 192.168.0.41 |
| node2         | node2    | 192.168.0.42 |
| load balancer | node3    | 192.168.0.43 |
1. Configure the hostname on each node
[root@node01 ~]# hostnamectl set-hostname node01
[root@node01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.41 node01
192.168.0.42 node2
192.168.0.43 node3
[root@node01 ~]# scp /etc/hosts node2:/etc/
hosts                                         100%  218   116.4KB/s   00:00
[root@node01 ~]# scp /etc/hosts node3:/etc/
hosts                                         100%  218   119.2KB/s   00:00
[root@node01 ~]#
Note: the RabbitMQ cluster proper consists only of node01 and node2, which synchronize messages with each other; the load balancer exists purely to distribute traffic and is not part of the RabbitMQ cluster itself, so its hostname can be anything.
Verification: log in to each node and check that the hostname and the hosts file are correct
[root@node2 ~]# hostname
node2
[root@node2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.41 node01
192.168.0.42 node2
192.168.0.43 node3
[root@node2 ~]#
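The hostname/hosts-file prerequisite can also be sanity-checked mechanically. Below is a rough Python sketch of the idea; the `parse_hosts` and `check_cluster_names` helpers are my own illustration, not part of any RabbitMQ tooling, and the sample data mirrors the hosts file above:

```python
# Sanity-check hosts-file content for cluster use: every cluster node
# name should map to an address, and the mapping should be unambiguous.
def parse_hosts(text):
    """Map each hostname/alias to the first IP it appears with,
    ignoring comments and blank lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        for name in names:
            mapping.setdefault(name, ip)
    return mapping

def check_cluster_names(mapping, nodes):
    """Return the cluster node names missing from the hosts mapping."""
    return [n for n in nodes if n not in mapping]

hosts_text = """
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.41 node01
192.168.0.42 node2
192.168.0.43 node3
"""

mapping = parse_hosts(hosts_text)
print(check_cluster_names(mapping, ["node01", "node2", "node3"]))  # []
```

An empty list means every node name resolves; anything else names the hosts-file entries still to be added.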
Install rabbitmq-server on each node
yum install rabbitmq-server -y
Start rabbitmq-server on each node
Note: node01 has the rabbitmq-management plugin enabled, so port 15672 is listening there; node2 does not, so 15672 is not listening on it. In a RabbitMQ cluster, port 25672 is dedicated to inter-node communication.
With the basic environment in place we can now configure the cluster. RabbitMQ cluster configuration is very simple: by default, a freshly started RabbitMQ node is already a cluster (which is why 25672 is listening), it just contains only itself.
Verification: check each node's cluster status and confirm the node name matches the host's hostname
Note: from the output above, both nodes' cluster node names match their hostnames.
Stop the application on node2 and join node2 to node01's cluster
Note: the error says we cannot connect to rabbit@node01. There are two main causes for this error: incorrect hostname resolution, or mismatched Erlang cookies.
Copy the cookie
[root@node2 ~]# scp /var/lib/rabbitmq/.erlang.cookie node01:/var/lib/rabbitmq/
The authenticity of host 'node01 (192.168.0.41)' can't be established.
ECDSA key fingerprint is SHA256:EG9nua4JJuUeofheXlgQeL9hX5H53JynOqf2vf53mII.
ECDSA key fingerprint is MD5:57:83:e6:46:2c:4b:bb:33:13:56:17:f7:fd:76:71:cc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node01,192.168.0.41' (ECDSA) to the list of known hosts.
.erlang.cookie                                100%   20    10.6KB/s   00:00
[root@node2 ~]#
Verification: use md5sum to confirm the cookies now match on both nodes
[root@node2 ~]# md5sum /var/lib/rabbitmq/.erlang.cookie
1d4f9e4d6c92cf0c749cc4ace68317f6  /var/lib/rabbitmq/.erlang.cookie
[root@node2 ~]# ssh node01
Last login: Wed Aug 26 19:41:30 2020 from 192.168.0.232
[root@node01 ~]# md5sum /var/lib/rabbitmq/.erlang.cookie
1d4f9e4d6c92cf0c749cc4ace68317f6  /var/lib/rabbitmq/.erlang.cookie
[root@node01 ~]#
Note: the two nodes' cookies now match; let's try joining node2 to node01 again and see whether it succeeds.
[root@node2 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node2 with rabbit@node01 ...
Error: unable to connect to nodes [rabbit@node01]: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@node01]

rabbit@node01:
  * connected to epmd (port 4369) on node01
  * epmd reports node 'rabbit' running on port 25672
  * TCP connection succeeded but Erlang distribution failed
  * suggestion: hostname mismatch?
  * suggestion: is the cookie set correctly?

current node details:
- node name: rabbitmqctl2523@node2
- home dir: /var/lib/rabbitmq
- cookie hash: HU+eTWySzwx0nMSs5oMX9g==

[root@node2 ~]#
Note: the join still fails. The reason is that we replaced node01's cookie but never restarted its rabbitmq-server, so it is still running with the old cookie.
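Incidentally, the "cookie hash" in the DIAGNOSTICS output above is, as far as I know, just the Base64-encoded MD5 digest of the cookie value, so you can recompute it locally and compare it with what a node reports. A minimal sketch (the sample cookie string is made up):

```python
import base64
import hashlib

def erlang_cookie_hash(cookie: bytes) -> str:
    """Base64(MD5(cookie)) -- the format shown as 'cookie hash'
    in rabbitmqctl's DIAGNOSTICS output."""
    return base64.b64encode(hashlib.md5(cookie).digest()).decode("ascii")

# Hash a hypothetical local cookie; compare the result against the
# hash printed by the remote node's diagnostics.
print(erlang_cookie_hash(b"SAMPLECOOKIE"))
```

Two nodes print the same hash exactly when their cookie files hold the same bytes, which is the same check md5sum gives us below, just in the format rabbitmqctl itself displays.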
Restart rabbitmq-server on node01
[root@node01 ~]# systemctl restart rabbitmq-server.service
[root@node01 ~]# ss -tnl
State      Recv-Q Send-Q    Local Address:Port      Peer Address:Port
LISTEN     0      128           127.0.0.1:631                  *:*
LISTEN     0      128                   *:15672                *:*
LISTEN     0      100           127.0.0.1:25                   *:*
LISTEN     0      100           127.0.0.1:64667                *:*
LISTEN     0      128                   *:8000                 *:*
LISTEN     0      128                   *:8001                 *:*
LISTEN     0      128                   *:25672                *:*
LISTEN     0      5             127.0.0.1:8010                 *:*
LISTEN     0      128                   *:111                  *:*
LISTEN     0      128                   *:80                   *:*
LISTEN     0      128                   *:4369                 *:*
LISTEN     0      5         192.168.122.1:53                   *:*
LISTEN     0      128                   *:22                   *:*
LISTEN     0      128                 ::1:631                 :::*
LISTEN     0      100                 ::1:25                  :::*
LISTEN     0      128                  :::5672                :::*
LISTEN     0      128                  :::111                 :::*
LISTEN     0      128                  :::80                  :::*
LISTEN     0      128                  :::4369                :::*
LISTEN     0      128                  :::22                  :::*
[root@node01 ~]#
Note: if we had instead copied node01's cookie over to node2, node2 would be the one to restart. In short, whichever node receives a new cookie must be restarted; what matters is that the cookie actually in use is identical everywhere.
Join node2 to node01 again
[root@node2 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node2 with rabbit@node01 ...
...done.
[root@node2 ~]#
Note: if joining the target node's cluster reports no error, the join succeeded.
Verification: check the cluster status on each node
Note: both nodes now list both members, so node2 has joined node01's cluster. The status output still differs between the two nodes because the application on node2 has not been started; once it is, the two outputs will match.
Start the application on node2
Note: the two nodes now report identical status; the RabbitMQ cluster is built.
Verification: log in to node01's port 15672 in a browser and check whether the web management UI shows the node information
Note: node2 shows no statistics because the rabbitmq-management plugin is not enabled on it; enable the plugin and its statistics will be collected.
rabbitmqctl cluster subcommands
join_cluster <clusternode> [--ram]: join the cluster of the specified node; with --ram the node joins as a ram node;
cluster_status: show the cluster status;
change_cluster_node_type disc | ram: change the node's storage type; disc means disk, ram means memory. A cluster must keep at least one disc node;
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]},
 {running_nodes,[rabbit@node01,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]# rabbitmqctl change_cluster_node_type ram
Turning rabbit@node2 into a ram node ...
Error: mnesia_unexpectedly_running
[root@node2 ~]#
Note: the mnesia_unexpectedly_running error means the node type cannot be changed while the application is running. The fix is to stop the application on node2, change the type, then start the application again.
[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]# rabbitmqctl change_cluster_node_type ram
Turning rabbit@node2 into a ram node ...
...done.
[root@node2 ~]# rabbitmqctl start_app
Starting node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]},
 {running_nodes,[rabbit@node01,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]#
Note: node2 has become a ram node.
[root@node01 ~]# rabbitmqctl change_cluster_node_type ram
Turning rabbit@node01 into a ram node ...
Error: mnesia_unexpectedly_running
[root@node01 ~]# rabbitmqctl stop_app
Stopping node rabbit@node01 ...
...done.
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]}]
...done.
[root@node01 ~]# rabbitmqctl change_cluster_node_type ram
Turning rabbit@node01 into a ram node ...
Error: {resetting_only_disc_node,"You cannot reset a node when it is the only disc node in a cluster. Please convert another node of the cluster to a disc node first."}
[root@node01 ~]#
Note: a cluster must keep at least one disc node, so with node2 changed to ram, node01 has to remain disc.
forget_cluster_node [--offline]: remove the specified node from the cluster;
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]},
 {running_nodes,[rabbit@node2,rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]# rabbitmqctl forget_cluster_node rabbit@node2
Removing node rabbit@node2 from cluster ...
Error: {failed_to_remove_node,rabbit@node2,
                              {active,"Mnesia is running",rabbit@node2}}
[root@node01 ~]#
Note: removing node2 from node01 fails because node2 is still active; this subcommand can only remove a node that is offline.
Stop the application on node2
[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]#
Remove node2 again
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]},
 {running_nodes,[rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]# rabbitmqctl forget_cluster_node rabbit@node2
Removing node rabbit@node2 from cluster ...
...done.
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01]}]},
 {running_nodes,[rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]#
update_cluster_nodes clusternode: refresh the local node's view of the cluster membership from the specified node;
Join node2 back into node01's cluster
[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node2 with rabbit@node01 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]# rabbitmqctl start_app
Starting node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]},
 {running_nodes,[rabbit@node01,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]#
Stop the application on node2
[root@node2 ~]# rabbitmqctl stop_app
Stopping node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]#
Note: if a new node now joins the cluster, and the application on node01 is then stopped as well, starting the application on node2 again will fail, as follows.
Join node3 to node01
[root@node3 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node3 ...
[{nodes,[{disc,[rabbit@node3]}]},
 {running_nodes,[rabbit@node3]},
 {cluster_name,<<"rabbit@node3">>},
 {partitions,[]}]
...done.
[root@node3 ~]# rabbitmqctl stop_app
Stopping node rabbit@node3 ...
...done.
[root@node3 ~]# rabbitmqctl join_cluster rabbit@node01
Clustering node rabbit@node3 with rabbit@node01 ...
...done.
[root@node3 ~]# rabbitmqctl start_app
Starting node rabbit@node3 ...
...done.
[root@node3 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node3 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node01,rabbit@node3]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node3 ~]#
Stop the application on node01
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node3,rabbit@node01]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node01 ~]# rabbitmqctl stop_app
Stopping node rabbit@node01 ...
...done.
[root@node01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node01 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]}]
...done.
[root@node01 ~]#
Start the application on node2
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2]}]}]
...done.
[root@node2 ~]# rabbitmqctl start_app
Starting node rabbit@node2 ...

BOOT FAILED
===========

Error description:
   {could_not_start,rabbit,
       {bad_return,
           {{rabbit,start,[normal,[]]},
            {'EXIT',
                {rabbit,failure_during_boot,
                    {error,
                        {timeout_waiting_for_tables,
                            [rabbit_user,rabbit_user_permission,rabbit_vhost,
                             rabbit_durable_route,rabbit_durable_exchange,
                             rabbit_runtime_parameters,
                             rabbit_durable_queue]}}}}}}}

Log files (may contain more information):
   /var/log/rabbitmq/rabbit@node2.log
   /var/log/rabbitmq/rabbit@node2-sasl.log

Error: {rabbit,failure_during_boot,
           {could_not_start,rabbit,
               {bad_return,
                   {{rabbit,start,[normal,[]]},
                    {'EXIT',
                        {rabbit,failure_during_boot,
                            {error,
                                {timeout_waiting_for_tables,
                                    [rabbit_user,rabbit_user_permission,
                                     rabbit_vhost,rabbit_durable_route,
                                     rabbit_durable_exchange,
                                     rabbit_runtime_parameters,
                                     rabbit_durable_queue]}}}}}}}}
[root@node2 ~]#
Note: node2 now fails to start. This is where the update_cluster_nodes subcommand comes in: ask node3 for the current cluster membership, and node2 will then start without error.
Fetch the updated membership from node3, then start the application on node2
[root@node2 ~]# rabbitmqctl update_cluster_nodes rabbit@node3
Updating cluster nodes for rabbit@node2 from rabbit@node3 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]}]
...done.
[root@node2 ~]# rabbitmqctl start_app
Starting node rabbit@node2 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node3,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]#
Note: after updating the membership, node2's cluster status shows node3, and starting the application on node2 then works without any problem.
sync_queue queue: synchronize the mirrors of the specified queue;
cancel_sync_queue queue: cancel synchronization of the specified queue;
set_cluster_name name: set the cluster name;
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node01,rabbit@node3,rabbit@node2]},
 {cluster_name,<<"rabbit@node01">>},
 {partitions,[]}]
...done.
[root@node2 ~]# rabbitmqctl set_cluster_name rabbit@rabbit_node02
Setting cluster name to rabbit@rabbit_node02 ...
...done.
[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node01,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node01,rabbit@node3,rabbit@node2]},
 {cluster_name,<<"rabbit@rabbit_node02">>},
 {partitions,[]}]
...done.
[root@node2 ~]#
Note: changing the cluster name on any one node propagates to all the others; in other words, the cluster status information is kept consistent on every node.
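For scripting against this output: what cluster_status prints is an Erlang term, but the node names all follow the regular rabbit@<host> pattern, so a rough regex extraction is often enough. A sketch under that assumption (the `extract_nodes` helper is my own illustration, not part of rabbitmqctl):

```python
import re

def extract_nodes(status_text: str) -> dict:
    """Pull disc/ram membership and running nodes out of a
    'rabbitmqctl cluster_status' dump via the rabbit@host pattern."""
    def names(section):
        # Find e.g. "disc,[...]" and list the rabbit@host tokens inside.
        m = re.search(section + r",\[(.*?)\]", status_text, re.S)
        return re.findall(r"rabbit@[\w.-]+", m.group(1)) if m else []
    return {
        "disc": names(r"disc"),
        "ram": names(r"ram"),
        "running": names(r"running_nodes"),
    }

# Sample taken from the status output shown earlier in this post.
sample = ('[{nodes,[{disc,[rabbit@node01]},{ram,[rabbit@node2]}]},\n'
          ' {running_nodes,[rabbit@node2,rabbit@node01]},\n'
          ' {cluster_name,<<"rabbit@node01">>},\n'
          ' {partitions,[]}]')
print(extract_nodes(sample))
# {'disc': ['rabbit@node01'], 'ram': ['rabbit@node2'], 'running': ['rabbit@node2', 'rabbit@node01']}
```

This is convenient for monitoring scripts that only need to know which members exist and which are currently running.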
Load balancing the RabbitMQ cluster with haproxy
1. Install haproxy
[root@node3 ~]# yum install -y haproxy
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package haproxy.x86_64 0:1.5.18-9.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================
 Package              Arch               Version                     Repository              Size
====================================================================================================
Installing:
 haproxy              x86_64             1.5.18-9.el7                base                   834 k

Transaction Summary
====================================================================================================
Install  1 Package

Total download size: 834 k
Installed size: 2.6 M
Downloading packages:
haproxy-1.5.18-9.el7.x86_64.rpm                                              |  834 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : haproxy-1.5.18-9.el7.x86_64                                                      1/1
  Verifying  : haproxy-1.5.18-9.el7.x86_64                                                      1/1

Installed:
  haproxy.x86_64 0:1.5.18-9.el7

Complete!
[root@node3 ~]#
Note: haproxy can be deployed on a fresh host or on one of the cluster nodes; a fresh host is recommended, as it avoids port conflicts.
Configure haproxy
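The original configuration was shown as a screenshot; below is a hedged reconstruction of what such a configuration typically looks like on haproxy 1.5. The listener names, the frontend ports, and the check timings are my assumptions, not the post's exact values; in particular, since haproxy here shares node3 with a rabbitmq-server, the frontend must bind a port other than 5672 (I use 5670 as a placeholder):

```
# /etc/haproxy/haproxy.cfg fragment -- TCP proxy for the RabbitMQ cluster
listen rabbitmq_cluster
    bind *:5670
    mode tcp
    balance roundrobin
    server node01 192.168.0.41:5672 check inter 2000 rise 2 fall 3
    server node2  192.168.0.42:5672 check inter 2000 rise 2 fall 3
    server node3  192.168.0.43:5672 check inter 2000 rise 2 fall 3

# Status page for watching backend health
listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /haproxy_stats
```

The `check inter 2000 rise 2 fall 3` parameters mean: probe every 2 seconds, mark a server up after 2 consecutive successes and down after 3 consecutive failures.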
Note: the above is an example of load balancing the RabbitMQ cluster with haproxy: we proxy RabbitMQ in haproxy's tcp mode and schedule requests to the backend servers with the round-robin algorithm.
Verification: start haproxy and check that its ports are listening and that the status page correctly detects whether the backend servers are online
Note: the load balancer is now in place; from here on, users of this cluster are simply given the address the balancer listens on. One thing to keep in mind is that haproxy itself is now a new single point of failure.
Open haproxy's status page in a browser and check whether the backend servers are online
Note: all three backend rabbitmq-servers are shown as online.
Stop rabbitmq on node3 and check whether haproxy promptly detects that node3 is offline and marks it down
Note: failover for the RabbitMQ cluster is achieved through haproxy's health checks on the backend servers. The cluster itself is only responsible for message synchronization, i.e. data redundancy; real high availability comes from the front-end scheduler. To load-balance RabbitMQ with nginx instead, write the configuration based on nginx's tcp proxying; for more on load balancing tcp applications with nginx, see https://www.cnblogs.com/qiuhom-1874/p/12468946.html, which I won't repeat here.
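The failover described above hinges on the balancer's layer-4 health checks, and conceptually each check is just a timed TCP connect attempt against a backend port. A minimal Python sketch of that idea (illustrative only, not haproxy's actual implementation; the backend list matches the topology above):

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within
    the timeout -- the essence of a layer-4 health check."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# A balancer runs a check like this periodically per backend and only
# schedules traffic to backends whose recent checks passed.
backends = [("192.168.0.41", 5672), ("192.168.0.42", 5672), ("192.168.0.43", 5672)]
alive = [b for b in backends if tcp_check(*b, timeout=0.5)]
```

Parameters like haproxy's `rise`/`fall` then just require several consecutive successes or failures before flipping a backend's state, which smooths over transient glitches.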