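We ran into a troublesome problem while using RabbitMQ (RabbitMQ 2.8.1, Erlang ERTS 5.7.4).
Our usage scenario is as follows:
1. Cluster mode (assume a cluster of three machines);
2. One exchange with multiple queues bound to it;
3. Multiple producers (the number of producers cannot be reduced, because it is tied to the processing capacity of the upstream pipeline);
4. A producer may publish to several queues, and messages arrive unevenly, so several producers may end up publishing continuously to the same queue at the same time;
5. Messages are around 100 KB each;
6. Each client connects to a randomly chosen server;
7. To raise the publish rate, each connection carries 10 channels;
8. Consumers have enough capacity.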
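The problem we found: when many producers (say 50) publish to the same queue at the same time through a relay node (that is, the connection lands on a machine other than the one hosting the queue), memory usage surges, even when the queue is empty, as shown in the figure below:
The memory breakdown shows that the consumption comes mainly from Erlang binaries.
We use the rabbitmq-c client (see the RabbitMQ User Guide (RabbitMQ-C)). To rule out the client as the cause, we also tested with the Python client pika 0.9.5, and the behavior was the same.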
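Testing showed two measures that avoid the memory surge:
1. Reduce the number of producers (for example from 50 to 5, which does not noticeably lower the publish rate);
2. When choosing a connection for publishing, use only a connection to the server that hosts the queue (which means the client has to know the address of the server hosting each queue).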
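The second measure can be sketched as a small routing helper. This is only an illustration under stated assumptions: `QUEUE_HOME` (a queue-to-node map), the host names, and `pick_host` are all hypothetical and not part of RabbitMQ or any client library. In practice the application would have to maintain the map itself.

```python
import random

# Hypothetical map from queue name to the cluster node that hosts it.
# Assumption: the application maintains this itself; host names are made up.
QUEUE_HOME = {
    "queue_a": "mq-node-1",
    "queue_b": "mq-node-2",
}

# Hypothetical list of all nodes in the three-machine cluster.
ALL_NODES = ["mq-node-1", "mq-node-2", "mq-node-3"]


def pick_host(queue, queue_home=QUEUE_HOME, all_nodes=ALL_NODES, rng=random):
    """Return the node a producer should connect to for `queue`.

    Prefer the node hosting the queue, so the broker does not have to
    relay the message between nodes. Fall back to a random node (the
    original behavior) when the queue's location is unknown.
    """
    host = queue_home.get(queue)
    if host is not None:
        return host
    return rng.choice(all_nodes)
```

A producer would call `pick_host("queue_a")` before opening its connection. The random fallback keeps the original connect-anywhere behavior for queues missing from the map.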
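These measures bring problems of their own:
1. RabbitMQ's built-in HA mechanism cannot be used;
2. The topic exchange type cannot be used (both of these involve forwarding messages).
Admittedly, our usage scenario is fairly extreme. If your message volume is modest or you do not need a very high publish rate, you are unlikely to fall into this pit.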
Appended is my email exchange with the developers (Liu Hao is me, and Matthias is a RabbitMQ developer):
1. Liu Hao
Hi, all,
I am a RabbitMQ user in China. When we use in cluster pattern, we have found a problem. The problem is that if we use many producers (such as fifty) to send messages to the same queue ceaselessly and the connection to the RabbitMQ cluster is not the server which the queue is , the memory increases very quickly. If we use less producers (such as five) or we use the connection to the server which the queue is, the memory is the normal. The consumers have enough power to consume the message. The version of RabbitMQ is 2.8.1.
Has anyone met the same problem?
The attachment is the monitor interface.
Thank all of you very much.
--
Liu Hao
State Key Laboratory of Networking & Switching Technology
Beijing University of Posts & Telecommunications
Email&Gtalk: liuhaobupt@gmail.com
2. Matthias Radestock
Aug 13 (1 day ago)
To: Discussions, me
On 13/08/12 02:53, Liu Hao wrote:
> I am a RabbitMQ user in China. When we use in cluster pattern,
> we have found a problem. The problem is that if we use many producers
> (such as fifty) to send messages to the same queue ceaselessly and the
> connection to the RabbitMQ cluster is not the server which the queue is
> , the memory increases very quickly. If we use less producers (such as
> five) or we use the connection to the server which the queue is, the
> memory is the normal. The consumers have enough power to consume the
> message. The version of RabbitMQ is 2.8.1.
Please post the output of 'rabbitmqctl report' for all three machines in your cluster at the time the memory threshold has been exceeded on the queue node.
Matthias.
3. Liu Hao
10:40 (7 hours ago)
To: Matthias, Discussions
Hi, Matthias,
The destination queue's name is "mqclient_test_queue_1" and it is on the cnbj-cuc-tst01-crl0015 node.
The rabbitmqctl report output of cnbj-cuc-tst01-crl0015 node is "cnbj-cuc-tst01-crl0015" attachment, the other two are similar.
The picture is the monitor interface.
Thank you very much.
4. Matthias Radestock
12:12 (5 hours ago)
To: me, Discussions
On 14/08/12 03:40, Liu Hao wrote:
> The destination queue's name is "mqclient_test_queue_1" and it is
> on the cnbj-cuc-tst01-crl0015 node.
>
> The rabbitmqctl report output of cnbj-cuc-tst01-crl0015 node is
> "cnbj-cuc-tst01-crl0015" attachment, the other two are similar.
Ah, I completely forgot that 'report' reports on all nodes. Sorry.
There are about 1100 connections and 8400 channels. Are those the
numbers you expect to see?
How big are the messages?
Please run the following on crl0015 when it is using lots of memory:
rabbitmqctl eval 'begin {L, Pid} =
lists:last(lists:sort([{length(element(2, process_info(P, binary))), P}
|| P <- processes()])), {L, Pid, process_info(Pid)} end.'
(all on one line) and post the output.
Regards,
Matthias.
5. Liu Hao
14:14 (3 hours ago)
To: Matthias, Discussions
Hi, Matthias,
The connections and channels are actually too much. I decrease the connections and channels. Now, I have 40 consumer connections (one connection with one channel), and 50 producer connections (one connection with 10 channels). The memory is the same , acquires a lot.
But I find an interesting fact that If I use 50 producer connections (one connection with only one channel) , the memory will be under 2G, but the most connections are flowed and the publish rate is too slow.
This is just a test demo, and one message is 10KB.
The command report is so big(35M), and I give the beginning and the end of the output to you as the attachment.
Thank you very much.
6. Matthias Radestock
16:38 (1 hour ago)
To: me, Discussions
On 14/08/12 07:14, Liu Hao wrote:
> The connections and channels are actually too much. I decrease the
> connections and channels. Now, I have 40 consumer connections (one
> connection with one channel), and 50 producer connections (one
> connection with 10 channels). The memory is the same , acquires a lot.
>
> But I find an interesting fact that If I use 50 producer
> connections (one connection with only one channel) , the memory will be
> under 2G, but the most connections are flowed and the publish rate is
> too slow.
>
> This is just a test demo, and one message is 10KB.
>
> The command report is so big(35M), and I give the beginning and
> the end of the output to you as the attachment.
I think you are simply pushing rabbit beyond the limit of its capability. Internal flow control happens on a per-process-link basis, so when you increase the number of publishing channels that corresponds to a linear increase in the amount of internal buffer space that is potentially required. To the point where all memory is taken up by messages sitting in these buffers.
Publishing across nodes carries an extra cost, so the buffers will fill up at lower publishing rates.
If with 50 producer connections x 1 channel you see most connections flowed, then that is an indications that rabbit is already operating at capacity but is still able to keep overall memory use under control. Adding more producer connections/channels will not increase the sustainable sending rate but will degrade rabbit's ability to control memory use.
Btw, I suggest you upgrade to the latest rabbit version - the flow control code has changed somewhat and there have been some performance improvements.
Regards,
Matthias.
7. Matthias Radestock
16:46 (1 hour ago)
To: Discussions, me
On 14/08/12 09:38, Matthias Radestock wrote:
> Btw, I suggest you upgrade to the latest rabbit version - the flow
> control code has changed somewhat and there have been some performance
> improvements.
You may also want to enable hipe compilation - see http://www.rabbitmq.com/configure.html.
I doubt any of this will make much difference though since the bottleneck in your system are the queues, and hipe compilation and most of the performance improvements have little impact on queue performance.
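
Matthias's point in message 6, that the buffer space potentially required grows linearly with the number of publishing channels, can be illustrated with a rough worst-case estimate. This is a back-of-the-envelope sketch, not RabbitMQ's actual accounting. It assumes one internal link per publishing channel, and `credit_per_link` (how many messages one link may hold before flow control blocks the sender) is a hypothetical parameter whose real value depends on the broker version. The function name is made up as well.

```python
def worst_case_buffer_bytes(connections, channels_per_connection,
                            message_bytes, credit_per_link):
    """Upper bound on bytes held in per-link buffers.

    Assumes one internal link per publishing channel and that each link
    can hold up to `credit_per_link` messages (a hypothetical value).
    """
    links = connections * channels_per_connection
    return links * credit_per_link * message_bytes


# Figures from the thread: 50 producer connections, 10 KB messages.
# credit_per_link=200 is an assumed value, for illustration only.
ten_channels = worst_case_buffer_bytes(50, 10, 10 * 1024, 200)
one_channel = worst_case_buffer_bytes(50, 1, 10 * 1024, 200)
print(ten_channels // one_channel)  # 10: the bound scales with channel count
```

Whatever the real credit value is, the ratio between the two configurations is what matters: ten channels per connection allow ten times as much data to sit in buffers as one channel per connection, which is consistent with memory staying bounded in the one-channel test.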