Flume Performance Test Report (translated from the official Flume wiki report)


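Anyone using Flume ends up researching its performance sooner or later, and most of what can be found online is people's own ad hoc tests.
Here is a test report from the official wiki for reference.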



https://cwiki.apache.org/confluence/display/FLUME/Performance+Measurements+-+round+2


 

Test environment:

All tests below were run on a single agent.

Hadoop cluster configuration: 20-node Hadoop cluster (1 name node and 19 data nodes).

Server configuration: 24 cores (Xeon E5-2640 v2 @ 2.00GHz), 164 GB RAM, 7200 rpm hard drive.

1.     File channel with HDFS Sink (Sequence File):

Tested on Flume 1.4 with 4 exec sources, a file channel, and an HDFS sink.

Flume version: 1.4

Source: 4 x Exec Source, 100k batchSize

HDFS Sink Batch size: 500,000

Event Size: 500 byte events.

Channel: File

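Events/sec by number of sinks (rows) and number of file channel data dirs (columns); "-" means no value was reported:

Sinks   1 data dir        2 data dirs   4 data dirs   6 data dirs   8 data dirs        10 data dirs
1       14.3k (7 MB/s)    -             -             -             -                  -
2       21.9k             -             -             -             -                  -
4       -                 35.8k         -             -             -                  -
8       -                 -             72.5k         77k           78.6k (37 MB/s)    76.6k
10      -                 -             58k           -             -                  -
12      -                 -             49.3k         49k           -                  -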
 

 

Measurements were taken to get an idea of which configuration yields the best performance, so they were taken only for the data points in the grid that made sense. For example, it was not necessary to measure multiple dataDirs with a single sink, as it was evident that multiple HDFS sinks would perform better than a single-sink configuration.

Using multiple sinks works better than using a single sink.
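As a rough illustration of the kind of setup measured in this test, the following Python sketch generates a Flume properties file with one file channel spread over several dataDirs and drained by several HDFS sinks. It is a sketch under stated assumptions, not the configuration used in the report: the agent and component names (a1, r1, c1, k1, ...), the tail command, every path, and the channel capacity values are hypothetical placeholders, and it uses a single exec source rather than the four in the test. Only the batch sizes (100k for the source, 500,000 for the sinks) and the SequenceFile output come from the report.

```python
def build_agent_config(num_sinks, data_dirs, agent="a1"):
    """Return Flume properties text: one file channel drained by several HDFS sinks."""
    sinks = [f"k{i}" for i in range(1, num_sinks + 1)]
    lines = [
        f"{agent}.sources = r1",
        f"{agent}.channels = c1",
        f"{agent}.sinks = {' '.join(sinks)}",
        "",
        f"{agent}.sources.r1.type = exec",
        # Hypothetical command; the report does not say what the exec sources ran.
        f"{agent}.sources.r1.command = tail -F /var/log/app.log",
        f"{agent}.sources.r1.batchSize = 100000",
        f"{agent}.sources.r1.channels = c1",
        "",
        f"{agent}.channels.c1.type = file",
        # Assumed values, chosen only so that a 500,000-event batch fits in one transaction.
        f"{agent}.channels.c1.capacity = 5000000",
        f"{agent}.channels.c1.transactionCapacity = 500000",
        f"{agent}.channels.c1.checkpointDir = /flume/checkpoint",  # hypothetical path
        f"{agent}.channels.c1.dataDirs = {','.join(data_dirs)}",
    ]
    for name in sinks:
        prefix = f"{agent}.sinks.{name}"
        lines += [
            "",
            f"{prefix}.type = hdfs",
            f"{prefix}.channel = c1",
            f"{prefix}.hdfs.path = /flume/events",  # hypothetical path
            # A distinct prefix per sink keeps file names from colliding.
            f"{prefix}.hdfs.filePrefix = events-{name}",
            f"{prefix}.hdfs.fileType = SequenceFile",
            f"{prefix}.hdfs.batchSize = 500000",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: 8 sinks and 4 data dirs, one of the combinations in the grid.
    print(build_agent_config(8, [f"/data{i}/flume" for i in range(1, 5)]))
```

All sinks read from the same channel, so adding a sink adds a drain thread without changing the source side; the dataDirs list is the other knob varied in the grid.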

2.     HDFS Sink:

Unlike test 1, this test uses a memory channel.

Flume version: 1.4

Channel: Memory

Event Size: 500 byte events.

"Batch sz" is the batch size; k = thousand events.

HDFS sinks   Snappy, batch sz 1.2 million   Snappy, batch sz 1.4 million   Sequence File, batch sz 1.2 million
1            34.3k (17 MB/s)                33k                            33k
2            71k                            75k                            69k
4            141k                           145k                           141k
8            271k                           273k                           251k
12           382k                           380k                           370k
16           478k                           538k (240 MB/s)                486k (232 MB/s)
 

 

Some simple observations:

  • Increasing the number of dataDirs helps file channel (FC) performance even on single-disk systems
  • Increasing the number of sinks helps

Increasing the number of sinks has a significant effect.
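To put a number on how much extra sinks help, the short Python sketch below computes the speedup over a single sink and the per-sink efficiency for the "Snappy, batch size 1.2 million" column of the memory channel table. The figures are copied from the table; the function name and the definition of efficiency (measured rate divided by sink count times the single-sink rate) are choices made here for illustration, not something the report defines.

```python
# Events/sec in thousands, keyed by number of HDFS sinks
# (Snappy, batch size 1.2 million column of the memory channel test).
throughput_k = {1: 34.3, 2: 71, 4: 141, 8: 271, 12: 382, 16: 478}


def scaling(throughput):
    """Map sink count -> (speedup over the smallest config, per-sink efficiency)."""
    base_sinks = min(throughput)
    base_rate_per_sink = throughput[base_sinks] / base_sinks
    result = {}
    for sinks, rate in sorted(throughput.items()):
        speedup = rate / throughput[base_sinks]
        efficiency = rate / (sinks * base_rate_per_sink)
        result[sinks] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    for sinks, (speedup, eff) in scaling(throughput_k).items():
        print(f"{sinks:>2} sinks: {speedup:5.2f}x speedup, {eff:6.1%} efficiency")
```

On these numbers 16 sinks deliver a little under 14 times the single-sink rate, so scaling stays close to linear across the measured range.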

3.     Hive Sink:

Hive sink with a memory channel, on Flume 1.5 and 1.6.

Flume version: 1.5 & 1.6

Channel: Memory

BatchSz:1million

Event Size: 500 byte events.

"-" means no value was reported.

                    Flume 1.5             Flume 1.6
                    Events/s    MB/s      Events/s    MB/s
1 sink
  DELIMITED Text    36,885      18        138,461     66
  JSON              12,735      6         -           -
16 sinks (agent maxed out)
  DELIMITED Text    209,600     100       348,214     166
  JSON              25,751      12        31,135      14
 

 

Observation: Feeding JSON data to the Hive sink is much slower, potentially due in part to the higher parsing overhead of JSON.

In other words, sending data as JSON is slower, and the cost is mainly in JSON parsing.
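The throughput column of the Hive sink table can be approximately reproduced from the event rate and the 500-byte event size. The Python sketch below does that conversion. It assumes that a megabyte here means 2^20 bytes; the report does not state this, it is inferred from the fact that most of the listed figures match under that assumption. The function name is made up for this example.

```python
MIB = 1024 * 1024  # assumed meaning of "MB" in the report (inferred, not stated)


def mb_per_sec(events_per_sec, event_size_bytes=500):
    """Convert an event rate to data throughput, counting event bodies only."""
    return events_per_sec * event_size_bytes / MIB


if __name__ == "__main__":
    # Event rates from the DELIMITED Text rows of the Hive sink table.
    for label, rate in [
        ("Flume 1.5, 1 sink", 36_885),
        ("Flume 1.6, 1 sink", 138_461),
        ("Flume 1.5, 16 sinks", 209_600),
        ("Flume 1.6, 16 sinks", 348_214),
    ]:
        print(f"{label}: {mb_per_sec(rate):.1f} MB/s")
```

Not every cell matches exactly: the Flume 1.6 JSON rate with 16 sinks (31,135 events/s) works out to roughly 14.8 under this assumption while the table lists 14, so the published values are best read as approximate.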

 

4.     HBase Sink:

Flume version: 1.5

Channel: Memory

Serializer: RegexHbaseEventSerializer

Total Sinks: 1

"-" means no value was reported.

Event size (bytes)   Batch sz: 1   Batch sz: 100   Batch sz: 1000   Batch sz: 10000
500                  -             11 MB/s         -                11 MB/s
1000                 0.5 MB/s      14 MB/s         22 MB/s          27 MB/s
 

 

5.     ASync HBase Sink:

Flume version: 1.5

Channel: Memory

Serializer: SimpleAsyncHbaseEventSerializer

Total Sinks: 1

"-" means no value was reported.

Event size (bytes)   Batch sz: 1   Batch sz: 100   Batch sz: 1000
500                  -             0.4 MB/s        0.5 MB/s
1000                 0.8 MB/s      0.8 MB/s        0.9 MB/s
 

 

6.     Kafka Source:

Flume version: 1.6

Channel: Memory

Sink: Null Sink

Event Size: 1000 bytes

Total Sinks: 1

Batch size (bytes)   MB/s
1,000                62
10,000               112
20,000               125
40,000               147
80,000               153

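Author: 小閃電

Source: http://www.cnblogs.com/yueyanyu/

Copyright of this article is shared by the author and 博客園 (cnblogs). Reposting and discussion are welcome, but unless the author agrees otherwise this notice must be retained and a link to the original article must be placed in a prominent position on the page. If you found this article helpful, likes and discussion are welcome. Some material in this blog comes from Internet resources; if it infringes your rights, please contact the blogger to have it removed.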

 


 

