Understanding the Parallelism of a Topology in Storm


Source: https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html

http://blog.csdn.net/derekjiang/article/details/9040243

Concepts

The original article uses a diagram to illustrate how parallelism works when a topology runs in a Storm cluster.



Put simply, when a topology runs in a Storm cluster, its parallelism involves three logical entities: worker, executor, and task.

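1. A worker is a process that runs on a worker node and is created by the Supervisor daemon to do the actual work. Each worker executes a subset of the tasks of one given topology. Conversely, a single worker never runs tasks that belong to different topologies.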

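2. An executor can be thought of as a worker thread inside a worker process. An executor only runs tasks that belong to the same component (spout/bolt). A worker process can contain one or more executor threads. By default, one executor runs one task.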

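3. A task is the actual work performed inside a spout or bolt. An executor can be responsible for one or more tasks. The number of tasks of a component (spout/bolt) sets the upper limit on its parallelism, because each task runs in exactly one executor; the parallelism hint used in code, by contrast, sets the number of executors. Tasks are also the unit at which stream grouping (partitioning) is applied between components.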
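To make the relationship between tasks and executors concrete, here is a minimal Java sketch that spreads the tasks of one component over its executors. The class and method names are hypothetical, and the even split is an assumption made for illustration, not a description of Storm's actual scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Storm API: models how many tasks
// each executor thread of a single component would run.
class TaskDistribution {

    // Spread numTasks over numExecutors as evenly as possible (assumed policy).
    static List<Integer> tasksPerExecutor(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<Integer> result = new ArrayList<>();
        int base = numTasks / numExecutors;   // tasks every executor gets
        int extra = numTasks % numExecutors;  // leftover tasks, one each for the first executors
        for (int i = 0; i < numExecutors; i++) {
            result.add(base + (i < extra ? 1 : 0));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tasksPerExecutor(4, 2)); // [2, 2]
        System.out.println(tasksPerExecutor(1, 1)); // [1], the default of one task per executor
    }
}
```

With 4 tasks and 2 executors, each executor thread runs 2 tasks of the same component.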



Configuring Parallelism

Parallelism can be configured in several places. Their order of precedence, from lowest to highest, is:

defaults.yaml < storm.yaml < topology-specific configuration < component-level (spout/bolt) configuration
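The precedence chain can be pictured as a series of map overrides. The following Java sketch is only an illustrative model with a hypothetical class name and made-up values; it does not show how Storm itself loads or merges its configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only, not Storm code: sources are applied from lowest
// to highest priority, so a later source overrides an earlier one.
class ConfigPrecedence {

    @SafeVarargs
    static Map<String, Object> merge(Map<String, Object>... sourcesLowToHigh) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> source : sourcesLowToHigh) {
            effective.putAll(source); // same key: the higher-priority value wins
        }
        return effective;
    }

    public static void main(String[] args) {
        // The values below are made up for the example.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4);
        Map<String, Object> effective = merge(defaultsYaml, stormYaml, topologyConf);
        System.out.println(effective.get("topology.workers")); // 4
    }
}
```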

As for how each setting is configured, the details are copied directly from the original article below.

Setting the number of workers

Setting the number of executors



  • Description: the number of executors to create for the given component
  • Configuration option: ?
  • How to set in your code (examples):

Setting the number of tasks

Here is an example code snippet to show these settings in practice:

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");

Example of a running topology

 


 

The GreenBolt was configured as per the code snippet above whereas BlueSpout and YellowBolt only set the parallelism hint (number of executors). Here is the relevant code:

 

Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

TopologyBuilder topologyBuilder = new TopologyBuilder();

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");

topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");

StormSubmitter.submitTopology(
        "mytopology",
        conf,
        topologyBuilder.createTopology()
    );
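The numbers in this example can be worked through by hand: the three components request 2 + 2 + 6 = 10 executors, which are spread over 2 workers, and the 4 green-bolt tasks are shared by its 2 executors. The Java sketch below repeats that arithmetic; the class name is hypothetical and the even split across workers is assumed for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical calculator, not a Storm API: totals the executors of the
// example topology and divides them over its worker processes.
class ExampleTopologyMath {

    static int totalExecutors(Map<String, Integer> executorsByComponent) {
        int total = 0;
        for (int n : executorsByComponent.values()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Parallelism hints taken from the code snippet above.
        Map<String, Integer> executors = new LinkedHashMap<>();
        executors.put("blue-spout", 2);
        executors.put("green-bolt", 2);
        executors.put("yellow-bolt", 6);

        int workers = 2;        // conf.setNumWorkers(2)
        int greenBoltTasks = 4; // setNumTasks(4)

        int total = totalExecutors(executors);
        System.out.println("executors in total: " + total);              // 10
        System.out.println("executors per worker: " + total / workers);  // 5 (even split assumed)
        System.out.println("tasks per green-bolt executor: "
                + greenBoltTasks / executors.get("green-bolt"));         // 2
    }
}
```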

 

 

And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:

 

  • TOPOLOGY_MAX_TASK_PARALLELISM: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. Config#setMaxTaskParallelism().
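The effect of such a ceiling can be sketched in a few lines of Java. This is a hypothetical model of the behaviour described above, not Storm's implementation; the class and method names are invented for the example.

```java
// Hypothetical model, not Storm's implementation: applies a ceiling to the
// number of executors requested for a single component.
class MaxParallelismCap {

    static int effectiveExecutors(int parallelismHint, int maxTaskParallelism) {
        return Math.min(parallelismHint, maxTaskParallelism);
    }

    public static void main(String[] args) {
        System.out.println(effectiveExecutors(6, 3)); // 3: the hint exceeds the ceiling
        System.out.println(effectiveExecutors(2, 3)); // 2: the hint is already below the ceiling
    }
}
```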

 

 

How to change the parallelism of a running topology

Storm can increase or decrease the number of worker processes and the number of executors without restarting the topology. This is called rebalancing.

There are two main ways to rebalance a topology:

  1. Use the Storm web UI to rebalance the topology.
  2. Use the CLI tool to rebalance the topology, for example:
# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10

