Source: https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
http://blog.csdn.net/derekjiang/article/details/9040243
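Understanding the concepts
The original article uses a diagram to illustrate how a topology executes in parallel inside a Storm cluster.
In short, when a topology runs in a Storm cluster, its parallelism is governed by three logical entities: workers, executors, and tasks.
1. A worker is a process running on a worker node, launched by the Supervisor daemon to do the actual work. Each worker executes a subset of the work of one specific topology; conversely, a single worker never runs work belonging to different topologies.
2. An executor can be thought of as a thread inside a worker process. An executor only runs tasks that belong to the same component (spout/bolt). A worker process can host one or more executor threads. By default, one executor runs one task.
3. A task is the unit that performs the actual work of a spout or bolt. An executor can be responsible for one or more tasks. The parallelism of each component (spout/bolt) corresponds to its number of tasks. Tasks are also the unit across which stream groupings partition tuples between nodes.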
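To make the worker/executor/task relationship concrete, the sketch below models how one component's task ids might be divided among its executors. It is plain Java, not Storm's API: the class `ExecutorAssignment` is hypothetical, and the assumptions that each executor owns a contiguous range of task ids, that range sizes differ by at most one, and that no executor is left without a task are simplifications for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not Storm's API): spreading one component's tasks over its executors. */
class ExecutorAssignment {

    /**
     * Splits task ids 1..numTasks into contiguous ranges whose sizes differ by at most one.
     * Each inner list holds the task ids run by one executor (thread).
     */
    static List<List<Integer>> split(int numTasks, int numExecutors) {
        if (numTasks < 1 || numExecutors < 1) {
            throw new IllegalArgumentException("need at least one task and one executor");
        }
        // Assumption: an executor without a task would be idle, so never use more executors than tasks.
        int executors = Math.min(numTasks, numExecutors);
        int base = numTasks / executors;      // minimum number of tasks per executor
        int remainder = numTasks % executors; // the first `remainder` executors get one extra task
        List<List<Integer>> result = new ArrayList<>();
        int nextTask = 1;
        for (int e = 0; e < executors; e++) {
            int size = base + (e < remainder ? 1 : 0);
            List<Integer> range = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                range.add(nextTask++);
            }
            result.add(range);
        }
        return result;
    }

    public static void main(String[] args) {
        // Default case: one task per executor.
        System.out.println(split(2, 2)); // [[1], [2]]
        // Two executors sharing four tasks, i.e. two tasks per thread.
        System.out.println(split(4, 2)); // [[1, 2], [3, 4]]
    }
}
```

Keeping the task count independent of the executor count is what lets one thread run several tasks, which is the situation described in point 3 above.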
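Configuring parallelism
There are several ways to configure parallelism. Their precedence, from lowest to highest, is:
defaults.yaml < storm.yaml < topology-specific configuration < component-specific (spout/bolt) configuration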
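The precedence order can be pictured as layered maps in which a later, more specific layer overrides an earlier one. The sketch below assumes a simple "last writer wins" merge; the class `ConfigPrecedence` and the sample values are hypothetical and are not Storm's shipped defaults or its actual merging code. Only the key names `topology.workers` and `topology.tasks` correspond to Storm's TOPOLOGY_WORKERS and TOPOLOGY_TASKS options.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of layered configuration: later layers override earlier ones. */
class ConfigPrecedence {

    /** Merges layers ordered from lowest to highest precedence; a later key wins. */
    static Map<String, Object> merge(List<Map<String, Object>> layersLowToHigh) {
        Map<String, Object> effective = new HashMap<>();
        for (Map<String, Object> layer : layersLowToHigh) {
            effective.putAll(layer); // overwrite any value set by a lower-precedence layer
        }
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1);
        Map<String, Object> stormYaml = Map.of("topology.workers", 3);
        Map<String, Object> topologyConf = Map.of("topology.workers", 2);
        Map<String, Object> componentConf = Map.of("topology.tasks", 4);

        Map<String, Object> effective =
                merge(List.of(defaultsYaml, stormYaml, topologyConf, componentConf));
        System.out.println(effective.get("topology.workers")); // 2 (topology-level setting wins)
        System.out.println(effective.get("topology.tasks"));   // 4 (only set at component level)
    }
}
```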
As for the concrete settings, I have copied them directly from the original documentation below; they are easy to follow.
Setting the number of workers
- Description: How many worker processes to create for this topology in the Storm cluster
- Configuration option: TOPOLOGY_WORKERS
- How to set in your code (examples):
- Config#setNumWorkers()
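Setting the number of executors
- Description: How many executors (threads) to create for a given component
- Configuration option: ? (no dedicated option; the executor count comes from the parallelism_hint parameter described below)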
- How to set in your code (examples):
- TopologyBuilder#setSpout()
- TopologyBuilder#setBolt()
- Note that as of Storm 0.8 the parallelism_hint parameter specifies the initial number of executors (not tasks!) for that component.
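The split between the parallelism hint and the task count can be captured in a tiny model. The class `ComponentParallelism` below is hypothetical plain Java, not part of Storm; it encodes two rules stated in this article: since Storm 0.8 the hint is the initial executor count, and without an explicit task count each executor runs one task.

```java
/** Hypothetical model of one component's parallelism settings; not part of Storm's API. */
class ComponentParallelism {
    final int parallelismHint; // initial executor count passed to setSpout()/setBolt()
    final Integer numTasks;    // value passed to setNumTasks(), or null if it was never called

    ComponentParallelism(int parallelismHint, Integer numTasks) {
        this.parallelismHint = parallelismHint;
        this.numTasks = numTasks;
    }

    /** Since Storm 0.8 the hint sets the initial number of executors, not tasks. */
    int initialExecutors() {
        return parallelismHint;
    }

    /** Without an explicit task count, the default is one task per executor. */
    int tasks() {
        return numTasks != null ? numTasks : parallelismHint;
    }

    public static void main(String[] args) {
        ComponentParallelism defaultTasks = new ComponentParallelism(2, null);
        ComponentParallelism explicitTasks = new ComponentParallelism(2, 4);
        // prints "2 executors, 2 tasks"
        System.out.println(defaultTasks.initialExecutors() + " executors, " + defaultTasks.tasks() + " tasks");
        // prints "2 executors, 4 tasks"
        System.out.println(explicitTasks.initialExecutors() + " executors, " + explicitTasks.tasks() + " tasks");
    }
}
```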
Setting the number of tasks
- Description: How many tasks to create for a given component
- Configuration option: TOPOLOGY_TASKS
- How to set in your code (examples):
- ComponentConfigurationDeclarer#setNumTasks()
Here is an example code snippet to show these settings in practice:
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
    .setNumTasks(4)
    .shuffleGrouping("blue-spout");
An example of a running topology
The GreenBolt was configured as per the code snippet above, whereas BlueSpout and YellowBolt only set the parallelism hint (number of executors). Here is the relevant code:
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes
TopologyBuilder topologyBuilder = new TopologyBuilder();
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) // 2 executors...
    .setNumTasks(4)                                      // ...sharing 4 tasks
    .shuffleGrouping("blue-spout");
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
    .shuffleGrouping("green-bolt");
StormSubmitter.submitTopology(
    "mytopology",
    conf,
    topologyBuilder.createTopology()
);
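The numbers in this example can be checked with a little arithmetic. The sketch below is plain Java, not Storm code: the class `TopologyTotals` is hypothetical, copies the executor and task counts configured above, sums them, and assumes the executors are spread evenly over the two workers.

```java
/** Worked arithmetic for the example topology (plain Java, not Storm's API). */
class TopologyTotals {
    // Each row: {executors (parallelism hint), tasks}; values copied from the example code.
    static final int[][] EXAMPLE = {
        {2, 2}, // blue-spout: hint 2, no setNumTasks(), so tasks = executors
        {2, 4}, // green-bolt: hint 2, setNumTasks(4)
        {6, 6}, // yellow-bolt: hint 6, no setNumTasks(), so tasks = executors
    };

    /** Returns {total executors, total tasks} summed over all components. */
    static int[] totals(int[][] components) {
        int executors = 0;
        int tasks = 0;
        for (int[] c : components) {
            executors += c[0];
            tasks += c[1];
        }
        return new int[] {executors, tasks};
    }

    public static void main(String[] args) {
        int numWorkers = 2; // conf.setNumWorkers(2)
        int[] t = totals(EXAMPLE);
        System.out.println("executors = " + t[0]); // executors = 10
        System.out.println("tasks = " + t[1]);     // tasks = 12
        // Assuming an even spread, each worker process hosts 10 / 2 = 5 executor threads.
        System.out.println("executors per worker = " + t[0] / numWorkers); // executors per worker = 5
    }
}
```

Only green-bolt has more tasks than executors: its 4 tasks run on 2 threads, i.e. 2 tasks per executor.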
And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:
- TOPOLOGY_MAX_TASK_PARALLELISM: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. Config#setMaxTaskParallelism().
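As a simplified model of that ceiling, the sketch below takes the smaller of a component's parallelism hint and the configured maximum. The class `MaxParallelismCap` and its method are hypothetical, and the model ignores how Storm applies the limit internally (for example, whether it also adjusts task counts).

```java
/** Simplified, hypothetical model of a ceiling on executors per component. */
class MaxParallelismCap {

    /**
     * Executors spawned for one component under the model.
     * A null maxTaskParallelism means no ceiling was configured.
     */
    static int cappedExecutors(int parallelismHint, Integer maxTaskParallelism) {
        if (maxTaskParallelism == null) {
            return parallelismHint; // no ceiling: the hint is used as-is
        }
        return Math.min(parallelismHint, maxTaskParallelism);
    }

    public static void main(String[] args) {
        // e.g. a local-mode test run capping every component at 3 threads
        System.out.println(cappedExecutors(6, 3));    // 3
        System.out.println(cappedExecutors(2, 3));    // 2
        System.out.println(cappedExecutors(6, null)); // 6
    }
}
```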
How to change the parallelism of a running topology
There are two main ways to rebalance a topology:
- Use the Storm web UI to rebalance the topology.
- Use the CLI tool to rebalance the topology, for example:
# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
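Storm's documentation describes a component's task count as fixed for the lifetime of a topology, while its executor count can change. The sketch below models what that implies for a rebalance request, under the assumption that every executor must run at least one task. The class `RebalanceModel` is hypothetical plain Java, not Storm's API, and the task counts come from the example topology earlier in this article.

```java
/**
 * Hypothetical model of a rebalance request (not Storm's API). Assumes the task count
 * is fixed at submission time and that every executor must run at least one task.
 */
class RebalanceModel {

    /** Executors a component could actually use after requesting `requested` of them. */
    static int usableExecutors(int requested, int fixedTasks) {
        if (requested < 1 || fixedTasks < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return Math.min(requested, fixedTasks);
    }

    public static void main(String[] args) {
        // Neither blue-spout nor yellow-bolt called setNumTasks(), so their task counts
        // equal their original parallelism hints (2 and 6).
        System.out.println(usableExecutors(3, 2));  // blue-spout: 2
        System.out.println(usableExecutors(10, 6)); // yellow-bolt: 6
        // green-bolt was submitted with setNumTasks(4), leaving room for up to 4 executors.
        System.out.println(usableExecutors(4, 4));  // green-bolt: 4
    }
}
```

Under this model, setting a task count larger than the initial parallelism hint at submission time is what leaves headroom to scale a component out later with `storm rebalance`.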