1. Installing the distributed version
Choose one of the following:
1. conda: conda install dask distributed -c conda-forge
2. pip: pip install dask distributed --upgrade
3. From source:
git clone https://github.com/dask/distributed.git
cd distributed
python -m pip install .
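Whichever method you use, you can confirm the install by asking Python which versions of `dask` and `distributed` it sees. The sketch below uses only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical. It also assumes that the two packages normally report matching versions, because recent releases are published together, so treat a mismatch as a hint rather than a hard error.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    dask_v = installed_version("dask")
    dist_v = installed_version("distributed")
    print("dask:", dask_v)
    print("distributed:", dist_v)
    if dask_v is None or dist_v is None:
        print("At least one package is missing; reinstall with conda or pip.")
    elif dask_v != dist_v:
        # Assumption: dask and distributed are usually released with matching versions
        print("Versions differ; consider upgrading both packages together.")
```

Run it on the scheduler machine and on every worker machine. Mismatched environments across nodes are a common source of confusing errors.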
2. Starting the scheduler (master node)
dask-scheduler
The console prints output like this:
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://192.168.1.42:8786
distributed.scheduler - INFO - :8787
distributed.scheduler - INFO - Local Directory: C:\Users\User\AppData\Local\Temp\scheduler-gd9uk980
distributed.scheduler - INFO - -----------------------------------------------
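The log shows the scheduler listening on port 8786, which is the address workers connect to. The `:8787` entry is Dask's default dashboard port. Before you start workers on other machines, check from each worker host that the scheduler port is reachable through any firewall. The following is a minimal standard-library sketch. The function name `is_port_open` is hypothetical, and the host and port are assumed to be the ones printed in the "Scheduler at:" line.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket; closing it right away is enough
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

For example, once the scheduler is running, `is_port_open("192.168.1.42", 8786)` should return `True` on each worker host. If it returns `False`, check the firewall before you look at Dask itself.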
3. Starting the worker nodes
dask-worker 192.168.1.42:8786
After you run this command on each worker machine and the workers start successfully, the scheduler console prints additional lines. Each worker produces one group of lines, and three workers are shown here:
distributed.scheduler - INFO - Register tcp://192.168.1.184:45772
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.184:45772
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://192.168.1.183:43405
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.183:43405
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://192.168.1.188:38095
distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.1.188:38095
distributed.core - INFO - Starting established connection
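Each worker appears in exactly one `Register tcp://host:port` line, so you can count those lines to see how many workers joined. The sketch below assumes the scheduler output has been saved as text. The helper `registered_workers` is hypothetical, and its regular expression only matches the line format shown above. Newer Dask versions may word this log line differently.

```python
import re
from typing import List

# Matches lines such as "distributed.scheduler - INFO - Register tcp://192.168.1.184:45772"
REGISTER_RE = re.compile(r"Register (tcp://[\w.\-]+:\d+)")


def registered_workers(log_text: str) -> List[str]:
    """Return worker addresses in the order they registered, with duplicates removed."""
    seen: List[str] = []
    for addr in REGISTER_RE.findall(log_text):
        if addr not in seen:
            seen.append(addr)
    return seen
```

From a connected `Client`, `client.scheduler_info()["workers"]` gives the same information directly, keyed by worker address, without parsing logs.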
4. Official test code
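"""Distributed dask"""
import time

from dask.distributed import Client


def square(x):
    return x ** 2


def neg(x):
    return -x


# Connect to the scheduler from step 2; a synchronous client lets .result() block until done
client = Client('192.168.1.42:8786')

ts = time.time()
A = client.map(square, range(10000))  # one future per input, computing x ** 2 on the workers
B = client.map(neg, A)                # negate each result on the workers
total = client.submit(sum, B)         # sum all futures into a single value
print(total.result())
print('cost time :%s' % (time.time() - ts))

Timing reported by the author:
cost time :3.793848991394043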
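`total` is the negated sum of the squares of 0 through 9999. You can check the distributed answer against a plain single-process run and against the closed form -(n-1)n(2n-1)/6. This sketch assumes `square(x)` returns `x ** 2` and `neg(x)` returns `-x`, as in the Dask documentation's version of this example. The names `serial_total` and `closed_form_total` are hypothetical.

```python
def square(x: int) -> int:
    return x ** 2


def neg(x: int) -> int:
    return -x


def serial_total(n: int) -> int:
    """Same map -> map -> sum pipeline as the cluster code, run in one process."""
    return sum(neg(square(x)) for x in range(n))


def closed_form_total(n: int) -> int:
    """Negated sum of squares of 0..n-1, using (n-1) * n * (2n-1) / 6."""
    return -((n - 1) * n * (2 * n - 1) // 6)


if __name__ == "__main__":
    n = 10000
    print(serial_total(n))       # -333283335000
    print(closed_form_total(n))  # -333283335000
```

The cluster run should print the same value, -333283335000, on the line before the timing output.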
5. Reference links
Official Dask website: https://dask.org/
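Advantages: Dask handles distributed scheduling internally, so you do not have to write complex scheduling logic yourself. A few simple method calls are enough to run distributed computations, and some models can be trained in parallel. Distributed algorithms available include XGBoost, logistic regression (LR), and parts of scikit-learn.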
In one sentence: Dask is Spark for Python, a distributed computing framework written in Python.
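Author: 宇智波鼬_adb8
Link: https://www.jianshu.com/p/8ca5d70e0810?utm_campaign=haruki
Source: Jianshu (簡書)
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, please credit the source.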