spark-sklearn (Spark extension for scikit-learn)


(1) Official installation requirements. This package has the following requirements:

- A recent version of scikit-learn. Version 0.17 has been tested; older versions may work too.
- Spark >= 2.0. Spark can be downloaded from the [Spark official website](http://spark.apache.org/). To use spark-sklearn, you need to run the pyspark interpreter or another Spark-compatible Python interpreter; see the [Spark guide](https://spark.apache.org/docs/latest/programming-guide.html#overview) for details.
- [nose](https://nose.readthedocs.org) (testing dependency only)
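Since the list gives concrete minimum versions, a short script can confirm them before going further. This is a minimal sketch using only the standard library, not part of spark-sklearn. It assumes scikit-learn and pyspark were installed with pip; a Spark download from the official website that is only on `PYTHONPATH` is not registered as the `pyspark` distribution and would be reported as missing. The helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and the version parsing is deliberately simplified (it keeps only the leading digits of each dotted component).

```python
import re
from importlib import metadata

# Minimum versions taken from the requirements list above.
# Assumption: both packages are pip-installed under these distribution names.
REQUIREMENTS = {"scikit-learn": "0.17", "pyspark": "2.0"}


def parse_version(text):
    """Keep the leading digits of each dotted component:
    '2.4.8' -> (2, 4, 8), '1.3.dev0' -> (1, 3)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare component by component, padding with zeros so that
    '2' and '2.0' compare as equal."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements=REQUIREMENTS):
    """Return {name: (installed version or None, requirement satisfied?)}."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: {installed or 'not installed'} -> {status}")
```

Running it inside the same interpreter you plan to use for spark-sklearn (for example the pyspark shell) checks the environment that actually matters.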

(2) First, install pyspark:

For reference, see this blog post: http://www.cnblogs.com/jackchen-Net/p/6667205.html#_label5

(3) Visit the package page: https://pypi.python.org/pypi/spark-sklearn

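spark-sklearn integrates scikit-learn with Spark, which greatly simplifies the work of Python data scientists: the package automatically distributes model parameter-tuning (hyperparameter search) tasks across a Spark cluster.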
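A grid search distributes well because every parameter combination is an independent training job. The sketch below only illustrates that idea with the standard library: the helper `expand_grid` is hypothetical (written in the spirit of scikit-learn's `ParameterGrid`) and does not reproduce spark-sklearn's internals. It expands the grid `{'kernel': ('linear', 'rbf'), 'C': [1, 10]}` used in the official example.

```python
from itertools import product


def expand_grid(param_grid):
    """Expand a dict of parameter lists into one dict per combination
    (hypothetical helper, similar in spirit to sklearn's ParameterGrid)."""
    keys = sorted(param_grid)
    return [dict(zip(keys, values))
            for values in product(*(param_grid[k] for k in keys))]


grid = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
candidates = expand_grid(grid)

# Each candidate is an independent fit, so a cluster can run them in
# parallel instead of one after another on a single machine.
for candidate in candidates:
    print(candidate)
```

Two kernels times two values of `C` give four candidates; with k-fold cross-validation each candidate is fitted k times, so the number of independent fits grows quickly, which is where spreading the work over a cluster pays off.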

(4) Testing the example from the official documentation

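## Example

Here is a simple example that runs a grid search with Spark. See steps (1) to (3) above for how to install spark-sklearn.

```python
from pyspark import SparkContext
from sklearn import svm, datasets
from spark_sklearn import GridSearchCV

# In the pyspark shell `sc` already exists; getOrCreate() reuses it,
# otherwise it starts a new SparkContext.
sc = SparkContext.getOrCreate()

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svr = svm.SVC()
# spark-sklearn's GridSearchCV takes the SparkContext as its first argument
# and evaluates the parameter combinations on the cluster.
clf = GridSearchCV(sc, svr, parameters)
clf.fit(iris.data, iris.target)
```

spark-sklearn's `GridSearchCV` can be used as a drop-in replacement for scikit-learn's `GridSearchCV`, with the same API apart from the extra SparkContext argument.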
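Because the API matches, one way to use this is a script that runs the search on Spark when spark-sklearn is available and falls back to plain scikit-learn otherwise. This is a sketch under stated assumptions: scikit-learn >= 0.18 (where `GridSearchCV` lives in `sklearn.model_selection`), and `make_search` is a hypothetical helper, not part of either library.

```python
from sklearn import datasets, svm

try:
    # Cluster path: needs pyspark and spark-sklearn importable.
    from pyspark import SparkContext
    from spark_sklearn import GridSearchCV as SparkGridSearchCV
    HAVE_SPARK = True
except ImportError:
    # Local path: scikit-learn's own GridSearchCV (scikit-learn >= 0.18).
    from sklearn.model_selection import GridSearchCV
    HAVE_SPARK = False


def make_search(estimator, param_grid):
    """Hypothetical helper: build a grid search with the same call
    pattern regardless of which backend is installed."""
    if HAVE_SPARK:
        # spark-sklearn takes the SparkContext as its first argument.
        return SparkGridSearchCV(SparkContext.getOrCreate(), estimator, param_grid)
    return GridSearchCV(estimator, param_grid)


iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
search = make_search(svm.SVC(), parameters)
search.fit(iris.data, iris.target)
print(search.best_params_, search.best_score_)
```

Everything after `make_search` is identical for both backends, which is what "drop-in replacement" buys in practice.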

END~

