(1) Official installation requirements. This package has the following requirements:
- A recent version of scikit-learn. Version 0.17 has been tested; older versions may work too.
- Spark >= 2.0. Spark may be downloaded from the [Spark official website](http://spark.apache.org/). To use spark-sklearn, you need to use the pyspark interpreter or another Spark-compliant Python interpreter. See the [Spark guide](https://spark.apache.org/docs/latest/programming-guide.html#overview) for more details.
- [nose](https://nose.readthedocs.org) (testing dependency only)
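The version requirements above can be checked before going further. The following is only a minimal sketch: the helper names (`parse_version`, `meets_minimum`, `installed_version`, `report`) are made up for illustration, and it assumes that scikit-learn and pyspark were installed as pip distributions named `scikit-learn` and `pyspark`. A Spark downloaded from the website and used through its bundled pyspark shell may not be visible this way.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.4.8' into (2, 4, 8); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '0rc1': keep the 0, ignore the rest
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '2' compares equal to '2.0'
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(minimums):
    """Map each distribution name to True/False for 'requirement met'."""
    results = {}
    for name, minimum in minimums.items():
        found = installed_version(name)
        results[name] = found is not None and meets_minimum(found, minimum)
    return results


if __name__ == "__main__":
    # Minimums taken from the requirements list above.
    print(report({"scikit-learn": "0.17", "pyspark": "2.0"}))
```

A `False` entry only means the package was not found through pip metadata or is older than the stated minimum; it does not prove that Spark itself is missing.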
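(2) First, install pyspark:
See this blog post: http://www.cnblogs.com/jackchen-Net/p/6667205.html#_label5
(3) Visit the package page: https://pypi.python.org/pypi/spark-sklearn
Spark now integrates with scikit-learn through this package, which greatly simplifies the work of Python data scientists: the package can automatically distribute model hyperparameter tuning tasks across a Spark cluster.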
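To make the idea of distributing a parameter search more concrete, here is a rough sketch in plain Python. It is not how spark-sklearn is implemented: a local thread pool stands in for the Spark cluster, and `expand_grid`, `toy_score`, and `parallel_search` are hypothetical names invented for this illustration. The point is only that each parameter combination is an independent task, so the combinations can be evaluated in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_grid(param_grid):
    """List every combination of the values in a parameter grid."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def toy_score(params):
    """Hypothetical stand-in for training and scoring one model."""
    bonus = 0.1 if params["kernel"] == "rbf" else 0.0
    return 1.0 / params["C"] + bonus


def parallel_search(param_grid, score_fn, workers=4):
    """Score every combination in parallel and return the best one."""
    candidates = expand_grid(param_grid)
    # Each candidate is independent, so they can run concurrently.
    # On a cluster, these tasks would be sent to worker nodes instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_fn, candidates))
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


if __name__ == "__main__":
    grid = {"kernel": ("linear", "rbf"), "C": [1, 10]}
    print(expand_grid(grid))       # 2 kernels x 2 values of C = 4 tasks
    print(parallel_search(grid, toy_score))
```

With a real model, `toy_score` would be replaced by cross-validated training, which is the expensive step that benefits from being spread over many machines.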
(4) Testing the example from the official documentation

## Example

Here is a simple example that runs a grid search with Spark. See the [Installation](#Installation) section on how to install spark-sklearn.

```python
from sklearn import svm, datasets
from spark_sklearn import GridSearchCV

# `sc` is the SparkContext; the pyspark shell creates it automatically.
iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svr = svm.SVC()
clf = GridSearchCV(sc, svr, parameters)
clf.fit(iris.data, iris.target)
```

This classifier can be used as a drop-in replacement for any scikit-learn classifier, with the same API.